Neural network-based prediction of the secret-key rate of quantum key distribution

Numerical methods are widely used to calculate the secure key rate of many quantum key distribution protocols in practice, but they are computationally expensive and slow. In this work, we take homodyne detection discrete-modulated continuous-variable quantum key distribution (CV-QKD) as an example and construct a neural network that can quickly predict the secure key rate from the experimental parameters and experimental results. Compared to traditional numerical methods, the neural network is faster by several orders of magnitude. Importantly, the predicted key rates are not only highly accurate but also highly likely to be secure. This allows the secure key rate of discrete-modulated CV-QKD to be extracted in real time on a low-power platform. Furthermore, our method is versatile and can be extended to quickly calculate the complex secure key rates of various other unstructured quantum key distribution protocols.

to the corresponding optimization possibly taking minutes or even hours 51 . Therefore, it is especially important to develop tools for calculating the key rate that are more efficient than numerical methods.
In this work, we take the homodyne detection discrete-modulated CV-QKD 44 as an example and construct a neural network capable of predicting the secure key rate, in order to save time and computing resources. We apply our neural network to a test set obtained at different excess noises and distances. Excellent accuracy and time savings are observed after adjusting the hyperparameters. Importantly, the predicted key rates are highly likely to be secure. Note that our method is versatile and can be extended to quickly calculate the complex secure key rates of various other unstructured quantum key distribution protocols. Through open source deep learning frameworks for on-device inference, such as TensorFlow Lite 54 , our model can also be easily deployed on devices at the edge of the network, such as mobile devices, embedded Linux systems or microcontrollers.

Results
Discrete-modulated CV-QKD. To clearly show the problem we try to solve, we briefly introduce the main ideas of discrete-modulated CV-QKD and give the convex optimization problem of finding its key rates in this section. See Ref. 44 and the description of "Discrete-modulated CV-QKD" in Methods for details.
The protocol involves two parties, Alice and Bob. Alice randomly prepares one of four coherent states and sends it to Bob through an untrusted quantum channel. Bob measures the received state using homodyne detection. After repeating this for N rounds, Alice and Bob perform sifting, parameter estimation, error correction and privacy amplification over the authenticated classical channel to obtain the final secret key. The key rate formula in the asymptotic limit can be expressed according to Refs. 32,33 as

$$R^{\infty} = \min_{\rho_{AB} \in \mathcal{S}} D\big(\mathcal{G}(\rho_{AB}) \,\|\, \mathcal{Z}[\mathcal{G}(\rho_{AB})]\big) - p_{\mathrm{pass}}\,\delta_{\mathrm{EC}}, \tag{1}$$

where D(ρ‖σ) = Tr(ρ log2 ρ) − Tr(ρ log2 σ) is the quantum relative entropy; ρ_AB is the bipartite state of Alice and Bob; G is the map that describes the postprocessing of the bipartite state ρ_AB; Z is a pinching quantum channel for reading out the results of the key map; S is the set of all density operators that match the experimental observations; p_pass is a sifting factor that determines how many rounds of data are used for generating keys; δ_EC represents the amount of information leakage per bit in the error-correction process.
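As a small numerical illustration of the relative-entropy term in the key rate formula, the following sketch evaluates D(ρ‖σ) for finite matrices and applies it to a toy qubit. The function names, the tolerance and the choice of a computational-basis pinching map are assumptions made for this example only; the protocol itself involves much larger (truncated infinite-dimensional) states and the maps G and Z of Refs. 33,44.

```python
import numpy as np

def quantum_relative_entropy(rho, sigma, tol=1e-12):
    """Return D(rho||sigma) = Tr[rho log2 rho] - Tr[rho log2 sigma] in bits."""
    # Tr[rho log2 rho] from the eigenvalues of rho, with the convention 0*log(0) = 0.
    w = np.linalg.eigvalsh(rho)
    w = w[w > tol]
    tr_rho_log_rho = float(np.sum(w * np.log2(w)))

    # Tr[rho log2 sigma] evaluated in the eigenbasis of sigma.
    s, v = np.linalg.eigh(sigma)
    weights = np.real(np.einsum("ij,ik,kj->j", v.conj(), rho, v))
    tr_rho_log_sigma = 0.0
    for s_j, w_j in zip(s, weights):
        if w_j <= tol:
            continue  # rho has no weight on this eigenvector of sigma
        if s_j <= tol:
            return float("inf")  # support of rho is not contained in that of sigma
        tr_rho_log_sigma += w_j * np.log2(s_j)
    return tr_rho_log_rho - tr_rho_log_sigma

def pinch(rho):
    """Toy pinching channel: keep only the diagonal in the computational basis."""
    return np.diag(np.diag(rho))

# Toy example: the qubit state |+><+| and its pinched (fully dephased) version.
plus = np.array([[0.5, 0.5], [0.5, 0.5]])
d_plus = quantum_relative_entropy(plus, pinch(plus))
d_same = quantum_relative_entropy(plus, plus)
```

For this toy state the pinched state is maximally mixed, so the relative entropy is one bit, while the relative entropy of a state with itself vanishes.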
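The key to finding the secure key rate is to minimize D(G(ρ_AB)‖Z[G(ρ_AB)]), since p_pass δ_EC is a fixed quantity. The associated optimization problem is 44

$$
\begin{aligned}
\text{minimize}\quad & D\big(\mathcal{G}(\rho_{AB}) \,\|\, \mathcal{Z}[\mathcal{G}(\rho_{AB})]\big) \\
\text{subject to}\quad
& \mathrm{Tr}\big[\rho_{AB}\,(|x\rangle\langle x|_A \otimes \hat{q})\big] = p_x \langle \hat{q} \rangle_x, \\
& \mathrm{Tr}\big[\rho_{AB}\,(|x\rangle\langle x|_A \otimes \hat{p})\big] = p_x \langle \hat{p} \rangle_x, \\
& \mathrm{Tr}\big[\rho_{AB}\,(|x\rangle\langle x|_A \otimes \hat{n})\big] = p_x \langle \hat{n} \rangle_x, \\
& \mathrm{Tr}\big[\rho_{AB}\,(|x\rangle\langle x|_A \otimes \hat{d})\big] = p_x \langle \hat{d} \rangle_x, \\
& \mathrm{Tr}[\rho_{AB}] = 1, \\
& \rho_{AB} \geq 0, \\
& \mathrm{Tr}_B[\rho_{AB}] = \sum_{i,j=0}^{3} \sqrt{p_i p_j}\, \langle \phi_j | \phi_i \rangle\, |i\rangle\langle j|_A,
\end{aligned} \tag{2}
$$

where |x⟩⟨x|_A is a local projective measurement operator on Alice's side, with x ∈ {0, 1, 2, 3}; $\hat{q} = \frac{1}{\sqrt{2}}(\hat{a}^{\dagger} + \hat{a})$ and $\hat{p} = \frac{i}{\sqrt{2}}(\hat{a}^{\dagger} - \hat{a})$ are the quadrature operators, where â and â† are the annihilation and creation operators of a single-mode state, respectively; $\hat{n} = \frac{1}{2}(\hat{q}^2 + \hat{p}^2 - 1) = \hat{a}^{\dagger}\hat{a}$ and $\hat{d} = \hat{q}^2 - \hat{p}^2 = \hat{a}^2 + (\hat{a}^{\dagger})^2$; ⟨q̂⟩_x, ⟨p̂⟩_x, ⟨n̂⟩_x and ⟨d̂⟩_x represent the corresponding expectation values of the operators q̂, p̂, n̂ and d̂ acting on ρ_B^x, respectively; $\rho_B^x = \frac{1}{p_x}\mathrm{Tr}_A\big[\rho_{AB}\,(|x\rangle\langle x|_A \otimes \mathrm{id}_B)\big]$ is the state of Bob after Alice has performed the measurement |x⟩⟨x| on ρ_AB, and p_x is the corresponding probability; id_B is the identity transformation acting on system B; |ϕ_i⟩ are the coherent states prepared by Alice.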
The first four constraints in Eq. (2) are derived from experimental observations. The fifth and sixth constraints are conditions that the density matrix must satisfy. The seventh constraint comes from the fact that Alice's states do not change because they do not go through insecure quantum channels.
The optimization problem in Eq. (2) is to find the optimal ρ_AB in S such that R^∞ is minimized. ρ_AB is infinite-dimensional because the attacker has the ability to arbitrarily perturb the optical mode sent by Alice into an infinite-dimensional state to send to Bob. To solve this optimization problem using numerical methods, we need to apply the photon-number cutoff assumption to ρ_AB to ensure that the number of variables is in a reasonable range. A detailed description of this method can be found in Ref. 44 .
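To make the photon-number cutoff concrete, the following sketch builds the operators q̂, p̂, n̂ and d̂ as finite matrices in a Fock basis truncated at n_cut photons and evaluates their expectation values for a coherent state. The convention q̂ = (â + â†)/√2, the cutoff value and the amplitude α = 0.6 are assumptions chosen for illustration; they are not taken from the numerical solver of Refs. 33,44.

```python
import math
import numpy as np

def truncated_operators(n_cut):
    """Return (q, p, n, d) as matrices on the Fock space spanned by |0>, ..., |n_cut>."""
    dim = n_cut + 1
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)  # annihilation operator: a|k> = sqrt(k)|k-1>
    a_dag = a.conj().T
    q = (a + a_dag) / np.sqrt(2)                  # assumed quadrature convention
    p = 1j * (a_dag - a) / np.sqrt(2)
    n = a_dag @ a                                 # photon-number operator
    d = a @ a + a_dag @ a_dag                     # equals q^2 - p^2 without truncation
    return q, p, n, d

def coherent_state(alpha, n_cut):
    """Fock-basis amplitudes of a coherent state, renormalized after truncation."""
    ks = np.arange(n_cut + 1)
    norms = np.array([math.sqrt(math.factorial(int(k))) for k in ks])
    psi = np.exp(-abs(alpha) ** 2 / 2) * alpha ** ks / norms
    return psi / np.linalg.norm(psi)

def expectation(op, psi):
    """Real part of <psi|op|psi>."""
    return float(np.real(np.vdot(psi, op @ psi)))

alpha = 0.6   # illustrative amplitude, an assumption
n_cut = 20    # illustrative cutoff, an assumption
q, p, n, d = truncated_operators(n_cut)
psi = coherent_state(alpha, n_cut)
moments = {name: expectation(op, psi) for name, op in zip("qpnd", (q, p, n, d))}
```

For a weak coherent state the population above 20 photons is negligible, so the truncated expectation values agree with the untruncated ones (√2·α, 0, α² and 2α² for real α) to high precision. This is the intuition behind restricting the number of optimization variables.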
After applying the photon-number cutoff assumption, the optimization problem in Eq. (2) can be solved by applying the numerical method in Refs. 33,44 , but this is very time consuming. In this work, to reduce the time needed to obtain secure key rates, we use the key rates obtained by the numerical method in Refs. 33,44 as labels to train our neural network.
www.nature.com/scientificreports/
Neural networks for predicting the key rates. We use an artificial neural network to predict the key rates of discrete-modulated CV-QKD. The general spirit of the work is to encode the optimization problem in Eq. (2) in the loss function of a feedforward neural network and train the neural network by minimizing this loss function. The trained neural network can be seen as a mapping that has learned the structure of the training set. For new instances, the neural network outputs the results directly via this mapping, unlike traditional numerical methods that perform complex searches. As a result, the trained neural network saves a great deal of time while ensuring a good level of accuracy. A more detailed description of neural networks can be found in Ref. 55 .
A four-layer neural network model is designed to predict the key rates of discrete-modulated CV-QKD (Fig. 1). The input layer of the network has 29 neurons, which receive the training inputs. The first and second hidden layers of the network have 400 and 200 neurons, and their activation functions are the tanh function and the sigmoid function, respectively. The output layer has only one neuron, which is used to predict secure key rates.
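The layer structure described above can be sketched as a plain NumPy forward pass. This is a minimal illustration under stated assumptions, not the trained model: the weight initialization, the linear (identity) output activation and all function names are assumptions, since the text specifies only the layer sizes and the hidden-layer activations.

```python
import numpy as np

LAYER_SIZES = [29, 400, 200, 1]  # input, two hidden layers, output (from the text)

def init_params(rng, sizes=LAYER_SIZES):
    """Random weights and zero biases; the initialization scheme is an assumption."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Map a batch of 29-component inputs to one output per sample."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = np.tanh(x @ w1 + b1)      # first hidden layer: tanh
    h2 = sigmoid(h1 @ w2 + b2)     # second hidden layer: sigmoid
    return (h2 @ w3 + b3)[:, 0]    # single output neuron, linear activation assumed

rng = np.random.default_rng(0)
params = init_params(rng)
batch = rng.normal(size=(5, 29))   # five synthetic, already standardized inputs
outputs = forward(params, batch)
n_parameters = sum(w.size + b.size for w, b in params)
```

With these layer sizes the network has 92,401 trainable parameters, which is small enough for on-device inference.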
To train our neural network, we generate a data set containing 552,000 input instances {x_i} and 552,000 corresponding labels {y_i} using the numerical method in Refs. 33,44 . Each x_i is a vector of 29 variables, and the label y_i is the corresponding key rate. Of the 29 variables in each x_i, 16 are the right-hand sides of the first four constraints of Eq. (2), 12 are the off-diagonal elements of the matrix on the right-hand side of the last constraint of Eq. (2), and the remaining one is the excess noise ξ . In an experiment, these 29 variables can be calculated from the experimental parameters and experimental observations. In our simulation, the random input instances {x_i} are generated directly from seven experimental parameters (transmission distance L, light intensity µ , excess noise ξ , and probabilities p0, p1, p2 and p3) using the following method.
When the excess noise ξ is within 0.002-0.014, we first generate a two-dimensional grid with excess noise and distance as the horizontal and vertical coordinates, respectively. Specifically, the distance takes values between 0 and 100 km in steps of 5 km, and the excess noise takes values between 0.002 and 0.014 in steps of 0.001. Then, each grid point is sampled 80 times. In each sampling, the excess noise fluctuates around the grid value within a range of ±0.0005. Once the excess noise for this sampling is determined, the light intensity takes values between 0.35 and 0.60 in steps of 0.01. Each sampling needs to generate 25 input instances with a positive key rate; otherwise, the current round of sampling is discarded and restarted. In this way, 2000 input instances are generated on each grid point, and a total of 520,000 training inputs are generated on this two-dimensional grid. When the excess noise ξ is 0.015, a similar grid is generated. However, we only sample up to 80 km, so only 32,000 instances are generated. In total, we collect 552,000 samples with excess noise ξ between 0.002 and 0.015. Using the numerical approach in Refs. 33,44 , we calculate the corresponding key rate for each sample as the label of the data set on the blade cluster system of the High Performance Computing Center of Nanjing University. This consumes over 40,000 core hours on nodes that each contain four Intel Xeon Gold 6248 CPUs, which is a substantial computational cost.
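The sample counts quoted above can be checked with a few lines of arithmetic. Note that a 0-100 km range in 5 km steps would give 21 distance values, whereas the quoted totals correspond to 20 distance values per noise level (and 16 at ξ = 0.015). The sketch below therefore assumes the grids 5, 10, ..., 100 km and 5, 10, ..., 80 km; this choice of endpoints is an assumption, not a statement from the text.

```python
SAMPLINGS_PER_GRID_POINT = 80
INSTANCES_PER_SAMPLING = 25

# Excess-noise grid 0.002, 0.003, ..., 0.014 (13 values, from the text).
noise_levels = [round(0.002 + 0.001 * k, 3) for k in range(13)]

# Distance grids; the exclusion of 0 km is an assumption inferred from the totals.
distances_main_km = list(range(5, 101, 5))   # 20 values
distances_last_km = list(range(5, 81, 5))    # 16 values, used at xi = 0.015

per_grid_point = SAMPLINGS_PER_GRID_POINT * INSTANCES_PER_SAMPLING
main_total = len(noise_levels) * len(distances_main_km) * per_grid_point
last_total = len(distances_last_km) * per_grid_point
grand_total = main_total + last_total
```

Under these assumptions the counts reproduce the figures in the text: 2000 instances per grid point, 520,000 on the main grid, 32,000 at ξ = 0.015, and 552,000 overall.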
To improve the convergence speed and accuracy of our neural network, we preprocess the input instances {x_i} and the corresponding labels {y_i}. To demonstrate the necessity of the data preprocessing, we use the network structure shown in Fig. 1 to perform a controlled experiment with the mean square error as the loss function. With the excess noise of 0.002-0.005, the absolute values of the relative deviations between the key rates predicted by our neural network and the corresponding key rates obtained by the numerical method do not exceed 25% with the data preprocessing ( Fig. 2), whereas they exceed 400% without it. Here, the relative deviation is the difference between the predicted value and the true value divided by the true value. A detailed description of the data preprocessing can be found in "Details of data preprocessing" in Methods.
A new loss function is specifically designed to make the key rates predicted by our neural network as information-theoretically secure as possible, rather than using the traditional mean squared error as the loss function. The loss function is given in Eq. (3), where n is the number of training inputs and $e_i^* = y_i^{*p} - y_i^*$ is the residual error between the preprocessed label $y_i^*$ and the corresponding output $y_i^{*p}$ of the neural network. The minimum function part in Eq. (3) is the penalty term and is used to make the key rates predicted by the neural network as information-theoretically secure as possible. On the other hand, the part consisting of the maximum function and the squared term in Eq. (3) is used to bound the upper limit of $e_i^*$ to obtain higher key rates. The parameter γ is used to balance the effects of the two parts. With the help of this loss function, we expect that the relative deviations between the predicted values and the true values can be bounded within (ε − 1, 0) after choosing proper ε and γ.
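The connection between the residual error and the target interval (ε − 1, 0) can be made explicit. The sketch below assumes the label transform y* = −log10(y), which is the inverse of the formula y^p = 10^(−y*p) given in Methods. It does not reproduce the loss function of Eq. (3); it only shows which residuals correspond to secure predictions within the desired deviation. The function names are hypothetical.

```python
import math

def relative_deviation(residual):
    """Relative deviation (y_pred - y_true) / y_true for a residual e* = y*_pred - y*_true.

    Assumes y* = -log10(y), so y_pred / y_true = 10 ** (-e*).
    """
    return 10.0 ** (-residual) - 1.0

def residual_window(eps):
    """Residual interval (low, high) that maps to relative deviations in (eps - 1, 0)."""
    return 0.0, -math.log10(eps)

low, high = residual_window(0.80)       # eps = 0.80 as used in the text
dev_at_high = relative_deviation(high)  # lower edge of the deviation interval
dev_insecure = relative_deviation(-0.1) # negative residual: prediction above the label
```

Under this assumption a positive residual means the predicted key rate lies below the numerical one (secure), and for ε = 0.80 the residual must stay below about 0.097 to keep the underestimate within 20%.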
The performance of the neural networks depends on the hyperparameters γ and ε . Without loss of generality, we take as an example the neural networks with excess noise ξ between 0.002 and 0.005 (Fig. 3). When γ = 0.20 and ε = 0.80 , the key rates predicted by the neural network are strictly lower than those obtained by the numerical method in Refs. 33,44 , which means that the key rates predicted by the neural network are information-theoretically secure. Meanwhile, the absolute values of the relative deviations are mainly distributed between 0.05 and 0.20 (Fig. 3a,b). Figure 3c-f plot the corresponding results for the hyperparameters γ = 0.20 , ε = 0.90 and γ = 0.80 , ε = 0.80 , respectively. Note that some of the key rates predicted by the neural networks under γ = 0.20 , ε = 0.90 and γ = 0.80 , ε = 0.80 are higher than the key rates obtained by the numerical method. This indicates that the neural networks trained with these two hyperparameter settings do not perform as well as the neural network trained with γ = 0.20 and ε = 0.80 . Therefore, the hyperparameters of the neural networks need to be tuned carefully to ensure stable performance.
The 552,000 samples generated by the numerical method are split into a training set of 524,400 samples and a test set of 27,600 samples. The test set is sampled from the original data set and covers instances generated under all combinations of excess noise and distance. The data preprocessing procedure follows the data splitting. The Adam optimization algorithm 56 is used to train our neural network. The initial learning rate is set to 0.001. For each training run, we use 200 epochs and a batch size of 256. In addition, techniques such as early stopping and dropout 57 are used to prevent overfitting. The relative deviations of the trained network on the test set and the training set have similar distributions, which indicates that the model has good generalization performance.
Key rate comparison. After training the neural network under γ = 0.20 and ε = 0.80 according to the method described in "Methods" above, we use it to predict, given the optimal light intensity, the key rates of discrete-modulated CV-QKD at different distances and different excess noises. As shown in Fig. 4, we compare the predicted key rates with the corresponding key rates obtained by the numerical method in Refs. 33,44 . The results show that all key rates predicted by the neural network are strictly lower than those obtained by the numerical method. It is worth noting that the relative deviations between them are mostly within 20% (relevant data can be found in "Detailed data" in Methods).

Figure 2. Relative deviations before and after data preprocessing. We use the network structure shown in Fig. 1 with the mean square error as the loss function to compare the results with data preprocessing (a) and without data preprocessing (b).
To illustrate the more general case, we evaluate the neural network on the test set of 27,600 samples mentioned at the end of "Methods". The key rates predicted by the neural network are lower than the corresponding results calculated by the numerical method for 27,379 of these samples. That is, the probability that a key rate predicted by the neural network on the test set is secure is as high as 99.2%.
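The secure fraction quoted above is simply the share of test samples whose predicted key rate lies below the numerical one. A minimal sketch follows; the helper name and the toy arrays are hypothetical, and only the two counts are taken from the text.

```python
import numpy as np

def secure_fraction(predicted, reference):
    """Fraction of samples whose predicted key rate is strictly below the reference."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(predicted < reference))

# Toy data for illustration: two of three predictions are below the reference.
toy_fraction = secure_fraction([0.9, 1.1, 0.5], [1.0, 1.0, 1.0])

# Counts reported in the text.
reported_fraction = 27379 / 27600
reported_percent = round(100 * reported_fraction, 1)
insecure_percent = round(100 * (1 - reported_fraction), 1)
```

The reported counts give 99.2% secure predictions, leaving about 0.8% of the test samples with a predicted key rate above the numerical value.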
Our neural network shows great advantages over the numerical method in terms of time and resource consumption. We compare the time required to predict the key rates with our neural network and the time required to calculate the key rates with the numerical method on a high-performance personal computer with a 3.3 GHz AMD Ryzen 9 4900H and 16 GB of RAM (Fig. 5). The neural network is 6-8 orders of magnitude faster than the numerical method at predicting the key rates of discrete-modulated CV-QKD within 0-100 km for excess noise ξ = 0.008-0.012. In addition, as the excess noise increases, the speed advantage of the neural network grows further. Refer to "Detailed data" for more detailed data.
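As a rough consistency check of the quoted speedup, the sketch below uses the two figures given in the Discussion: about 190 s per point for the numerical method near ξ = 0.008, and tens of thousands of neural-network predictions per second. The value of 10^4 predictions per second is an assumed lower end of "tens of thousands", so the result is a conservative estimate rather than a measured number.

```python
import math

numerical_seconds_per_point = 190.0   # average quoted for xi around 0.008
predictions_per_second = 1.0e4        # assumed lower end of "tens of thousands"

network_seconds_per_point = 1.0 / predictions_per_second
speedup = numerical_seconds_per_point / network_seconds_per_point
orders_of_magnitude = math.log10(speedup)
```

Even with this conservative throughput the speedup exceeds six orders of magnitude, in line with the lower end of the 6-8 range stated above.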

Discussion
We have constructed neural networks and shown that they can predict the information-theoretically secure key rates of homodyne detection discrete-modulated CV-QKD with high probability (up to 99.2% ) at distances of 0-100 km and excess noise of no more than 0.015. In particular, for excess noise of 0.008 or more, our method is at least six orders of magnitude faster than the numerical method in Refs. 33,44 . For example, it takes an average of 190 s to numerically calculate a point with excess noise ξ around 0.008, which greatly limits how efficiently a QKD system can calculate the secure key rate. In contrast, a neural network can calculate tens of thousands of key rates in 1 s. Considering that it takes a certain amount of time for the QKD system to collect data, the prediction speed of the neural network is fully sufficient for practical applications. This advantage brings us one step closer to achieving low latency for discrete-modulated CV-QKD on a low-power platform. Our method is applicable in principle to any protocol that already has reliable numerical methods. However, for protocols such as the 16/64/256 QAM DM-CVQKD protocols, for which analytical methods give results very close to those of numerical methods, it is not necessary to use the method proposed in this paper.
Recently, machine learning has been used in QKD in two main ways. One is experimental parameter optimization 58,59 and the other is assisting experimental control 60-62 . Both use machine learning to replace traditional optimization or feedback control algorithms, which is significantly different from our work. To the best of our knowledge, this is the first attempt to apply machine learning methods to predict the key rates of QKD. This poses a greater challenge than parameter optimization with machine learning methods.
This is because, in parameter optimization, the parameters predicted by the neural networks are substituted into numerical or analytical methods to find the corresponding key rates, which naturally ensures that the key rates are information-theoretically secure. The key rates obtained directly from neural networks carry no such guarantee, which forces us to redesign the loss function and seek better data preprocessing methods so that the predicted key rates are information-theoretically secure. Note that the probability ( 0.8% ) of our neural network predicting an insecure key rate is too large compared to conventional security parameters of the QKD protocol (e.g. 10^−6 ). In practice, however, we need to sample thousands of data points and calculate their respective key rates to obtain a usable key string. The key point is that when we average the key rates of all data points predicted by our neural network, the probability that this averaged key rate is insecure can be made very low. If there are enough data points, this probability can approach conventional security parameters of the QKD protocol.
We expect that larger excess noises and longer distances will require a deeper network, more sophisticated loss functions, and more detailed data preprocessing methods to improve the performance of the neural networks on the training set. More training data are also necessary to improve the generalization ability of the neural networks. For deep neural networks, exploding or vanishing gradients hinder the optimization process, so debugging is technically demanding. The debugging process can be guided by monitoring the activation values of the neurons and histograms of the gradients 55 .
Our machine learning approach is at least six orders of magnitude faster than the numerical method at predicting the secure key rates of homodyne detection discrete-modulated CV-QKD with excess noise of 0.008 or more. However, training our neural network is still time consuming, because we need to use traditional numerical methods to obtain a large number of key rates as the training set. In addition, the performance of our neural network depends on the choice of the hyperparameters γ and ε and the initial learning rate. This means that we may need to train several times to obtain a suitable neural network. To make our machine learning method more automated, further work is necessary to design another neural network that automatically finds the most suitable hyperparameters. We have also tried other machine learning methods, such as boosted decision trees. These methods have smaller relative deviations but greater variances. We leave the fusion of these methods to future research.
The important contribution of our work is that it opens the door to using classical machine learning to predict QKD key rates. In particular, our ideas and methods are very easy to generalize to other QKD protocols. We expect that our work will stimulate further research to help most QKD systems run on low-power chips 63 in mobile devices 64 .
(2) Measurement.-Bob performs a homodyne measurement on the received state. He chooses to measure a certain orthogonal component (q or p) according to the probability of [p B , 1 − p B ] . If q is chosen, Bob notes b k = 0 , otherwise he notes b k = 1 . Then, Bob records his measurement outcome y k ∈ R.
(3) Announcement and sifting.-After repeating the first two steps N times, Alice and Bob communicate via the authenticated classical channel and divide the obtained data into the following four subsets: where [N] denotes the set of all integers from 1 to N. Then Alice and Bob randomly select a subset I key of size m from I qq for generating keys. The key string X = (x 1 , x 2 , . . . , x m ) at Alice is also determined according to the following rules: where f(j) is a function that maps from I key to I qq . The remaining data in I qq , I qp , I pq and I pp are integrated into the set I test and used for parameter estimation.
(4) Parameter estimation.-Alice and Bob perform parameter estimation based on the data in I test . First, they calculate the first and second moments of q and p quadratures for each of the four coherent states sent by Alice. Then they calculate the secret key rate based on the convex optimization problem in Eq. (8).
If the result shows that the key rate is equal to 0, Alice and Bob abort the protocol and start over. Otherwise, they continue with the next step.
(5) Reverse reconciliation key map.-The key string Z = (z 1 , z 2 , . . . , z m ) at Bob is determined according to Bob's measurement outcome y k in step 2 and the following rules: where c ≥ 0 is determined by the postselection of data.
Alice and Bob then identify the positions of the symbol ⊥ and remove the data at those positions by classical communication. The strings X and Z after removing ⊥ are the raw key strings.
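The key map and the removal of ⊥ can be illustrated with a short sketch. The exact mapping rule is not reproduced in the text above, so the threshold rule used here (0 for outcomes at or above c, 1 for outcomes at or below −c, ⊥ otherwise) is an assumption based on the usual sign-based convention for homodyne outcomes with postselection. `None` stands for the symbol ⊥, and all names are hypothetical.

```python
def reverse_reconciliation_key_map(outcomes, c):
    """Map Bob's homodyne outcomes to key symbols (assumed threshold rule)."""
    symbols = []
    for y in outcomes:
        if y >= c:
            symbols.append(0)
        elif y <= -c:
            symbols.append(1)
        else:
            symbols.append(None)  # discarded by postselection
    return symbols

def remove_discarded(alice_string, bob_string):
    """Drop the positions where Bob holds the discard symbol from both strings."""
    kept = [(x, z) for x, z in zip(alice_string, bob_string) if z is not None]
    return [x for x, _ in kept], [z for _, z in kept]

bob_outcomes = [1.2, -0.9, 0.1, -0.2, 0.6]   # synthetic measurement outcomes
alice_string = [0, 1, 0, 1, 1]               # synthetic key symbols at Alice
bob_string = reverse_reconciliation_key_map(bob_outcomes, c=0.5)
raw_alice, raw_bob = remove_discarded(alice_string, bob_string)
```

In this toy run the two outcomes with magnitude below c are discarded, and the remaining positions form the raw key strings that enter error correction; the last position shows a disagreement that error correction would have to repair.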
(6) Error correction and privacy amplification.-Alice and Bob choose a suitable error-correction protocol and a suitable privacy-amplification protocol to generate the final secret key.
The key rate can be calculated using the well-known Devetak-Winter formula 65 in the asymptotic limit and under collective attacks. To apply this formula, we transform the prepare-and-measure protocol into the entanglement-based protocol.
Alice prepares the state according to the ensemble {|ϕ_x⟩, p_x} in the prepare-and-measure protocol. In the equivalent entanglement-based protocol, Alice prepares the bipartite state

$$|\Psi\rangle_{AA'} = \sum_{x=0}^{3} \sqrt{p_x}\, |x\rangle_A |\phi_x\rangle_{A'}.$$

Here Alice keeps |x⟩_A in register A and sends |ϕ_x⟩_A′ to Bob. |ϕ_x⟩_A′ changes as it passes through an insecure quantum channel. The process can be described by a completely positive and trace-preserving map E_{A′→B}. The bipartite state ρ_AB thus becomes

$$\rho_{AB} = (\mathrm{id}_A \otimes \mathcal{E}_{A' \to B})\big(|\Psi\rangle\langle\Psi|_{AA'}\big),$$

where id_A is the identity transformation acting on A. Under reverse reconciliation 66 , the key rate formula can be expressed according to Refs. 32,33 as

$$R^{\infty} = \min_{\rho_{AB} \in \mathcal{S}} D\big(\mathcal{G}(\rho_{AB}) \,\|\, \mathcal{Z}[\mathcal{G}(\rho_{AB})]\big) - p_{\mathrm{pass}}\,\delta_{\mathrm{EC}}.$$

Details of data preprocessing. To improve the performance of our neural network, we preprocess the training inputs {x_i} before training the neural network. The process can be expressed as

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{\sigma_j},$$

where x_ij represents the j-th component of the i-th sample; x̄_j and σ_j are the mean and standard deviation of the j-th component over all samples, respectively; x*_ij is the j-th component of the i-th sample after preprocessing. Each component of the preprocessed data {x*_i} has a mean of 0 and a variance of 1. This removes the differences in scale between features and makes features of different scales comparable. Since the key rates in these samples differ by up to 4 orders of magnitude, we preprocess the labels as follows to speed up the training process of the neural networks:

$$y_i^{*} = -\log_{10} y_i,$$

where y*_i is the label corresponding to the i-th sample after preprocessing. Note that the outputs predicted by neural networks trained with the preprocessed labels {y*_i} need to be converted back using

$$y_i^{p} = 10^{-y_i^{*p}},$$

where $y_i^{*p}$ and $y_i^{p}$ are the output value of the neural network and the predicted key rate for the i-th sample, respectively.
Algorithms 1 and 2 show the detailed training process of the neural networks and the process of using trained neural networks to predict new samples, respectively.
Detailed data. Table 1 shows the relative deviations between the key rates predicted by our neural network and the corresponding key rates obtained by the numerical method for the given optimal light intensity at different distances and different excess noises. This table is a supplement to Fig. 4. Table 2 shows the time consumption of the neural network and the numerical method with excess noise ξ of 0.008, 0.010 and 0.012. With the numerical method, each point with excess noise ξ of approximately 0.01 takes 200 s on average, which greatly limits how efficiently the QKD system can calculate the secure key rate. In contrast, the neural network can calculate tens of thousands of key rates in 1 s. Considering that it takes a certain amount of time for the QKD system to collect data, the prediction speed of the neural network is fully sufficient for practical applications.
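The preprocessing described in "Details of data preprocessing" can be sketched as follows. The sketch assumes that inputs are standardized component-wise with statistics taken from the training set, and that labels are mapped with y* = −log10(y), the inverse of the formula y^p = 10^(−y*p). The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_standardizer(x_train):
    """Per-component mean and standard deviation of the training inputs."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # guard against constant components
    return mean, std

def standardize(x, mean, std):
    """Component-wise standardization to zero mean and unit variance."""
    return (x - mean) / std

def encode_labels(y):
    """Label transform y* = -log10(y); requires positive key rates."""
    return -np.log10(y)

def decode_outputs(y_star):
    """Inverse transform from network outputs back to key rates."""
    return 10.0 ** (-y_star)

rng = np.random.default_rng(1)
x_train = rng.normal(3.0, 2.0, size=(1000, 29))   # synthetic 29-component inputs
mean, std = fit_standardizer(x_train)
x_star = standardize(x_train, mean, std)

key_rates = np.array([1e-1, 1e-3, 1e-5])          # synthetic labels spanning 4 orders
labels = encode_labels(key_rates)
recovered = decode_outputs(labels)
```

The logarithmic transform compresses labels that span several orders of magnitude into a narrow numeric range, and the same mean and standard deviation would be reused to standardize new inputs at prediction time.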