Quasi-cyclic multi-edge LDPC codes for long-distance quantum cryptography

The speed at which two remote parties can exchange secret keys in continuous-variable quantum key distribution (CV-QKD) is currently limited by the computational complexity of key reconciliation. Multi-dimensional reconciliation using multi-edge low-density parity-check (LDPC) codes with low code rates and long block lengths has been shown to improve error-correction performance and extend the maximum reconciliation distance. We introduce a quasi-cyclic code construction for multi-edge codes that is highly suitable for hardware-accelerated decoding on a graphics processing unit (GPU). When combined with an 8-dimensional reconciliation scheme, our LDPC decoder achieves an information throughput of 7.16 Kbit/s on a single NVIDIA GeForce GTX 1080 GPU, at a maximum distance of 142 km with a secret key rate of $6.64 \times 10^{-8}$ bits/pulse for a rate-0.02 code with a block length of $10^{6}$ bits. The LDPC codes presented in this work can be used to extend the previous maximum CV-QKD distance of 100 km to 142 km, while delivering up to 3.50× higher information throughput over the tight upper bound on secret key rate for a lossy channel.

Improvements in the post-processing algorithms for quantum cryptography can extend the secure transmission distance by over 40%. Quantum key distribution protocols rely on the transmission of quantum states, but also on classical post-processing to eliminate errors introduced by imperfect equipment or the interference of an attacker. Over long distances, the requirements of this classical 'reconciliation' processing can become the bottleneck for key exchange. Mario Milicevic and colleagues from the University of Toronto and the University of British Columbia in Canada have developed a high-throughput error correction scheme that increases the potential operating range for quantum key distribution from 100 to 143 km. Their method is fast enough that the rate of key distribution is instead limited by the physical properties of the communication channel.


I. INTRODUCTION
Quantum key distribution (QKD), also referred to as quantum cryptography, offers unconditional security between two remote parties that employ one-time pad encryption to encrypt and decrypt messages using a shared secret key, even in the presence of an eavesdropper with infinite computing power and mathematical genius [1]-[4]. Unlike classical cryptography, quantum cryptography allows the two remote parties, Alice and Bob, to detect the presence of an eavesdropper, Eve, while also providing future-proof security against brute-force, key distillation attacks that may be enabled through quantum computing [5]. Today's public key exchange schemes such as Diffie-Hellman and encryption algorithms like RSA rely on the computational hardness of solving the discrete logarithm problem and factoring large integers, respectively [6], [7]. Both of these problems, however, can be solved in polynomial time by applying Shor's algorithm on a quantum computer [8]-[10].
While quantum computing remains speculative, QKD systems have already been realized in several commercial and research settings worldwide [11]-[14]. Figure 1 presents two different protocols for generating a symmetric key over a quantum channel: (1) discrete-variable QKD (DV-QKD), where Alice encodes her information in the polarization of single-photon states that she sends to Bob, and (2) continuous-variable QKD (CV-QKD), where Alice encodes her information in the amplitude and phase quadratures of coherent states [4]. In DV-QKD, Bob uses a single-photon detector to measure each received quantum state, while in CV-QKD, Bob uses homodyne or heterodyne detection techniques to measure the quadratures of light [4]. While DV-QKD has been experimentally demonstrated up to a distance of 404 km [15], the cryogenic temperatures required for single-photon detection at such extreme distances present a challenge for widespread implementation [4]. CV-QKD systems, on the other hand, can be implemented using standard, cost-effective detectors that are routinely deployed in classical telecommunications equipment that operates at room temperature [4]. The majority of QKD research focuses on applications over optical fiber, since quantum signals for both CV- and DV-QKD can be multiplexed over classical telecommunications traffic in existing fiber-optic networks [16]-[18]. Nevertheless, there has been recent progress in chip-based QKD, as well as free-space and Earth-to-satellite QKD [19]-[21]. Note that quantum cryptography, i.e., QKD, differs from post-quantum cryptography, which is an evolving area of research that studies public-key encryption algorithms believed to be secure against an attack by a quantum computer [22]. The discussion of post-quantum cryptography is beyond the scope of this work.
The motivation of this work is to address the two key challenges that remain in the practical implementation of CV-QKD over optical fiber: (1) to extend the distance of secure communication beyond 100 km with protection against collective Gaussian attacks, and (2) to increase the computational throughput of the key-reconciliation (error correction) algorithm in the post-processing step such that the maximum achievable secret key rate remains limited only by the fundamental physical parameters of the optical equipment at long distances [23], [25], [28]. Jouguet and Kunz-Jacques showed that Mbit/s error-correction decoding of multi-edge low-density parity-check (LDPC) codes is achievable for distances up to 80 km [25], while Huang et al. recently showed that the distance could be extended to 100 km by controlling excess noise.
Fig. 2: Throughput vs. distance of GPU-based LDPC decoders for CV- and DV-QKD. The reported throughput is the raw GPU throughput without code- or error-rate scaling. For CV-QKD implementations [23]-[25], the annotated values in parentheses indicate the LDPC code block length $n$, the code rate $R$, the reconciliation efficiency $\beta$, and the SNR of the quantum channel. For DV-QKD implementations, the annotated values indicate the block length $n$, code rate $R$, and QBER [12], [26], [27].
A particular challenge in designing LDPC codes for such long distances is the low signal-to-noise ratio (SNR) of the optical quantum channel, which typically operates below −15 dB. At such low SNR, high-efficiency key reconciliation can be achieved only using low-rate codes with large block lengths on the order of $10^{6}$ bits [30], [31], where approximately 98% of the bits are redundant parity bits that must be discarded after error-correction decoding. Such codes require hundreds of LDPC decoding iterations to achieve the asymptotic, near-Shannon-limit error-correction performance in order to maximize the secret key rate [25], [32]. This is in contrast to LDPC codes employed in modern communication standards, such as IEEE 802.11ac (Wi-Fi) and ETSI DVB-S2X, where the target SNR is above 0 dB and block lengths range from 648 bits to 64,800 bits [33], [34]. In these standards, the LDPC decoder typically operates at 10 iterations to deliver Gbit/s decoding throughput [35]-[37]. Long block lengths allow Alice and Bob to generate longer secret keys, which can be used to provide unconditional security by employing the one-time pad encryption scheme. Shorter codes with block lengths of $10^{5}$ bits, for instance, would not be suitable for low-SNR channels beyond 100 km due to their less robust error-correction performance [32], [38]. In addition to long block-length codes, key reconciliation over multiple dimensions has also been shown to improve error-correction performance of multi-edge codes at low SNR [28], thereby increasing both the secret key rate and distance. However, the computational complexity and latency of LDPC decoding for long block lengths on the order of $10^{6}$ bits remains a challenge. Figure 2 presents a comparison of LDPC decoding throughput versus distance for several state-of-the-art CV- and DV-QKD implementations, illustrating that high-throughput reconciliation at long distances is achievable only with large block-length codes that approach the Shannon limit with more than 90% efficiency for CV-QKD or less than 10% quantum bit error rate (QBER) for DV-QKD.
This work introduces a new, quasi-cyclic (QC) code construction for multi-edge LDPC codes with block lengths on the order of $10^{6}$ bits [39], [40]. Computational acceleration is achieved through an optimized LDPC decoder design implemented on a state-of-the-art graphics processing unit (GPU). When combined with an 8-dimensional reconciliation scheme, the LDPC decoder achieves a raw decoding throughput of 1.72 Mbit/s and an information throughput of 7.16 Kbit/s using an NVIDIA GeForce GTX 1080 GPU at a maximum distance of 160 km with a secret key rate of $4.10 \times 10^{-7}$ bits/pulse when finite-size effects are considered. The performance of this work in comparison to previous GPU-based decoders for QKD is plotted in Fig. 2, and discussed in greater detail in Section IV. This work extends the previous maximum CV-QKD distance of 100 km to 160 km, while delivering between 1.07× and 8.03× higher decoded information throughput over the upper bound on the secret key rate for a lossy channel [41]. These results show that LDPC decoding is no longer the computational bottleneck in long-distance CV-QKD, and that the secret key rate remains limited only by the physical parameters of the quantum channel and the latency of privacy amplification.
CV-QKD provides a new use case for hardware-based LDPC codes. Over the past 15 years, research in LDPC decoder design has primarily focused on application-specific integrated circuit (ASIC) implementations for wireless, wireline, optical, and non-volatile memory systems, due to the widespread adoption of LDPC codes in modern communication standards [35]-[37]. Although highly customizable ASICs provide excellent energy efficiency, the silicon implementation of an LDPC decoder for long-distance CV-QKD with an LDPC code block length of $10^{6}$ bits would require significant silicon die area, which may be prohibitively expensive to fabricate in a modern CMOS technology node [42].
The high availability of on-chip memory and floating-point computational precision make GPUs a highly suitable platform for LDPC decoder implementation in long-distance CV-QKD systems [43], [44], as opposed to ASICs or field-programmable gate arrays (FPGAs), which suffer from limited memory, fixed-point computational precision, high-complexity routing, and silicon area constraints [45], [46]. Since Alice and Bob are stationary and their communication occurs over a fixed-length fiber-optic cable, the traditional optimization parameters of energy efficiency and silicon chip area do not necessarily apply: the LDPC decoder does not need to assume an integrated circuit form factor, as in the case of a mobile hand-held device, where power consumption is a primary concern. Furthermore, GPUs seamlessly integrate into a post-processing computer system, and provide increasing computational performance at low cost with each successive architecture generation [47]. The information throughput results presented in this work were measured using a single GPU; however, further computational speedup can be achieved with multiple GPUs.
The remainder of this paper focuses on the design and implementation of high-efficiency LDPC codes for reverse reconciliation in CV-QKD systems that operate in the low-SNR regime at long distances. Section II presents the background on CV-QKD. Section III introduces the application of LDPC codes for information reconciliation in CV-QKD. Section IV presents the GPU-based LDPC decoder design and achievable secret key rate results with multi-dimensional reconciliation, as well as a comparison of this work to a recently published CV-QKD work that addresses the computational bottleneck of post-processing algorithms.

II. BACKGROUND
In a QKD system, two remote parties, Alice and Bob, communicate over a private optical quantum channel, as well as an authenticated classical public channel, to generate a shared secret key in the presence of an adversary or eavesdropper, Eve, who may have access to both channels [2]. The security of QKD stems from the no-cloning theorem of quantum mechanics, which states that any observation or measurement of the quantum channel by Eve would disturb the coherent states transmitted from Alice to Bob [38], [48]. Since Alice and Bob can calibrate their expected channel noise threshold for a fixed fiber-optic transmission distance prior to being deployed in the field, any quantum measurement by Eve would result in a channel noise increase, at which point the reconciliation error rate would increase, and Alice and Bob could choose to terminate their communication if they suspect a man-in-the-middle attack [28]. A typical prepare-and-measure CV-QKD system is based on the Grosshans-Grangier 2002 (GG02) protocol [48], which defines the following four steps presented in Fig. 3: quantum transmission, sifting, reconciliation, and privacy amplification. Fully secure QKD networks can be built by designating intermediate trusted nodes [3], [4], or through measurement-device-independent QKD (MDI-QKD) using untrusted relay nodes in both CV- and DV-QKD [15], [49], [50]. MDI-QKD is beyond the scope of this work; however, it does provide a viable solution to the quantum hacking problem by removing all detector side channels [49].
This section first provides a fundamental overview of QKD, and then presents the mathematical framework for key reconciliation using LDPC codes over multiple dimensions, with the consideration of finite-size effects on the secret key rate.

A. Quantum Transmission and Measurement
To construct a secret key using the prepare-and-measure CV-QKD protocol, Alice first transmits $N_{\text{quantum}}$ coherent states to Bob over an optical fiber. Each coherent state comprises a pair of amplitude and phase quadrature operators, $x$ and $p$, of the form $|x + jp\rangle$, where $j = \sqrt{-1}$. Using a quantum random number generator, Alice prepares each coherent state by randomly selecting her $x_A$ and $p_A$ quadrature values according to a zero-mean Gaussian distribution with adjustable modulation variance $\sigma_A^2 = V_A N_0$, where $N_0$ represents the shot noise variance defined by the Heisenberg inequality $\Delta x \Delta p \geq N_0$ [38], [48]. Alice transmits her train of $N_{\text{quantum}}$ coherent states to Bob by modulating a light source with a pulse repetition rate $f_{\text{rep}}$. She also records her selections of $x_A$ and $p_A$ for the next sifting step by constructing a vector $\mathbf{A}$ of length $2N_{\text{quantum}}$ from her $N_{\text{quantum}}$ coherent-state quadrature pairs, $\mathbf{A} = (x_{A_1}, p_{A_1}, x_{A_2}, p_{A_2}, \ldots, x_{A_{N_{\text{quantum}}}}, p_{A_{N_{\text{quantum}}}})$. Bob randomly selects and measures either the $x$ or $p$ quadrature for each incoming pulse using an unbiased homodyne detector. Bob constructs his own vector $\mathbf{B}$ of length $N_{\text{quantum}}$, comprised of the observed quadrature measurements, where $B_i \in \{x_{B_i}, p_{B_i}\}$ with equal probability. Despite the losses in the optical fiber, and the added noise from Bob's and Eve's detection equipment, the $x_B$ and $p_B$ quadrature measurements can still be used to distill a secret key following the sifting and reconciliation (error correction) steps.
Without considering the presence of the eavesdropper (Eve), the quantum transmission is subject to path loss, excess noise in the single-mode fiber between Alice and Bob, the inefficiency of Bob's homodyne detection, as well as added electronic (thermal) noise [28]. The optical experimental setup is beyond the scope of this work; thus, experimental values from previously published works have been used to characterize the quantum channel [38]. The excess channel noise expressed in shot noise units is assumed to be $\epsilon = 0.005$, Bob's added electronic noise in shot noise units is chosen as $V_{el} = 0.041$, Bob's homodyne detector efficiency is set to $\eta = 0.606$, and the single-mode fiber transmission loss is assumed to be 0.2 dB/km, such that the transmittance of the quantum channel is given by $T = 10^{-\alpha d/10}$, where $d$ is the transmission distance in kilometers and $\alpha = 0.2$ dB/km. The total noise between Alice and Bob is given by $\chi_{\text{total}} = \chi_{\text{line}} + \chi_{\text{hom}}/T$, where $\chi_{\text{line}} = (1/T - 1) + \epsilon$ is the total channel added noise referred to the channel input, and $\chi_{\text{hom}} = (1 + V_{el})/\eta - 1$ is the noise introduced by the homodyne detector. The variance of Bob's measurement is given by $\sigma_B^2 = V_B N_0 = \eta T (V + \chi_{\text{total}}) N_0$, where $V = V_A + 1$. Although the adversary (Eve) may have access to the quantum channel, her presence is not considered in the channel characterization. Instead, the information leaked to Eve will be considered in the secret key rate calculation [48].
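The channel characterization above reduces to a few lines of arithmetic. The following is a minimal sketch, assuming shot-noise units ($N_0 = 1$) and using the parameter values quoted in the text as defaults; the function names `channel_noise` and `bob_variance` are illustrative and do not come from the original work.

```python
import math


def channel_noise(distance_km, alpha_db_per_km=0.2, excess_noise=0.005,
                  v_el=0.041, eta=0.606):
    """Return (T, chi_line, chi_hom, chi_total) in shot-noise units."""
    # Transmittance of the fiber: T = 10^(-alpha * d / 10)
    transmittance = 10 ** (-alpha_db_per_km * distance_km / 10)
    # Channel-added noise referred to the channel input
    chi_line = (1 / transmittance - 1) + excess_noise
    # Noise introduced by Bob's homodyne detector
    chi_hom = (1 + v_el) / eta - 1
    # Total noise between Alice and Bob
    chi_total = chi_line + chi_hom / transmittance
    return transmittance, chi_line, chi_hom, chi_total


def bob_variance(v_a, distance_km, n0=1.0, eta=0.606, **channel_kwargs):
    """Variance of Bob's measurement: eta * T * (V + chi_total) * N0, with V = V_A + 1."""
    transmittance, _, _, chi_total = channel_noise(distance_km, eta=eta, **channel_kwargs)
    return eta * transmittance * ((v_a + 1) + chi_total) * n0


if __name__ == "__main__":
    T, chi_line, chi_hom, chi_total = channel_noise(100.0)
    print(f"T = {T:.4f}, chi_line = {chi_line:.3f}, "
          f"chi_hom = {chi_hom:.4f}, chi_total = {chi_total:.3f}")
```

At 100 km the transmittance is 0.01, so the detector noise term $\chi_{\text{hom}}/T$ is amplified a hundredfold when referred to the channel input. With no modulation ($V_A = 0$) the expression for Bob's variance collapses to $1 + V_{el} + \eta T \epsilon$, which is a convenient sanity check on the formulas.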
In the remaining post-processing steps of the QKD protocol, Alice and Bob communicate over an authenticated classical public channel, which is assumed to be noiseless and error-free. Eve may have access to this channel; however, her eavesdropping does not introduce additional errors [48].

B. Sifting
Following the quantum transmission step, Alice's original transmission vector $\mathbf{A}$ contains twice as many elements as Bob's measurement vector $\mathbf{B}$. In the sifting step, Bob informs Alice which of the $x_B$ or $p_B$ quadratures he randomly selected for each of his $N_{\text{quantum}}$ measurements, such that Alice may discard her $N_{\text{quantum}}$ unused $x_A$ and $p_A$ quadrature values [48]. After sifting, Alice and Bob share correlated random sequences of length $N_{\text{quantum}}$, herein defined as $\mathbf{X}_0 = (X_{0,1}, X_{0,2}, \ldots, X_{0,N_{\text{quantum}}})$ and $\mathbf{Y}_0 = (Y_{0,1}, Y_{0,2}, \ldots, Y_{0,N_{\text{quantum}}})$, respectively, where $(X_{0,i}, Y_{0,i})$, $i = 1, 2, \ldots, N_{\text{quantum}}$, are independent and identically distributed realizations of jointly Gaussian random variables $(X_0, Y_0)$. In the following reconciliation and privacy amplification steps, Alice and Bob apply error correction and hashing techniques to build a secret key using their sifted sequences of correlated quadrature measurements.
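The sifting step can be illustrated with a toy simulation. The sketch below assumes a simplified channel with additive Gaussian noise only (no fiber loss or detector inefficiency) and uses a seeded pseudo-random generator in place of a quantum random number generator; the function names and parameters are hypothetical.

```python
import random


def prepare_and_measure(n_quantum, v_a=1.0, noise_std=0.1, seed=0):
    """Toy quantum transmission: returns Alice's (x, p) pairs, Bob's bases, Bob's values."""
    rng = random.Random(seed)
    sigma_a = v_a ** 0.5  # modulation standard deviation in shot-noise units (N0 = 1)
    # Alice draws both quadratures of every coherent state from a zero-mean Gaussian
    alice = [(rng.gauss(0.0, sigma_a), rng.gauss(0.0, sigma_a)) for _ in range(n_quantum)]
    # Bob picks the x or p quadrature for each pulse with equal probability
    bases = [rng.choice("xp") for _ in range(n_quantum)]
    # Bob observes the chosen quadrature plus noise (assumed channel model)
    bob = [(xa if basis == "x" else pa) + rng.gauss(0.0, noise_std)
           for (xa, pa), basis in zip(alice, bases)]
    return alice, bases, bob


def sift(alice, bases):
    """Alice keeps only the quadrature Bob measured and discards the other one."""
    return [xa if basis == "x" else pa for (xa, pa), basis in zip(alice, bases)]


if __name__ == "__main__":
    alice, bases, y0 = prepare_and_measure(8)
    x0 = sift(alice, bases)
    for xi, yi in zip(x0, y0):
        print(f"{xi:+.3f}  {yi:+.3f}")
```

Alice starts with $2N_{\text{quantum}}$ values and ends with $N_{\text{quantum}}$, matching the length of Bob's vector. Setting `noise_std=0.0` makes the two sifted sequences identical, which is a quick way to confirm that the basis bookkeeping is correct.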

C. Reconciliation
During information reconciliation, Alice and Bob perform the first step in building a unique secret key by: (1) encoding a randomly generated message using the sifted quadrature measurements, (2) transmitting the encoded message over an authenticated classical channel, and then (3) applying an error-correction scheme to decode the original message [38], [51]. In the direct reconciliation scheme, Alice generates and transmits a random message to Bob, who then performs the error-correction decoding based on his measured quadratures. However, previous works have shown that the transmission distance with direct reconciliation is limited to about 15 km [52]-[54], and is thus not suitable for long-distance CV-QKD targeting transmission distances beyond 100 km [32].
The long-distance problem drives the need for an alternate, robust scheme that is capable of operating under the low-SNR conditions of the optical channel, even in the presence of excess noise introduced by an eavesdropper. In the reverse reconciliation scheme, the direction of classical communication between Alice and Bob is reversed. Reverse reconciliation achieves a higher secret key rate at longer distances in comparison to direct reconciliation; however, powerful error-correction codes are still required to combat the high channel noise at long distances without revealing unnecessary information to Eve during the reconciliation process [28], [32], [51].
Two-way interactive error-correction protocols such as Cascade or Winnow are not practical for long-distance QKD due to the large latency and communication overhead required to theoretically minimize the information leakage to Eve [55]-[58]. Blind reconciliation using short block-length codes on the order of $10^{3}$ bits with low interactivity was proposed to reduce decoding latency [26]; however, the short block length is not suitable for error correction at low SNR. Instead, one-way forward error correction implemented using long block-length codes with iterative soft-decision decoding is required to achieve efficient error correction at low SNR [30], [51]. Jouguet et al. recently showed that multi-edge LDPC codes combined with a multi-dimensional reverse reconciliation scheme can achieve near-Shannon-limit error-correction performance at long distances [28], [32]. However, the computational complexity of LDPC decoding remains a limitation to the maximum achievable secret key rate in a practical QKD implementation [25]. Sections III and IV of this work present hardware-oriented optimization techniques to alleviate the time-intensive bottleneck of LDPC decoding for long-distance CV-QKD systems, while the remainder of this section outlines the mathematical framework for long-distance reverse reconciliation.
1) Reconciliation at Long Distances: Strong error-correction schemes do not exist for systems with both a Gaussian input and Gaussian channel, as in the case of CV-QKD. However, at low SNR, the maximum theoretical secret key rate is less than 1 bit/pulse per channel use, and the Shannon limit of the additive white Gaussian noise (AWGN) channel approaches the limit of a binary-input AWGN channel (BIAWGNC) [51]. This makes binary codes highly suitable for error correction in the low-SNR regime [59], [60], as opposed to non-binary codes, which outperform binary codes on channels with more than 1 bit/symbol per channel use [61]. Since binary codewords can be encoded in the signs of Alice and Bob's correlated sequences, $\mathbf{X}_0$ and $\mathbf{Y}_0$, the reconciliation system can therefore be modelled as a BIAWGNC [32].
2) Reverse Reconciliation Algorithm for BIAWGNC: A virtual model of the BIAWGNC can be induced from the physical parameters that characterize the quantum transmission [32], where the optical fiber and homodyne detector losses are captured in the form of a signal-to-noise ratio with respect to the optical input signal, whose variance is normalized based on Alice's modulation variance $V_A$. Assuming that the BIAWGNC noise has zero mean and variance $\sigma_Z^2$, i.e., $Z \sim \mathcal{N}(0, \sigma_Z^2)$, the SNR can be expressed as $s = 1/\sigma_Z^2$. In order to perform key reconciliation, Alice and Bob now construct two new correlated Gaussian sequences from their sifted correlated sequences $\mathbf{X}_0$ and $\mathbf{Y}_0$ of length $N_{\text{quantum}}$. Alice and Bob first select a subset of $n$ elements from $\mathbf{X}_0$ and $\mathbf{Y}_0$, where $n < N_{\text{quantum}}$. Here, $n$ is chosen to be equal to the LDPC code block length. Alice and Bob then normalize their subset of $n$ elements by the modulation variance $V_A$, such that Alice and Bob now share correlated Gaussian sequences $\mathbf{X}$ and $\mathbf{Y}$, each of length $n$, where $X \sim \mathcal{N}(0, 1)$, $Y \sim \mathcal{N}(0, 1 + \sigma_Z^2)$, and the property $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$ holds [32].
Bob uses a quantum random number generator to generate a uniformly distributed random binary sequence $\mathbf{S}$ of length $k$, where $S_i \in \{0, 1\}$. He then performs a computationally inexpensive LDPC encoding operation to generate an LDPC codeword $\mathbf{C}$ of length $n$, where $C_i \in \{0, 1\}$, by appending $(n - k)$ redundant parity bits to $\mathbf{S}$ based on a binary LDPC parity-check matrix $\mathbf{H}$ that is also known to Alice. Eve may also have access to $\mathbf{H}$; however, the QKD security proof still holds since Eve is assumed to have infinite mathematical genius. Bob prepares his classical message to Alice, $\mathbf{M}$, by modulating the signs of his correlated Gaussian sequence $\mathbf{Y}$ with the LDPC codeword $\mathbf{C}$, such that $M_i = (-1)^{C_i} Y_i$ for $i = 1, 2, \ldots, n$. The symmetry in the uniform distribution of Bob's random binary sequence $\mathbf{S}$ ensures that the transmission of $\mathbf{M}$ over the authenticated classical public channel does not reveal any additional information about the secret key to Eve [38].
Assuming error-free transmission over the classical channel, Alice attempts to recover Bob's codeword using her correlated Gaussian sequence $\mathbf{X}$ based on the following division operation:
$$R_i = \frac{M_i}{X_i} = \frac{(-1)^{C_i} Y_i}{X_i} = \frac{(-1)^{C_i}(X_i + Z_i)}{X_i} = (-1)^{C_i} + \frac{(-1)^{C_i} Z_i}{X_i} \qquad (1)$$
for $i = 1, 2, \ldots, n$. Here, Alice observes a virtual channel with binary input ($\pm 1$) and additive noise $(-1)^{C_i} Z_i / X_i$. In this case, the division operation in the noise term represents a fading channel; however, since Alice knows the value of each $X_i$, the norm of $\mathbf{X}$ is revealed and the overall channel noise remains Gaussian with zero mean and variance $\sigma_{N_i}^2 = \sigma_Z^2 / \|X_i\|^2$ for each $i = 1, 2, \ldots, n$ [32]. Alice then attempts to reconstruct $\mathbf{S}$ by applying the computationally intensive Sum-Product belief propagation algorithm for LDPC decoding, to remove the channel noise from her received vector $\mathbf{R}$. The LDPC decoding algorithm requires the channel noise variance $\sigma_{N_i}^2$ to be known for each $i = 1, 2, \ldots, n$. By discarding the $(n - k)$ parity bits from the decoded codeword, Alice can build an estimate $\hat{\mathbf{S}}$ of Bob's original binary sequence $\mathbf{S}$ for further post-processing in the next privacy amplification step to asymptotically reduce Eve's knowledge about the secret key [48]. LDPC decoding is successful if $\hat{\mathbf{S}} = \mathbf{S}$, whereas a frame error is said to have occurred when $\hat{\mathbf{S}} \neq \mathbf{S}$.
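The message construction and Alice's division step for the 1-dimensional case can be sketched in a few lines. This is an illustration only: LDPC encoding and Sum-Product decoding are omitted, the bit sequence stands in for a codeword, and the log-likelihood ratio $2R_i/\sigma_{N_i}^2$ is the standard binary-input AWGN expression, assumed here as the decoder input rather than taken from the text. All function names are hypothetical.

```python
def bob_message(y, codeword):
    """Bob flips the sign of each Y_i according to the codeword bit: M_i = (-1)^C_i * Y_i."""
    return [(-1) ** ci * yi for yi, ci in zip(y, codeword)]


def alice_receive(m, x, sigma_z):
    """Alice divides by her own X_i and computes the per-symbol noise variance."""
    r = [mi / xi for mi, xi in zip(m, x)]            # R_i = M_i / X_i
    noise_var = [sigma_z ** 2 / xi ** 2 for xi in x]  # variance of Z_i / X_i given X_i
    return r, noise_var


def llr(r, noise_var):
    """Channel LLRs for a BIAWGN channel with known per-symbol variance (assumed form)."""
    return [2.0 * ri / vi for ri, vi in zip(r, noise_var)]


def hard_decision(llrs):
    """Map positive LLRs to bit 0 and non-positive LLRs to bit 1."""
    return [0 if value > 0 else 1 for value in llrs]


if __name__ == "__main__":
    x = [0.5, -1.2, 2.0]           # Alice's normalized sequence
    z = [0.05, 0.10, -0.20]        # channel noise samples
    y = [xi + zi for xi, zi in zip(x, z)]
    c = [0, 1, 1]                  # stand-in for an LDPC codeword
    r, var = alice_receive(bob_message(y, c), x, sigma_z=0.5)
    print(r, hard_decision(llr(r, var)))
```

Symbols with small $|X_i|$ receive a large noise variance and therefore a weak LLR, which is how the fading nature of the virtual channel enters the decoder. In a full implementation the LLRs would be passed to the belief propagation decoder instead of being thresholded directly.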
3) Multi-Dimensional Reconciliation: Up until this point, the discussion has assumed a 1-dimensional reconciliation scheme in $\mathbb{R}$, with $\pm 1$ binary inputs on the virtual BIAWGNC. Leverrier et al. showed that the quantum transmission can be extended to longer distances with proven security by employing multi-dimensional reconciliation schemes constructed from spherical rotations in $\mathbb{R}^2$, $\mathbb{R}^4$, and $\mathbb{R}^8$, where the multiplication and division operators are defined [30], [51]. These spaces are commonly referred to as the set of complex numbers $\mathbb{C}$, the quaternions $\mathbb{H}$, and the octonions $\mathbb{O}$, respectively. As shown in Eq. 1, the division and multiplication operations must be defined for the reverse reconciliation procedure. By Hurwitz's theorem on composition algebras, normed division is defined for only four finite-dimensional algebras: the real numbers $\mathbb{R}$ ($d = 1$), the complex numbers $\mathbb{C}$ ($d = 2$), the quaternions $\mathbb{H}$ ($d = 4$), and the octonions $\mathbb{O}$ ($d = 8$) [62]. Hence, the remainder of this discussion considers only the dimensions $d = 1, 2, 4, 8$.
The multi-dimensional approach is a further reformulation of the reduction of the physical Gaussian channel to a virtual BIAWGNC at low SNR. For $d$-dimensional reconciliation, $d \in \{1, 2, 4, 8\}$, each consecutive group of $d$ quantum coherent-state transmissions from Alice to Bob can be mapped to the same virtual BIAWGNC. As a result, the channel noise variance among all $d$ virtual channels is uniform. For the $d = 1$ case, each $R_i$ defined in Eq. 1 has a unique channel noise variance $\sigma_{N_i}^2$. For the $d = 2$ case, the received values are grouped into pairs $(R_{2i-1}, R_{2i})$, which are constructed from the quadrature transmission of successive $(M_{2i-1}, M_{2i})$ pairs for $i = 1, 2, \ldots, n/2$ in $\mathbb{R}^2$. Similar to $d = 1$, each received value still comprises a $\pm 1$ binary input and a noise term, such that $R_{2i-1} = (-1)^{C_{2i-1}} + N_{2i-1}$ and $R_{2i} = (-1)^{C_{2i}} + N_{2i}$ for $i = 1, 2, \ldots, n/2$. While the real and imaginary noise components, $N_{2i-1}$ and $N_{2i}$, are not equal, the variance of the channel noise is uniform over both dimensions, such that $\sigma_{N_{2i-1}}^2 = \sigma_{N_{2i}}^2$ for each $(R_{2i-1}, R_{2i})$ pair. This can be extended to the $d = 4$ and $d = 8$ cases, where each $d$-tuple of successive $R_i$ values has a unique channel noise for each dimensional component, but the channel noise variance remains equal over all $d$ dimensions. For example, for $d = 4$, each received 4-tuple, $(R_{4i-3}, R_{4i-2}, R_{4i-1}, R_{4i})$ for $i = 1, 2, \ldots, n/4$, has a unique noise term for each of its four components, but the channel noise variance over all four dimensions remains uniform.
The following derivation extends Alice's message reconstruction calculation presented in Eq. 1 to $d$-dimensional vector spaces, $d \in \{1, 2, 4, 8\}$, where the multiplication and division operators are defined. The derivation of the channel noise for $d = 2, 4, 8$ is considerably more involved than for $d = 1$; however, the procedure can be simplified by applying associative and distributive algebraic properties that hold true for the complex, quaternion, and octonion vector spaces. It follows then that
$$\mathbf{R} = \mathbf{M}\mathbf{X}^{-1} = \frac{\mathbf{U}\mathbf{Y}\mathbf{X}^*}{\|\mathbf{X}\|^2} = \frac{\mathbf{U}(\mathbf{X} + \mathbf{Z})\mathbf{X}^*}{\|\mathbf{X}\|^2} = \mathbf{U} + \frac{\mathbf{U}\mathbf{Z}\mathbf{X}^*}{\|\mathbf{X}\|^2}.$$
Here, $\mathbf{R}$, $\mathbf{M}$, $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ are $d$-dimensional vectors, and $\mathbf{U}$ is the $d$-dimensional vector comprised of $(-1)^{C_i}$ components. For example, for $d = 2$, $\mathbf{U} = ((-1)^{C_{2i-1}}, (-1)^{C_{2i}})$. The multi-dimensional noise for a virtual BIAWGNC is given by the term $\mathbf{N} = (\mathbf{U}\mathbf{Z}\mathbf{X}^*)/\|\mathbf{X}\|^2$, such that $\mathbf{R} = \mathbf{U} + \mathbf{N}$. The Cayley-Dickson construction can then be applied to complete the derivation of the multi-dimensional noise $\mathbf{N}$ for $d = 2, 4, 8$.
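For $d = 2$ the relation $\mathbf{R} = \mathbf{U} + \mathbf{N}$ can be checked numerically with Python's built-in complex type. The sketch below assumes that Bob's message is the complex product $\mathbf{M} = \mathbf{U}\mathbf{Y}$ and that each pair of consecutive values forms one complex number; the helper names are illustrative and not taken from the original work.

```python
def bob_message_2d(y_pair, c_pair):
    """Form U from two codeword bits and return (M, U) with M = U * Y."""
    u = complex((-1) ** c_pair[0], (-1) ** c_pair[1])
    return u * complex(*y_pair), u


def alice_receive_2d(m, x_pair):
    """Alice computes R = M * conj(X) / |X|^2, i.e. division by X."""
    x = complex(*x_pair)
    return m * x.conjugate() / abs(x) ** 2


def noise_2d(u, z_pair, x_pair):
    """Multi-dimensional noise term N = U * Z * conj(X) / |X|^2."""
    x = complex(*x_pair)
    return u * complex(*z_pair) * x.conjugate() / abs(x) ** 2


if __name__ == "__main__":
    x_pair, z_pair, c_pair = (3.0, 4.0), (0.1, 0.2), (0, 1)
    y_pair = (x_pair[0] + z_pair[0], x_pair[1] + z_pair[1])
    m, u = bob_message_2d(y_pair, c_pair)
    r = alice_receive_2d(m, x_pair)
    # R - U should match the noise term up to floating-point rounding
    print(r, u, abs((r - u) - noise_2d(u, z_pair, x_pair)))
```

With zero noise Alice recovers $\mathbf{U}$ exactly, and with noise the residual $\mathbf{R} - \mathbf{U}$ equals $\mathbf{U}\mathbf{Z}\mathbf{X}^*/\|\mathbf{X}\|^2$. The quaternion and octonion cases need a dedicated multiplication routine, which is outside the scope of this sketch.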
Since the noise variance is uniform across all dimensions, $\mathbf{C}$ can be assumed to be the all-zero codeword, i.e., $C_i = 0$ for all $i = 1, 2, \ldots, n$, to further simplify the derivation. For $d = 2$, the channel noise of both the real and imaginary components can then be written in closed form for $i = 1, 2, \ldots, n/2$, from which the channel noise variance for $d = 2$ follows. The noise derivation for $d = 4$ and $d = 8$ is much longer and is not included here.
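One way to carry out the $d = 2$ step is sketched below. It assumes the conventions stated in the text ($\mathbf{U}$ has unnormalized $\pm 1$ components and $\mathbf{N} = \mathbf{U}\mathbf{Z}\mathbf{X}^*/\|\mathbf{X}\|^2$), the all-zero codeword, and independent noise components $Z_{2i-1}$, $Z_{2i}$ of variance $\sigma_Z^2$ each. The factor of 2 in the result comes from $\|\mathbf{U}\|^2 = 2$ and would change under a different normalization of $\mathbf{U}$, so this should be read as a consistency check rather than as the expression printed in the original work.

```latex
\begin{align*}
\mathbf{U} &= 1 + j, \qquad
\mathbf{Z} = Z_{2i-1} + jZ_{2i}, \qquad
\mathbf{X} = X_{2i-1} + jX_{2i}, \\[4pt]
\mathbf{N} &= \frac{\mathbf{U}\mathbf{Z}\mathbf{X}^*}{\|\mathbf{X}\|^2}
  = \frac{(1 + j)(Z_{2i-1} + jZ_{2i})(X_{2i-1} - jX_{2i})}{X_{2i-1}^2 + X_{2i}^2}, \\[4pt]
N_{2i-1} &= \operatorname{Re}(\mathbf{N})
  = \frac{(X_{2i-1} + X_{2i})\,Z_{2i-1} + (X_{2i} - X_{2i-1})\,Z_{2i}}{X_{2i-1}^2 + X_{2i}^2}, \\[4pt]
N_{2i} &= \operatorname{Im}(\mathbf{N})
  = \frac{(X_{2i-1} - X_{2i})\,Z_{2i-1} + (X_{2i-1} + X_{2i})\,Z_{2i}}{X_{2i-1}^2 + X_{2i}^2}, \\[4pt]
\sigma_{N_{2i-1}}^2 &= \sigma_{N_{2i}}^2
  = \sigma_Z^2\,\frac{(X_{2i-1} + X_{2i})^2 + (X_{2i-1} - X_{2i})^2}{\left(X_{2i-1}^2 + X_{2i}^2\right)^2}
  = \frac{2\sigma_Z^2}{X_{2i-1}^2 + X_{2i}^2}.
\end{align*}
```

The two noise components differ, but their variances coincide, which matches the statement that the channel noise variance is uniform over both dimensions of each pair.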

4) Reconciliation Efficiency: The reverse reconciliation algorithm for the BIAWGNC can be reduced to an asymmetric Slepian-Wolf source-coding problem with input $\mathbf{M}$ and side information $\mathbf{X}$, where Alice and Bob observe correlated Gaussian sequences $\mathbf{X}$ and $\mathbf{Y}$, respectively [57], [63]. Since Alice must discard $(n - k)$ parity bits from the linear block code after LDPC decoding, it follows then that the efficiency of the reverse reconciliation algorithm is given by $\beta = R_{\text{code}} / I(X; Y)$, where $I(X; Y)$ is the mutual information between $X$ and $Y$, and $R_{\text{code}}$ is the LDPC code rate defined as $R_{\text{code}} = k/n$ from the $n$-length codeword $\mathbf{C}$ and $k$-length random information string $\mathbf{S}$ [32], [63]. The mutual information $I(X; Y)$ corresponds to the Shannon capacity of the quantum channel, hence the reconciliation efficiency can be expressed more simply as
$$\beta = \frac{R_{\text{code}}}{C(s)},$$
where $C(s) = \frac{1}{2} \log_2(1 + s)$ is the Shannon capacity and $s$ is the SNR of the BIAWGNC. The Shannon capacity defines the maximum achievable code rate $R_{\text{max}}$ for a given SNR, and thus, the $\beta$-efficiency characterizes how close the reconciliation algorithm operates to this fundamental limit [60].
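The relationship between code rate, SNR, and efficiency is easy to evaluate numerically. The sketch below implements $\beta = R_{\text{code}}/C(s)$ and its inverse; the function names and the example efficiency of 0.99 are illustrative assumptions rather than values reported in the text.

```python
import math


def shannon_capacity(snr):
    """Capacity C(s) = 0.5 * log2(1 + s) in bits per channel use (linear SNR)."""
    return 0.5 * math.log2(1 + snr)


def reconciliation_efficiency(code_rate, snr):
    """beta = R_code / C(s)."""
    return code_rate / shannon_capacity(snr)


def snr_for_efficiency(code_rate, beta):
    """Linear SNR at which a code of the given rate operates with efficiency beta."""
    return 2 ** (2 * code_rate / beta) - 1


if __name__ == "__main__":
    # Hypothetical operating point: rate-0.02 code at 99% efficiency
    s = snr_for_efficiency(0.02, 0.99)
    print(f"SNR = {s:.5f} ({10 * math.log10(s):.2f} dB), "
          f"beta = {reconciliation_efficiency(0.02, s):.3f}")
```

For a rate-0.02 code, an efficiency close to 1 corresponds to an SNR of roughly −15.5 dB, consistent with the low-SNR regime described in the introduction. Because the capacity is nearly linear in $s$ at low SNR, a small drop in SNR below the design point pushes $\beta$ above 1, where reliable decoding is no longer possible.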
The reconciliation efficiency plays a crucial role in the performance of CV-QKD. The $\beta$-efficiency at a particular SNR operating point determines the code rate, and ultimately, the number of parity bits discarded in each message. Assuming that the LDPC coding scheme has been optimized for a particular SNR operating point such that the code rate $R_{\text{code}}$ is fixed, the reconciliation efficiency then depends solely on the SNR of the quantum channel, which is a function of Alice's coherent-state modulation variance and the physical transmission losses in the optical fiber. Hence, for a fixed optical transmission distance between Alice and Bob, the reconciliation efficiency can be optimized by tuning Alice's modulation variance $V_A$, and designing an optimal error-correction scheme for a target SNR. Sections III and IV explore how changes in the $\beta$-efficiency affect the reconciliation distance and maximum achievable secret key rate.

D. Privacy Amplification
Since Eve may have collected sufficient information during her observations of the quantum and classical channels, Alice and Bob must asymptotically reduce Eve's knowledge of the key by independently applying a shared universal hashing function on a concatenated block of their independent binary strings Ŝ and S, in order to generate a unique symmetric key [38], [48].Alice first discards her erroneously decoded Ŝ messages, and informs Bob as to which messages she discarded.Bob then discards his original S messages that correspond to the Ŝ messages that were discarded by Alice.Alice concatenates all of her correctly decoded Ŝ messages to construct a long secret key block of length N privacy = mk bits, where k is the length of the LDPC-decoded message Ŝ, and m is some large non-zero integer.Bob also concatenates his corresponding S messages to construct a long secret key, also of length N privacy bits.Alice and Bob then independently perform universal hashing on their independent secret key blocks to reduce Eve's knowledge of the key.Alice and Bob can use the resulting symmetric key to encrypt and decrypt messages with perfect secrecy using the one-time pad technique [3].
The speed of privacy amplification is an active area of research, with published results showing maximum speeds of 100 Mb/s for a block size of $N_{\text{privacy}} = 10^8$ bits [64]. The computational complexity of universal hashing can be reduced from $O(n^2)$ to $O(n \log_2 n)$ by applying the fast Fourier transform (FFT) or number-theoretic transform (NTT) on a Toeplitz matrix [65]. Estimation of security parameters is also performed during privacy amplification using $(N_{\text{quantum}} - N_{\text{privacy}})$ bits; however, a complete discussion of parameter estimation and privacy amplification is beyond the scope of this work.
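The FFT-based speedup mentioned above can be illustrated with a minimal sketch, assuming the hash is an m × n binary Toeplitz matrix specified by n + m − 1 seed bits. The function names and the seed layout are illustrative assumptions, not the implementation of [65]. The matrix-vector product is computed as a linear convolution and reduced modulo 2; the direct version is included only for comparison.

```python
import numpy as np

def toeplitz_hash_fft(seed_bits, x_bits, m):
    """Compute (T @ x) mod 2 for an m x n Toeplitz matrix T using FFTs.

    Assumed layout: T[i, j] = seed_bits[i - j + n - 1], with len(seed_bits) = n + m - 1.
    """
    x = np.asarray(x_bits, dtype=np.int64)
    s = np.asarray(seed_bits, dtype=np.int64)
    n = x.size
    assert s.size == n + m - 1
    size = s.size + n - 1  # length of the full linear convolution
    conv = np.fft.irfft(np.fft.rfft(s, size) * np.fft.rfft(x, size), size)
    # Output row i of T @ x is entry i + n - 1 of the convolution.
    return np.rint(conv[n - 1:n - 1 + m]).astype(np.int64) % 2

def toeplitz_hash_direct(seed_bits, x_bits, m):
    """Reference O(m * n) computation with the explicit Toeplitz matrix."""
    x = np.asarray(x_bits, dtype=np.int64)
    s = np.asarray(seed_bits, dtype=np.int64)
    n = x.size
    T = np.array([[s[i - j + n - 1] for j in range(n)] for i in range(m)])
    return (T @ x) % 2
```

Floating-point FFTs are adequate for short blocks like this one; for blocks near $10^8$ bits, rounding error becomes a concern, which is one reason an exact NTT may be preferred.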

E. Maximizing Secret Key Rate with Collective Attacks
The primary metric that defines the performance of a QKD system is the maximum rate at which Alice and Bob can securely generate and reconcile keys over a fixed-distance optical fiber in the presence of an eavesdropper who has access to both the quantum and classical channels. The maximum secret key rate must be proven secure against a collective Gaussian attack, the optimal man-in-the-middle attack, in which Eve first prepares an ancilla state to interact with each one of Alice's coherent states during the quantum transmission, and then listens to the public communication between Alice and Bob during the reconciliation step in order to perform the optimal measurement on her collected ancillae and reconstruct the classical messages transmitted by Bob [38]. Assuming perfect error-correction during the reconciliation step, the maximum theoretical secret key rate for a CV-QKD system with one-way reverse reconciliation can be defined as
$$K_{\text{opt}} = \beta I_{AB} - \chi_{BE},$$
where $I_{AB}$ is the mutual information between Alice and Bob, β is the previously defined reconciliation efficiency, and $\chi_{BE}$ is the Holevo bound on the information leaked to Eve. Here, $I_{AB}$ is equivalent to the Shannon channel capacity, and is defined as
$$I_{AB} = \frac{1}{2}\log_2\left(\frac{V + \chi_{\text{total}}}{1 + \chi_{\text{total}}}\right),$$
where $V = V_A + 1$, $V_A$ is Alice's adjustable modulation variance, and $\chi_{\text{total}}$ is the total noise between Alice and Bob. The Holevo bound is defined as
$$\chi_{BE} = G\left(\frac{\lambda_1 - 1}{2}\right) + G\left(\frac{\lambda_2 - 1}{2}\right) - G\left(\frac{\lambda_3 - 1}{2}\right) - G\left(\frac{\lambda_4 - 1}{2}\right),$$
where $G(x) = (x + 1)\log_2(x + 1) - x\log_2 x$, and $\lambda_{1,2,3,4}$ are symplectic eigenvalues derived from the covariance matrix of the state shared by Alice and Bob.
Here, $K_{\text{opt}}$ represents the asymptotic limit on the secret key rate based on ideal theoretical security models, and does not consider the imperfections of a practical CV-QKD system, which might enable additional side-channel attacks [66]. Such imperfections include finite-size effects [67]-[69], excess electronic and phase noise from uncalibrated optical equipment, as well as discretized Gaussian modulation with finite bounds on the distribution and randomness [66]. Leverrier proved that CV-QKD with coherent states provides composable security against collective attacks [70]; however, extending the information-theoretic security proofs from collective attacks to general attacks in the finite-size regime of CV-QKD is currently an active area of research [69], [71]. At the time of writing, the highest CV-QKD key rates can be achieved using coherent states and homodyne detection with security against collective attacks and some finite-size effects [28]. The motivation of this work is to show that the key reconciliation (error correction) algorithm can be accelerated such that the throughput of LDPC decoding is higher than the asymptotic secret key rate achievable using realistic quantum channel parameters and optical equipment available today. The finite-size effects on secret key rate are considered later in this section, while the other imperfections of a practical CV-QKD system are beyond the scope of this work.
Optimizing Alice's modulation variance for each quantum transmission distance ensures a maximum SNR on the BIAWGNC [32], and thus, a maximum achievable secret key rate $K_{\text{opt}}$ for a particular β-efficiency. Figure 4 presents the optimal modulation variance $V_A$ as a function of β for quantum transmission distances up to 180 km, assuming perfect error-correction in the reconciliation step. Figure 5 shows the corresponding maximum theoretical secret key rate $K_{\text{opt}}$ for CV-QKD based on the computed optimal $V_A$ at each distance.
Pirandola et al. recently showed that there exists an upper bound on the secret key rate for a lossy channel [41]. This fundamental limit is determined by the transmittance T of the fiber-optic channel, and is given by $-\log_2(1 - T)$. The transmittance was previously defined as $T = 10^{-\alpha d/10}$, where the distance d is expressed in kilometers and the standard loss of a fiber-optic cable is assumed to be α = 0.2 dB/km. The upper bound is plotted in Fig. 5.
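As a quick numerical check of the lossy-channel limit, the sketch below evaluates the transmittance $T = 10^{-\alpha d/10}$ and the bound, assuming it takes the form $-\log_2(1 - T)$ bits per channel use as commonly stated for the result of [41]. The function names are illustrative.

```python
import math

def transmittance(d_km, alpha_db_per_km=0.2):
    """Fiber transmittance T = 10^(-alpha * d / 10)."""
    return 10 ** (-alpha_db_per_km * d_km / 10)

def lossy_channel_bound(d_km, alpha_db_per_km=0.2):
    """Assumed form of the upper bound: -log2(1 - T) bits per channel use."""
    return -math.log2(1 - transmittance(d_km, alpha_db_per_km))

for d in (50, 100, 142):
    print(f"d = {d:3d} km  T = {transmittance(d):.3e}  bound = {lossy_channel_bound(d):.3e}")
```

At long distances T is small, so the bound is approximately $T/\ln 2$ and falls by a factor of ten for every 50 km of fiber at 0.2 dB/km.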
The BIAWGNC model for long-distance CV-QKD under investigation in this work has also been proven secure against collective attacks, thus the expression for the asymptotic secret key rate $K_{\text{opt}}$ still holds [32], [51]. At long distances, $I_{AB}$ and $\chi_{BE}$ are nearly equal, thus in order to maximize the secret key rate, it would appear that the reconciliation efficiency β must also be maximized. However, this is not necessarily true since $K_{\text{opt}}$ only provides an expression for the maximum achievable secret key rate and does not consider the speed of reconciliation, nor the uncorrectable errors. The frame error rate (FER) of the reconciliation algorithm must also be considered.

F. Frame Error Rate for Reverse Reconciliation
In reverse reconciliation, Alice attempts to construct a decoded estimate Ŝ of Bob's original message S in order to perform privacy amplification and build a secret key. The tree diagram in Fig. 6 summarizes the possible outcomes of this decoding attempt. After LDPC decoding, Alice performs a parity check ĈH to verify that her decoded codeword Ĉ is valid. When the parity check fails, i.e. ĈH ≠ 0, Alice knows that a decoding error has occurred and the frame is discarded since it cannot be used to generate a secret key. When the parity check passes, i.e. ĈH = 0, Alice knows that Ĉ is a valid codeword, but she does not yet know whether Ĉ is equal to Bob's original encoded codeword C.
For any binary linear block code, the number of possible codewords is $2^k = 2^{nR_{\text{code}}}$. Thus, for codes with a long block length n, the number of possible codewords grows exponentially, and it is possible for the decoder to converge to a valid codeword where the decoded message is incorrect, i.e. Ŝ ≠ S. In coding theory, this is referred to as an undetected error. This scenario is problematic for secret key generation, where both parties must share the same message after decoding in order to perform universal hashing in the next privacy amplification step.
In order to detect invalid decoding errors when ĈH = 0, a cyclic redundancy check (CRC) of Bob's original message S can be transmitted as part of the frame, and then verified against the computed CRC of Alice's decoded message Ŝ. The frame then consists of k information bits, comprising $(k - N_{\text{CRC}})$ message bits and $N_{\text{CRC}}$ CRC bits, followed by (n − k) parity bits to be discarded after LDPC decoding. If the CRC results of S and Ŝ are equal, then the decoding is successful and Ŝ can be used to distill a secret key; otherwise Alice knows that a decoding error has occurred and Ŝ is discarded. The CRC needs to be performed only when the parity check passes; otherwise the frame is known to contain an error and the CRC is skipped. A truly undetected error occurs when both the parity check and CRC pass, but Ŝ ≠ S.
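The CRC verification step can be sketched as follows. The text does not specify which 32-bit CRC polynomial is used, so this example assumes the CRC-32 provided by Python's `zlib` module, and byte-aligned messages for simplicity. The helper names are hypothetical.

```python
import zlib

CRC_BYTES = 4  # N_CRC = 32 bits

def append_crc32(message: bytes) -> bytes:
    """Bob's side: append a 32-bit CRC to the message bits before LDPC encoding."""
    return message + zlib.crc32(message).to_bytes(CRC_BYTES, "big")

def crc32_matches(decoded: bytes) -> bool:
    """Alice's side: recompute the CRC of the decoded message and compare.

    Intended to run only after the parity check has already passed.
    """
    payload, received = decoded[:-CRC_BYTES], decoded[-CRC_BYTES:]
    return zlib.crc32(payload).to_bytes(CRC_BYTES, "big") == received

frame = append_crc32(b"bob's random message bits")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(crc32_matches(frame), crc32_matches(corrupted))
```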
A frame error is said to have occurred when Ŝ ≠ S, i.e. when the decoding fails to reproduce the original message. Both detected and undetected errors contribute to the overall FER. The probability of frame error is defined as follows:
$$P_e = P_{\text{detected error}} + P_{\text{undetected error}}.$$
From Fig. 6, it follows then that the detected and undetected error probabilities are defined as
$$P_{\text{detected error}} = P(\text{parity fail}) + P(\text{parity pass},\ \text{CRC fail}),$$
$$P_{\text{undetected error}} = P(\text{parity pass},\ \text{CRC pass},\ \hat{S} \neq S).$$
There exists a rare case not shown in Fig. 6, where the parity check passes and the CRC fails, yet Ŝ = S. Although the decoded message is correct, it will be discarded by the decoder due to the failed CRC. As a result, there is a rare chance that this frame will be lost and the secret key rate will be reduced. However, this case is not considered by convention in communication theory.

G. Impact of Reconciliation Error and Efficiency on Key Rate
The remainder of this work investigates the trade-offs in error-correction performance, reconciliation efficiency, reconciliation distance, and secret key rate, by assuming that the physical parameters of the quantum channel are fixed, and that Alice's modulation variance $V_A$ has been optimally set for each transmission distance and desired β-efficiency. In practice, the asymptotic secret key rate $K_{\text{opt}}$ is scaled by the FER since decoded frames with known error cannot be used to generate a secret key and must therefore be discarded. As such, the effective secret key rate $K_{\text{eff}}$ of a practical CV-QKD system is given by Eq. 9. Alice and Bob can discard frames with detected error, while frames with undetected error further reduce the mutual information $I_{AB}$ between Alice and Bob. In Section III, it is empirically shown that $P_{\text{undetected error}} = 0$ using a 32-bit CRC code, thus the total decoding FER can be expressed more simply as $P_e = P_{\text{detected error}}$. This simplified expression for the FER is assumed for the remainder of this work, and thus the effective secret key rate expression given by Eq. 9 can be reduced to
$$K_{\text{eff}} = (\beta I_{AB} - \chi_{BE})(1 - P_e).$$
Up until this point, the β-efficiency has been assumed to be independent of the reconciliation algorithm; however, as shown in Eq. 9, the effective secret key rate $K_{\text{eff}}$ is dependent on both β and FER. Given the set of optimal $V_A$ values and assuming that the physical operating parameters of the quantum channel remain constant, the virtual BIAWGNC can be induced and described solely in terms of the SNR at a particular distance with an effective secret key rate $K_{\text{eff}}$. As described further in Section III, there exists a trade-off between reconciliation distance and effective secret key rate, such that for a single SNR, one of the following two operating conditions is possible: (1) long distance with a low secret key rate, or (2) short distance with a high secret key rate. In fact, for a fixed LDPC code rate $R_{\text{code}}$, the SNR only depends on the reconciliation efficiency and is independent of transmission distance. From Eq. 3, the SNR of a virtual BIAWGNC can be expressed as a function of β (Eq. 11). From a code design perspective then, an optimal LDPC code of rate $R_{\text{code}}$ can be designed to achieve a target FER at a particular SNR. Since Alice and Bob remain stationary once deployed in the field, their transmission distance remains fixed, and thus an optimal LDPC code can be designed to achieve the maximum operating secret key rate over a range of distances by providing the optimal trade-off between β and FER.
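The relation between β and SNR for a fixed code rate can be made concrete with a small sketch. It assumes that the efficiency is defined as $\beta = R_{\text{code}} / C(s)$ with the real-valued AWGN capacity $C(s) = \frac{1}{2}\log_2(1 + s)$, which inverts to $s = 2^{2R_{\text{code}}/\beta} - 1$. This form is an assumption here, since the exact expressions of Eq. 3 and Eq. 11 are not reproduced in this section.

```python
import math

def capacity(snr):
    """Shannon capacity of a real-valued AWGN channel, in bits per channel use."""
    return 0.5 * math.log2(1 + snr)

def efficiency(snr, r_code=0.02):
    """Assumed definition of the reconciliation efficiency: beta = R_code / C(snr)."""
    return r_code / capacity(snr)

def snr_for_efficiency(beta, r_code=0.02):
    """Inverse relation: SNR operating point at which a rate-R_code code has efficiency beta."""
    return 2 ** (2 * r_code / beta) - 1

for beta in (0.80, 0.88, 0.95, 0.99):
    s = snr_for_efficiency(beta)
    print(f"beta = {beta:.2f}  SNR = {s:.5f}  ({10 * math.log10(s):.2f} dB)")
```

Under this assumption, a higher β corresponds to a lower SNR operating point for the same code, which is why high-efficiency operation pushes the decoder toward the waterfall region.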

H. Secret Key Rate with Finite-Size Effects
The security of the CV-QKD protocol must account for the finite length of the secret key, which is generated via universal hashing in the privacy amplification step using a block of length $N_{\text{privacy}}$ bits. Alice constructs her privacy amplification block from her correctly decoded Ŝ messages, while Bob constructs his privacy amplification block from his original corresponding S messages. Due to the finite block size, the secret key rate is reduced by an offset coefficient $\Delta(N_{\text{privacy}})$ and a scaling coefficient $N_{\text{privacy}}/N_{\text{quantum}}$, where $N_{\text{quantum}}$ is the number of symbols sent from Alice to Bob during the first quantum transmission step. The secret key rate, accounting for finite-size effects, is given by
$$K_{\text{finite}} = \frac{N_{\text{privacy}}}{N_{\text{quantum}}}\left(\beta I_{AB} - \chi_{BE} - \Delta(N_{\text{privacy}})\right)(1 - P_e).$$
Leverrier et al. showed that $N_{\text{quantum}}$ can be arbitrarily chosen as $N_{\text{quantum}} = 2N_{\text{privacy}}$ [67], and that when $N_{\text{privacy}} > 10^4$, the finite-size offset factor $\Delta(N_{\text{privacy}})$ can be approximated as
$$\Delta(N_{\text{privacy}}) \approx 7\sqrt{\frac{\log_2(2/\epsilon)}{N_{\text{privacy}}}},$$
where a conservative choice for the security parameter is $\epsilon = 10^{-10}$ [67]. The LDPC block length n is not directly included in this expression; however, the LDPC block length does affect the reconciliation efficiency β and the FER $P_e$. The impact of β and $P_e$ on reconciliation distance is discussed in greater detail in the next section of this paper.
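To give a sense of scale for the finite-size penalty, the following sketch evaluates the offset factor, assuming the approximation $\Delta(N) \approx 7\sqrt{\log_2(2/\epsilon)/N}$ commonly quoted from the finite-size analysis of [67]. The function name and the sample block sizes are illustrative.

```python
import math

def finite_size_offset(n_privacy, epsilon=1e-10):
    """Assumed approximation of Delta(N_privacy), intended for N_privacy > 10^4."""
    return 7 * math.sqrt(math.log2(2 / epsilon) / n_privacy)

for exponent in (8, 10, 12):
    n_privacy = 10 ** exponent
    print(f"N_privacy = 1e{exponent}: Delta = {finite_size_offset(n_privacy):.3e}")
```

Because the offset shrinks only as $1/\sqrt{N_{\text{privacy}}}$, a hundredfold larger privacy amplification block reduces it by a factor of ten. This matters at long distances, where $\beta I_{AB} - \chi_{BE}$ is itself very small.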
The next section presents a study of the optimal block size N privacy for privacy amplification, as well as an overview of the design and application of LDPC codes for reverse reconciliation in long-distance CV-QKD systems on the virtual BIAWGNC.

III. LDPC CODES FOR RECONCILIATION
The design of efficient reconciliation algorithms is one of the central challenges of long-distance CV-QKD [32]. Early reconciliation algorithms failed to achieve efficiencies above 80% [72], while more advanced algorithms that now achieve 95% efficiency suffer from computational complexity [25], [28]. LDPC codes are highly suitable for low-SNR reconciliation in CV-QKD due to their near-Shannon limit error-correction performance and absence of patent licensing fees [60]. However, designing efficient LDPC codes with block lengths on the order of $10^6$ bits remains a challenge.
This section introduces a complexity-reduction technique for LDPC code design that has been widely adopted in hardware-based LDPC decoders, namely the use of architecture-aware codes [73]. A popular class of such codes is quasi-cyclic (QC) codes, whose parity-check matrices are constructed from an array of cyclically-shifted identity matrices that provide a sufficient degree of randomness and enable computational decoding speedup as a result of their highly parallelizable structure, which provides a simple mapping to hardware [40], [73]. Previous independent works by Martinez-Mateo and Walenta have explored the application of existing QC-LDPC codes from the IEEE 802.11n standard for DV-QKD; however, these works were not able to demonstrate reliable reconciliation beyond 50 km [26], [74]. While this distance may have been a limitation of DV-QKD, the short block lengths of such existing QC-LDPC codes (on the order of $10^3$ bits) remain unsuitable for long-distance CV-QKD. Recently, Bai et al. theoretically showed that rate 0.12 QC codes with block lengths of $10^6$ bits can be constructed using progressive edge growth techniques, or by applying a QC extension to random LDPC codes with block lengths of $10^5$ bits [75]. However, the reported QC codes target an SNR of −1 dB, and are thus not suitable for long-distance CV-QKD beyond 100 km. At the time of writing, there has not been any reported investigation of the construction of QC codes for multi-edge LDPC codes targeting low-SNR channels below −15 dB for long-distance CV-QKD. This work shows that by applying the structured QC-LDPC code construction technique to the random multi-edge LDPC codes previously explored by Jouguet et al. for long-distance CV-QKD [32], it is possible to construct codes that achieve sufficient error-correction performance while enabling the acceleration of the computationally-intensive LDPC decoding algorithm such that the reconciliation step is no longer the bottleneck for secret key distillation beyond 100 km.
As previously described, there exists a trade-off between reconciliation distance and maximum secret key rate for a given β-efficiency. Once the target FER and operating SNR are known, an optimal LDPC code, i.e. parity-check matrix H, can be designed independent of other CV-QKD system parameters. The reverse reconciliation problem can thus be reduced to the simpler model shown in Fig. 8 as a result of the BIAWGNC approximation at low SNR. The variables shown in Fig. 8 are the same variables described in Section II, where S is Bob's random binary sequence, C is Bob's binary LDPC-encoded codeword, the Gaussian channel is described by Alice's correlated Gaussian sequence, and Ŝ is Alice's LDPC-decoded estimate of Bob's original random binary sequence. Alice and Bob also share a predefined parity-check matrix H for encoding and decoding.
This work demonstrates the application of multi-edge QC-LDPC codes for long-distance CV-QKD through the design of several rate 0.02 binary parity-check matrices with block lengths on the order of $10^6$ bits. While a complete QKD system would offer multi-rate code programmability for various operating channels, this work focuses on the design of a single, low-rate code for a large range of transmission distances to fully study the effects of β-efficiency and FER on the maximum achievable secret key rate and reconciliation distance. While some works have explored the use of rate-adaptive or repetition codes to achieve high-efficiency decoding with multiple code rates [32], the exploration of multi-rate code design for long-distance CV-QKD is beyond the scope of this work. The remainder of this section describes the construction of LDPC codes, the belief propagation decoding algorithm, the error-correction performance of the designed rate 0.02 codes, and the achievable secret key rates for multiple β-efficiencies beyond 100 km.

A. General Construction of LDPC Codes
LDPC codes are a class of linear block codes defined by a sparse parity-check matrix H of size (n − k) × n, k ≤ n, and code rate $R_{\text{code}} = k/n$ [76]. Given H, an equivalent definition of the code is given by its Tanner graph [77]. A Tanner graph G is a bipartite graph with two independent vertex sets, known as check nodes (CNs) and variable nodes (VNs), which correspond to the rows and columns of H, respectively. As shown in Fig. 9, an edge between VN $v_i$ and CN $c_j$ belongs to G if and only if H(j, i) = 1.
An LDPC code of length n is fully specified by the number of variable and check nodes, and their respective degree distributions. The number of edges connected to a vertex in G is called the degree of the vertex. The degree distribution of G is a pair of polynomials $\omega(x) = \sum_i \omega_i x^i$ and $\psi(x) = \sum_i \psi_i x^i$, where $\omega_i$ and $\psi_i$ respectively denote the number of variable and check nodes of degree i in G.
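The correspondence between H, its Tanner graph, and the degree counts $\omega_i$ and $\psi_i$ can be illustrated with a toy example. The helper names and the 2 × 4 matrix below are illustrative only; matrices of the size considered in this work would be stored in a sparse format rather than as dense arrays.

```python
from collections import Counter
import numpy as np

def tanner_graph(H):
    """Return adjacency lists: VNs attached to each CN, and CNs attached to each VN."""
    H = np.asarray(H)
    cn_adj = [np.flatnonzero(row).tolist() for row in H]    # one entry per row (check node)
    vn_adj = [np.flatnonzero(col).tolist() for col in H.T]  # one entry per column (variable node)
    return cn_adj, vn_adj

def degree_counts(adjacency):
    """Map each degree i to the number of nodes of that degree (omega_i or psi_i)."""
    return dict(Counter(len(neighbors) for neighbors in adjacency))

H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
cn_adj, vn_adj = tanner_graph(H)
print(cn_adj, vn_adj)
print(degree_counts(vn_adj), degree_counts(cn_adj))
```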
The performance of tree-like Tanner graphs can be analyzed using a technique called density evolution [59]. As n → ∞, the error-correction performance of Tanner graphs with the same degree distribution is nearly identical [59]. Hence, the variable and check node degree distributions can be normalized to $\Omega(x) = \sum_i \frac{\omega_i}{n} x^i$ and $\Psi(x) = \sum_i \frac{\psi_i}{n-k} x^i$, respectively. The design of binary LDPC codes of rate $R_{\text{code}}$ and block length n consists of a two-step process. First, find the normalized degree distribution pair (Ω(x), Ψ(x)) of rate $R_{\text{code}}$ with the best performance. Then, if n is large, randomly sample a Tanner graph G that satisfies the degree distribution defined by ω(x) and ψ(x) (up to rounding error), and find the corresponding parity-check matrix H. Unfortunately, this method is non-trivial in the design of low-rate codes that approach Shannon capacity at low SNR.

B. Multi-Edge LDPC Codes
Multi-edge LDPC codes, first introduced by Richardson and Urbanke, provide two advantages over standard LDPC codes: (1) near-Shannon capacity error-correction performance for low-rate codes, and (2) low error-floor performance for high-rate codes [39]. The latter is not a significant concern for long-distance CV-QKD where the reconciliation FER is on the order of $10^{-1}$; however, the design of a high-performance low-rate code is crucial to achieving high β-efficiency [32].
The multi-edge framework can be applied to both regular and irregular LDPC codes with uniform and non-uniform vertex degree distributions, respectively, by introducing multiple edge types into the Tanner graph specifying the code [39]. In a standard LDPC code, the polynomial degree distributions are limited to a single edge type, such that all variable and check nodes are statistically interchangeable. In order to improve performance, multi-edge LDPC codes extend the polynomial degree distributions to multiple independent edge types with an additional edge-type matching condition [39].
To describe the design of multi-edge LDPC codes, let the potential connections of a variable or check node be called its sockets. Let the vector $d = (d_1, d_2, \ldots, d_t)$ be a multi-edge node degree of t types. A node of degree d has $d_1$ sockets of type 1, $d_2$ sockets of type 2, etc. When generating a Tanner graph, only sockets of the same type can be connected by an edge of that type. Multi-edge normalized degree distributions are straightforward generalizations based on multi-edge degrees:
$$\Omega(x_1, x_2, \ldots, x_t) = \sum_{d} \Omega_{d_1,d_2,\ldots,d_t} \prod_{i=1}^{t} x_i^{d_i}, \qquad \Psi(x_1, x_2, \ldots, x_t) = \sum_{d} \Psi_{d_1,d_2,\ldots,d_t} \prod_{i=1}^{t} x_i^{d_i},$$
where $\Omega_{d_1,d_2,\ldots,d_t}$ and $\Psi_{d_1,d_2,\ldots,d_t}$ are the respective fractions of variable and check nodes with $d_1$ edges of type 1, $d_2$ edges of type 2, etc. The rate of a multi-edge LDPC code is then defined as $R_{\text{code}} = \Omega(\mathbf{1}) - \Psi(\mathbf{1})$, where $\mathbf{1}$ denotes the all-ones vector with implied length [39].
The multi-edge LDPC code used in this work has rate 0.02. Its normalized degree distribution was designed by Jouguet et al. by modifying a rate 1/10 multi-edge degree structure introduced by Richardson and Urbanke [32], [39]. Structurally, the subgraph induced by edges of type 1 corresponds to a high-rate LDPC code with variable degree 3 and check degree 7 or 8. Parity checks are added to this code, each checking two or three variable nodes as specified by edges of type 2. The resulting parity bits are degree-1 variable nodes specified by edges of type 3. For the BIAWGNC, the minimum SNR for which the tree-like Tanner graph with this multi-edge degree distribution is error free is $2.863 \times 10^{-2}$, or −15.47 dB [32].
The LDPC parity-check matrices in this work were generated by randomly sampling Tanner graphs that satisfied the multi-edge degree distribution defined by ω(x) and ψ(x), and the edge-type matching condition. The random sampling technique does not degrade code performance in this case, since the operating FER is known to be high ($P_e \approx 10^{-1}$). At such high FER, the error-floor phenomenon is not a significant concern as the code is strictly designed to operate in the waterfall region in order to achieve high β-efficiency [78]. The rate 0.02 LDPC codes explored in this work target a block length of $n = 1 \times 10^6$ bits in order to achieve near-Shannon capacity error-correction performance. As a result, the parity-check matrix H has dimensions $n - k = n(1 - R_{\text{code}}) = 9.8 \times 10^5$ by $n = 1 \times 10^6$. Due to the low code rate and large block length, the random parity-check matrix construction does introduce LDPC decoder implementation complexity, which directly affects decoding latency and maximum achievable secret key rate. The LDPC decoder implementation complexity for such a code can be reduced with minimal degradation in error-correction performance by imposing a quasi-cyclic structure on the parity-check matrix.

C. Quasi-Cyclic LDPC Codes
While purely-random LDPC codes have been shown to achieve near-Shannon capacity error-correction performance under belief propagation decoding [79], the hardware-based implementation of decoders for random codes is a challenge with large block lengths, especially on the order of $10^6$ bits. The bottleneck stems from the complex interconnect network between CN and VN processing units that execute the belief propagation algorithm [45], [46].
In a traditional LDPC decoder, variable and check node processing units iteratively exchange messages across an interconnect network described by a Tanner graph, as shown in Fig. 9. Purely-random parity-check matrices introduce unstructured interconnect between the variable and check nodes, resulting in unordered memory access patterns and complex routing, which limit scalability in ASIC or FPGA implementations. Quasi-cyclic codes alleviate such decoder complexities by imposing a highly-regular matrix structure with a sufficient degree of randomness [40]. This work extends the design of low-rate, multi-edge LDPC codes to QC codes for hardware realization.
QC codes are defined by a parity-check matrix constructed from an array of q × q cyclically-shifted identity matrices or q × q zero matrices [40]. As shown in Fig. 10, the tilings evenly divide the (n − k) × n parity-check matrix H into n/q QC macro-columns and (n − k)/q QC macro-rows. The expansion factor q in a QC parity-check matrix determines the trade-off between decoder implementation complexity and error-correction performance. For small q, the parity-check matrix exhibits a high degree of randomness, which improves error-correction performance, while a large q reduces decoder complexity with some performance degradation.
Fig. 10: Sample quasi-cyclic binary parity-check matrix for q = 5 constructed from uniformly-sized (q × q), cyclically-shifted identity matrices and all-zero matrices.
(a) Full q = 50 QC parity-check matrix structure with $1 \times 10^6$ columns and $9.8 \times 10^5$ rows. Empty space represents zeros.
To design a multi-edge QC-LDPC code, repeat the random multi-edge sampling process using n/q as the block length instead of n to obtain a base Tanner graph $G_B$. The base parity-check matrix $H_B$ is obtained from $G_B$ by populating each non-zero entry with a random element of the set {1, 2, . . ., q}. Let $I_i$ be the circulant permutation matrix obtained by cyclically shifting each row of the q × q identity matrix to the right by i − 1. The QC parity-check matrix H is obtained from $H_B$ by replacing each non-zero entry of value i by $I_i$, and each zero entry by the q × q all-zeros matrix.
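The expansion from a base matrix $H_B$ to the QC parity-check matrix H can be sketched as follows. This is a dense toy version with hypothetical function names and a hand-picked base matrix. It follows the convention described above, where an entry of value i is replaced by the identity shifted right by i − 1 and a zero entry by the all-zeros block; a matrix with $10^6$ columns would be stored in sparse form.

```python
import numpy as np

def circulant(q, i):
    """I_i: the q x q identity with each row cyclically shifted right by i - 1."""
    return np.roll(np.eye(q, dtype=np.uint8), int(i) - 1, axis=1)

def expand_qc(H_base, q):
    """Replace each non-zero entry i of H_base by I_i and each zero by a q x q zero block."""
    H_base = np.asarray(H_base)
    rows, cols = H_base.shape
    H = np.zeros((rows * q, cols * q), dtype=np.uint8)
    for r, c in zip(*np.nonzero(H_base)):
        H[r * q:(r + 1) * q, c * q:(c + 1) * q] = circulant(q, H_base[r, c])
    return H

# Toy base matrix with shift values drawn from {1, ..., q}; 0 marks an all-zeros block.
H_base = [[1, 0, 3],
          [2, 2, 0]]
H = expand_qc(H_base, q=4)
print(H.shape)
print(H)
```

Each circulant block contributes exactly one edge per row and per column, so the expanded graph inherits the node degrees of the base graph. A decoder can store just the shift value per block instead of q individual edges.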
In this work, multi-edge QC-LDPC parity-check matrices of rate 0.02 were generated for expansion factors q ∈ {21, 50, 100, 500, 1000}. Under belief propagation decoding, the error-correction performance of the q ∈ {100, 500, 1000} QC codes was significantly worse in comparison to a random multi-edge code with the same degree distribution. Thus, only the q ∈ {21, 50} codes are presented in the remainder of this study. In order to maintain the same degree distributions, the block length for the q = 21 code with rate $R_{\text{code}} = 0.02$ was adjusted to $n = 1.008 \times 10^6$ bits. Similarly, the q = 50 code has a block length of $n = 1 \times 10^6$ bits and rate $R_{\text{code}} = 0.01995$.
Figure 11 shows the structure of the parity-check matrices designed in this work. Both the purely-random and QC matrices have a similar structure, which contains a dense area of 1s or cyclic identity matrices on the left, and a long diagonal of degree-1 VNs to the right. The starting point of the diagonal is determined by the VN degree distribution of the multi-edge matrix. In the case of the QC codes, no cyclic shifts are implemented along the diagonal, thus all submatrices are q × q $I_1$ identity matrices. This matrix structure greatly improves the decoding speed as degree-1 VNs along the diagonal need to pass VN-to-CN messages only in the first decoding iteration, while CN-to-VN messages need to be passed to degree-1 VNs only if the early-termination condition is enabled. The degree-1 VNs along the diagonal correspond to the majority (but not all) of the (n − k) parity bits that are discarded after decoding, thus the VN update computation needs to be performed in these degree-1 VNs only if a decision needs to be made when early termination is enabled. A small fraction of the (n − k) parity bits correspond to VNs with more than one CN connection in the denser area of the matrix to the left of the diagonal. These VNs must perform the VN update computation in each iteration along with the first k VNs, which correspond to the k information bits of the block.
The parity component of H, i.e. the columns i > k of H(j, i), is lower-triangular for both the purely-random and QC parity-check matrices designed in this study. An example of this type of construction is shown in Fig. 10, and is also illustrated in Fig. 11. While the lower-triangular construction does not necessarily impact decoding complexity or error-correction performance, it does simplify the LDPC encoding procedure, which can be performed via forward substitution if H is of this form. Further investigation of LDPC encoding complexity for such large codes is beyond the scope of this work.

D. LDPC Decoding: Belief Propagation Algorithm
LDPC decoding is performed using belief propagation, an iterative message-passing algorithm commonly used to perform inference on graphical models such as factor graphs [80]. In the context of LDPC, the decoding procedure attempts to converge on a valid codeword by iteratively exchanging probabilistic updates between variable and check nodes along the edges of the Tanner graph until the parity-check condition is satisfied or the maximum number of iterations is reached. The Sum-Product algorithm is the most common variant of belief propagation [80], and is described in Algorithm 1 for d = 1 dimensional reconciliation on a BIAWGNC.
In Algorithm 1, Step 1 prepares the $Q_v$ log-likelihood ratio (LLR) input values at each variable node based on Alice's correlated Gaussian sequence X and the channel noise variance $\sigma_Z^2$. All VN-to-CN messages from VN v are initialized to the received channel LLR $Q_v$ before the first message-passing iteration. As previously discussed in Section II, the expression for the channel noise variance $\sigma_{N_v}^2$ is different for d = 2, 4, 8 reconciliation schemes, but the remainder of the algorithm is the same. Steps 2 to 5 specify the message-passing interaction between the CNs and VNs until the codeword syndrome defined by ĈH is equal to zero, or the maximum predetermined number of iterations is reached.
Algorithm 1: Sum-Product algorithm for the d = 1 scheme. Step 2: Check node update (CN-to-VN messages). Step 3: Variable node update (VN-to-CN messages). Step 4: Hard decision and early termination check.
Due to the non-linearity of the tanh(x) function in the Sum-Product algorithm, most hardware-based LDPC decoders instead implement variants of the Min-Sum algorithm [45], [46], which provides an acceptable approximation to Sum-Product decoding without the need for complex lookup tables. Despite the benefit of computational speedup, Min-Sum does not perform well at low SNR [81], and is thus not suitable for long-distance CV-QKD. The results presented in the remainder of this work were achieved using Sum-Product decoding.
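The message-passing steps can be sketched with a toy dense-matrix Sum-Product decoder. This is not the GPU decoder of this work and not a transcription of Algorithm 1. It assumes the textbook LLR-domain tanh rule, a BPSK mapping of bit 0 to +1, and channel LLRs of the form $2y/\sigma^2$. The function name is hypothetical, and dense arrays are only practical for very small codes.

```python
import numpy as np

def sum_product_decode(H, y, noise_var, max_iter=50):
    """Toy Sum-Product decoder; returns (hard decisions, parity check passed)."""
    H = np.asarray(H, dtype=bool)
    H_int = H.astype(np.int64)
    # Step 1: channel LLRs (assumed BPSK mapping: bit 0 -> +1, bit 1 -> -1).
    llr = 2.0 * np.asarray(y, dtype=float) / noise_var
    v2c = np.where(H, llr, 0.0)          # VN-to-CN messages, one per edge of H
    c_hat = (llr < 0).astype(np.int64)
    for _ in range(max_iter):
        # Step 2: check node update (CN-to-VN) using the tanh rule.
        t = np.where(H, np.tanh(v2c / 2.0), 1.0)
        t[np.abs(t) < 1e-12] = 1e-12     # guard the division below against zero messages
        ext = np.clip(t.prod(axis=1, keepdims=True) / t, -1 + 1e-12, 1 - 1e-12)
        c2v = np.where(H, 2.0 * np.arctanh(ext), 0.0)
        # Step 3: variable node update (VN-to-CN), excluding each edge's own input.
        total = llr + c2v.sum(axis=0)
        v2c = np.where(H, total - c2v, 0.0)
        # Step 4: hard decision and early termination on a zero syndrome.
        c_hat = (total < 0).astype(np.int64)
        if not ((H_int @ c_hat) % 2).any():
            return c_hat, True
    return c_hat, False

# (7,4) Hamming code; all-zero codeword sent, with one weakly received symbol.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
c_hat, ok = sum_product_decode(H, [1, 1, 1, -0.2, 1, 1, 1], noise_var=0.5)
print(c_hat, ok)
```

A practical decoder for $n = 10^6$ would store messages per edge in sparse or QC-structured arrays. The `tanh` and `arctanh` evaluations in Step 2 are the non-linear operations that Min-Sum variants approximate.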

E. Error-Correction Performance of Multi-Edge QC Codes
The multi-edge LDPC codes designed in this work achieve similar FER performance on the BIAWGNC compared to those developed by Jouguet et al. for long-distance CV-QKD with multi-dimensional reconciliation [32]. Table I presents the FER results for the complete linear SNR range corresponding to the range of efficiencies between β = 0.8 and β = 0.99, as defined by Eq. 11. This range of β-efficiency values was chosen to illustrate the trade-off between distance and finite secret key rate. For clarity, however, Figures 12 and 13 present the FER results only for the SNR range corresponding to β-efficiencies between β = 0.88 and β = 0.99.
Despite their identical degree distributions, the q = 50 QC code achieves the best overall FER performance over d = 1, 2, 4, 8 dimensions in comparison to the purely-random and q = 21 QC codes, due to its slightly lower code rate. At low SNR where β is high, the q = 21 QC code also performs better than the purely-random code over all dimensions, likely due to the longer block length. At higher SNR though, the purely-random code achieves a lower error floor than the q = 21 QC code due to higher randomness in the parity-check matrix.
In general, the d = 2, d = 4, and d = 8 reconciliation schemes achieve approximately 0.04 dB, 0.08 dB, and 0.2 dB of coding gain, respectively, over the d = 1 scheme in the waterfall region for all three codes. As previously mentioned, FER performance in the waterfall region is of particular interest for long-distance CV-QKD since it corresponds to the high β-efficiency region of operation at low SNR close to the Shannon limit. The error-floor region beyond the waterfall is not of practical use in CV-QKD as it corresponds to the low β-efficiency region where transmission distance is limited.
As previously discussed in Section II, for any binary linear block code, the number of possible codewords is $2^k = 2^{nR_{\text{code}}}$. In this case, when $n = 1 \times 10^6$ bits and $R_{\text{code}} = 0.02$, the number of possible valid codewords for the decoder to choose from is approximately $4 \times 10^{6020}$. In order to detect invalid decoding errors when the parity check ĈH = 0 but Ŝ ≠ S, a 32-bit CRC code is included in each LDPC frame. In this work, $N_{\text{CRC}} = 32$ bits were sufficient to detect all invalid decoded messages without sacrificing information throughput. Having full control of the simulation environment, it was also empirically found that $P_{\text{undetected error}} = 0$ using a 32-bit CRC code.
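The size of the codebook quoted above can be checked with a short calculation. Since $2^{20000}$ is far beyond floating-point range, the sketch works with base-10 logarithms; the function name is illustrative.

```python
import math

def codeword_count_sci(n, r_code):
    """Return (mantissa, exponent) such that 2^(n * r_code) ~ mantissa * 10^exponent."""
    k = round(n * r_code)               # number of information bits
    log10_count = k * math.log10(2)
    exponent = math.floor(log10_count)
    mantissa = 10 ** (log10_count - exponent)
    return mantissa, exponent

mantissa, exponent = codeword_count_sci(10**6, 0.02)
print(f"2^k is about {mantissa:.2f} x 10^{exponent}")
```

For comparison, a 32-bit CRC distinguishes only $2^{32} \approx 4.3 \times 10^9$ check values, so it cannot separate all codewords in principle. The absence of undetected errors reported here is an empirical observation.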
The probability of an invalid decoding error is given by

$$\frac{\text{Number of CRC Errors}}{\text{Total Number of Frame Errors}}.$$
Figure 14 shows the probability of an invalid decoding error over the SNR range of interest for d = 1, 2, 4, 8 reconciliation dimensions on the BIAWGNC for the three LDPC codes designed in this work. In general, the probability of invalid decoding increases with SNR and becomes the main source of frame error, particularly in the error-floor region, as a result of the large block length and low code rate. In the region of operation for long-distance CV-QKD where the FER $P_e \approx 1$, invalid decoding convergence still contributes to nearly 10% of all frame errors. A concatenated higher-rate code was not included as part of the message component to correct residual errors [32], [38].

Up until this point, the performance of the reconciliation algorithm has been presented as a coding theory problem, where an optimal LDPC code was designed to achieve a particular FER at a given SNR operating point. The SNR was considered as an abstraction of the virtualized BIAWGNC in order to demonstrate fixed-rate code performance, independent of other CV-QKD system parameters such as modulation variance, transmission distance, and physical losses. Assuming that the transmission distance and physical parameters of the quantum channel are fixed, Alice's modulation variance can be optimally tuned such that the effective secret key rate is then solely determined by the FER and β-efficiency of the LDPC-decoding reconciliation algorithm.
Figure 15 shows that for each fixed-rate LDPC code, there exists a unique FER-β pair, where each β corresponds to a particular SNR operating point based on Eq. 11. While it may appear from Eq. 10 that maximizing β would produce a higher effective secret key rate, Fig. 15 shows that β and FER are positively correlated, such that there exists an optimal trade-off between β and FER where $K_{eff}$ is maximized for a fixed transmission distance. To achieve key reconciliation at long distances, the operating point must be chosen in the waterfall region where β is high, despite the high FER.
The results presented in this section showed that higher-dimensional reconciliation schemes, namely d = 4 and d = 8, extend code performance to lower SNR where the FER $P_e > 0$ and β → 1. As such, the d = 8 scheme is preferred for long-distance reconciliation. The next section examines the impact of reconciliation dimension, β-efficiency, and FER on the effective secret key rate over a range of transmission distances for the LDPC codes designed in this work.

F. Finite Secret Key Rate
This section extends the discussion of the effective secret key rate to include finite-size effects. Key reconciliation for a particular β-efficiency is only achievable over a limited range of distances where the finite secret key rate $K_{finite} > 0$. In general, for a single FER-β pair, LDPC decoding can achieve either (1) a high secret key rate at short distance, or (2) a low secret key rate at long distance. For long-distance CV-QKD beyond 100 km, key reconciliation is only achievable with high β-efficiency at the expense of low secret key rate. This section provides an overview of the maximum achievable finite secret key rates and reconciliation distances for the three LDPC codes designed in this work. Results are presented for the d = 1 and d = 8 reconciliation dimensions in order to demonstrate the effectiveness of higher-order dimensionality on reconciliation distance. The results also consider the finite size of the privacy amplification block. The d = 1 and d = 8 secret key rate results are shown for privacy amplification blocks of $N_{privacy} = 10^{10}$ and $N_{privacy} = 10^{12}$ bits to demonstrate the impact of block size on the maximum transmission distance.
The range of transmission distances for each β is limited by the total noise between Alice and Bob. From Eq. 5, the total noise can be expressed as a function of β, such that

where $V_A^{opt}(\beta)$ is a vector of Alice's optimal modulation variances for a particular β-efficiency from Fig. 4, and the virtual SNR $s(\beta)$ is given by Eq. 11 for a fixed-rate LDPC code. From the expression for the total channel noise, $\chi_{total} = \chi_{line} + \chi_{hom}/T$, a set of transmission distance points for a particular β can then be described by the vector

in order to compute the maximum theoretical finite secret key rate based on Eq. 12.
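The relation between total noise and transmission distance can be illustrated with a short sketch. Only $\chi_{total} = \chi_{line} + \chi_{hom}/T$ is taken from the text. The expressions used for $\chi_{line}$ and $\chi_{hom}$ are the ones commonly used for homodyne detection in the CV-QKD literature and are assumed here, as are the fiber loss of 0.2 dB/km and all numerical parameter values (excess noise `xi`, detector efficiency `eta`, electronic noise `v_el`), which are hypothetical.

```python
import math

def transmittance(distance_km, alpha_db_per_km=0.2):
    """Channel transmittance T for a fiber with the assumed loss per km."""
    return 10.0 ** (-alpha_db_per_km * distance_km / 10.0)

def chi_line(T, xi):
    """Channel-added noise referred to the input (assumed standard form)."""
    return 1.0 / T - 1.0 + xi

def chi_hom(eta, v_el):
    """Homodyne detection-added noise (assumed standard form)."""
    return (1.0 + v_el) / eta - 1.0

def chi_total(T, xi, eta, v_el):
    """Total noise: chi_total = chi_line + chi_hom / T (as in the text)."""
    return chi_line(T, xi) + chi_hom(eta, v_el) / T

def distance_from_chi_total(chi_tot, xi, eta, v_el, alpha_db_per_km=0.2):
    """Invert chi_total for T, then convert T back to a distance in km."""
    T = (1.0 + chi_hom(eta, v_el)) / (chi_tot + 1.0 - xi)
    return -10.0 * math.log10(T) / alpha_db_per_km

if __name__ == "__main__":
    xi, eta, v_el = 0.01, 0.6, 0.01   # hypothetical channel parameters
    for d in (50.0, 100.0, 150.0):
        chi = chi_total(transmittance(d), xi, eta, v_el)
        print(d, chi, distance_from_chi_total(chi, xi, eta, v_el))
```

The inversion shows why a fixed total noise budget for a given β translates directly into a set of reachable distance points.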
Figures 16 and 17 present the finite secret key rate results for the three LDPC codes over the transmission distance range of interest with $N_{privacy} = 10^{10}$ bits based on the d = 1 and d = 8 reconciliation dimensions, respectively. Each β-efficiency curve in Figures 16 and 17 represents a FER-β pair where the FER and SNR are constant over the entire transmission distance range, while $V_A$ is optimally chosen to achieve the maximum secret key rate at each distance point. When β is high, the FER $P_e \to 1$, and thus $K_{finite} \to 0$ as erroneous frames are discarded after decoding. As a result, the maximum reconciliation distance is limited by the error-correction performance of the LDPC code. Figures 18 and 19 present the finite secret key rate results with $N_{privacy} = 10^{12}$ bits over d = 1 and d = 8 reconciliation dimensions, respectively. When $N_{privacy} = 10^{12}$ bits, the maximum transmission distance is extended by 18 km over the result with $N_{privacy} = 10^{10}$ bits for d = 8 reconciliation with β = 0.99 efficiency. This demonstrates the importance of selecting a large block size for privacy amplification.
In each of the $N_{privacy} = 10^{10}$ and $N_{privacy} = 10^{12}$ cases, the three LDPC codes achieve similar finite secret key rates and reconciliation distances with both d = 1 and d = 8 schemes for β ≤ 0.92, since the codes are operating close to their respective error floors. However, for β > 0.92, the FER becomes a limiting factor to achieving a non-zero secret key rate. The d = 1 scheme achieves a maximum efficiency of β = 0.96, where the maximum distance is limited to 124 km with $N_{privacy} = 10^{10}$ bits, and 132 km with $N_{privacy} = 10^{12}$ bits. For β > 0.96, the FER $P_e = 1$, thus $K_{finite} = 0$. The d = 8 scheme operates up to β = 0.99 efficiency, with a maximum distance of 142 km with $N_{privacy} = 10^{10}$ bits, and 160 km with $N_{privacy} = 10^{12}$ bits. Furthermore, the d = 8 scheme achieves higher secret key rates for all three LDPC codes at β = 0.95 and β = 0.96 in comparison to the d = 1 scheme since the FER performance of the codes is better. While the complete results are not shown here, the d = 2 and d = 4 schemes both achieve a maximum efficiency of β = 0.97, at 129 km with $N_{privacy} = 10^{10}$ bits, and 138 km with $N_{privacy} = 10^{12}$ bits.

The finite secret key rate $K_{finite}$ results presented in this section were normalized to the pulse rate, without consideration of the light source repetition rate $f_{rep}$. By considering the pulse rate, the complete operating secret key rate of the CV-QKD system can be defined as $f_{rep} \cdot K_{finite}$ in bits/s (Eq. 16).

The next section presents an overview of a GPU-based LDPC decoder implementation where the information throughput for the three LDPC codes designed in this work is compared to the upper bound on secret key rate at the maximum reconciliation distance points.
IV. GPU-ACCELERATED LDPC DECODING

GPUs are a highly suitable platform for the implementation of LDPC decoders that target high information throughput with long block-length codes. Computational acceleration of the belief propagation algorithm is achieved by parallelizing the check and variable node update operations across thousands of single-instruction multiple-thread (SIMT) cores, which provide floating-point precision, high-bandwidth read/write access to on-chip memory, and intrinsic mathematical libraries for the logarithmic functions of the Sum-Product algorithm [82]-[86].
This section provides an overview of the GPU-based LDPC decoder implementation in this work. GPU throughput results are presented for the maximum CV-QKD distances under d = 1, 2, 4, 8 dimensional reconciliation, and also compared to the maximum achievable secret key rates for reconciliation efficiencies β > 0.85. Finally, the implementation is compared to previous work by Jouguet and Kunz-Jacques for an LDPC code with block length of $2^{20}$ bits [25], as well as other non-LDPC codes. The GPU decoding throughput results presented in this section quantitatively highlight the computational speedup that can be achieved using quasi-cyclic LDPC codes for long-distance CV-QKD.

A. GPU-based LDPC Decoder Implementation
The LDPC decoder was implemented on a single NVIDIA GeForce GTX 1080 (Pascal Architecture) GPU with 2560 CUDA cores using the NVIDIA CUDA C++ application programming interface. Figure 20 shows the data flow for a single decoding iteration of the parallelized Sum-Product algorithm, which is composed of four multi-threaded compute kernels. Each kernel instantiates a different number of GPU threads depending on the level of parallelism for the operation. The individual compute operations of the Sum-Product algorithm are re-ordered to exploit the maximum amount of thread-level parallelism in each kernel such that the latency per iteration is minimized. The overall throughput of the GPU-based LDPC decoder is then determined by the number of iterations, the latency per iteration, and the block length.
The complexity of an LDPC decoder implementation stems from the highly-irregular interconnect structure between CNs and VNs described by the code's Tanner graph. For codes with short block lengths, the permutation network complexity does not introduce significant GPU decoding latency [82]-[84]; however, for codes with block lengths on the order of $10^6$ bits, such as those designed in this work, data permutation and message passing constitute between 25% and 50% of GPU runtime per decoding iteration, as shown in Table II. While arithmetic operations are relatively inexpensive on a GPU, addressing global memory is very costly in terms of compute time. The most expensive GPU operation is addressing unordered memory, i.e., accessing non-consecutive memory locations, as multiple transactions are required to perform the unordered memory read or write, and all kernel threads must be stalled [83]. In contrast, coalesced memory addressing, i.e., accessing consecutive memory locations, can be performed in a single transaction and allows for concurrent thread execution, which reduces the runtime of the kernel. Furthermore, uncoalesced memory writes are more expensive than uncoalesced memory reads. Thus, the throughput of a GPU-based LDPC decoder is highly dependent on memory access patterns.
The operations of the Sum-Product algorithm presented in Algorithm 1 were re-ordered to avoid uncoalesced memory writes and to use the maximum amount of thread-level parallelism for arithmetic computations. For example, the VN-to-CN message-passing permutation in kernel 1 also performs the Φ(·) computation from the next CN-update step in each thread. The CN-update kernel (2) does not fully compute the $m_{cv}$ messages from each CN to its connected VNs, but instead, the final CN-to-VN $m_{cv}$ messages are computed in the CN-to-VN message-passing kernel (3). Due to the Tanner graph structure and data permutation nature of the LDPC decoder, uncoalesced memory reads are still required when reading from edge memory in kernel 1 and reading from CN memory in kernel 3. However, the latency of these operations is negligible compared to the overall latency of an entire iteration. Fully-coalesced memory writes are enabled by the different ordering of connected edges in the VN-to-CN and CN-to-VN message-passing kernels (1 and 3). In the VN-to-CN message-passing kernel (1), the edge connectivity is ordered by consecutive VNs, while in the CN-to-VN message-passing kernel (3), the edges are ordered by consecutive CNs. Each CN-VN edge in the edge memory has a unique index that is addressed by both message-passing kernels (1 and 3).

Several additional memory optimizations improve the overall GPU throughput. Shared memory is used in each thread to store local variables and to avoid expensive global memory accesses, while texture caches are used to store frequently-accessed static variables such as channel LLRs and the parity-check matrix.
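The idea of keeping two orderings of the same edge set, so that every thread writes to a consecutive location and only the reads are scattered, can be illustrated on the CPU with NumPy. This is not the authors' kernel code: the toy graph, the array names, and the gather formulation are assumptions made for illustration, and NumPy indexing only mimics the access pattern, not GPU memory behavior.

```python
import numpy as np

# Toy Tanner graph: each row is one edge (check node, variable node).
edges = np.array([(0, 0), (0, 1), (0, 3), (1, 1),
                  (1, 2), (2, 0), (2, 2), (2, 3)])
cn, vn = edges[:, 0], edges[:, 1]

# Two orderings of the same edge ids.
by_vn = np.lexsort((cn, vn))   # sorted by VN, then CN
by_cn = np.lexsort((vn, cn))   # sorted by CN, then VN

# Inverse permutations: position of each edge id inside each ordering.
pos_in_vn = np.argsort(by_vn)
pos_in_cn = np.argsort(by_cn)

def vn_to_cn(msg_vn_order):
    """Output slot i is written consecutively (one per 'thread');
    the input is gathered from a scattered slot of the VN-ordered buffer."""
    return msg_vn_order[pos_in_vn[by_cn]]

def cn_to_vn(msg_cn_order):
    """Reverse permutation: consecutive writes in VN order, scattered reads."""
    return msg_cn_order[pos_in_cn[by_vn]]

if __name__ == "__main__":
    # Use the edge id itself as the message so the permutation is visible.
    msg = by_vn.astype(float)
    print(vn_to_cn(msg))            # messages now in CN order
    print(cn_to_vn(vn_to_cn(msg)))  # back in VN order
```

In this formulation the permutation cost is paid entirely on the read side, which matches the statement above that uncoalesced reads are cheaper than uncoalesced writes.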
As shown in Fig. 20, message-passing kernels (1 and 3) instantiate up to T threads, where T is the maximum number of edge connections between all CNs and VNs, kernel 2 instantiates (n − k) threads equal to the total number of CNs in the matrix, and kernel 4 instantiates up to n threads equal to the total number of VNs in the matrix. When early termination is enabled, T threads are required in kernels 1 and 3, and n threads are required in kernel 4. However, when early termination is disabled, the number of threads instantiated in kernels 1, 3, and 4 can be reduced due to the long-diagonal construction of the parity-check matrices in this work. The message-passing kernels (1 and 3) need only to instantiate threads that correspond to the CN-VN connections to the left of the long diagonal in the matrix structure shown in Fig. 11. Similarly, the VN-update kernel (4) needs only to instantiate threads that correspond to VNs to the left of the long diagonal. This reduction in the number of threads provides a marginal speedup in each iteration.
While not shown in Fig. 20, the early-termination check is implemented via multiple kernels that perform a parallel reduction following the VN-to-CN message-passing kernel (1) in order to compute the parity at each CN. Additional computations and memory reads/writes are required in the message-passing and VN-update kernels (1, 3, and 4). The following additional operations must be performed to enable an early-termination check: send the decision bit from each VN to its connected CNs, send all $m_{cv}$ messages from each CN to its connected VNs (including those corresponding to connections along the long diagonal), and calculate the decision bit in each VN. To reduce overall decoding latency and maximize throughput, the early-termination check is performed only after a fixed number of decoding iterations. This fixed number of iterations corresponds to the average number of iterations required at each SNR point, and is pre-determined empirically through FER simulation for each code. The decoder uses a lookup table to decide after how many decoding iterations to enable the early-termination check based on the current SNR.
A quasi-cyclic matrix structure reduces data permutation and memory access complexity by eliminating random, unordered memory access patterns. In addition, QC codes require fewer memory lookups for message passing since the permutation network can be described with approximately q-times fewer terms, where q is the expansion factor of the QC parity-check matrix, in comparison to a purely-random matrix for the same block length. Table II presents a breakdown of the latency of each GPU kernel for the three LDPC codes designed in this work. While the CN and VN update kernels (2 and 4) have similar runtime for both random and QC codes, QC codes achieve faster runtime in data permutation kernels (1 and 3) due to the approximately q-times fewer CN-VN edge connections that must be stored for the parity-check matrix. Since the parity-check matrices designed in this work are sparse, a compressed data structure is used to store CN-VN edge connections to reduce memory read latency in the message-passing kernels.
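The q-fold reduction in the description of the permutation network follows from how a QC parity-check matrix is built from circulant blocks. The sketch below expands a small base matrix of cyclic shifts into a full binary matrix and shows that a neighbor index can be computed from a (block column, shift) pair instead of being looked up. The base matrix, the expansion factor, and the function names are hypothetical; the multi-edge structure and long diagonal of the codes in this work are not modeled.

```python
import numpy as np

def expand_qc(base_shifts, q):
    """Expand a base matrix of circulant shifts into a binary matrix.

    An entry of -1 denotes a q x q all-zero block; an entry s >= 0 denotes
    the q x q identity matrix cyclically shifted by s columns.
    """
    rows, cols = base_shifts.shape
    H = np.zeros((rows * q, cols * q), dtype=np.uint8)
    idx = np.arange(q)
    for r in range(rows):
        for c in range(cols):
            s = base_shifts[r, c]
            if s >= 0:
                H[r * q + idx, c * q + (idx + s) % q] = 1
    return H

def qc_neighbor(block_col, shift, local_row, q):
    """Column of the single 1 in one row of a circulant, computed on the fly."""
    return block_col * q + (local_row + shift) % q

if __name__ == "__main__":
    base = np.array([[0, 2, -1],
                     [1, -1, 3]])     # hypothetical shift values
    q = 5
    H = expand_qc(base, q)
    stored_terms = int((base >= 0).sum())   # entries needed for the QC form
    total_edges = int(H.sum())              # entries needed for a random code
    print(stored_terms, total_edges, total_edges // stored_terms)
```

Each non-zero base entry stands for q edges, so the edge table shrinks by a factor of q relative to an unstructured matrix with the same number of edges.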
Table II also highlights the respective error-correction performance and GPU throughput of the three codes at the maximum β = 0.99 efficiency with d = 8 reconciliation. The raw GPU throughput (including parity bits) is given by

$$K_{GPU}^{raw} = \frac{\text{Block Length}}{\text{Latency Per Iteration} \times \text{Iterations}} \ \text{(bits/s)}. \quad (17)$$

Similar to the finite secret key rate, the information throughput of the GPU decoder must be scaled by (1) the FER $P_e$ to account for discarded frames when decoding is unsuccessful, i.e., the CRC does not pass or the parity check fails, and (2) the code rate $R_{code}$ to account for the parity bits that must be discarded after decoding. The average GPU information throughput is then given by

$$K_{GPU} = R_{code} \, (1 - P_e) \, K_{GPU}^{raw} \ \text{(bits/s)}.$$

Thus, for any LDPC code, the GPU throughput is determined by the latency per iteration and the number of decoding iterations. The latency per iteration depends on the LDPC code structure and the number of memory lookups, while the FER is bounded by the maximum number of iterations.

Some GPU-based LDPC decoders use fixed-point number representations and/or frame-level parallelism to maximize computational speedup for codes with short block lengths ($n < 10^5$ bits) in high-SNR regions above 0 dB where the Min-Sum algorithm achieves sufficient error-correction performance [82]-[86]. This work, however, uses single-precision floating point to minimize FER with Sum-Product decoding at SNRs below −15 dB. Due to the large block length ($n = 10^6$ bits), all GPU threads are fully utilized, thus external (frame-level) parallelism does not provide additional speedup. Asynchronous data transfer to the GPU is another technique often employed to minimize overhead latency; however, this does not provide any significant performance boost as the Sum-Product computation dominates overall execution time due to the large number of iterations required for low-SNR decoding.
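The throughput relations described in this subsection can be written as two small functions. The raw throughput follows Eq. 17. The information throughput assumes that the scaling by the FER and the code rate described in the text is multiplicative, i.e., a factor of (1 − P_e) for surviving frames and a factor of R_code for information bits. The numerical values in the example are hypothetical and are not measurements from Table II.

```python
def raw_throughput(block_length_bits, latency_per_iter_s, iterations):
    """Raw decoder throughput in bits/s, including parity bits (Eq. 17)."""
    return block_length_bits / (latency_per_iter_s * iterations)

def information_throughput(raw_bps, fer, code_rate):
    """Average information throughput in bits/s.

    Assumed form: only successfully decoded frames (1 - P_e) contribute,
    and only the fraction R_code of each frame carries information.
    """
    return raw_bps * (1.0 - fer) * code_rate

if __name__ == "__main__":
    # Hypothetical operating point: 10^6-bit frames, 2 ms per iteration,
    # 100 iterations, FER of 0.9, code rate 0.02.
    raw = raw_throughput(1_000_000, 2e-3, 100)
    info = information_throughput(raw, fer=0.9, code_rate=0.02)
    print(f"raw: {raw:.3e} bits/s, information: {info:.3e} bits/s")
```

The example makes the gap between raw and information throughput explicit: at low code rate and high FER, the information throughput is several hundred times smaller than the raw figure.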

B. Information Throughput Results
Figure 21 presents the measured information throughput $K_{GPU}$ from the GPU decoder for all three LDPC codes at each β-efficiency point, which corresponds to a unique SNR-FER point in Figures 12 and 13 for the d = 1 and d = 8 dimensional reconciliation cases, respectively.

Fig. 22: GPU information throughput $K_{GPU}$ of the q = 50 QC-LDPC code with d = 8 dimensional reconciliation up to the maximum distance point for each β efficiency with $N_{privacy} = 10^{12}$ bits, maximum CV-QKD key rate with perfect reconciliation (FER = 0) $K_{opt}$, and fundamental secret key rate limit for lossy channel $K_{lim}$ vs. distance. $K_{opt}$ and $K_{lim}$ are scaled by $f_{rep}$ = 1 MHz.

Table III compares the performance of the rate 0.02 random and QC codes at the maximum achievable distance for each reconciliation dimension. The q = 21 and q = 50 QC codes designed in this work achieve approximately 3× higher raw decoding throughput $K_{GPU}^{raw}$ over the purely-random code with d = 1, 2, 4, 8 dimensional reconciliation at the maximum distance point for each β-efficiency. When scaled by the corresponding FER and code rate, the QC codes achieve between 5.1× and 12.8× higher information throughput $K_{GPU}$ over the purely-random code. Table III also presents the operating secret key rate $K_{finite}$ defined by Eq. 16, and the fundamental secret key rate limit $K_{lim}$ for a lossy channel defined by Eq. 7. Here, the fundamental limit is scaled by the light source repetition rate $f_{rep}$, i.e., $f_{rep} \cdot K_{lim}$ in bits/s. A realistic CV-QKD repetition rate of $f_{rep}$ = 1 MHz is assumed for the comparison [28], [53], [87]. For distances beyond 130 km, the operating secret key rate $K_{finite}$ is between 2176× and 57112× lower than the fundamental limit $K_{lim}$, with d = 8 and d = 1 dimensional reconciliation, respectively. The rightmost column in Table III ($K_{GPU}/K_{lim}$) presents the two key results of this work. First, it shows that the GPU decoder can achieve between 1.07× and 8.03× higher information throughput $K_{GPU}$ over the fundamental secret key rate limit $K_{lim}$ with a 1 MHz source using QC-LDPC codes with d = 4 and d = 8 dimensional reconciliation. Since the decoder delivers an information throughput higher than the fundamental key rate limit, it can be concluded that LDPC decoding is no longer the post-processing bottleneck in CV-QKD, and thus, the secret key rate remains only limited by the physical parameters of the quantum channel. The second result is that d = 1 and d = 2 dimensional reconciliation schemes are not well-suited for long-distance CV-QKD since the $K_{GPU}$ speedup over $K_{lim}$ is less than 1×. In general, Table III shows that QC codes achieve lower decoding latency than the purely-random code at long distances, thereby making them more suitable for reverse reconciliation at high β efficiencies.
The results presented in Table III and Fig. 22 assumed a light source repetition rate of $f_{rep}$ = 1 MHz. While a higher source repetition rate such as $f_{rep}$ = 100 MHz or $f_{rep}$ = 1 GHz would raise the fundamental secret key rate limit $K_{lim}$ above the maximum GPU decoder throughput $K_{GPU}$, it would still not introduce a post-processing bottleneck for CV-QKD. The GPU decoder currently delivers an information throughput $K_{GPU}$ between 1868× and 18790× higher than the operating secret key rate $K_{finite}$ with a 1 MHz light source at the maximum distance points for d = 1, 2, 4, 8 dimensional reconciliation schemes beyond 130 km. Even with a source repetition rate of $f_{rep}$ = 1 GHz, the GPU information throughput $K_{GPU}$ would still exceed the operating secret key rate $K_{finite}$ for distances beyond 130 km by between 1.8× and 18.7×, assuming the same quantum channel parameters. Therefore, GPUs remain a viable platform for the implementation of reconciliation algorithms for long-distance CV-QKD. This is further illustrated in Fig. 22, where the GPU information throughput $K_{GPU}$ of the q = 50 QC-LDPC code with d = 8 dimensional reconciliation is compared to the asymptotic secret key rate limit $K_{opt}$ for each β-efficiency with perfect reconciliation ($P_e = 0$), as presented in Fig. 5. Here, a 1 MHz source is assumed, and the scaled $K_{opt}$ is given by $f_{rep} \cdot K_{opt}$. The result presented in Fig. 22 also shows that $K_{GPU}$ is higher than the upper bound on secret key rate $K_{lim}$ on a lossy channel with a 1 MHz source from β = 0.8 to β = 0.99.
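The effect of the repetition rate on the decoder margin can be reproduced from the ratios quoted above. The sketch assumes that the operating secret key rate scales linearly with $f_{rep}$ while the GPU throughput is independent of it, so the margin $K_{GPU}/K_{finite}$ scales as $1/f_{rep}$. The function name is illustrative.

```python
def rescale_margin(margin, f_rep_old_hz, f_rep_new_hz):
    """K_GPU / K_finite margin after changing only the repetition rate.

    Assumes the operating key rate is proportional to f_rep and that the
    decoder throughput does not depend on f_rep.
    """
    return margin * f_rep_old_hz / f_rep_new_hz

if __name__ == "__main__":
    # Margins quoted in the text for a 1 MHz source.
    for margin_1mhz in (1868.0, 18790.0):
        margin_1ghz = rescale_margin(margin_1mhz, 1e6, 1e9)
        print(f"{margin_1mhz:.0f}x at 1 MHz -> {margin_1ghz:.3f}x at 1 GHz")
```

A margin that stays above 1 after rescaling means the decoder still keeps pace with the key rate the channel can actually deliver at that repetition rate.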

C. Comparison to Other CV-QKD Implementations
While QKD has been well-studied over the past 30 years, the exploration of long-distance CV-QKD is still nascent, with very few published implementations in the low-SNR regime for optical transmission distances beyond 100 km. Hardware-based implementations of DV-QKD and short-distance CV-QKD have previously been demonstrated using FPGAs and GPUs [26], [27], [74], [88]; however, at the time of writing, there is only one reported CV-QKD implementation designed to operate in the low-SNR regime for long-distance reconciliation [25].

GPU Memory Bandwidth (GB/s): 320, 264
(1) FER $P_e$ corresponds to the probability of detected error, since $P_{undetected} = 0$ with 32-bit CRC. All three codes achieve a CV-QKD distance of 83.8 km based on the quantum channel parameters assumed in this work.
(2) Latency per iteration is an average for the full decoding of a single frame, and also includes the data transfer latency between the CPU and GPU.
Jouguet and Kunz-Jacques reported a GPU-based LDPC decoder implementation that achieves 7.1 Mb/s throughput at SNR = 0.161 (β = 0.93) on the BIAWGNC [25], for a random multi-edge LDPC code with a block length of $2^{20}$ bits based on the rate 1/10 multi-edge code designed by Richardson and Urbanke with an SNR threshold of 0.1556 [39]. For throughput comparison purposes, two additional multi-edge codes with the same code rate, block length, and SNR threshold were designed in this work: a purely-random code and a q = 512 QC code.
Table IV presents a performance comparison between the two designed rate 1/10 codes and the result achieved by Jouguet and Kunz-Jacques at SNR = 0.161 on the BIAWGNC [25]. The two designed codes achieve a FER of approximately 0.04 under the same decoding conditions as the comparison work with d = 8 dimensional reconciliation. Similar to the results presented in Tables II and III, the q = 512 QC code achieves approximately 3× lower latency per iteration than the purely-random rate 1/10 code designed in this work. Rate 1/10 QC codes with expansion factors q ∈ {64, 128, 256} were also designed; however, the q = 512 QC code achieved the lowest latency per iteration due to the lower number of required memory accesses in the GPU message-passing kernels, as a result of the lower number of connections in the QC parity-check matrix. While the designed rate 1/10 random code achieves a maximum raw throughput of only 2.78 Mb/s, the q = 512 QC code delivers a maximum raw throughput of 9.17 Mb/s with early termination enabled only in iterations greater than the average number of iterations, as determined empirically through FER simulation. The q = 512 QC code achieves a 1.29× higher throughput than the 7.1 Mb/s reported by Jouguet and Kunz-Jacques [25], further demonstrating that the QC code structure offers computational speedup benefits for multi-edge codes operating in the high β-efficiency region at low SNR. Although the comparison work is from 2014, both GPU models have a similar memory bus width, which is the primary constraint that limits the latency per iteration. As previously discussed, GPU decoder performance is bound by the memory access rate, and not the floating point operations per second (FLOPS). Thus, a wider GPU memory bus allows for a higher memory access rate, which in turn reduces the decoding latency.
Other types of error-correcting codes have been studied for application in the low-SNR regime of CV-QKD, such as polar codes, repeat-accumulate (RA) codes, and raptor codes. Polar codes require block lengths on the order of $2^{27}$ bits to achieve comparable FER performance to the rate 1/10 multi-edge LDPC codes designed in this work; however, they have been shown to achieve low decoding latency on generic x86 CPUs due to their recursive decoding algorithm [25]. A polar-code performance comparison is not available for the rate 0.02 multi-edge QC-LDPC codes designed in this work. Punctured and extended low-rate RA codes have been constructed from ETSI DVB-S2 codes with block lengths of 64,800 bits to achieve β > 0.85 efficiency over a wide range of SNRs [89]; however, their performance has not been investigated beyond 70 km and there is currently no hardware implementation to provide a sufficient throughput comparison. Lastly, raptor codes achieve high β-efficiency at low SNR and guarantee error-free decoding ($P_e = 0$) by sending as many coded symbols as required by the receiver [90]. However, their decoding latency may be a limitation to high-throughput reconciliation, and at the time of writing, there is no known hardware implementation of raptor codes for long-distance CV-QKD. The demand for long-distance communication through applications such as CV-QKD motivates the need for continued research in high-efficiency codes and their hardware realizations.

V. CONCLUSION
This work introduced multi-edge quasi-cyclic LDPC codes to accelerate the reconciliation step in long-distance CV-QKD by means of a GPU-based decoder implementation and multi-dimensional reconciliation schemes. With an 8-dimensional reconciliation scheme, the GPU-based decoder delivers an information throughput up to 8.03× higher than the upper bound on secret key rate for a lossy channel with a 1 MHz source, thereby demonstrating that key reconciliation is no longer a computational bottleneck in long-distance CV-QKD. Furthermore, the low-rate LDPC codes extend the maximum distance of CV-QKD from the previously achieved 100 km to 160 km based on the quantum channel parameters assumed in this work.
The LDPC codes and reconciliation techniques applied in this work can be extended to post-processing algorithms in two areas that show promise for the future of QKD: (1) free-space QKD using low-Earth orbit satellites as communication relays to extend the distance of secure communication beyond 200 km without fiber-optic infrastructure, and (2) fully-integrated chip implementations [4]. Recent works have experimentally demonstrated terrestrial free-space QKD for distances up to 143 km [91], [92], while satellite-based QKD has been proposed as a practical near-term solution to achieving long-distance QKD on a global scale [20], [93]. In August 2016, China launched the Quantum Experiments at Space Scale (QUESS) satellite to generate secret keys between ground stations in Beijing and Vienna by transmitting entangled photon pairs from an orbit altitude of 500 km [94]. Free-space fading channels for satellite QKD typically operate at SNRs above 0 dB [95]; however, quasi-cyclic code construction techniques can still be employed to achieve high secret key rates, while GPUs would allow for simple integration with other satellite equipment for rapid prototyping, in contrast to ASIC- or FPGA-based LDPC decoder implementations.

This paper presented the computational speedup achievable on a single state-of-the-art GPU. Further acceleration can be achieved through architectural optimizations in the design of a monolithic QKD chip that combines both optical and post-processing circuits. Photonic chips have already been realized for QKD transmitters and receivers [4], [21], [96], [97], and further integration of post-processing algorithms would provide a considerable reduction in system size and power consumption. A final key takeaway here is that the quasi-cyclic LDPC code construction and GPU architecture techniques presented in this work can also be applied to forward error-correction implementations for DV-QKD where reconciliation is performed over the binary symmetric channel (BSC) instead of the BIAWGNC as in CV-QKD.
This work addressed the challenge of achieving high-speed, high-efficiency reconciliation for long-distance CV-QKD over fiber-optic cable. In addition to extending information-theoretic security to general attacks for finite key sizes, a major remaining hurdle to extending the secure transmission distance in CV-QKD is the reduction of excess noise in the optical quantum channel. While recent techniques have been demonstrated to control excess noise to within a tolerable limit [29], future work may also investigate the security of CV-QKD in the presence of non-Gaussian noise sources, and in particular, the performance of LDPC decoding at low SNR with non-Gaussian noise. GPU-based decoder implementations with quasi-cyclic codes would provide a suitable platform for such investigations. Furthermore, reducing the latency of privacy amplification for large block sizes on the order of $N_{privacy} \geq 10^{12}$ bits is necessary in order to realize secret key exchange for distances beyond 100 km.

Fig. 1: Information transmission over untrusted quantum channel and authenticated public channel between Alice and Bob for CV- and DV-QKD.

Fig. 4: Optimal $V_A$ vs. transmission distance for maximum theoretical secret key rate, from β = 0.8 to β = 0.99, based on the assumed physical operating parameters of the quantum channel.

Fig. 5: Maximum theoretical secret key rates vs. transmission distance. The maximum CV-QKD key rate is defined by $K_{opt}$ from β = 0.8 to β = 0.99 based on the optimal $V_A$. The fundamental limit for a lossy channel is defined by $K_{lim} = -\log_2(1 - T)$.

Fig. 11: Structure of designed parity-check matrices.

Algorithm 1 (fragment): … mod 2) then go to Step 5; end for. Step 5: Discard parity bits, $\hat{S}_v \leftarrow \hat{C}_v$, $v \leq k$.

… number of decoding iterations is reached. In Step 2, $m_{cv}^{(t)}$ is the message from CN c to VN v in iteration t, and $\Phi(x) = \Phi^{-1}(x) = -\ln(\tanh(x/2))$. In Step 3, $L_{vc}^{(t)}$ is the message from VN v to CN c, and $L_v^{(t)}$ is the updated LLR belief of bit v in the frame, whose decision is given by $\hat{C}_v^{(t)}$ in Step 4. In Step 2, the set of VNs connected to CN c is defined as $N(c) = \{v \mid v \in \{1, 2, \ldots, n\} \wedge H_{vc} = 1\}$, where the notation $v \in N(c) \setminus v$ refers to all VNs in the set $N(c)$ excluding VN v. Similarly, in Step 3, the set of CNs connected to VN v is defined as $M(v) = \{c \mid c \in \{1, 2, \ldots, n-k\} \wedge H_{vc} = 1\}$, where $c \in M(v) \setminus c$ refers to all CNs in the set $M(v)$ excluding CN c. As previously described, LDPC decoding is successful if the decoded message $\hat{S}$ is equal to the original message $S$.
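The steps described above can be sketched as a small dense-matrix Sum-Product decoder. This is a minimal CPU illustration, not the GPU implementation of this work: it uses a dense NumPy representation, a toy (7,4) Hamming parity-check matrix, and clipping bounds inside Φ chosen only for numerical safety, all of which are assumptions. The check-node update uses $\Phi(x) = -\ln(\tanh(x/2))$ and excludes the target edge by subtracting its own contribution from the row totals.

```python
import numpy as np

def phi(x):
    """Phi(x) = -ln(tanh(x/2)); its own inverse for x > 0."""
    x = np.clip(x, 1e-12, 50.0)   # guard against log(0) and saturation
    return -np.log(np.tanh(x / 2.0))

def sum_product_decode(H, llr_channel, max_iters=50):
    """Return (hard decisions, iterations used, parity satisfied)."""
    H = np.asarray(H, dtype=bool)
    llr_channel = np.asarray(llr_channel, dtype=float)
    L_vc = np.where(H, llr_channel, 0.0)   # VN-to-CN messages on edges
    hard = (llr_channel < 0).astype(np.uint8)
    for it in range(1, max_iters + 1):
        # CN update: sign product and Phi-sum over all other edges of the CN.
        mag = np.where(H, phi(np.abs(L_vc)), 0.0)
        sgn = np.where(H & (L_vc < 0), -1.0, 1.0)
        total_mag = mag.sum(axis=1, keepdims=True)
        total_sgn = sgn.prod(axis=1, keepdims=True)
        m_cv = np.where(H, total_sgn * sgn * phi(total_mag - mag), 0.0)
        # VN update: channel LLR plus all incoming CN messages.
        L_v = llr_channel + m_cv.sum(axis=0)
        L_vc = np.where(H, L_v - m_cv, 0.0)   # exclude the target CN
        # Hard decision and parity check (early termination).
        hard = (L_v < 0).astype(np.uint8)
        if not np.any((H.astype(np.uint8) @ hard) % 2):
            return hard, it, True
    return hard, max_iters, False

if __name__ == "__main__":
    H = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])
    llr = np.full(7, 2.0)   # all-zero codeword, positive LLR means bit 0
    llr[0] = -0.5           # one weakly wrong channel value
    bits, iters, ok = sum_product_decode(H, llr)
    print(bits, iters, ok)
    print(bits[:4])         # message bits after discarding parity bits
```

For this parity-check matrix the last three columns form an identity block, so discarding parity bits amounts to keeping the first k = 4 decisions.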

Fig. 14: Probability of invalid decoding error vs. SNR for Sum-Product decoding with d = 1, 2, 4, 8 dimensional reconciliation on BIAWGNC. Probability of error is computed for invalid messages that are correctly decoded but CRC fails.

Fig. 20: GPU implementation of LDPC decoder showing four multi-threaded compute kernels and data flow from top to bottom for one decoding iteration. Coalesced memory access patterns and message variables are indicated. Thread i is denoted by $t_i$, where T in kernels 1 and 3 represents the maximum number of connections between all CNs and VNs, (n − k) in kernel 2 is the number of CNs, and n in kernel 4 is the number of VNs. Early termination is not shown.

Fig. 21: Measured information throughput K GPU vs. reconciliation efficiency for d = 1 and d = 8 dimensional reconciliation.Each measurement point corresponds to a particular SNR operating point with a measured FER presented in Fig. 15.
summarizes the parameters of the three codes designed in this work, and Figures 12 and 13 present their FER vs. SNR error-correction performance under Sum-Product decoding for d = 1, 2, 4, 8 reconciliation dimensions.FER simulations were performed

TABLE I: Designed rate 0.02 multi-edge LDPC codes

TABLE II: GPU-based LDPC decoding latency and error-correction performance for rate 0.02 multi-edge codes
* Early-termination check is enabled only after the number of decoding iterations is equal to the average number of iterations, which is determined empirically through FER simulation and stored in a lookup table.

TABLE III: Overview of secret key rate and GPU throughput at maximum reconciliation distance with rate 0.02 multi-edge codes and $N_{privacy} = 10^{12}$ bits
Table III compares the performance of the rate 0.02 random and QC codes at the maximum achievable distance for each reconciliation