Practical distributed quantum information processing with LOCCNet

Distributed quantum information processing is essential for building quantum networks and enabling more extensive quantum computations. In this regime, several spatially separated parties share a multipartite quantum system, and the most natural set of operations is Local Operations and Classical Communication (LOCC). As a pivotal part of quantum information theory and practice, LOCC has led to many vital protocols such as quantum teleportation. However, designing practical LOCC protocols is challenging due to LOCC's intractable structure and limitations set by near-term quantum devices. Here we introduce LOCCNet, a machine learning framework facilitating protocol design and optimization for distributed quantum information processing tasks. As applications, we explore various quantum information tasks such as entanglement distillation, quantum state discrimination, and quantum channel simulation. We discover protocols with evident improvements, in particular, for entanglement distillation with quantum states of interest in quantum information. Our approach opens up new opportunities for exploring entanglement and its applications with machine learning, which will potentially sharpen our understanding of the power and limitations of LOCC. An implementation of LOCCNet is available in Paddle Quantum, a quantum machine learning Python package based on the PaddlePaddle deep learning platform.


I. INTRODUCTION
In the past few decades, quantum technologies have been found to have an increasing number of powerful applications in areas including optimization [1,2], chemistry [3,4], security [5,6], and machine learning [7]. To realize large-scale quantum computers and deliver real-world applications, distributed quantum information processing will be essential in the technology road map, where quantum entanglement and its manipulation play a crucial role.
Quantum entanglement is central to quantum information by serving as a fundamental resource which underlies many important protocols such as teleportation [8], superdense coding [9], and quantum cryptography [6]. To achieve real-world applications of quantum technologies, protocols for manipulating quantum entanglement are essential ingredients, and it will be important to improve existing methods. The study of entanglement manipulation is one of the most active and important areas in quantum information [10,11].
In entanglement manipulation and distributed quantum information processing, multiple spatially separated parties are usually involved. As direct transfers of quantum data between these nodes are not feasible with current technology, Local Operations and Classical Communication (LOCC) [8] is more practical at this stage. Such an LOCC (or distant lab) paradigm plays a fundamental role in entanglement theory, and many important results have been obtained within this paradigm [11]. However, how to design LOCC protocols on near-term quantum devices [12] remains an important challenge. Such protocols are generally hard to design even with perfect entanglement due to the complicated and hard-to-characterize structure of LOCC [13]. Moreover, the limited capabilities and structure of near-term quantum devices have to be considered during the design of LOCC protocols.
* wangxin73@baidu.com
Inspired by the breakthroughs of deep learning [14] in mastering the game of Go [15] and solving protein folding [16], it is desirable to apply machine learning ideas to explore quantum technologies. For instance, machine learning has been applied to improve quantum processor designs [17][18][19][20] and quantum communication [21,22]. Here, we adopt ideas from machine learning to solve the challenges in exploring LOCC protocols. We use parameterized quantum circuits (PQCs) [23] to represent the local operations allowed in each spatially separated party and then incorporate multiple rounds of classical communication. One can then formulate the original task as an optimization problem and adopt classical optimization methods to search for the optimal LOCC protocol. PQCs have been regarded as machine learning models with remarkable expressive power, which leads to applications in quantum chemistry and optimization [23]. Here, we generalize PQCs to a larger deep learning network to deal with distributed quantum information processing tasks and in particular to explore better entanglement manipulation protocols.
In this work, we introduce a machine learning framework for designing and optimizing LOCC protocols that are adaptive to near-term quantum devices, which consists of a set of PQCs representing local operations. As applications, we explore central quantum information tasks such as entanglement distillation, state discrimination, and quantum channel simulation. We discover protocols with evident improvements via this framework, sharpening our understanding of the power and limitations of LOCC. As showcases, we establish hardware-efficient and simple protocols for entanglement distillation and state discrimination, which outperform the previously best-known methods. In particular, for distillation of Bell states with non-orthogonal product noise, the optimized protocol outputs a state whose distillation fidelity even reaches the theoretical upper bound and hence is optimal.
According to the number of classical communication rounds, one can divide LOCC into different classes [13]. The one-round protocols correspond to LOCC operations where one party applies a local operation and sends the measurement outcome to the others, who then apply local operations chosen based on the outcome they receive. Based on one-round protocols, we are able to construct an r-round protocol recursively. All these protocols belong to the finite-round LOCC class and can be visualized as tree graphs. Each node in the tree represents a local operation, and different measurement outcomes correspond to edges connecting this node to its children, which represent different choices of local operations based on the measurement outcomes from the previous round.
Although the basic idea of LOCC is relatively easy to grasp, its mathematical structure is highly complicated [13] and hard to characterize. As indicated by its tree structure, a general r-round LOCC protocol could lead to exponentially many possible results, making LOCC protocol designs for many essential quantum information processing tasks very challenging. At the same time, it will be more practical to consider LOCC protocols with hardware-efficient local operations and a few communication rounds due to the limited coherence time of local quantum memory. To overcome these challenges, we propose to find LOCC protocols with the aid of machine learning, inspired by its recent success in various areas. Specifically, we present the LOCCNet framework, which incorporates optimization methods from the classical machine learning field into the workflow of designing LOCC protocols and can simulate any finite-round LOCC protocol in principle.
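To make the growth of the protocol tree concrete, the following minimal Python sketch builds the tree of an r-round protocol and counts its nodes, each of which would need its own PQC. The names `Node`, `build_tree`, and `count_nodes` are hypothetical, and the sketch assumes that every measurement has two outcomes and that the parties alternate between rounds.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One local operation (one PQC) in the protocol tree."""
    party: str
    depth: int
    children: List["Node"] = field(default_factory=list)


def build_tree(rounds: int, outcomes: int = 2, depth: int = 0) -> Node:
    """Build the tree of an r-round protocol.

    Assumption: each measurement returns one of `outcomes` values, and
    every value leads to its own follow-up local operation.
    """
    party = "Alice" if depth % 2 == 0 else "Bob"
    node = Node(party, depth)
    if depth < rounds:
        node.children = [build_tree(rounds, outcomes, depth + 1)
                         for _ in range(outcomes)]
    return node


def count_nodes(node: Node) -> int:
    """Number of local operations (PQCs) in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


for r in (1, 2, 3, 4):
    print(f"rounds={r}: {count_nodes(build_tree(r))} local operations")
```

Under these assumptions the count is 2^(r+1) - 1, which is why restricting the tree to a few rounds, or to special shapes with shared branches, keeps training tractable.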
As illustrated in Fig. 1, each party's local operations, represented by nodes in a tree, are described as parameterized quantum circuits (PQC) [23]. Users can measure any chosen qubit and define a customized loss function from measurement outcomes as well as remaining states. With a defined loss function for a task of interest, LOCCNet can be optimized to give a protocol. The effect of classical communication is also well simulated by LOCCNet in the sense that different PQCs can be built for different measurement outcomes from previous rounds.
Previously, PQCs have been adapted to many research areas including quantum simulation [49], quantum optimization [1], and quantum error correction [50]. The family of variational quantum algorithms [51][52][53], based on PQCs, is one promising candidate to achieve quantum advantages with near-term devices. In quantum information, PQCs also help in estimating distance measures for quantum states [54,55] and compressing quantum data [56,57]. Here, we take one step further by extending the use of PQCs to the distributed quantum information processing scenario where LOCC is the most natural set of operations.
FIG. 1: Illustration of the procedure for optimizing an LOCC protocol with LOCCNet. For simplicity, only two parties are involved in this workflow, namely Alice and Bob. The tree presented here corresponds to a specific two-round LOCC protocol. Such a tree can be customized with LOCCNet. With each node (Local Operation) encoded as a PQC and arrows between nodes referring to classical communication, one can define a loss function to guide the training process depending on the task. The tree branch diverges indicating different possible measurement outcomes. Finally, one can adopt optimization methods to iteratively update the parameters θ in each local operation and hence obtain the optimized LOCC protocol.
In the next three sections, we will demonstrate the LOCCNet framework in detail with important applications and present some interesting findings, including protocols that achieve better results than existing ones. We conduct software implementations of LOCCNet using the Paddle Quantum toolkit [58] on the PaddlePaddle Deep Learning Platform [59,60].

B. Entanglement distillation
Many applications of LOCC involve entanglement manipulation, and entanglement is generally required to be in its pure and maximal form. Hence, the efficient conversion of entanglement into such a form, a process known as entanglement distillation [24,42], is usually a must for many quantum technologies. The development of entanglement distillation methods remains at the forefront of quantum information [11]. For example, the two-qubit maximally entangled state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, which is also known as the entangled bit (ebit), is the fundamental resource unit in entanglement theory since it is a key ingredient in many quantum information processing tasks. Thus, an essential goal for entanglement distillation in a two-qubit setting is to convert a number of copies of some two-qubit state $\rho_{AB}$ shared by two parties, Alice and Bob, into a state as close as possible to the ebit. Here, closeness between the state $\rho_{AB}$ and the ebit is usually measured in terms of the fidelity $F = \langle\Phi^+|\rho_{AB}|\Phi^+\rangle$. Although theory is more concerned with asymptotic distillation with unlimited copies of $\rho_{AB}$, protocols considering a finite number of copies are more practical due to the physical limitations of near-term quantum technologies. Also, practical distillation protocols usually allow for the possibility of failure as a trade-off for achieving a higher final fidelity. Furthermore, due to the limited coherence time of local quantum memories, schemes involving only one round of classical communication are preferred in practice. Under these settings, many practical schemes for entanglement distillation have been proposed [24,25,[61][62][63][64]. Not surprisingly, there is not a single scheme that applies to all kinds of states. In fact, designing a protocol even for a specific type of state is a difficult task.
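As a small numerical illustration of this closeness measure, the sketch below evaluates the overlap of a two-qubit density matrix with the ebit using plain NumPy. It assumes the standard form $F = \langle\Phi^+|\rho_{AB}|\Phi^+\rangle$ for fidelity with a pure target; the helper names `ebit` and `fidelity_to_ebit` are hypothetical and not part of any library.

```python
import numpy as np


def ebit():
    """Density matrix of |Phi+> = (|00> + |11>)/sqrt(2)."""
    ket = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    return np.outer(ket, ket.conj())


def fidelity_to_ebit(rho):
    """F = <Phi+| rho |Phi+>, computed as Tr(rho Phi+) since Phi+ is pure."""
    return float(np.real(np.trace(rho @ ebit())))


# Example inputs: the ebit itself and the product state |00><00|.
rho_00 = np.zeros((4, 4), dtype=complex)
rho_00[0, 0] = 1.0

print("ebit:", fidelity_to_ebit(ebit()))
print("|00>:", fidelity_to_ebit(rho_00))
```

In a distillation loss, the quantity to minimize would then be `1 - fidelity_to_ebit(rho_out)` for the output state conditioned on success.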
In this section, we apply LOCCNet to entanglement distillation and present selected results that reinforce the validity and practicality of using this framework for designing LOCC protocols. To use LOCCNet for finding distillation protocols for a state $\rho_{AB}$, we build two PQCs, one for Alice and one for Bob. In the preset event of success, these PQCs output a state supposed to have a higher fidelity to the ebit. To optimize the PQCs, we define the infidelity between the output state and the ebit, i.e., $1 - F$, as the loss function to be minimized. As soon as the value of the loss function converges through training, the PQCs along with the optimized parameters form an LOCC distillation protocol. In principle, this training procedure is general and can be applied to find distillation protocols for any initial state $\rho_{AB}$ given its numerical form. Beyond rediscovering existing protocols, we are also able to find improved protocols with LOCCNet. Below, we give two distillation protocols for S states and isotropic states, respectively, as examples of optimized schemes found with LOCCNet.
An S state is a mixture of the ebit $|\Phi^+\rangle$ and non-orthogonal product noise [63]. Here, we define it to be $\rho_S = p|\Phi^+\rangle\langle\Phi^+| + (1-p)|00\rangle\langle 00|$, where $p \in [0, 1]$. A distillation protocol known to perform well on two copies of an S state is the DEJMPS protocol [25], which in this case outputs a state whose fidelity to the ebit is $(1+p)^2/(2+2p^2)$ with a probability of $(1+p^2)/2$ [see Supplementary Note 2].
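The numbers quoted above can be tabulated with a short NumPy sketch. It assumes the S state has the form $p|\Phi^+\rangle\langle\Phi^+| + (1-p)|00\rangle\langle 00|$ (ebit mixed with the product noise $|00\rangle$); the function names are hypothetical, and the two DEJMPS expressions are simply the ones stated in the text, not a simulation of the protocol.

```python
import numpy as np

PHI_PLUS = np.array([1, 0, 0, 1], dtype=float) / np.sqrt(2)


def s_state(p):
    """Assumed S state: p |Phi+><Phi+| + (1 - p) |00><00|."""
    zero_zero = np.array([1, 0, 0, 0], dtype=float)
    return p * np.outer(PHI_PLUS, PHI_PLUS) + (1 - p) * np.outer(zero_zero, zero_zero)


def dejmps_fidelity(p):
    """Two-copy DEJMPS output fidelity quoted in the text."""
    return (1 + p) ** 2 / (2 + 2 * p ** 2)


def dejmps_probability(p):
    """Two-copy DEJMPS success probability quoted in the text."""
    return (1 + p ** 2) / 2


for p in (0.2, 0.5, 0.8):
    initial = float(PHI_PLUS @ s_state(p) @ PHI_PLUS)
    print(f"p={p}: initial F={initial:.4f}, "
          f"DEJMPS F={dejmps_fidelity(p):.4f}, "
          f"P(success)={dejmps_probability(p):.4f}")
```

The comparison shows the trade-off discussed earlier: the fidelity gain comes at the price of a success probability below one.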
Here, we present a protocol learned by LOCCNet that can output a state achieving a fidelity higher than DEJMPS and close to the highest possible fidelity. Details on this protocol after simplification are given in Fig. 2, where Alice and Bob apply local operations to their own qubits independently and then compare their measurement outcomes through classical communication. The distillation succeeds only when both Alice and Bob get 0 from computational basis measurements.
The final fidelity achieved by this protocol is compared with that achieved by the DEJMPS protocol in Fig. 3. For the aim of benchmarking, techniques based on the positive partial transpose (PPT) were introduced to derive fundamental limits of entanglement distillation [63,[65][66][67][68][69]. The entanglement theory under PPT operations has been extensively studied in the literature (e.g., [70][71][72][73][74][75]) and offers valuable limitations of LOCC. Here, the PPT bound obtained with semi-definite programming [63] is an upper bound to the fidelity achieved by any LOCC protocol [see Supplementary Note 2]. As shown in the figure, the protocol learned by LOCCNet achieves near-optimal fidelity in the sense that it is close to the PPT bound. Analytically, for two copies of an S state with a parameter $p$, the post-measurement state in the event of success is $\sigma_{AB} = F|\Phi^+\rangle\langle\Phi^+| + (1-F)|\Phi^-\rangle\langle\Phi^-|$, where $F$ is its fidelity to the ebit and $|\Phi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$. The probability of arriving at this state is $p_{\mathrm{succ}} = p^2 - p^3/2$ [see Supplementary Note 2]. It is noteworthy that the distilled state is a Bell diagonal state of rank two. For two copies of such a state, the DEJMPS protocol achieves the optimal fidelity [63,76]. Thus, combining our protocol with the DEJMPS protocol offers an efficient and scalable distillation scheme for more copies of an S state.
Another important family of entangled states is the isotropic state family, defined as $\rho_{\mathrm{iso}} = p|\Phi^+\rangle\langle\Phi^+| + (1-p)\frac{I}{4}$, where $p \in [0, 1]$ and $I$ is the identity matrix. Distillation protocols for two copies of an isotropic state have been well studied, and the DEJMPS protocol achieves empirically optimal fidelity in this case. Given four copies of an isotropic state with a parameter $p$, a common way to distill entanglement is to divide them into two groups of two copies and apply the DEJMPS protocol to each group. Conditioned on success, we then apply the DEJMPS protocol again to the two resulting states from the previous round. Since the DEJMPS protocol was originally designed for two-copy distillation, such a generalization is probably unable to fully exploit the resources contained in four copies of the state. Indeed, with the aid of LOCCNet, we find a protocol optimized specifically for four copies of an isotropic state. As illustrated in Fig. 4, Alice and Bob apply local operations to their own qubits, measure some of them, and then compare their measurement outcomes through classical communication. If their measurement outcomes for each pair of qubits are identical, the distillation procedure succeeds.
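For readers who want to experiment with this family numerically, here is a minimal NumPy sketch. It assumes the isotropic state has the form $p|\Phi^+\rangle\langle\Phi^+| + (1-p)I/4$, and it also evaluates the smallest eigenvalue of the partial transpose, which for two qubits is negative exactly when the state is entangled (Peres-Horodecki criterion). All function names are hypothetical.

```python
import numpy as np

PHI_PLUS = np.array([1, 0, 0, 1], dtype=float) / np.sqrt(2)


def isotropic_state(p):
    """Assumed isotropic state: p |Phi+><Phi+| + (1 - p) I/4."""
    return p * np.outer(PHI_PLUS, PHI_PLUS) + (1 - p) * np.eye(4) / 4


def partial_transpose_b(rho):
    """Transpose Bob's qubit only: |i,j><k,l| -> |i,l><k,j|."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)


def min_pt_eigenvalue(rho):
    """Smallest eigenvalue of the partial transpose of rho."""
    return float(np.linalg.eigvalsh(partial_transpose_b(rho)).min())


for p in (0.2, 0.5, 0.9):
    rho = isotropic_state(p)
    fid = float(PHI_PLUS @ rho @ PHI_PLUS)
    print(f"p={p}: F={fid:.4f}, min eig of partial transpose={min_pt_eigenvalue(rho):+.4f}")
```

Only states with a negative value in the last column are candidates for distillation, since PPT states cannot be distilled.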
FIG. 4: Circuit of a distillation protocol learned by LOCCNet for isotropic states. This simplified circuit represents Alice's local operation in a protocol learned by LOCCNet for entanglement distillation with four copies of some isotropic state. Bob's local operation is identical to Alice's, except that the rotation angles of Bob's Rx gates are −π/2.
The fidelity achieved by this protocol for different input isotropic states is plotted in Fig. 5, along with that of the generalized DEJMPS protocol. For four copies of an isotropic state with a parameter $p$, our protocol achieves a final fidelity that is slightly higher than that of the generalized DEJMPS protocol, as shown in Fig. 5 [see Supplementary Note 2 for details]. Another advantage of this optimized protocol is that the output state in the event of success is still an isotropic state, implying the possibility of a generalized distillation protocol for $4^n$ copies of an isotropic state. We remark that our protocols are optimized with the goal of achieving the highest possible fidelity, so their probabilities of success are not high. For situations where the probability of success is important, one can also design a customized loss function to optimize a protocol according to one's own metrics.

C. Distributed quantum state discrimination
Another important application of LOCC is quantum state discrimination (QSD). Distinguishing one physical configuration from another is central to information theory. When messages are encoded into quantum states for information transmission, the processing of this information relies on the distinguishability of quantum states. Hence, QSD has been a central topic in quantum information [77][78][79], which investigates how well quantum states can be distinguished and underlies various applications in quantum information processing tasks, including quantum data hiding [80] and dimension witness [81].
QSD using global quantum operations is well-understood in the sense that the optimal strategy maximizing the success probability can be solved efficiently via semi-definite programming (SDP) [82][83][84]. However, for an important operational setting called distant lab paradigm or distributed regime, our knowledge of QSD remains limited despite substantial efforts in the past two decades [30][31][32][33][34][35][36][37][38][39][40][41]. In the distributed regime, multipartite quantum states are distributed to spatially separated labs, and the goal is to distinguish between these states via LOCC.
For two orthogonal pure states shared between multiple parties, it has been shown that they can be distinguished via LOCC alone no matter if these states are entangled or not [31]. However, it is not easy to design a concrete LOCC protocol for practical implementation on near-term quantum devices. Using LOCCNet, one can optimize and obtain practical LOCC protocols for quantum state discrimination. Furthermore, for non-orthogonal states, limited aspects have been investigated in terms of the feasibility of LOCC discrimination. However, LOCCNet can provide an optimized and practical protocol in this realistic setting.
Here, to explore the power of LOCCNet in state discrimination, we focus on the optimal success probability of discriminating between noiseless and noisy Bell states via LOCC. Consider two Bell states, $|\Phi^+\rangle$ and $|\Phi^-\rangle$, and an amplitude damping (AD) channel $\mathcal{A}$ with noise parameter $\gamma$ such that $\mathcal{A}(\rho) = E_0 \rho E_0^\dagger + E_1 \rho E_1^\dagger$, with $E_0 = |0\rangle\langle 0| + \sqrt{1-\gamma}\,|1\rangle\langle 1|$ and $E_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$. If we send each of the two qubits of $|\Phi^-\rangle$ through this AD channel, then the resulting state is $(\mathcal{A}\otimes\mathcal{A})(|\Phi^-\rangle\langle\Phi^-|)$.
Suppose $\Phi_0$ and $\Phi_1$ are some pair of two-qubit states. To find a protocol discriminating between them, we build an ansatz with measurements on both qubits. As illustrated in Fig. 6, Alice performs a unitary gate on her qubit followed by a measurement, whose outcome determines Bob's operation on his qubit. Given an ideal discrimination protocol, Bob's measurement outcome should be 0 if and only if the input state is $\Phi_0$, so that he can tell with certainty which state was input. Based on this observation, we define a loss function $L = P(1|\Phi_0) + P(0|\Phi_1)$, where $P(j|\Phi_k)$ is the probability of Bob's measurement outcome being $j$ given the input state being $\Phi_k$. By minimizing this loss function, we are able to obtain a protocol for distinguishing between states $\Phi_0$ and $\Phi_1$ with an optimized probability of success.
Specifically, for $\Phi_0 \equiv |\Phi^+\rangle\langle\Phi^+|$ and $\Phi_1 \equiv (\mathcal{A}\otimes\mathcal{A})(|\Phi^-\rangle\langle\Phi^-|)$, through optimization we find a protocol where Alice's local unitary operation is $U = R_y(\pi/2)$ and Bob's local unitary operation is $V = R_y((-1)^a\theta)$, where $\theta = \pi - \arctan((2-\gamma)/\gamma)$ and $a = 0$ or $1$ is Alice's measurement outcome. This optimized protocol achieves an average success probability of $p_{\mathrm{succ}} = \frac{1}{2} + \frac{1}{4}\sqrt{2\gamma^2 - 4\gamma + 4}$.
In Fig. 7, we compare the protocol learned by LOCCNet with the optimal protocol for perfect discrimination between two noiseless and orthogonal Bell states $|\Phi^+\rangle$ and $|\Phi^-\rangle$. The PPT bound shown in Fig. 7 is obtained via SDP and serves as an upper bound to the average probability of any LOCC protocol recognizing the input state correctly [85], where the input state is either $\Phi_0$ or $\Phi_1$ with equal chance.
While the noiseless protocol is consistently better than random guessing as noise in the AD channel increases, it inevitably suffers from a decrease in its discrimination ability. The gap between its probability of success and the PPT bound steadily widens. On the other hand, the protocol optimized with LOCCNet can achieve a near-optimal probability of success for each noise setting, as shown in the figure.
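The learned protocol (Alice applies $R_y(\pi/2)$ and measures, Bob applies $R_y((-1)^a\theta)$ with $\theta = \pi - \arctan((2-\gamma)/\gamma)$ and measures) is simple enough to check with a density-matrix simulation. The NumPy sketch below is such a check under stated assumptions: the AD channel uses the standard Kraus operators $E_0 = |0\rangle\langle 0| + \sqrt{1-\gamma}|1\rangle\langle 1|$ and $E_1 = \sqrt{\gamma}|0\rangle\langle 1|$, $R_y(\alpha) = e^{-i\alpha Y/2}$, and the closed form used for comparison, $\frac{1}{2} + \frac{1}{4}\sqrt{2\gamma^2 - 4\gamma + 4}$, is the value obtained by optimally discriminating Bob's two conditional qubit states. Function names are hypothetical.

```python
import numpy as np


def ry(angle):
    """Single-qubit rotation exp(-i * angle * Y / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def bell(sign):
    """Density matrix of (|00> + sign |11>)/sqrt(2)."""
    ket = np.array([1, 0, 0, sign], dtype=complex) / np.sqrt(2)
    return np.outer(ket, ket.conj())


def amplitude_damping_both(rho, gamma):
    """Apply the (assumed) AD channel to each of the two qubits."""
    e0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    e1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    out = np.zeros_like(rho)
    for ka in (e0, e1):
        for kb in (e0, e1):
            k = np.kron(ka, kb)
            out += k @ rho @ k.conj().T
    return out


def prob_bob(rho, gamma, j):
    """Probability that Bob reads outcome j, summed over Alice's outcome a."""
    # arctan2 equals arctan((2 - gamma)/gamma) for gamma > 0 and is safe at 0.
    theta = np.pi - np.arctan2(2 - gamma, gamma)
    total = 0.0
    for a in (0, 1):
        circuit = np.kron(ry(np.pi / 2), ry((-1) ** a * theta))
        evolved = circuit @ rho @ circuit.conj().T
        idx = 2 * a + j  # basis index of |a>_A |j>_B
        total += float(np.real(evolved[idx, idx]))
    return total


def success_probability(gamma):
    """Average success probability with equal priors on Phi_0 and Phi_1."""
    phi0 = bell(+1)
    phi1 = amplitude_damping_both(bell(-1), gamma)
    return 0.5 * (prob_bob(phi0, gamma, 0) + prob_bob(phi1, gamma, 1))


def closed_form(gamma):
    return 0.5 + 0.25 * np.sqrt(2 * gamma ** 2 - 4 * gamma + 4)


for gamma in (0.0, 0.3, 0.6, 1.0):
    print(f"gamma={gamma}: simulated={success_probability(gamma):.6f}, "
          f"closed form={closed_form(gamma):.6f}")
```

The loss defined in the text corresponds to `prob_bob(phi0, gamma, 1) + prob_bob(phi1, gamma, 0)`, so minimizing it is the same as maximizing the success probability computed here.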

D. Quantum channel simulation
One central goal of quantum information is to understand the limitations governing the use of quantum systems to take advantage of the laws of quantum physics. Quantum channels lie at the heart of this question since they characterize what we can physically do with quantum states [86][87][88]. To fully exploit quantum resources, the ability to manipulate quantum channels under operational settings is important. Particularly, in distributed quantum computing, one fundamental primitive, dubbed quantum channel simulation, is to realize quantum channels from one party to another using entanglement and LOCC protocols. Quantum channel simulation, exploiting entanglement to synthesize a target channel through LOCC protocols [42][43][44][45][46][47][48]89], serves as the basis of many problems in quantum information, including quantum communication, quantum metrology [90], and quantum key distribution [91].
One famous example of quantum channel simulation is quantum teleportation (i.e., simulation of the identity channel). As one of the most important quantum information processing protocols [8,92], quantum teleportation exploits the physical resource of entanglement to realize noiseless quantum channels between different parties, and it is an important building block for quantum technologies including distributed quantum computing and quantum networks. Similar to quantum teleportation, quantum channel simulation is a general technique to send an unknown quantum state $\psi$ from a sender to a receiver such that the receiver could obtain $\mathcal{N}_{A'\to B}(\psi_{A'})$ with the help of a pre-shared entangled state $\rho_{AB}$ and an LOCC protocol $\Pi$. The overall scheme simulates the target channel $\mathcal{N}$ in the sense that $\mathcal{N}_{A'\to B}(\psi_{A'}) = \Pi(\psi_{A'} \otimes \rho_{AB})$. For some classes of channels such as Pauli channels, LOCC-based simulation protocols are known [42,45,93]. However, LOCC protocols for general quantum channel simulation are hard to design due to the complexity of LOCC. Even for the qubit amplitude damping (AD) channel, the LOCC protocol for simulating this channel in the non-asymptotic regime is still unknown, and its solution would provide a better estimate of its secret key capacity [91]. Note that the asymptotic simulation of this channel involving infinite dimensions was introduced in [45].
Here, we apply LOCCNet to explore the simulation of an AD channel $\mathcal{A}$ using its Choi state [94] $\rho_{\mathcal{A}} = (I \otimes \mathcal{A})(\Phi^+)$ as the pre-shared entangled state. Note that the AD channel is one of the realistic sources of noise in superconducting quantum processors [95].
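The pre-shared resource can be written down explicitly. The following NumPy sketch builds the Choi state $(I \otimes \mathcal{A})(\Phi^+)$ from Kraus operators; it assumes the standard AD Kraus operators $E_0 = |0\rangle\langle 0| + \sqrt{1-\gamma}|1\rangle\langle 1|$ and $E_1 = \sqrt{\gamma}|0\rangle\langle 1|$, and the helper names are hypothetical.

```python
import numpy as np


def ad_kraus(gamma):
    """Kraus operators of the (assumed) qubit amplitude damping channel."""
    e0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    e1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [e0, e1]


def choi_state(kraus_ops):
    """(I ⊗ N)(Phi+) for a qubit channel N given by its Kraus operators."""
    phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    ebit = np.outer(phi, phi.conj())
    out = np.zeros((4, 4), dtype=complex)
    for k in kraus_ops:
        full = np.kron(np.eye(2), k)  # the channel acts on the second qubit
        out += full @ ebit @ full.conj().T
    return out


gamma = 0.3
rho = choi_state(ad_kraus(gamma))
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(np.round(rho.real, 4))
print("fidelity to the ebit:", float(np.real(phi @ rho @ phi)))
```

At $\gamma = 0$ the Choi state is the ebit itself, which is why the task reduces to ordinary teleportation in the noiseless limit.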
To train the LOCCNet for simulating $\mathcal{A}$, we select a set of linearly independent density matrices $S$ as the training set. The loss function for this channel simulation task is then defined as $L = -\sum_{\psi \in S} F(\mathcal{A}(\psi), \mathcal{B}(\psi))$, where $\mathcal{B}$ is the actual channel simulated by LOCCNet with the current parameters and $F(\rho, \sigma) = \left(\mathrm{Tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}}\right)^2$ gives the fidelity between states $\rho$ and $\sigma$. With this loss function to be minimized, the parameters in LOCCNet are optimized to maximize the state fidelity between $\mathcal{A}(\psi)$ and $\mathcal{B}(\psi)$ for all $\psi \in S$.
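A minimal NumPy sketch of such a loss is given below. It is illustrative only: the training set (the pure states $|0\rangle$, $|1\rangle$, $|+\rangle$, $|{+i}\rangle$), the sign convention (negative sum of fidelities), the AD Kraus operators, and all function names are assumptions, and the candidate channel is passed in as an ordinary Python function rather than an LOCCNet model.

```python
import numpy as np


def psd_sqrt(m):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T


def fidelity(rho, sigma):
    """Uhlmann fidelity (Tr sqrt(rho^(1/2) sigma rho^(1/2)))^2."""
    root = psd_sqrt(rho)
    vals = np.clip(np.linalg.eigvalsh(root @ sigma @ root), 0, None)
    return float(np.sum(np.sqrt(vals)) ** 2)


def ad_channel(rho, gamma):
    """Assumed qubit amplitude damping channel."""
    e0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    e1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return e0 @ rho @ e0.conj().T + e1 @ rho @ e1.conj().T


def training_set():
    """Four linearly independent qubit density matrices (assumed choice)."""
    kets = [[1, 0], [0, 1], [1, 1], [1, 1j]]
    states = []
    for ket in kets:
        v = np.array(ket, dtype=complex)
        v = v / np.linalg.norm(v)
        states.append(np.outer(v, v.conj()))
    return states


def simulation_loss(candidate, gamma):
    """Negative sum of fidelities between target and candidate outputs."""
    return -sum(fidelity(ad_channel(psi, gamma), candidate(psi))
                for psi in training_set())


gamma = 0.3
print("candidate = target  :", simulation_loss(lambda r: ad_channel(r, gamma), gamma))
print("candidate = identity:", simulation_loss(lambda r: r, gamma))
```

Because a qubit channel is linear, matching the target on four linearly independent inputs fixes it on all inputs, which is why a small training set can suffice.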
Once the LOCCNet is trained to teleport all the basis states in S with near perfect fidelity, we obtain a protocol for simulating A. For benchmarking, we randomly generate 1000 pure states and teleport them to Bob. The results are summarized in Fig. 8. Compared with the original teleportation protocol, we could achieve an equivalent performance at low noise level and a better performance at noise level γ > 0.4. Note that the numerical simulations are conducted on Paddle Quantum [58].

III. DISCUSSION
We established LOCCNet for exploring LOCC protocols in distributed quantum information processing. Its overall pipeline is standard for machine learning algorithms. For a specific task, one first designs an appropriate loss function and then utilizes different LOCCNet structures and optimization methods to train the model to obtain an optimal or near-optimal protocol. Depending on the nature of the task, a selected training data set may be required, as in the case of channel simulation. Based on the current design of LOCCNet, more machine learning techniques, such as reinforcement learning, could be incorporated into this framework, making it a more powerful tool for exploring LOCC protocols.
LOCCNet not only unifies and extends the existing LOCC protocols, but also sheds light on the power and limitation of LOCC in the noisy intermediate-scale quantum (NISQ) era [12] by providing a plethora of examples. We developed improved protocols for entanglement distillation, local state discrimination, and quantum channel simulation as applications. As a showcase, we applied LOCCNet to establish hardware-efficient and state-of-the-art protocols for entanglement distillation of noisy entangled states of interest. In addition to making a significant contribution to entanglement distillation, LOCCNet finds direct practical use in many settings, as we exemplified with several explicit applications in distinguishing noisy and noiseless Bell states as well as simulating amplitude damping channels.
As we have shown the ability of LOCCNet in discovering improved LOCC protocols, one future direction is to apply LOCCNet to further enhance practical entanglement manipulation and quantum communication and explore fundamental problems in quantum information theory. While in this paper we mainly focus on bipartite cases, LOCCNet also supports multipartite entanglement manipulation. For example, as an essential part of quantum repeaters [96], entanglement swapping aims to transform two entangled pairs shared between Alice and Bob and between Bob and Carol into a new entangled pair shared by Alice and Carol using only LOCC. Indeed, we could use LOCCNet to design such a protocol. For instance, we can build an LOCCNet where Bob first operates on and measures his subsystem, and then Alice and Carol perform local operations according to the measurement results from Bob. The loss function to minimize can be defined as the infidelity between a target state and the output state shared between Alice and Carol. Similar procedures can be followed to apply LOCCNet in optimizing other multipartite protocols as well, which is worth exploring in future work.
Another important direction is to extend the framework to continuous-variable quantum information processing, which may be applied to explore better LOCC protocols for private communication based on continuous-variable systems [91]. As we have seen the potential of advancing distributed quantum information processing with the aid of machine learning, we expect more such cases in which classical machine learning is used to improve quantum technologies, which in turn will enhance quantum machine learning applications.

Data Availability
Data that support the plots and other findings of this study are available from the corresponding authors upon reasonable request.

Code Availability
Code used in the numerical experiments on quantum channel simulation is available at https://github.com/vsastray/LOCCNetcodes. Other code used in this study is available from the corresponding authors upon reasonable request.

Supplemental Information: Practical distributed quantum information processing with LOCCNet

SUPPLEMENTARY NOTE 1: Details of LOCC
Preliminaries. We begin with the preliminaries on quantum information. We will frequently use symbols such as $A$ (or $A'$) and $B$ (or $B'$) to denote finite-dimensional Hilbert spaces associated with Alice and Bob, respectively. We use $d_A$ to denote the dimension of system $A$. The set of linear operators acting on $A$ is denoted by $\mathcal{L}(A)$. We usually write an operator with a subscript indicating the system that the operator acts on, such as $M_{AB}$, and write $M_A := \mathrm{Tr}_B M_{AB}$.
A quantum state on system $A$ is a positive operator $\rho_A$ with unit trace. The set of quantum states is denoted as $S(A) := \{\rho_A \geq 0 \mid \mathrm{Tr}\,\rho_A = 1\}$. We call a positive operator separable if it can be written as a convex combination of tensor product positive operators. A bipartite positive semidefinite operator $E_{AB} \in \mathcal{L}(A \otimes B)$ is said to be Positive-Partial-Transpose (PPT) if $E_{AB}^{T_B}$ is positive semidefinite. Note that the action of partial transpose (with respect to $B$) is defined as $(|i,j\rangle\langle k,l|_{AB})^{T_B} = |i,l\rangle\langle k,j|_{AB}$.
LOCC. When a quantum system is distributed to spatially separated parties, it is natural to consider how the system evolves when the parties perform local quantum operations with classical communication. A systematic definition of LOCC can be found in [13]. Here, for completeness, we give a detailed description of LOCC as follows.
Consider a setting involving multiple spatially separated parties sharing a multipartite quantum system. The set $\mathrm{LOCC}_1$ consists of the most elementary LOCC operations corresponding to LOCC protocols with one classical communication round, where one party performs a local operation and sends the measurement outcome to the others, who then perform corresponding local operations on their local systems upon receiving the outcome. A local operation can be described as a set of completely positive (CP) maps $\{\mathcal{E}_m\}$ such that $\sum_m \mathcal{E}_m$ is trace-preserving. The subscript $m$ corresponds to an operation's measurement outcome, which could affect each party's choices of subsequent local operations. A more complicated LOCC operation can be seen as a sequence of $\mathrm{LOCC}_1$ operations. Specifically, for any $r \geq 2$, $\mathrm{LOCC}_r$ is defined to be a set of LOCC operations, in which each operation is constructed from an $\mathrm{LOCC}_{r-1}$ operation followed by an $\mathrm{LOCC}_1$ operation. A common characteristic of these LOCC operations is that they can be implemented with finitely many rounds of classical communication. Thus, we define a set $\mathrm{LOCC}_{\mathbb{N}}$, corresponding to finite-round protocols, such that an LOCC operation is in this set if it belongs to $\mathrm{LOCC}_r$ for some $r \in \mathbb{N} = \{1, 2, \ldots\}$. Besides finite-round protocols, there also exist infinite-round protocols in theory. These infinite-round protocols, together with the operations in $\mathrm{LOCC}_{\mathbb{N}}$, form the set known as LOCC.
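The description of a local operation as a set of CP maps whose sum is trace-preserving can be illustrated with the simplest case used throughout this work: a unitary followed by a computational-basis measurement. In the NumPy sketch below each branch has a single Kraus operator $K_m = |m\rangle\langle m|U$, so that $\mathcal{E}_m(\rho) = K_m \rho K_m^\dagger$; the function names are hypothetical.

```python
import numpy as np


def measurement_instrument(unitary):
    """Kraus operators K_m = |m><m| U of a one-qubit gate followed by a
    computational-basis measurement with outcome m in {0, 1}."""
    projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
    return [p @ unitary for p in projectors]


def is_trace_preserving(kraus_ops, tol=1e-10):
    """Check that sum_m K_m^dagger K_m equals the identity."""
    dim = kraus_ops[0].shape[1]
    total = sum(k.conj().T @ k for k in kraus_ops)
    return bool(np.allclose(total, np.eye(dim), atol=tol))


def apply_branch(kraus, rho):
    """Return (probability of this outcome, unnormalized output state)."""
    unnormalized = kraus @ rho @ kraus.conj().T
    return float(np.real(np.trace(unnormalized))), unnormalized


hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
instrument = measurement_instrument(hadamard)
rho_zero = np.diag([1.0, 0.0]).astype(complex)

print("trace-preserving:", is_trace_preserving(instrument))
for m, kraus in enumerate(instrument):
    prob, _ = apply_branch(kraus, rho_zero)
    print(f"outcome {m}: probability {prob:.3f}")
```

The outcome index `m` is the classical message that is sent to the other parties and selects which local operation they apply next.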
LOCCNet is a machine learning framework developed for designing and exploring LOCC protocols for various quantum information processing tasks. In the main text, we give a brief introduction to this framework. Here, we give some common types of LOCC protocols involving two parties, Alice and Bob, as examples to explain how a protocol can be constructed and optimized using LOCCNet.
Optimizing one-round LOCC protocols. One-round LOCC protocols are protocols having only one round of classical communication. An example is shown in Fig. S1. An application of such a protocol is quantum state teleportation. To optimize a one-round protocol with LOCCNet, we need to build and train three PQCs, shown as a tree in Fig. S2. The PQC $U(\theta_0)$ is used to optimize Alice's local operation $U$, and the PQCs $V_0(\theta_1)$ and $V_1(\theta_2)$ are for Bob's local operation in the case of Alice measuring 0 and 1, respectively.
Optimizing two-round LOCC protocols. A general two-round LOCC protocol includes Alice performing a local operation and telling Bob her measurement outcome, then Bob performing a corresponding local operation and telling Alice his measurement outcome, and finally Alice performing another local operation. Such a protocol is already somewhat complicated, and optimizing it requires seven PQCs. Here, we give two special types of two-round protocols that are easier to train and have practical applications.
The first type of protocol is shown in Fig. S3 and is widely used for entanglement distillation. In such a protocol, Alice and Bob first perform local operations independently and then exchange their measurement outcomes through classical communication to check whether the expected task is completed. To optimize such a protocol, we only need to build two PQCs, one for Alice's local operation and one for Bob's local operation.
The second type of protocol is shown in Fig. S4. In such a protocol, after Bob obtains his measurement outcome and communicates it to Alice, Alice does not need to perform a further local operation. An application of such a protocol is state discrimination, as we show in the main text. As with a one-round protocol, optimizing a protocol of this type only requires three PQCs.
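The PQC counts quoted above (three for a one-round protocol, seven for a general two-round protocol) follow from viewing a protocol as a tree with one circuit per sequence of communicated outcomes. The sketch below only illustrates this counting argument; the function name `count_pqcs` and the assumption that every measurement has two outcomes (a single-qubit measurement) are choices of this example, not part of LOCCNet's API.

```python
def count_pqcs(rounds: int, outcomes: int = 2) -> int:
    """Count the circuits in a full protocol tree.

    Level 0 holds the first party's circuit. Each later level holds one
    circuit per sequence of measurement outcomes communicated so far,
    so level k contributes outcomes**k circuits (assumed branching).
    """
    return sum(outcomes ** level for level in range(rounds + 1))


one_round = count_pqcs(1)   # Alice's U plus Bob's V_0 and V_1
two_round = count_pqcs(2)   # adds four circuits for Alice's final operation
```

The special two-round protocols described above need fewer circuits because some branches of this tree are dropped: two when both parties act before communicating, and three when Alice's final operation is omitted.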

SUPPLEMENTARY NOTE 2: Analysis of entanglement distillation
The aim of entanglement distillation is to compensate for the impurity caused by noise and to restore a maximally entangled state at the cost of many noisy entangled states. In this sense, one could also refer to an entanglement distillation protocol as a purification or error-correction protocol. The Bell states are the four two-qubit maximally entangled states defined as

$$|\Phi^\pm\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \qquad |\Psi^\pm\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right).$$

The state $|\Phi^+\rangle$ is also known as the entangled bit (ebit), and entanglement distillation in two-qubit settings usually means converting copies of a state $\rho_{AB}$ shared by two parties, Alice and Bob, into a state closer to the ebit. Here, closeness between the state $\rho_{AB}$ and the ebit is usually measured in terms of the fidelity

$$F = \langle\Phi^+|\rho_{AB}|\Phi^+\rangle.$$

A well-known protocol for two-copy entanglement distillation is the DEJMPS protocol, which is illustrated in Fig. S5. Sharing two copies of an initial state, $\rho_{A_0B_0}$ and $\rho_{A_1B_1}$, both Alice and Bob first apply $R_x$ gates and CNOT gates to their local qubits and then measure a pair of qubits from the same copy. Finally, they exchange measurement outcomes and output the unmeasured copy when their outcomes agree. Otherwise, the distillation procedure fails.
The DEJMPS protocol has been shown to be optimal in purifying two copies of any Bell diagonal state of rank at most three [63], where a Bell diagonal state is a state of the form

$$\rho = p_0|\Phi^+\rangle\langle\Phi^+| + p_1|\Psi^+\rangle\langle\Psi^+| + p_2|\Phi^-\rangle\langle\Phi^-| + p_3|\Psi^-\rangle\langle\Psi^-|,$$

which is a convex combination of the four Bell states. For conciseness, we can write such a Bell diagonal state as a 4-tuple, $\rho = (p_0, p_1, p_2, p_3)$. The DEJMPS protocol can also distill some states besides Bell diagonal states, such as S states. In the following, we analyze the performance of the DEJMPS protocol on two copies of an S state and compare it with a protocol learned by LOCCNet. After that, we compare the DEJMPS protocol with another protocol learned by LOCCNet for distilling four copies of an isotropic state, which is a special Bell diagonal state.
S state. The S state is defined as the Bell state with a non-orthogonal product noise,

$$\rho = p\,|\Phi^+\rangle\langle\Phi^+| + (1-p)\,|00\rangle\langle 00|, \tag{S5}$$

where $p \in [0, 1]$. In the main text, we give expressions for the fidelity achieved by the DEJMPS protocol and by the protocol learned by LOCCNet for two copies of an S state. Here, we give a detailed derivation of these two expressions.
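As a numerical companion to the derivation of the DEJMPS result, the following NumPy sketch simulates the DEJMPS circuit on two copies of an S state and returns the output fidelity and success probability. It is a sketch under stated assumptions: the S state is taken to be $p|\Phi^+\rangle\langle\Phi^+| + (1-p)|00\rangle\langle 00|$, the qubits are ordered $A_0, B_0, A_1, B_1$, Alice applies $R_x(\pi/2)$ and Bob $R_x(-\pi/2)$, copy 0 controls the CNOTs, and copy 1 is measured. All function names are hypothetical.

```python
import numpy as np

PHI_PLUS = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def rx(theta):
    """Single-qubit rotation exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def kron_all(*ops):
    """Tensor product of the given operators, left to right."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out


def s_state(p):
    """Assumed S state: p |Phi+><Phi+| + (1 - p) |00><00|."""
    rho = p * np.outer(PHI_PLUS, PHI_PLUS.conj())
    rho[0, 0] += 1 - p
    return rho


def cnot(control, target, n=4):
    """CNOT on n qubits as a permutation matrix (qubit 0 is leftmost)."""
    dim = 2 ** n
    u = np.zeros((dim, dim))
    for basis in range(dim):
        bits = [(basis >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        image = sum(b << (n - 1 - k) for k, b in enumerate(bits))
        u[image, basis] = 1
    return u


def dejmps(rho):
    """Return (fidelity to the ebit, success probability) for two copies."""
    state = np.kron(rho, rho)  # qubit order: A0, B0, A1, B1
    rot = kron_all(rx(np.pi / 2), rx(-np.pi / 2), rx(np.pi / 2), rx(-np.pi / 2))
    u = cnot(1, 3) @ cnot(0, 2) @ rot
    state = u @ state @ u.conj().T
    kept = np.zeros((4, 4), dtype=complex)
    for m in (0, 1):  # coincidence outcomes 00 and 11 on copy 1
        bra = np.zeros((1, 2))
        bra[0, m] = 1
        proj = kron_all(np.eye(2), np.eye(2), bra, bra)  # shape (4, 16)
        kept += proj @ state @ proj.conj().T
    p_succ = kept.trace().real
    fidelity = (PHI_PLUS.conj() @ kept @ PHI_PLUS).real / p_succ
    return fidelity, p_succ


fid, p_succ = dejmps(s_state(0.6))
```

For any $p$, the returned values can be compared with $2\alpha = (1+p)^2/(2+2p^2)$ and $p_{00} + p_{11} = (1+p^2)/2$, using the quantities $\alpha$, $p_{00}$ and $p_{11}$ from the proof of Proposition S1.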
Proposition S1 For two copies of an S state with parameter $p$, the DEJMPS protocol outputs a state whose fidelity to the ebit is

$$F = \frac{(1+p)^2}{2+2p^2}$$

with a probability of success

$$p_{\rm succ} = \frac{1+p^2}{2}.$$

Proof By its definition in Equation (S5), an S state $\rho$ with parameter $p$ can be written in matrix form as

$$\rho = \begin{pmatrix} 1-\frac{p}{2} & 0 & 0 & \frac{p}{2} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \frac{p}{2} & 0 & 0 & \frac{p}{2} \end{pmatrix}. \tag{S8}$$

Applying the circuit in Fig. S5 to two copies of such a state, Alice and Bob both get 0 as their measurement outcomes with probability $p_{00} = (1+p^2)/4$. By matrix calculation, we obtain the post-measurement state $\sigma_+$ of the unmeasured copy in this case, whose entries are expressed in terms of $\alpha = (1+p)^2/(4+4p^2)$ and $\beta = (1-p)^2/(4+4p^2)$. The probability that Alice and Bob both get 1 as their measurement outcomes is $p_{11} = (1+p^2)/4$, and the post-measurement state in this case is $\sigma_-$. According to the definition of fidelity, the fidelity of state $\sigma_\pm$ to the ebit is

$$F = \langle\Phi^+|\sigma_\pm|\Phi^+\rangle = 2\alpha = \frac{(1+p)^2}{2+2p^2}.$$

The probability of Alice and Bob arriving at state $\sigma_\pm$ is

$$p_{\rm succ} = p_{00} + p_{11} = \frac{1+p^2}{4} + \frac{1+p^2}{4} = \frac{1+p^2}{2}. \tag{S12}$$

With LOCCNet, we are able to find a new protocol that achieves a higher fidelity than the DEJMPS protocol when distilling two copies of an S state. Indeed, we show in the main text that this protocol is optimal in the sense that it achieves the highest possible fidelity. With some simplification, we obtain the circuit shown in Fig. S6. Below, we analyze the performance of this optimized protocol in Proposition S2.

Proposition S2 For two copies of an S state with parameter $p$, the protocol illustrated in Fig. S6 outputs a state whose fidelity to the ebit is with probability $p_{\rm succ} = p^2 - p^3/2$ of success.

Proof The matrix form of an S state $\rho$ with parameter $p$ is given in Equation (S8). Applying the circuit in Fig. S6 to two copies of such a state, Alice and Bob both get 0 as their measurement outcomes with probability $p_{00} = p^2 - p^3/2$. By matrix calculation, we obtain the post-measurement state of the unmeasured copy as (S14) Note that the state $\sigma$ can be written as . By the definition of fidelity, we have since $\langle\Phi^+|\Phi^+\rangle = 1$ and $\langle\Phi^-|\Phi^+\rangle = 0$, as $|\Phi^-\rangle$ is orthogonal to $|\Phi^+\rangle$. The probability of Alice and Bob arriving at state $\sigma$ is

$$p_{\rm succ} = p_{00} = p^2 - \frac{p^3}{2}. \tag{S18}$$

Isotropic state. A two-qubit isotropic state is of the form

$$\rho_{\rm iso} = p\,|\Phi^+\rangle\langle\Phi^+| + (1-p)\,\frac{I}{4},$$

where $p \in [0, 1]$. Alternatively, one can write an isotropic state as a Bell diagonal state

$$\rho_{\rm iso} = \left(\frac{1+3p}{4}, \frac{1-p}{4}, \frac{1-p}{4}, \frac{1-p}{4}\right).$$

Distillation with the DEJMPS protocol. The DEJMPS protocol is known to achieve a high fidelity when distilling two copies of a Bell diagonal state, and the resulting state in the event of success is still a Bell diagonal state. Specifically, the DEJMPS protocol's circuit, excluding the measurements, acts on a Bell diagonal state as a permutation of the Bell states' coefficients. For a Bell diagonal state

$$\rho = p_0|\Phi^+\rangle\langle\Phi^+| + p_1|\Psi^+\rangle\langle\Psi^+| + p_2|\Phi^-\rangle\langle\Phi^-| + p_3|\Psi^-\rangle\langle\Psi^-|,$$

the operator $R_x(\pi/2) \otimes R_x(-\pi/2)$ maps it to another Bell diagonal state

$$\rho' = p_0|\Phi^+\rangle\langle\Phi^+| + p_1|\Psi^+\rangle\langle\Psi^+| + p_3|\Phi^-\rangle\langle\Phi^-| + p_2|\Psi^-\rangle\langle\Psi^-|. \tag{S22}$$

As stated in Eq. (S22), a pair of $R_x(\pm\pi/2)$ gates transforms a Bell diagonal state $(p_0, p_1, p_2, p_3)$ into another Bell diagonal state $(p_0, p_1, p_3, p_2)$.
Similarly, the pair of bilateral CNOT gates shown in Fig. S5 acts on the tensor product of two Bell diagonal states as a permutation of coefficients. The effect of the bilateral CNOT gates is summarized in a table in [24]. Specifically, for a pair of Bell diagonal states $(a_0, a_1, a_2, a_3)$ and $(b_0, b_1, b_2, b_3)$, write their tensor product as $(p_0, p_1, \ldots, p_{14}, p_{15})$ with $p_{4i+j} = a_i b_j$. Applying the bilateral CNOT gates to this state results in the state

$$\mathrm{CNOT}(p_0, p_1, \ldots, p_{14}, p_{15}) = (p_0, p_1, p_{10}, p_{11}, p_5, p_4, p_{15}, p_{14}, p_8, p_9, p_2, p_3, p_{13}, p_{12}, p_7, p_6).$$
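The permutation above can be regenerated from a compact rule. The sketch below assumes the Bell-state ordering $(\Phi^+, \Psi^+, \Phi^-, \Psi^-)$, encodes each Bell state by a phase bit and an amplitude bit (index $= 2\cdot\text{phase} + \text{amplitude}$), and uses the standard bilateral-XOR action in which the phase bit of the target is added to the source and the amplitude bit of the source is added to the target. This rule is not stated in the text; it is an assumption of the example that happens to reproduce the listed table. The function name is hypothetical.

```python
def bilateral_cnot_permutation():
    """Return perm with new_coeffs[k] = old_coeffs[perm[k]].

    Indices run over pairs (source, target) of Bell states as 4 * source + target.
    """
    perm = [None] * 16
    for src in range(4):
        for tgt in range(4):
            phase_s, amp_s = divmod(src, 2)
            phase_t, amp_t = divmod(tgt, 2)
            new_src = 2 * (phase_s ^ phase_t) + amp_s   # phases add on the source
            new_tgt = 2 * phase_t + (amp_s ^ amp_t)     # amplitudes add on the target
            perm[4 * new_src + new_tgt] = 4 * src + tgt
    return perm


perm = bilateral_cnot_permutation()
```

Reading `perm` from left to right gives the subscripts of the permuted coefficient vector, which makes it easy to check the table entry by entry.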
Although the coincidence measurement, referring to Alice and Bob getting identical measurement outcomes, on a Bell diagonal state is not a Bell basis permutation, the post-measurement state is still a Bell diagonal state. To be specific, since only 00 and 11 are counted as valid results, the Bell states $|\Psi^\pm\rangle\langle\Psi^\pm|$ are filtered out, and thus a Bell diagonal state $(p_0, p_1, p_2, p_3)$ collapses to $(p_0, 0, p_2, 0)$ up to a normalization factor after the coincidence measurement.
The final fidelity and the probability of success achieved by the DEJMPS protocol can be derived by permuting coefficients in the Bell basis, and the result is given in [25]. For completeness, we give a derivation below.
Proposition S3 ([25]) For two copies of a Bell diagonal state $(a_0, a_1, a_2, a_3)$, the DEJMPS protocol outputs a state whose fidelity to the ebit is

$$F = \frac{a_0^2 + a_3^2}{p_{\rm succ}},$$

where $p_{\rm succ} = a_0^2 + a_3^2 + a_1^2 + a_2^2 + 2a_0a_3 + 2a_1a_2$ is the probability of success.

Proof After the first layer of $R_x$ gates, the input state $(a_0, a_1, a_2, a_3)^{\otimes 2}$ becomes $(a_0, a_1, a_3, a_2)^{\otimes 2}$ according to Eq. (S22). Then, transformed by the layer of bilateral CNOT gates, the state $(p_0, p_1, \ldots, p_{14}, p_{15}) = (a_0, a_1, a_3, a_2)^{\otimes 2}$ becomes $(p_0, p_1, p_{10}, p_{11}, p_5, p_4, p_{15}, p_{14}, p_8, p_9, p_2, p_3, p_{13}, p_{12}, p_7, p_6)$. The coincidence measurement in the computational basis on the second copy filters out $|\Psi^\pm\rangle\langle\Psi^\pm|$, and the remaining state of the second copy is either $|00\rangle\langle 00|$ or $|11\rangle\langle 11|$. In either case, the first copy becomes

$$\sigma = (p_0 + p_{10},\; p_5 + p_{15},\; p_8 + p_2,\; p_{13} + p_7) \tag{S27}$$

$$\phantom{\sigma} = (a_0^2 + a_3^2,\; a_1^2 + a_2^2,\; a_3a_0 + a_0a_3,\; a_2a_1 + a_1a_2) \tag{S28}$$

up to a normalization factor. The sum of all the unnormalized coefficients is the probability of measuring 00 or 11, i.e.,

$$p_{\rm succ} = a_0^2 + a_3^2 + a_1^2 + a_2^2 + 2a_0a_3 + 2a_1a_2.$$

The normalized output state is $\sigma/p_{\rm succ}$, and its fidelity to the ebit, which is the coefficient of $|\Phi^+\rangle\langle\Phi^+|$, is $F = (a_0^2 + a_3^2)/p_{\rm succ}$.

Since the DEJMPS protocol distills two copies of a Bell diagonal state, we can distill four copies of an isotropic state as follows. First, we divide the copies into two groups of two copies each. Then we apply the DEJMPS protocol to both groups independently. In the event of success, we obtain two copies of a Bell diagonal state, to which we apply the DEJMPS protocol again.
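The Bell-basis bookkeeping used in the proof of Proposition S3 is short enough to run directly. The sketch below follows the three steps of the proof (swap of the last two coefficients by the $R_x$ layer, the bilateral CNOT permutation, and the coincidence filter that keeps target components 0 and 2) and can be applied twice to mimic the nested four-copy procedure just described. The function name and the sample input are illustrative choices, not taken from the text.

```python
import math

# Bilateral CNOT permutation from the text: new[k] = old[CNOT_PERM[k]].
CNOT_PERM = [0, 1, 10, 11, 5, 4, 15, 14, 8, 9, 2, 3, 13, 12, 7, 6]


def dejmps_bell_diagonal(a):
    """Apply DEJMPS to two copies of the Bell diagonal state a = (a0, a1, a2, a3).

    Returns (normalized output 4-tuple, success probability).
    """
    a0, a1, a2, a3 = a
    c = (a0, a1, a3, a2)                       # R_x layer swaps the last two entries
    p = [ci * cj for ci in c for cj in c]      # p[4 * i + j] = c_i * c_j
    q = [p[k] for k in CNOT_PERM]              # bilateral CNOT
    # Coincidence measurement on the second copy keeps its Phi+ and Phi- parts.
    sigma = [q[4 * i] + q[4 * i + 2] for i in range(4)]
    p_succ = sum(sigma)
    return [s / p_succ for s in sigma], p_succ


# Two copies of a sample Bell diagonal state.
state = (0.7, 0.1, 0.1, 0.1)
out1, p1 = dejmps_bell_diagonal(state)

# Nested use on four copies: both first-round groups must succeed (p1 ** 2),
# then DEJMPS is applied to the two resulting copies.
out2, p2 = dejmps_bell_diagonal(out1)
total_success = p1 ** 2 * p2
```

The first entry of each returned tuple is the fidelity to the ebit, so `out1[0]` can be compared with $(a_0^2 + a_3^2)/p_{\rm succ}$.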
Proof For two copies of the isotropic state to be distilled, the DEJMPS protocol outputs a state up to a normalization factor with a probability of success Then, the probability of successful distillation for both groups is $p_{\rm succ}^2$. In that case, applying the DEJMPS protocol to the resulting two copies of the state $\rho$ gives a state whose fidelity to the ebit is with a probability of success Substituting $p_{\rm succ}$ into $F$, we have The success probability of the whole process is

The protocol found with LOCCNet. As we show in the main text, the DEJMPS protocol does not fully exploit the resources encoded in four copies of an isotropic state, and there is a protocol learned by LOCCNet that achieves a higher fidelity.
FIG. S7: The simplified circuit of a protocol learned by LOCCNet for entanglement distillation with four copies of some isotropic state. This circuit only includes Alice's operation; Bob's operation is identical to Alice's, except that the rotation angles of Bob's $R_x$ gates are $-\pi/2$.

Proposition S5 For four copies of an isotropic state $\rho$ with parameter $p$, the protocol illustrated in Fig. S7 outputs a state whose fidelity to the ebit is with a probability of success

Proof Similar to the DEJMPS protocol, this optimized protocol consists of $R_y(\pm\pi/2)$ gates, bilateral CNOT gates, and coincidence measurements in the computational basis. Thus, the claimed fidelity and probability of success can be derived by simulating the circuit shown in Fig. S7 as a permutation on the Bell basis. Using techniques similar to those in the proof of Proposition S3, we obtain the unnormalized state after three coincidence measurements in the event of success, which is Then, adding up all the coefficients in $\sigma$, we obtain the probability of success Meanwhile, the fidelity of the normalized $\sigma$ to the ebit is

PPT bound. As the mathematical structure of LOCC is complex and difficult to characterize [13], we may consider larger but mathematically more tractable classes of operations. The operations most frequently employed beyond LOCC are the PPT operations, which completely preserve the positivity of the partial transpose [65]. A bipartite quantum operation $\Pi_{AB\to A'B'}$ is called a PPT operation if its Choi-Jamiołkowski matrix is positive under partial transpose across the bipartition $AA' : BB'$. Entanglement theory under PPT operations has been extensively studied in the literature (e.g., [66, 70-73, 75]) and provides limits on what LOCC can achieve. In particular, the limit of finite-copy entanglement distillation was recently explored in [63, 68, 72]. In the following, we compare our results with the PPT bound from [63], which gives the fundamental limit on the fidelity of distillation for a given success probability.
To be specific, the maximal fidelity of distilling a $D$-dimensional Bell state from $\rho_{AB}$ with fixed success probability $\delta$ using PPT operations [63] is given by where $d_A$, $d_B$ are the dimensions of systems $A$ and $B$, respectively. Recall that $\rho_{AB}$ is the initial input state that Alice and Bob are attempting to distill; in most examples considered here, it consists of two copies of some two-qubit state.

SUPPLEMENTARY NOTE 3: Analysis of LOCC state discrimination
To explore the power of LOCCNet in state discrimination, we focus on the optimal success probability of discriminating noiseless and noisy Bell states via LOCC. In the following, we present the LOCC protocol from [31] for discriminating two Bell states. After that, we show how to distinguish one Bell state from one noisy Bell state using the protocol learned via LOCCNet and compare it with the protocol for the noiseless case.
Noiseless case. Consider two Bell states, $|\Phi^+\rangle$ and $|\Phi^-\rangle$. Since these two states are pure and orthogonal to each other, there exists an LOCC protocol that can perfectly distinguish between them [31]. Here, we give a specific discrimination protocol for these two Bell states. Suppose Alice and Bob share a two-qubit state $\rho_{AB}$, which could be either $|\Phi^+\rangle\langle\Phi^+|$ or $|\Phi^-\rangle\langle\Phi^-|$. To find out which state it is through LOCC, they can follow the steps below. First, Alice applies an $R_y(\pi/2)$ gate to her qubit, followed by a measurement. Then, Alice tells Bob her measurement outcome through classical communication. On receiving the measurement outcome from Alice, Bob applies to his qubit an $R_y$ gate with rotation angle $\theta$ equal to $\pi/2$ or $-\pi/2$, corresponding to the communicated measurement outcome being 0 or 1, respectively. Finally, Bob measures his qubit. If he gets 0, then he can be sure that the state $\rho_{AB}$ is $|\Phi^+\rangle\langle\Phi^+|$. Otherwise, $\rho_{AB} = |\Phi^-\rangle\langle\Phi^-|$. The whole process is illustrated by the circuit shown in Fig. S8, which perfectly discriminates $|\Phi^+\rangle$ and $|\Phi^-\rangle$.

Noisy case. Noise is unavoidable in quantum information processing. One common noise model of theoretical and experimental interest is the amplitude damping channel [86], which is one of the realistic sources of noise in superconducting quantum processors [95]. To be specific, an amplitude damping (AD) channel $\mathcal{A}$ with noise parameter $\gamma$ acts as $\mathcal{A}(\rho) = E_0\rho E_0^\dagger + E_1\rho E_1^\dagger$, with $E_0 = |0\rangle\langle 0| + \sqrt{1-\gamma}\,|1\rangle\langle 1|$ and $E_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$. If $|\Phi^-\rangle$ is affected by the amplitude damping noise on each qubit, then the resulting state is $\mathcal{A}\otimes\mathcal{A}(|\Phi^-\rangle\langle\Phi^-|)$. The goal is now to distinguish between $\Phi_0 \equiv |\Phi^+\rangle\langle\Phi^+|$ and $\Phi_1 \equiv \mathcal{A}\otimes\mathcal{A}(|\Phi^-\rangle\langle\Phi^-|)$.

PPT bound. The distinguishability of quantum states under PPT (positive partial transpose) POVMs was introduced in [85] to better understand the fundamental limits of the local distinguishability of quantum states. To be specific, the PPT POVM used for distinguishing a set of $n$ orthogonal quantum states $\{\rho_1, \ldots, \rho_n\}$ can be defined as an $n$-tuple of operators, $(M_k)_{k=1,\ldots,n}$, where $M_k$ is PPT for $k = 1, \ldots, n$ and $\sum_{k=1}^{n} M_k = I_{AB}$. The set of PPT POVMs enjoys a more tractable mathematical structure than the set of LOCC POVMs because the PPT condition has an SDP characterization.
The optimal success probability of discriminating a collection of quantum states $\{\rho_1, \cdots, \rho_K\}$ using PPT POVMs is given by

$$p_s(\rho_1, \cdots, \rho_K) = \max\; \frac{1}{K}\sum_{k=1}^{K} \mathrm{Tr}(M_k\rho_k) \quad \text{s.t.}\quad M_k \ge 0,\; M_k^{T_B} \ge 0\ (k = 1, \ldots, K),\quad \sum_{k=1}^{K} M_k = I_{AB}, \tag{S46}$$

where we assume that each state in this collection appears with equal probability. As the LOCC POVMs form a proper subset of the PPT POVMs, the above SDP gives an upper bound on the optimal success probability of discriminating a collection of quantum states via LOCC.
Optimized LOCC protocol. While the PPT bound serves as an upper bound to the success probability of LOCC discrimination, an optimal LOCC protocol may not necessarily reach the bound. Here, we present an LOCC protocol optimized by LOCCNet that achieves a success probability close to the PPT bound.
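The only difference between this optimized protocol and the protocol for noiseless discrimination is that the rotation angle $\theta$ of Bob's $R_y$ gate is not fixed in the noisy case, as shown in Fig. S9. Specifically, for Alice's measurement outcome being 0,

$$\theta = \pi - \arctan\!\left(\frac{2-\gamma}{\gamma}\right),$$

where $\gamma$ is the noise parameter of the AD channel $\mathcal{A}$ that $|\Phi^-\rangle\langle\Phi^-|$ goes through. For Alice's measurement outcome being 1,

$$\theta = -\pi + \arctan\!\left(\frac{2-\gamma}{\gamma}\right).$$

Both angles reduce to the noiseless values $\pm\pi/2$ as $\gamma \to 0$.

Proposition S6 For states $\Phi_0 = |\Phi^+\rangle\langle\Phi^+|$ and $\Phi_1 = \mathcal{A}\otimes\mathcal{A}(|\Phi^-\rangle\langle\Phi^-|)$, the optimized protocol illustrated in Fig. S9 discriminates between them with an average probability of

$$p = \frac{1}{2} + \frac{\sqrt{(2-\gamma)^2 + \gamma^2}}{4}. \tag{S49}$$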
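The discrimination protocol can be checked with a small density-matrix simulation. The sketch below is illustrative rather than LOCCNet code: it assumes the standard Kraus operators of the amplitude damping channel, the convention $R_y(\theta) = e^{-i\theta Y/2}$, equal prior probabilities for $\Phi_0$ and $\Phi_1$, and Bob's angles $\theta = \pi - \arctan((2-\gamma)/\gamma)$ for outcome 0 and $\theta = -\pi + \arctan((2-\gamma)/\gamma)$ for outcome 1, where the outcome-0 angle is taken as the mirror image of the outcome-1 angle. `arctan2` is used so that $\gamma = 0$ is handled without division by zero. All function names are hypothetical.

```python
import numpy as np


def ry(theta):
    """Single-qubit rotation exp(-i * theta * Y / 2) (real matrix)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def bell_phi(sign):
    """Density matrix of (|00> + sign * |11>) / sqrt(2)."""
    v = np.array([1.0, 0.0, 0.0, float(sign)]) / np.sqrt(2)
    return np.outer(v, v)


def amplitude_damping_both(rho, gamma):
    """Apply the AD channel with parameter gamma to each of the two qubits."""
    e0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    e1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    out = np.zeros_like(rho)
    for ka in (e0, e1):
        for kb in (e0, e1):
            k = np.kron(ka, kb)
            out += k @ rho @ k.T
    return out


def bob_angles(gamma):
    """Bob's rotation angles for Alice's outcomes 0 and 1 (assumed form)."""
    phi = np.arctan2(2 - gamma, gamma)
    return (np.pi - phi, -np.pi + phi)


def prob_bob_outcome(rho, angles, bob_result):
    """Probability that Bob's final measurement returns bob_result."""
    total = 0.0
    for alice_result, theta in enumerate(angles):
        # Alice rotates by pi/2; Bob's rotation depends on her outcome.
        u = np.kron(ry(np.pi / 2), ry(theta))
        rotated = u @ rho @ u.T
        idx = 2 * alice_result + bob_result
        total += rotated[idx, idx]
    return total


def success_probability(gamma):
    """Average success probability with equal priors on Phi_0 and Phi_1."""
    angles = bob_angles(gamma)
    phi0 = bell_phi(+1)
    phi1 = amplitude_damping_both(bell_phi(-1), gamma)
    # Bob answers Phi_0 on outcome 0 and Phi_1 on outcome 1.
    return 0.5 * prob_bob_outcome(phi0, angles, 0) + 0.5 * prob_bob_outcome(phi1, angles, 1)


noiseless = success_probability(0.0)
noisy = success_probability(0.3)
```

Setting $\gamma = 0$ recovers the noiseless protocol with angles $\pm\pi/2$, so `noiseless` gives a quick sanity check before exploring other values of $\gamma$ or comparing against the PPT bound.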