Training a quantum measurement device to discriminate unknown non-orthogonal quantum states

Here, we study the problem of decoding information transmitted through unknown quantum states. We assume that Alice encodes an alphabet into a set of orthogonal quantum states, which are then transmitted to Bob. However, the quantum channel that mediates the transmission maps the orthogonal states into non-orthogonal states, possibly mixed. If an accurate model of the channel is unavailable, then the states received by Bob are unknown. In order to decode the transmitted information we propose to train a measurement device to achieve the smallest possible error in the discrimination process. This is achieved by supplementing the quantum channel with a classical one, which allows the transmission of information required for the training, and resorting to a noise-tolerant optimization algorithm. We demonstrate the training method in the case of minimum-error discrimination strategy and show that it achieves error probabilities very close to the optimal one. In particular, in the case of two unknown pure states, our proposal approaches the Helstrom bound. A similar result holds for a larger number of states in higher dimensions. We also show that a reduction of the search space, which is used in the training process, leads to a considerable reduction in the required resources. Finally, we apply our proposal to the case of the phase flip channel reaching an accurate value of the optimal error probability.


I. INTRODUCTION
The states of quantum systems have properties that distinguish them from their classical counterparts. Unknown quantum states cannot be perfectly and deterministically copied [1], and entangled states exhibit correlations without classical equivalent [2]. These properties are deeply related to the impossibility of perfectly discriminating non-orthogonal quantum states. If such discrimination were possible, then unknown quantum states could be perfectly and deterministically copied and entangled states could be used to implement superluminal communication [3]. Consequently, the discrimination of non-orthogonal quantum states has become an important research subject due to its implications for the foundations of quantum theory [4,5] and quantum communications [6,7]. For example, the problem of implementing quantum teleportation, entanglement sharing, and dense coding through a partially entangled pure state can be solved by local discrimination of non-orthogonal pure states [8][9][10][11][12][13][14][15].
The discrimination of quantum states can be naturally stated in the context of two parties that attempt to communicate: Alice encodes information representing the letters of an alphabet through a set $\Omega_1$ of orthogonal pure quantum states, which are transmitted through a communication channel. The channel transforms the orthogonal states into a new set $\Omega_2$ of states, which might become non-orthogonal and mixed. Bob then receives them and performs a generalized quantum measurement to discriminate the states and to retrieve the information encoded by Alice. Both parties are assumed to know the generation probabilities of the orthogonal states in $\Omega_1$ and the set $\Omega_2$ of non-orthogonal states in advance. To decode the transmitted information, the parties agree on a figure of merit, which is subsequently optimized to obtain the best single-shot generalized quantum measurement. This leads to several discrimination strategies such as minimum-error discrimination [16][17][18], pretty-good measurement [19,20], unambiguous discrimination [21][22][23][24], maximum-confidence discrimination [25,26], and discrimination with a fixed rate of inconclusive results [27,28]. Various discrimination strategies have already been experimentally demonstrated [29][30][31][32][33][34][35][36].
* Corresponding author: dconcham@udec.cl
Typical figures of merit for state discrimination are functions of the generation probabilities and of the conditional probabilities relating the states transmitted by Alice to Bob's measurement outcomes, which in turn depend on the generalized measurement used by Bob and on the states he receives. Thereby, optimizing any figure of merit and choosing the best generalized measurement become difficult problems. This calls for numerical optimization techniques such as semidefinite programming (SDP) [37][38][39].
Recently, the distinguishability between two known families of non-orthogonal quantum states has been studied by training quantum circuits through neural networks [40,41]. Quantum state discrimination has also been studied in the context of unknown quantum states. In this case, the communication channel maps the states in $\Omega_1$ into a new set of states $\Omega_2$ that are unknown to the communicating parties. An example of this situation is free-space quantum communication [42][43][44][45][46], where information is encoded into states of light and transmitted through the atmosphere. The atmosphere exhibits local and temporal variations in the refractive index, which can greatly modify the state of light and make it difficult to characterize the transmitted states. Given that neither Alice nor Bob has access to the density matrices of the communicated states, standard approaches cannot be applied. This problem has been studied assuming that the unknown states can be stored in quantum registers that control the action of a measurement device, known as a programmable discriminator [47][48][49][50][51][52].
arXiv:2111.13568v2 [quant-ph] 26 Apr 2022
In this article, we propose the training of a measurement device to optimally discriminate a set of unknown non-orthogonal quantum states. We assume that the action of this device is defined by a large set of control parameters, such that a given set of parameter values corresponds to the realization of a positive operator-valued measure (POVM). A fixed figure of merit for the discrimination process is iteratively optimized in the space of the control parameters. The optimization is driven by a gradient-free stochastic optimization algorithm [53][54][55], which approximates the gradient of the figure of merit by a finite difference. At each iteration, this requires evaluating the figure of merit at two different points in the control parameter space. Thereby, the training is driven by experimentally acquired data. Furthermore, stochastic optimization methods have been shown to be robust against noise [56], so they are a standard choice in experimental contexts. The training of the measurement device is carried out until the optimal value of the figure of merit is approached within a prescribed tolerance.
We illustrate our approach by studying minimum-error discrimination, where the figure of merit is the average retrodiction error. This figure of merit can be experimentally evaluated if, during the training step, Alice communicates the labels of the states that she sent to Bob through a classical channel. Minimum-error discrimination plays a key role in quantum imaging [57], quantum reading [58], image discrimination [59], error-correcting codes [60], and quantum repeaters [61]. This problem does not have a closed analytical solution except for sets of states with high symmetry. Our approach may also implement other discrimination strategies at the expense of resorting to more elaborate optimization algorithms. We first consider the minimum-error discrimination of two unknown non-orthogonal pure states. In this case the minimum of the average error probability, which can be analytically calculated, is given by the Helstrom bound. We show that it is possible to train the measurement device to reach values very close to the Helstrom bound. We extend our analysis to $d$ unknown non-orthogonal quantum states using $d$-dimensional symmetric states. Discrimination of this class of states plays an important role in processes such as quantum teleportation [10,11], entanglement swapping [14], and dense coding [15] when carried out with partially entangled states, and it has already been implemented experimentally [34]. In this case, our approach also leads to the optimal single-shot generalized quantum measurement. However, it requires a large number of iterations. This is a consequence of the dimension of the control parameter space, which scales as $d^4$. We also show that the use of a priori information effectively reduces the number of iterations. Here we consider both the use of initial conditions close to the optimal measurement and the reduction of the dimension of the control parameter space by assuming a particular property of the optimal measurement. Finally, we consider the discrimination of unknown mixed quantum states generated by quantum channels such as the phase flip channel.
This article is organized as follows: in Sec. II we introduce our approach to the discrimination of unknown non-orthogonal quantum states. In Sec. III we study the properties of our approach by means of several numerical experiments. In Sec. IV we summarize and conclude.

II. METHOD
Alice encodes the information to be transmitted into a set $\Omega_1 = \{|\psi_q\rangle\}$ of $N$ mutually orthogonal pure states that are generated with probabilities $\{\eta_q\}$. The communication channel transforms the states in $\Omega_1$ into the states $\{\rho_q\}$ in $\Omega_2$, pure or mixed. For simplicity, we assume that the relation between states in $\Omega_1$ and $\Omega_2$ is one to one and that the action of the channel does not change the generation (or a priori) probabilities. Upon receiving each state, Bob tries to decode the information sent by Alice using a positive operator-valued measure $\{E_m\}$ composed of positive semi-definite matrices $E_m$ such that $\sum_m E_m = I$, the identity operator. The probability of obtaining the $m$-th measurement outcome given that the state $\rho_q$ was sent is $P(E_m|\rho_q) = {\rm Tr}(E_m \rho_q)$. We assume that if Bob obtains the $m$-th measurement result, he concludes that Alice attempted to transmit the state $|\psi_m\rangle$. This decoding rule leads to errors unless the states in $\Omega_2$ are mutually orthogonal, so Bob seeks to minimize the occurrence of errors in the discrimination process. Thereby, Bob needs to find the optimal POVM that minimizes the figure of merit that accounts for the errors.
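The decoding rule can be illustrated with a small numerical example. The following Python sketch tabulates the conditional probabilities ${\rm Tr}(E_m \rho_q)$ for a hypothetical pair of received states, $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$, measured in the computational basis. The states, the measurement, and the function name are assumptions chosen for illustration and are not taken from the text.

```python
import numpy as np

def outcome_probabilities(povm, states):
    """Return the matrix P[m, q] = Tr(E_m rho_q)."""
    return np.array([[np.real(np.trace(E @ rho)) for rho in states]
                     for E in povm])

# Hypothetical received states: |0> and |+> (non-orthogonal).
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)
states = [np.outer(ket0, ket0.conj()), np.outer(plus, plus.conj())]

# Hypothetical measurement: projectors onto the computational basis.
povm = [np.outer(ket0, ket0.conj()), np.outer(ket1, ket1.conj())]

P = outcome_probabilities(povm, states)
print(P)  # column q lists the outcome probabilities when rho_q is received
```

Each column sums to one, and any weight outside the diagonal corresponds to a decoding error under the rule "outcome $m$ means state $m$".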
Several quantum state discrimination strategies are known, each defined by a particular figure of merit. Here, we focus on minimum-error discrimination, where the number of states to be identified is equal to the number of elements of the POVM. The probability of correctly identifying the state $|\psi_q\rangle$ is given by $P(E_q|\rho_q)$. Since the states in $\Omega_2$ are generated with probabilities $\{\eta_q\}$, the average probability of correctly identifying all states is given by
$$p_{\rm corr} = \sum_q \eta_q \, {\rm Tr}(E_q \rho_q). \tag{1}$$
The average error probability is $p_{\rm err} = 1 - p_{\rm corr}$. This probability, which is a function of the POVM $\{E_m\}$ and of the unknown states $\{\rho_k\}$, is minimized over the POVM space in order to train the measurement device. We assume that the states are fixed, that is, every time Alice aims to transmit the state $|\psi_k\rangle$, Bob receives the same state $\rho_k$. Thereby, the set $\Omega_2$ behaves as a set of unknown fixed parameters.
The value of $p_{\rm err}$ involves a sum over the terms $\eta_l {\rm Tr}(\rho_l E_l)$, each of which must be independently estimated. In order to do this, Alice sends $N$ copies of each state $|\psi_l\rangle$ to Bob and communicates their labels classically. These states play the role of a training set. Bob measures the states with the corresponding POVM, which allows him to estimate the value of ${\rm Tr}(\rho_l E_l)$. We simulate the experiment that allows us to estimate this value. For simulation purposes the states $\{\rho_l\}$ are known and thus we calculate the probabilities
$$p_l = {\rm Tr}(\rho_l E_l). \tag{2}$$
These are employed to generate a random number $n_l$ from a binomial distribution with success probability $p_l$ on a sample of size $N$. The probability $p_l$ is estimated as $n_l/N$. This procedure is repeated for each state in the set $\{\rho_k\}$ and for each of the two POVMs evaluated at each iteration. Thereby, the average error probability is estimated as
$$\tilde{p}_{\rm err} = 1 - \sum_l \eta_l \frac{n_l}{N}. \tag{3}$$
We consider that the measurement device implements a POVM using the direct sum extension. According to this, the Hilbert space $\mathcal{H}_s$ of the states to be discriminated is complemented with an ancilla space $\mathcal{H}_a$, obtaining an extended Hilbert space $\mathcal{H}_e = \mathcal{H}_s \oplus \mathcal{H}_a$. The POVM is implemented by applying a unitary transformation $U$ on $\mathcal{H}_e$ followed by a projective measurement on $\mathcal{H}_e$. This procedure requires adding fewer dimensions than the extension by means of the tensor product [62]. To generate the unitary matrix $U$ we use a complex matrix $Z$ of order $dn \times d$, where $d$ is the dimension of $\mathcal{H}_s$ and $n$ is the number of states to be discriminated. Through the QR decomposition, we project it onto an isometric matrix $S$, that is, a matrix such that $S^\dagger S = I_{d\times d}$. Reshaping $S$ into a rank-3 tensor $S_{ijk}$ of size $n \times d \times d$, we can generate a full-rank POVM as
$$E_i = M_i^\dagger M_i, \tag{4}$$
where the components of the matrices $M_i$ are $M_{i,jk} = S_{ijk}$. To physically realize the unitary transformation $U$ of order $dn \times dn$, one would need to complete the matrix $S$ with free parameters that are determined only up to unitarity. The average error probability $p_{\rm err}$ can thus be regarded as a function $f(z)$ of the complex vector $z$ whose coefficients are given by the matrix elements of $U$, that is, we have $f(z) = p_{\rm err}(z)$.
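The construction of the POVM from an unconstrained complex matrix and the finite-sample estimate of the error probability can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the POVM elements are taken to be $E_i = M_i^\dagger M_i$, with $M_i$ the $i$-th $d \times d$ block of the isometry $S$, and the function names `povm_from_matrix` and `estimate_error` are hypothetical. It is not the authors' code.

```python
import numpy as np

def povm_from_matrix(Z, n, d):
    """Map an unconstrained complex (n*d) x d matrix Z to n POVM elements."""
    S, _ = np.linalg.qr(Z)      # reduced QR: S is (n*d) x d and S^dagger S = I
    M = S.reshape(n, d, d)      # M[i] is the i-th d x d block of S
    # E_i = M_i^dagger M_i (assumed form), so that sum_i E_i = S^dagger S = I
    return np.einsum('ikj,ikl->ijl', M.conj(), M)

def estimate_error(povm, states, priors, N, rng):
    """Simulate the estimate of p_err obtained from N copies of each state."""
    p_corr = 0.0
    for E, rho, eta in zip(povm, states, priors):
        p = np.real(np.trace(E @ rho))   # exact probability, known in simulation
        p = min(max(p, 0.0), 1.0)        # guard against rounding outside [0, 1]
        n_l = rng.binomial(N, p)         # number of correct identifications
        p_corr += eta * n_l / N
    return 1.0 - p_corr

# Usage: a random three-outcome POVM acting on a qubit.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 2)) + 1j * rng.normal(size=(6, 2))
povm = povm_from_matrix(Z, n=3, d=2)
```

Because $Z$ is unconstrained, an optimizer can update its entries freely while the QR step guarantees that every candidate is a valid POVM.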
We assume that neither Alice nor Bob knows the states in $\Omega_2$. Therefore, we cannot numerically evaluate the error probability $p_{\rm err}$ or its derivatives. Besides, given that we consider POVMs implemented by direct sum extension, the parameter-shift rule cannot be applied to evaluate the gradient [63,64]. Therefore, the optimization cannot be carried out using SDP or gradient-based optimization algorithms. To overcome this problem, we optimize $p_{\rm err}$ using complex simultaneous perturbation stochastic approximation (CSPSA) [53][54][55], a gradient-free algorithm. It is based on the iterative rule
$$z_{k+1} = z_k - a_k g_k, \tag{5}$$
where $z_k$ is a complex vector in the control parameter space at the $k$-th iteration, $a_k$ is a positive gain coefficient, and $g_k$ is an approximation of the gradient of the figure of merit $f(z)$ whose components are given by
$$g_{k,i} = \frac{f(z_{k,+}) + \zeta_{k,+} - \left[ f(z_{k,-}) + \zeta_{k,-} \right]}{2 c_k \Delta_{k,i}^{*}}. \tag{6}$$
In the expression above, the quantities $f(z_{k,+})$ and $f(z_{k,-})$ are the values of the figure of merit at the vectors
$$z_{k,\pm} = z_k \pm c_k \Delta_k, \tag{7}$$
where $c_k$ is a positive gain coefficient and $\Delta_k$ is a vector whose components are randomly generated at each iteration from the set $\{1, -1, i, -i\}$. CSPSA allows for the existence of noise $\zeta_{k,\pm}$ in the evaluations $f(z_{k,\pm})$.
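A single CSPSA iteration can be sketched in Python as below. The sketch assumes the update $z_{k+1} = z_k - a_k g_k$ with gradient estimate $g_{k,i} = [f(z_{k,+}) - f(z_{k,-})] / (2 c_k \Delta_{k,i}^{*})$ and $z_{k,\pm} = z_k \pm c_k \Delta_k$, which is the form used in the CSPSA literature. The function name `cspsa_step` and the quadratic test function are hypothetical and serve only as an illustration.

```python
import numpy as np

def cspsa_step(f, z, a_k, c_k, rng):
    """Perform one CSPSA update of the complex parameter vector z."""
    # Random perturbation with entries drawn from {1, -1, i, -i}.
    delta = rng.choice(np.array([1, -1, 1j, -1j]), size=z.shape)
    f_plus = f(z + c_k * delta)    # first (possibly noisy) evaluation
    f_minus = f(z - c_k * delta)   # second (possibly noisy) evaluation
    g = (f_plus - f_minus) / (2 * c_k * np.conj(delta))
    return z - a_k * g

# Usage on a toy objective (an assumption, not the error probability itself).
target = np.array([0.3 + 0.1j, -0.2j, 0.5])
f = lambda z: float(np.sum(np.abs(z - target) ** 2))

rng = np.random.default_rng(0)
z = np.ones(3, dtype=complex)
for k in range(500):
    a_k = 0.2 / (k + 1) ** 0.602   # illustrative gains, chosen for the toy problem
    c_k = 0.5 / (k + 1) ** 0.101
    z = cspsa_step(f, z, a_k, c_k, rng)
```

Only two evaluations of the objective are needed per iteration regardless of the number of parameters, which is what makes the method attractive when each evaluation is an experiment.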
The gain coefficients are defined by the sequences
$$a_k = \frac{a}{(k + 1 + A)^{s}} \tag{8}$$
and
$$c_k = \frac{b}{(k + 1)^{r}}, \tag{9}$$
where $\{s, r, A, a, b\}$ are gain parameters. The values of the gain parameters are chosen to achieve the best possible rate of convergence. Therefore, the selection of the values itself becomes a costly optimization problem whose solution depends on the objective function and the particular optimizer. To avoid this problem, two sets of gain parameters are commonly used: standard gain parameters with $s = 0.602$, $r = 0.101$, $A = 10000.0$, $a = 2.25$, and $b = 0.5$, which provide fast convergence in the regime of a small number of iterations, and asymptotic gain parameters with $s = 1.0$, $r = 0.166$, $A = 0.0$, $a = 2.0$, and $b = 0.5$, which provide fast convergence in the regime of a large number of iterations.
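The two sets of gain parameters can be compared with a short Python sketch. It assumes the usual SPSA form of the sequences, $a_k = a/(k+1+A)^s$ and $c_k = b/(k+1)^r$, and the names `STANDARD`, `ASYMPTOTIC`, and `gains` are hypothetical.

```python
# Gain parameter sets quoted in the text.
STANDARD = dict(s=0.602, r=0.101, A=10000.0, a=2.25, b=0.5)
ASYMPTOTIC = dict(s=1.0, r=0.166, A=0.0, a=2.0, b=0.5)

def gains(k, s, r, A, a, b):
    """Return (a_k, c_k) at iteration k, assuming the usual SPSA sequences."""
    a_k = a / (k + 1 + A) ** s   # step size applied to the gradient estimate
    c_k = b / (k + 1) ** r       # size of the finite-difference perturbation
    return a_k, c_k

# Usage: print both sequences at a few iterations.
for k in (0, 10, 100):
    print(k, gains(k, **STANDARD), gains(k, **ASYMPTOTIC))
```

Both sequences decrease with $k$, so late iterations take smaller steps and probe the objective with smaller perturbations.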

III. RESULTS
We start the analysis of our approach by considering the simplest case, namely, the discrimination of two unknown non-orthogonal pure states. We assume that Alice prepares the states $\{|0\rangle, |1\rangle\}$ with a priori probabilities $\eta_0$ and $\eta_1$, respectively. These orthogonal states are transformed by the communication channel into two non-orthogonal pure states $|\psi_0\rangle$ and $|\psi_1\rangle$, where the parameter $s$ corresponds to the real-valued inner product $\langle\psi_0|\psi_1\rangle$. In this scenario, the optimal average error probability is given by the Helstrom bound [18]
$$p_{\rm err} = \frac{1}{2}\left(1 - \sqrt{1 - 4\eta_0\eta_1 s^2}\right), \tag{11}$$
which can be achieved by measuring an observable. We assume that the value of $s$ is unknown. The training of the measurement device is carried out without the use of a priori information. In particular, the training does not use the facts that the transmitted states are pure and that the optimal measurement is an observable. For a given value of $s$, our training procedure leads to a quantum measurement characterized by a value $\tilde{p}_{\rm err}$ close to the optimal value given by the Helstrom bound. This is depicted in Fig. 1(a), which shows the value of $\tilde{p}_{\rm err}$ achieved by the training procedure as a function of $s$ for $\eta_0 = \eta_1 = 1/2$. Since the optimization algorithm is stochastic, for each value of $s$ we repeat the procedure considering 100 randomly chosen initial guesses in the control parameter space and 100 iterations. The statistics generated by each POVM are simulated using an ensemble size $N = 150$.
The total ensemble $N_t$ employed throughout the training process is given by $N_t = 2 N k_t$, where $k_t$ is the total number of iterations. Thus, the total ensemble can be split between the total number of iterations and the ensemble used to estimate the statistics of each POVM. This raises the question of whether, for a fixed total ensemble, better accuracy is achieved by increasing $k_t$ or $N$. Figures 1(b) and 1(c) show the impact on $\tilde{p}_{\rm err}$ of different splittings with the same product $N k_t = 15 \times 10^3$ as in Fig. 1(a), that is, $N_t = 3 \times 10^4$. In Fig. 1(b) we have $N = 1500$ and $k_t = 10$, and in Fig. 1(c) we have $N = 50$ and $k_t = 300$. As these two figures indicate, a much better accuracy is obtained by splitting the total ensemble into a small ensemble $N$ and a large number $k_t$ of iterations. In particular, in Fig. 1(c) the difference $|p_{\rm err} - \tilde{p}_{\rm err}|$ is on the order of $10^{-3}$, that is, one order of magnitude smaller than in the case of Fig. 1(a). Furthermore, the interquartile range becomes narrower, indicating less variability in the set of estimates $\{\tilde{p}_{\rm err}\}$ for a given $p_{\rm err}$.
The training of the measurement device for the discrimination of a larger number of states is demonstrated via symmetric states. These are given by the expression
$$|\psi_l\rangle = \sum_{m=0}^{d-1} c_m \omega^{lm} |m\rangle, \qquad l = 0, \dots, d-1, \tag{12}$$
where $d$ is the dimension of $\mathcal{H}_s$, $\omega = \exp(2\pi i/d)$, and the coefficients $c_m$ are constrained by the normalization condition. As long as the generation probabilities are equal, $d$ non-orthogonal symmetric states can be identified by measuring an observable whose eigenstates are given by the Fourier transform of the canonical basis $\{|m\rangle\}$ (with $m = 0, \dots, d-1$). The minimum-error discrimination of symmetric states has been experimentally demonstrated with high accuracy in dimensions up to $d = 21$ [34]. The discrimination of symmetric states typically arises in the processes of quantum teleportation, entanglement swapping, and dense coding. These processes use a maximally entangled quantum channel as a resource. If the entanglement decreases during the generation of the channel, then the performance of the process can be enhanced by resorting to the local discrimination of symmetric states, where the coefficients $c_m$ entering Eq. (12) are given by the real coefficients of the partially entangled state. If, in addition, the channel coefficients are unknown, then our approach can be used.
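The two analytic benchmarks of this section can be evaluated with a few lines of Python. The sketch below assumes the standard Helstrom expression for two pure states with real overlap $s$, $p_{\rm err} = \frac{1}{2}(1 - \sqrt{1 - 4\eta_0\eta_1 s^2})$, and the usual definition of symmetric states, $|\psi_l\rangle = \sum_m c_m \omega^{lm}|m\rangle$. The function names are hypothetical, and the last function only evaluates the error of the Fourier-basis observable for equal generation probabilities.

```python
import numpy as np

def helstrom_bound(s, eta0=0.5, eta1=0.5):
    """Minimum error probability for two pure states with overlap s."""
    return 0.5 * (1.0 - np.sqrt(1.0 - 4.0 * eta0 * eta1 * abs(s) ** 2))

def symmetric_states(c):
    """Return a d x d array whose row l holds the amplitudes of |psi_l>."""
    c = np.asarray(c, dtype=complex)
    d = len(c)
    omega = np.exp(2j * np.pi / d)
    l = np.arange(d).reshape(-1, 1)
    m = np.arange(d).reshape(1, -1)
    return c * omega ** (l * m)

def fourier_measurement_error(c):
    """Error of the Fourier-basis observable, equal generation probabilities."""
    states = symmetric_states(c)
    d = states.shape[0]
    idx = np.arange(d)
    F = np.exp(2j * np.pi * np.outer(idx, idx) / d) / np.sqrt(d)  # row j = |phi_j>
    p_correct = np.abs(np.sum(F.conj() * states, axis=1)) ** 2    # |<phi_l|psi_l>|^2
    return 1.0 - p_correct.mean()

# Usage: benchmark curve against which trained values could be compared.
overlaps = np.linspace(0.0, 1.0, 5)
print([helstrom_bound(s) for s in overlaps])
```

For real non-negative coefficients the last function reduces to $1 - (\sum_m c_m)^2/d$, which vanishes when all $c_m$ are equal (orthogonal states) and equals $1 - 1/d$ when a single coefficient is nonzero (identical states).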
In the case of three symmetric states, the channel coefficients are parameterized as $c_0 = \cos(\theta_1/2)\cos(\theta_2/2)$ and $c_1 = \sin(\theta_1/2)\cos(\theta_2/2)$, with $\theta_1$ and $\theta_2$ in the interval $[0, \pi]$. Figure 2(a) shows $\tilde{p}_{\rm err}$ as a function of $\theta_1$ for a particular value of $\theta_2$. The solid black line corresponds to the optimal minimum-error discrimination probability $p_{\rm err}$, while the solid blue dots indicate the median of $\tilde{p}_{\rm err}$ calculated over 100 initial conditions for each value of $\theta_1$ after $10^3$ iterations using an ensemble size $N = 10^3$. The difference $|p_{\rm err} - \tilde{p}_{\rm err}|$ is on the order of $10^{-2}$, as in the case of Fig. 1, but is obtained with a higher number of iterations and a larger ensemble size. In the case of a higher number of states we resort to a bi-parametric family of symmetric states, whose coefficients depend on the parameters $j_0$ and $\alpha$.
Thus, as we increase the number $n$ of states to be discriminated and the dimension $d$ of the Hilbert space, the number of iterations $k_t$ and the ensemble size $N$ required to achieve a given tolerance also increase. This is due to the fact that the dimension of the search space, that is, the number of parameters that control the measurement device, increases as $nd^2$. In addition, the probabilities entering the estimate of $p_{\rm err}$ of Eq. (3) are obtained using a given ensemble size as a resource. As the number of probabilities increases, it is necessary to increase the ensemble size $N$ to obtain probability estimates that lead to a given tolerance.
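The parameterization for three symmetric states and the growth of the search space can be made concrete with a short Python sketch. It assumes that the third coefficient is real and non-negative, so that normalization gives $c_2 = \sin(\theta_2/2)$; this choice and the function names are assumptions, not statements from the text.

```python
import numpy as np

def three_state_coefficients(theta1, theta2):
    """Coefficients (c0, c1, c2) of the three symmetric states."""
    c0 = np.cos(theta1 / 2) * np.cos(theta2 / 2)
    c1 = np.sin(theta1 / 2) * np.cos(theta2 / 2)
    c2 = np.sin(theta2 / 2)   # assumption: fixed by normalization, real and >= 0
    return np.array([c0, c1, c2])

def search_space_dimension(n, d):
    """Number of complex control parameters, n * d^2, for n states in dimension d."""
    return n * d ** 2

# Usage: growth of the number of control parameters for n = d.
for d in (2, 3, 4, 5):
    print(d, search_space_dimension(d, d))
```

With $n = d$ the count grows as $d^3$, from 8 parameters for two qubit states to 125 for five states in dimension five, which is in line with the larger iteration numbers reported for $d = 4$ and $d = 5$.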
So far we have considered training the measurement device assuming the most general quantum measurement. This typically entails an increase in the resources required for the training. As our previous simulations indicate, as we increase the number $n$ of states to discriminate, as well as the dimension $d$, the training of the measurement device consumes even more resources, that is, larger ensembles and higher iteration numbers. To reduce the resources required for training, it is customary to reduce the dimension of the search space. This is done by imposing a set of conditions on the measurement device, which is possible when we have a priori information that allows us to ascertain that the optimal measurement satisfies a given condition. For instance, if the non-orthogonal states to be discriminated via the minimum-error strategy are pure and $n = d$, then we can assume that the optimal POVM is an observable. This effectively decreases the dimension of the search space. Another possibility is that we are interested in reaching a given value of the minimum-error probability within a particular family of measurement devices, in which case we do not need the optimal measurement. The first case is depicted in Fig. 3, where we reproduce the Helstrom bound for the states in Eq. (10) by optimizing over the set of observables. In Fig. 3(a) we show the case of equal generation probabilities. In this case, the training was carried out using an ensemble size $N = 50$ and a total number of iterations $k_t = 50$, which leads to a difference $|p_{\rm err} - \tilde{p}_{\rm err}|$ on the order of $10^{-3}$. This result can be compared to the one illustrated in Fig. 1(c), where the same ensemble size is used but with a much larger number of iterations, $k_t = 300$. Therefore, the reduction in the dimension of the search space leads to a reduction of the total ensemble $N_t$ by a factor of 6. A similar result holds in Figs. 3(b) and 3(c) for different values of the generation probabilities. Let us note that the initial condition in the search space is randomly chosen, that is, we do not assume as initial condition an observable close to the optimal one.
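One possible way to restrict the search to observables is sketched below in Python: a $d \times d$ complex matrix is orthonormalized by a QR decomposition and its columns define rank-one projectors. This parameterization is an assumption made for illustration, since the text does not specify how the set of observables is parameterized. The second function evaluates the total ensemble $N_t = 2 N k_t$ quoted in the text.

```python
import numpy as np

def projective_povm(Z):
    """Map a d x d complex matrix to d orthogonal rank-one projectors."""
    Q, _ = np.linalg.qr(Z)   # Q is unitary; its columns form an orthonormal basis
    return np.stack([np.outer(Q[:, j], Q[:, j].conj())
                     for j in range(Q.shape[1])])

def total_ensemble(N, k_t):
    """Total number of copies used in training, N_t = 2 N k_t."""
    return 2 * N * k_t

# Usage: compare the resources of Fig. 1(c) with those of Fig. 3.
general = total_ensemble(N=50, k_t=300)    # general POVM
observable = total_ensemble(N=50, k_t=50)  # search restricted to observables
print(general, observable, general // observable)
```

For a qubit this restriction uses a $2 \times 2$ matrix of control parameters instead of the $4 \times 2$ matrix needed for a general two-outcome POVM in the direct sum construction.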
Finally, we consider a more realistic scenario in which two single-qubit orthogonal states emitted by Alice are subjected to the action of a phase flip channel [65]. The action of this channel on a single-qubit density matrix $\rho$ is defined in terms of a parameter $p$, which represents the strength of the channel, and the operator $\sigma_z = |0\rangle\langle 0| - |1\rangle\langle 1|$. The phase flip channel attenuates the off-diagonal terms of the density operator with respect to the canonical basis $\{|0\rangle, |1\rangle\}$ while decreasing the purity. We assume that the states transmitted by Alice are random pure states and that the value of $p = 3/5$ is unknown. The fidelity between the pure states and the noisy states is 0.785.
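The effect of the channel on the optimal error probability can be explored with the Python sketch below. It assumes the convention $\mathcal{E}(\rho) = (1-p)\rho + p\,\sigma_z\rho\sigma_z$, which may differ from the convention of the text by the exchange $p \leftrightarrow 1-p$. For two states, the optimum is computed with the closed-form Helstrom expression $p_{\rm err} = \frac{1}{2}(1 - \|\eta_0\rho_0 - \eta_1\rho_1\|_1)$ rather than with semidefinite programming. The function names are hypothetical.

```python
import numpy as np

SIGMA_Z = np.diag([1.0, -1.0]).astype(complex)

def phase_flip(rho, p):
    """Phase flip channel; assumed convention: sigma_z applied with probability p."""
    return (1 - p) * rho + p * (SIGMA_Z @ rho @ SIGMA_Z)

def helstrom_mixed(rho0, rho1, eta0=0.5, eta1=0.5):
    """Minimum error probability for two (possibly mixed) states."""
    gamma = eta0 * rho0 - eta1 * rho1
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()
    return 0.5 * (1.0 - trace_norm)

# Usage: the orthogonal states |+> and |-> become non-orthogonal mixed states.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
rho_plus = np.outer(plus, plus.conj())
rho_minus = np.outer(minus, minus.conj())
p_opt = helstrom_mixed(phase_flip(rho_plus, 0.6), phase_flip(rho_minus, 0.6))
```

Under this convention the off-diagonal elements are multiplied by $1 - 2p$, so they vanish only at $p = 1/2$, where the two output states of this example coincide and the error probability reaches $1/2$.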
Figure 4 shows the result of training the measurement device to discriminate the states generated by the phase flip channel. The training was initialized with the measurement that optimally discriminates the pure states sent by Alice. Figure 4(a) displays the median (continuous green line) of $\tilde{p}_{\rm err}$, calculated over 100 independent repetitions, as a function of the number of iterations. The continuous red line corresponds to the solution $p_{\rm err}$ of the minimum-error optimization for the states generated by the phase flip channel, obtained via semidefinite programming. Clearly, the value of $\tilde{p}_{\rm err}$ obtained by training the measurement device converges to the optimal value $p_{\rm err}$. The interquartile range, described by the shaded area, is very narrow, indicating a very small variability of the training across repetitions. Similar results hold for other values of the parameter $p$, which controls the convergence rate toward the optimal value of $p_{\rm err}$.

IV. CONCLUSIONS
Here, we have studied the problem of discriminating unknown non-orthogonal quantum states. This situation occurs when two parties try to transmit information encoded in orthogonal quantum states that are transformed into non-orthogonal states by the action of a partially characterized quantum channel. Since the communicating parties do not know the states generated by the channel, standard approaches to discriminate non-orthogonal quantum states cannot be applied. Instead, we have proposed to train a single-shot measurement device to optimally discriminate unknown non-orthogonal quantum states. This device is controlled by a large set of parameters, such that a given set of parameter values corresponds to the realization of a positive operator-valued measure (POVM). The measurement device is iteratively optimized in the space of the control parameters, or search space, to achieve the minimum value of the error probability, that is, we seek to implement the minimum-error discrimination strategy. The optimization is driven by a gradient-free stochastic optimization algorithm that approximates the gradient of the error probability by a finite difference. At each iteration, this requires evaluating the error probability at two different points in the search space. Thereby, the training is driven by experimentally acquired data. The choice of a stochastic optimization method is based on its robustness against noise.
We have studied the proposed approach using numerical simulations. First, we have shown that our approach leads to values of the error probability that are very close to the optimum. This was done in the case of two 2-dimensional unknown non-orthogonal pure states, where the optimal value of the average error probability is given by the Helstrom bound. Since the training method requires the estimation of probabilities, the total ensemble is regarded as a resource. This is divided evenly among the iterations of the training method. We have shown that the best results, that is, values of the error probability closer to the Helstrom bound, are obtained for a fixed total ensemble size by increasing the number of iterations. Thereafter, we have extended our result to the case of $d$ $d$-dimensional symmetric states for $d = 3, 4, 5$, where our method also provides accurate results. However, to achieve a fixed accuracy as we increase the number of states and the dimension, it is necessary to increase the ensemble size and the number of iterations. To avoid this, we have reduced the dimension of the search space by assuming that the required measurement has some special property. In particular, we have assumed that the optimal measurement is an observable. Thereby, in the case of two non-orthogonal pure states we have achieved a reduction of the total ensemble by a factor of 6, which stems from an equal reduction in the number of iterations. Finally, we have applied the training procedure to the phase flip channel and shown that it is possible to achieve a value of the error probability close to the optimal one. Note that our proposal does not require data post-processing methods, such as maximum likelihood or Bayesian inference, which helps reduce the computational cost and avoids the exponential scaling associated with multipartite quantum states.
Our proposal finds applications whenever two parties intend to communicate through a channel whose characterization is difficult or costly. For instance, processes such as quantum teleportation, entanglement swapping, and dense coding, when performed through a partially entangled channel, can become a problem of local discrimination of non-orthogonal states [8][9][10][11][12][13]. If the description of the entangled channel is not available, then the states to be discriminated are unknown, in which case our method can also be applied. Recently, the problem of optimally discriminating between different configurations of a complex scattering system has been studied [66] from the point of view of quantum state discrimination, where several non-orthogonal quantum states of light are associated with different hypotheses about a scattering system. These must be resolved with the best possible accuracy, which is limited by the Helstrom bound in the simplest case. Our training method can also be applied to this problem by finding the best average error probability.
In Fig. 1(a), the solid black line describes the Helstrom bound while the solid blue dots indicate the median of $\tilde{p}_{\rm err}$ calculated over the 100 repetitions. The blue error bars describe the interquartile range. As is apparent from this figure, the training of the measurement device provides a median of $\tilde{p}_{\rm err}$ that is very close to the Helstrom bound, where the difference $|p_{\rm err} - \tilde{p}_{\rm err}|$ is on the order of $10^{-2}$. The training procedure leads to similar results for other values of the generation probabilities.

FIG. 1: Median of the estimated minimum-error probability $\tilde{p}_{\rm err}$ as a function of the inner product $s$ between two unknown pure states given by Eq. (10) for $\eta_0 = \eta_1 = 1/2$. Solid black line corresponds to the minimum-error probability $p_{\rm err}$ given by the Helstrom bound in Eq. (11). Solid blue dots correspond to the median of $\tilde{p}_{\rm err}$ calculated over 100 initial conditions for each value of $s$. Blue error bars indicate the interquartile range. (a) $N = 150$ and $k_t = 100$, (b) $N = 1500$ and $k_t = 10$, and (c) $N = 50$ and $k_t = 300$. Asymptotic gain parameters are used.

FIG. 2: Median of the estimated minimum-error probability $\tilde{p}_{\rm err}$ for symmetric states as a function of: (a) $\theta_1$ for $d = 3$, (b) $\alpha$ for $d = 4$, and (c) $\alpha$ for $d = 5$. Solid black line corresponds to the optimal minimum-error probability $p_{\rm err}$. Solid blue dots correspond to the median of $\tilde{p}_{\rm err}$ calculated over 100 initial conditions for each set of symmetric states. Blue error bars indicate the interquartile range. (a) $N = 300$ and $k_t = 300$. (b) $N = 300$ and $k_t = 6 \times 10^3$. (c) $N = 300$ and $k_t = 1.2 \times 10^4$. Asymptotic gain parameters are used.
For the bi-parametric family of symmetric states, the parameters take the values $j_0 = 1, \dots, d-1$ and $\alpha \in [0, 1]$. Figures 2(b) and 2(c) show the behavior of $\tilde{p}_{\rm err}$ as a function of $\alpha$ with $j_0 = 2$ for $d = 4$ and $d = 5$, respectively. In both cases the ensemble size is $N = 300$ and the median was calculated over 100 initial conditions for each value of $\alpha$. As in Fig. 2(a), the difference $|p_{\rm err} - \tilde{p}_{\rm err}|$ is on the order of $10^{-2}$. To achieve this result, however, it was necessary to increase the number of iterations to $6 \times 10^3$ and $1.2 \times 10^4$ for $d = 4$ and $d = 5$, respectively.

FIG. 3: Median of the estimated minimum-error probability $\tilde{p}_{\rm err}$ as a function of the inner product $s$ between two unknown pure states given by Eq. (10). Solid black line corresponds to the minimum-error probability $p_{\rm err}$ given by the Helstrom bound of Eq. (11). Solid blue dots indicate the median of $\tilde{p}_{\rm err}$ calculated over 100 initial conditions for each value of $s$. Blue error bars indicate the interquartile range. (a) $\eta_0 = \eta_1 = 1/2$, (b) $\eta_0 = 1/3$ and $\eta_1 = 2/3$, and (c) $\eta_0 = 2/5$ and $\eta_1 = 3/5$. Simulations are carried out with an ensemble size $N = 50$ and a total number of iterations $k_t = 50$. Asymptotic gain parameters are used.

FIG. 4: Discrimination of two unknown non-orthogonal mixed single-qubit states generated by a phase flip channel. (a) Solid green line indicates the median value of $\tilde{p}_{\rm err}$, calculated over 100 repetitions, as a function of the number of iterations. Solid red line corresponds to the optimal value of $p_{\rm err}$ obtained via semidefinite programming. (b) Solid green line indicates the median value of $|p_{\rm err} - \tilde{p}_{\rm err}|$, calculated over 100 repetitions, as a function of the number of iterations. Shaded blue area corresponds to the interquartile range. The ensemble size is $N = 300$. Standard gain parameters are used.
Figure 4(b) displays the median of $|p_{\rm err} - \tilde{p}_{\rm err}|$ as a function of the number of iterations.