Synergic quantum generative machine learning

We introduce a new approach to generative quantum machine learning that significantly reduces the number of hyperparameters, and we report on a proof-of-principle experiment demonstrating our approach. Our proposal relies on collaboration between the generator and the discriminator; thus, we call it quantum synergic generative learning. We present numerical evidence that the synergic approach, in some cases, compares favorably to recently proposed quantum generative adversarial learning. In addition to the results obtained with quantum simulators, we also present experimental results obtained with an actual programmable quantum computer. We investigate how a quantum computer implementing a generative learning algorithm can learn the concept of a maximally entangled state. After completing the learning process, the network is able both to recognize and to generate an entangled state. Our approach can be treated as a possible preliminary step towards understanding how the concept of quantum entanglement can be learned and demonstrated by a quantum computer.


INTRODUCTION
Generative adversarial network (GAN) machine learning is an intensely studied topic in the field of machine learning and artificial intelligence research [1]. While quantum machine learning research is attracting increasing attention from both industry and the scientific community, quantum counterparts of GANs have been proposed in several recent works [28][29][30]. For example, in the proposal put forward by Dallaire-Demers and Killoran in Ref. [29], the authors pay much attention to a specific circuit ansatz and discuss methods of computing gradients in specific types of variational quantum circuits. It is worth noting that the problem of computing gradients for variational quantum circuits is rather complex; gradients can also be computed with the parameter-shift rule [31,32]. In its general form, the proposal of Ref. [29] includes sources of entropy (i.e., a bath).
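The parameter-shift rule mentioned above can be illustrated with a minimal sketch: for a single-qubit R_y(θ) rotation acting on |0⟩, the expectation value ⟨Z⟩ equals cos θ, and evaluating the circuit at θ ± π/2 yields the exact gradient. The function names below are ours, chosen for illustration only.

```python
import numpy as np

def expectation_z(theta):
    """<Z> after applying Ry(theta) to |0>; analytically equal to cos(theta)."""
    # Ry(theta)|0> = [cos(theta/2), sin(theta/2)]^T (real amplitudes)
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    z = np.diag([1.0, -1.0])
    return float(psi @ z @ psi)

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Parameter-shift rule: exact gradient for gates generated by a Pauli operator."""
    return (f(theta + shift) - f(theta - shift)) / 2.0

theta = 0.7
grad = parameter_shift_grad(expectation_z, theta)
# the shifted-circuit gradient matches the analytic derivative -sin(theta)
assert abs(grad - (-np.sin(theta))) < 1e-12
```

Unlike finite differences, the shift here is large (π/2), so the rule is robust to shot noise when the expectation values are estimated on hardware.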
The idea behind GANs is rather simple, and it can be described with three circuits. The first circuit is the generator of real data R, which is in principle an irreversible transformation depending on the value of a random variable z_R. In the case of quantum information, this transformation at each instance takes the standard input state |0⟩ and outputs a labeled random state ρ_λ. A good example of such a generator is a painter who is asked to draw a cat (the label λ is the animal here). There is no unique deterministic way of drawing a cat, nor do we know how to construct a painter from basic elements. However, we can train a stochastic quantum machine G to perform the same task as generator R by observing the output of R and its labels. This alone is not enough, because G trained in this way will, in general, not be able to create new original instances that can be labeled as λ. Hence, an additional circuit D needs to be considered. This circuit is trained to distinguish between the samples ρ_λ and the random output of G, and it is referred to as the discriminator.
The operation of D is optimal if it assigns value 0 to states generated by R and value 1 to states generated by G. At the same time, the operation of G is optimal if the cross-entropy between its output and states ρ_λ is minimal while the discriminator is most likely to assign value 0 to the output of G. Thus, a GAN problem is solved by adversarial training of D versus G. The parameters of both the generator and discriminator can be found by numerical optimization or quantum gradient evaluation [29], by dividing the training into rounds of adversarial optimization of the generator and the discriminator. The circuits can perform an arbitrary computation as long as they are complex enough, admitting an arbitrary unitary operation and measurements on a number of ancillary qubits. However, similarly to classical artificial neural networks, choosing the appropriate architecture for a specific task is a complex problem that is solved by trial and error. In quantum computing, this is even more so, because the lack of practical error correction limits the complexity of quantum circuits.
The quantum counterpart of GAN (i.e., QGAN) learning, similarly to its classical analogue, also finds a Nash equilibrium of a two-player game, where one of the players generates some output and the second player (discriminator D) tries to tell whether the output is generated by the first player (generator G) or provided by an external source (R). This can be expressed as a min-max problem, where the statistical distance between the outputs of G and R is minimized over the strategies of the generator, whereas the distance between the outputs of D for G and R, respectively, is maximized over the possible strategies of the discriminator at the same time. In practice, this type of optimization is performed in rounds, and it is difficult to make the learning process stable. In a generative problem we do not have direct access to R, but we can collect random samples generated by this source. However, we can formally treat it as a general multiqubit operation, where a specific unknown operation is selected according to an unknown probability distribution.
This general approach towards QGAN employs gradient-descent methods, as in the ansatz presented in Ref. [29]. In this standard QGAN it is impossible to apply the same sample from R to train both the discriminator and the generator due to the no-cloning principle. Here, we solve this problem by connecting the generator G and discriminator D in a single circuit. In the variational ansatz we present, we use the fact that we need to reach a conditional equilibrium state (i.e., an event when the states produced by G and R collapse onto each other while, at the same time, the discriminator works at its peak performance) from the beginning of the training process. We train such a system by increasing the probability of the circuit state collapsing to this equilibrium state.
In this new kind of machine learning for quantum GANs, a conceptually simpler problem is solved during training than in the typical approach to QGAN. While QGAN requires setting the hyperparameters responsible for training the generator and the discriminator in turns, our approach does not. To introduce this approach we exploit the time-reversal property of unitary transformations and the properties of relative entropy. In particular, the approach can be understood intuitively by assuming the reversibility of the discriminator D, whose Hilbert space is the combined support space of the input state and a single-qubit decision register. We refer to this approach as a synergic quantum generative network (SQGEN). The reversibility condition can be relaxed at the expense of raising the lower bound on the proposed cost function. In the extreme classical case, the information on the input state is lost irreversibly in the discriminator and we cannot interpret the operation of SQGEN as conditioned on collapsing the states produced by G and R onto one another. Then, the cost function would be linear (instead of quadratic) in terms of the overlap between these states. This would impair the ability of SQGEN to learn to reproduce assemblages of density matrices instead of the mean density matrix describing the average output of R. In such a case, we lose the synergy between training G and D.
The resulting variational quantum circuit can be trained using gradient methods, by means of parameter-shift rules [31,32], to compute partial derivatives of the cost function with respect to the circuit parameters. In many cases, it is also practical to apply the Nelder-Mead method or similar algorithms to search for the optimal circuit parameters [33]. In our experimental demonstration of SQGEN we applied the Nelder-Mead method to optimize the circuit. For our numerical simulations of the noiseless training of larger networks we employed the BFGS algorithm.
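As a sketch of this gradient-free training loop, SciPy's Nelder-Mead implementation can minimize the infidelity of a toy one-parameter variational state against a fixed target. The cost function and target below are illustrative stand-ins, not the actual SQGEN circuit cost.

```python
import numpy as np
from scipy.optimize import minimize

def ry_state(theta):
    """Real single-qubit state Ry(theta)|0>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

target = ry_state(1.23)  # hypothetical target state for the toy problem

def cost(params):
    # infidelity between the variational state and the target
    return 1.0 - abs(np.dot(target, ry_state(params[0]))) ** 2

res = minimize(cost, x0=[0.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-10})
assert res.fun < 1e-6  # the simplex converges to (near-)zero infidelity
```

The same pattern applies to circuit parameters estimated from shots: the simplex method needs only noisy cost evaluations, no gradients.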

QUANTUM STATE DISCRIMINATION
The main difference between QGAN and SQGEN approaches stems from the particular strategies applied for the state discrimination [34] performed by the discriminator network, i.e., the interpretation and application of the performed measurements.
As an introduction to state discrimination, let us assume that we want to distinguish between two states |g⟩ and |r⟩ containing information on the output of a generative network and the real data, respectively.
Then, the probability of these two states being discriminated via von Neumann measurements can be reduced to p_{a,b} = (1 + sin²β)/2. This is the case in QGAN training, where the optimization of the discriminator consists of increasing the probability of projecting states |r⟩ and |g⟩ onto a state |ψ⟩ co-planar with |a⟩ and |b⟩ while maximizing the angle β between the discriminated states (i.e., finding the basis |a⟩, |b⟩), see Fig. 1a. State |ψ⟩ is given by the current configuration of the discriminator.
Instead of discriminating multidimensional states directly, we can introduce a single-qubit discriminator register initialized as |0⟩. Now, the discriminator performs a controlled R_y(θ) on this register, where R_y(θ) is controlled by a given input of the discriminator (|r⟩ or |g⟩), and φ and |ψ⟩ are parameters of the discriminator. Next, the register qubit is measured in the z-basis, which for input |r⟩ yields two outcomes, i.e., −1 with probability p. Both QGAN and SQGEN train the discriminator to reach its optimal performance. The advantage of SQGEN is that it automatically sets its internal pointer state |ψ⟩ to |r⟩, i.e., only the cases where |r⟩ collapses onto |ψ⟩ and |g⟩ collapses onto the support space of 𝟙 − |ψ⟩⟨ψ| are counted as relevant events. In the case of QGAN, the discriminator has to learn how to discriminate between |r⟩ and |g⟩ while having access to only one of them at a time. This means that it performs superfluous computations that are needed for establishing a reference frame for the discrimination process. The details of the discriminator training for SQGEN, together with the discriminator ansatz, are discussed further in the text.
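A minimal two-qubit sketch of such a controlled rotation on the register qubit can be simulated directly with statevectors. The specific input state and ordering conventions below are our own, for illustration: the register is rotated only on the |1⟩ component of the input, so the probability of the −1 outcome factorizes accordingly.

```python
import numpy as np

def ry(theta):
    """Single-qubit y-rotation Ry(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def controlled_ry(theta):
    """4x4 controlled-Ry: rotate the register qubit iff the input qubit is |1>."""
    u = np.eye(4)
    u[2:, 2:] = ry(theta)
    return u

theta = np.pi / 3
inp = np.array([np.cos(0.4), np.sin(0.4)])   # hypothetical discriminator input
state = np.kron(inp, np.array([1.0, 0.0]))   # input (x) |0> register
out = controlled_ry(theta) @ state
# probability of the register qubit giving outcome -1 (i.e., being found in |1>)
p_minus = out[1] ** 2 + out[3] ** 2
assert abs(p_minus - np.sin(0.4) ** 2 * np.sin(theta / 2) ** 2) < 1e-12
```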
The discriminator works at its best when the probability of state discrimination is maximized. We can maximize this probability instead of the difference of the rates of assigning the Real/Fake label to a sample delivered by R or G, as is done in the standard GAN. This probability is lowered if the similarity between the samples given by R or G increases, as happens to the aforementioned difference of rates.
In the synergic quantum generative learning protocol, the probability of jointly postselecting the listed states is proportional to the value of the cost function (15). This means that the cost function reaches its maximum value if both the discriminator and the generator perform their tasks optimally.

(a) State |λ⟩ labels the class of the output of a generator. It is a control state that is not changed by the operation of source R or generator G. For a classical label λ, circuit (a) can be replaced with (b). In panel (c) we demonstrate an equivalent circuit inspired by a SWAP test [35], where the measured quantity depends only on the rate of the projections of the first and the last qubits. Note that in the case of QGAN, in contrast to the synergic approach, one has to build (i) a circuit that compares R with G, (ii) a circuit that evaluates the performance of D on R, and (iii) a circuit that evaluates the performance of D on G. To compare SQGEN with QGAN, we also include the bottom qubit; in all the panels it is measured in the Z basis. Depending on the outcome, we include or ignore the existence of D. This allows us to measure either the cost function or only the source-generator fidelity. No hyperparameters are set by trial and error.

SYNERGIC QUANTUM GENERATIVE NETWORK
Here, we consider reversible (unitary) discriminators D, which are provided with a generated state ρ_{λ,z_R}, its label λ, a random variable z_D, and a large enough ancillary Hilbert space to enable complex quantum computations. The λ parameter serves as a label for generating data states [29], and the parameter z_R is a random variable representing the unknown internal state of its source, i.e., the state of generator R. Note that z_R is not accessible to the discriminator, because the learning process must be independent of any knowledge of the internal operation of the real generator R. The task of the discriminator is to decide, for every input, whether the input was indeed provided by generator R or not. The discriminator is trained only on a limited, but large, number of states ρ_{λ,z_R} and their labels. Note that in classical ML a random variable is needed for the discriminator to make a decision on whether its input is real or fake, if the fakes are indistinguishable from the real inputs. In the quantum case this is not necessary, as the collapse of the wave function of the discriminator output achieves the same effect.
The third component is the model circuit of our generator G to be trained. This generator processes the same type of input as generator R and is provided with an independent random variable z_G. We denote the output of this circuit σ = |ψ_{λ,z_G}⟩⟨ψ_{λ,z_G}|. The action of the generator is reversible as long as we know the value of the random variable z_G. We assume that this is the case, as this is a classical variable. We use the random variables z_G, z_R to represent the internal states of both the G and R generators, so we also obtain random states at the output of these gates. We train the generator G by observing the output of the source R, but we cannot expect the output of G to be perfectly correlated with R. This is because we only minimize the relative entropy of their outputs, defined as S(σ||ρ) = Tr[σ(ln σ − ln ρ)], which can be expanded using the Newton-Mercator series ln x = Σ_{k≥1} (−1)^{k+1}(x − 1)^k/k. By keeping only the first term of this expansion we are left with the linear relative entropy S_L, which for random samples of σ_{λ,z_G} and ρ_{λ,z_R} becomes S_L(σ_{λ,z_G}||ρ_{λ,z_R}) = Tr[σ_{λ,z_G}(σ_{λ,z_G} − ρ_{λ,z_R})]. Sample randomness (i.e., the statistics of z_R and z_G) is required to place the linear entropy in the context of machine learning. The aim of a generative algorithm is, given samples ρ_{λ,z_R}, to prepare samples σ_{λ,z_G} which are statistically indistinguishable from new samples ρ_{λ,z_R} not used in the training. Thus, S_L(σ_{λ,z_G}||ρ_{λ,z_R}) should be minimized on average, i.e., over the random samples denoted by z_G and z_R. To indicate such averaging, we drop the z_G, z_R indices and from now on focus only on the average relative entropy. Note that relative entropy is in general jointly convex. In the linear approximation this is no longer the case; it is simply linear. This allows us to interpret ρ_λ and σ_λ as average density matrices of the states produced by the generators. For the generator G to mimic the source R correctly, it must also reproduce the probabilities of occurrence of the samples, not only minimize the distance between the average states ρ_λ and σ_λ. Therefore, using a discriminator is essential in our approach. While optimizing the generator G, the discriminator D should reward a situation where a specific sample σ_{λ,z_G} is close to a single sample of ρ_{λ,z_R}, and penalize it otherwise. For this reason, the state of the discriminator must be independent of z_G and z_R. Moreover, assuming that a minimal achievable distance between ρ_λ and σ_λ has been reached, its cost function should be minimized if the distributions of z_G and z_R are as similar as possible.

Generator ansatz
Linear entropy is directly measurable. The second term in the expression, the overlap Tr(σρ), is sometimes referred to as the SWAP test. However, S_L alone is not enough to correctly train the generator. To demonstrate this, let us consider the following example, where the random variables z_R and z_G are distributed according to probability distributions p and q, respectively.
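The overlap term can indeed be read out with a SWAP-test circuit: an ancilla in |+⟩ controls a SWAP of the two data registers, and after a final Hadamard the ancilla gives |0⟩ with probability (1 + |⟨φ|ψ⟩|²)/2. The sketch below simulates this circuit with dense statevectors; names and the single-qubit register size are our illustrative choices.

```python
import numpy as np

def swap_test_p0(phi, psi):
    """Probability of measuring |0> on the SWAP-test ancilla:
    P(0) = (1 + |<phi|psi>|^2) / 2, giving access to the overlap term in S_L."""
    d = len(phi)
    # full register: ancilla (x) phi (x) psi
    state = np.kron(np.array([1.0, 0.0]), np.kron(phi, psi))
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    # SWAP of the two d-dimensional data registers
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[i * d + j, j * d + i] = 1.0
    cswap = np.block([[np.eye(d * d), np.zeros((d * d, d * d))],
                      [np.zeros((d * d, d * d)), swap]])
    full_h = np.kron(H, np.eye(d * d))
    out = full_h @ cswap @ full_h @ state
    # probability of the ancilla block |0>
    return float(np.sum(np.abs(out[: d * d]) ** 2))

phi = np.array([1.0, 0.0])
psi = np.array([1.0, 1.0]) / np.sqrt(2)  # overlap squared with phi is 1/2
assert abs(swap_test_p0(phi, psi) - 0.75) < 1e-12
```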
Thus, the mean linear entropy, or equivalently the cost function of the generator, can be expressed in terms of the mean outputs of the source and the generator.

Discriminator ansatz
To resolve between the real and fake states, we need to go beyond a simple swap test and make use of a discriminator, which calculates the probability of discriminating the states ρ_{λ,z_R} and σ_{λ,z_G}. From the standard theory of optimal state discrimination we know that the probability of discriminating between two pure qubits can be expressed as 1 − cos²(θ(ρ_{λ,z_R}) − θ(σ_{λ,z_G})). This can be easily understood in terms of Malus' law, where qubits are encoded in single-photon polarization. In particular, one qubit is encoded as a linearly polarized photon so that a polarizer can be set to transmit this photon. The second photon is then transmitted with probability cos²(θ(ρ_{λ,z_R}) − θ(σ_{λ,z_G})). Thus, the training of the discriminator corresponds to finding such a function θ that the value of cos²(θ(ρ_{λ,z_R}) − θ(σ_{λ,z_G})) is minimized. This allows us to define a cost function minimized by the discriminator and maximized by the generator, where g(z_D) is the probability of the discriminator having an internal state z_D. At the same time, we train the generator to produce an assemblage {σ_{λ,z_G}, q(z_G)} which maximizes this function. In order to associate this function with measurable quantities, we propose the following ansatz. We work on two registers containing the state to be processed by the discriminator, i.e., an ancillary qubit initialized as |0⟩ and the processed state |ψ⟩. The discriminator is now described by a unitary operator performing a y-axis rotation on the ancillary qubit, where |ψ⟩ = α|φ⟩ + √(1 − α²)|φ⊥⟩ with 0 ≤ α ≤ 1 and ⟨φ|φ⊥⟩ = 0. The probability of a state |ψ⟩ being recognized as real by the discriminator is such that p = 1 for α = 1 and arbitrary θ. In particular, the probability of a state |φ⊥⟩ being recognized as real satisfies p = 0 for θ = π/2. Thus, we train the discriminator to have θ = π/2 and U_{z_D} which sets |φ⟩⟨φ| as close as possible to ρ_{λ,z_R}. From now on we will assume that θ = π/2 unless stated otherwise.
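The Malus'-law picture above can be checked numerically: for real single-qubit states (linear polarizations), transmission through a polarizer is exactly cos² of the angle difference. The helper names are ours.

```python
import numpy as np

def polar_state(theta):
    """Real single-qubit state, i.e., linear polarization at angle theta."""
    return np.array([np.cos(theta), np.sin(theta)])

def transmission(theta_pol, theta_state):
    """Malus' law: probability that a polarizer at theta_pol transmits the state."""
    return abs(np.dot(polar_state(theta_pol), polar_state(theta_state))) ** 2

t1, t2 = 0.3, 1.1
# a polarizer aligned with the first state transmits it with certainty ...
assert abs(transmission(t1, t1) - 1.0) < 1e-12
# ... and transmits the second one with probability cos^2(t1 - t2)
assert abs(transmission(t1, t2) - np.cos(t1 - t2) ** 2) < 1e-12
```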
It can be shown by direct calculation that the difference between the predictions of the discriminator for two different states, with cos θ_α = α² and cos θ_β = β², is maximized if either β = 1 or α = 1, i.e., if the discriminator is set to maximize p for a real state from the assemblage {ρ_{λ,z_R}, p(z_R)}. In this optimal case we arrive at Malus' law for the discriminator. The optimal settings for the discriminator are provided by minimizing the distinguishability between the assemblages. If we reach the minimum of J*_D (p = r and ⟨φ_{z_R}|ρ_{λ,z_R}|φ_{z_R}⟩ = 1), then, for the corresponding parameters of the discriminator and assemblages {ρ_{λ,z_R}, p(z_R)} consisting of orthogonal states, we can return to the original cost function, which for a given assemblage {σ_{λ,z_G}, q(z_G)} at the minimum of J*_D takes a closed form. This function is then minimized over the parameters of the discriminator, regardless of the settings of the generator.
Such a discriminator is independent of the generator. However, as the input assemblage is unknown, due to the no-cloning theorem we cannot send the real states to both the generator and the discriminator operating in parallel. It is also impossible to train the generator and discriminator on the same set subsequently (as in traditional QGAN), as the states are destroyed during measurements. Thus, we need to design an alternative generative learning framework to QGAN.

Synergic ansatz
As an alternative to the standard adversarial optimization, we propose minimizing a single cost function J. This cost function can be interpreted as the probability that the assemblages σ and ρ are distinguishable for a given setting of the discriminator. This quantity is minimized if both the generator and the discriminator are optimized simultaneously. If we optimize only the generator or only the discriminator, there is always room for improving J by optimizing the other. Finally, in order to improve readability, we plot an equivalent cost function. Let us again assume that the source provides at random two states, i.e., ρ_0 = |0⟩⟨0| and ρ_1 = |1⟩⟨1| with p(0) = p(1) = 1/2. Now, if we consider two configurations of the generator corresponding to the equiprobable generation (q(0) = q(1) = 1/2) of σ_0 = |+⟩⟨+| and σ_1 = |−⟩⟨−|, or of σ_0 = |0⟩⟨0| and σ_1 = |1⟩⟨1|, we can easily verify that for some configurations of the discriminator (corresponding to its optimal operation) the latter provides a lower value of J. Thus, introducing a discriminator allows SQGEN to train the generator properly, which is not the case when considering the generator alone.
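The need for the discriminator in this example can be made concrete with a short numerical check: the average overlap Tr(σρ) over independent, equiprobable samples is identical (1/2) for both generator configurations, so the swap-test part of the cost alone cannot prefer the correct one; only the discriminator term breaks the tie. The helper names below are ours.

```python
import numpy as np

def proj(v):
    """Projector |v><v| onto a (normalized) real vector."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

# source assemblage: |0><0| and |1><1| with p(0) = p(1) = 1/2
rho = [proj(np.array([1.0, 0.0])), proj(np.array([0.0, 1.0]))]
# generator config A: |+><+| and |-><-|;  config B: |0><0| and |1><1|
sigma_a = [proj(np.array([1.0, 1.0])), proj(np.array([1.0, -1.0]))]
sigma_b = rho

def mean_overlap(sigmas, rhos):
    """Average Tr(sigma rho) over independent, equiprobable samples."""
    return np.mean([np.trace(s @ r).real for s in sigmas for r in rhos])

# both configurations give the same mean overlap (1/2), so the generator
# cost alone cannot distinguish them -- a discriminator is needed
assert abs(mean_overlap(sigma_a, rho) - 0.5) < 1e-12
assert abs(mean_overlap(sigma_b, rho) - 0.5) < 1e-12
```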

Circuit for synergic ansatz
Let us for simplicity assume that all the probabilities p, q, r correspond to a single deterministic setting. The probabilities q, r are to be found by classical machine learning. The probability p is associated with the purity of the unknown assemblage {ρ_{λ,z_R}, p(z_R)}. If for some z_R we have p(z_R) = 1 and ρ_{λ,z_R} is pure, then the assemblage is pure. Now, instead of minimizing J we could equivalently maximize 1 − J = Tr(σ_{λ,z_G} ρ_{λ,z_R}) cos²(θ_α). Such a function can be measured directly in a single circuit. To this end, we propose connecting conjugated circuits to form a circuit that has D interfaced with its reverse D†, with a conditional X gate (i.e., the Pauli σ_x operation) in between on the first qubit, as depicted in Fig. 2a. To reduce the complexity of this circuit, let us note that the labels marking the class to which a given state belongs can be purely classical. This means that generator G and discriminator D can be controlled by a classical variable λ, which simplifies the quantum circuit from Fig. 2a to the one depicted in Fig. 2b. Note that the middle (generator) qubit in Fig. 2b can in general represent an arbitrary number of qubits, i.e., ρ and σ can live in an arbitrarily large Hilbert space.
We have already discussed the discriminator regime θ = π/2. However, it is now apparent that we can also optimize the settings of the discriminator for θ = π/4. In such a case the probability of finding the first qubit in state |0⟩ varies between p(α_max) = 1 and p(0) = 1/2. If for a given state the discriminator outputs p = 1, we know that the state was recognized as originating from the source. Thus, in the comparator regime it is convenient to use the value p′(α) = 2p(α) − 1 and to interpret it as the probability of recognizing the associated state as real, as in the discriminator regime. Now, we can observe that the measured probability p(α), compared against the probability p(0) of |φ⊥⟩ being recognized as a real state, becomes p(α) − p(0) = p′/2, hence the term comparator. This difference p(α) − p(0) is maximized while optimizing the discriminator. Thus, it is reasonable to introduce a cost function for the discriminator that can be easily interpreted in both regimes as the probability of a given state being properly associated with its origin (i.e., G or R), which reads J_D = 1 − p′(α)/2. In the discriminator regime p′(α) = p(α), and in the comparator regime p′(α) = 2p(α) − 1, where p is the measured quantity. Note that J_D is optimized for the same parameters of the discriminator in both regimes.
The complete circuit can be considered as working in two settings, depending on detecting |0⟩ or |1⟩ in the last qubit in Fig. 2b. In the latter case, the linear relative entropy between the generator and the source can be measured by feeding states ρ_{λ,z_G} to the circuit for fixed values of λ and z_G, and consecutively measuring the rate at which the state of the generator line of the circuit is projected onto |0⟩. However, this is only the case if the reversible discriminator returns |1⟩ for a state generated by G and |0⟩ for a state provided by R. The probability of this process is proportional to the rate at which the top line is projected onto |0⟩. Given that the top qubit is projected onto |0⟩, the middle line measures the linear cross-entropy. In the opposite case (the decision qubit is detected in |1⟩), the operation of the discriminator failed to be reversed and the detection rates of the middle line are meaningless. Hence, both the discriminator and the generator work at their best if the joint detection rates of |0⟩ in both of the top-most circuit qubits in Fig. 2b are maximized simultaneously. This is why we refer to the learning process as synergic learning. However, there exist solutions to this optimization problem where the generator G, taken separately from the discriminator, does not perform similarly to R. To address this issue, we consider the regime where only the similarity between G and R is maximized (|0⟩ detected in the third qubit in Fig. 2b). More generally, we can consider synergic learning as a process where both D and G are trained cooperatively, under the condition that G is also improving separately. To optimize the performance of the quantum setup, we propose to update its parameters using the Nelder-Mead algorithm or gradient descent to minimize the cost function (15).
To consider a possible ansatz for the discriminator, let us again consider the regime where the X operation is active on the decision qubit. While maximizing the detection rates of |0⟩ in the generated-state qubit by varying the parameters of generator G, we make it less likely to detect |0⟩ in the decision line. If the operations of G and R are identical, then gate X flips the top qubit, and the maximal two-fold detection rate of |0⟩ in both qubits cannot be achieved unless we allow D to become a Hadamard gate H, conditioned on the similarity of the R and G circuits. Note that, while maximizing the detection rates of |0⟩ in the decision line by varying the parameters of the discriminator D, we do not, in general, necessarily decrease the value of the relative entropy. If during the training the discriminator becomes a separable operation similar to √H ⊗ 𝟙, and the generator G is very close to operating as R, then optimizing G even further would not influence the detection rate in the top qubit, i.e., the discriminator stops learning. In fact, the detection rate stops varying with G as soon as the operation D becomes separable. This suggests that the inseparability of D is necessary to train the discriminator. Thus, it must be ensured during the design of D that its outcome in the decision qubit is strongly correlated with the generator qubits. This can easily be achieved by making the discriminator consist of a Y-rotation controlled by the generator output qubits and targeting only the discriminator decision qubit. This rotation is set to π/2 to compute p and q, and to π/4 in the case of minimizing J. The discriminator should also admit arbitrary unitary transformations before the controlled operations. This guarantees that the output of the discriminator is state-dependent, and the optimization works as described above.
In our experiments and numerical simulations, we use the circuit ansatz of Möttönen et al. from Ref. [36]. This means that both G and D [i.e., U_{z_D} from Eq. (7)] are implemented by the circuit block depicted in Fig. 3. We chose this particular ansatz because of its universality, uncomplicated implementation, and straightforward generalization to an arbitrary number of qubits. For a relatively small number of qubits, the exponential scaling of the number of CNOT gates does not constitute a problem. In higher dimensions, one can easily switch to a different ansatz, such as the so-called hardware-efficient ansatz [? ], to avoid the unfavorable scaling. In both cases, the number of parameters scales linearly with the number of qubits.
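A minimal sketch of one hardware-efficient-style layer (not the Möttönen decomposition itself) illustrates the linear parameter scaling: each layer uses one R_y angle per qubit followed by a ladder of CNOTs. The construction and conventions below are our own.

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cnot(n, control, target):
    """n-qubit CNOT as a dense matrix (fine for small n); qubit 0 is leftmost."""
    dim = 2 ** n
    u = np.zeros((dim, dim))
    for basis in range(dim):
        bits = [(basis >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control] == 1:
            bits[target] ^= 1
        out = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        u[out, basis] = 1.0
    return u

def layer(n, thetas):
    """One hardware-efficient-style layer: Ry on every qubit + a CNOT ladder.
    Uses exactly n parameters, so an L-layer circuit needs only L*n parameters."""
    u = np.eye(1)
    for t in thetas:                      # single-qubit rotations
        u = np.kron(u, ry(t))
    for q in range(n - 1):                # entangling ladder
        u = cnot(n, q, q + 1) @ u
    return u

n = 3
u = layer(n, [0.1, 0.2, 0.3])
assert np.allclose(u @ u.conj().T, np.eye(2 ** n))  # the layer is unitary
```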

EXPERIMENTAL SINGLE-QUBIT SQGEN
Let us consider a proof-of-principle experiment where λ labels the basis in which states are prepared. If λ = x, the generator R prepares at random one of the eigenstates of σ_x. The eigenstates of the remaining Pauli matrices σ_y and σ_z are prepared if λ = y, z. This in general requires feeding generators R and G with uncorrelated bivariate random variables z_R and z_G (baths), respectively. In addition, we require that SQGEN performs equally well for all combinations of values of the random variables. Let us train a SQGEN with R set as a Hadamard gate preceded by an X^{z_G} operation, i.e., λ = x. To make the training process more transparent, let us focus on the special case of z_G = 0 only. In the experiment, we deal with finite numbers of shots, which can lead to random fluctuations in the measured values of the minimized cost function. To establish a sufficient number of shots, we analyzed the impact of this Poissonian noise on the experimental data. In the case considered, we used the Nelder-Mead algorithm because, in the noisy experiment, it gives better results than the gradient method, needing fewer steps to find the solution. From our numerical simulations it follows that, for our specific problem, the training already performs well for about 10^4 shots and about 10^2 evaluations of the cost function. When using more than 10^6 shots, the performance of the Nelder-Mead algorithm improves further, reaching 70 cost-function evaluations needed to find the minimum of the cost function. The speed of convergence of this algorithm for this particular problem can be slightly improved by choosing a larger initial simplex. The requirements on the number of function evaluations and the number of coincidences make it feasible to implement conjugated SQGEN on a contemporary quantum computer. The results of the experiment are shown in Fig. 4.
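The scaling of the shot-noise fluctuations mentioned above can be sketched as follows: a probability estimated from N shots carries a binomial standard error √(p(1−p)/N), so going from 10^4 to 10^6 shots shrinks the fluctuations tenfold. The sampling part is a seeded illustration, not the experimental data.

```python
import numpy as np

def shot_noise_std(p, shots):
    """Standard error of a probability estimated from a finite number of shots."""
    return np.sqrt(p * (1 - p) / shots)

p = 0.5  # worst case for binomial shot noise
assert abs(shot_noise_std(p, 10**4) - 0.005) < 1e-12
assert abs(shot_noise_std(p, 10**6) - 0.0005) < 1e-12

# a seeded sketch of a single finite-shot estimate of a probability
rng = np.random.default_rng(0)
estimate = rng.binomial(10**4, p) / 10**4
assert abs(estimate - p) < 5 * shot_noise_std(p, 10**4)  # within 5 sigma
```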
We performed our experiments on the ibmq_montreal quantum processor [37]. Note that due to the technical solutions used in IBMQ processors [37], we cannot directly implement the circuit given in Fig. 2b. The processors physically implement controlled-phase gates, controlled-NOT gates, and single-qubit rotations. This results in a circuit that performs 27 steps (circuit depth 27, 3-qubit circuit) before evaluating the cost function J. Three independent experiments were used to measure 16 values of the real/fake state fidelity F (circuit depth 15, 1-qubit circuit), the probability p of a real state (generated by R) being classified by D as real (circuit depth 11, 2-qubit circuit), and the probability q of a fake state (generated by G) being classified as real (circuit depth 11, 2-qubit circuit). These experiments were performed for the parameter values found after each epoch of training.
The shaded areas depict the range of values obtained in 100 Monte Carlo simulations accounting for both shot noise and transmon decoherence for the ibmq_montreal quantum processor [37] using three qubits. The connected points represent the actual values measured during the training process, performed as described in the main text. For the J cost function, the region associated with the noise model does not include all the experimental data. This means that the noise model provided for the quantum processor by the manufacturer may be inadequate for circuits of depth of order 27. The same holds for the source-generator fidelity F and the probability p (q) of the discriminator recognizing the source (generator) state as real.

For 32,000 shots, such a circuit runs for 15 s per single cost-function evaluation. For the random starting point used in Fig. 4, we need on average 260 evaluations of the cost function to complete 15 training epochs (an epoch corresponds to 5 iterations of the Nelder-Mead algorithm). Our results show that SQGEN training on a quantum processor (see Fig. 4) performs similarly to our numerical simulations. We did not use a gradient-based approach here, as our experience shows that it is less robust to experimental noise, and because of this its convergence is in many cases worse than that of the Nelder-Mead method.
The experimental results, shown in Fig. 4, demonstrate that SQGEN can be implemented using the available quantum computers, even without applying error correction. However, to obtain our result, we applied standard measurement error mitigation, a method which corresponds to calibrating the detection part of the quantum computer.
To find the smallest number of shots needed for the learning process to complete, we tested the proposed algorithm both on real quantum processors and on simulators available to researchers via the IBMQ project [37]. Each evaluation of the circuit was performed with 8192 shots, which was found to be sufficient to limit the effect of Poisson noise. Due to the technical imperfections of these real devices, the algorithm converged in only about one half of the runs. It should be stressed, however, that the user can always rerun the algorithm until it converges. One can observe that the algorithm converges to a non-zero value of the objective function, which we also attribute to the experimental noise in the processor. Note that using the noiseless simulators, the algorithm converged on every attempt and the final objective function was minimized below 0.001. This supports our finding that the algorithm performs well, and the convergence difficulties are solely due to the noise in presently available quantum processors.
FIG. 5. The circuits used for QGAN: (a) a circuit that evaluates the performance of D on R (computes p), (b) a circuit that evaluates the performance of D on G (computes q), and (c) a circuit that compares R with G (computes F). These circuits are used in rounds where, for a given number of steps, either D or G is optimized while keeping the parameters of the other fixed. A proper setting of this procedure requires trial and error.

COMPARISON OF QGAN AND SQGEN: GENERATING AND RECOGNIZING A MULTIQUBIT ENTANGLED STATE
The proposed approach to generative quantum learning is conceptually different from the approaches described in Refs. [29, 30]. Both kinds of approaches can solve an interesting problem: given samples of an entangled state, they can learn to generate the entangled state on their own. Moreover, the respective discriminators can be trained to detect the entangled state. However, our numerical simulations indicate that, for the same number of cost function calls, it is SQGEN that completes the training first.
To illustrate this, let us consider source R, which prepares a maximally entangled (for n > 1) n-qubit GHZ state |Ψ⟩ = (|0⟩^⊗n + |1⟩^⊗n)/√2. Thus, there is one possible value of λ = e. The goal of the QGAN and SQGEN training is to train generator G (i.e., to find the optimal circuit parameters) without knowing the algorithm used by R or its internal state z_R, by optimizing both the discriminator D and the generator G. The circuits used for QGAN and SQGEN are shown in Fig. 5a-c and Fig. 2b, respectively.
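For reference, the GHZ state delivered by the source can be prepared, in the standard way, by a Hadamard on one qubit followed by a chain of CNOTs. A minimal statevector sketch in plain numpy (this is a generic construction, not the ansatz used in the paper):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def apply_1q(psi, gate, target, n):
    """Apply a single-qubit gate to qubit `target` of an n-qubit statevector."""
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [target])), 0, target)
    return psi.reshape(-1)

def apply_cnot(psi, control, target, n):
    """Apply CNOT (control -> target) to an n-qubit statevector."""
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1  # act only on the control = |1> slice
    # X on the target qubit = flip the target axis of that slice.
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=target - (target > control))
    return psi.reshape(-1)

def ghz(n):
    """(|0...0> + |1...1>)/sqrt(2) via H on qubit 0 and a chain of CNOTs."""
    psi = np.zeros(2 ** n, complex)
    psi[0] = 1.0                      # start in |0...0>
    psi = apply_1q(psi, H, 0, n)      # Hadamard on the first qubit
    for t in range(1, n):
        psi = apply_cnot(psi, 0, t, n)
    return psi
```

For n = 3 the resulting vector has amplitude 1/√2 on |000⟩ and |111⟩ and zero elsewhere.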
To compare the dynamics of the training process, we use (as in the previous section) three figures of merit: (i) the probability q of the fake (G-generated) state being recognized as real by discriminator D, (ii) the probability p of the real (R-generated) state being recognized as real by discriminator D, and (iii) the distance D = 1 − F between the G-generated and R-generated states (linear entropy).
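For pure states, the fidelity entering figure of merit (iii) reduces to the squared overlap, F = |⟨r|g⟩|², so D = 1 − F can be computed directly from statevectors. A minimal sketch (the example states are hypothetical):

```python
import numpy as np

def fidelity(r, g):
    """Fidelity F = |<r|g>|^2 of two normalized pure states."""
    return abs(np.vdot(r, g)) ** 2

def distance(r, g):
    """Figure of merit (iii): D = 1 - F."""
    return 1.0 - fidelity(r, g)

# Hypothetical example: the 2-qubit GHZ (Bell) target vs. an untrained
# generator output |00>.
ghz2 = np.array([1, 0, 0, 1]) / np.sqrt(2)
zero = np.array([1.0, 0, 0, 0])
print(fidelity(ghz2, zero), distance(ghz2, zero))
```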
Here, for noiseless numerical simulations, we use a gradient-based method (i.e., BFGS), which requires at most as many function evaluations as the Nelder-Mead method. The learning process for both SQGEN and QGAN is performed for a fixed number of epochs. Each training epoch for SQGEN corresponds to a single iteration of the BFGS algorithm used to minimize the cost function J. The relative number of iterations in QGAN is a hyperparameter that we tuned by trial and error. In the case of QGAN, each epoch corresponds to one iteration of BFGS used to train the discriminator (to maximize a cost function proportional to |p − q| and p + q) and a single iteration of BFGS to minimize 1 − F.
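To make the structure of a gradient-based training epoch concrete, here is a minimal, self-contained toy: a one-parameter RY generator trained to match a fixed target state, with gradients obtained from the parameter-shift rule. The paper itself uses BFGS and Nelder-Mead on the full circuits; this sketch, with hypothetical parameter values, only illustrates the shape of the loop:

```python
import numpy as np

def ry_state(theta):
    """Statevector RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

target = ry_state(np.pi / 3)  # hypothetical source state produced by R

def cost(theta):
    """Infidelity 1 - |<r|g(theta)>|^2, the quantity the training minimizes."""
    return 1.0 - abs(np.vdot(target, ry_state(theta))) ** 2

def grad(theta):
    """Parameter-shift rule: exact gradient from two shifted cost evaluations."""
    return 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))

theta, lr = 2.5, 0.8             # arbitrary starting point and learning rate
for epoch in range(50):          # one gradient step ~ one training epoch
    theta -= lr * grad(theta)

print(cost(theta))  # approaches 0 as the generator matches the source
```

The same loop structure carries over to multi-parameter circuits, with one pair of shifted evaluations per parameter.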
For SQGEN, n + 1 qubits are required to solve the problem (n + 2 if one also monitors F in addition to J), while for QGAN this number grows to as many as 3n + 2 qubits (this includes two copies of the source R). Even in the SWAP-test-based circuit, SQGEN requires fewer qubits than QGAN, i.e., 2n + 4 when monitoring F.
Our numerical investigations suggest that SQGEN takes fewer epochs to reach stable probability values, while the QGAN approach is slightly faster to settle on high fidelity values. If the number of parameters is not too large and the hyperparameters of QGAN are set optimally, the overall performance of SQGEN and QGAN is similar.
However, for GHZ states, QGAN appears to reach the optimal solution for a larger number of initial configurations and for a fixed number of training epochs. It is hard to state this with certainty due to the limited number of tested initial configurations. For each studied value of n, we found optimal generator and discriminator configurations using both approaches. For each n > 1, we also found cases where either QGAN or SQGEN settled at suboptimal values.
Fig. 6 shows the dynamics of the learning process, evaluated after the sets of function calls corresponding to successive epochs, and compares the QGAN and SQGEN results for the best achieved configuration of QGAN hyperparameters. More details on these simulations are summarized in Tab. I.

CONCLUSIONS
We have proposed a new, efficient approach towards generative quantum machine learning. We have tested the proposed SQGEN algorithm experimentally on a small-scale programmable quantum processor. The experimental results shown in Fig. 4 confirm the feasibility of implementing SQGEN on a NISQ device. We have also performed a feasibility study for larger experiments; however, we observed that for n > 1 the experimental noise prevented the optimization procedure from converging within the considered number of training epochs.
Note that in our numerical simulations, we have investigated how a quantum computer could learn the concept of a GHZ state. After the training, the network is able both to recognize and to generate this state. An interesting next step would be to go beyond GHZ states to arbitrary entangled states, in order to investigate how the concept of entanglement could be learned and understood by a quantum computer. Solving this problem would require combining the presented concepts and methods with possibly more sophisticated classical machine learning to provide labels for multidimensional, multiparty entanglement.
At this point, it is also important to stress that SQGEN cannot be directly reduced to a simple SWAP test, which corresponds to measuring only the linear entropy. A SWAP test has an advantage when we are dealing with a source delivering a single pure state, but for general assemblages it may not be sufficient to properly train the generator [see the discussion below Eq. (6)]. However, the depth of the proposed SQGEN circuit can potentially be reduced (depending on the particular circuit ansatz) by applying the circuit depicted in Fig. 2c. The multi-qubit controlled-SWAP gate can be composed of n standard controlled-SWAP gates (i.e., Fredkin gates). This in itself adds to the total circuit depth and, at the same time, exponentially increases the dimension of the Hilbert space, which makes such circuits difficult to simulate. Nevertheless, the SWAP-test approach can reduce the time needed to evaluate J on a quantum computer with respect to the sequential circuit studied here, although by no more than a half. The SWAP-test approach to SQGEN could thus mitigate, to some extent, dissipation in real quantum devices by reducing the impact of decoherence, which accumulates over time.
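For completeness, the SWAP test referred to here yields P(ancilla = 0) = (1 + |⟨a|b⟩|²)/2, so the overlap of two states is read off a single ancilla qubit. A minimal single-qubit statevector sketch in plain numpy (the input states are illustrative):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

# Fredkin (controlled-SWAP) on qubits (ancilla, a, b), ancilla most significant:
# swaps a and b only on the ancilla = |1> subspace, i.e., |101> <-> |110>.
CSWAP = np.eye(8)
CSWAP[[5, 6], :] = CSWAP[[6, 5], :]

def swap_test(a, b):
    """Probability of measuring the ancilla in |0> after a SWAP test on a, b."""
    psi = np.kron(np.array([1.0, 0.0]), np.kron(a, b))  # ancilla starts in |0>
    H_anc = np.kron(H, np.kron(I2, I2))
    psi = H_anc @ (CSWAP @ (H_anc @ psi))               # H, CSWAP, H
    return float(np.sum(np.abs(psi[:4]) ** 2))          # ancilla = 0 block

zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])
print(swap_test(zero, zero))  # identical states: 1.0
print(swap_test(zero, one))   # orthogonal states: 0.5
```

Inverting the relation gives |⟨a|b⟩|² = 2 P(0) − 1, which is the quantity a SWAP-test-based SQGEN variant would feed into its cost function.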
To some extent, we can compare the operation of the SQGEN circuit presented in this paper to that of an uncompressed autoencoder. Just as the encoder of an autoencoder is trained together with the decoder, in the circuit discussed here we train the generator and the discriminator together. Another common aspect is the optimization of the state fidelity between the input and the output. The key difference between the autoencoder and the problem at hand is that, due to the negation gate, the right side of the circuit shown in Fig. 2 (regarded as a decoder) cannot be interpreted as the inverse of its left side (treated approximately as an encoder).
The aim of a generative algorithm is to generate samples that fit the properties of the real samples without knowing the ground truth (probability density function) describing how the real samples are prepared. This is not the same as memorizing the real samples and reproducing them. We demonstrate that our approach is able to reproduce the real quantum samples and to distinguish between similar and dissimilar samples. In our analysis, the ground truth about how the real samples were prepared was relatively simple; thus, we were able to demonstrate that SQGEN works as intended. Demonstrating that SQGEN can handle more complex data patterns requires additional research and is beyond the scope of this paper.
Finally, it is interesting to consider some analogies between SQGEN and kernel-based machine learning. The initial part of the circuit can be viewed as the state preparation step, whereas the second part (including the X gate) can be interpreted as a kernel evaluation circuit. The SQGEN ansatz, as in the case of kernel-based methods, can then be understood as a procedure consisting of measuring Gram matrix elements. The main difference is that, contrary to standard kernel-based approaches, we are not interested in evaluating Gram matrix elements for a specific fixed feature map and specific pairs of points in the feature space. In our case, the kernel is generated by both the parameters of the source state (associated with the generator circuit) and the parameters of the discriminator. The circuit parameters are variables that we optimize, not fixed points in the feature space; the points are given by the generator and the source. Thus, in the variational circuit, we search for a kernel that minimizes J with respect to the circuit parameters. However, the circuit parameters appear with some weights which must be found by a classical algorithm, as in standard applications of kernel methods. The SQGEN circuit could therefore be considered a generative kernel learning method; such methods are currently being studied as a promising tool for generative learning [38].
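The Gram-matrix picture can be made concrete: for states |ψ_i⟩ produced by some parameterized circuit, a quantum kernel evaluates K_ij = |⟨ψ_i|ψ_j⟩|². A minimal sketch with a hypothetical single-qubit RY feature map (not the SQGEN ansatz):

```python
import numpy as np

def feature_state(theta):
    """Hypothetical feature map: theta -> RY(theta)|0>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def gram_matrix(params):
    """K[i, j] = |<psi_i|psi_j>|^2, the overlaps a quantum kernel measures."""
    states = [feature_state(t) for t in params]
    n = len(states)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = abs(np.vdot(states[i], states[j])) ** 2
    return K

K = gram_matrix([0.0, np.pi / 2, np.pi])
print(K)
```

In the SQGEN setting the entries of such a matrix would themselves depend on trainable circuit parameters, which is what distinguishes it from a fixed-kernel method.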
FIG. 1. Geometric interpretation of state-discrimination strategies. The standard measurement-based approach in the basis |a⟩, |b⟩ (a) is compared to (b) the discriminator-based approach used in SQGEN. The states to be discriminated are |g⟩ and |r⟩. The internal state of the discriminator associated with the optimal discrimination is denoted as |ψ⟩. The discriminator is trained to find an optimal section of the Hilbert space supporting |g⟩ and |r⟩, for which the overlap |⟨r|ψ⟩|² is maximal.

FIG. 6. Comparison of the dynamics of the QGAN (left column) and SQGEN (right column) training process for the source providing an n-qubit GHZ state. The sequence of panels corresponds to n, i.e., (a) n = 2, (b) n = 3, (c) n = 4, etc. We do not plot J, as QGAN has no counterpart of it. The plots illustrate the probability p of a source state being recognized as real by the discriminator, the probability q of a generated state being recognized as real, and the fidelity F comparing the trained generator and the source. For n = 6, we can observe QGAN almost settling at a suboptimal solution. This was circumvented by not only minimizing 1 − |p − q| in the discriminator training phase, but also including a term proportional to 1 − p in the cost function. No such manipulation was required for SQGEN, although including a term proportional to 1 − F in the cost function is possible, as it could reduce the number of epochs needed for the convergence of the SQGEN algorithm.

TABLE I. Comparison of the performance of QGAN and SQGEN for 20 epochs of learning with the BFGS optimizer for varied sizes of the generated n-qubit GHZ state. The total run time is given in seconds and may vary depending both on software and hardware. The run times were obtained as averages over 5 runs (for various initial configurations) on a workstation equipped with an Intel(R) Xeon(R) CPU X5690 @ 3.47GHz, using Python-based programs utilizing, e.g., the qiskit, numpy, and scipy modules. The tabulated data corresponds to Fig. 6.