Quantum Generative Adversarial Networks for Learning and Loading Random Distributions

Quantum algorithms have the potential to outperform their classical counterparts in a variety of tasks. The realization of the advantage often requires the ability to load classical data efficiently into quantum states. However, the best known methods require $\mathcal{O}\left(2^n\right)$ gates to load an exact representation of a generic data structure into an $n$-qubit state. This scaling can easily predominate the complexity of a quantum algorithm and, thereby, impair potential quantum advantage. Our work presents a hybrid quantum-classical algorithm for efficient, approximate quantum state loading. More precisely, we use quantum Generative Adversarial Networks (qGANs) to facilitate efficient learning and loading of generic probability distributions -- implicitly given by data samples -- into quantum states. Through the interplay of a quantum channel, such as a variational quantum circuit, and a classical neural network, the qGAN can learn a representation of the probability distribution underlying the data samples and load it into a quantum state. The loading requires $\mathcal{O}\left(poly\left(n\right)\right)$ gates and can, thus, enable the use of potentially advantageous quantum algorithms, such as Quantum Amplitude Estimation. We implement the qGAN distribution learning and loading method with Qiskit and test it using a quantum simulation as well as actual quantum processors provided by the IBM Q Experience. Furthermore, we employ quantum simulation to demonstrate the use of the trained quantum channel in a quantum finance application.

The first theoretical discussion of quantum GANs (qGANs) was followed by demonstrations of qGAN implementations. Some focus on quantum state estimation [13], i.e. finding a quantum channel whose output is an estimate of a given quantum state [14][15][16]. Others exploit qGANs to generate classical data samples in accordance with the training data's underlying distribution [17][18][19].
In contrast, our qGAN implementation learns and loads probability distributions into quantum states. More specifically, the aim of the qGAN is not to produce classical samples in accordance with given classical training data but to train the quantum generator to create a quantum state which represents the data's underlying probability distribution. The resulting quantum channel, given by the quantum generator, enables efficient loading of an approximated probability distribution into a quantum state. It can be easily prepared and reused as often as needed. Applying this qGAN scheme for data loading can facilitate quantum advantage in combination with other algorithms such as Quantum Amplitude Estimation (QAE) [4] or the HHL algorithm [1]. Notably, QAE and HHL (given a well-conditioned matrix and a suitable classical right-hand side [5]) are both compatible with approximate state preparation, as these algorithms are stable to small errors in the input state, i.e. small deviations in the input only lead to small deviations in the result.
The remainder of this paper is structured as follows. Sec. II explains classical GANs. Then, the qGAN-based distribution learning and loading scheme is introduced and analyzed on different test cases in Sec. III. In Sec. IV, we discuss the exploitation of qGANs to facilitate quantum advantage in financial derivative pricing: First, we discuss the training of the qGAN with data samples drawn from a log-normal distribution and present the results obtained with a quantum simulator and the IBM Q Boeblingen superconducting quantum computer with 20 qubits, both accessible via the IBM Q Experience [20]. Then, the resulting quantum channel is used in combination with QAE to price a European call option. Finally, Sec. V presents the conclusions and a discussion of open questions and additional possible applications of the presented scheme.

II. GENERATIVE ADVERSARIAL NETWORKS
The generative models considered in this work, GANs [10,11], employ two neural networks, a generator and a discriminator, to learn random distributions that are implicitly given by training data samples.
Originally, GANs have been used in the context of image generation and modification. In contrast to previously used generative models, such as Variational Auto Encoders (VAEs) [21,22], GANs managed to generate sharp images and consequently gained popularity in the machine learning community [23]. VAEs and other generative models relying on log-likelihood optimization are prone to generating blurry images. Particularly for multimodal data, log-likelihood optimization tends to spread the mass of a learned distribution over all modes. GANs, on the other hand, tend to focus the mass on each mode [10,24].
Suppose a classical training data set $X = \{x^0, \ldots, x^{s-1}\} \subset \mathbb{R}^{k_{\text{out}}}$ is sampled from an unknown probability distribution $p_{\text{real}}$. Let $G_\theta: \mathbb{R}^{k_{\text{in}}} \rightarrow \mathbb{R}^{k_{\text{out}}}$ and $D_\phi: \mathbb{R}^{k_{\text{out}}} \rightarrow \{0, 1\}$ denote the generator and the discriminator networks, respectively. The corresponding network parameters are given by $\theta \in \mathbb{R}^{k_g}$ and $\phi \in \mathbb{R}^{k_d}$. The generator $G_\theta$ translates samples from a fixed prior distribution $p_{\text{prior}}$ in $\mathbb{R}^{k_{\text{in}}}$ into samples which shall be indistinguishable from samples of the real distribution $p_{\text{real}}$ in $\mathbb{R}^{k_{\text{out}}}$. The discriminator $D_\phi$, on the other hand, tries to distinguish between data from the generator and data from the training set. The training process is illustrated in Fig. 1.
The optimization objective of classical GANs may be defined in various ways. In this work, we consider the non-saturating loss [25], which is also used in the code of the original GAN paper [10]. The generator's loss function aims at maximizing the likelihood that the generator creates samples that are labeled as real data samples. On the other hand, the discriminator's loss function aims at maximizing the likelihood that the discriminator labels training data samples as training data samples and generated data samples as generated data samples.
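For orientation, a common textbook form of the non-saturating GAN losses is sketched here. The symbols $L_G$ and $L_D$ follow the notation used later in this paper. The sign convention (both losses written as quantities to be minimized) and the reading of $D_\phi(\cdot)$ as the discriminator's estimated probability that its input is a real sample are assumptions of this sketch and may differ from the numbered equations.

```latex
\begin{align*}
L_G(\phi, \theta) &= -\,\mathbb{E}_{z \sim p_{\text{prior}}}
    \left[ \log D_\phi\left( G_\theta(z) \right) \right], \\
L_D(\phi, \theta) &= -\,\mathbb{E}_{x \sim p_{\text{real}}}
    \left[ \log D_\phi(x) \right]
    - \mathbb{E}_{z \sim p_{\text{prior}}}
    \left[ \log\left( 1 - D_\phi\left( G_\theta(z) \right) \right) \right].
\end{align*}
```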

In practice, the expected values are approximated by batches of size $m$ for $x^l \in X$ and $z^l \sim p_{\text{prior}}$.
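The batch approximation can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the discriminator outputs probabilities in $(0, 1)$ and that both losses are written as quantities to be minimized; the function names and the clipping constant `eps` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def generator_loss(d_fake, eps=1e-12):
    """Batch estimate of the non-saturating generator loss, -mean(log D(G(z)))."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0)
    return float(-np.mean(np.log(d_fake)))

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Batch estimate of -mean(log D(x) + log(1 - D(G(z))))."""
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), 0.0, 1.0 - eps)
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

# A discriminator that cannot tell the two batches apart outputs 0.5 everywhere.
undecided = np.full(4, 0.5)
print(generator_loss(undecided), discriminator_loss(undecided, undecided))
```

The clipping only guards against `log(0)` and does not change the losses for outputs strictly inside $(0, 1)$.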
Training the GAN is equivalent to searching for a Nash equilibrium of a two-player game. Typically, the optimization of Eq. (5) and Eq. (6) employs alternating update steps for the generator and the discriminator. These alternating steps lead to non-stationary objective functions, i.e. an update of the generator's (discriminator's) network parameters also changes the discriminator's (generator's) loss function. Common choices to perform the update steps are ADAM [26] and AMSGRAD [27], which are adaptive-learning-rate, gradient-based optimizers that use an exponentially decaying average of previous gradients and are well suited for optimizing non-stationary objective functions [26].

III. QGAN DISTRIBUTION LEARNING
Our qGAN implementation uses a quantum generator and a classical discriminator to capture the probability distribution of classical training samples.
Notably, the aim of this approach is to train a data loading quantum channel for generic probability distributions. As discussed in Sec. II, GAN-based learning is explicitly suitable to capture not only uni-modal but also multi-modal distributions, as we will also demonstrate later in this section.
In this setting, a parametrized quantum channel, i.e. the quantum generator, is trained to transform a given $n$-qubit input state $|\psi_{\text{in}}\rangle$ to an $n$-qubit output state
$$|g_\theta\rangle = \sum_{j=0}^{2^n-1} \sqrt{p_\theta^j}\, |j\rangle,$$
where $p_\theta^j$ describe the resulting occurrence probabilities of the basis states $|j\rangle$.
For simplicity, we now assume that the domain of $X$ is $\{0, \ldots, 2^n - 1\}$ and, thus, the existence of a natural mapping between the sample space of the training data and the states that can be represented by the generator. This assumption can be easily relaxed, for instance, by introducing an affine mapping between $\{0, \ldots, 2^n - 1\}$ and an equidistant grid suitable for $X$. In this case, it might be necessary to map points in $X$ to the closest grid point to allow for an efficient training. The number of qubits $n$ determines the distribution loading scheme's resolution, i.e. the number of discrete values $2^n$ that can be represented. During the training, this affine mapping can be applied classically after measuring the quantum state. However, when the resulting quantum channel is used within another quantum algorithm, the mapping must be executed as part of the quantum circuit. As was discussed in [28], such an affine mapping can be implemented in a gate-based quantum circuit with linearly many gates.
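The classical side of this affine mapping can be sketched as follows. The helper names `to_grid_index` and `from_grid_index` and the interval bounds `lower` and `upper` are hypothetical; the sketch only illustrates mapping real-valued samples to the closest point of an equidistant grid with $2^n$ points and back.

```python
import numpy as np

def to_grid_index(x, lower, upper, n_qubits):
    """Map values in [lower, upper] to the index of the closest of 2**n grid points."""
    n_points = 2 ** n_qubits
    step = (upper - lower) / (n_points - 1)
    idx = np.rint((np.asarray(x, dtype=float) - lower) / step).astype(int)
    # Values outside the interval are assigned to the nearest boundary point.
    return np.clip(idx, 0, n_points - 1)

def from_grid_index(j, lower, upper, n_qubits):
    """Affine map from a basis-state index j back to the represented value."""
    step = (upper - lower) / (2 ** n_qubits - 1)
    return lower + step * np.asarray(j, dtype=float)

# Three qubits resolve the interval [-1, 1] into 8 equidistant values.
print(from_grid_index(np.arange(8), -1.0, 1.0, 3))
```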
The quantum generator is implemented by a variational form [29], i.e. a parametrized quantum circuit. We consider variational forms consisting of alternating layers of parametrized single-qubit rotations, here Pauli-Y rotations ($R_Y$) [3], and blocks of two-qubit gates, here controlled-Z gates (CZ) [3], called entanglement blocks $U_{\text{ent}}$. The circuit consists of a first layer of $R_Y$ gates, and then $k$ alternating repetitions of $U_{\text{ent}}$ and further layers of $R_Y$ gates. The rotation acting on the $i^{\text{th}}$ qubit in the $j^{\text{th}}$ layer is parametrized by $\theta^{i,j}$. Moreover, the parameter $k$ is called the depth of the variational circuit. If such a variational circuit acts on $n$ qubits, it uses in total $(k+1)n$ parametrized single-qubit gates and $kn$ two-qubit gates, see Fig. 2 for an illustration. Similarly to increasing the number of layers in deep neural networks [30], increasing the depth $k$ enables the circuit to represent more complex structures and increases the number of parameters. Another possibility to increase the quantum generator's ability to represent complex correlations is adding ancilla qubits, as this facilitates an isometric instead of a unitary mapping [3], see Appendix A for more details.
FIG. 2. The variational form, depicted in (a), with depth $k$ acts on $n$ qubits. It is composed of $k + 1$ layers of single-qubit Pauli-Y rotations and $k$ entangling blocks $U_{\text{ent}}$. As illustrated in (b), each entangling block applies CZ gates from qubit $i$ to qubit $(i + 1) \bmod n$, $i \in \{0, \ldots, n - 1\}$, to create entanglement between the different qubits.

The rationale behind choosing a variational form with $R_Y$ and CZ gates, e.g. in contrast to other Pauli rotations and two-qubit gates, is that for $\theta^{i,j} = 0$ the variational form does not have any effect on the state amplitudes but only flips the phases. These phase flips do not perturb the modeled probability distribution, which solely depends on the state amplitudes. Thus, if a suitable $|\psi_{\text{in}}\rangle$ can be loaded efficiently, the variational form allows its exploitation.
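The structure of the variational form, and the property that vanishing parameters leave the measurement probabilities untouched, can be checked with a small statevector simulation. This is a NumPy sketch rather than the Qiskit implementation used in this work; the function names and the array layout `thetas[layer, qubit]` are assumptions, and the sketch assumes $n \geq 3$ so that the cyclic CZ pairs are distinct.

```python
import numpy as np

def apply_ry(state, theta, qubit, n):
    """Apply R_Y(theta) to one qubit of an n-qubit statevector."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -s], [s, c]])
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cz(state, q1, q2, n):
    """Apply CZ: flip the sign of all amplitudes where both qubits are 1."""
    psi = state.reshape([2] * n).copy()
    index = [slice(None)] * n
    index[q1], index[q2] = 1, 1
    psi[tuple(index)] *= -1
    return psi.reshape(-1)

def variational_form(thetas, psi_in):
    """One R_Y layer followed by k repetitions of (U_ent, R_Y layer).

    thetas has shape (k + 1, n), i.e. (k + 1) * n rotation angles in total.
    """
    thetas = np.asarray(thetas, dtype=float)
    n = thetas.shape[1]
    state = np.asarray(psi_in, dtype=float)
    for layer, angles in enumerate(thetas):
        if layer > 0:  # entangling block U_ent: CZ from qubit i to (i + 1) mod n
            for i in range(n):
                state = apply_cz(state, i, (i + 1) % n, n)
        for qubit, theta in enumerate(angles):
            state = apply_ry(state, theta, qubit, n)
    return state

# With all angles set to zero, only signs change; the probabilities are preserved.
p_in = np.array([0.05, 0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.05])
out = variational_form(np.zeros((2, 3)), np.sqrt(p_in))
print(np.allclose(out ** 2, p_in))
```

Because $R_Y$ and CZ are real-valued gates, the sketch keeps the statevector real, which is sufficient here but would not hold for other Pauli rotations.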
To train the qGAN, samples are drawn by measuring the output state $|g_\theta\rangle$ in the computational basis, where the set of possible measurement outcomes is $|j\rangle$, $j \in \{0, \ldots, 2^n - 1\}$. Unlike in the classical case, the sampling does not require a stochastic input but is based on the inherent stochasticity of quantum measurements. Notably, the measurements return classical information, i.e. $p_j$ is defined as the measurement frequency of $|j\rangle$.
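On a simulator, this measurement step amounts to sampling basis-state indices according to the output probabilities and counting frequencies. The following sketch assumes the probabilities are already available as a NumPy array; `sample_measurements` is a hypothetical helper, not part of the paper's code.

```python
import numpy as np

def sample_measurements(probabilities, shots, rng):
    """Draw `shots` computational-basis outcomes and return them with their frequencies."""
    probabilities = np.asarray(probabilities, dtype=float)
    outcomes = rng.choice(len(probabilities), size=shots, p=probabilities)
    frequencies = np.bincount(outcomes, minlength=len(probabilities)) / shots
    return outcomes, frequencies

rng = np.random.default_rng(0)
outcomes, freqs = sample_measurements(np.full(8, 1 / 8), shots=2000, rng=rng)
print(freqs)
```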
The scheme can be easily extended to $d$-dimensional distributions by choosing $d$ qubit registers with $n_i$ qubits each, for $i = 1, \ldots, d$, and constructing a multi-dimensional grid, see Appendix B for an explicit example of a qGAN trained on multivariate data.
A carefully chosen input state $|\psi_{\text{in}}\rangle$ can help to reduce the complexity of the quantum generator and the number of training epochs as well as to avoid local optima in the quantum circuit training. Since the preparation of $|\psi_{\text{in}}\rangle$ should not dominate the overall gate complexity, the input state must be loadable with $\mathcal{O}\left(poly\left(n\right)\right)$ gates. This is feasible, e.g. for efficiently integrable probability distributions, such as log-concave distributions [31]. In practice, statistical analysis of the training data can guide the choice for a suitable $|\psi_{\text{in}}\rangle$ from the family of efficiently loadable distributions, e.g. by matching expected value and variance.
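As an illustration of matching expected value and variance, the target probabilities of a discretized normal input state can be computed classically as sketched below. The helper name `discretized_normal` and the renormalization over the grid are assumptions of this sketch; it only produces the target probabilities and says nothing about how the state is prepared on a quantum device.

```python
import numpy as np

def discretized_normal(grid, samples):
    """Normal density with the samples' empirical mean and standard deviation,
    evaluated on the grid points and renormalized to sum to one."""
    mu, sigma = np.mean(samples), np.std(samples)
    weights = np.exp(-0.5 * ((np.asarray(grid, dtype=float) - mu) / sigma) ** 2)
    return weights / weights.sum()

samples = np.array([3, 4, 4, 5])
probs = discretized_normal(np.arange(8), samples)
amplitudes = np.sqrt(probs)  # amplitudes of the corresponding input state
print(probs.round(3))
```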
In Sec. III A, we present a broad simulation study that analyzes the impact of $|\psi_{\text{in}}\rangle$ as well as the circuit depth $k$.
The classical discriminator, a standard neural network consisting of several layers that apply non-linear activation functions, processes the data samples and labels them either as being real or generated.Notably, the topology of the networks, i.e. number of nodes and layers, needs to be carefully chosen to ensure that the discriminator does not overpower the generator and vice versa.
Given $m$ data samples $g^l$ from the quantum generator and $m$ randomly chosen training data samples $x^l$, where $l = 1, \ldots, m$, the loss functions of the qGAN are $L_G(\phi, \theta)$ for the generator and $L_D(\phi, \theta)$ for the discriminator, respectively. As in the classical case, see Eq. (5) and (6), the loss functions are optimized alternately with respect to the generator's parameters $\theta$ and the discriminator's parameters $\phi$.

A. Simulation Study
Next, we present the results of a broad simulation study on training qGANs with different settings for different target distributions.
The quantum generator is implemented with Qiskit [32] which enables the circuit execution with quantum simulators as well as quantum hardware provided by the IBM Q Experience [20].
We consider a quantum generator acting on $n = 3$ qubits, which can represent $2^3 = 8$ values, namely $\{0, 1, \ldots, 7\}$. We applied the method to 20,000 samples of, first, a log-normal distribution with $\mu = 1$ and $\sigma = 1$, second, a triangular distribution with lower limit $l = 0$, upper limit $u = 7$ and mode $m = 2$, and last, a bimodal distribution consisting of two superimposed Gaussian distributions with $\mu_1 = 0.5$, $\sigma_1 = 1$ and $\mu_2 = 3.5$, $\sigma_2 = 0.5$, respectively. All distributions were truncated to $[0, 7]$ and the samples were rounded to integer values. The generator's input state $|\psi_{\text{in}}\rangle$ is prepared according to a discrete uniform distribution, a truncated and discretized normal distribution with $\mu$ and $\sigma$ being empirical estimates of mean and standard deviation of the training data samples, or a randomly chosen initial distribution.
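A training set of this kind can be generated as sketched below for the log-normal case. Two points are assumptions of the sketch: $\mu$ and $\sigma$ are interpreted as the parameters of the underlying normal distribution, and the truncation is realized by discarding samples outside $[0, 7]$ before rounding. The function name is hypothetical.

```python
import numpy as np

def sample_truncated_lognormal(n_samples, mu, sigma, lower, upper, rng):
    """Draw log-normal samples, keep those in [lower, upper], round to integers."""
    samples = np.empty(0)
    while samples.size < n_samples:
        draw = rng.lognormal(mean=mu, sigma=sigma, size=n_samples)
        kept = draw[(draw >= lower) & (draw <= upper)]
        samples = np.concatenate([samples, kept])
    return np.rint(samples[:n_samples]).astype(int)

rng = np.random.default_rng(0)
data = sample_truncated_lognormal(20_000, mu=1.0, sigma=1.0, lower=0.0, upper=7.0, rng=rng)
# Empirical distribution over the 8 values a 3-qubit generator can represent.
target = np.bincount(data, minlength=8) / data.size
print(target.round(3))
```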
Preparing a uniform distribution on 3 qubits requires the application of 3 Hadamard gates, i.e. one per qubit [3]. Loading a normal distribution involves more advanced techniques, see Appendix C for further details. For both cases, we sample the generator parameters from a uniform distribution on $[-\delta, +\delta]$, for $\delta = 10^{-1}$. By construction of the variational form, the resulting distribution will be close to $|\psi_{\text{in}}\rangle$ but slightly perturbed. Adding small random perturbations helps to break symmetries and can, thus, help to improve the training performance [33][34][35]. To create a randomly chosen distribution, we set $|\psi_{\text{in}}\rangle = |0\rangle^{\otimes 3}$ and initialize the parameters of the variational form following a uniform distribution on $[-\pi, \pi]$.
From now on, we refer to these three cases as uniform, normal, and random initialization. Furthermore, we tested quantum generators with depths $k \in \{1, 2, 3\}$.
The discriminator, a classical neural network, is implemented with PyTorch [36]. The neural network consists of a 50-node input layer, a 20-node hidden layer and a single-node output layer. First, the input and the hidden layer apply linear transformations followed by Leaky ReLU functions [10,37,38]. Then, the output layer implements another linear transformation and applies a sigmoid function. The network should neither be too weak nor too powerful to ensure that neither the generator nor the discriminator overpowers the other network during the training. The used discriminator topology has been chosen based on empirical tests.
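The forward pass of such a discriminator can be sketched in plain NumPy, used here instead of PyTorch only to keep the example self-contained. Several details are assumptions that the text does not specify: a scalar input per sample, a Leaky ReLU slope of 0.2, and small Gaussian initial weights. The function names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_discriminator(rng, n_in=1, n_first=50, n_hidden=20):
    """Weights and biases for the layers n_in -> 50 -> 20 -> 1."""
    shapes = [(n_in, n_first), (n_first, n_hidden), (n_hidden, 1)]
    return [(rng.normal(0.0, 0.1, size=s), np.zeros(s[1])) for s in shapes]

def discriminate(params, x):
    """Return, for each sample, the estimated probability that it is real."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = np.asarray(x, dtype=float).reshape(-1, w1.shape[0])
    h = leaky_relu(h @ w1 + b1)    # 50-node layer
    h = leaky_relu(h @ w2 + b2)    # 20-node hidden layer
    return sigmoid(h @ w3 + b3).ravel()  # single-node output with sigmoid

params = init_discriminator(np.random.default_rng(0))
print(discriminate(params, np.arange(8)))
```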
The qGAN is trained using AMSGRAD [27] with the initial learning rate being $10^{-4}$. Due to the utilization of first and second momentum terms, this is a robust optimization technique for non-stationary objective functions as well as for noisy gradients [26], which makes it particularly suitable for running the algorithm on real quantum hardware. Methods for the analytic computation of the quantum generator loss function's gradients are discussed in Appendix D. The training stability is improved further by applying a gradient penalty on the discriminator's loss function [39,40].
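A single AMSGRAD update can be sketched as follows. The sketch uses the variant without bias correction and the commonly used default decay rates $\beta_1 = 0.9$ and $\beta_2 = 0.999$; both choices, as well as the function name `amsgrad_step`, are assumptions and not taken from the implementation used in this work.

```python
import numpy as np

def amsgrad_step(params, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGRAD update; `state` holds the moment estimates m, v and the running max v_hat."""
    m = beta1 * state["m"] + (1 - beta1) * grad        # decaying average of gradients
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2   # decaying average of squared gradients
    v_hat = np.maximum(state["v_hat"], v)              # never let the scaling term shrink
    new_params = params - lr * m / (np.sqrt(v_hat) + eps)
    return new_params, {"m": m, "v": v, "v_hat": v_hat}

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x = np.array([1.0])
state = {"m": np.zeros(1), "v": np.zeros(1), "v_hat": np.zeros(1)}
for _ in range(1000):
    x, state = amsgrad_step(x, 2 * x, state, lr=0.05)
print(x)
```

Keeping the running maximum `v_hat` is what distinguishes AMSGRAD from ADAM: the effective step size per parameter cannot grow again after a large gradient has been observed.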
In each training epoch, the training data is shuffled and split into batches of size 2,000. The generated data samples are created by preparing and measuring the quantum generator 2,000 times. Then, the batches are used to update the parameters of the discriminator and the generator in an alternating fashion. After the updates are completed for all batches, a new epoch starts.
According to the classical GAN literature, the loss functions do not necessarily reflect whether the method converges [41]. In the context of training a quantum representation of some training data's underlying random distribution, the Kolmogorov-Smirnov statistic as well as the relative entropy represent suitable measures to evaluate the training performance. Given the null-hypothesis that the probability distribution from $|g_\theta\rangle$ is equivalent to the probability distribution underlying $X$, the Kolmogorov-Smirnov statistic $D_{KS}$ determines whether the null-hypothesis is accepted or rejected with a certain confidence level, here set to 95%. The relative entropy quantifies the difference between two probability distributions. In the following, we analyze the results using these two statistical measures, which are formally introduced in Appendix E.
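For discrete distributions on a common grid, both measures can be computed as sketched below. The sketch uses the textbook definitions (largest absolute difference of the cumulative distributions, and $\sum_j p_j \log(p_j / q_j)$); the argument order of the relative entropy and the function names are assumptions, and the acceptance threshold of the Kolmogorov-Smirnov test is not covered.

```python
import numpy as np

def ks_statistic(p, q):
    """Largest absolute difference between the two cumulative distribution functions."""
    return float(np.max(np.abs(np.cumsum(p) - np.cumsum(q))))

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy sum_j p_j * log(p_j / q_j), skipping terms with p_j = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])
print(ks_statistic(p, q), relative_entropy(p, q))
```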
For each setting, we repeat the training 10 times to get a better understanding of the robustness of the results. Table I shows aggregated results over all 10 runs and presents the mean $\mu_{KS}$, the standard deviation $\sigma_{KS}$ and the number of accepted runs $n_{\leq b}$ according to the Kolmogorov-Smirnov statistic as well as the mean $\mu_{RE}$ and standard deviation $\sigma_{RE}$ of the relative entropy outcomes between the generator output and the corresponding target distribution. The data shows that increasing the quantum generator depth $k$ usually improves the training outcomes. Furthermore, the table illustrates that a carefully chosen initialization can have favorable effects, as can be seen especially well for the bimodal target distribution with normal initialization. Since the standard deviations are relatively small and the number of accepted results is usually close to 10, at least for depth $k \geq 2$, we conclude that the presented approach is quite robust and applicable also to more complicated distributions. Fig. 3 illustrates the results for one example of each target distribution.

IV. APPLICATION IN QUANTUM FINANCE
Now, we demonstrate that training a data loading unitary with qGANs can facilitate financial derivative pricing. More precisely, we employ qGANs to learn and load a model for the spot price of an asset underlying a European call option. We perform the training for different initial states with a quantum simulator, and also execute the learning and loading method for a random initialization on an actual quantum computer, the IBM Q Boeblingen 20 qubit chip. Then, the fair price of the option is estimated by sampling from the resulting distribution, as well as with a QAE algorithm [4,28] that uses the quantum generator trained with IBM Q Boeblingen for data loading. A detailed description of the QAE algorithm is given in Appendix F.
The owner of a European call option is permitted, but not obliged, to buy an underlying asset for a given strike price $K$ at a predefined future maturity date $T$, where the asset's spot price at maturity $S_T$ is assumed to be uncertain. If $S_T \leq K$, i.e. the spot price does not exceed the strike price, it is unreasonable to exercise the option and there is no payoff. However, if $S_T > K$, exercising the option to buy the asset for price $K$ and immediately selling it again for $S_T$ can realize a payoff $S_T - K$. Thus, the payoff of the option is defined as $\max\{S_T - K, 0\}$. Now, the goal is to evaluate the expected payoff $\mathbb{E}\left[\max\{S_T - K, 0\}\right]$, whereby $S_T$ is assumed to follow a particular random distribution. This corresponds to the fair option price before discounting [42]. Here, the discounting is neglected to simplify the problem.
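For a discrete spot price distribution, such as the one represented by an $n$-qubit state, the expected payoff is a finite sum that can be evaluated directly. The sketch below is a classical reference calculation with a hypothetical helper name; the uniform distribution in the usage lines is only a placeholder, not a result from this work.

```python
import numpy as np

def expected_payoff(spot_prices, probabilities, strike):
    """Expected value of max(S_T - K, 0) for a discrete distribution of S_T."""
    payoff = np.maximum(np.asarray(spot_prices, dtype=float) - strike, 0.0)
    return float(np.dot(probabilities, payoff))

# Placeholder example: S_T uniform on {0, ..., 7} and strike price K = 2.
print(expected_payoff(np.arange(8), np.full(8, 1 / 8), strike=2.0))
```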
To demonstrate and verify the applicability of the suggested training method, we implement a small illustrative example that is based on the analytically computable standard model for European option pricing, the Black-Scholes model [42]. The qGAN algorithm is used to train a corresponding data loading unitary which enables the evaluation of characteristics of this model, such as the expected payoff, with QAE.
It should be noted that the Black-Scholes model often over-simplifies the real circumstances. In more realistic and complex cases, where the spot price follows a more generic stochastic process or where the payoff function has a more complicated structure, options are usually evaluated with Monte Carlo simulations [43]. A Monte Carlo simulation uses $N$ random samples drawn from the respective distribution to evaluate an estimate for a characteristic of the distribution, e.g. the expected payoff. The estimation error of this technique behaves like $\epsilon = \mathcal{O}(1/\sqrt{N})$. When using $n$ evaluation qubits to run a QAE, this induces the evaluation of $N = 2^n$ quantum samples to estimate the respective distribution characteristic. Now, this quantum algorithm achieves a Grover-type error scaling for option pricing, i.e. $\epsilon = \mathcal{O}(1/N)$ [4,28,44]. To evaluate an option's expected payoff with QAE, the problem must be encoded into a quantum operator that loads the respective probability distribution and implements the payoff function.
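The $\mathcal{O}(1/\sqrt{N})$ behavior of the classical estimate can be observed with a small experiment. The sketch below only covers the Monte Carlo side (QAE is not simulated); the function name and the uniform placeholder distribution are assumptions.

```python
import numpy as np

def monte_carlo_payoff(probabilities, strike, n_samples, rng):
    """Monte Carlo estimate of E[max(S_T - K, 0)] together with its standard error."""
    values = np.arange(len(probabilities))
    samples = rng.choice(values, size=n_samples, p=probabilities)
    payoffs = np.maximum(samples - strike, 0.0)
    std_error = payoffs.std(ddof=1) / np.sqrt(n_samples)
    return float(payoffs.mean()), float(std_error)

rng = np.random.default_rng(0)
probs = np.full(8, 1 / 8)
for n_samples in (100, 10_000):
    estimate, std_error = monte_carlo_payoff(probs, 2.0, n_samples, rng)
    print(n_samples, round(estimate, 3), round(std_error, 4))
```

Increasing the number of samples by a factor of 100 reduces the standard error by roughly a factor of 10, whereas an error scaling of $\mathcal{O}(1/N)$ would correspond to a reduction by roughly a factor of 100.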
In this work, we demonstrate that this distribution can be loaded approximately by training a qGAN algorithm.
In the remainder of this section, we first illustrate the training of a qGAN using classical quantum simulation. Then, the results from running a qGAN training on actual quantum hardware are presented. Finally, we employ the generator trained with a real quantum computer to conduct QAE-based option pricing.

A. QGAN Training
According to the Black-Scholes model [42], the spot price at maturity $S_T$ for a European call option is log-normally distributed. Thus, we assume that $p_{\text{real}}$, which is typically unknown, is given by a log-normal distribution and generate the training data $X$ by randomly sampling from a log-normal distribution.
As in Sec. III A, the training data set $X$ is constructed by drawing 20,000 samples from a log-normal distribution with mean $\mu = 1$ and standard deviation $\sigma = 1$ truncated to $[0, 7]$, and then rounding the sampled values to integers, i.e. to the grid that can be natively represented by the generator. We discuss a detailed analysis of training a model for this distribution with a depth $k = 1$ quantum generator, which is sufficient for this small example, with different initializations, namely uniform, normal and random. The discriminator and generator network architectures as well as the optimization method are chosen equivalently to the ones described in Sec. III A.
First, we present results from running the qGAN training with a quantum simulator. The training procedure involves 2,000 epochs. Fig. 4 shows the progress of the loss functions $L_D(\phi, \theta)$ and $L_G(\phi, \theta)$, as well as the probability density function (PDF) corresponding to the trained $|g_\theta\rangle$ and the target PDF. The PDFs visualize that both uniform and normal initialization perform better than the random initialization.
Fig. 5 shows the progress of the relative entropy and, thereby, illustrates how the generated distributions converge towards the training data's underlying distribution. This figure also shows that the generator model which is initialized randomly performs worst. Notably, the initial relative entropy for the normal distribution is already small. We conclude that a carefully chosen initialization clearly improves the training, although all three approaches eventually lead to reasonable results.
Table II presents the Kolmogorov-Smirnov statistics of the experiments. The results also confirm that initialization impacts the training performance. The statistics for the normal initialization are better than for the uniform initialization, which itself outperforms random initialization. It should be noted that the null-hypothesis is accepted for all settings.
Next, we present the results of the qGAN training run on an actual quantum processor, more precisely, the IBM Q Boeblingen chip [20]. We use the same training data. As in the simulation, in each epoch, the training data is shuffled and split into batches of size 2,000. The generated data samples are created by preparing and measuring the quantum generator 2,000 times. To compute the analytic gradients for the update of $\theta$, we use 8,000 measurements to achieve suitably accurate gradients.
Fig. 6 presents the PDF corresponding to $|g_\theta\rangle$ trained with IBM Q Boeblingen and with a classical quantum simulation that models the quantum chip's noise, respectively. To evaluate the training performance, we again evaluate the relative entropy and the Kolmogorov-Smirnov statistic. A comparison of the progress of the loss functions and the relative entropy for a training run with the IBM Q Boeblingen chip and with the noisy quantum simulation is shown in Fig. 7. The plot illustrates that the relative entropy for both the simulation and the real quantum hardware converges to values close to zero and, thus, that in both cases $|g_\theta\rangle$ evolves towards the random distribution underlying the training data samples. Again, the Kolmogorov-Smirnov statistic $D_{KS}$ determines whether the null-hypothesis is accepted or rejected with a confidence level of 95%. The results presented in Table III confirm that we were able to train an appropriate model on the actual quantum hardware. Notably, some of the more prominent fluctuations might be due to the fact that the IBM Q Boeblingen chip is recalibrated on a daily basis, an interval which is, due to the queuing, circuit preparation, and network communication overhead, shorter than the overall training time of the qGAN.

B. European Option Pricing
In the following, we demonstrate that the qGAN-based data loading scheme enables the exploitation of potential quantum advantage of algorithms such as QAE by using a generator trained with actual quantum hardware to facilitate European call option pricing. The resulting quantum generator loads a random distribution that approximates the spot price at maturity $S_T$. More specifically, we integrate the distribution loading quantum channel into a quantum algorithm based on QAE to evaluate the expected payoff $\mathbb{E}\left[\max\{S_T - K, 0\}\right]$ for $K = \$2$, illustrated in Fig. 8. Given this efficient, approximate data loading, the algorithm can achieve a quadratic improvement in the error scaling compared to classical Monte Carlo simulation. We refer to [28] and to Appendix F for a detailed discussion of derivative pricing with QAE. The resulting confidence intervals (CI) are shown for a confidence level of 95% for the Monte Carlo simulation as well as the QAE. The CIs are of comparable size, although, because of better scaling, QAE requires only a fourth of the samples. Since the distribution is approximated, both CIs are close to the exact value but do not actually contain it.
Note that the estimates and the CIs of the MC and the QAE evaluation are not subject to the same level of noise effects. This is due to the fact that the QAE evaluation uses the generator parameters trained with IBM Q Boeblingen but is run with a quantum simulator, whereas the Monte Carlo simulation is solely run on actual quantum hardware. To be able to run QAE on a quantum computer, further improvements are required, e.g. longer coherence times and higher gate fidelities.

V. CONCLUSION AND OUTLOOK
We demonstrated the application of an efficient, approximate probability distribution learning and loading scheme based on qGANs that requires $\mathcal{O}\left(poly\left(n\right)\right)$ many gates. In contrast to this, current state-of-the-art techniques for exact loading of generic random distributions into an $n$-qubit state necessitate $\mathcal{O}\left(2^n\right)$ gates, which can easily predominate a quantum algorithm's complexity.
The respective quantum channel is implemented by a gate-based quantum algorithm and can, therefore, be directly integrated into other gate-based quantum algorithms. This is explicitly shown by the learning and loading of a model for European call option pricing which is evaluated with a QAE-based algorithm that can achieve a quadratic improvement compared to classical Monte Carlo simulation. Flexibility is given because the model can be fitted to the complexity of the underlying data and the loading scheme's resolution can be traded off against the complexity of the training data by varying the number of used qubits $n$ and the circuit depth $k$. Moreover, qGANs are compatible with online or incremental learning, i.e. the model can be updated if new training data samples become available. This can lead to a significant reduction of the training time in real-world learning scenarios.
Some questions remain open and may be subject to future research, for example, an analysis of optimal quantum generator and discriminator structures as well as training strategies. As in classical machine learning, it is neither a priori clear what model structure is the most suitable for a given problem nor what training strategy may achieve the best results.
Furthermore, although barren plateaus [45] were not observed in our experiments, the possible occurrence of this effect as well as counteracting methods should be investigated. Classical ML already offers a large variety of potential solutions, e.g. the inclusion of noise and momentum terms in the optimization procedure, the simplification of the function landscape by increasing the model size [46] or the computation of higher order gradients. Moreover, schemes that were developed in the context of VQE algorithms, such as adaptive initialization [47], could help to circumvent this issue.
Another interesting topic worth investigating concerns the representation capabilities of qGANs with other data types. Encoding data into qubit basis states naturally induces a discrete and equidistantly distributed set of represented data values. However, it might be interesting to look into the compatibility of qGANs with continuous or non-equidistantly distributed values.

VI. ACKNOWLEDGMENTS
We would like to thank Giovanni Mariani for sharing his knowledge and engaging in very helpful discussions.
IBM and IBM Q are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product or service names may be trademarks or service marks of IBM or other companies.

VII. CODE AVAILABILITY
The code for the qGAN algorithm is publicly available as part of Qiskit [32].The algorithm can be found in https://github.com/Qiskit/qiskit-aqua. Tutorials explaining the training and the application in the context of QAE are located at https://github.com/Qiskit/qiskit-iqx-tutorials.

Appendix A: Isometric Quantum Generator
Closed quantum systems follow a unitary evolution. An open quantum system, i.e. a quantum system that interacts with an environment, evolves according to an isometry instead of a unitary [3].
In general, every isometry can be described by a unitary that acts on a larger system. In other words, an isometry is given by a partial trace of a unitary quantum state evolution. The dynamics of an open quantum system and, thus, also of a quantum generator acting as an isometry can be implemented with additional ancilla qubits. Depending on the setting, the use of an isometric quantum generator can be advantageous to learn random distributions, as mentioned in Sec. III.

Appendix B: Multivariate Historical Data for Portfolio Optimization
The qGAN scheme can also be used to learn and load multivariate random distributions.Here, we present the learning and loading of a distribution underlying the first two principle components of multivariate, constant maturity treasury rates of US government bonds.Note that the trained quantum channel can be used within the discussed QAE algorithm to evaluate, for instance, the fair price of a portfolio of government bonds, see [28].
The following results are computed with a quantum simulation. The training data set $X$ consists of more than 5,000 samples, whereby data samples smaller than the 5% percentile and bigger than the 95% percentile have been discarded to reduce the number of required qubits for a reasonable representation of the distribution. The optimization scheme uses data batches of size 1,200 and is run for 20,000 training epochs. Furthermore, we use depth $k \in \{2, 3, 6\}$ unitary quantum generators that act on $n = 6$ qubits, i.e. 3 qubits per dimension (principal component). The input state $|\psi_{in}\rangle$ is prepared as a multivariate uniform distribution and the generator parameters $\theta$ are initialized with random draws from a uniform distribution on the interval $[-\delta, +\delta]$ with $\delta = 10^{-1}$.
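The data preparation and parameter initialization described above can be sketched as follows. This is an illustrative sketch only: the synthetic Gaussian samples stand in for the treasury-rate data, trimming is assumed to be applied per dimension, the binning rule is one possible choice, and the parameter count (one angle per qubit and layer) is hypothetical.

```python
import random

def percentile(sorted_values, q):
    """Nearest-rank percentile (q in [0, 100]) of a sorted list."""
    return sorted_values[round(q / 100 * (len(sorted_values) - 1))]

def grid_index(value, low, high, levels):
    """Map a value in [low, high] to one of `levels` equidistant bins."""
    return min(levels - 1, int((value - low) / (high - low) * levels))

def trim_and_discretize(samples, qubits_per_dim=3, low_q=5, high_q=95):
    """Drop samples outside the [low_q, high_q] percentiles of any dimension
    and map the remaining ones to grid indices in {0, ..., 2^qubits_per_dim - 1}."""
    dims = range(len(samples[0]))
    bounds = []
    for d in dims:
        column = sorted(s[d] for s in samples)
        bounds.append((percentile(column, low_q), percentile(column, high_q)))
    kept = [s for s in samples
            if all(bounds[d][0] <= s[d] <= bounds[d][1] for d in dims)]
    levels = 2 ** qubits_per_dim
    grid = [tuple(grid_index(s[d], bounds[d][0], bounds[d][1], levels) for d in dims)
            for s in kept]
    return kept, grid

rng = random.Random(0)
# Synthetic stand-in data; the original data set is not reproduced here.
samples = [(rng.gauss(0, 1), rng.gauss(0, 0.5)) for _ in range(6000)]
kept, grid = trim_and_discretize(samples)

delta = 1e-1
n_qubits, depth = 6, 2
# Hypothetical parameter count: one rotation angle per qubit and layer.
theta = [rng.uniform(-delta, delta) for _ in range(n_qubits * (depth + 1))]
```

Each entry of `grid` is a pair of 3-bit indices, which together address one of the 64 basis states of the 6-qubit register.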
Here, the classical discriminator is composed of a 512-node input layer, a 256-node hidden layer, and a single-node output layer. As with the discriminator described in Sec. III A, the hidden layers apply linear transformations followed by Leaky ReLU functions [37], and the output layer employs a linear transformation followed by a sigmoid function. The evolution of the relative entropy between the generated and the real probability distribution is shown in Fig. 9.
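A forward pass through a discriminator with these layer sizes might look as follows. This is a plain-Python sketch for illustration, not the Qiskit implementation: the Leaky ReLU slope, the weight initialization, and the two-dimensional input (one value per principal component) are assumptions.

```python
import math
import random

def dense(weights, bias, x):
    """Linear transformation W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def leaky_relu(values, slope=0.2):  # the slope value is an assumption
    return [v if v > 0 else slope * v for v in values]

def sigmoid(v):
    return 1 / (1 + math.exp(-v))

def init_layer(n_in, n_out, rng):
    """Uniformly initialized weights and zero biases (illustrative choice)."""
    bound = 1 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def discriminator(x, layers):
    """Two Leaky ReLU layers followed by a sigmoid output node."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = leaky_relu(dense(w1, b1, x))
    h2 = leaky_relu(dense(w2, b2, h1))
    return sigmoid(dense(w3, b3, h2)[0])

rng = random.Random(0)
layers = [init_layer(2, 512, rng), init_layer(512, 256, rng), init_layer(256, 1, rng)]
score = discriminator([0.3, -1.2], layers)  # value in (0, 1)
```

The output can be read as the discriminator's estimate of the probability that the input sample stems from the training data.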

Appendix C: Practical Initialization of a Normal Distribution
As proven in [31], a normal distribution can be efficiently loaded into a quantum state. However, the suggested loading method requires the use of involved quantum arithmetic techniques. For the illustrative examples from Sec. III A and Sec. IV A, it is sufficient to load an approximate normal distribution as the initialization state. This can be achieved by fitting the parameters of a 3-qubit variational quantum circuit of depth 1 with a least-squares loss function. More specifically, we minimize the distance between the measurement probabilities $p_i^{\zeta}$ of the circuit output and the probability density function of a discretized normal distribution $q_i$,
$$\min_{\zeta} \sum_i \left\| p_i^{\zeta} - q_i \right\|^2 .$$
The circuit used for training is depicted in Fig. 10. Note that this approach does not scale, particularly not for higher-dimensional distributions. The sole purpose of this approach is to generate shallow testing circuits.

According to [51], Eq. (D5) can be evaluated using the shifted parameters $\theta^{i,l}_{\pm} = \theta^{i,l} \pm \frac{\pi}{2} e_{i,l}$, where $e_{i,l}$ denotes the $(i, l)$-unit vector of the respective parameter space.
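The role of the shifted parameters can be illustrated with a small sketch. It assumes that the gradient takes the standard parameter-shift form, i.e. half the difference of a measurement probability evaluated at the two shifted parameter vectors; the expression used in the source is not reproduced here. The single-qubit RY circuit and all function names are hypothetical and chosen because the result can be checked analytically.

```python
import math

def prob_one(theta):
    """Probability of measuring |1> after applying RY(theta[0]) to |0>."""
    return math.sin(theta[0] / 2) ** 2

def parameter_shift_gradient(prob, theta, index):
    """Derivative of prob with respect to theta[index], assuming the standard
    parameter-shift form 0.5 * (prob(theta_plus) - prob(theta_minus)),
    where theta_plus/minus shift the selected entry by +/- pi/2."""
    plus, minus = list(theta), list(theta)
    plus[index] += math.pi / 2
    minus[index] -= math.pi / 2
    return 0.5 * (prob(plus) - prob(minus))

theta = [0.7]
gradient = parameter_shift_gradient(prob_one, theta, 0)
# Analytic derivative of sin^2(theta / 2) is 0.5 * sin(theta).
print(gradient, 0.5 * math.sin(theta[0]))
```

Unlike a finite-difference estimate, the two evaluations use a macroscopic shift of pi/2, and for rotation gates of this type the result is exact rather than approximate.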

Appendix E: Statistical Measures
Two different statistical measures are utilized to evaluate the performance of the qGAN. Both measures are defined as a distance between two (empirical) probability distributions $P$ and $Q$.
The Kolmogorov-Smirnov statistic [53,54] is based on the (empirical) cumulative distribution functions $P(X \le x)$ and $Q(X \le x)$ and is given by
$$D_{KS}(P\|Q) = \sup_{x} \left| P(X \le x) - Q(X \le x) \right| .$$
The statistic can be used as a goodness-of-fit test. Given the null hypothesis $P(x) = Q(x)$, we draw $s = 500$ samples from both distributions and choose a confidence level $(1 - \alpha)$ with $\alpha = 0.05$. The null hypothesis is accepted if
$$D_{KS}(P\|Q) \le \sqrt{\frac{\ln(2/\alpha)}{s}} .$$
Another measure that can be used to characterize the closeness of (empirical) discrete probability distributions $P(x)$ and $Q(x)$ is the relative entropy, also called Kullback-Leibler divergence [3,55]. This entropy-related measure is given by
$$D_{RE}(P\|Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} .$$
The relative entropy is a non-negative quantity, i.e. $D_{RE}(P\|Q) \ge 0$, where $D_{RE}(P\|Q) = 0$ holds if $P(x) = Q(x)$ for all values $x$.
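Both measures are straightforward to evaluate for discrete distributions given as probability lists over the same ordered support. The sketch below is illustrative: the function names are hypothetical, and the acceptance threshold is assumed to be the standard two-sample critical value for equal sample sizes, $\sqrt{\ln(2/\alpha)/s}$.

```python
import math

def ks_statistic(p, q):
    """Largest absolute gap between the cumulative sums of p and q."""
    cdf_p = cdf_q = gap = 0.0
    for p_x, q_x in zip(p, q):
        cdf_p += p_x
        cdf_q += q_x
        gap = max(gap, abs(cdf_p - cdf_q))
    return gap

def ks_threshold(num_samples, alpha):
    """Assumed acceptance threshold sqrt(ln(2 / alpha) / s)."""
    return math.sqrt(math.log(2 / alpha) / num_samples)

def relative_entropy(p, q):
    """Sum of p(x) * log(p(x) / q(x)); requires q(x) > 0 wherever p(x) > 0."""
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)

p = [0.1, 0.2, 0.4, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
print(ks_statistic(p, q), ks_threshold(500, 0.05), relative_entropy(p, q))
```

With $s = 500$ and $\alpha = 0.05$ the threshold evaluates to roughly 0.0859, so the example distributions above, whose statistic is 0.2, would be rejected as equal.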
The error in the outcome, ignoring higher terms, can be bounded by $\frac{\pi}{2^m}$. Considering that $2^m$ is the number of quantum samples used for the estimate evaluation, this error scaling is quadratically better than that of classical Monte Carlo simulation.

FIG. 11 The quantum circuit corresponding to the Quantum Amplitude Estimation algorithm, with the inverse Quantum Fourier Transform [3] denoted by $F_m^{\dagger}$.
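The quadratic improvement in the error scaling can be made concrete with a few numbers. The sketch below compares the bound $\pi / 2^m$ with the usual $1/\sqrt{M}$ scaling of a classical Monte Carlo estimate for $M = 2^m$ samples; the Monte Carlo expression omits the variance-dependent constant, and the function names are hypothetical.

```python
import math

def qae_error_bound(m):
    """Bound pi / 2^m for M = 2^m quantum samples."""
    return math.pi / 2 ** m

def monte_carlo_error_scale(num_samples):
    """Classical Monte Carlo scaling 1 / sqrt(M), up to a constant factor."""
    return 1 / math.sqrt(num_samples)

for m in (4, 8, 12):
    num_samples = 2 ** m
    print(m, num_samples, qae_error_bound(m), monte_carlo_error_scale(num_samples))
```

Doubling $m$ squares the sample count, which halves the exponent gap: the quantum bound shrinks linearly in $1/M$, the classical scale only as $1/\sqrt{M}$.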
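To use QAE for the pricing of European options, we need to construct and implement a suitable oracle $\mathcal{A}$. First, we load the uncertainty distribution that represents the spot price $S_T$ of the underlying asset at the option's maturity $T$ into a quantum state
$$\sum_{i=0}^{2^n-1} \sqrt{p_i}\, |i\rangle .$$
It should be noted that small errors in this state preparation only lead to small errors in the final result. Then, we add an ancilla qubit $|0\rangle$ and use a comparator circuit which applies an $X$ gate to the ancilla if $i > K$, i.e.
$$|i\rangle|0\rangle \mapsto \begin{cases} |i\rangle|0\rangle, & i \le K, \\ |i\rangle|1\rangle, & i > K, \end{cases}$$
where $K$ denotes the strike price. Now, the state reads
$$\sum_{i=0}^{K} \sqrt{p_i}\, |i\rangle|0\rangle + \sum_{i=K+1}^{2^n-1} \sqrt{p_i}\, |i\rangle|1\rangle .$$
Finally, we control the mapping of the payoff function to the amplitude of another ancilla qubit $|0\rangle$ with the comparison ancilla. This construction implements channel $\mathcal{A}$ and approximates the quantum state
$$\sum_{i=0}^{K} \sqrt{p_i}\, |i\rangle|0\rangle|0\rangle + \sum_{i=K+1}^{2^n-1} \sqrt{p_i}\, |i\rangle|1\rangle \left( \sqrt{1 - f(i)}\, |0\rangle + \sqrt{f(i)}\, |1\rangle \right)$$
with $f(i) = \frac{i-K}{2^n-K-1}$. For practical reasons, we avoid the involved implementation of the exact linear objective rotation given in Eq. (F4) by applying the approximation scheme introduced in [28].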
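The effect of this construction can be checked classically on a small example. The sketch below assumes that, whenever the comparison ancilla is set, the last ancilla is rotated so that $|1\rangle$ carries the amplitude $\sqrt{f(i)}$; it uses the exact $f(i)$ rather than the approximation scheme of [28]. The probability values and function names are hypothetical.

```python
def payoff_fraction(i, strike, n_qubits):
    """f(i) = (i - K) / (2^n - K - 1), the rescaled call payoff for i > K."""
    return (i - strike) / (2 ** n_qubits - strike - 1)

def prob_last_ancilla_one(probs, strike):
    """Probability of measuring |1> in the payoff ancilla (assumed form)."""
    n_qubits = (len(probs) - 1).bit_length()
    return sum(p * payoff_fraction(i, strike, n_qubits)
               for i, p in enumerate(probs) if i > strike)

def expected_payoff(probs, strike):
    """Classical expectation of max(i - K, 0) under the distribution probs."""
    return sum(p * max(i - strike, 0) for i, p in enumerate(probs))

# Hypothetical 3-qubit distribution over the grid points i = 0, ..., 7.
probs = [0.05, 0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.05]
strike = 3
p_one = prob_last_ancilla_one(probs, strike)
rescaled = (2 ** 3 - strike - 1) * p_one
print(p_one, rescaled, expected_payoff(probs, strike))
```

Multiplying the measured probability by $2^n - K - 1$ recovers the expected payoff on the grid, which is the quantity QAE estimates.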
Eventually, the probability of measuring $|1\rangle$ in the last ancilla is equal to
$$\sum_{i=K+1}^{2^n-1} p_i\, f(i) ,$$
where $p_i$ denotes the probability of the basis state $|i\rangle$ in the loaded distribution.

FIG. 13 The action of the illustrated circuits is equivalent. Since the quantum circuit at the bottom requires fewer CX gates, it is the favorable implementation choice for training a qGAN with actual quantum hardware. Notably, the lower circuit projects the measurement of qubit $q_1$ ($q_2$) on bit $c_2$ ($c_1$).
Due to the connectivity layout of the IBM Q Boeblingen chip, shown in Fig. 12, any subset of three qubits (we use qubits 0, 1, 2) has linear connectivity only. Thus, the implementation of the entanglement block presented in Fig. 2 requires the use of SWAP gates, as shown in Fig. 13(a). The implementation of CZ · SWAP with the gate set currently available for IBM Q backends requires the use of 4 CX gates, i.e. 3 for the SWAP and 1 for the CZ. However, we can reduce the number of required CX gates, see Fig. 13(b). As shown in [56], the action of circuit (a) is equivalent to the action of circuit (b), which only utilizes 2 CX gates. During the training, circuit (b) maps the measurement of $q_1$ ($q_2$) on bit $c_2$ ($c_1$) to compensate for the second SWAP in circuit (a). However, when using the generator circuit for data loading in another algorithm, such as QAE, an actual SWAP gate must be implemented.
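The gate count of circuit (a) rests on two standard decompositions: a SWAP equals three alternating CX gates, and a CZ equals one CX conjugated by Hadamard gates on the target. The sketch below verifies both identities with explicit 4x4 matrices; it does not reproduce the reduced circuit (b) from [56], and the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

# Basis ordering |q0 q1>, i.e. index = 2 * q0 + q1.
CX_01 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control q0, target q1
CX_10 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control q1, target q0
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

h = 1 / math.sqrt(2)
H_ON_Q1 = [[h, h, 0, 0], [h, -h, 0, 0], [0, 0, h, h], [0, 0, h, -h]]  # identity on q0, H on q1

swap_from_cx = matmul(CX_01, matmul(CX_10, CX_01))      # 3 CX gates
cz_from_cx = matmul(H_ON_Q1, matmul(CX_01, H_ON_Q1))    # 1 CX gate plus single-qubit gates

swap_matches = swap_from_cx == SWAP
cz_matches = all(abs(cz_from_cx[r][c] - CZ[r][c]) < 1e-12
                 for r in range(4) for c in range(4))
print(swap_matches, cz_matches)
```

Together these account for the 4 CX gates of the direct implementation, which the equivalent circuit (b) reduces to 2.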

FIG. 1 Generative Adversarial Network: First, the generator creates data samples which shall be indistinguishable from the training data. Second, the discriminator tries to differentiate between the generated samples and the training samples. The generator and discriminator are trained alternately.

FIG. 3 Result of training the qGAN for a log-normal target distribution with normal initialization and a depth 2 generator (a, b), a triangular target distribution with random initialization and a depth 2 generator (c, d), and a bimodal target distribution with uniform initialization and a depth 3 generator (e, f). The presented probability density functions correspond to the trained $|g_\theta\rangle$ (a, c, e), and the loss function progress is illustrated for the generator as well as for the discriminator (b, d, f).

FIG. 4 Results of training the qGAN with uniformly (a, b), normally (c, d), and randomly (e, f) initialized quantum generator. The PDFs corresponding to the trained $|g_\theta\rangle$ (a, c, e), as well as the loss functions of the generator and the discriminator (b, d, f), are illustrated.
quantum generator and discriminator as before. To improve the robustness of the training against the noise introduced by the quantum hardware, we set the optimizer's learning rate to $10^{-3}$. The initialization is chosen according to the random setting because it requires the fewest gates. Due to the increased learning rate, it is sufficient to run the training for 200 optimization epochs. For more details on the efficient implementation of the generator on IBM Q Boeblingen, see Appendix G.

FIG. 6 The presented PDFs from $|g_\theta\rangle$ are achieved with a randomly initialized qGAN training run on (a) the IBM Q Boeblingen and (b) a quantum simulation employing a noise model.

FIG. 8 Probability distribution of the spot price at maturity $S_T$ and the corresponding payoff function for a European Call option. The distribution has been learned with a randomly initialized qGAN run on the IBM Q Boeblingen chip.

FIG. 9 The progress of the relative entropy between the quantum generator and the multivariate random distribution underlying the training data for depth $k \in \{2, 3, 6\}$.

FIG. 10 Variational quantum circuit for approximate loading of a discretized normal distribution.

FIG. 12 The figure illustrates the connectivity of the IBM Q Boeblingen 20 superconducting qubit chip, as well as the qubits used for the qGAN training.
TABLE III Kolmogorov-Smirnov statistic for randomly chosen samples of $|g_\theta\rangle$ trained with a noisy quantum simulation and using the IBM Q Boeblingen device.