Abstract
Hybrid quantumclassical algorithms provide ways to use noisy intermediatescale quantum computers for practical applications. Expanding the portfolio of such techniques, we propose a quantum circuit learning algorithm that can be used to assist the characterization of quantum devices and to train shallow circuits for generative tasks. The procedure leverages quantum hardware capabilities to its fullest extent by using native gates and their qubit connectivity. We demonstrate that our approach can learn an optimal preparation of the GreenbergerHorneZeilinger states, also known as “cat states”. We further demonstrate that our approach can efficiently prepare approximate representations of coherent thermal states, wave functions that encode Boltzmann probabilities in their amplitudes. Finally, complementing proposals to characterize the power or usefulness of nearterm quantum devices, such as IBM’s quantum volume, we provide a new hardwareindependent metric called the qBAS score. It is based on the performance yield in a specific sampling task on one of the canonical machine learning data sets known as Bars and Stripes. We show how entanglement is a key ingredient in encoding the patterns of this data set; an ideal benchmark for testing hardware starting at four qubits and up. We provide experimental results and evaluation of this metric to probe the trade off between several architectural circuit designs and circuit depths on an iontrap quantum computer.
Introduction
What is a good metric for the computational power of noisy intermediatescale quantum^{1} (NISQ) devices? Can machine learning (ML) provide ways to benchmark the power and usefulness of NISQ devices? How can we capture the performance scaling of these devices as a function of circuit depth, gate fidelity, and qubit connectivity? In this work, we design a hybrid quantumclassical framework called datadriven quantum circuit learning (DDQCL) and address these questions through simulations and experiments.
Hybrid quantumclassical algorithms, such as the variational quantum eigensolver^{2,3} (VQE) and the quantum approximate optimization algorithm^{4,5} (QAOA), provide ways to use NISQ computers for practical applications. For example, VQE is often used in quantum chemistry when searching the ground state of the electronic Hamiltonian of a molecule.^{6,7,8} QAOA is used in combinatorial optimization to find approximate solutions of a classical Ising model^{9} and it has been demonstrated on a 19qubit device for the task of clustering synthetic data.^{10} Other successful hybrid approaches based on genetic algorithms were proposed for approximating quantum adders and training quantum autoencoders.^{11,12,13} In all these examples, there is a clearcut objective function describing the cost associated with each candidate solution. The task is then to optimize it exploiting both quantum and classical resources.
Although optimization tasks offer a great niche of applications, probabilistic tasks involving sampling have most of the potential to prove quantum advantage in the nearterm.^{14,15,16} For example, learning probabilistic generative models is in many cases an intractable task, and quantumassisted algorithms have been proposed for both gatebased^{17,18} and quantum annealing devices.^{19,20,21,22,23} Differently from optimization, the learning of a generative model is not always described by a clearcut objective function. All we are given as input is a data set, and there are several cost functions that could be used as a guide, each one with their own advantages, disadvantages, and assumptions.
Here we present a hybrid quantumclassical approach for generative modeling on gatebased NISQ devices which heavily relies on sampling. We use the 2^{N} amplitudes of the wave function obtained from a Nqubit quantum circuit to construct and capture the correlations observed in a data set. As Born’s rule determines the output probabilities, this model belongs to the class of models called Born machines.^{24} Previous implementations of Born machines^{25,26,27,28} often relied on the construction of tensor networks and their efficient manipulation through graphicalprocessing units. Our work differs in that Born’s rule is naturally implemented by a quantum circuit executed on a NISQ hardware. Following the notation of subsequent work,^{29} we refer to our generative model as a quantum circuit Born machine.
By employing the quantum circuit itself as a model for the data set, we also differentiate from quantum algorithms that target specific probability distributions. For example, in ref. ^{30} the authors developed a hybrid algorithm to approximately sample from a Boltzmann distribution. The samples are used to update a classical generative model, which requires running the hybrid algorithm for each training iteration. In contrast, our work does not assume a Boltzmann distribution and therefore does not require a specific sampling beyond Born’s rule.
Our work is also in contrast with other quantumassisted ML work (see e.g., refs ^{31,32,33,34}) requiring faulttolerant quantum computers, which are not expected to be readily available in the nearterm.^{35} Instead, our circuits are carefully designed to exploit the full power of the underlying NISQ hardware without the need for a compilation step.
On the benchmarking of NISQ devices, quantum volume^{9,36} has been proposed as an architectureneutral metric. It is based on the task of approximating a specific class of random circuits and estimating the associated effective error rate. This is very general and it is indeed useful for estimating the computational power of a quantum computer. In this paper, we propose the qBAS score, a complementary metric designed for benchmarking hybrid quantumclassical systems. The score is based on the generative modeling performance on a canonical synthetic data set which is easy to generate, visualize, and validate for sizes up to hundreds of qubits. Yet, implementing a shallow circuit that can uniformly sample such data is hard; we will show that some candidate solutions require large amount of entanglement. Hence, any miscalibration or environmental noise will affect this single performance number, enabling comparison between different devices or across different generations of the same device. Moreover, the score depends on the classical resources, hyperparameters, and various design choices, making it a good choice for the assessment of the hybrid system as a whole.
Our design choices are based on the setup of existing ion trap architectures. We choose the ion trap because of its full connectivity among qubits^{37} which allows us to study several circuit layouts on the same device. Our experiments are carried out on an ion trap quantum computer hosted at the University of Maryland.^{38}
Results
The learning pipeline
In this Section, we present a hybrid quantumclassical algorithm for the unsupervised machine learning task of approximating an unknown probability distribution from data. This task is also known as generative modeling. First, we describe the data and the model, then we present a training method.
The data set \({\cal{D}} = ({\mathbf{x}}^{(1)}, \ldots ,{\mathbf{x}}^{(D)})\) is a collection of D independent and identically distributed random vectors. The underlying probabilities are unknown and the target is to create a model for such distribution. For simplicity, we restrict our attention to Ndimensional binary vectors x^{(d)} ∈ {−1, +1}^{N}, e.g., black and white images. This gives us an intuitive onetoone mapping between observation vectors and the computational basis of an Nqubit quantum system, that is x ↔ x〉 = x_{1}x_{2}⋯x_{N}〉. Note that standard binary encodings can be used to implement integer, categorical, and approximate continuous variables.
Provided with the data set \({\cal{D}}\), our goal is now to obtain a good approximation to the target probability distribution \(P_{\cal{D}}\). A quantum circuit model with fixed depth and gate layout, parametrized by a vector θ, prepares a wave function ψ(θ)〉 from which probabilities are obtained according to Born’s rule P_{θ}(x) = 〈xψ(θ)〉^{2}. Following a standard approach from generative machine learning,^{39} we can minimize the Kullback–Leibler (KL) divergence^{40} \(D_{KL}[P_{\cal{D}}\vert P_{\boldsymbol{\theta}}]\) from the circuit probability distribution in the computational basis P_{θ} to the target probability distribution \(P_{\cal{D}}\). Minimization of this quantity is directly related to the minimization of a well known cost function: the negative loglikelihood \({\cal{C}}({\boldsymbol{\theta}}) =  \frac{1}{D} \sum\nolimits_{d = 1}^{D} {\mathrm{ln}} (P_{\boldsymbol{\theta }}({\mathbf{x}}^{(d)}))\). However, there is a caveat; as probabilities are estimated from frequencies of a finite number of measurements, lowamplitude states could lead to incorrect assignements. For example, an estimate P_{θ}(x^{(d)}) = 0 for some x^{(d)} in the data set would lead to infinite cost. To avoid singularities in the cost function, we use a simple variant
where ε > 0 is a small number to be chosen. Note that the number of measurements needed to obtain an unbiased estimate of the relevant probabilities may not scale favorably with the number of qubits N. In the Supplementary Material we suggest alternative cost functions such as the moment matching error and the earth mover’s distance.
After estimating the cost, we update the parameter vector θ to further minimize the cost. This can in principle be done by any suitable classical optimizer. We used a gradientfree algorithm called particle swarm optimization (PSO)^{41,42} as previously done in the context of quantum chemistry.^{7} The algorithm iterates for a fixed number of steps, or until a local minimum is reached and the cost does not decrease.
We chose the layout of the model circuit to be of the following form. Let us consider a general circuit parametrized by single qubit rotations \(\{ \theta _i^{(l,k)}\}\) and twoqubit entangling rotations \(\{ \theta _{ij}^{(l)}\}\) The subscripts denote qubits involved in the operation, l denotes the layer number and k ∈ {1, 2, 3} denotes the rotation identifier. The latter is needed as we decompose an arbitrary single qubit rotation into three simpler rotations (see Section “Methods” for further details about some exceptions and potential simplifications depending on the native gates available in each specific hardware). Inspired by the gates readily available in ion trap quantum computers, we use alternating layers of arbitrary single qubit gates (odd layers) and MølmerSørensen XX entangling gates^{43,44,45,46} (even layers) as our model. All parameters are initialized at random.
It is important to note that in our model the number of parameters is fixed and is independent of the size D of the data set. This means we can hope to obtain a good approximation to the target distribution only if the model is flexible enough to capture its complexity. Increasing the number of layers or changing the topology of the entangling layer alter this flexibility, potentially improving the quality of the approximation. However, we anticipate that such flexible models are more challenging to optimize because of their larger number of parameters. Principled ways to choose the circuit layout and to regularize its parameters could significantly help in such a case.
As summarized in Fig. 1, the learning algorithm iteratively adjusts all the parameters to minimize the value of the cost function. At any iteration the userdefined cost is approximated using both samples from the data set and measurements from the quantum hardware, hence the name datadriven quantum circuit learning (DDQCL).
The qBAS score
Bars and stripes (BAS)^{47} is a synthetic data set of images that has been widely used to study generative models for unsupervised machine learning. For n × m pixels, there are N_{BAS(n,m)} = 2^{n} + 2^{m} − 2 images belonging to BAS, and they can be efficiently produced and visualized. The probability distribution is 1/N_{BAS(n,m)} for each pattern belonging to BAS(n, m), and zero for any other pattern. Figure 2 on the top left panel shows patterns belonging to BAS(2, 2), while the top central panel shows the remaining patterns.
We use DDQCL to learn a circuit that encodes all the BAS patterns in the wave function of a quantum state. This also allows us to design the qBAS(n, m) score: a taskspecific figure of merit to assess the performance of shallow quantum circuits. In a single number, it captures the model capacity of the circuit layout and intrinsic hardware strengths and limitations in solving a complex sampling task that requires a fair amount of entanglement. It takes into account the model circuit depth, gate fidelities, and any other architectural design aspects, such as the quantum hardware’s qubittoqubit connectivity and native set of single and twoqubit gates. It also takes classical resources such as the choice of cost function, optimizer, and hyperparameters, into account. Therefore, it can be used to benchmark the performance of the components of the hybrid quantumclassical system.
The qBAS(n, m) score is an instantiation of the F_{1} score widely used in the context of information retrieval. The F_{1} score is defined as the harmonic mean of the precision p and the recall r, i.e., F_{1} = 2pr/(p + r). The precision p indicates the ability to retrieve states which belong to the data set of interest (The meaning and usage of precision in the field of information retrieval differs from the definition of precision within other branches of science and statistics). In our context this is the number of measurements that belong to the BAS(n, m) data set, divided by the total number of measurements N_{reads} performed. The recall r is the capacity of the model to retrieve the whole spectrum of patterns belonging to the desired data set. In our case, if we denote the number of unique patterns that were measured as d(N_{reads}), then r = d(N_{reads})/N_{BAS(n,m)}. To score high (F_{1} ≈ 1.0), both high precision (p ≈ 1.0) and high recall (r ≈ 1.0) are required.
The F_{1} score is a useful measure for the quality of information retrieval and classification algorithms, but for our purposes it has a caveat: the dependence of r on the total number of measurements. As an example, consider a model that generates only BAS patterns, i.e., its precision is 1.0, but with highly heterogeneous distribution. If some of the BAS patterns have infinitesimally small probability, we can still push the recall to 1.0 by taking a large number of measurements N_{reads} → ∞. This is not desirable since our purpose is to evaluate circuits on the task of uniformly sampling all the patterns from BAS(n, m). Therefore, to define a unique score it is important to fix N_{reads} to a reasonable value such that r ≈ 1.0 under the assumption of the model distribution being equal to the target distribution P_{BAS(n,m)}, but not so large as to make the score insensitive to deviations from the target distribution. Assuming a perfect model distribution P_{BAS(n,m)} = 1/N_{BAS(n,m)}, the expected number of measurements to obtain a value of r = 1.0 can be estimated using the famous coupon collector’s problem. For our purposes here, we set N_{reads} to be equal to the expected number of samples that need to be drawn to collect all the N_{BAS(n,m)} patterns (“coupons”). That is, \(N_{{\mathrm{reads}}} = N_{{\mathrm{BAS}}(n,m)}H_{N_{{\mathrm{BAS}}(n,m)}}\) where H_{k} is the kth harmonic number. Computed values of N_{reads} are provided in Table 1 for different values of n and m up to 100 qubits. As shown in the table, the number of readouts required to determine qBAS(n, m) are within experimental capabilities of current NISQ devices. (It is an interesting problem to optimize N_{reads} such that it maximizes the sensitivity of the score towards differentiating two probability distributions. This problem is left for future work).
For statistical robustness, we recommend as a good practice to perform R repetitions of the N_{reads} measurements leading to R independent estimates of the recall (each denoted as r_{i}). For estimating the precision p, all the samples collected should be used to robustly estimate this quantity. Using this value of p one can compute R independent values of the qBAS(n, m) score from each of the r_{i}. These are subsequently bootstrapped to obtain a more robust average for the final reported value of qBAS(n, m) (see details in Section “Methods”).
We note that a more general performance indicator than qBAS(n, m) score is indeed the KL divergence, D_{KL}[P_{BAS(n,m)}P_{θ}]. However, this would not be robust in terms of scalability; as n × m becomes large, it is expected that the KL divergence is frequently undefined (One may argue that the same scalability issue applies to the cost function in Eq. (1) used for learning. However, we can choose alternative scalable cost functions, as we show in the Supplementary Material. A comprehensive study is left for future work). This is true when measurements yield distributions such that P_{θ}(x^{(d)}) = 0 for any of the x^{(d)} in BAS(n, m). In all these cases, the qBAS score can still be computed and the number of measurements N_{reads} necessary for obtaining a robust estimate continues to remain relatively small to be practical for intermediate size n × m.
Experiments
We investigated three different examples, namely, GHZ state preparation, coherent thermal state preparation, and BAS(2, 2). We implemented each example of DDQCL using both numerical simulations and experiments. We explored the parameter space of DDQCL by varying the qubittoqubit connectivity topology (see topright panel of Fig. 2) and the number of layers. We evaluated the performance of each instance by using the KL divergence from the circuit probability distribution in the computational basis to the target probability distribution. While explicitly computing the KL divergence is generally intractable, due to its demanding resource requirement as the size of instances to consider increases, we were able to compute this quantity explicitly for all cases considered in this paper.
GHZ state preparation
To test the capabilities of DDQCL, we started with the preparation of GHZ states, also known as “cat states”.^{48} Besides their importance in quantum information, the choice is motivated by their simple description and by the availability of many studies about their preparation (see e.g., refs ^{49,50,51}). From the DDQCL perspective, we explored whether it is possible to learn any of the known recipes for GHZ state preparation starting only from classical data. More specifically, the input data consists of samples from a distribution corresponding to the two desired computational basis states; P_{0} = 0.5 for the state 0…0〉 and P_{1} = 0.5 for the state 1…1〉. Using a layer of single qubit rotations followed by an entangling layer with the alltoall topology, DDQCL yielded many degenerate preparations of GHZlike states differing only by a relative phase.
In particular, we first ran particle swarm optimization on 25 random initializations for 3, 4, 5, and 6qubit instances. Then, a human expert inspected the best set of parameters learned for each size and after rounding the parameters to some precision, spotted a clear pattern. Instances of 3 and 5 qubits yielded a recipe, while instances of 4 and 6 qubit yielded another recipe. The recipes obtained are summarized in Fig. 3 and were verified for larger number of qubits, both odd and even. Indeed, DDQCL successfully reproduced the recipes previously used on ion trap quantum computers,^{52} and, to the best of our knowledge, they correspond to the most compact and efficient protocols for GHZ state preparation using XX gates (see Fig. 3). Another commonly used approach consists of cascading entangling gates, with alternations of single qubit rotations.^{51} DDQCL produced approximate recipes of this kind in some of the test cases for 3 and 4 qubits when using a single entangling layer with chain topology.
It is interesting to note that in DDQCL all the parameters are learned independently and not constrained to be the same. As shown in Fig. 3, the learning process unveiled that these converge to the same value. This is not necessarily the case for the other data sets considered below. We also note that the simulations assumed noiseless hardware, making the analysis of parameters easier for the human expert. It would be much more difficult to analyze parameters found with noisy hardware, as DDQCL can learn to compensate certain types of noise, e.g., systematic parameter offsets, in nontrivial ways. The upside is that learning can be successful even in the presence of such systematic errors.
Finally, it is reassuring that DDQCL obtained circuits for cat state preparation starting from samples of a classical distribution. While this target distribution could have been modeled by a zerotemperature ferromagnet, there apparently was no other way for our circuit to reproduce such a distribution, if not by preparing a GHZlike state. One can obtain more general solutions by allowing DDQCL to prepare mixed states. For example, consider a circuit acting on both the main qubit register and an additional ancilla register. By tracing out the ancilla register, e.g., ignoring it during measurement, the main register can implement a mixed state and can be trained to simulate a zerotemperature ferromagnet. Another example which does not resort to an ancilla register is to use decoherence as a mechanism to prepare mixed states that explain the data.
Coherent thermal states
Thermal states play an important role in statistical physics, quantum information, and machine learning. Using DDQCL, we trained quantum circuits with different number of layers L ∈ {1, 2, 3} and using alltoall topology, to approximate a target Boltzmann distribution. In particular, we considered data sets sampled from the Boltzmann distribution of 25 synthetic instances with N = 5 qubits. By decreasing the temperature T of the target distribution, we can increase the difficulty of the learning task. Figure 4 shows the bootstrapped median and 90% confidence interval over the distribution of medians of the KL divergence during learning. Deeper circuits such as L = 3 (purple pentagons) consistently outperformed shallower circuits such as L = 2 (red circles) and L = 1 (yellow triangles). This became more evident as we went from easy learning tasks (Fig. 4a) to hard learning tasks (Fig. 4c). Results for instances of N = 6 qubits are shown in Fig. S2 in the Supplementary Material.
To assess how well DDQCL performs on the generative task, we compared DDQCL to the inverse Bethe approximation^{53} (see also Eqs. (3.21) and (3.22) in ref. ^{54}), a classical closedform approach widely used in statistical physics to solve the inverse Ising problem. As shown in Fig. 4, the inverse Bethe approximation (green bar) performed extremely well in the easy task (a), matched the L = 3 quantum circuit in the intermediate task (b), and underperformed on the difficult task (c). The latter observation comes from the fact that the median performance of the inverse Bethe approximation has very large confidence intervals. We emphasize that this is not a form of quantum supremacy as the two methods are fundamentally different. DDQCL prepares a quantum state without the assumption of an underlying Boltzmann distribution, while the inverse Bethe approximation infers the parameters with such assumption. Furthermore, the error in the inverse Bethe approximation is expected to go to zero with system size, and only above the reference temperature T_{c} (see Section “Methods” for details). Thus, it is not surprising that we obtained bad performance in Fig. 4c with the inverse Bethe approximation. Other classical methods based on machine learning and Markov Chain Monte Carlo, such as the Boltzmann machine,^{55} could achieve higher accuracy by requiring more computational resources than the inverse Bethe approximation used here. A thorough comparison is beyond the scope of this work.
BAS(2, 2)
For the purposes of benchmarking and measuring the power of NISQ devices with DDQCL, it is insufficient to have an easytogenerate target data set; we also require the data set to represent a useful quantum state in quantum computing, while simultaneously proving to be sufficiently challenging for the quantum computer to generate. Because of the importance of entanglement in quantum information processing, we considered the entanglement entropy averaged over all twoqubit subsets^{56} as a proxy measure of a specific quantum state’s usefulness for benchmarking purposes. We start by noting that the fourqubit cat state, whose rich entangled nature makes it ideal for studying decoherence and decay of quantum information,^{49,51} has entanglement entropy S_{GHZ} = 1. Now consider states that encode BAS(2, 2) in the computational basis. The minimum value of entanglement entropy that any such state can have is S_{BAS(2,2)} = 1.25163. Furthermore, the maximum value that a quantum representation of BAS(2, 2) can reach is S_{BAS(2,2)} = 1.79248, which happens to be the maximum entanglement entropy known for any fourqubit state^{56} (see also Fig. S5 and the corresponding Section in the Supplementary Material).
We also note that one of the quantum representations of BAS(2, 2) found by DDQCL reached a remarkable value of S_{BAS(2,2)} = 1.69989 (see Fig. S3 in the Supplementary Material). This shows the power of our framework, in that DDQCL is capable of handling useful quantum states that are rich in entanglement. This is an important observation, since we know, based on our empirical results, that (i) single layer circuits with no entangling gates severely underperform in producing the output state probability distribution that is close to the target data set, and (ii) when inspecting the parameters learned for circuits with alltoall topology with L = 2 layers, we found that most of the XX gates reached their maximum entangling setting.
We now discuss results for the qBAS(2, 2) score, which we computed experimentally and theoretically in order to compare the entangling topologies sketched in the top right panel of Fig. 2 and for different number of layers. The process consists of two steps; first, DDQCL is used to encode BAS(2, 2) in the wave function of the quantum state. Second, the best circuits, i.e., those with lowest cost, are compared using the qBAS(2, 2) score.
The bottom left and bottom central panels in Fig. 2 show the bootstrapped median of the KL divergence and 90% confidence interval over 25 random initializations of DDQCL in silico. The alltoall topology (orange circles) always outperforms sparse topologies (red stars and purple squares). However, deeper circuits do not always provide significant improvements, as it is the case for alltoall L = 4 (dark green circles) versus alltoall L = 2 (orange circles). A possible explanation is that, when going from two to four layers, we approximately double the number of parameters, and particle swarm optimization struggles to find enhanced local optima. Another plausible explanation is that for this small data set, the alltoall circuits with L = 2 are already close to optimal performance (we show supporting evidence in Fig. S3 in the Supplementary Material).
As the best performing circuits to compare using the qBAS score, we chose alltoall L = 2 and star L ∈ {2, 4} circuits. While they represent very different approximate solutions to the same problem, they may be compared with the help of qBAS(2, 2) score. For each setting, we computed 25 scores from batches of size N_{reads} = 15 samples, as described in Section “Results”. The bottom right panel in Fig. 2 shows the bootstrapped mean qBAS(2, 2) score and 95% confidence interval for simulations (green bars) and experiments on the ion trap quantum computer hosted at University of Maryland (pink bars). The score is sensitive to the depth of the circuit as shown by the performance improvement of L = 4 compared with L = 2 in the star topology. Note that the theoretical improvement for using L = 4 is larger than that observed experimentally in the ion trap. This is because the quantum computer accumulated errors while executing the deeper circuit. The score is also sensitive to the choice of topology as shown by the drop in performance of star compared with alltoall when the same number of layers L = 2 is used.
Although we compared circuits implemented on the same ion trap hardware, the score may be used to compare different device generations or even completely different architectures (e.g., superconductorbased versus atomicbased). Similarly, one may use the score to compare classical resources of the hybrid system (e.g., different optimizers).
Discussion
Data is an essential ingredient of any machine learning task. In this work, we presented a datadriven quantum circuit learning algorithm (DDQCL) as a framework that can assist in the characterization of NISQ devices and to implement simple generative models. The success of this approach is evidenced by the results on three different data sets.
To summarize, first, we learned a GHZ state preparation recipe for an ion trap quantum computer. Minimal intervention by a human expert allowed to generalize the recipe to any number of qubits. This is not an example of compilation, but rather an illustration of how simple classical probability distributions can guide the synthesis of interesting nontrivial quantum states. Depending on the level or type of noise in the system, the same algorithm could lead to a different circuit fulfilling the same probability distribution as that of the data. The message here is that machine learning can teach us that “there is more than one way to skin a cat (state)”.
Second, we trained circuits to prepare approximations of thermal states. This illustrates the power of Born machines^{24} to approximate Boltzmann machines^{55} when the data require thermallike features.
Finally, tapping into the real power of nearterm quantum devices and approximate algorithms implementable on them, we designed a taskspecific performance estimator based on a canonical data set. The bars and stripes data is easy to generate, visualize and verify classically, while modeling it still requires significant quantum resources in the form of entanglement. Errors in the device will affect this single performance measure, the qBAS score, which can be used to compare different device generations, or completely different architectures. The qBAS score can also be used to benchmark the typical performance of optimizers used in hybrid quantumclassical systems. Selecting the method and optimizing the hyperparameters can be a daunting task and is a key challenge towards a successful implementation as the number of qubits increases. Therefore, having this unique metric for benchmarking could help reduce the complexity of this finetuning stage. The score can be computed in any of the NISQ architectures available to date.
DDQCL is a modular framework and its performance will ultimately depend on the choices made for such modules. In this article we explored the impact of circuit layout and cost function, while subsequent work has analyzed other modules and suggested extensions to the algorithm. In ref. ^{29} the authors trained Born machines using a differentiable cost function and exploiting gradient calculations proposed in ref. ^{57}. In refs ^{58,59} the authors compared several optimizers, and in ref. ^{60} the authors focused on the impact of hardware noise. The expressive power of shallow circuits was investigated in ref. ^{61} and it was shown to outweigh that of some classes of artificial neural networks. Finally, recent work has shown successfull implementation of DDQCL on the IBM Q 20 Tokyo processor,^{60} on a fivequbit iontrap hosted at University of Maryland,^{58} and on the Rigetti 16Q Aspen1 processor.^{59}
It is left to future work to demonstrate more realistic machine learning by allowing more flexible models and employing regularization. At a finite and fixed low circuit depth, the power of the generative model can be enhanced by including ancilla qubits, in analogy to the role of hidden units in probabilistic graphical models. Regularization can be included in the cost function. In this paper, we used the negative loglikelihood which quickly becomes expensive to estimate as the size of the system increases. We reported preliminary results on alternative costfunctions that overcome the caveat and still produce satisfactory results. Layerwise pretraining of the quantum circuit inspired by deep learning^{39} could initialize parameters to nearoptimal locations in the cost landscape. Finally, DDQCL could be generalized to learn quantum distributions or states, assuming experimental data coming from quantum experiments, e.g., quantum measurements beyond the computational basis. We think these are the most promising directions to be explored in future work.
Our approach has the bidirectional capability of using NISQ devices for machine learning, and machine learning for the characterization of NISQ devices. We hope the ideas presented here contribute to the development of further concrete metrics to help guide the architectural hardware design, while tapping into the computational power of NISQ devices.
Methods
Simulation of quantum circuits in silico
We simulated quantum circuits using the \({\tt {QuTiP2}}\)^{62} Python library and implemented the constraints dictated by the ion trap experimental setting. In the current experimental setup, we can perform arbitrary single qubit rotations and MølmerSørensen XX entangling gates involving any two qubits. We used only these gates hence avoiding the need of further compilation. For the simulations in silico, we also assume perfect gate fidelities and errorfree measurements.
In the ion trap setting, the implementation of single qubit rotations R_{z} is very convenient. Therefore, we perform arbitrary single qubit operations relying on the decomposition \(U_i^{(l)} = R_z(\theta _i^{(l,3)})R_x(\theta _i^{(l,2)})R_z(\theta _i^{(l,1)})\) where l is the layer number, i is the qubit index, and \(\theta _i^{(l,k)} \in [  \pi , + \pi ]\) are Euler angles. The rotations are then expressed as exponentials of Pauli operators \(R_z(\theta _i^{(l, \cdot )}) = {\mathrm{exp}}\left( {  \frac{i}{2}\theta _i^{(l, \cdot )}\sigma _i^z} \right)\) and \(R_x(\theta _i^{(l,\cdot )}) = {\mathrm{exp}}\left( {  \frac{i}{2}\theta _i^{(l,\cdot )}\sigma _i^x} \right) .\)
Because we execute circuits always starting from the 0⋯0〉 state, the first set of R_{z} rotations would have no effect and, therefore, is not needed. When an odd number of layers is used, a similar exception occurs in the last layer. There, the last set of R_{z} rotations would only add a phase that becomes irrelevant when taking the amplitude squared required for the Born machine. In other words, we can slightly reduce the number of parameter without changing the expressive power of the circuit. Every other layer of arbitrary single qubit operations would in general require 3N parameters, where N is the number of qubits. By using an alternative decomposition, namely U = R_{x}R_{z}R_{x}, we could apply commutation rules with XX gates and obtain a reduction to 2N parameters in all odd layers. We decided not to do the former step for two reasons. First, there is no effective reduction of the number of parameters for experiments up to L = 5 layers considered here. Second, in the ion trap quantum computer used here^{38} it is experimentally convenient to use a larger number of R_{z} rather than R_{x} rotations.
For the case of the entangling gates, we use the notation \(U_{ij}^{(l)} = XX(\theta _{ij}^{(l)})\) which in exponential form reads as \(XX(\theta _{ij}^{(l)}) = {\mathrm{exp}}\left( {  \frac{i}{2}\theta _{ij}^{(l)}\sigma _i^x\sigma _j^x} \right)\). Recalling that states that differ by a global phase are indistinguishable, a direct computation shows that the tunable parameters can be taken as \(\theta _{ij}^{(l)} \in [  \pi , + \pi ]\). Also, there is no need to set up an order for these gates within an entangling layer as they commute with one another.
The total number of parameters per entangling layer depends on the chosen topology: all is a fullyconnected graph and has N(N − 1)/2 parameters, chain is a onedimensional nearest neighbor graph with N – 1 parameters, and star is a starshaped graph with N−1 parameters. The top right panel of Fig. 2 shows a graphical representation of these topologies for the case of N = 4 qubits.
When executing DDQCL, we always estimated the required quantities from 1000 measurements in the computational basis.
Gradientfree optimization
Once the number of layers and topology of entangling gates is fixed, the quantum circuits described above provide a template; by adjusting the parameters we can implement a small subset of the unitaries that are in principle allowed in the Hilbert space. The variational approach aims at finding the best parameters by minimizing a cost function. For all our tests, we choose to minimize a clipped version of the negative loglikelihood.
We use a globalbest particle swarm optimization algorithm^{42} implemented in the \({\tt {PySwarms}}\)^{63} Python library. A ‘particle’ corresponds to a candidate solution circuit; the position of a particle is a point θ in parameter space, the velocity is a vector determining how to update the position in parameter space. Position and velocity of all the particles are initialized at random and updated at each iteration following the schema shown in Fig. 1. There are three hyperparameters controlling the swarm dynamics: a cognition coefficient c_{1}, a social coefficient c_{2} and an inertia coefficient w. After testing a grid of values, we chose to use a constant value of 0.5 for all three hyperparameters, which we found to work well for our purpose. To avoid large jumps in parameter space, we further restrict position updates in each dimension to a maximum magnitude of π.
Finally, we set the number of particles to twice the number of parameters of the circuit. This is a conservative value compared with previous work,^{8} also because of the large number of parameters in the circuits explored here.
Data sets details
We worked with three synthetic data sets: zerotemperature ferromagnet, random thermal, and bars and stripes (BAS). In all our numerical experiments we use 1000 data points sampled exactly from these distributions.
GHZ state preparation
The zerotemperature ferromagnet distribution is equivalent to assigning 1/2 probability to both 0…0〉 and 1…1〉 states of the computational basis. This distribution can be easily prepared as a mixed state, but our study uses pure states prepared by the circuit. The only way to reproduce the zerotemperature ferromagnet distribution in our setting is to implement a unitary transformation that prepares a GHZlike state.
Thermal states
A thermal data set in N dimensions is generated by exact sampling realizations of x ∈ {−1, + 1}^{N} from the distribution \(P({\mathbf{x}}) = Z^{1}{\mathrm{exp}}((\sum\nolimits_{ij} {J_{ij}} x_{i}x_{j} + \sum\nolimits_{i} h_{i} x_{i} )T^{1})\) where Z is the normalization constant, J_{ij} and h_{i} are random coefficients sampled from a normal distribution with zero mean and \(\sqrt N\) standard deviation, and T is the temperature. In the large systemsize limit, a phase transition is expected at T_{c} ≈ 1. Although this is not true for the smallsized systems considered here, we take this value as a reference temperature. In our study, we vary T ∈ {2T_{c}, T_{c}, T_{c}/1.5} in order to generate increasingly complex instances.
Bars and stripes
BAS^{47} is a canonical machine learning data set for testing generative models. It consists of n × m pixel pictures generated by setting each row (or column) to either black (−1) or white (+1), at random. In generative modeling, a handful of patterns are input to the algorithm and the target is to train a model to capture correlations in the data. Assuming a successful training, the model can reconstruct and generate previously unseen patterns from partial or corrupted data. On the other hand, if we provide the algorithm with all the patterns we are interested in and aim to a model that generates only those, this would amount to an associative memory. Although both tasks can be done with our DDQCL pipeline, for the qBAS(n, m) score we focus on the latter task.
We now determine the number of patterns and provide an easy identification of the bitstring belonging to the BAS(n, m) class. For the total count of the number of patterns, we first count the number of single stripes, double stripes, etc. that can fit into the n rows. This number is the sum of binomial coefficients \(\mathop {\sum}\nolimits_{k = 0}^n {( {\begin{array}{*{20}{l}} n \hfill \\ k \hfill \end{array}})} = 2^n\). The same expression holds for the number of patterns with bars that can be placed in the m columns, that is 2^{m}. Note that empty (allwhite) and full (allblack) patterns are counted in both the bars and the stripes. Therefore, we obtain the total count for the BAS patterns by subtracting the two extra patterns from this double count:
In the main text, we use the BAS data set to design a taskspecific performance indicator for hybrid quantumclassical systems. Table 1 shows the requirements for some values of n and m.
Bootstrapping analysis
To obtain error bars for the KL divergence, we used the following procedure. DDQCL was always executed 25 times with random initialization of the parameters. From the 25 repetitions, we sampled 10,000 data sets of size 25 with replacement and computed the median KL divergence for each. From the distribution of 10,000 medians, we computed the median and obtained error bars from the 5th and 95th percentiles as the lower and upper limits, respectively, accounting for a 90% confidence interval.
For the case of qBAS score, we did the following bootstrap analysis. qBAS score was always computed 25 times from batches of samples N_{reads}. From the 25 repetitions, we sampled 10,000 data sets of size 25 with replacement and computed the mean for each. From the distribution of 10,000 means, we computed mean and obtained error bars from two standard deviations, accounting for a 95% confidence interval.
Data availability
All data needed to evaluate the conclusions are available from the corresponding author upon request.
References
 1.
Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2, 79 (2018).
 2.
Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 5, 4213 EP (2014).
 3.
McClean, J. R., Romero, J., Babbush, R. & AspuruGuzik, A. The theory of variational hybrid quantumclassical algorithms. New J. Phys. 18, 023023 (2016).
 4.
Edward Farhi, S. G. & Goldstone, J. A. quantum approximate optimization algorithm. Preprint at arXiv:1411.4028 (2014).
 5.
Hadfield, S. et al. From the quantum approximate optimization algorithm to a quantum alternating operator ansatz. Algorithms 12, 34 (2019).
 6.
O’Malley, P. et al. Scalable quantum simulation of molecular energies. Phys. Rev. X 6, 031007 (2016).
 7.
Colless, J. I. et al. Computation of molecular spectra on a quantum processor with an errorresilient algorithm. Phys. Rev. X 8, 011021 (2018).
 8.
Kandala, A. et al. Hardwareefficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242 (2017).
 9.
Moll, N. et al. Quantum optimization using variational algorithms on nearterm quantum devices. Quantum Sci. Technol. 3, 030503 (2018).
 10.
Otterbach, J. S. et al. Unsupervised machine learning on a hybrid quantum computer. Preprint at arXiv:1712.05771 (2017).
 11.
Romero, J., Olson, J. P. & AspuruGuzik, A. Quantum autoencoders for efficient compression of quantum data. Quantum Sci. Technol. 2, 045001 (2017).
 12.
Lamata, L., AlvarezRodriguez, U., MartnGuerrero, J., Sanz, M. & Solano, E. Quantum autoencoders via quantum adders with genetic algorithms. Quantum Sci. Technol . 4, 014007 (2018).
 13.
Li, R., AlvarezRodriguez, U., Lamata, L. & Solano, E. Approximate quantum adders with genetic algorithms: an IBM quantum experience. Quantum Meas. Quantum Metrol. 4, 1–7 (2017).
 14.
Farhi, E. & Harrow, A. W. Quantum supremacy through the quantum approximate optimization algorithm. Preprint at arXiv:1602.07674 (2016).
 15.
PerdomoOrtiz, A., Benedetti, M., RealpeGómez, J. & Biswas, R. Opportunities and challenges for quantumassisted machine learning in nearterm quantum computers. Quantum Sci. Technol. 3, 030502 (2018).
 16.
Harrow, A. W. & Montanaro, A. Quantum computational supremacy. Nature 549, 203–209 (2017).
 17.
Nathan Wiebe, K. M. S. & Kapoor, A. Quantum deep learning. Preprint at arXiv:1412.3489 (2015).
 18.
Kieferová, M. & Wiebe, N. Tomography and generative training with quantum boltzmann machines. Phys. Rev. A 96, 062327 (2017).
 19.
Benedetti, M., RealpeGómez, J., Biswas, R. & PerdomoOrtiz, A. Estimation of effective temperatures in quantum annealers for sampling applications: a case study with possible applications in deep learning. Phys. Rev. A 94, 022308 (2016).
 20.
Amin, M. H., Andriyash, E., Rolfe, J., Kulchytskyy, B. & Melko, R. Quantum boltzmann machine. Phys. Rev. X 8, 021050 (2018).
 21.
Benedetti, M., RealpeGómez, J., Biswas, R. & PerdomoOrtiz, A. Quantumassisted learning of hardwareembedded probabilistic graphical models. Phys. Rev. X 7, 041052 (2017).
 22.
Benedetti, M., RealpeGómez, J. & PerdomoOrtiz, A. Quantumassisted helmholtz machines: a quantum–classical deep learning framework for industrial datasets in nearterm devices. Quantum Sci. Technol. 3, 034007 (2018).
 23.
Wittek, P. & Gogolin, C. Quantum enhanced inference in Markov logic networks. Sci. Rep. 7, 45672 (2017).
 24.
Cheng, S., Chen, J. & Wang, L. Information perspective to probabilistic modeling: Boltzmann machines versus born machines. Entropy 20, 583 (2018).
 25.
Stoudenmire, E. & Schwab, D. J. Supervised learning with tensor networks. Advances in Neural Information Processing Systems Vol. 29 (eds Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R.) 4799–4807 (Curran Associates, Inc., 2016).
 26.
Han, Z.Y., Wang, J., Fan, H., Wang, L. & Zhang, P. Unsupervised generative modeling using matrix product states. Phys. Rev. X 8, 031012 (2018).
 27.
Liu, D. et al. Machine learning by twodimensional hierarchical tensor networks: a quantum information theoretic perspective on deep architectures. Preprint at arXiv:1710.04833 (2017).
 28.
Gao, X., Zhang, Z. & Duan, L. A quantum machine learning algorithm based on generative models. Sci. Adv. 4, eaat9004 (2018).
 29.
Liu, J.G. & Wang, L. Differentiable learning of quantum circuit born machines. Phys. Rev. A 98, 062324 (2018).
 30.
Verdon, G., Broughton, M. & Biamonte, J. A quantum algorithm to train neural networks using lowdepth circuits. Preprint at arXiv:1712.05304 (2017).
 31.
Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum principal component analysis. Nat. Phys. 10, 631–633 (2014).
 32.
Kerenidis, I. & Prakash, A. Quantum recommendation systems. Preprint at arXiv:1603.08675 (2016).
 33.
Brandao, F. G. S. L. et al. Exponential quantum speedups for semidefinite programming with applications to quantum learning. Preprint at arXiv:1710.02581 (2017).
 34.
Schuld, M., Fingerhuth, M. & Petruccione, F. Implementing a distancebased classifier with a quantum interference circuit. EPL 119, 60002 (2017).
 35.
Mohseni, M. et al. Commercialize quantum technologies in five years. Nature 543, 171–174 (2017).
 36.
Bishop, L. S., Bravyi, S., Cross, A., Gambetta, J. M. & Smolin, J. Quantum volume. Techinal Report (2017).
 37.
Linke, N. M. et al. Experimental comparison of two quantum computing architectures. Proc. Natl Acad. Sci. 201618020 (2017).
 38.
Debnath, S. et al. Demonstration of a small programmable quantum computer with atomic qubits. Nature 536, 63 EP (2016).
 39.
Bengio, I. G. Y. & Courville, A. Deep learning, http://www.deeplearningbook.org. (MIT Press, 2016).
 40.
Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951).
 41.
Kennedy, J. & Eberhart, R. Particle swarm optimization. Proc. ICNN’95 Int. Conf. Neural Netw. 4, 1942–1948 (1995).
 42.
Shi, Y. & Eberhart, R. A. modified particle swarm optimizer. In 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360), 69–73 (1998).
 43.
Sørensen, A. & Mølmer, K. Quantum computation with ions in thermal motion. Phys. Rev. Lett. 82, 1971–1974 (1999).
 44.
Sørensen, A. & Mølmer, K. Entanglement and quantum computation with ions in thermal motion. Phys. Rev. A 62, 022311 (2000).
 45.
Benhelm, J., Kirchmair, G., Roos, C. F. & Blatt, R. Towards faulttolerant quantum computing with trapped ions. Nat. Phys. 4, 463 EP (2008).
 46.
Maslov, D. & Nam, Y. Use of global interactions in efficient quantum circuit constructions. New J. Phys. (2017).
 47.
MacKay, D. J. C. Information Theory, Inference and Learning Algorithms. (Cambridge University Press, New York, 2002).
 48.
Greenberger, D. M., Horne, M. A., Shimony, A. & Zeilinger, A. Bell’s theorem without inequalities. Am. J. Phys. 58, 1131–1143 (1990).
 49.
Monz, T. et al. 14qubit entanglement: creation and coherence. Phys. Rev. Lett. 106, 130506 (2011).
 50.
Rocchetto, A. et al. Experimental learning of quantum states. Sci. Adv. 5 (2019).
 51.
Ozaeta, A. & McMahon, P. L. Decoherence of up to 8qubit entangled states in a 16qubit superconducting quantum processor. Quantum Sci. Technol. 4, 025015 (2019).
 52.
Monz, T. et al. 14qubit entanglement: creation and coherence. Phys. Rev. Lett. 106, 130506 (2011).
 53.
RicciTersenghi, F. The bethe approximation for solving the inverse ising problem: a comparison with other inference methods. J. Stat. Mech. Theory Exp. 2012, P08015 (2012).
 54.
Mastromatteo, I. On the typical properties of inverse problems in statistical mechanics. Preprint at arXiv:1311.0190 (2013).
 55.
Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for boltzmann machines. Cogn. Sci. 9, 147–169 (1985).
 56.
Higuchi, A. & Sudbery, A. W. How entangled can two couples get? Phys. Lett. A 273, 213–217 (2000).
 57.
Mitarai, K., Negoro, M., Kitagawa, M. & Fujii, K. Quantum circuit learning. Phys. Rev. A 98, 032309 (2018).
 58.
Zhu, D. et al. Training of quantum circuits on a hybrid quantum computer. Preprint at arXiv:1812.08862 (2018).
 59.
LeytonOrtega, V., PerdomoOrtiz, A. & Perdomo, O. Robust implementation of generative modeling with parametrized quantum circuits. Preprint at arXiv:1901.08047 (2019).
 60.
Hamilton, K. E., Dumitrescu, E. F. & Pooser, R. C. Generative model benchmarks for superconducting qubits. Preprint at arXiv:1811.09905 (2018).
 61.
Du, Y., Hsieh, M.H., Liu, T. & Tao, D. The expressive power of parameterized quantum circuits. Preprint at arXiv:1810.11922 (2018).
 62.
Johansson, J., Nation, P. & Nori, F. Qutip 2: a python framework for the dynamics of open quantum systems. Comput. Phys. Commun. 184, 1234–1240 (2013).
 63.
Miranda, L. J. V. Pyswarms, a researchtoolkit for particle swarm optimization in python. https://zenodo.org/badge/latestdoi/97002861 (2017).
Acknowledgements
M.B. was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) and by Cambridge Quantum Computing Limited (CQCL). The authors are very grateful to Prof. Christopher Monroe and his team at the University of Maryland (UMD) for their support in running the experiments presented here. Special thanks to N.M. Linke, C. Figgatt, K.A. Landsman, and D. Zhu for useful discussions and for running the several experiments used to test and validate the pipeline of this work, and the ones presented in the “Results” section. The authors acknowledge E. Edwards from the communications/publicity division at the Joint Quantum Institute at UMD for the rendering of the iontrap graphic used in Fig. 2, and would like to thank J. RealpeGomez, A.M. Wilson, and G. PazSilva for useful discussions and feedback on an early version of this manuscript. The authors would like to thank J.I. Latorre for pointing out ref. ^{56}.
Author information
Affiliations
Contributions
M.B. and A.P.O. designed the generative model and the learning algorithm. M.B., Y.N. and A.P.O. designed the data sets and the experiments. M.B. and V.LO. wrote the code and performed the in silico experiments. D.GP. wrote code for the analysis and figures. A.P.O. designed the qBAS score. O.P. performed the analysis related to entanglement entropies. All authors analyzed the experimental results and contributed to the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Benedetti, M., GarciaPintos, D., Perdomo, O. et al. A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quantum Inf 5, 45 (2019). https://doi.org/10.1038/s4153401901578
Received:
Accepted:
Published:
Further reading

Costfunction embedding and dataset encoding for machine learning with parametrized quantum circuits
Physical Review A (2020)

Expressive power of parametrized quantum circuits
Physical Review Research (2020)

Classical versus quantum models in machine learning: insights from a finance application
Machine Learning: Science and Technology (2020)

Errormitigated datadriven circuit learning on noisy quantum hardware
Quantum Machine Intelligence (2020)

Reducing the Amount of Single‐Qubit Rotations in VQE and Related Algorithms
Advanced Quantum Technologies (2020)