Abstract
Generative modeling has seen rising interest in both classical and quantum machine learning, and it represents a promising candidate to obtain a practical quantum advantage in the near term. In this study, we build on an existing framework for evaluating the generalization performance of generative models, and we establish the first quantitative comparative race towards practical quantum advantage (PQA) between classical and quantum generative models, namely Quantum Circuit Born Machines (QCBMs), Transformers (TFs), Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Wasserstein Generative Adversarial Networks (WGANs). After defining four types of PQA scenarios, we focus on what we refer to as potential PQA, aiming to compare quantum models with the best-known classical algorithms for the task at hand. We let the models race on a well-defined and application-relevant competition setting, where we illustrate and demonstrate our framework on a 20-variable (qubit) generative modeling task. Our results suggest that QCBMs are more efficient in the data-limited regime than the other state-of-the-art classical generative models. Such a feature is highly desirable in a wide range of real-world applications where the available data is scarce.
Introduction
Generative modeling has become more widely popular with its remarkable success in tasks related to image generation and text synthesis, as well as machine translation^{1,2,3,4,5,6}, making this field a promising avenue to demonstrate the power of quantum computers and to reach the paramount milestone of practical quantum advantage (PQA)^{7}. The most desirable feature in any machine learning (ML) model is generalization, and as such, this property should be considered to assess its performance in search of PQA. However, the definition of this property in the domain of generative modeling can be cumbersome, and it is still an unresolved question for the case of arbitrary generative tasks^{8}. Its definition can take on different nuances depending on the area of research, such as in computational learning theory^{9} or other practical approaches^{10,11}. Reference ^{12} defines an unambiguous framework for generalization on discrete search spaces for practical tasks. This approach puts all generative models on an equal footing since it is sample-based and does not require knowledge of the exact likelihood, therefore making it a model-agnostic and tractable evaluation framework. This reference also demonstrates footprints of a quantum-inspired advantage of Tensor Network Born Machines^{13} compared to Generative Adversarial Networks^{14}.
To the best of our knowledge, in the search for PQA, a concrete quantitative comparison between quantum generative models and a broader class of classical state-of-the-art generative models is still lacking. In particular, quantum circuit Born machines (QCBMs)^{15} have not been compared to date with other classical generative models in terms of generalization, although their ability to generalize has been shown recently^{16}. In this paper, we aim to bridge this gap by providing a numerical comparison between quantum and classical state-of-the-art generative models in terms of generalization.
In this comparison, these models compete for PQA. For this ‘race’ to be well-defined, it is essential to establish its rules first. Indeed, a clear-cut definition of PQA is not present in the relevant literature so far, especially when it comes to challenging ML applications such as generative modeling, or in general, to practical ML.
Previous works emphasize either computational quantum advantage, or settings that are not relevant from a real-world perspective, or scenarios that use data sets that give an advantage to the quantum model from the start (and also bear no relevance to a real-world setting)^{17,18,19,20,21}. One potential exception would be the case of ref. ^{22}, which showed an advantage for a quantum ML model in a practical setting. However, besides the challenge of relying on loaded quantum data^{23}, it is still unclear if it would be relevant to some concrete real-world and large-scale applications, although the authors mention some potential applications in the domain of quantum sensing.
We also acknowledge previous works that have attempted or proposed ways to perform model comparisons, within generative models and beyond. For instance, a recent work^{24} has developed a metric for assessing the quality of variational calculations for different classical and quantum models on an equal footing. Another recent study^{25} proposes a detailed analysis that systematically compares generative models in terms of the quality of training to provide insights on the advantage of their adoption by quantum computing practitioners, although without addressing the question of generalization. In another recent work^{26}, the authors propose the generic notion of quantum utility, a measure for quantifying effectiveness and practicality, as an index for PQA. That work differs from our study in that PQA is defined from a broad perspective as the ability of a quantum device to be either faster, more accurate, or less energy-demanding compared to classical machines with similar characteristics in a certain task of interest. Others have emphasized quantum simulation as one of the prominent opportunities for PQA^{27}. In our paper, we share the long-term goal of identifying practical use cases for which quantum computing has the potential to bring an advantage. However, our work is focused on generative models and their generalization capabilities, which are the gold standard to measure the performance of generative ML models in real-world use cases.
In summary, the goal of this framework and of this study is to set the stage for a quantitative race between different state-of-the-art classical and quantum generative models in terms of generalization in search of PQA, uncovering the strengths and weaknesses of each model under realistic ‘race conditions’ (see Fig. 1). These competition rules are defined in advance before the fine-tuning of each model and dictated by the desired outcome from real-world motivated metrics and limitations, making our framework application- and/or commercially relevant from the start. Hence, we consider this formalization to be one of the main contributions of this work. This focus is motivated by the growing interest of the scientific and business community in showcasing the value of quantum strategies compared to conventional algorithms, and provides a common ground for a fair comparison based on relevant properties. Overall, we show that QCBMs are competitive with the other classical state-of-the-art generative models and provide the best compromise for the requirements of the generalization framework we are adopting. Additionally, we demonstrate that QCBMs perform well in the low-data regime, which constitutes a bottleneck for deep learning models^{28,29,30} and which we believe to be a promising setting for PQA.
Results and discussion
Defining practical quantum advantage
In this work, we refer to practical quantum advantage (PQA) as the ability of a quantum system to perform a useful task (where ‘useful’ can refer to a scientific, industrial, or societal use) with performance that is faster or better than what is enabled by any existing classical system^{26,31}. We highlight that this concept differs from the notion of computational quantum advantage (originally introduced as quantum supremacy), which refers instead to the capability of quantum machines to outperform classical computers by providing a speedup in solving a given task that would otherwise be classically unsolvable, even using the best classical machine and algorithm^{18,20,22,32}.
By taking inspiration from ref. ^{33}, we define four different types of PQA. The first version, which we refer to as provable PQA (PrPQA), has the ultimate goal of demonstrating the superiority of a quantum algorithm with respect to the best classical algorithm, where the proof is backed up by complexity theory arguments^{34,35}. An example of this would be to show a realization of Shor’s algorithm at scale. To the best of our knowledge, the equivalent of Shor’s algorithm in the context of real-world ML tasks, i.e., useful enough to be included in the definition of provable PQA provided above, is still missing. Here, we focus on the following three classes, which might be more reachable with near- and medium-term quantum devices. We define robust PQA (RPQA) as a practical advantage of a quantum algorithm compared to the best available classical algorithms. It is worth noting that an RPQA can be short-lived if a better classical algorithm is developed after the RPQA has been established. On some occasions, there is no clear consensus about the status of the best available classical algorithm, as it depends on each scientific community. To get around that, we can conduct a comparison with a state-of-the-art classical algorithm or a set of classical algorithms. If there is a quantum advantage in this case, we can refer to it as potential PQA (PPQA). Within this scenario, a genuine attempt to compare against the best-known classical algorithms has to be conducted, with the possibility that a PPQA is short-lived given the development or discovery of more powerful and advanced classical algorithms. A weaker scenario corresponds to the case where we promote a classical algorithm to its quantum counterpart to investigate whether quantum effects are useful. A quantum advantage in this scenario is an example of limited PQA (LPQA). A potential case is a comparison between a restricted Boltzmann machine^{36} and a quantum Boltzmann machine^{37}.
In this study, we are pushing the search for PQA beyond the LPQA scenario to a PPQA, with the hope to include a more comprehensive list of the best available classical models in our comparison in future studies.
In this study, we consider different generative models and let them compete for PPQA. To illustrate our approach, we propose a simple sports analogy. Let us consider a hurdles race, where different runners are competing against each other. Each generative model can thus be seen as a runner in such a race. Each contender has their strengths and weaknesses, which make them see hurdles differently (see Fig. 1a). Thus, one can aim to investigate relevant model features and determine whether they constitute a strength for the model under examination. However, hurdles races take place in a specific concrete context, for instance, with given wind and track surface conditions, which affect the competition outcome significantly (see Fig. 1b). The PQA approach takes this concrete context into account when evaluating the contenders, who are analyzed not only ‘in principle’ but also embedded in a specific context. For example, the track field’s length is crucial for the evaluation since different runners can perform differently if the ‘rules of the game’ are modified. The conditions of the race affect the runners’ performance, which is equivalent to saying that generative models are affected by factors such as the type and size of the dataset, the ground truth distribution to be learned, etc. Each instance of a generative modeling task is unique, just as the conditions for every day of the competitions could be unique. As such, the tracks and the race conditions must be specified before the competition happens, to clarify the precise setting where the search for PQA (or, in our study, for PPQA) takes place.
Lastly, we argue that, when evaluating performance in a concrete instance of a race on a given track, the measure of success for an athlete might not necessarily be attributed to the maximum speed. Outside the analogy, factors other than speedup likely need to be taken into account to judge whether practical quantum advantage has occurred. Quality-based generalization is one of these playgrounds. This is particularly relevant when considering combinatorial optimization problems, as suggested by the generator-enhanced optimization (GEO) framework^{38}. This reference introduces a connection between generative models and optimization, which is in and of itself a new perspective on a family of commercially valuable use cases for generative models beyond large language models and image generation, but that is not fully appreciated yet by the ML community. Remarkably, quality-based generalization turns out to be paramount when the generative modeling task under examination is linked to a cost-equipped optimization problem. In this scenario, it is desirable to learn to generate solutions with the lowest possible cost, at least lower (i.e., of better quality) compared to the available costs in a training dataset. The utility, the minimum value, and the quality coverage have been introduced precisely to quantify this capability. However, these metrics can be computed in different ways according to the main features of a specific use case, i.e., as in the analogy of a track field defining the rules of the game. In Section ‘Competition details’, we propose two distinct ‘track fields’ that give us two different lenses, according to which we conduct a comparison of generative models toward PPQA in an optimization context that takes the resource bottlenecks of the specific use case into consideration.
Competition details
In our study, we compare several quantum and state-of-the-art classical generative models. On the quantum side, we use quantum circuit Born machines (QCBMs)^{15} that are trained with a gradient-free optimization technique. On the classical side, we use Recurrent Neural Networks (RNN)^{39}, Transformers (TF)^{2}, Variational Autoencoders (VAE)^{40}, and Wasserstein Generative Adversarial Networks (WGAN)^{14}. More details about these models and their characteristics, along with their hyperparameters, are given in Supplementary Note 1 ‘Generative Models’.
As a test bed, and to illustrate a concrete realization of our framework, we choose a re-weighted version of the Evens (also known as parity) distribution where each bitstring with a non-zero probability has an even number of ones^{16}. Although the cost values for the bitstrings of the Evens dataset are synthetic and used mainly to provide a simple illustration of the framework, this distribution embeds a combinatorial constraint that is relevant in marginal probabilistic inference tasks^{41,42}, and in modular constrained optimization^{43}, and it also has real-world applications, namely in the parity-constrained facility location problem^{44}. For the Evens distribution, the size of the solution space, for N_{var} binary variables, is given by \(|S| = 2^{N_{\rm var}-1}\). Furthermore, we choose a synthetic cost, called the negative separation cost c^{16}, which is defined as the negative of the largest separation between two 1s in a bitstring, i.e., c(x) = − (z + 1), where z is the largest number of consecutive zeros in the bitstring x. For instance, c (‘11100011’) = −4, c (‘10110011’) = −3, and c (‘11111111’) = −1. Note that the minimum of this cost function is known exactly and it is equal to − (N_{var} − 1), which corresponds to the bitstring ‘100…001’.
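As a concrete illustration, the Evens constraint and the negative separation cost can be sketched in a few lines of Python. The function names `is_evens` and `negative_separation_cost` are ours, and the edge-case handling is an assumption that the text does not spell out: only zeros lying between two ones are counted, and a bitstring with fewer than two ones is assigned z = 0, which keeps the minimum at − (N_{var} − 1) as stated above.

```python
def is_evens(x: str) -> bool:
    """Return True if the bitstring has an even number of ones (Evens constraint)."""
    return x.count("1") % 2 == 0


def negative_separation_cost(x: str) -> int:
    """Negative separation cost c(x) = -(z + 1).

    Assumption: z is the longest run of zeros enclosed between two ones,
    so leading and trailing zeros are ignored; z = 0 if no such run exists.
    """
    core = x.strip("0")            # drop zeros that are not between two ones
    runs = core.split("1")         # zero runs separated by ones
    z = max((len(run) for run in runs), default=0)
    return -(z + 1)


if __name__ == "__main__":
    for bits in ("11100011", "10110011", "11111111", "1" + "0" * 18 + "1"):
        print(bits, is_evens(bits), negative_separation_cost(bits))
```

Under this reading, the three examples quoted in the text evaluate to −4, −3, and −1, and the 20-bit string ‘100…001’ evaluates to −19.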
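Given this cost function, we can define our re-weighted training distribution P_{train} over the training data, such that:
\[ P_{\rm train}(\mathbf{x}) = \frac{\exp\left(-\beta\, c(\mathbf{x})\right)}{\sum_{\mathbf{y} \in \mathcal{D}_{\rm train}} \exp\left(-\beta\, c(\mathbf{y})\right)}, \tag{1} \]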
with inverse temperature \(\beta \equiv \hat{\beta}/2\), where \(\hat{\beta}\) is defined as the standard deviation of the scores c in the training set. If a data point \(\mathbf{x} \notin \mathcal{D}_{\rm train}\), then we assign P_{train}(x) = 0. The re-weighting procedure applied to the training data encourages our trained models to generate samples with low costs, with the hope that we sample unseen configurations that have a lower cost than the costs seen in the training set^{38}. To achieve the latter, it is crucial that the Kullback-Leibler (KL) divergence between the generative model distribution and the training distribution does not tend to zero during the training to avoid memorizing the training data^{16}. It is important to note that it is not mandatory to apply the re-weighting of the samples as part of the generative modeling task. However, the re-weighting procedure in Eq. (1) has been shown to help in finding high-quality samples^{12,16,38,45}. Since all the models will be evaluated in their capabilities to generate low-cost and diverse samples, as dictated by the evaluation criteria C_{q}, MV, and U, we used the re-weighted dataset to train all the generative models studied here. In reality, the bare training set consists of T data points with their respective cost values c, and any other transformation could be applied to facilitate the generation of high-quality samples.
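The re-weighting step can be sketched as follows, assuming that Eq. (1) takes a Boltzmann (softmax) form in the cost, with weights proportional to exp(−βc). The function name `reweighted_probabilities` is hypothetical, and the use of the population standard deviation for \(\hat{\beta}\) is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def reweighted_probabilities(costs):
    """Softmax re-weighting of training costs: p_i proportional to exp(-beta * c_i)."""
    beta_hat = statistics.pstdev(costs)   # assumption: population standard deviation
    beta = beta_hat / 2.0                 # beta = beta_hat / 2, as stated in the text
    c_min = min(costs)                    # constant shift for stability; cancels in the ratio
    weights = [math.exp(-beta * (c - c_min)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy training costs: lower cost should receive more probability mass.
    print(reweighted_probabilities([-1, -3, -5]))
```

Bitstrings outside the training set receive zero probability by construction, since only the costs of training points enter the normalization.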
In our simulations, we choose N_{var} = 20 as the size of each bitstring, and we train our generative models for two training set sizes corresponding to ϵ = 0.001 and ϵ = 0.01 (see Fig. 2). We choose the training data for the two different epsilons, such that we have the same minimum cost of −12 for the two datasets. The purpose of this constraint is to rule out the effect of the minimum seen cost in our experiments. We have selected these small epsilon values to probe the model’s capabilities to successfully train and generalize in this scarce-data regime.
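For orientation, these choices can be turned into approximate training set sizes using the Evens solution space size and the relation T = ϵ|S| from Methods. The rounding to an integer number of samples is our assumption, since the text does not state it:

\[ |S| = 2^{N_{\rm var}-1} = 2^{19} = 524{,}288, \qquad T = \epsilon |S| \approx 524 \ \ (\epsilon = 0.001), \qquad T = \epsilon |S| \approx 5{,}243 \ \ (\epsilon = 0.01). \]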
We focus our attention on evaluating quality-based generalization for the aforementioned generative models (the ‘runners’) using two different competition rules (the ‘tracks’). The two tracks described next are motivated, respectively, by the sampling budget and the difficulty of evaluating a cost function, which are common bottlenecks affecting real-world tasks. Specifically:

Track 1 (T1): there is a fixed budget of queries Q generated by the generative model to compute the quality coverage C_{q}, the minimum value MV, and the utility U to establish the most advantageous models (see Methods ‘Generalization Metrics’). This criterion is appropriate in the case where it is cheap to compute the cost associated with samples while only having access to a limited sampling budget. For instance, a definition of PPQA based on T1 can be used in the case of generative models requiring expensive resources for sampling, such as QCBMs executed on real-world quantum computers. Here, one aims to reduce the number of measurements as much as possible while still being able to see an advantage in the quality of the generated solutions.

Track 2 (T2): there is a fixed budget Q_{u} of unique, unseen and valid samples to compute the quality coverage, the utility and the minimum value. This approach implies the ability to sample from the trained models repeatedly to get up to Q_{u} unique, unseen and valid queries. Note that some models might never get to the target Q_{u}, for instance, if they suffer from mode collapse. In this case, the metrics can be computed using the reached \(\tilde{Q}_{\rm u}\). This track is motivated by a class of optimization problems where the cost evaluation is expensive. Examples of such scenarios include molecule design and drug discovery that involve clinical trials. This track aims to provide a proxy reflecting these real-world use cases. In this case, one aims to avoid excessive evaluations of the cost function, e.g., for repeated samples.
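The two tracks differ only in how queries are collected before the metrics are computed, which can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the `sample` callable that returns one bitstring per call, and the `max_draws` cap that stops a mode-collapsed model from looping forever are all hypothetical.

```python
from typing import Callable, List, Set


def track1_queries(sample: Callable[[], str], budget_q: int) -> List[str]:
    """Track T1: draw exactly Q queries; duplicates and invalid samples are kept."""
    return [sample() for _ in range(budget_q)]


def track2_queries(
    sample: Callable[[], str],
    budget_qu: int,
    train_set: Set[str],
    is_valid: Callable[[str], bool],
    max_draws: int,
) -> List[str]:
    """Track T2: sample until Q_u unique, unseen and valid queries are collected."""
    kept: List[str] = []
    seen: Set[str] = set()
    for _ in range(max_draws):          # hypothetical cap on the number of draws
        x = sample()
        if x in seen or x in train_set or not is_valid(x):
            continue                    # repeated, memorized, or invalid sample
        seen.add(x)
        kept.append(x)
        if len(kept) == budget_qu:
            break
    return kept                         # may hold fewer than Q_u queries


if __name__ == "__main__":
    stream = iter(["1100", "1100", "0011", "1000", "0110", "1111"])
    even = lambda x: x.count("1") % 2 == 0
    print(track2_queries(lambda: next(stream), 2, {"0011"}, even, max_draws=6))
```

If the cap is reached first, the returned list plays the role of the reached \(\tilde{Q}_{\rm u}\) queries described above.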
Regarding the sampling budget, we use Q = 10^{4} configurations to estimate our quality metrics for track T1. From the perspective of track T2, we sample until we obtain Q_{u} = 10^{2} unique configurations that are used to compute our quality-based metrics. Note that we checked how many sample batches are needed, and we observed that Q = 10^{4} is enough to extract Q_{u} = 10^{2} unique configurations for all the generative models in our study. Our metrics are averaged over 10 random seeds for each model while keeping the same data for each portion ϵ. For a fair comparison between the generative models, we conduct a hyperparameter grid search using Optuna^{46}, and we extract the best generative model that allows obtaining the lowest utility after 100 training steps. Note that, in order to carry out the hyperparameter tuning process, one could also utilize MV, C_{q}, or any appropriate combination of the three metrics. Additionally, as a fair training budget, we train all our generative models for N_{epochs} = 1000 steps. We compute our quality-based generalization metrics for tracks T1 and T2 every 100 training steps. We do not include this sampling cost in the evaluation budget (Q or Q_{u}), as in this study, we are not focusing on the training efficiency of these models, so we allow potentially unlimited resources for the training process. However, for a more realistic setting, the sampling budget could be customized to take the training requirements into account. For clarity, Fig. 2 provides a schematic representation of our methods. The hyperparameters of each architecture and the parameter count are detailed in Supplementary Table I.
Numerical experiments
We show the generalization results of the different generative models for the two levels of data availability, ϵ = 0.01, 0.001, and for the two different tracks, T1 and T2. We start our analysis with ϵ = 0.01 as illustrated in Fig. 3a. By looking at the first track T1, and focusing on the MV results, we observe that the models experience a quick drop for the first 100 training steps. It is also interesting to see that all the models produce samples with a cost lower than the minimum cost value provided in the training set samples. Furthermore, we can see that VAEs, WGANs, and QCBMs converge to the lowest minimum value of −19, whereas RNNs and TFs jump to higher minimum values with more training steps. In this case, these two models gradually overfit the training data and generalize less to the low-cost sectors. This point highlights the importance of early stopping or monitoring our models during training to obtain their best performances. The utility (T1) provides a complementary picture, where we observe the VAE providing the lowest utility throughout training, followed by the QCBM and then by the other generative models. This ranking highlights the value of QCBMs compared to the other classical generative models. One interesting feature of QCBMs compared to the other models is the monotonic decrease of the utility in addition to its competitive diversity of samples, as illustrated by the quality coverage (T1). The quality coverage also shows the ability of QCBMs, in addition to VAEs and WGANs, to generate a diverse pool of unseen solutions with a lower cost compared to the costs shown in the training data. From the point of view of the second track T2, we observe that the WGAN has the best performance in terms of the three metrics. Additionally, all the models are still generalizing to configurations with a lower cost compared to what was seen in the training data. A complementary picture of the best quality metrics throughout training is provided in Fig. 4a for clearer visibility of the ranking of generative models in our race.
We now focus our attention on the results obtained for the degree of data availability corresponding to ϵ = 0.001 as illustrated in Fig. 3b. We again observe that all the models are generalizing to unseen configurations with a lower cost than the minimum cost seen in the training data. For the first track, T1, we highlight that the QCBM provides the lowest utility compared to the other models while maintaining a competitive minimum value and diversity of high-quality solutions. For the second track, T2, we observe that the QCBM is competitive with the VAE while providing the best quality coverage C_{q}. This point is clearer when analyzing and comparing the best quality-based metrics values in Fig. 4b.
Overall, QCBMs provide the best quality-based generalization performances compared to the other generative models in the low-data regime with the limited sampling budget, i.e., for ϵ = 0.001 and T1 with a sampling budget of Q = 10^{4} queries. More specifically, our QCBM competes on the quality coverage and the minimum value metrics and excels in the trend of the utility. This efficiency in the low-data regime is a highly desirable feature compared to classical generative models, which are known in real-world settings to be data-hungry^{28,29,30}. It is worthwhile to note that the QCBM used here has the lowest number of parameters compared to the other generative models, as outlined in Supplementary Note 1. Although using the parameter count to compare substantially different generative models is not necessarily a well-founded method (even if widespread), we highlight that the quantum models can achieve results that are competitive with classical models that have significantly more parameters, sometimes one to two orders of magnitude more. Overall, these findings are promising steps toward identifying scenarios where quantum models can provide a potential advantage in the scarce-data regime. More details about the best results obtained by our generative models can be found in Supplementary Note 2 ‘Additional quality-based generalization results’.
Finally, we would like to note that QCBMs are also competitive with RNNs and TFs in terms of pre-generalization and validity-based generalization metrics (see Methods ‘Generalization Metrics’) for both data availability settings, ϵ = 0.001, 0.01, as outlined in Supplementary Note 3 ‘Pre-generalization and validity-based generalization results’. The VAE and the WGAN tend to sacrifice these aspects of generalization compared to quality-based generalization. Here, the QCBM provides the best balance between quality-based and validity-based generalization (see Supplementary Note 3).
Conclusions
In this paper, we have established a race between classical and quantum generative models in terms of quality-based generalization and defined four types of practical quantum advantage (PQA). Here, we focus on what we referred to as potential PQA (PPQA), which aims to compare quantum models with the best-known classical algorithms to the best of our efforts and compute capabilities for the specific task at hand. We have proposed two different competition rules for comparing different models and defining PPQA. We denote these rules as tracks based on the race analogy. We have used QCBMs, RNNs, TFs, VAEs, and WGANs to provide an instance of this comparison on the two tracks. The first track (T1) relies on assuming a fixed sampling budget at the evaluation stage while allowing for an arbitrary number of cost function evaluations. In contrast, the second track (T2) assumes we only have access to a limited number of cost function evaluations, which is the case for applications where the cost estimation is expensive. We also study the impact of the degree of data available to the models for their training. Our results have demonstrated that QCBMs are the most efficient in the scarce-data regime and, in particular, in T1. In general, QCBMs showcase a competitive diversity of solutions compared to the other state-of-the-art generative models in all the tracks and datasets considered here.
It is important to note that the two tracks we chose for this study are not comprehensive, even though they are well motivated by plausible real-world scenarios. One could also use different rules of the game where, for example, the training data can be updated for each training step, as is customary in the generator-enhanced optimization (GEO) framework^{38}, or where the overall budget takes into account the number of samples required during training. The two tracks introduced here serve the purpose of illustrating the possibilities ahead from this formal approach. In particular, such an approach helps to unambiguously specify the criteria for establishing PQA for generative models in real-world use cases, especially in the context of generative modeling to generate diverse and valuable solutions, which could in turn boost the solution of combinatorial optimization problems. This characterization is a long-sought-after milestone for many application scientists in the quantum information community, and we believe this framework can provide valuable insights when analyzing the suitability of the adoption of quantum or quantum-inspired models against state-of-the-art classical ones.
Despite the encouraging results obtained from our quantum-based models, we foresee a significant space for potential improvements regarding all the generative models used in this study and some not explored here. In particular, one can embed constraints into generative models such as in U(1)-symmetric tensor networks^{45} and U(1)-symmetric RNNs^{47,48}. Furthermore, including other state-of-the-art generative models with different variations is vital for establishing a more comprehensive comparison, extending the list of competitors both on the classical and quantum side^{49,50,51,52,53}. Moreover, the extension of this work to more realistic datasets is also crucial in the quest to investigate generalization-based PQA. Note that, for the quantum circuit layout and system sizes used here, an efficient simulation with tensor networks is possible, and it can be carried to large system sizes through a synergistic framework between classical simulation techniques and quantum circuits^{54}. The latter can be harnessed to provide a good starting point for quantum circuits based on tensor networks and overcome widespread trainability issues such as barren plateaus. We hope that our work will encourage more comparisons with a broader class of generative models and that it will be diversified to take more criteria for comparison into account.
Methods
Generalization metrics
The evaluation of unsupervised generative models is a challenging task, especially when one aims to compare different models in terms of generalization. In this work, we focus on discrete probability distributions of bitstrings where an unambiguous definition of generalization is possible^{12}. Here, we start from the framework provided in ref. ^{12} that puts different generative models on an equal footing and allows us to assess the generalization performances of each generative model from a practical perspective.
In this framework, we assume that we are given a solution space S that corresponds to the set of bitstrings that satisfy a constraint or a set of constraints, such that \(|S| \le 2^{N_{\rm var}}\) where N_{var} is the number of binary variables in a bitstring. A typical example is the portfolio optimization problem, where there is a constraint on the number of assets to be included in a portfolio. Additionally, we assume that we are given a training dataset \(\mathcal{D}_{\rm train}=\{\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(T)}\}\), where T = ϵ|S| and ϵ is a tunable parameter that controls the size of the training dataset such that 0 < ϵ ≤ 1.
The metrics provided in ref. ^{12} allow probing different features of generalization. There are three main pillars of generalization: (1) pre-generalization, (2) validity-based generalization, and (3) quality-based generalization. In the main text, we focus on quality-based generalization and provide details about pre-generalization and validity-based generalization in Supplementary Note 3 ‘Pre-generalization and validity-based generalization results’.
In typical real-world applications, it is desirable to generate high-quality samples that have a low cost c compared to what has been seen in the training dataset. In the quality-based generalization framework, we can define the minimum value as:
\[ MV = \min_{\mathbf{x} \in G_{\rm sol}} c(\mathbf{x}), \]
which corresponds to the lowest cost in a given set of unseen and valid queries G_{sol}, which we obtain after generating a set of queries \(\mathcal{G}=\{\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(Q)}\}\) from a generative model of interest. In our terminology, a sample x is valid if x ∈ S and it is considered unseen if x ∉ D_{train}. In the ideal scenario, MV is equal to the lowest possible cost, corresponding to the global solution of the problem of interest.
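A minimal sketch of this metric, assuming the queries are bitstrings held in a Python list: the helper first keeps the unseen and valid queries (G_{sol}) and then takes the lowest cost among them. The function names, the `is_valid` and `cost` callables, and the choice to raise an error when G_{sol} is empty are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Set


def unseen_valid(
    queries: Iterable[str], train_set: Set[str], is_valid: Callable[[str], bool]
) -> List[str]:
    """Build G_sol: queries that are valid (x in S) and unseen (x not in D_train)."""
    return [x for x in queries if is_valid(x) and x not in train_set]


def minimum_value(
    queries: Iterable[str],
    train_set: Set[str],
    is_valid: Callable[[str], bool],
    cost: Callable[[str], float],
) -> float:
    """MV: lowest cost among the unseen and valid queries."""
    g_sol = unseen_valid(queries, train_set, is_valid)
    if not g_sol:
        raise ValueError("no unseen and valid query was generated")
    return min(cost(x) for x in g_sol)


if __name__ == "__main__":
    even = lambda x: x.count("1") % 2 == 0
    toy_cost = lambda x: -x.count("0")   # toy cost, for illustration only
    print(minimum_value(["1100", "0011", "1000", "0000"], {"0011"}, even, toy_cost))
```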
To reduce the chance effect of relying on a single minimum, we can average over different random seeds. We can also define the utility, which circumvents the use of the minimum, through:
\[ U = \langle c(\mathbf{x}) \rangle_{\mathbf{x} \in P_{5}}, \]
where P_{5} corresponds to the set of the 5% lowest-cost samples obtained from G_{sol}. The averaging effect allows us to ensure that a low cost was not obtained by chance. Ideally, the best quality-based generalization corresponds to U equal to the lowest possible cost in our problem of interest.
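A minimal sketch of the utility computation is given below. It takes the costs of the unseen and valid queries as input and averages the lowest 5% of them. Rounding the size of P_{5} up, keeping at least one sample, and retaining duplicate queries are assumptions of this sketch; the function name is hypothetical.

```python
def utility(costs, percent=5):
    """Average cost over the `percent`% lowest-cost entries of `costs`.

    `costs` holds c(x) for every query in G_sol. The size of the
    lowest-cost subset is rounded up and is never smaller than one
    (both choices are assumptions of this sketch).
    """
    if not costs:
        raise ValueError("no unseen and valid queries to evaluate")
    ordered = sorted(costs)
    # Integer ceiling of len(ordered) * percent / 100, avoiding float rounding.
    k = max(1, -(-len(ordered) * percent // 100))
    best = ordered[:k]
    return sum(best) / len(best)


costs = list(range(1, 101))   # 100 hypothetical costs: 1, 2, ..., 100
u = utility(costs)            # mean of the five lowest costs
print(u)
```

To average over random seeds as well, one could call `utility` once per seed and take the mean of the returned values.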
In quality-based generalization, it is also valuable to have a diverse set of samples that have high quality. To quantify this desirable feature, we define the quality coverage as
\[ C_{q} = \frac{\left| g_{\rm sol}\left(c < \min_{\mathbf{x}\in \mathcal{D}_{\rm train}} c(\mathbf{x})\right) \right|}{Q}, \]
where \(g_{\rm sol}(c < \min_{\mathbf{x}\in \mathcal{D}_{\rm train}} c(\mathbf{x}))\) corresponds to the set of unique valid and unseen samples that have a lower cost compared to the minimal cost in the training data. The choice of the number of queries Q depends on the tracks/rules of comparison presented in Section ‘Defining practical quantum advantage’. Note that an ideal diversity of quality samples corresponds to C_{q} = 1, where all the generated queries are new, unique, and have a cost lower than the minimal training cost. Although this is the ideal case, softer upper bounds can be devised taking into account more realistic scenarios, as proposed in refs. ^{12,16}. At the other end of the spectrum, the worst quality diversity corresponds to C_{q} = 0, where every query is either inside the training data, is not valid, or has a cost greater than or equal to the minimal training cost.
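The quality coverage can be sketched as follows: count the unique queries that are valid, unseen, and strictly cheaper than the best training sample, then divide by the total number of queries Q. The toy cost function, the toy validity rule, and the function names are hypothetical, and the training set is assumed to be non-empty.

```python
def quality_coverage(queries, train, is_valid, cost):
    """Fraction of the Q queries that are unique, valid, unseen, and
    strictly cheaper than the lowest-cost training sample."""
    if not queries:
        raise ValueError("at least one query is required")
    train_set = set(train)
    threshold = min(cost(x) for x in train_set)
    # A set keeps each qualifying bitstring once, so duplicates do not
    # inflate the numerator.
    better = {
        x
        for x in queries
        if is_valid(x) and x not in train_set and cost(x) < threshold
    }
    return len(better) / len(queries)


# Hypothetical toy problem: valid bitstrings have exactly two ones,
# and the cost is a weighted sum of the bits.
WEIGHTS = (4, 3, 2, 1)


def toy_cost(x):
    return sum(w * b for w, b in zip(WEIGHTS, x))


def toy_is_valid(x):
    return sum(x) == 2


train = [(1, 1, 0, 0)]                 # minimal training cost is 7
queries = [
    (1, 1, 0, 0),                      # seen
    (0, 0, 1, 1),                      # counts (cost 3)
    (0, 0, 1, 1),                      # duplicate, counted once
    (1, 0, 0, 1),                      # counts (cost 5)
    (1, 1, 1, 0),                      # invalid
]
c_q = quality_coverage(queries, train, toy_is_valid, toy_cost)
print(c_q)
```

Here two unique qualifying samples out of Q = 5 queries give C_{q} = 0.4; the duplicate lowers the coverage because it uses up a query without adding a new sample.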
Data availability
The data generated in this study is available from the corresponding author upon reasonable request.
Code availability
The code used to produce the results of this study is available from the corresponding author upon reasonable request.
References
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–44 (2015).
Vaswani, A. et al. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aaPaper.pdf.
Ramesh, A. et al. Zero-shot text-to-image generation. In International Conference on Machine Learning, 8821–8831 (PMLR, 2021). https://arxiv.org/abs/2102.12092.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684–10695 (2022). https://arxiv.org/abs/2112.10752.
Team, O. Chatgpt: Optimizing language models for dialogue. https://openai.com/blog/chatgpt (2022).
Ouyang, L. et al. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, Vol. 35 (eds Koyejo, S. et al.) 27730–27744 (Curran Associates, Inc., 2022). https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731PaperConference.pdf.
Perdomo-Ortiz, A., Benedetti, M., Realpe-Gómez, J. & Biswas, R. Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quant. Sci. Technol. 3, 030502 (2018).
Alaa, A., Van Breugel, B., Saveliev, E. S. & van der Schaar, M. How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162 (eds Chaudhuri, K. et al.) 290–306 (PMLR, 2022). https://proceedings.mlr.press/v162/alaa22a.html.
Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 10, 988–999 (1999).
Zhao, S. et al. Bias and generalization in deep generative models: an empirical study. Adv. Neural Inform. Process. Syst. 31. https://proceedings.neurips.cc/paper/2018/hash/5317b6799188715d5e00a638a4278901Abstract.html (2018).
Nica, A. C. et al. Evaluating generalization in gflownets for molecule design. In ICLR2022 Machine Learning for Drug Discovery. https://openreview.net/forum?id=JFSaHKNZ35b (2022).
Gili, K., Mauri, M. & Perdomo-Ortiz, A. Generalization metrics for practical quantum advantage in generative models. arXiv:2201.08770 (2022). https://arxiv.org/abs/2201.08770.
Han, Z.-Y., Wang, J., Fan, H., Wang, L. & Zhang, P. Unsupervised generative modeling using matrix product states. Phys. Rev. X 8, 031012 (2018).
Goodfellow, I. Nips 2016 tutorial: Generative adversarial networks. arXiv:1701.00160 (2016). https://arxiv.org/abs/1701.00160.
Benedetti, M. et al. A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quant. Inform. 5, 45 (2019).
Gili, K., Hibat-Allah, M., Mauri, M., Ballance, C. & Perdomo-Ortiz, A. Do quantum circuit born machines generalize? Quant. Sci. Technol. 8, 035021 (2023).
Havlíček, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209–212 (2019).
Boixo, S. et al. Characterizing quantum supremacy in near-term devices. Nat. Phys. 14, 595–600 (2018).
Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
Bouland, A., Fefferman, B., Nirkhe, C. & Vazirani, U. On the complexity and verification of quantum random circuit sampling. Nat. Phys. 15, 159–163 (2019).
Madsen, L. S. et al. Quantum computational advantage with a programmable photonic processor. Nature 606, 75–81 (2022).
Huang, H.-Y. et al. Quantum advantage in learning from experiments. Science 376, 1182–1186 (2022).
Umeano, C., Paine, A. E., Elfving, V. E. & Kyriienko, O. What can we learn from quantum convolutional neural networks? arXiv:2308.16664 (2023). https://arxiv.org/abs/2308.16664.
Wu, D. et al. Variational benchmarks for quantum many-body problems. arXiv:2302.04919 (2023). https://arxiv.org/abs/2302.04919.
Riofrío, C. A. et al. A performance characterization of quantum generative models. arXiv:2301.09363 (2023). https://arxiv.org/abs/2301.09363.
Herrmann, N. et al. Quantum utility – definition and assessment of a practical quantum advantage. In 2023 IEEE International Conference on Quantum Software (QSW), 162–174 (IEEE Computer Society, 2023). https://doi.ieeecomputersociety.org/10.1109/QSW59989.2023.00028.
Daley, A. J. et al. Practical quantum advantage in quantum simulation. Nature 607, 667–676 (2022).
Marcus, G. Deep learning: A critical appraisal. arXiv:1801.00631 (2018). https://arxiv.org/abs/1801.00631.
Zhang, Y. & Ling, C. A strategy to apply machine learning to small datasets in materials science. Npj Comput. Mater. 4, 25 (2018).
Tripp, A., Daxberger, E. & Hernández-Lobato, J. M. Sample-efficient optimization in the latent space of deep generative models via weighted retraining. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20 (2020). https://proceedings.neurips.cc/paper/2020/file/81e3225c6ad49623167a4309eb4b2e75Paper.pdf.
Alsing, P. et al. Accelerating progress towards practical quantum advantage: A national science foundation project scoping workshop. arXiv:2210.14757 (2022). https://arxiv.org/abs/2210.14757.
Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2, 79 (2018).
Rønnow, T. F. et al. Defining and detecting quantum speedup. Science 345, 420–424 (2014).
Coyle, B., Mills, D., Danos, V. & Kashefi, E. The born supremacy: quantum advantage and training of an ising born machine. npj Quant. Inform. 6. https://www.nature.com/articles/s41534020002889 (2022).
Sweke, R., Seifert, J.-P., Hangleiter, D. & Eisert, J. On the quantum versus classical learnability of discrete distributions. Quantum 5, 417 (2021).
Hinton, G. A Practical Guide to Training Restricted Boltzmann Machines, 599–619 (Springer, 2012).
Amin, M. H., Andriyash, E., Rolfe, J., Kulchytskyy, B. & Melko, R. Quantum boltzmann machine. Phys. Rev. X 8. https://journals.aps.org/prx/abstract/10.1103/PhysRevX.8.021050 (2018).
Alcazar, J., Vakili, M. G., Kalayci, C. B. & Perdomo-Ortiz, A. Geo: enhancing combinatorial optimization with classical and quantum generative models. arXiv:2101.06250. https://arxiv.org/abs/2101.06250 (2021).
Cho, K., van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (eds Wu, D., Carpuat, M., Carreras, X. & Vecchi, E. M.) 103–111 (Association for Computational Linguistics, 2014). https://aclanthology.org/W144012.
Rolfe, J. T. Discrete variational autoencoders. In International Conference on Learning Representations https://openreview.net/forum?id=ryMxXPFex (2017).
Ermon, S., Gomes, C. P., Sabharwal, A. & Selman, B. Optimization with parity constraints: From binary codes to discrete integration. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI’13, 202–211 (AUAI Press, 2013).
Xue, Y., Li, Z., Ermon, S., Gomes, C. P. & Selman, B. Solving marginal map problems with np oracles and parity constraints. In Advances in Neural Information Processing Systems, vol. 29 (Curran Associates, Inc., 2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/a532400ed62e772b9dc0b86f46e583ffPaper.pdf.
Caldwell, J. R., Watson, R. A., Thies, C. & Knowles, J. D. Deep optimisation: Solving combinatorial optimisation problems using deep neural networks. arXiv:1811.00784. https://arxiv.org/abs/1811.00784 (2018).
Kim, K., Shin, Y. & An, H.-C. Constant-factor approximation algorithms for parity-constrained facility location and k-center. Algorithmica 85, 1883–1911 (2023).
Lopez-Piqueres, J., Chen, J. & Perdomo-Ortiz, A. Symmetric tensor networks for generative modeling and constrained combinatorial optimization. Machine Learning: Science and Technology 4. https://iopscience.iop.org/article/10.1088/26322153/ace0f5 (2022).
Akiba, T., Sano, S., Yanase, T., Ohta, T. & Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019).
Hibat-Allah, M., Ganahl, M., Hayward, L. E., Melko, R. G. & Carrasquilla, J. Recurrent neural network wave functions. Phys. Rev. Res. 2, 023358 (2020).
Morawetz, S., De Vlugt, I. J., Carrasquilla, J. & Melko, R. G. U(1)-symmetric recurrent neural networks for quantum state reconstruction. Phys. Rev. A 104, 012401 (2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inform. Process. Syst. 33, 6840–6851 (2020).
Dinh, L., Sohl-Dickstein, J. & Bengio, S. Density estimation using real NVP. In International Conference on Learning Representations. https://openreview.net/forum?id=HkpbnH9lx (2017).
Kyriienko, O., Paine, A. E. & Elfving, V. E. Protocols for trainable and differentiable quantum generative modelling. arXiv:2202.08253. https://arxiv.org/abs/2202.08253 (2022).
Zoufal, C., Lucchi, A. & Woerner, S. Quantum generative adversarial networks for learning and loading random distributions. npj Quant. Inform. 5. https://doi.org/10.1038/s4153401902232 (2019).
Cong, I., Choi, S. & Lukin, M. D. Quantum convolutional neural networks. Nat. Phys. 15, 1273–1278 (2019).
Rudolph, M. S. et al. Synergistic pretraining of parametrized quantum circuits via tensor networks. Nat. Commun. 14, 8367 (2023).
Acknowledgements
We would like to thank Brian Chen for his generous comments and suggestions, which were very helpful. We also acknowledge Javier Lopez-Piqueres, Daniel Varoli, Vladimir Vargas-Calderón, Brian Dellabetta and Manuel Rudolph for insightful discussions. We also acknowledge Zofia Włoczewska for assistance in designing our figures. Our numerical simulations were performed using Orquestra™. M.H. acknowledges support from Mitacs through Mitacs Accelerate. J.C. acknowledges support from Natural Sciences and Engineering Research Council of Canada (NSERC), the Shared Hierarchical Academic Research Computing Network (SHARCNET), Compute Canada, and the Canadian Institute for Advanced Research (CIFAR) AI chair program. Research at Perimeter Institute is supported in part by the Government of Canada through the Department of Innovation, Science and Economic Development and by the Province of Ontario through the Ministry of Colleges and Universities.
Author information
Contributions
M.H., M.M., J.C., and A.P.O. wrote the manuscript, designed the comparison framework, and analyzed the results. M.H., M.M., and A.P.O. designed the numerical experiments to test the framework. M.H. ran all the simulations. A.P.O. and M.M. co-supervised the project.
Ethics declarations
Competing interests
The authors declare the following competing interests: M.H., M.M., and A.P.O. were employed by Zapata Computing Canada Inc. during the development of this work.
Peer review
Peer review information
Communications Physics thanks the anonymous reviewers for their contribution to the peer review of this work.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hibat-Allah, M., Mauri, M., Carrasquilla, J. et al. A framework for demonstrating practical quantum advantage: comparing quantum against classical generative models. Commun Phys 7, 68 (2024). https://doi.org/10.1038/s42005024015526