Introduction

Quantum machine learning (QML) has recently emerged as a new research field aiming to take advantage of quantum computing for machine learning (ML) tasks1,2,3,4. It has been shown that embedding data into gate-based quantum circuits can be used to produce kernels for ML models by quantum measurements5,6,7,8,9,10,11. Quantum kernels have been used as kernels of support vector machines (QSVM) for classification12,13,14,15,16,17,18 and Gaussian process models for regression problems19,20. Variational quantum circuits have been used to devise variational quantum classifiers (VQC)5,21,22. However, for QML to become a new computational paradigm, it is necessary to prove and demonstrate the computational advantage of ML models based on quantum circuits.

Computational complexity theory classifies computational problems according to how the time and memory requirements in a given computational model scale with the problem size. For example, the classical complexity class P encompasses all decision problems that are solvable on a deterministic Turing machine in time that scales polynomially with the problem size. Analogously, class NP can be defined to encompass problems solvable on a non-deterministic Turing machine in polynomial time. Problems solvable in polynomial time are considered efficiently solvable. Hence, decision problems in P are efficiently solvable by classical computers, but it is assumed that this is not the case for all problems in NP (P ≠ NP). Problems can further be in special relations to complexity classes. A problem is hard relative to a complexity class if every problem in this class can be reduced to it under an efficient transformation, i.e., it is at least as difficult to solve as any problem in this class. A problem is complete relative to a complexity class if it is hard for this class and also belongs to it. Importantly, this implies that completeness is a stronger property than hardness: a hard problem is complete for a particular class only if it is in this class, whereas it may instead lie in a hierarchically higher class.

Quantum computing problems are classified by quantum complexity theory23. In particular, class BQP — bounded-error quantum polynomial time — encompasses decision problems solvable in polynomial time by a quantum Turing machine (the uniform family of polynomial-size quantum circuits), with at most 1/3 probability of error. While BQP includes P, because all efficient classical computations can be performed deterministically using quantum circuits with polynomial depth, BQP is assumed to also include problems that are not in P. This means that BQP-complete problems are not in P. Otherwise, BQP would be equal to P and there would be no quantum advantage to any quantum computing algorithm. Thus, (it is believed that) BQP-complete problems cannot be solved in polynomial time on a classical computer. The hierarchy and relations of complexity classes relevant for this work are shown in Fig. 1.

Fig. 1: Hierarchy and relations of the complexity classes and problems relevant for this work.

This includes the discrete logarithm decision problem DLP1/2 (red square) and the (explicit) k-FORRELATION promise problem (red star). We use the following established, but not yet proven, assumptions: DLP1/2 is in NP, P ≠ NP, P ≠ BQP (existence of quantum advantage), NP-complete problems are outside BQP, and (PROMISE)BQP-complete problems are outside NP.

To demonstrate quantum advantage of QSVM, Liu et al.18 considered the DISCRETE LOGARITHM PROBLEM (DLP). The problem is to find the logarithm \(x={\log }_{g}y\) in a multiplicative group of integers modulo prime p (denoted as \({{\mathbb{Z}}}_{p}^{*}\)) for a generator g, i.e., such that \({g}^{x}\equiv y\,({{{{{{\mathrm{mod}}}}}}}\,\,p)\). DLP is believed, but not rigorously proven, to be unsolvable on a classical computer in time polynomial in the number of bits \(n=\lceil {\log }_{2}p\rceil\). Furthermore, computing only the most significant bit of \(x={\log }_{g}y\) for a \(\frac{1}{2}+\frac{1}{{{{{{{{\rm{poly}}}}}}}}(n)}\) fraction of \(x\in {{\mathbb{Z}}}_{p}^{*}\) is as hard as solving DLP18,24. This forms a decision problem (DLP1/2), presumed to be in NP, which was adapted by Liu et al.18 into a classification task to prove a separation between QSVM and classical ML classifiers. Given that DLP1/2 is in NP (as shown in Fig. 1 by the square), it can be argued that DLP1/2 cannot be a BQP-complete problem25. Therefore, one cannot generalize the results of Liu et al.18 to arbitrary problems in BQP.
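As a purely illustrative sketch of the relation \({g}^{x}\equiv y\,({\rm{mod}}\,p)\), the following brute-force search recovers x for toy parameters. The function name `discrete_log` and the values p = 23, g = 5 are assumptions chosen for illustration; the loop runs over up to p − 1 candidates and is therefore exponential in the bit length n, which is exactly why it does not scale.

```python
def discrete_log(y: int, g: int, p: int) -> int:
    """Return x in {0, ..., p-2} with g**x = y (mod p) by exhaustive search."""
    value = 1  # g**0
    for x in range(p - 1):
        if value == y % p:
            return x
        value = (value * g) % p
    raise ValueError("y is not a power of g modulo p")


# Toy parameters (assumed for illustration): 5 generates the group Z_23^*.
p, g = 23, 5
y = pow(g, 13, p)          # modular exponentiation is efficient
x = discrete_log(y, g, p)  # inverting it by search is not
```

The asymmetry between `pow` (polynomial in n) and the search (exponential in n) is the toy analogue of the presumed classical hardness discussed above.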

In the present work, we show that VQC and QSVM can solve a problem that is complete in a hierarchically higher class in relation to BQP — namely, PROMISEBQP. As such, our results imply that there exists a quantum kernel or a feature map that makes VQC and QSVM efficient solvers for any problem with BQP complexity.

Results

We use the k-FORRELATION problem that is proven to be PROMISEBQP-complete26. As defined and described in detail in the Methods section, the k-FORRELATION problem considers k Boolean functions \({f}_{1},\ldots,{f}_{k}:{\{0,1\}}^{n}\to \{-1,1\}\) yielding

$$\begin{array}{r}{{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}:=\frac{1}{{2}^{(k+1)n/2}}\mathop{\sum}\limits_{{x}_{1},\ldots,{x}_{k}\in {\{0,1\}}^{n}}{f}_{1}\left({x}_{1}\right){(-1)}^{{x}_{1}\cdot {x}_{2}}\\ {f}_{2}\left({x}_{2}\right){(-1)}^{{x}_{2}\cdot {x}_{3}}\cdots {(-1)}^{{x}_{k-1}\cdot {x}_{k}}{f}_{k}\left({x}_{k}\right)\,\,\end{array}$$
(1)

with \(x\cdot y=\mathop{\sum }\nolimits_{i=1}^{n}{x}_{i}{y}_{i}\). We first introduce a classification problem based on the k-FORRELATION promise problem including a compact data encoding scheme. Correctly classifying such a data set requires an algorithm with PROMISEBQP-complete complexity.

We then show that this classification problem can be solved efficiently and with arbitrary accuracy by both quantum-enhanced classification algorithms: VQC and QSVM, which are reviewed in detail in the Methods section. Therefore, the resulting classification models solve the k-FORRELATION problem in the PROMISEBQP setting and can represent any algorithm to solve all PROMISEBQP problems. In other words, we show that these quantum-enhanced classification algorithms are of PROMISEBQP-complete expressive power.

k-FORRELATION classification data set

We formulate a classification problem with the same complexity as the k-FORRELATION problem. Generally, given a promise problem \({{\Pi }}=({{{\Pi }}}_{+},{{{\Pi }}}_{-})\), one can obtain a data set \({\mathcal{D}}={\{{{\boldsymbol{x}}}_{i},{y}_{i}\}}_{i\in \{1,\ldots,m\}}\) by encoding \(m={m}_{+}+{m}_{-}\) instances from Π into input vectors xi, where the \({m}_{+}\) instances sampled from \({{{\Pi }}}_{+}\) are labeled with class yi = + 1 whereas the \({m}_{-}\) instances sampled from \({{{\Pi }}}_{-}\) are labeled with class yi = − 1. Deriving a data set based on the k-FORRELATION problem is not straightforward since the problem instances in \({{{\Pi }}}_{+}\cup {{{\Pi }}}_{-}\) consist of k-tuples of Boolean functions with n-bit inputs, for which the description length needed to encode an instance generally grows exponentially in n. Specifically, an arbitrary n-bit Boolean function needs \({2}^{n}\) bits to encode the evaluation outcome for the \({2}^{n}\) possible inputs. Since a k-FORRELATION instance incorporates k such functions, the resulting data set would have dimensionality \(k{2}^{n}\).

We use the restriction that each Boolean function fi depends on at most three input bits; k-FORRELATION remains PROMISEBQP-complete under this restriction as long as at least one function depends on exactly three bits26. More specifically, each function can be restricted to be either constant fi(x) = 1 or of the form \({f}_{i}(x)={(-1)}^{{C}_{i}(x)}\) where Ci(x) is a product of at most three bits. This enables one to encode a k-FORRELATION instance using up to three indices per function fi that indicate the input bits involved in the product Ci(x), or no index at all for the constant function fi(x) = 1. We propose an explicit and practically effective multi-hot encoding scheme. Each function fi can be represented by an n-dimensional binary vector where a 1 in the j-th component indicates that the j-th input bit xj is incorporated in the product Ci(x). The constant function fi(x) = 1 can be encoded as the zero vector. For example, with n = 3, the k = 3 Boolean functions \({f}_{1}(x)={(-1)}^{{x}_{1}{x}_{3}}\), f2(x) = + 1 and \({f}_{3}(x)={(-1)}^{{x}_{2}}\) would be encoded as x = (1, 0, 1, 0, 0, 0, 0, 1, 0). The resulting encoding of a k-FORRELATION instance has length kn, so the data dimensionality is linear in k and, since k = poly(n), polynomial in n instead of exponential in n.
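The multi-hot encoding can be sketched in a few lines. The helper names `encode_instance` and `decode_instance`, and the convention of describing each function by a tuple of 1-based bit indices (an empty tuple for the constant function), are assumptions made for this illustration only.

```python
from typing import List, Sequence, Tuple


def encode_instance(functions: Sequence[Sequence[int]], n: int) -> List[int]:
    """Concatenate one n-dimensional multi-hot vector per Boolean function.

    Each function is given by the 1-based indices of the bits in C_i(x);
    an empty sequence stands for the constant function f_i(x) = 1.
    """
    x: List[int] = []
    for bits in functions:
        if len(bits) > 3 or len(set(bits)) != len(bits):
            raise ValueError("each function may use at most three distinct bits")
        if any(not 1 <= j <= n for j in bits):
            raise ValueError("bit index out of range")
        block = [0] * n
        for j in bits:
            block[j - 1] = 1
        x.extend(block)
    return x


def decode_instance(x: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Recover the index tuples from a multi-hot sample of length k*n."""
    return [
        tuple(j + 1 for j in range(n) if x[start + j])
        for start in range(0, len(x), n)
    ]


# The example from the text: f1 uses x1*x3, f2 is constant, f3 uses x2.
example = encode_instance([(1, 3), (), (2,)], n=3)
```

The resulting vector has length kn, which makes the polynomial data dimensionality explicit.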

Aaronson and Ambainis26 established the quantum algorithm to solve the k-FORRELATION problem with a constant query complexity by encoding the Boolean functions fi into unitary transformations with \({U}_{{f}_{i}}\left|x\right\rangle={f}_{i}(x)\left|x\right\rangle \ \forall x\in {\{0,1\}}^{n}\), which are diagonal in the computational basis, and applying them successively to the initial state \({\left|0\right\rangle }^{\otimes n}\) with leading and subsequent Hadamard gates (H). The full quantum circuit can be represented as

$${U}_{F}={H}^{\otimes n}{U}_{{f}_{k}}{H}^{\otimes n}\ldots {H}^{\otimes n}{U}_{{f}_{1}}{H}^{\otimes n}.$$
(2)

Note that fi(x) = 1 produces an identity map \({U}_{{f}_{i}}=I\), while \({f}_{i}(x)={(-1)}^{C(x)}\) with the product C(x) comprising one, two and three bits induces Z, controlled-Z and controlled-controlled-Z gates, respectively, which cause a relative phase-flip conditioned on the values of up to three qubits27. In the final state \({U}_{F}\left|{0}^{n}\right\rangle\), the amplitude of state \({\left|0\right\rangle }^{\otimes n}\) is equal to \({{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}\), which can therefore be estimated by measurements in the computational basis to decide the k-FORRELATION problem.
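The statement that the amplitude of \({\left|0\right\rangle }^{\otimes n}\) equals \({{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}\) can be checked numerically for small n. The following is a minimal classical state-vector sketch, not an efficient algorithm: it builds the circuit of Eq. (2) with dense matrices and compares the result with a direct evaluation of Eq. (1). The function names and the qubit ordering (bit xj stored as bit j − 1 of the basis index) are assumptions of this sketch.

```python
import itertools

import numpy as np


def phase_diagonal(bits, n):
    """Diagonal of U_f for f(x) = (-1)^(product of x_j, j in bits); () gives identity."""
    diag = np.ones(2 ** n)
    for z in range(2 ** n):
        if bits and all((z >> (j - 1)) & 1 for j in bits):
            diag[z] = -1.0
    return diag


def hadamard_layer(n):
    """Dense matrix of H applied to each of the n qubits."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out


def forrelation_state(functions, n):
    """Apply H, U_f1, H, ..., U_fk, H to |0...0> as in Eq. (2)."""
    H = hadamard_layer(n)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    state = H @ state
    for bits in functions:
        state = H @ (phase_diagonal(bits, n) * state)
    return state


def forrelation_bruteforce(functions, n):
    """Evaluate Eq. (1) by summing over all k-tuples of n-bit strings."""
    k = len(functions)
    diags = [phase_diagonal(bits, n) for bits in functions]
    total = 0.0
    for xs in itertools.product(range(2 ** n), repeat=k):
        term = 1.0
        for i, x in enumerate(xs):
            term *= diags[i][x]
            if i + 1 < k:
                # (-1)^(x_i . x_{i+1}) via the parity of the bitwise AND
                term *= (-1) ** bin(x & xs[i + 1]).count("1")
        total += term
    return total / 2 ** ((k + 1) * n / 2)


functions = [(1, 3), (), (2,)]  # the example instance from the encoding paragraph
amp = forrelation_state(functions, n=3)[0]
phi = forrelation_bruteforce(functions, n=3)
```

Both routes cost time exponential in n here; the point of the quantum circuit is that the same amplitude is prepared with k + 1 Hadamard layers and k diagonal gates.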

We use the feature map \(\left|{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})\right\rangle={U}_{{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})}{\left|0\right\rangle }^{\otimes n}={U}_{F({{{{{{{\boldsymbol{x}}}}}}}})}{\left|0\right\rangle }^{\otimes n}\) where UF(x) is defined by Eq. (2) under the k Boolean functions encoded in the data sample x. We show that when used for VQC and for kernel construction in QSVM, this feature map leads to classification models that predict the correct class associated with the k-FORRELATION instance encoded in the data sample x. This classification can be made arbitrarily accurate by increasing the number of measurements estimating the probability of \({\left|0\right\rangle }^{\otimes n}\) and is perfect given the exact measurement probability.

k-FORRELATION training data

We now show how to generate positive and negative training samples \({{\boldsymbol{x}}}^{+}\) and \({{\boldsymbol{x}}}^{-}\) of a classification problem for VQC and QSVM such that the quantum state \(\left|{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}^{\pm })\right\rangle={U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{\pm })}{\left|0\right\rangle }^{\otimes n}\) produced by circuit (2) in the feature map or quantum kernel corresponds to the positive class sample if all qubits are in state \(\left|0\right\rangle\) and the negative class sample if they are in another computational basis state \(\left|z\right\rangle\) with \(0 < z < {2}^{n}\). To do this, we use the following theorem, which is proven in the Methods section:

Theorem 1

(odd-k-FORRELATION) Explicit k-FORRELATION remains PROMISEBQP-complete when k is restricted to odd k ≥ 3.

First, we show how to obtain a positive sample x+ such that the initial state is preserved under circuit (2), i.e., \({U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{+})}{\left|0\right\rangle }^{\otimes n}={\left|0\right\rangle }^{\otimes n}\). For odd k Boolean functions, circuit (2) includes k + 1 Hadamard gates, an even number. For all fi(x) = + 1, the initial state is preserved since \({U}_{{f}_{i}}={{{{{{{\rm{I}}}}}}}}\) and the resulting pairs of successive Hadamard gates annihilate. To fulfill the condition that at least one Boolean function must depend on exactly three bits, we choose, without loss of generality, the first and third Boolean functions to be \({f}_{1}(x)={f}_{3}(x)={(-1)}^{{x}_{i}{x}_{j}{x}_{l}}\). With this choice,

$${{{{{{{{\rm{H}}}}}}}}}^{\otimes n}{{{{{{{{\rm{U}}}}}}}}}_{{f}_{3}}{{{{{{{{\rm{H}}}}}}}}}^{\otimes n}{{{{{{{\rm{I}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}^{\otimes n}{{{{{{{{\rm{U}}}}}}}}}_{{f}_{1}}{{{{{{{{\rm{H}}}}}}}}}^{\otimes n}={{{{{{{{\rm{H}}}}}}}}}^{\otimes n}{{{{{{{{\rm{U}}}}}}}}}_{{f}_{3}}{{{{{{{{\rm{U}}}}}}}}}_{{f}_{1}}{{{{{{{{\rm{H}}}}}}}}}^{\otimes n}={{{{{{{\rm{I}}}}}}}}$$
(3)

since \({f}_{1}(x){f}_{3}(x)={(-1)}^{2{x}_{i}{x}_{j}{x}_{l}}=1\). The positive sample x+ encoding these functions gives \({U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{+})}{\left|0\right\rangle }^{\otimes n}={\left|0\right\rangle }^{\otimes n}\).

Second, we proceed with generating a negative sample \({{\boldsymbol{x}}}^{-}\) for which circuit (2) maps \({\left|0\right\rangle }^{\otimes n}\) to a different computational basis state, i.e., \({U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{-})}{\left|0\right\rangle }^{\otimes n}=\left|z\right\rangle\) with \(0 < z < {2}^{n}\). Observe that the unitary \({U}_{{f}_{i}}\) with \({f}_{i}(x)={(-1)}^{{x}_{j}}\) implements a Pauli-Z gate, which resolves to the Pauli-X gate when sandwiched by Hadamard gates, HZH = X. This flip of qubit j transforms the initial state into another computational basis state \(\left|z\right\rangle\) with zj = 1. Without loss of generality, we fix i = 1 and choose a subsequent f2(x) fulfilling the three-qubit dependence condition for PROMISEBQP-completeness so that all the following k − 1 Hadamard gates, an even number, pairwise annihilate when the remaining l > 2 functions are constant fl(x) = 1. Thus, f2(x) might only cause a global phase-flip on \(\left|z\right\rangle\), which can be ignored, and preserves the non-zero basis state of qubit j such that \({U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{-})}{\left|0\right\rangle }^{\otimes n}=|{2}^{j-1}\rangle \ne \left|0\right\rangle\).
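The two constructions can be verified with a small state-vector simulation. The sketch below is an illustration under stated assumptions: the helper names, the default choice of the bit triple (1, 2, 3), and the little-endian indexing (flipping qubit j yields basis index 2^(j−1)) are not taken from the original work.

```python
import numpy as np


def apply_circuit(x, n):
    """Simulate Eq. (2) for a multi-hot sample x of length k*n; returns U_F(x)|0...0>."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.kron(H, h)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    state = H @ state
    for start in range(0, len(x), n):
        block = x[start:start + n]
        mask = sum(bit << j for j, bit in enumerate(block))
        if mask:
            # phase flip on basis states in which all selected bits are 1
            flips = np.array([(z & mask) == mask for z in range(2 ** n)])
            state = np.where(flips, -state, state)
        state = H @ state
    return state


def positive_sample(n, k, triple=(1, 2, 3)):
    """f1 = f3 = (-1)^(x_i x_j x_l); all other functions are constant."""
    assert k % 2 == 1 and k >= 3 and n >= 3
    x = [0] * (k * n)
    for f in (0, 2):
        for j in triple:
            x[f * n + j - 1] = 1
    return x


def negative_sample(n, k, flip=1, triple=(1, 2, 3)):
    """f1 = (-1)^(x_flip), f2 = (-1)^(x_i x_j x_l); remaining functions are constant."""
    assert k % 2 == 1 and k >= 3 and n >= 3
    x = [0] * (k * n)
    x[flip - 1] = 1
    for j in triple:
        x[n + j - 1] = 1
    return x


state_pos = apply_circuit(positive_sample(n=3, k=3), n=3)
state_neg = apply_circuit(negative_sample(n=4, k=5, flip=2), n=4)
```

In this sketch `state_pos` concentrates on index 0 and `state_neg` on index 2^(flip−1), matching the argument in the text that the remaining even number of Hadamard layers cancels.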

Universal expressiveness of VQC

We first present the proof for VQC. The VQC model5 uses a feature map to encode the input data x into an n-qubit quantum state \(\left|{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})\right\rangle={U}_{{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})}{\left|0\right\rangle }^{\otimes n}\) followed by a parameterized quantum circuit W(θ). A decision rule, involving an additional bias term \(b\in [-1,1]\), enables classification by estimating the binary measurement probability

$${p}_{\pm 1}({{{{{{{\boldsymbol{x}}}}}}}})=\left\langle {{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})\middle|{W}^{{{{\dagger}}} }({{{{{{{\boldsymbol{\theta }}}}}}}}){M}_{\pm 1}W({{{{{{{\boldsymbol{\theta }}}}}}}})\middle|{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})\right\rangle$$
(4)

to classify x as positive if

$${p}_{+1}({{{{{{{\boldsymbol{x}}}}}}}}) \; > \; \frac{1}{2}(1-b)$$
(5)

or negative otherwise.

Proof

We use proof by reduction where our goal is to find the decision rule (5) to predict class +1 for each instance of the k-FORRELATION problem if and only if it is positive, \({\boldsymbol{x}}\in {{{\Pi }}}_{+}\). We start with a data sample x that encodes the functions f1, …, fk and note that the choice of k-FORRELATION feature map UΦ(x) = UF(x), observable \({M}_{+1}={\left|0\right\rangle }^{\otimes n}{\left\langle 0\right|}^{\otimes n}\) and parameters θ such that W(θ) = I leads to

$${p}_{+1}({{{{{{{\boldsymbol{x}}}}}}}})=\left | {\left\langle 0 \middle|^{\otimes n}{U}_{F({{{{{{{\boldsymbol{x}}}}}}}})}\middle|0\right\rangle }^{\otimes n}\right | ^{2}=\left | {{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}\right | ^{2}.$$
(6)

For the two possible classes for a data sample x, two bounds to b can be derived as follows:

  • If x belongs to class + 1: \({{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}\ge 3/5\) holds and, therefore, \(|{{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}{|}^{2}\ge {(3/5)}^{2}=9/25\), which, when inserted into the decision rule (5), yields

    $${p}_{+1}({{{{{{{\boldsymbol{x}}}}}}}})\;\ge\; \frac{9}{25} \; > \; \frac{1}{2}(1-b).$$
    (7)

    This only holds if b is chosen to be greater than 7/25.

  • If x belongs to class − 1: \(|{{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}|\le 1/100\) holds and, therefore, \(|{{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}{|}^{2}\le {(1/100)}^{2}=1/10000\). As the decision rule (5) must be violated, i.e., p+1(x) < (1 − b)/2 for a negative sample x, a second condition can be derived as

    $${p}_{+1}({{{{{{{\boldsymbol{x}}}}}}}}) \;\le\; \frac{1}{10000} \; < \; \frac{1}{2}(1-b).$$
    (8)

    This only holds if b is chosen to be less than 4999/5000.

Thus, the VQC decision rule (5) with the choice of \(b\in \left(\frac{7}{25},\frac{4999}{5000}\right)\) decides the k-FORRELATION problem. The existence of values of θ and especially b that allow separation of the two classes was not a priori guaranteed. The demonstration of their existence ensures that VQC has PROMISEBQP-complete expressive power. We note again that the transformation from k-FORRELATION to VQC is polynomial in time. □
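The interval for b follows from rearranging the decision rule (5) at the two promise thresholds, which can be reproduced with exact rational arithmetic. The function name `vqc_predict` and the test value b = 1/2 are illustrative assumptions.

```python
from fractions import Fraction


def vqc_predict(p_plus: Fraction, b: Fraction) -> int:
    """Decision rule (5): class +1 if p_{+1}(x) > (1 - b) / 2, else class -1."""
    return 1 if p_plus > (1 - b) / 2 else -1


p_pos_min = Fraction(3, 5) ** 2    # smallest p_{+1} for a positive instance
p_neg_max = Fraction(1, 100) ** 2  # largest p_{+1} for a negative instance

# p_pos_min > (1 - b)/2  <=>  b > 1 - 2 * p_pos_min
b_lower = 1 - 2 * p_pos_min
# p_neg_max < (1 - b)/2  <=>  b < 1 - 2 * p_neg_max
b_upper = 1 - 2 * p_neg_max

b = Fraction(1, 2)  # any value strictly inside (b_lower, b_upper)
labels = (vqc_predict(p_pos_min, b), vqc_predict(p_neg_max, b))
```

Because the bounds are evaluated at the worst-case probabilities, any b in the open interval classifies every instance satisfying the promise correctly, given the exact measurement probability.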

Universal expressiveness of QSVM

We now present the proof for QSVM. The QSVM approach uses a quantum computer to estimate the kernel function

$$k({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}},{{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})=\left | \left\langle {{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}})\middle|{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})\right\rangle \right | ^{2}=\left | \left\langle 0 \middle | ^{\otimes n}{U}_{{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}})}^{{{{\dagger}}} }{U}_{{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})}\middle | 0\right\rangle^{\otimes n}\right | ^{2}$$
(9)

which is then used when solving the SVM dual problem5 classically:

$$\mathop{{{{{{{{\rm{maximize}}}}}}}}}\limits_{{{{{{{{\boldsymbol{\alpha }}}}}}}}}\quad \mathop{\sum }\limits_{i=1}^{m}{\alpha }_{i}-\frac{1}{2}\mathop{\sum }\limits_{i=1,j=1}^{m}{\alpha }_{i}{\alpha }_{j}{y}_{i}{y}_{j}k({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}},{{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})$$
(10)
$${{{{{{{\rm{s.t.}}}}}}}}\qquad 0\le {{{{{{{\boldsymbol{\alpha }}}}}}}}\le C,\qquad 0=\mathop{\sum }\limits_{i=1}^{m}{\alpha }_{i}{y}_{i}.$$
(11)

The decision rule for an unseen (test) data sample s, involving an additional bias term \(b\in [-1,1]\), is then

$$m({{{{{{{\boldsymbol{s}}}}}}}})={{{{{{\mathrm{sign}}}}}}}\,\left(\mathop{\sum }\limits_{i=1}^{m}{\alpha }_{i}{y}_{i}k({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}},{{{{{{{\boldsymbol{s}}}}}}}})+b\right).$$
(12)

Proof

We use proof by reduction to show that QSVM can have PROMISEBQP-complete expressive power. The constraints of the dual optimization problem in Eq. (11) imply that at least two training samples, one from each class, must be provided. Therefore, we consider m = 2 training samples and choose the positive training sample \({{\boldsymbol{x}}}_{1}={{\boldsymbol{x}}}^{+}\) with y1 = + 1 and the negative training sample \({{\boldsymbol{x}}}_{2}={{\boldsymbol{x}}}^{-}\) with y2 = − 1 as defined above. The equality constraint in Eq. (11) yields

$$0={\alpha }_{1}{y}_{1}+{\alpha }_{2}{y}_{2}={\alpha }_{1}-{\alpha }_{2}\ \iff \ {\alpha }_{1}={\alpha }_{2}.$$
(13)

We set α = α1 = α2, which simplifies the dual optimization problem to a one-dimensional optimization constrained to the interval 0 ≤ α ≤ C. Since [0, C] is a closed and bounded (i.e., compact) interval and the objective function is continuous and concave, the Weierstrass extreme value theorem guarantees a maximum on this interval. We thus consider α to be the optimal solution, which is guaranteed to be non-negative and can be determined in closed form in terms of the kernel function evaluated at the two training samples k(x1, x2).
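To make the closed form concrete: with α1 = α2 = α, y1 = +1, y2 = −1 and k(x, x) = 1 for normalized feature states (which follows from Eq. (9)), the objective of Eq. (10) becomes 2α − α²(1 − k12), where k12 = k(x1, x2). Setting the derivative to zero and clipping to [0, C] gives the maximizer. The following sketch works this out; the function names are assumptions, and the handling of the degenerate case k12 = 1 is a choice made here for completeness.

```python
def dual_objective(alpha: float, k12: float) -> float:
    """Eq. (10) for m = 2, alpha_1 = alpha_2 = alpha, y = (+1, -1), k(x, x) = 1."""
    return 2.0 * alpha - alpha ** 2 * (1.0 - k12)


def optimal_alpha(k12: float, C: float) -> float:
    """Maximizer of the concave one-dimensional objective on [0, C]."""
    if k12 >= 1.0:
        return C  # objective reduces to 2*alpha, which is increasing
    return min(C, 1.0 / (1.0 - k12))


# For the training pair in the text, the two feature states are orthogonal
# computational basis states, so k12 = 0 and alpha = min(C, 1).
alpha_star = optimal_alpha(k12=0.0, C=10.0)
```

For the training pair used in the proof, this gives α = 1 whenever C ≥ 1.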

As shown earlier, the two training samples produce \({U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{+})}{\left|0\right\rangle }^{\otimes n}={\left|0\right\rangle }^{\otimes n}\) and \({U}_{F({{{{{{{{\boldsymbol{x}}}}}}}}}^{-})}{\left|0\right\rangle }^{\otimes n}=\left|z\right\rangle\) with \(z\ne {0}^{n}\) when the k-FORRELATION feature map using circuit (2) is applied. Using the k-FORRELATION feature map to construct the kernel, the prediction mapping in Eq. (12) of QSVM for a (test) data sample s can now be simplified as

$$m({{{{{{{\boldsymbol{s}}}}}}}})={{{{{{{\rm{sign}}}}}}}}\left(\alpha \left(k({{{{{{{{\boldsymbol{x}}}}}}}}}^{+},{{{{{{{\boldsymbol{s}}}}}}}})-k({{{{{{{{\boldsymbol{x}}}}}}}}}^{-},{{{{{{{\boldsymbol{s}}}}}}}})\right)+b\right)$$
(14)
$$={{{{{{{\rm{sign}}}}}}}}\left(\alpha \left(\left | \left\langle {0}^{n}\middle|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}\middle|{0}^{n}\right\rangle \right | ^{2}-\left | \left\langle z \middle|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}\middle|{0}^{n}\right\rangle \right | ^{2}\right)+b\right).$$
(15)

Here, the two required quantum kernel function estimates correspond to the probabilities to observe the bit-strings \({0}^{n}\) and z in the state produced by the k-FORRELATION quantum circuit \({U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}{\left|0\right\rangle }^{\otimes n}\) upon measurement in the computational basis.

For the two possible cases ± 1 of a test sample s, two bounds can be derived for the argument in Eq. (15):

  • If s belongs to class + 1: The measurement probability \(|\langle {0}^{n}|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}|{0}^{n}\rangle {|}^{2}\) is the absolute squared forrelation quantity \(|{{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}{|}^{2}\) corresponding to the k-FORRELATION instance encoded in s, which is \(|\langle {0}^{n}|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}|{0}^{n}\rangle {|}^{2}\ge {(3/5)}^{2}\) in this case. Since the probabilities have to add up to one, every other n-bit bit-string \(z\ne {0}^{n}\) can only be observed with a probability of at most \(1-{(3/5)}^{2}=16/25\), i.e., \(|\left\langle z\right|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}\left|{0}^{n}\right\rangle {|}^{2}\le 16/25\). These observations yield a lower bound of

    $$\alpha \Big(\underbrace{\left | \left\langle {0}^{n}\middle|{U}_{F({{{{{{{\boldsymbol{s}}}}}}})}}\middle|{0}^{n}\right\rangle \right | ^{2}}_{\begin{array}{c}\ge {\left(\frac{3}{5}\right)}^{2}\end{array}}+\underbrace{(-1)\left | \left\langle z \middle|{U}_{F({{{{{{{\boldsymbol{s}}}}}}})}}\middle|{0}^{n}\right\rangle \right | ^{2}}_{\begin{array}{c}\scriptstyle\ge -\frac{16}{25}\end{array}}\Big)+b \;\ge\; -\frac{7}{25}\alpha+b.$$
    (16)

    Inserting this bound into m(s), we see that it evaluates to m(s) = + 1 provided b is chosen to be greater than 7α/25.

  • If s belongs to class − 1: Analogously to the previous case, it is known that \(|\langle {0}^{n}|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}|{0}^{n}\rangle {|}^{2}\le {(1/100)}^{2}\) and, therefore, \(|\langle z|{U}_{F({{{{{{{\boldsymbol{s}}}}}}}})}|{0}^{n}\rangle {|}^{2}\ge 1-{(1/100)}^{2}=9999/10000\) for any \(z\ne {0}^{n}\). Then, the upper bound is

    $$\alpha \Big(\underbrace{\left | \left\langle {0}^{n} \middle|{U}_{F({{{{{{{\boldsymbol{s}}}}}}})}}\middle|{0}^{n}\right\rangle \right | ^{2}}_{\begin{array}{c}\le {\left(\frac{1}{100}\right)}^{2}\end{array}}+\underbrace{(-1)\left | \left\langle z \middle|{U}_{F({{{{{{{\boldsymbol{s}}}}}}})}} \middle|{0}^{n}\right\rangle \right| ^{2}}_{\begin{array}{c}\le -\frac{9999}{10000}\end{array}}\Big)+b \;\le\; -\frac{4999}{5000}\alpha+b,$$
    (17)

    and b must be smaller than 4999α/5000, which then guarantees that m(s) in Eq. (15) evaluates to −1.

Thus, setting \(b\in \left(\frac{7}{25}\alpha,\frac{4999}{5000}\alpha \right)\) guarantees the correct evaluation of the classification mapping m(s) for both cases. Again, the existence of b that yields the SVM separating the two classes was not a priori guaranteed. That such an interval exists ensures that QSVM has PROMISEBQP-complete expressive power. □
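The arithmetic behind the two bounds in Eqs. (16) and (17) can be reproduced with exact fractions. This sketch only checks the numbers as stated in the proof; the function name `qsvm_decision`, the convention sign(0) = −1, and the test values α = 1, b = 1/2 are assumptions of the illustration.

```python
from fractions import Fraction


def qsvm_decision(p_zero: Fraction, p_z: Fraction, alpha: Fraction, b: Fraction) -> int:
    """Eq. (15): sign(alpha * (p_zero - p_z) + b), with sign(0) taken as -1 here."""
    return 1 if alpha * (p_zero - p_z) + b > 0 else -1


# Worst-case kernel differences quoted in Eqs. (16) and (17)
lower_positive = Fraction(3, 5) ** 2 - Fraction(16, 25)          # class +1
upper_negative = Fraction(1, 100) ** 2 - Fraction(9999, 10000)   # class -1

alpha = Fraction(1)
b = Fraction(1, 2)  # lies in (7/25 * alpha, 4999/5000 * alpha)

label_pos = qsvm_decision(Fraction(9, 25), Fraction(16, 25), alpha, b)
label_neg = qsvm_decision(Fraction(1, 10000), Fraction(9999, 10000), alpha, b)
```

The interval scales linearly with α, so a different optimal α from the dual problem only rescales the admissible range of b.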

k-FORRELATION fixed ansatz

Finally, we show that circuit (2) used in the feature map or quantum kernel can be implemented using a parameterized quantum circuit with a fixed ansatz, which is typically used in QML. With a single Boolean function fi in the multi-hot encoding x, the indices \(j\in \{1,\ldots,n\}\) where xj = 1 determine the target and control qubits of Z gates. To obtain a fixed ansatz, all possible qubit combinations to apply Z gates, controlled-Z gates and controlled-controlled-Z gates in (2) need to be covered. There are \(\left({n}\atop{1}\right)=n\in {{{{{{{\mathcal{O}}}}}}}}(n)\), \(\left({n}\atop{2}\right)=n(n-1)/2\in {{{{{{{\mathcal{O}}}}}}}}({n}^{2})\), \(\left({n}\atop{3}\right)=n(n-1)(n-2)/6\in {{{{{{{\mathcal{O}}}}}}}}({n}^{3})\) possible qubit choices, respectively, due to the gate symmetry27. Instead of a (controlled-) Z gate, a (controlled) rotation about the Z axis RZ(λ) by angle parameter λ can be applied as it is equivalent to identity if λ = 0 and to the (controlled-) Z gate if λ = π. For a controlled rotation gate applied to the qubits \(J\subseteq \{1,\ldots,n\}\), the sample x determines λ as

$$\lambda=\pi \mathop{\prod}\limits_{j\in J}{x}_{j}\mathop{\prod}\limits_{l\in \{1,\ldots,n\}\setminus J}(1-{x}_{l})$$
(18)

which gives λ = 0 in all (controlled) rotation gates except λ = π for the one that implements fi encoded in x.
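Equation (18) can be evaluated for every gate of the fixed ansatz belonging to one Boolean function. The sketch below enumerates all qubit subsets J of size one to three and computes the corresponding angle; the function name `rotation_angles` and the dictionary layout are assumptions for illustration.

```python
import math
from itertools import combinations


def rotation_angles(block, max_controls=3):
    """Evaluate Eq. (18) for every subset J of 1 to 3 qubits (1-based indices).

    `block` is the n-dimensional multi-hot vector of a single function f_i.
    """
    n = len(block)
    angles = {}
    for size in range(1, max_controls + 1):
        for J in combinations(range(1, n + 1), size):
            selected = math.prod(block[j - 1] for j in J)
            others = math.prod(1 - block[l - 1] for l in range(1, n + 1) if l not in J)
            angles[J] = math.pi * selected * others
    return angles


# f_i(x) = (-1)^(x1 * x3) on n = 4 qubits
angles = rotation_angles([1, 0, 1, 0])
active = [J for J, lam in angles.items() if lam != 0]
```

Only the gate acting on exactly the encoded qubits receives λ = π, and a zero vector (the constant function) switches every gate off.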

For k functions, the fixed ansatz requires \({{{{{{{\mathcal{O}}}}}}}}(k{n}^{3})\) gates. This shows that the expressiveness of VQC and QSVM proven here can be achieved using parameterized quantum circuits with fixed ansatz of polynomial depth since k = poly(n). This result is important considering that VQC and QSVM are generally implemented using circuits with fixed ansatz5,6,7. However, embedding the data directly through circuit (2) by applying a single (controlled) Z gate to the respective qubits, which is no longer a fixed ansatz, results in shallower circuits of depth O(k).

Discussion

The present work demonstrates that the feature map of VQC and the quantum kernels of QSVM can be used to solve the classification problem with the complexity of the k-FORRELATION problem that has previously been proven to be PROMISEBQP-complete. This means that it is possible to design the feature map of VQC and the quantum kernel of QSVM for any classification problem derived from any promise problem in PROMISEBQP. Because PROMISEBQP includes all decision problems in BQP as a special case, our results imply that it is possible to design the feature map of VQC and the quantum kernel of QSVM that solve any classification problem derived from any decision problem in BQP. If BQP ≠ BPP (classical bounded error probabilistic polynomial time), as required for exponential speed-up of quantum computing to exist, our results imply that VQC and QSVM must have quantum advantage over classical classifiers.

According to Havlíček et al.5, every problem that can be solved by VQC can also be solved by QSVM, but the reverse does not generally hold. This connection is detailed in Schuld7 and briefly outlined here. QSVM can be seen as VQC with an optimal measurement, i.e., W(θ) with an optimal ansatz and parameters, since W(θ) effectively changes the measurement basis. Generally, a fixed ansatz in W(θ) requires \({{{{{{{\mathcal{O}}}}}}}}({2}^{{2}^{n}})\) degrees of freedom to express arbitrary measurements. In QSVM, this reduces to an m-dimensional optimization problem because, in the SVM dual view, measurements (↔ separating hyperplane) become expansions in the training data (↔ support vectors). Due to the concavity of the objective in Eq. (10), this problem is solved optimally given the kernel values k(xi, xj) for all pairs of training data points. Therefore, QSVM is guaranteed to find solutions that are at least as good as those of VQC. In the present work, we show that both VQC and QSVM can solve a classification problem based on the k-FORRELATION problem, which implies that VQC and QSVM have an equivalent (universal) expressiveness from a computational complexity theory point of view.

Methods

Quantum-enhanced classification algorithms

The two most common, and related, approaches to solving classification problems with quantum computers are VQC and QSVM5, schematically depicted in Fig. 2. The VQC model first uses a feature map to encode the input data x into an n-qubit quantum state by a unitary transformation of the initial state \({\left|0\right\rangle }^{\otimes n}\): \(\left|{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})\right\rangle={U}_{{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})}{\left|0\right\rangle }^{\otimes n}\). Subsequently, a parameterized quantum circuit W(θ) transforms the states to enable classification by a quantum measurement. The parameters θ and an additional bias term \(b\in [-1,1]\) are learned by classical optimization. A binary measurement probability

$${p}_{\pm 1}({{{{{{{\boldsymbol{x}}}}}}}})=\left\langle {{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}}) \middle|{W}^{{{{\dagger}}} }({{{{{{{\boldsymbol{\theta }}}}}}}}){M}_{\pm 1}W({{{{{{{\boldsymbol{\theta }}}}}}}}) \middle|{{\Phi }}({{{{{{{\boldsymbol{x}}}}}}}})\right\rangle$$
(19)

is estimated to classify x as positive if

$${p}_{+1}({{{{{{{\boldsymbol{x}}}}}}}}) > \frac{1}{2}(1-b)$$
(20)

or as negative otherwise, where the two projectors are chosen as

$${M}_{\pm }=\frac{1}{2}\left(I\pm \mathop{\sum }\limits_{z=0}^{{2}^{n}-1}{h}_{z}\left|z\right\rangle \left\langle z\right|\right)$$
(21)

with arbitrary but fixed coefficients \({h}_{z}\in \{-1,1\}\).
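That Eq. (21) defines a valid two-outcome projective measurement for any choice of signs hz can be checked directly: the diagonal matrix of the hz squares to the identity, so both operators are idempotent and sum to the identity. A minimal numerical sketch, with the function name and the example signs assumed for illustration:

```python
import numpy as np


def measurement_projectors(h):
    """Build M_+ and M_- of Eq. (21) from coefficients h_z in {-1, +1}."""
    h = np.asarray(h, dtype=float)
    if not np.all(np.abs(h) == 1.0):
        raise ValueError("all coefficients must be -1 or +1")
    D = np.diag(h)          # sum_z h_z |z><z|
    identity = np.eye(len(h))
    return (identity + D) / 2, (identity - D) / 2


# Example for n = 2 qubits (four basis states); the signs are arbitrary.
M_plus, M_minus = measurement_projectors([1, -1, -1, 1])
```

M_plus projects onto the basis states with hz = +1 and M_minus onto those with hz = −1.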

Fig. 2: Quantum circuits used in the quantum-enhanced classification algorithms.

Diagrams of quantum circuits for (a) variational quantum classifiers (VQCs) and (b) quantum kernel support vector machines (QSVMs).

The QSVM approach uses a quantum computer to estimate the kernel function k(xi, xj) that is then used in the dual problem5:

$$\mathop{{{{{{{{\rm{maximize}}}}}}}}}\limits_{{{{{{{{\boldsymbol{\alpha }}}}}}}}}\quad \mathop{\sum }\limits_{i=1}^{m}{\alpha }_{i}-\frac{1}{2}\mathop{\sum }\limits_{i=1,j=1}^{m}{\alpha }_{i}{\alpha }_{j}{y}_{i}{y}_{j}k({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}},{{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})$$
(22)
$${{{{{{{\rm{s.t.}}}}}}}}\quad 0\le {{{{{{{\boldsymbol{\alpha }}}}}}}}\le C,\quad 0=\mathop{\sum }\limits_{i=1}^{m}{\alpha }_{i}{y}_{i}.$$
(23)

The optimal solution is obtained classically by efficient quadratic optimization and determines the classification mapping of a (test) data sample s as

$$m({{{{{{{\boldsymbol{s}}}}}}}})={{{{{{\mathrm{sign}}}}}}}\,\left(\mathop{\sum }\limits_{i=1}^{m}{\alpha }_{i}{y}_{i}k({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}},{{{{{{{\boldsymbol{s}}}}}}}})+b\right).$$
(24)

Fig. 2 depicts the quantum circuit to obtain the kernel function

$$k({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}},{{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})=\left | \left\langle {{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}})\middle|{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})\right\rangle \right | ^{2}=\left| {\left\langle 0 \middle|^{\otimes n}{U}_{{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{i}}}}}}}}})}^{{{{\dagger}}} }{U}_{{{\Phi }}({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\boldsymbol{j}}}}}}}}})} \middle|0\right\rangle }^{\otimes n}\right | ^{2}$$
(25)

as the measurement probability of the \({0}^{n}\) bit-string.

FORRELATION

Complexity classes such as P or BQP are defined for decision problems, in which every input necessarily belongs to either the ‘+’ or the ‘–’ instances. If the inputs include a set that corresponds to neither ‘+’ nor ‘–’, the decision problems are generalized to become promise problems28. To make decisions, promise problems consider only inputs from the subsets corresponding to the ‘+/–’ instances (i.e., inputs that are promised to lead to a ‘+’ or ‘–’ decision).

An example of a promise problem is the FORRELATION problem introduced in Aaronson29, and refined and extended in Aaronson and Ambainis26. This problem considers two Boolean functions \(f,g:{\{0,1\}}^{n}\to \{-1,1\}\) where the domain \({\{0,1\}}^{n}\) contains all \({2}^{n}\) n-bit strings, i.e., the integers from 0 to \({2}^{n}-1\) in decimal representation. The quantity

$${{{\Phi }}}_{f,g}:=\frac{1}{{2}^{3n/2}}\mathop{\sum}\limits_{x,y\in {\{0,1\}}^{n}}f(x){(-1)}^{x\cdot y}g(y)$$
(26)

with \(x\cdot y=\mathop{\sum }\nolimits_{i=1}^{n}{x}_{i}{y}_{i}\) determines the amount of correlation between f and the Fourier transform of g, i.e., the “forrelation” of f and g. Analogously to correlation, f and g are said to be “forrelated” if the value \({{\Phi }}_{f,g}\) is large, and not forrelated if it is small.
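For small n, Eq. (26) can be evaluated directly by summing over all pairs of bit strings. The sketch below is a brute-force classical evaluation whose cost grows as \({4}^{n}\); it is meant only to make the definition concrete and is not the quantum algorithm. The helper names and the two example functions are assumptions for illustration.

```python
from itertools import product


def sign_of_dot(x, y):
    """Return (-1)^(x . y) for two bit tuples x and y."""
    return -1 if sum(a * b for a, b in zip(x, y)) % 2 else 1


def forrelation(f, g, n):
    """Brute-force evaluation of Phi_{f,g} as defined in Eq. (26)."""
    points = list(product((0, 1), repeat=n))
    total = sum(f(x) * sign_of_dot(x, y) * g(y) for x in points for y in points)
    return total / 2 ** (1.5 * n)


def const_one(x):
    """Constant Boolean function f(x) = +1."""
    return 1


def cz_sign(x):
    """f(x) = (-1)^(x_1 x_2), which equals its own Fourier transform on two bits."""
    return -1 if x[0] and x[1] else 1


print(forrelation(const_one, const_one, 2))  # 0.5, i.e. 2^(-n/2)
print(forrelation(cz_sign, cz_sign, 2))      # 1.0, maximally forrelated
```

The second call gives the maximal value because the Fourier transform of \({(-1)}^{{x}_{1}{x}_{2}}\) on two bits is the function itself.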

The FORRELATION problem is solvable with a quantum algorithm29 using a single query with an error probability of 2/5, which can be reduced arbitrarily by increasing the query complexity by a constant factor. Therefore, a quantum algorithm exists that solves the problem with error probability ≤1/3 using a constant number of queries, while the circuit implementing the queries remains of polynomial size, which makes it a PROMISEBQP problem26. As any decision problem is a trivial special case of a more general promise problem, the class of PROMISEBQP problems includes BQP entirely, as depicted in Fig. 1.

k-FORRELATION: a PROMISEBQP-complete extension

Aaronson and Ambainis26 extended the FORRELATION problem to the k-FORRELATION problem. Instead of two Boolean functions, k Boolean functions \({f}_{1},\ldots,{f}_{k}:{\{0,1\}}^{n}\to \{-1,1\}\) are considered and the quantity

$$\begin{array}{r}{{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}:=\frac{1}{{2}^{(k+1)n/2}}\mathop{\sum}\limits_{{x}_{1},\ldots,{x}_{k}\in {\{0,1\}}^{n}}{f}_{1}\left({x}_{1}\right){(-1)}^{{x}_{1}\cdot {x}_{2}}\\ {f}_{2}\left({x}_{2}\right){(-1)}^{{x}_{2}\cdot {x}_{3}}\cdots {(-1)}^{{x}_{k-1}\cdot {x}_{k}}{f}_{k}\left({x}_{k}\right)\,\,\end{array}$$
(27)

with \(x\cdot y=\mathop{\sum }\nolimits_{i=1}^{n}{x}_{i}{y}_{i}\) leads to a promise problem:

Definition 1 (k-FORRELATION)

The promise problem \({{\Pi }}=({{\Pi }}_{+},{{\Pi }}_{-})\) over the space of k Boolean functions \({\{0,1\}}^{n}\to \{-1,1\}\) with

  • \(\forall ({f}_{1},\ldots,{f}_{k})\in {{\Pi }}_{+}:{{\Phi }}_{{f}_{1},\ldots,{f}_{k}}\ge \frac{3}{5}\)

  • \(\forall ({f}_{1},\ldots,{f}_{k})\in {{\Pi }}_{-}:|{{\Phi }}_{{f}_{1},\ldots,{f}_{k}}|\le \frac{1}{100}\)

is the k-FORRELATION problem. Here, \({{\Pi }}_{\pm }\) are the sets of ‘+’ and ‘–’ problem instances, respectively, with \({{\Pi }}_{+}\cap {{\Pi }}_{-}=\varnothing\).
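To make Eq. (27) and the promise of Definition 1 concrete, the following sketch evaluates \({{\Phi }}_{{f}_{1},\ldots,{f}_{k}}\) for small n by propagating a vector of partial sums from \({x}_{1}\) to \({x}_{k}\), and then assigns the instance to Π+, Π− or neither. This is a classical brute-force evaluation with cost exponential in n. The function names and the example instances are assumptions for illustration.

```python
from itertools import product


def k_forrelation(funcs, n):
    """Evaluate Phi_{f_1,...,f_k} of Eq. (27) for a list of k functions on n bits."""
    points = list(product((0, 1), repeat=n))
    # Start with v[x_1] = f_1(x_1).
    v = [funcs[0](x) for x in points]
    for f in funcs[1:]:
        # v_new[y] = f(y) * sum_x (-1)^(x . y) * v[x]
        v = [
            f(y) * sum(
                (-1 if sum(a * b for a, b in zip(x, y)) % 2 else 1) * vx
                for x, vx in zip(points, v)
            )
            for y in points
        ]
    k = len(funcs)
    return sum(v) / 2 ** ((k + 1) * n / 2)


def promise_label(phi):
    """Return '+' or '-' according to Definition 1, or None outside the promise."""
    if phi >= 3 / 5:
        return "+"
    if abs(phi) <= 1 / 100:
        return "-"
    return None


def const_one(x):
    return 1


def z_first(x):
    """f(x) = (-1)^(x_1), which depends on a single input bit."""
    return -1 if x[0] else 1


print(promise_label(k_forrelation([const_one, const_one, const_one], 2)))  # '+'
print(promise_label(k_forrelation([z_first, const_one, const_one], 2)))    # '-'
print(promise_label(k_forrelation([const_one, const_one], 2)))             # None
```

The last call gives Φ = 0.5, which lies between the two thresholds and is therefore not covered by the promise.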

This definition generally allows the evaluation of the functions \({f}_{1},\ldots,{f}_{k}\) by oracle queries. Furthermore, for explicit descriptions, which we assume in this work, Aaronson and Ambainis26 proved the following theorem:

Theorem 2 (PROMISEBQP-completeness)

If \({f}_{1},\ldots,{f}_{k}\) are described explicitly (e.g., by circuits to compute them), and k = poly(n), then k-FORRELATION is PROMISEBQP-complete.

Aaronson and Ambainis26 also showed that this still holds when each function either has the form \({f}_{i}(x)={(-1)}^{{C}_{i}(x)}\), where \({C}_{i}(x)\) is a product of at most three input bits, or is chosen constant, \({f}_{i}(x)=1\), while at least one \({f}_{i}(x)\) must depend on exactly three bits of x. Note the crucial difference: k-FORRELATION (under the stated conditions) is not only a PROMISEBQP problem but a PROMISEBQP-complete problem.

odd-k-FORRELATION

Theorem 1 is used for the construction of the data set in the present work. It is restated and proven in the following:

Theorem 1 (odd-k-FORRELATION)

Explicit k-FORRELATION remains PROMISEBQP-complete when k is restricted to odd k ≥ 3.

Proof

By construction, odd-k-FORRELATION is a special case of k-FORRELATION, which trivially implies that odd-k-FORRELATION is in PROMISEBQP. For PROMISEBQP-completeness, it remains to show that odd-k-FORRELATION is PROMISEBQP-hard via a proof by reduction: we provide a polynomial-time mapping from every instance of k-FORRELATION to an instance of odd-k-FORRELATION that preserves the forrelation value Φ, which shows that odd-k-FORRELATION is at least as difficult as k-FORRELATION.

If k is odd in an instance of k-FORRELATION, it is trivially an instance of odd-k-FORRELATION. If k is even in an instance of k-FORRELATION, we add \(4\lceil n/2\rceil -1\) Boolean functions, resulting in the odd number \(k+4\lceil n/2\rceil -1\). The additional functions are chosen such that they are either constant, \(f(x)=+1\), or of the form \(f(x)={(-1)}^{{x}_{i}{x}_{j}}\) with \(i,j\in \{1,\ldots,n\}\), fulfilling the necessary conditions. We show that \({{{\Phi }}}_{{f}_{1},\ldots,{f}_{k}}={{{\Phi }}}_{{f}_{1},\ldots,{f}_{k+4\lceil n/2\rceil -1}}\) as follows.

The proof of Theorem 25 in Aaronson and Ambainis26 uses a gadget applied to two qubits i and j with i ≠ j that converts an even number of \({{\rm{H}}}^{\otimes 2}\) gates into an odd number. Namely,

$${{{{{{{{\rm{H}}}}}}}}}^{\otimes 2}\ {{{{{{{\rm{CZ}}}}}}}}\ {{{{{{{{\rm{H}}}}}}}}}^{\otimes 2}\ {{{{{{{\rm{CZ}}}}}}}}\ {{{{{{{{\rm{H}}}}}}}}}^{\otimes 2}\ {{{{{{{\rm{CZ}}}}}}}}\ {{{{{{{{\rm{H}}}}}}}}}^{\otimes 2}\equiv {{{{{{{\rm{SWAP}}}}}}}}\ {{{{{{{{\rm{H}}}}}}}}}^{\otimes 2}$$
(28)

using three controlled-Z gates (CZ), each of which implements \(f(x)={(-1)}^{{x}_{i}{x}_{j}}\). We apply this gadget successively to \(\lceil n/2\rceil\) non-overlapping pairs of qubits to reproduce the final layer of Hadamard gates. The gadgets require \(3\lceil n/2\rceil\) CZ gates and \(\lceil n/2\rceil -1\) constant functions, so that every fourth of the additional functions produces an identity between two gadgets. In total, an odd number \(4\lceil n/2\rceil -1\) of Boolean functions \({f}_{k+1},\ldots,{f}_{k+4\lceil n/2\rceil -1}\) is added. This extends the problem instance from an even to an odd number of Boolean functions, while keeping the circuit equivalent (under SWAP operations) to the original one defined by even k Boolean functions. In other words, the value Φ is preserved since SWAP operations do not affect the amplitude of \({\left|0\right\rangle }^{\otimes n}\). For the pairwise application of the 2-qubit gadgets in the case of an odd number of qubits n, one can introduce an ancilla qubit in \(\left|0\right\rangle\). The final result remains unaffected as this (n + 1)-th qubit ends up in \(\left|0\right\rangle\) and is, therefore, not entangled.
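The two-qubit gadget identity of Eq. (28) can be checked numerically by multiplying the 4 × 4 matrices. The sketch below does this with plain Python lists; the helper functions `matmul`, `kron` and `allclose` are written out here for illustration and are not taken from any library.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def allclose(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
HH = kron(H, H)  # Hadamard gate on both qubits
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

# Left-hand side of Eq. (28): four Hadamard layers interleaved with three CZ gates.
lhs = HH
for _ in range(3):
    lhs = matmul(matmul(lhs, CZ), HH)

# Right-hand side of Eq. (28): a single Hadamard layer followed by SWAP.
rhs = matmul(SWAP, HH)

print(allclose(lhs, rhs))  # True
```

Since SWAP maps \(\left|00\right\rangle\) to itself, the first rows of `rhs` and `HH` coincide, which is the two-qubit version of the statement that the amplitude of the all-zero state is unaffected.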