Introduction

Quantum computing opens up exciting new prospects for quantum advantages in machine learning in terms of sample and computational complexity1,2,3,4,5. One of the foundations of these quantum advantages is the ability to form and manipulate data efficiently in a large quantum feature space, especially with kernel functions used in classification and other classes of machine learning6,7,8,9,10,11,12,13,14.

The support vector machine (henceforth SVM)15 is one of the most instructive models for conceptualizing the basis of supervised machine learning. SVM classifies data by finding the optimal hyperplane associated with the widest margin between the two classes in a feature space. SVM can also perform highly nonlinear classifications using what is known as the kernel trick16,17,18. The convexity of the SVM problem guarantees that the global optimum can be found.

One of the first quantum algorithms exhibiting an exponential speed-up capability is the least-squares quantum support vector machine (LS-QSVM)5. However, the quantum advantage of LS-QSVM strongly depends on costly quantum subroutines such as density matrix exponentiation19 and quantum matrix inversion20,21, as well as components such as quantum random access memory (QRAM)2,22. Because the corresponding procedures require fault-tolerant quantum computers, LS-QSVM is unlikely to be realized on noisy intermediate-scale quantum (NISQ) devices23. On the other hand, there are a few quantum kernel-based machine-learning algorithms for near-term quantum applications. Well-known examples are quantum kernel estimators (QKE)8, variational quantum classifiers (VQC)8, and Hadamard or SWAP test classifiers (HTC, STC)10,11. These algorithms are applicable to NISQ devices because no costly operations are needed. However, their training time complexity is even worse than that of the classical SVM. For example, the number of measurements required just to estimate the kernel matrix in QKE scales with the fourth power of the number of training samples8.

Here, we propose a novel quantum kernel-based classifier that is feasible on NISQ devices and that can exhibit a quantum advantage in terms of accuracy and training time complexity, as asserted in Ref.8. Specifically, we have discovered distinctive designs of quantum circuits that evaluate the objective and decision functions of SVM. The number of measurements required by these circuits for a bounded error is independent of the number of training samples, and their depth scales linearly with the size of the training dataset. Meanwhile, a parameterized quantum circuit (PQC)24 with exponentially fewer parameters encodes the Lagrange multipliers of SVM. Therefore, the training time of our model with a variational quantum algorithm (VQA)25 scales sub-quadratically, which is asymptotically lower than that of the classical SVM5,26,27. Our model also shows an advantage in classification due to its compatibility with any typical quantum feature map.

Results

Support vector machine (SVM)

Data classification infers the most likely class of an unseen data point \(\hat{{\textbf {x}}} \in \mathbb {C}^N\) given a training dataset \(\mathcal {S} = \left\{ \left( \textbf{x}_i, y_i\right) \right\} _{i=0}^{M-1}\) \(\subset\) \(\mathcal {X}\times \mathcal {Y}\). Here, \(\mathcal {X}\subset \mathbb {C}^N\) and \(\mathcal {Y}=\{0,1,\ldots ,L-1\}\). Although the data are real-valued in practical machine learning tasks, we allow complex-valued data without loss of generality. We focus on binary classification (i.e., \(\mathcal {Y}=\{-1, 1\}\)), because multi-class classification can be conducted with multiple binary SVMs via a one-versus-all or a one-versus-one scheme28. We assume that \(\mathcal {S}\) is linearly separable in the higher-dimensional Hilbert space \(\mathcal {H}\) given some feature map \(\phi :\mathcal {X}\mapsto \mathcal {H}\). Then, there exist two parallel supporting hyperplanes \(\langle \textbf{w}, \phi (\cdot )\rangle +b=y\in \mathcal {Y}\) that separate the training data. The goal is to find the hyperplanes for which the margin between them is maximized. To maximize the margin even further, the linear separability condition can be relaxed so that some of the training data can penetrate into the “soft” margin. Because the margin is given as \(2/\Vert \textbf{w}\Vert\) by simple geometry, the mathematical formulation of SVM13 is given as

$$\begin{aligned} p^\star =\min _{\textbf{w}, b, \varvec{\xi }}\frac{1}{2}\Vert \textbf{w}\Vert ^2+\frac{C}{2}\sum _{i=0}^{M-1}{\xi _i^2}~:~ y_i\left( \langle \textbf{w}, \phi (\textbf{x}_i)\rangle +b\right) \ge 1-\xi _i, \end{aligned}$$
(1)

where the slack variables \(\xi _i\) are introduced to represent violations of the linear separability condition by the training data. The dual formulation of SVM is expressed as29

$$\begin{aligned} d^\star =\max _{\varvec{\beta }\succeq 0}{\sum _{i=0}^{M-1}{\beta _i}-\frac{1}{2}\sum _{i,j=0}^{M-1}{\beta _i\beta _jy_iy_jk(\textbf{x}_i, \textbf{x}_j)}-\frac{1}{2C}\sum _{i=0}^{M-1}{\beta _i^2}}~:~\sum _{i=0}^{M-1}{\beta _iy_i}=0, \end{aligned}$$
(2)

where the positive semi-definite (PSD) kernel is \(k(\textbf{x}_1, \textbf{x}_2) = \langle {\phi (\textbf{x}_1)},{\phi (\textbf{x}_2)}\rangle\) for \(\textbf{x}_{1,2} \in \mathcal {X}\). The \(\beta _i\) values are non-negative Karush-Kuhn-Tucker multipliers. This formulation employs an implicit feature map uniquely determined by the kernel. The global solution \(\varvec{\beta }^\star\) is obtained in polynomial time due to convexity29. After optimization, the optimal bias is recovered as \(b^{\star }=y_q(1-C^{-1}\beta ^\star _q)-\sum _{i=0}^{M-1}{\beta ^{\star }_i y_i k(\textbf{x}_q, \textbf{x}_i)}\) for any \(\beta ^\star _q>0\). Such training data \(\textbf{x}_q\) with non-zero weights \(\beta ^\star _q\) are known as support vectors. We estimate the labels of unseen data with the binary classifier:

$$\begin{aligned} \hat{y} = \textrm{sgn}\left\{ \sum _{i=0}^{M-1}{\beta ^{\star }_i y_i k(\textbf{x}_i, \hat{\textbf{x}})} +b^{\star }\right\} . \end{aligned}$$
(3)

In a first-principles analysis, the complexity of solving Eq. (2) is \(\mathcal {O}(M^2(N+M)\log (1/\delta ))\) with accuracy \(\delta\). A kernel function with complexity \(\mathcal {O}(N)\) is queried \(M(M-1)/2\) times to construct the kernel matrix, and quadratic programming takes \(\mathcal {O}(M^3\log (1/\delta ))\) to find \(\varvec{\beta }^\star\) for a non-sparse kernel matrix5. Although the complexity of SVM decreases when modern programming methods are employed26,27, it remains at least \(\mathcal {O}(M^2N)\) due to kernel matrix generation and quadratic programming. Thus, a quantum algorithm that evaluates all terms in Eq. (2) in \(\mathcal {O}(MN)\) time and finds the minimum with fewer than \(\mathcal {O}(M)\) such evaluations would have lower complexity than classical algorithms. We apply two transformations to Eq. (2) to realize an efficient quantum algorithm.
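As a point of reference, the classical baseline can be sketched in a few lines (pure numpy; the Gaussian kernel, problem size, and regularization constant are illustrative assumptions rather than choices made in this work). Constructing the kernel matrix alone already dominates the cost for large M:

```python
# Minimal sketch of the classical baseline: building the full kernel matrix
# costs O(M^2 N) before any quadratic programming starts.
# The Gaussian kernel here is an illustrative choice, not the paper's kernel.
import numpy as np

rng = np.random.default_rng(0)
M, N = 128, 4
X = rng.normal(size=(M, N))                  # training inputs
y = np.sign(rng.normal(size=M))              # +/-1 labels

def kernel(x1, x2, gamma=1.0):               # O(N) per query
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

K = np.empty((M, M))
for i in range(M):                           # M(M-1)/2 distinct kernel queries
    for j in range(i, M):
        K[i, j] = K[j, i] = kernel(X[i], X[j])

# Dual objective of Eq. (2) for a given multiplier vector beta:
def dual_objective(beta, C=10.0):
    return beta.sum() - 0.5 * (beta * y) @ K @ (beta * y) - beta @ beta / (2 * C)
```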

Change of variable and bias regularization

Constrained programming, such as that arising in SVM, is often transformed into unconstrained programming by adding penalty terms for the constraints to the objective function. Although there are well-known methods such as the interior point method29, we prefer the strategies of a ‘change of variables’ and ‘bias regularization’ to maintain the quadratic form of SVM. Although these strategies are motivated by the elimination of constraints, they also turn out to lead to an efficient quantum SVM algorithm.

First, we change the optimization variable \(\varvec{\beta }\) to \((\varvec{\alpha }, B)\), where \(B:=\sum _{i=0}^{M-1}{\beta _i}\) and \(\varvec{\alpha }:=\varvec{\beta }/B\), to eliminate the inequality constraints. The \(l_1\)-normalized variable \(\varvec{\alpha }\) is an M-dimensional probability vector, since \(0\le \alpha _i\le 1, \forall {i}\in \left\{ 0,\ldots ,M-1\right\}\) and \(\sum _{i=0}^{M-1}{\alpha _i}=1\). Let us define \(W_k(\varvec{\alpha };\mathcal {S}):=\sum _{i,j=0}^{M-1}{\alpha _i \alpha _j y_i y_j k\left( \textbf{x}_i, \textbf{x}_j\right)} + C^{-1}\sum _{i=0}^{M-1}{\alpha ^2_i}\). Substituting these variables into Eq. (2) gives

$$\begin{aligned} \max _{\varvec{\alpha }\in {PV}_M}\max _{B\ge 0}\left\{ B-\frac{1}{2}B^2W_k(\varvec{\alpha };\mathcal {S})\right\} :\sum _{i=0}^{M-1}{\alpha _i y_i}=0, \end{aligned}$$
(4)

where \({PV}_M\) is the set of M-dimensional probability vectors. Because \(W_k(\varvec{\alpha };\mathcal {S})\ge 0\) for arbitrary \(\varvec{\alpha }\) due to the positive semi-definiteness of the kernel, \(B^{\star } = 1/W_k(\varvec{\alpha };\mathcal {S})\) is the partial solution that maximizes Eq. (4) over B. Substituting \(B^\star\) into Eq. (4), we have

$$\begin{aligned} \max _{\varvec{\alpha }\in {PV}_M}{\frac{1}{2W_k(\varvec{\alpha };\mathcal {S})}}:\sum _{i=0}^{M-1}{\alpha _i y_i}=0. \end{aligned}$$
(5)

Finally, because maximizing \(1/2W_k(\varvec{\alpha };\mathcal {S})\) is identical to minimizing \(W_k(\varvec{\alpha };\mathcal {S})\), we have a simpler formula that is equivalent to Eq. (2):

$$\begin{aligned} \tilde{d}^\star = \min _{\varvec{\alpha }\in {PV}_M}{\sum _{i,j=0}^{M-1}{\alpha _i\alpha _j y_i y_j k(\textbf{x}_i, \textbf{x}_j)}+\frac{1}{C}\sum _{i=0}^{M-1}{\alpha ^2_i}}:\sum _{i=0}^{M-1}{\alpha _i y_i}=0. \end{aligned}$$
(6)

Equation (6) implies that instead of optimizing the M bounded free parameters \(\varvec{\beta }\) or \(\varvec{\alpha }\), we can optimize a \(\log (M)\)-qubit quantum state \(|\psi _{\varvec{\alpha }}\rangle\) and define \(\alpha _i:=|\langle i|\psi _{\varvec{\alpha }}\rangle |^2\). Therefore, if there exists an efficient quantum algorithm that evaluates the objective function of Eq. (6) given \(|\psi _{\varvec{\alpha }}\rangle\), the complexity of SVM can be improved. In fact, in a later section, we propose quantum circuits with linearly scaling complexity for this purpose.
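To make the reformulation concrete, the following sketch (pure numpy; the state, labels, and kernel matrix are random stand-ins) shows how any normalized \(\log (M)\)-qubit state induces a probability vector \(\varvec{\alpha }\) and how the objective of Eq. (6) is evaluated from it:

```python
# Sketch: alpha_i = |<i|psi_alpha>|^2 induced by a log(M)-qubit state, and the
# corresponding value of the objective in Eq. (6). The state and kernel matrix
# are random stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(1)
M = 8                                       # 3-qubit index register
psi = rng.normal(size=M) + 1j * rng.normal(size=M)
psi /= np.linalg.norm(psi)                  # |psi_alpha>, any normalized state
alpha = np.abs(psi) ** 2                    # probability vector, sums to 1

y = np.sign(rng.normal(size=M))             # +/-1 labels
A = rng.normal(size=(M, M))
K = A @ A.T                                 # any PSD kernel matrix
C = 10.0

# W_k(alpha; S) = sum_ij alpha_i alpha_j y_i y_j K_ij + (1/C) sum_i alpha_i^2
W = (alpha * y) @ K @ (alpha * y) + alpha @ alpha / C
print(W)   # objective of Eq. (6); the equality constraint on alpha is handled separately
```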

The equality constraint is relaxed by adding an \(l_2\)-regularization term for the bias to Eq. (1). Motivated by the loss-function and regularization perspectives of SVM30, this technique was introduced31,32 and developed33,34 previously. The primal and dual forms of SVM become

$$\begin{aligned} p^\star= & {} \min _{\textbf{w}, b, \varvec{\xi }}\frac{1}{2}\Vert \textbf{w}\Vert ^2+\frac{\lambda }{2}b^2+\frac{C}{2}\sum _{i=0}^{M-1}{\xi _i^2}~:~ y_i\left( \langle \textbf{w}, \phi (\textbf{x}_i)\rangle +b\right) \ge 1-\xi _i, \end{aligned}$$
(7)
$$\begin{aligned} d^\star= & {} \max _{\varvec{\beta }\succeq 0}{\sum _{i=0}^{M-1}{\beta _i}-\frac{1}{2}\sum _{i,j=0}^{M-1}{\beta _i\beta _jy_iy_j\left[ k(\textbf{x}_i, \textbf{x}_j)+\frac{1}{\lambda }\right] }-\frac{1}{2C}\sum _{i=0}^{M-1}{\beta _i^2}}. \end{aligned}$$
(8)

Note that \(k(\cdot , \cdot )+\lambda ^{-1}\) is positive definite. As shown earlier, changing the variables transforms Eq. (8) into another equivalent optimization problem:

$$\begin{aligned} \tilde{d}^\star = \min _{\varvec{\alpha }\in {PV}_M}{\sum _{i,j=0}^{M-1}{\alpha _i\alpha _jy_iy_j\left[ k(\textbf{x}_i, \textbf{x}_j)+\frac{1}{\lambda }\right] }+\frac{1}{C}\sum _{i=0}^{M-1}{\alpha _i^2}}. \end{aligned}$$
(9)

As the optimal bias is given as \(b^{\star }=\lambda ^{-1}\sum _{i=0}^{M-1}{\alpha ^{\star }_i y_i}\) according to the Karush-Kuhn-Tucker condition, the classification formula inherited from Eq. (3) is expressed as

$$\begin{aligned} \hat{y} = \textrm{sgn}\left\{ \sum _{i=0}^{M-1}{\alpha ^{\star }_i y_i {\left[ k(\textbf{x}_i, \hat{\textbf{x}})+\frac{1}{\lambda }\right] }}\right\} . \end{aligned}$$
(10)

Equations (8) and (9) can be viewed as Eqs. (2) and (6) with a quadratic penalty term on the equality constraint, so that they become equivalent in the limit \(\lambda \rightarrow 0\). Thus, Eqs. (7), (8), and (9) are relaxed SVM optimization problems with an additional hyperparameter \(\lambda\).

Figure 1
figure 1

Circuit architecture of VQASVM. The loss, decision, and regularization circuits are shown in panels (a), (b), and (c), respectively. All qubits of the index registers i and j are initialized to \(|+\rangle =\left( |0\rangle +|1\rangle \right) /\sqrt{2}\), and the rest to \(|0\rangle\). The ansatz \({V(\varvec{\theta })}\) is a PQC of \(m=\log (M)\) qubits that encodes the probability vector \(\varvec{\alpha }\). \(\mathcal {U}_{\phi , \mathcal {S}}\) embeds the training dataset \(\mathcal {S}\) with a quantum feature map \(U_{\phi (\hat{\textbf{x}})}\), which embeds the classical data \(\hat{\textbf{x}}\) into a quantum state \(|\phi (\hat{\textbf{x}})\rangle\). n denotes the number of qubits for the quantum feature map, which is usually N but can be reduced to \(\log (N)\) if an amplitude encoding feature map is used.

Variational quantum approximate support vector machine

One way to generate the aforementioned quantum state \(|\psi _{\varvec{\alpha }}\rangle\) is amplitude encoding: \(|\psi _{\varvec{\alpha }}\rangle =\sum _{i=0}^{M-1}{\sqrt{\alpha _i}|i\rangle }\). However, this would be inefficient because the unitary gate for amplitude encoding has a complex structure that scales as \(\mathcal {O}(\textrm{poly}(M))\)35. Another way to generate \(|\psi _{\varvec{\alpha }}\rangle\) is to use a parameterized quantum circuit (PQC), known as an ansatz in this case. Because there is no prior knowledge of the distribution of the \(\alpha _i^\star\)s, the initial state should be \(|++\cdots +\rangle =\frac{1}{\sqrt{M}}\sum _{i=0}^{M-1}|i\rangle\). The ansatz \(V(\varvec{\theta })\) transforms the initial state into other states depending on the gate parameter vector \(\varvec{\theta }\): \(|\psi _{\varvec{\alpha }}\rangle =V(\varvec{\theta })|++\cdots +\rangle\). In other words, the optimization parameters encoded by \(\varvec{\theta }\) through the ansatz are represented as \(\alpha _i(\varvec{\theta })=|\langle i|{V(\varvec{\theta })}|++\cdots +\rangle |^2\). Given the lack of prior information, an efficient choice of ansatz is a hardware-efficient ansatz (HEA), which consists of alternating local rotation layers and entanglement layers36,37. The number of qubits and the depth of this ansatz are \(\mathcal {O}(\textrm{polylog}(M))\).
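As an illustration, a minimal sketch of this encoding (assuming Qiskit is available; the RealAmplitudes template and random parameters are our own illustrative choices) is:

```python
# Sketch: alpha_i(theta) = |<i| V(theta) |++...+>|^2 with a hardware-efficient PQC.
# Assumes Qiskit; RealAmplitudes and the random parameters are illustrative choices.
import numpy as np
from qiskit.circuit.library import RealAmplitudes
from qiskit.quantum_info import Statevector

M = 8                                            # number of training samples
m = int(np.log2(M))                              # log(M) index qubits

ansatz = RealAmplitudes(num_qubits=m, reps=2)    # alternating Ry and CNOT layers
theta = np.random.uniform(-np.pi, np.pi, ansatz.num_parameters)

plus = Statevector.from_label('+' * m)           # |++...+> initial state
psi_alpha = plus.evolve(ansatz.assign_parameters(theta))

alpha = psi_alpha.probabilities()                # alpha_i(theta), sums to 1
print(alpha)
```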

We discovered quantum circuit designs that compute Eqs. (9) and (10) in \(\mathcal {O}(M)\) time. Conventionally, the quantum kernel function is defined as the Hilbert–Schmidt inner product: \(k(\cdot ,\cdot )=|\langle \phi (\cdot )|\phi (\cdot )\rangle |^2\)4,7,8,10,11,12. First, we divide the objective function in Eq. (9) into loss and regularizing functions of \(\varvec{\theta }\) using the above ansatz encoding:

$$\begin{aligned} \mathcal {L}_{\phi , \lambda }\left( \varvec{\theta };\mathcal {S}\right) = \sum _{i, j=0}^{M-1}{\alpha _i(\varvec{\theta })\alpha _j(\varvec{\theta }) y_iy_j \left[ |\langle \phi (\textbf{x}_i)|\phi (\textbf{x}_j)\rangle |^2 + \frac{1}{\lambda }\right] },~\mathcal {R}(\varvec{\theta }) = \sum _{i=0}^{M-1}{\alpha _i(\varvec{\theta })^2}. \end{aligned}$$
(11)

Specifically, the objective function is equal to \(\mathcal {L}_{\phi , \lambda }+{C^{-1}}\mathcal {R}\). Similarly, the decision function in Eq. (10) becomes

$$\begin{aligned} f_{\phi , \lambda }\left( \textbf{x};\varvec{\theta },\mathcal {S}\right) = \sum _{i=0}^{M-1}\alpha _i(\varvec{\theta })y_i{\left[ |\langle \phi (\textbf{x}_i)|\phi (\textbf{x})\rangle |^2 + \frac{1}{\lambda }\right] }. \end{aligned}$$
(12)

Inspired by STC10,11, the quantum circuits in Fig. 1 efficiently evaluate \(\mathcal {L}_{\phi , \lambda }, \mathcal {R}\) and \(f_{\phi , \lambda }\). The quantum gate \(\mathcal {U}_{\phi , \mathcal {S}}\) embeds the entire training dataset with the corresponding quantum feature map \(U_{\phi (\textbf{x})}|00\ldots 0\rangle =|\phi (\textbf{x})\rangle\), so that \(\mathcal {U}_{\phi , \mathcal {S}}|i\rangle \otimes |00\ldots 0\rangle \otimes |0\rangle =|i\rangle \otimes |\phi (\textbf{x}_i)\rangle \otimes |y_i\rangle\). Therefore, the quantum state after state preparation is \(|\Psi \rangle =\sum _{i=0}^{M-1}{}\sqrt{\alpha _i}|i\rangle \otimes |\phi (\textbf{x}_i)\rangle \otimes |y_i\rangle\). We apply a SWAP test and a joint \(\sigma _z\) measurement in the loss and decision circuits to evaluate \(\mathcal {L}_{\phi , \lambda }\) and \(f_{\phi , \lambda }\):

$$\begin{aligned} \mathcal {L}_{\phi , \lambda }\left( \varvec{\theta };\mathcal {S}\right) =\langle \sigma _z^a \sigma _z^{y_1} \sigma _z^{y_2}\rangle _{\varvec{\theta }}+\frac{1}{\lambda }\langle \sigma _z^{y_1} \sigma _z^{y_2}\rangle _{\varvec{\theta }},~ f_{\phi , \lambda }\left( \textbf{x};\varvec{\theta },\mathcal {S}\right) =\langle \sigma _z^a \sigma _z^y\rangle _{\textbf{x};\varvec{\theta }}+\frac{1}{\lambda }\langle \sigma _z^y\rangle _{\textbf{x};\varvec{\theta }}, \mathcal {R}(\varvec{\theta })=\langle M_{00\ldots 0} \rangle _{\varvec{\theta }}. \end{aligned}$$
(13)

Here, \(\sigma _z\) is the Pauli Z operator and \(M_{00\ldots 0}\) is the projection operator onto the state \(|0\rangle ^{\otimes {\log (M)}}\). See “Methods” for detailed derivations. The asymptotic complexities of the loss and decision circuits are linear in the number of training data. \(\mathcal {U}_{\phi , \mathcal {S}}\) can be prepared with \(\mathcal {O}(MN)\) operations4; see “Methods” for the specific realization used in this article. Because \(V(\varvec{\theta })\) has \(\mathcal {O}(\textrm{polylog}(M))\) depth and the SWAP test requires only \(\mathcal {O}(N)\) operations, the overall run-time complexity of evaluating \(\mathcal {L}_{\phi , \lambda }(\varvec{\theta };\mathcal {S})\) and \(f_{\phi , \lambda }(\textbf{x}; \varvec{\theta }, \mathcal {S})\) with bounded error \(\varepsilon\) is \(\mathcal {O}(\varepsilon ^{-2}MN)\) (see Supplementary Fig. S5 online for numerical verification of the scaling). Similarly, the complexity of estimating \(\mathcal {R}(\varvec{\theta })\) with accuracy \(\varepsilon\) is \(\mathcal {O}(\varepsilon ^{-2}\textrm{polylog}(M))\) due to the two parallel \({V(\varvec{\theta })}\)s and \(\mathcal {O}(\log (M))\) CNOT gates.

Figure 2
figure 2

Variational quantum approximate support vector machine. The white round boxes represent classical calculations, whereas the yellow round boxes represent quantum operations. The white arrows represent the flow of classical data, whereas the black arrows represent the embedding of classical data. The grey areas indicate the corresponding training phase of each iteration. The regularization circuit in the black dashed box can be omitted in the hard-margin case, where \(C\rightarrow \infty\).

We propose a variational quantum approximate support vector machine (VQASVM) algorithm that solves the SVM optimization problem with VQA25 and transfers the optimized parameters to classify new data effectively. Figure 2 summarizes the process of VQASVM. We estimate \(\varvec{\theta }^\star\), which minimizes the objective function; this is then used for classifying unseen data:

$$\begin{aligned} \varvec{\theta }^{\star } = \mathop {\mathrm {arg\,min}}_{\varvec{\theta }}{\mathcal {L}_{\phi , \lambda }\left( \varvec{\theta };\mathcal {S}\right) }+\frac{1}{C}\mathcal {R}(\varvec{\theta }),~\hat{y}=\textrm{sgn}\left\{ f_{\phi , \lambda }\left( \hat{\textbf{x}};\varvec{\theta }^{\star }, \mathcal {S}\right) \right\} . \end{aligned}$$
(14)

Following the general scheme of VQA, a gradient descent (GD) algorithm can be applied: classical processors update the parameters \(\theta _i\), whereas quantum processors evaluate the functions needed to compute the gradients. Because the objective function of VQASVM can be expressed as the expectation value of a Hamiltonian, i.e.,

$$\begin{aligned} \mathcal {L}_{\phi , \lambda }(\varvec{\theta };\mathcal {S})+\frac{1}{C}\mathcal {R}(\varvec{\theta })=\langle \varvec{+}|\left( V(\varvec{\theta })^\dagger \otimes V(\varvec{\theta })^\dagger \right) H \left( V(\varvec{\theta })\otimes V(\varvec{\theta })\right) |\varvec{+}\rangle , \end{aligned}$$
(15)

where \(H=\sum _{i,j=0}^{M-1}\left[ y_iy_jk(\textbf{x}_i,\textbf{x}_j)+\frac{1}{\lambda }y_iy_j+\frac{1}{C}\delta _{ij}\right] |i\rangle \langle i|\otimes |j\rangle \langle j|\), the exact gradient can be obtained via the modified parameter-shift rule38,39. GD converges to within \(\delta\) of a local minimum after \(\mathcal {O}(\log (1/\delta ))\) iterations29, provided that the estimation error of the objective function is smaller than \(\delta\). Therefore, the total run-time complexity of VQASVM is \(\mathcal {O}(\varepsilon ^{-2}\log ({1/\varepsilon }) M N \,\textrm{polylog}(M))\) with error \(\varepsilon\), as the number of parameters is \(\mathcal {O}(\textrm{polylog}(M))\).
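A small numerical sanity check (pure numpy; the kernel matrix, labels, and state are random stand-ins of our own) that the expectation value of this diagonal Hamiltonian over \((V(\varvec{\theta })\otimes V(\varvec{\theta }))|\varvec{+}\rangle\) reproduces \(\mathcal {L}_{\phi ,\lambda }+C^{-1}\mathcal {R}\):

```python
# Sketch: the expectation of the diagonal Hamiltonian H over (V x V)|+...+>
# equals L_{phi,lambda}(theta) + R(theta)/C. All inputs are random stand-ins.
import numpy as np

rng = np.random.default_rng(2)
M, C, lam = 8, 10.0, 1e4
y = np.sign(rng.normal(size=M))
A = rng.normal(size=(M, M))
K = A @ A.T / M                               # any PSD "quantum kernel" matrix

psi = rng.normal(size=M) + 1j * rng.normal(size=M)
psi /= np.linalg.norm(psi)                    # V(theta)|++...+>
alpha = np.abs(psi) ** 2

# Diagonal entries of H on the two index registers |i>|j>
h = np.array([[y[i] * y[j] * (K[i, j] + 1 / lam) + (i == j) / C
               for j in range(M)] for i in range(M)])
pair_state = np.kron(psi, psi)
expectation = np.real(np.vdot(pair_state, h.reshape(-1) * pair_state))

direct = (alpha * y) @ (K + 1 / lam) @ (alpha * y) + alpha @ alpha / C
assert np.isclose(expectation, direct)
```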

Figure 3
figure 3

Experiments on the ibmq_montreal cloud NISQ processor. (a) The toy training (letters) and test (asterisks) data are shown on a Bloch sphere. The color indicates the true class label of the data; i.e., red = class A and blue = class B. The letters A represent the training data of class A, and the letter B represents the training datum of class B. (b) Classification results obtained on the ibmq_montreal QPU (diamonds) and with a noisy simulation (squares), compared to theoretical values (solid line). \(f_{\phi , \lambda }\) is the decision function value of each test datum. The letters A and B represent the training data, located at their longitudinal coordinates on the Bloch sphere (\(\textrm{x}_0\)). The curved dashed lines are sine fits of the ibmq_montreal results. Values inside the round brackets in the legend are the classification accuracies.

Experiments on IBM quantum processors

We demonstrate the classification of a toy dataset using the VQASVM algorithm on NISQ computers as a proof of concept. Our example dataset is mapped onto a Bloch sphere, as shown in Fig. 3a. Due to decoherence, we set the data dimension to \(N=2\) and the number of training data instances to \(M=4\). First, we randomly choose a great circle on the Bloch sphere that passes through \(|0\rangle\) and \(|1\rangle\). Then, we randomly choose two opposite points on the circle to be the centers of the two classes, A and B. Subsequently, four training data instances are generated close to the class centers so as to avoid overlaps with the test data and with each other. This results in a good training dataset with the maximum margin, such that no soft-margin consideration is needed. In addition, thirty test data instances are generated evenly along the great circle and are labelled as 1 or \(-1\) according to their inner products with the class centers. In this case, we can set the hyperparameter \(C\rightarrow \infty\), and the process hence requires no regularization circuit evaluation. The test dataset is non-trivial to classify given that the test data are located mostly in the margin area; the convex hulls of the two training classes do not include most of the test data.

We choose a quantum feature map that embeds data \((\textrm{x}_0, \textrm{x}_1)\) onto a Bloch sphere instead of into \(N=2\) qubits: \(U_{\phi (\textrm{x}_0, \textrm{x}_1)}=R_z(\textrm{x}_1)R_y(\textrm{x}_0)\). Features \(\textrm{x}_0\) and \(\textrm{x}_1\) are the latitude and the longitude on the Bloch sphere. We use a two-qubit (\(q_0\) and \(q_1\)) RealAmplitude40 PQC as the ansatz: \(V(\varvec{\theta })=R^{q_0}_y(\theta _2)\otimes R^{q_1}_y(\theta _3)\) \({CNOT}_{q_0\rightarrow q_1}\) \(R^{q_0}_y(\theta _0)\otimes R^{q_1}_y(\theta _1)\). In this experiment, we use ibmq_montreal, one of the IBM Quantum Falcon processors. The “Methods” section presents the specific techniques for optimizing quantum circuits against decoherence. The simultaneous perturbation stochastic approximation (SPSA) algorithm is selected to optimize \(V(\varvec{\theta })\) due to its rapid convergence and robustness to noise41,42. The measurements of each circuit are repeated \(R=8192\) times to estimate the expectation values, which was the maximum possible option for ibmq_montreal. Due to the long queue times of a cloud-based QPU, we reduce the QPU usage by applying the warm-start and early-stopping techniques explained in the “Methods” section.
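A minimal sketch of these two building blocks (assuming Qiskit; the gate and parameter ordering follows the expressions above) is:

```python
# Sketch of the toy-experiment building blocks: single-qubit feature map
# U_phi(x) = Rz(x1) Ry(x0) and the two-qubit RealAmplitude-type ansatz V(theta).
# Assumes Qiskit; gate/parameter ordering follows the text.
from qiskit import QuantumCircuit

def feature_map(x0, x1):
    qc = QuantumCircuit(1)
    qc.ry(x0, 0)                  # latitude
    qc.rz(x1, 0)                  # longitude
    return qc

def ansatz(theta):                # theta = (theta_0, theta_1, theta_2, theta_3)
    qc = QuantumCircuit(2)
    qc.ry(theta[0], 0)
    qc.ry(theta[1], 1)
    qc.cx(0, 1)                   # CNOT_{q0 -> q1}
    qc.ry(theta[2], 0)
    qc.ry(theta[3], 1)
    return qc
```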

Figure 3 shows the classification results. The theoretical decision function values were calculated by solving Eq. (9) with convex optimization. The noisy simulation is a classical simulation that emulates the actual QPU based on a noise parameter set estimated from noise measurements. Although the scale of the decision values is reduced, the impact on the classification accuracy is negligible given that only the signs of the decision values matter. This logic applies to most NISQ-applicable quantum binary classifiers. The accuracy would improve with additional error mitigation and offset calibration on the quantum device. Other VQASVM demonstrations with different datasets can be found in Supplementary Fig. S2 online.

Figure 4
figure 4

Numerical analysis on the iris dataset (\(\lambda =C=10^4\)). (a) Custom quantum feature map for the iris dataset. (b) PQC design for \(V(\varvec{\theta })\) with \(5\times \log (M)=30\) gate parameters. (c) The shaded circles and triangles depict the training convergence of the residual losses. At the final iteration (\(t=2^{13}\)), the residual loss \(\Delta\) for \(R=8192\) repeated measurements (red dot-dashed line) is almost equal to that of the \(R=\infty\) case (blue dashed line), where the error in estimating the expectation value is 0. The red shaded area represents the 95% credible interval of the last 16 residual losses for the \(R=8192\) case. (d) The spectra of the optimized weights \(\alpha _i\) of \(\textbf{x}_i\) for the theoretical and \(R=8192\) cases are compared. The dashed black line indicates the level of the uniform weight \(\alpha _i=1/M=1/64\). ‘Reference’ in the legend refers to the theoretical values of the \(\alpha _i\)s obtained by convex optimization.

Numerical simulation

In a practical situation, the measurement of a quantum circuit is repeated R times to estimate the expectation value within \(\varepsilon =\mathcal {O}(1/\sqrt{R})\) error, which could interfere with the convergence of VQA. However, numerical analysis with the Iris dataset43 confirmed that VQASVM converges even when noise in the objective function estimation exists. The following paragraphs describe the details of the numerical simulation, such as the data preprocessing and the choices of the quantum kernel and the ansatz.

We assigned the label \(+1\) to Iris setosa and \(-1\) to Iris versicolour and Iris virginica for binary classification. The features of the data were scaled to the range \([-\pi , \pi ]\). We sampled \(M=64\) training data instances from the total dataset and treated the rest as the test data. The training kernel matrix constructed with our custom quantum feature map in Fig. 4a is learnable; i.e., the singular values of the kernel matrix decay exponentially. After testing the PQC designs introduced in Ref.36, we chose the PQC shown in Fig. 4b as the ansatz for this simulation (see Supplementary Figs. S6–S9 online). The number of PQC parameters is 30, which is less than M. In this simulation, the SPSA optimizer was used for training due to its fast convergence and robustness to noise41,42.
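A sketch of this preprocessing step (assuming scikit-learn; the seed and split routine are our own illustrative choices) is:

```python
# Sketch of the Iris preprocessing: binary labels, features scaled to [-pi, pi],
# and M = 64 training samples. Assumes scikit-learn; the seed is illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

data = load_iris()
X = MinMaxScaler(feature_range=(-np.pi, np.pi)).fit_transform(data.data)
y = np.where(data.target == 0, 1, -1)        # setosa = +1, others = -1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=64, stratify=y, random_state=7)
```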

The objective and decision functions were evaluated in two scenarios. In the first case, a finite number of measurement results (\(R=8192\)) is sampled to estimate the expectation values, such that the estimation error is non-zero. In the second case, the expectation values are calculated directly with zero estimation error, which corresponds to sampling infinitely many measurement results; i.e., \(R=\infty\). We defined the residual loss of training as \(\Delta =\mathcal {L}_{\phi , \lambda }(\varvec{\theta }^t;\mathcal {S})+C^{-1}\mathcal {R}(\varvec{\theta }^t)-\tilde{d}^\star\) at iteration t to compare the convergence. Here, \(\tilde{d}^\star\) is the theoretical minimum of Eq. (9) obtained by convex optimization.

Although it contains some uncertainty, Fig. 4c shows that SPSA converges to a local minimum despite the estimation noise. Both the \(R=8192\) and \(R=\infty\) cases show the characteristic convergence rule of SPSA, \(\Delta \sim \mathcal {O}(|\varvec{\theta }|/t)\), for a sufficiently large number of iterations t. A more vivid visualization can be found in Supplementary Fig. S10 online. In addition, the spectrum of the optimized Lagrange multipliers \(\alpha _i\) mostly coincides with the theory, especially for the significant support vectors (Fig. 4d). Therefore, we conclude that training VQASVM with a finite number of measurements is achievable. The classification accuracy was 95.34% for \(R=8192\) and 94.19% for \(R=\infty\).

The empirical speed-up of VQASVM is possible because the number of optimization parameters is exponentially reduced by the PQC encoding. However, in the limit of a large number of qubits (i.e., large \(\log (M)\)), it is unclear whether such a PQC can be trained with VQA to approximate the optimal solution well. Therefore, we performed a numerical analysis to verify empirically that VQASVM with \(\mathcal {O}(\textrm{polylog}(M))\) parameters achieves a bounded classification error even for large M. For this simulation, the MNIST dataset44 was used instead of the Iris dataset because there are not enough Iris data points to clarify the asymptotic behavior of VQASVM. In this setting, the maximum possible number of MNIST training data instances for VQASVM is \(M=2^{13}\).

Binary image data of ‘0’s and ‘1’s with a \(28\times 28\) image size were selected for binary classification, and their features were reduced to 10 by means of a principal component analysis (PCA). The well-known quantum feature map introduced in Ref.8 was chosen for the simulation: \(U_{\phi (\textbf{x})}=W_{\phi (\textbf{x})}H^{\otimes n}W_{\phi (\textbf{x})}H^{\otimes n}\), where \(W_{\phi (\textbf{x})}=\exp \left( i\sum _{G\subset [n],|G|\le 2}g_G(\textbf{x})\Pi _{i\in G}\sigma _z^i\right)\) and \(g_{\{i\}}(\textbf{x})=x_i,\, g_{\{i,j\}}(\textbf{x})=(\pi -x_i)(\pi -x_j)\delta _{i+1,j}\). A visualization of the feature map can be found in Supplementary Fig. S4 online. The ansatz architecture used for this simulation was the PQC template shown in Fig. 4b with 19 layers; i.e., the first part of the PQC in Fig. 4b is repeated 19 times. Thus, the number of optimization parameters is \(19\times \log (M)\).
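A close analogue of this feature map is available as a library circuit; the following hedged sketch (assuming Qiskit) uses ZZFeatureMap with linear entanglement, which should reproduce the nearest-neighbor pair terms \(\delta _{i+1,j}\) and the data map \((\pi -x_i)(\pi -x_j)\) up to implementation details:

```python
# Sketch: the MNIST feature map of Ref. 8, approximated with Qiskit's
# ZZFeatureMap (reps=2 corresponds to the repeated W_phi H^{x n} block,
# entanglement='linear' keeps only nearest-neighbor ZZ terms).
import numpy as np
from qiskit.circuit.library import ZZFeatureMap

n = 10                                              # PCA-reduced feature dimension
feature_map = ZZFeatureMap(feature_dimension=n, reps=2, entanglement='linear')

x = np.random.uniform(-1, 1, n)                     # one preprocessed data point
circuit = feature_map.assign_parameters(x)          # U_{phi(x)} as a circuit
print(circuit.decompose().depth())
```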

The numerical simulation shows that although the residual training loss \(\Delta\) increases linearly with the number of PQC qubits, the rate is extremely low, i.e., \(\Delta \sim 0.00024\times \log (M)\). Moreover, we could not observe any critical difference in classification accuracy against the reference, i.e., the theoretical accuracy obtained by convex optimization. Therefore, at least for \(M\le 2^{13}\), we conclude that a run-time complexity of \(\mathcal {O}(M\,\textrm{polylog}(M))\) with bounded classification accuracy is empirically achievable for VQASVM due to the \(\mathcal {O}(\textrm{polylog}(M))\) number of parameters (see Supplementary Fig. S11 online for a visual summary of the results).

Discussion

In this work, we propose a novel quantum-classical hybrid supervised machine learning algorithm that achieves a run-time complexity of \(\mathcal {O}(M\textrm{polylog}(M))\), whereas the complexity of modern classical algorithms is \(\mathcal {O}(M^2)\). The main idea of our VQASVM algorithm is to encode the optimization parameters, which represent the normalized weight of each training data instance, in a quantum state using exponentially fewer parameters. We numerically confirmed the convergence and feasibility of VQASVM with SPSA even in the presence of expectation-value estimation error. We also observed the sub-quadratic asymptotic run-time complexity of VQASVM numerically. Finally, VQASVM was tested on cloud-based NISQ processors with a toy example dataset to highlight its practical application potential.

Based on the numerical results, we presume that our variational algorithm can bypass the issues arising from the relationship between the expressibility36 and the trainability of PQCs; i.e., a highly expressive PQC is unlikely to be trainable with VQA due to the vanishing gradient variance45, a phenomenon known as the barren plateau46. This problem has become a critical barrier for most VQAs that use a PQC to generate solution states. However, given that the SVM is learnable (i.e., the singular values of the kernel matrix decay exponentially5), only a few Lagrange multipliers, corresponding to the significant support vectors, are critically large: \(\alpha _i\gg 1/M\)29,30. Figure 4d illustrates this. Thus, the PQCs encoding the optimal multipliers need not be highly expressive. We speculate that there exists an ansatz generating these sparse probability distributions. Moreover, the optimal solution has exponential degeneracy because only the measurement probabilities matter rather than the amplitudes of the state itself. For these reasons, we did not observe a critical decrease in classification accuracy even though the barren plateau exists, i.e., \(\Delta =\mathcal {O}(\log (M))\). A more analytic discussion of the trainability of VQASVM and of ansatzes generating sparse probability distributions should be investigated in the future.

The VQASVM method has distinctive features compared to LS-QSVM, which solves the linear algebraic optimization problem of least-squares SVM47 with the quantum algorithm for linear systems of equations (HHL)20. Given that fault-tolerant universal quantum computers and efficient quantum data loading with QRAM2,22 are available, the run-time complexity of LS-QSVM is exponentially low: \(\mathcal {O}(\kappa _{\textrm{eff}}^3\varepsilon ^{-3}\log (MN))\) with error \(\varepsilon\) and effective condition number \(\kappa _{\textrm{eff}}\). However, near-term implementation of LS-QSVM is infeasible due to lengthy quantum subroutines, which VQASVM manages to avoid. Also, the training of LS-QSVM has to be repeated for each query of unseen data because the solution state collapses after the measurements at the end; transferring the solution state to classify multiple test data would violate the no-cloning theorem. VQASVM overcomes these drawbacks. VQASVM is composed of much shorter operations; VQASVM circuits are much shallower than HHL circuits of the same moderate system size when decomposed into the same universal gate set. The classification phase of VQASVM can also be separated from the training phase and performed concurrently; the training results are stored classically and transferred to decision circuits on other quantum processing units (QPUs).

We continue the discussion with the advantages of our method compared to other quantum kernel-based algorithms, such as the variational quantum classifier (VQC) and the quantum kernel estimator (QKE), which are expected to be realized on near-term NISQ devices8. VQC estimates the label of data \(\textbf{x}\) as \(\hat{y}=\textrm{sgn}\{\bar{f}(\textbf{x};\varvec{\theta })+b\}\), where \(\bar{f}(\textbf{x};\varvec{\theta })\) is the empirical average of a binary function f(z) such that z is the N-bit computational-basis measurement result of the quantum circuit \(W(\varvec{\theta })|\phi (\textbf{x})\rangle\). The parameters \(\varvec{\theta }\) and b are trained with variational methods that minimize the empirical risk \(R_{\textrm{emp}}(\varvec{\theta }, b)=\sum _{(\textbf{x}, y)\in \mathcal {S}}\Pr \left[ y\ne \textrm{sgn}\{\bar{f}(\textbf{x};\varvec{\theta })+b\}\right] /{|\mathcal {S}|}\). This requires \(\mathcal {O}(MN\times |\varvec{\theta }|)\) quantum circuit measurements per iteration4. Consequently, the complexity of VQC matches that of VQASVM from a heuristic point of view. However, VQC does not take advantage of strong duality; the optimal state that \(W(\varvec{\theta ^\star })^\dagger\) should generate has no special structure, whereas the optimal distribution of \(\alpha ^\star _i\) that the ansatz of VQASVM should generate is very sparse, i.e., most \(\alpha _i\)s are close to zero. Therefore, optimizing VQC for a sufficiently large number of qubits would be vulnerable to issues such as local minima and barren plateaus. On the other hand, QKE estimates the kernel matrix elements \(\hat{K}_{ij}\) from the empirical probability of measuring an N-bit zero sequence in the quantum circuit \({U_{\phi ({\textbf{x}_i})}}^\dagger {U_{\phi ({\textbf{x}_j})}}|\textbf{0}\rangle\), given the kernel matrix \({K}_{ij}=|\langle \phi (\textbf{x}_i)|\phi (\textbf{x}_j)\rangle |^2\). The estimated kernel matrix is fed into classical kernel-based algorithms, such as SVM; this is why we used QKE as the reference for the numerical analysis. The kernel matrix can be estimated with \(\mathcal {O}(\varepsilon ^{-2}M^4)\) measurements for a bounded error \(||K-\hat{K}||\le \varepsilon\). Thus, QKE has much higher complexity than both VQASVM and classical SVM8. In addition, unlike for QKE, the generalization error of VQASVM converges to zero as \(M\rightarrow \infty\) due to its exponentially fewer parameters, strengthening the reliability of the training48. The numerically observed linear relation between the decision function error \(\mathcal {E}_f\) and \(\Delta\) supports this claim (see Supplementary Fig. S11 online).

VQASVM can be further enhanced with kernel optimization. Like other quantum-kernel-based methods, the choice of the quantum feature map is crucial for VQASVM. Unlike previous methods (e.g., quantum kernel alignment49), VQASVM can optimize a quantum feature map online during the training process. Given that \(U_{\phi (\cdot )}\) is tuned with additional parameters \(\varvec{\varphi }\), the optimal parameters should be the saddle point \((\varvec{\theta }^\star , \varvec{\varphi }^\star ) = \mathop {\mathrm {arg\,min}}_{\varvec{\theta }}\mathop {\textrm{max}}_{\varvec{\varphi }}\mathcal {L}_{\phi [\varvec{\varphi }], \lambda }(\varvec{\theta })+C^{-1}\mathcal {R}(\varvec{\theta })\). In addition, tailored quantum kernels (e.g., \(k(\cdot ,\cdot )=|\langle \phi (\cdot )|\phi (\cdot )\rangle |^{2r}\)) can be adopted with a simple modification10 of the VQASVM quantum circuits to improve classification accuracy. However, because a quantum advantage in classification accuracy derived from the power of quantum kernels is beyond the scope of this paper, we leave this discussion for the future. Another method to improve VQASVM is boosting. Since VQASVM is not a convex problem, its performance may depend on the initial point, and it is not immune to overfitting, like other kernel-based algorithms. A boosting method can be applied to improve the classification accuracy by cascading low-performance VQASVMs. Because each VQASVM model requires only \(\mathcal {O}(\log (M)+N)\) space, ensemble methods such as boosting are well suited to VQASVM.

Methods

Proof of Eq. (13)

First, we note that the SWAP test operation \((H_a\cdot \textrm{cSWAP}_{a\rightarrow b, c}\cdot H_a)\) in Fig. 1 measures the Hilbert–Schmidt inner product between two pure states by estimating \(\langle \sigma _z^a\rangle =|\langle \phi |\psi \rangle |^2\), where a is the control qubit and \(|\phi \rangle\) and \(|\psi \rangle\) are the states on target qubits b and c, respectively. The quantum registers i and j in Fig. 1 are traced out because measurements are performed only on the a and y qubits. The reduced density matrix on the x and y quantum registers before the controlled-SWAP operation is \(\rho _{x, y} = \sum _{i=0}^{M-1}\alpha _i |\phi (\textbf{x}_i)\rangle \langle \phi (\textbf{x}_i)|\otimes |y_i\rangle \langle y_i|\), which is the statistical mixture of the quantum states \(|\phi (\textbf{x}_i)\rangle \otimes |y_i\rangle\) with probabilities \(\alpha _i\). Let us first consider the decision circuit (Fig. 1b). Given that the states \(|\phi (\textbf{x}_i)\rangle _{x}\otimes |y_i\rangle _{y}\) and \(|\phi (\hat{\textbf{x}})\rangle _{\hat{x}}\) are prepared, \(\langle \mathcal {A}_{a\rightarrow x,\hat{x}}\sigma _z^y\rangle =\langle \mathcal {A}_{a\rightarrow x,\hat{x}}\rangle \langle \sigma _z^y\rangle =y_i\langle \mathcal {A}_{a\rightarrow x, \hat{x}}\rangle\) due to separability. Here, \(\mathcal {A}_{a\rightarrow x,\hat{x}}\) can be \((H_a\cdot \textrm{cSWAP}_{a\rightarrow x,\hat{x}}\cdot H_a)^\dagger \sigma _z^a (H_a\cdot \textrm{cSWAP}_{a\rightarrow x,\hat{x}}\cdot H_a)\) or \(\frac{1}{\lambda }\sigma ^0_a\). Similarly, for the loss circuit (Fig. 1a), we have the states \(|\phi (\textbf{x}_i)\rangle _{x_0}\otimes |y_i\rangle _{y_0}\) and \(|\phi (\textbf{x}_j)\rangle _{x_1}\otimes |y_j\rangle _{y_1}\) with probability \(\alpha _i\alpha _j\), such that \(\langle \mathcal {A}_{a\rightarrow x_0,x_1}\sigma _z^{y_0}\sigma _z^{y_1}\rangle =\langle \mathcal {A}_{a\rightarrow x_0,x_1}\rangle \langle \sigma _z^{y_0}\sigma _z^{y_1}\rangle =y_iy_j\langle \mathcal {A}_{a\rightarrow x_0, x_1}\rangle\). Therefore, from the definition of our quantum kernel, each term in Eq. (13) matches the loss and decision functions in Eqs. (11) and (12). A more direct proof is provided in Refs.10,11 and in the Supplementary Information, section B (online).
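To make the first step concrete, a minimal sketch of a SWAP test that evaluates the quantum kernel for the single-qubit toy feature map (assuming Qiskit; an exact statevector replaces repeated measurements here) is:

```python
# Sketch: SWAP-test estimate of the quantum kernel for the toy feature map
# U_phi(x) = Rz(x1) Ry(x0). <sigma_z^a> = 2 P(a=0) - 1 = |<phi(x1)|phi(x2)>|^2.
# Assumes Qiskit; uses an exact statevector instead of sampled measurements.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def feature_map(x):
    qc = QuantumCircuit(1)
    qc.ry(x[0], 0)
    qc.rz(x[1], 0)
    return qc

def swap_test_kernel(x1, x2):
    qc = QuantumCircuit(3)                    # qubit 0: ancilla a; 1, 2: data
    qc.compose(feature_map(x1), qubits=[1], inplace=True)
    qc.compose(feature_map(x2), qubits=[2], inplace=True)
    qc.h(0)
    qc.cswap(0, 1, 2)
    qc.h(0)
    p0 = Statevector(qc).probabilities([0])[0]
    return 2 * p0 - 1                         # = |<phi(x1)|phi(x2)>|^2

print(swap_test_kernel([0.3, 1.1], [0.3, 1.1]))   # ~1.0 for identical inputs
```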

Realization of quantum circuits

In this article, \(\mathcal {U}_{\phi , \mathcal {S}}\) is realized using uniformly controlled one-qubit gates, which require at most \(M-1\) CNOT gates, M one-qubit gates, and a single diagonal \((\log (M) + 1)\)-qubit gate35,50. We compiled the quantum feature map into a basis gate set composed of Pauli rotations and CNOTs. \(\mathcal {U}_{\phi , \mathcal {S}}\) can then be implemented efficiently by replacing all Pauli rotations with uniformly controlled Pauli rotations. The training-data label embedding of \(\mathcal {U}_{\phi , \mathcal {S}}\) can also be implemented easily using a uniformly controlled Pauli X rotation (i.e., setting the rotation angle to \(\pi\) if the label is positive and 0 otherwise). Although this procedure allows one to incorporate existing quantum feature maps, the complexity can increase to \(\mathcal {O}(MN^2)\) if the quantum feature map contains all-to-all connected parameterized two-qubit gates. Nonetheless, such a realization of \(\mathcal {U}_{\phi , \mathcal {S}}\) has linear complexity in the number of training data instances.
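Conceptually, \(\mathcal {U}_{\phi , \mathcal {S}}\) is a block-diagonal operator that applies \(U_{\phi (\textbf{x}_i)}\) to the data register and flips the label qubit when \(y_i\) is positive, conditioned on the index register being \(|i\rangle\). A dense-matrix sketch of this action (pure numpy, for the single-qubit toy feature map; a compiler would realize the same operator with uniformly controlled rotations) is:

```python
# Sketch: U_{phi,S} |i>|0>|0> = |i>|phi(x_i)>|y_i> built as a block-diagonal
# matrix for the single-qubit toy feature map. Illustrative only; in practice
# this operator is compiled into uniformly controlled rotations.
import numpy as np
from scipy.linalg import block_diag

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

X = np.array([[0, 1], [1, 0]])
I2 = np.eye(2)

def u_phi(x):                                   # U_phi(x) = Rz(x1) Ry(x0)
    return rz(x[1]) @ ry(x[0])

def u_phi_S(data, labels):                      # block i applies U_phi(x_i) and
    blocks = [np.kron(u_phi(x), X if y > 0 else I2)   # flips label qubit if y=+1
              for x, y in zip(data, labels)]
    return block_diag(*blocks)

data = [[0.1, 0.2], [1.0, -0.5], [2.0, 0.3], [0.4, 1.5]]
labels = [1, 1, -1, -1]
U = u_phi_S(data, labels)                       # acts on |i> (x) |x> (x) |y>
print(U.shape)                                  # (16, 16) for M = 4
```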

Application to IBM quantum processors

Because IBM quantum processors are based on superconducting qubits, all-to-all connections are not possible. Additional SWAP operations among qubits for distant interactions would shorten the effective decoherence time and increase the noise. We carefully selected the physical qubits of ibmq_montreal in order to reduce the number of SWAP operations. For \(M=4\) and single-qubit embedding, \(m=2\) and \(n=1\). Thus, multi-qubit interactions are required for the following connections: \((a, x_0, x_1)\), \(([i_0, i_1], x_0)\), \(([j_0, j_1], x_1)\), \(([i_0, i_1], y_0)\), and \(([j_0, j_1], y_1)\). We initially selected nine qubits connected in a linear topology such that the overall estimated single- and two-qubit gate errors are the lowest among all possible options. The noise parameters and topology of ibmq_montreal are provided by IBM Quantum. For instance, the physical qubits indexed 1, 2, 3, 4, 5, 8, 11, 14, and 16 in ibmq_montreal were selected in this article (see Supplementary Fig. S1 online). We then assign the virtual qubits in the order \(y_0, i_0, i_1, x_0, a, x_1, j_0, j_1, y_1\) so that the aforementioned required connections can be made between neighboring qubits. In conclusion, the mapping from virtual qubits to physical qubits is \(\{a\mapsto 5, i_0\mapsto 2, i_1\mapsto 1, x_i\mapsto 3, y_i\mapsto 4, j_0\mapsto 11, j_1\mapsto 14, x_j\mapsto 8, y_j\mapsto 16\}\) in this experiment. We report that with this arrangement, the circuit depths of the loss and decision circuits are 60 and 59, respectively, for a balanced dataset and 64 and 63 for an unbalanced dataset in the basis gate set of ibmq_montreal: \(\left\{ R_z, \sqrt{X}, X, CNOT\right\}\).

Additional techniques on SPSA

The conventional SPSA algorithm has been adjusted for faster and better convergence. First, a blocking technique was introduced. Assuming that the variance \(\sigma ^2\) of the objective function \(\mathcal {L}_{\phi , \lambda }+C^{-1}\mathcal {R}\) is uniform over the parameters \(\varvec{\theta }\), iteration \(t+1\) is rejected if \(\left[ \mathcal {L}_{\phi , \lambda }+C^{-1}\mathcal {R}\right] (\varvec{\theta }^{t+1})\ge \left[ \mathcal {L}_{\phi , \lambda }+C^{-1}\mathcal {R}\right] (\varvec{\theta }^{t})+2\sigma\). SPSA converges more rapidly with blocking, which prevents the objective function from becoming too large with some probability (see Supplementary Fig. S10). Second, early stopping is applied: iterations are terminated if certain conditions are satisfied. Specifically, we stop the SPSA optimization if the average of the last 16 recorded training loss values is greater than or equal to that of the last 32 recorded values. Early stopping reduces the training time drastically, especially when running on a QPU. Last, we average the last 16 recorded parameters to yield the result \(\varvec{\theta }^{\star }=\frac{1}{16}\sum _{i=0}^{15}\varvec{\theta }^{t-i}\). Combinations of these techniques were selected for better optimization. We adopted all of these techniques as the default condition for the experiments and simulations.
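A compact sketch of the adjusted SPSA loop (pure numpy; the gain sequences, noise-variance estimate, and quadratic test objective are illustrative assumptions rather than the exact settings used in the experiments) is:

```python
# Sketch of SPSA with blocking, early stopping, and parameter averaging.
# Gain sequences and the quadratic test objective are illustrative choices.
import numpy as np

def spsa(objective, theta0, sigma, max_iter=500, a=0.2, c=0.1):
    theta, history = theta0.copy(), []
    f_curr = objective(theta)
    for t in range(1, max_iter + 1):
        a_t, c_t = a / t ** 0.602, c / t ** 0.101        # standard SPSA gains
        delta = np.random.choice([-1.0, 1.0], size=theta.shape)
        grad = (objective(theta + c_t * delta) -
                objective(theta - c_t * delta)) / (2 * c_t) * delta
        candidate = theta - a_t * grad
        f_cand = objective(candidate)
        if f_cand < f_curr + 2 * sigma:                  # blocking
            theta, f_curr = candidate, f_cand
        history.append((theta.copy(), f_curr))
        losses = [f for _, f in history]
        if t >= 32 and np.mean(losses[-16:]) >= np.mean(losses[-32:]):
            break                                        # early stopping
    last = [p for p, _ in history[-16:]]
    return np.mean(last, axis=0)                         # parameter averaging

# Toy usage with a noisy quadratic objective standing in for the loss circuit.
rng = np.random.default_rng(3)
noisy = lambda th: np.sum(th ** 2) + rng.normal(scale=0.01)
theta_star = spsa(noisy, theta0=rng.normal(size=4), sigma=0.01)
```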

Warm-start optimization

We report cases in which the optimization on IBM Quantum processors yields vanishing kernel amplitudes due to the constantly varying error-map problem. The total run time should be minimized to avoid this problem. Because accessing a QPU involves a relatively long queue time, we apply a ‘warm-start’ technique, which reduces the number of QPU calls. First, we initialize and run a few iterations (32) of a noisy simulation on a CPU and then evaluate the functions on a QPU for the remaining iterations. Note that the SPSA optimizer requires heavy initialization computation, such as when the initial variance is calculated. With this warm-start method, we were able to obtain better results on some trials.