The power of one clean qubit in supervised machine learning

Karimi, Mahsa; Javadi-Abhari, Ali; Simon, Christoph; Ghobadi, Roohollah

doi:10.1038/s41598-023-46497-y

Download PDF

Article
Open access
Published: 15 November 2023

The power of one clean qubit in supervised machine learning

Mahsa Karimi^1,2,
Ali Javadi-Abhari³,
Christoph Simon^1,2 &
…
Roohollah Ghobadi^1,2

Scientific Reports volume 13, Article number: 19975 (2023) Cite this article

972 Accesses
1 Citations
5 Altmetric
Metrics details

Subjects

Abstract

This paper explores the potential benefits of quantum coherence and quantum discord in the non-universal quantum computing model called deterministic quantum computing with one qubit (DQC1) in supervised machine learning. We show that the DQC1 model can be leveraged to develop an efficient method for estimating complex kernel functions. We demonstrate a simple relationship between coherence consumption and the kernel function, a crucial element in machine learning. The paper presents an implementation of a binary classification problem on IBM hardware using the DQC1 model and analyzes the impact of quantum coherence and hardware noise. The advantage of our proposal lies in its utilization of quantum discord, which is more resilient to noise than entanglement.

Probing single electrons across 300-mm spin qubit wafers

Article Open access 01 May 2024

Maximum diffusion reinforcement learning

Article 02 May 2024

Quantum control of a cat qubit with bit-flip times exceeding ten seconds

Article 06 May 2024

Introduction

Recent progress in the control and mitigation of noise and decoherence has paved the way for the development of intermediate-scale quantum devices consisting of hundreds of qubits. Although these devices are currently not fault-tolerant, there is considerable evidence that they possess superior computational capabilities compared to classical supercomputers, as a result of their ability to support quantum entanglement^1,2. As quantum hardware continues to evolve, it is expected to play a crucial role in various fields such as quantum simulations, quantum chemistry, and quantum machine learning (QML)^3,4.

The use of quantum hardware for complex computations such as kernel function estimation has been proposed as a way to achieve a quantum advantage in machine learning^5,6. Quantum entanglement is considered a key resource for this^7,8,9, but it is highly susceptible to noise, thus it is important to explore other forms of quantum correlation that are less sensitive to noise or require less entanglement.

The Deterministic Quantum Computing with One Qubit (DQC1) model is a non-universal quantum computing model that leverages a single qubit as a probe to interact with a highly mixed quantum state and estimate computationally expensive functions. This ability is known as the “power of one qubit”¹⁰. The DQC1 model generates quantum discord, a resilient type of weak quantum correlation, using the coherence of a pure qubit^11,12. Quantum discord is more resistant to noise than entanglement and may offer a quantum advantage in noisy conditions for quantum illumination tasks¹³.

There is a limited body of literature exploring the use of DQC1 in machine learning contexts^14,15,16. Reference¹⁴ investigates the advantages of DQC1 in addressing the parity learning problem. In reference¹⁵, DQC1 is proposed for application in kernel based supervised machine learning. Finally, reference¹⁶ builds upon the results in reference¹⁵, extending the concept to multiple kernel learning for supervised machine learning within a DQC1 framework.

This paper studies the use of the DQC1 model in supervised machine learning for efficient estimation of complex kernel functions. The study is implemented on IBM hardware and examines the effects of coherence consumption, quantum discord, and hardware noise. The DQC1 protocol reduces measurement errors by only measuring one qubit, achieving high classification accuracy despite requiring more gates than a similar protocol in⁶.

The paper is structured as follows: “Preliminaries” provides a review of the DQC1 algorithm, quantum coherence and quantum discord, and a brief overview of kernel-based supervised machine learning. In “Supervised machine learning with DQC1” , we describe the application of DQC1 for the estimation of arbitrary kernel functions. “Implementation on IBM hardware” presents our implementation of supervised machine learning using DQC1 on IBM hardware. We also compare the DQC1 kernel with the projected quantum kernel in this section. “The role of coherence and the effect of noise” presents the role of quantum coherence and the effect of noise in our implementation. Finally, “Discussion and conclusion” discusses the limitations imposed on the DQC1 kernel and concludes with a summary of our findings.

Preliminaries

DQC1

The DQC1 model was originally introduced in the context of nuclear magnetic resonance (NMR) quantum information processing and has been implemented in various physical setting^17,18,19,20.

As shown in Fig. 1, the DQC1 circuit consists of one control qubit which is prepared in $\frac{I_{1}+\alpha Z}{2}$ with $\alpha \in [0,1]$, Z as Pauli Z matrix and n target qubits in a maximally mixed state denoted by $\frac{I_{n}}{2^{n}}$ where $I_{n}$ is a $2^n\times 2^n$ identity matrix. One can change the purity of the control qubit by tuning $\alpha \in [0,1]$: for $\alpha =0$ and $\alpha =1$ the control qubit will be in maximal mixed and pure states, respectively, while for $0<\alpha <1$ the control qubit will be in a partially mixed state. Once the control qubit is evolved through the Hadamard gate, as shown in Fig. 1, the matrix form of the initial state in the computational basis of control qubit becomes

$$\begin{aligned} \rho _{in}=\frac{1}{2^{n + 1}}\begin{pmatrix} I_{n} &{}\alpha I_{n}\\ \alpha I_{n}&{}I_{n} \end{pmatrix}. \end{aligned}$$

(1)

Following the application of the DQC1 circuit evolution

$$\begin{aligned} U_{DQC1}=|0\rangle \langle 0|\otimes I_{n}+| 1\rangle \langle 1|\otimes U_{n}, \end{aligned}$$

(2)

where $U_{n}$ is an arbitrary $2^n\times 2^n$ unitary matrix that is applied to n target qubits, the total state is updated to

$$\begin{aligned} \rho _{f}=U_{DQC1}\rho _{\text {in}}U^{\dagger }_{DQC1}=\frac{1}{2^{n+1}}\begin{pmatrix} I_{n}&{}\alpha U_{n}^{\dagger }\\ \alpha U_{n}&{}I_{n} \end{pmatrix}. \end{aligned}$$

(3)

Tracing out the last n qubits from Eq. (3), the density matrix of the control qubit denoted by $\rho _{f,c}$ is given by

$$\begin{aligned} \rho _{f,c}=\frac{1}{2}\begin{pmatrix} 1&{}\frac{\alpha }{2^{n}}\text {tr}(U_{n}^{\dagger })\\ \frac{\alpha }{2^{n}}\text {tr}(U_{n})&{}1 \end{pmatrix}. \end{aligned}$$

(4)

Measuring the off-diagonal elements of the control qubit can be used to calculate the trace of a unitary matrix, as demonstrated in Eq. (4). The number of measurements needed to estimate the off diagonal elements in Eq. (4) within precision $\epsilon$ and with probability $1-\delta$ is $O(\epsilon ^{-2}\alpha ^{-2}\log (1/\delta ))$, which is independent of the number of register qubits. Therefore, DQC1 serves as an efficient method for estimating the trace of an arbitrary unitary matrix, a problem for which no efficient classical algorithm is known¹⁰. In the special case of a real positive semi-definite matrix, there is a classical randomized algorithm to estimate the trace²¹. As pointed out in²², the capability of DQC1 to achieve universal classical computation is still an open question. However, there exist complexity arguments that prove the classical efficient approximation of the output probability distribution of the DQC1 model is impossible unless the polynomial-time hierarchy collapses to the second level²³.

For completeness, we will now demonstrate that the clean qubit and register qubits remain in a separable state throughout the computation as stated in²⁴. To do this, we will use the eigenvectors and eigenvalues of $U_n$, denoted as $|u_i\rangle$ and $\lambda _i$, respectively. In the basis of $\{|u_i\rangle \}$, the mixed state of the target qubits can be represented as $I_{n}=\sum _{i}|u_i\rangle \langle u_i|$. Applying the DQC1 evolution as defined in Eq. (2) results in

$$\begin{aligned} U_{DQC1}(|0\rangle +|1\rangle )|u_{i}\rangle =(|0\rangle +\lambda _{i}|1\rangle )|u_{i}\rangle , \end{aligned}$$

(5)

which is clearly a product state, meaning there is no entanglement between the control qubit and the target qubits.

DQC1 resource

The DQC1 model has been widely studied to investigate the potential use of quantum resources, other than entanglement, in quantum computation^25,26. In this section, we shortly review the definitions of quantum coherence and discord and we demonstrate that it is the consumption of this coherence that allows for the production of discord²⁷.

The rigorous definition of coherence was first given in²⁸. The coherence is defined as

$$\begin{aligned} C(\rho )=S(\rho _{\text {diag}})-S(\rho ), \end{aligned}$$

(6)

where $S(\rho )=-\text {tr}(\rho \text {log}\rho )$ is the von Neumann entropy and $\rho _{\text {diag}}$ is the diagonal part of $\rho$. One can find the change of coherence for the control qubit defined as $\Delta C=C(\rho _{in,c})-C(\rho _{f,c})$ with $\rho _{in,c}$ and $\rho _{f,c}$ as input and output state for the control qubit, respectively. The input state of the control qubit can be obtained by tracing out the Eq. (1) (See Eq. S1 in supplementary material). Using Eq. S1 and Eq. (4) in Eq. (6) one obtains¹⁴

$$\begin{aligned} \Delta C = H_{2}\left( \frac{{1-\alpha \frac{|\text {tr}(U_{n})|}{2^n}}}{2}\right) - H_{2}\left( \frac{1 -\alpha }{2}\right) , \end{aligned}$$

(7)

where $H_{2}(x)=-x\text {log}_{2}x-(1-x)\text {log}_{2}(1-x)$ is the binary Shannon entropy (see Supplementary material for the derivation). From Eq. (7), it is clear that the coherence consumption which is determined by the parameter $\alpha$ and the trace of $U_{n}$, can be obtained efficiently by DQC1.

Quantum discord is a generalization of the classical notion of mutual information and is defined as the difference between the total quantum mutual information and the classical mutual information of the subsystems. For a bipartite system in state $\rho _{AB}$, the quantum discord is defined by the difference

$$\begin{aligned} D(\rho _{AB})=I(\rho _{AB})-J(\rho _{AB}), \end{aligned}$$

(8)

with $I(\rho _{AB})$ and $J(\rho _{AB})$ as the quantum mutual information, and the measurement-based mutual information, respectively. The quantum mutual information is given by

$$\begin{aligned} I(\rho _{AB})=S(\rho _{A})+S(\rho _{B})-S(\rho _{AB}). \end{aligned}$$

(9)

The measurement-based mutual information on the other hand is given by

$$\begin{aligned} J(\rho _{AB})=S(\rho _{B})-\text {min}_{E_{k}}[p_{k}S(\rho _{B|k})], \end{aligned}$$

(10)

where the minimum is taken over all possible positive operator-valued measurements (POVM) $\{E_{k}\}$ on subsystem A, $\rho _{B|k}=\text {tr}_{A}(E_{k}\rho _{AB})/p_{k}$ is the post-measurement state for system B if outcome k is obtained with probability $p_{k}=\text {tr}(E_{k}\rho _{AB})$.

In the following, we use an alternative definition for discord known as geometric discord, which is easier to calculate and has a closed form for DQC1²⁹. For a given quantum state $\rho$, the geometric discord is defined as³⁰

$$\begin{aligned} D_{G}(\rho )=\text {min}_{\chi \in \mathcal {C}}||\rho -\chi ||^{2}, \end{aligned}$$

(11)

where $\mathcal {C}$ denotes the set of classical zero-discord states and $||A-B||^{2}=\text {tr}(A-B)^{2}$. Evaluating Eq. (11) for the state in Eq. (3) leads to²⁹

$$\begin{aligned} D_{G}(\rho _{f})=(\frac{\alpha }{2})^{2}\frac{1}{2^{n}}\left( 1-\frac{|\text {tr}(U_{n}^{2})|}{2^{n}}\right) . \end{aligned}$$

(12)

Evaluating Eq. (12) can be done by applying two consecutive controlled-$U_n$ in Fig. 1. In²⁷, the connection between coherence consumption and discord production in DQC1 was examined, where it was demonstrated that quantum discord is bounded by the quantum coherence consumed in the control qubit, i.e.

$$\begin{aligned} D_{G}(\rho _{f})\le \Delta C. \end{aligned}$$

(13)

In our implementation, the results obtained from IBM hardware verifies the relation (13).

Supervised machine learning: support vector machines and kernel method

In this section, we introduce the concepts of support vector machines (SVM) and the kernel method within the context of supervised machine learning. Given a set of n training data points, represented by $X_{\text {train}}:=\{(x_i,y_i): i=1,2,...,n\}$, where each data point has k features, i.e., $x_{i}\in \mathbb {R}^{k}$, and is labeled by $y_i\in \{1,-1\}$. The task is to use the training data to develop a classifier function that can accurately predict the labels for test (unseen) data. In the simplest scenario, where the data points are linearly separable, the classifier function can be expressed as

$$\begin{aligned} f(x) =\text {sign}\big (w^T x + b\big ), \end{aligned}$$

(14)

where $w\in \mathbb R^k$ and b are to be determined such that $y_{i}f(x_i)>0$. In the SVM, the separating plane f(x) is determined by maximizing the distance between the hyperplane to the nearest data point of each class³¹, see Fig. 2a.

The SVM can be generalized to the case of non-linearly separated data points by mapping the data points to a higher dimension space for which the data is linearly separable, see Fig. 2b. In other words, one considers a non-linear mapping $\phi :X\rightarrow \mathcal {H}$, so that the decision function can be written as

$$\begin{aligned} f(x) = \text {sign}\big (w^T \phi (x)+ b\big ). \end{aligned}$$

(15)

In this context, $\mathcal {H}$ and $\phi (x)$ are known as feature space and feature map, respectively.

It is well-known that for nonlinear separable data points the SVM leads to solutions of the form³²,

$$\begin{aligned} f(x)= \text {sign}\sum _{i}{\beta ^{*}_{i}K(x,x_{i})}, \end{aligned}$$

(16)

where $\beta ^{*}_{i}$ are coefficients to be determined, and we defined the kernel function $K(x,x_{i})=\langle \phi (x),\phi (x_{i})\rangle$, where $\langle ,\rangle$ denotes the inner product in feature space $\mathcal {H}$. The procedure for finding $\beta _{i}$ in Eq. (16) is through maximizing

$$\begin{aligned} \sum _{i=1}^{n}\beta _{i}-\frac{1}{2}\sum _{i,j=1}^{n}y_{i}y_{j}\beta _{i}\beta _{j}K(x_{i},x_{j}), \end{aligned}$$

(17)

over the training data, subject to $\sum _{=1}^{n}\beta _{i}y_{i}=0$ and $\beta _{i}\ge 0$. For a positive definite kernel, Eq.(17) is a concave problem, whose solution $\beta ^{*}=(\beta _{1}^{*},...,\beta _{n}^{*})$ can be found efficiently.

The basic idea of SVM can be extended to the quantum domain by interpreting the feature map as a quantum state that can be constructed by a quantum circuit, and the kernel function as the inner product between respective quantum states^6,33.

Supervised machine learning with DQC1

The freedom in choosing the unitary operator $U_{n}$ in DQC1 allows one to make a connection between DQC1 and the kernel method¹⁵. To see this, we choose $U_{n}=u^l(\textbf{x})\left( u^l(\textbf{x}')\right) ^{\dagger }$, where $u^l$ represents l consecutive application of unitary operator u with $\textbf{x}$ and $\textbf{x}'$ as encoded data points in the gate parameters. Next, we note that $\text {tr}(u^l(\textbf{x})\left( u^l(\textbf{x}')\right) ^{\dagger })$ is positive semidefinite, i.e. $\sum _{i,j}c_{i}c_{j}\text {tr}(u^l(\textbf{x})\left( u^l(\textbf{x}')\right) ^{\dagger })\ge 0$ for $c_{i}\in \mathbb {C}$.

Rewriting Eq. (4) for $U_{n}(\textbf{x},\textbf{x}')=u^l(\textbf{x})\left( u^l(\textbf{x}')\right) ^{\dagger }$,

$$\begin{aligned} \rho _{f,c} = \frac{1}{2} \begin{pmatrix}1&{}\alpha K^{*}(\textbf{x},\textbf{x}')\\ \alpha K(\textbf{x},\textbf{x}')&{}1 \end{pmatrix}, \end{aligned}$$

(18)

where $K(\textbf{x},\textbf{x}')=\frac{\text {tr}(U_{n}(\textbf{x},\textbf{x}'))}{2^{n}}$.

From Eq. (18), it follows that the DQC1 model allows for an efficient method for estimating arbitrary, complicated kernel functions.

Interestingly, by comparing Eq. (18) with equation (7), we can relate the coherence consumption to the kernel function. For example, by setting $\alpha =1$ in equation (7), we obtain

$$\begin{aligned} \Delta C(\textbf{x},\textbf{x}')=H_{2}\left( \frac{1-|K(\textbf{x},\textbf{x}')|}{2}\right) . \end{aligned}$$

(19)

From Eq. (19) the following key insight can be obtained. Firstly, when $\Delta C(\textbf{x},\textbf{x}')=0$ it follows that $K(\textbf{x},\textbf{x}')=1$, indicating that the kernel function is incapable of distinguishing between the two data points. This lack of discrimination hinders the learning process and underscores the significance of coherence consumption in learning. Secondly, Eq. (19) provides a means to observe the influence of hardware noise. To see this, note that in the absence of hardware noise, one expects that $\Delta C(\textbf{x},\textbf{x})=0$ as $K(\textbf{x},\textbf{x})=1$. In real situations, where noise cannot be ignored, the diagonal elements of a kernel will be smaller than one, related to a loss of coherence as shown by Eq. (19) (see “The role of coherence and the effect of noise” for more details.). Please note that it follows from Eq. (7) that the above conclusions are applicable for arbitrary $\alpha \in (0,1]$.

Implementation on IBM hardware

In this section, we describe our implementation of supervised machine learning based on the DQC1 model. Our scheme was implemented on the “$ibm\_perth$” quantum processor, as shown in Fig. 3, using IBM’s open-source software interface, Qiskit.

Our demonstration were performed on ibm_perth, which is a $7-$qubit superconducting quantum processor from IBM. It consists of fixed-frequency transmon qubits connected according to the coupling map of Fig. 3. The device characteristics at the time of our demonstration were as follows: Median $T1: 137.24 \mu s$ Median $T2: 107.32 \upmu$s Median 1-qubit gate error: $0.03\%$ Median 2-qubit gate error: $1.06\%$ Median readout error: $4.56\%$ The code was implemented on top of Qiskit Machine Learning, which provides the dataset and the SVM kernels. The code and Qiskit version for these demonstration can be found online³⁴.

Figure 4a shows the schematics of our implemented circuit which is composed of two target qubits and three ancilla qubits. In the first part of the circuit, left to the dashed line, input states for control and target qubits are prepared. The mixed state preparation of target qubits are based on creating Bell states between target and ancilla qubits, followed by ignoring the state of ancilla qubits. The resulting state of the control and target qubits right before the dashed line is given by $\rho _{\text {control}}\otimes \frac{I_{2}}{4}$ where $\rho _{\text {control}}=\text {diag} \left(\cos ^{2}\frac{\theta }{2},\sin ^{2}\frac{\theta }{2}\right)$. By choosing $\theta =2\cos ^{-1} \left(\sqrt{\frac{1+\alpha }{2}}\right)$ the state of control qubit becomes $\frac{I_{1}+\alpha Z}{2}$.

In our implementation on the IBM real hardware, the purity of the resulting target qubits is $\text {tr}(\rho ^{2}_{target})=0.506$, which deviates from the ideal mixed state by $6\times 10^{-3}$.

To benchmark the performance of our protocol, we use the encoding map and dataset used in⁶. In⁶ the data points $\textbf{x}$ and $\textbf{x}'$ are mapped onto gate parameters of the unitary matrix $U_{2}(\textbf{x},\textbf{x}')=u^{l}(\textbf{x})\left( u^l(\textbf{x}')\right) ^{\dagger }$, where

$$\begin{aligned} u^{l}(\textbf{x})=\prod \limits _{i = 0}^{l} (U_{\phi (\textbf{x})}H^{\otimes 2})_{i}, \end{aligned}$$

(20)

with l as the number of iterations of each layer in the feature map (See Fig. 4b), encoding map $U_{\phi (\textbf{x})}$, and $H^{\otimes 2}$ denotes two Hadamard gates acting on two qubits (see Fig.4c).

$$\begin{aligned} U_{\phi (\textbf{x})} = \exp \left(\sum \limits _{S \subseteq [n]} {{\phi _S}}(\textbf{x})\prod \limits _{i\in S} {Z_i} \right), \end{aligned}$$

(21)

where $\phi _{i}(\textbf{x})=x_{i}$, $\phi _{i,j}(\textbf{x})=(\pi -x_{i})(\pi -x_{j})$, and $Z_{i}$ denotes Pauli Z gate (See Fig.4(d). Fig. 4c shows the quantum circuit that describes Eq.(20) for $l=2$. We defined our kernel as $K(\textbf{x},\textbf{x}')=\frac{|\text {tr}(U_2(\textbf{x},\textbf{x}'))|}{4}$. It has been conjectured that approximation of the resulting kernel function for the encoding map Eq. (21) with $l=2$ is hard classically, i.e. the resources required to perform it, increase at a non-polynomial rate with respect to number of qubits⁶.

Our implementation is divided into three phases. In the first stage, we run the circuit in Fig. 4 for all pairs of training data to obtain the corresponding density matrix for the control qubit, using the quantum state tomography package in Qiskit with repeating each measurement 8000 times (shots), and therefore to obtain the corresponding kernel function. Having obtained the kernel function on the quantum hardware, we apply the classical SVM to obtain the optimal separating hyperplane, or equivalently $\beta ^{*}$ by applying Eq. (17). Finally, in the prediction phase, given test data $\textbf{x}$, we run the DQC1 circuit to estimate the $K(\textbf{x},\textbf{x}_{i})$ for all $\textbf{x}_{i}\in X_{\text {train}}$ and apply Eq. (16).

In Fig. 5, we display the results of applying the above procedure for the classification task on the “ad_hoc” dataset for the IBM simulator (Qiskit) (left) and IBM 7-qubit hardware (right) for $l=2$ for the control qubit in the pure state, i.e. $\alpha = 1$³⁴. From Fig. 5, it can be seen that the accuracy of the Qiskit simulator is $100\%$. On the other hand, the obtained accuracy on the hardware is $90\%$. The difference between the simulation and hardware performance can be attributed to the effects of hardware noise. It is worth noting that the circuits were optimized using the Qiskit compiler, specifically the Approximate Quantum Compilation method³⁵. This method converts the entire circuit (excluding ancilla) to a 3-qubit unitary matrix, and then re-synthesizes it into a new circuit that approximates the matrix with 0.995 accuracy (synthesis fidelity). A higher synthesis fidelity uses more CNOT gates in the resulting circuit, which reduces approximation error but increases runtime noise. This method reduced the CNOT gate count of the circuit from 177 to 19.

It is worth noting the similarity between our approach and the projected kernel method introduced in³⁶, where both methods involve constructing the kernel function by measuring a subset of the relevant qubits. More explicitly, the projected kernel function is defined as³⁶

$$\begin{aligned} k^{PQ}(\textbf{x}_i,\textbf{x}_j) = \exp \left( -\gamma \sum _{m=1}^n \Vert \rho _m(\textbf{x}_i) - \rho _m(\textbf{x}_j) \Vert ^2\right) , \end{aligned}$$

(22)

where $\rho _m(\textbf{x}_i)$ is the reduced density matrix of the m-th register of the encoded quantum state $\rho (\textbf{x}_i)$, and $\gamma >0$ is a hyperparameter. Furthermore, their work extends this kernel to $k^{(PQ)}_s$ which takes every subset of s qubits into account (for clarification, we have $k^{(PQ)}_1=k^{(PQ)}$). We highlight that the number of measurements required in determining $k^{(PQ)}_s$ grows as $4^s {n\atopwithdelims ()s}$ since there are $4^s$ Pauli strings on s qubits, and ${n\atopwithdelims ()s}$ subsets of size s for a state of n qubits. This quantity grows polynomially in n and exponentially in s. In contrast, our method only necessitates the measurement of the control qubit. This practical aspect holds strong appeal because, in current superconducting qubit technology, measurements are the most error-prone operations, with error rates ranging from 3 to 10 times that of 2-qubit gates, as demonstrated in³⁷. For completeness, we repeated our experiments employing the projected quantum kernel on “ad_hoc” dataset³⁴. The two-qubit projected kernel is depicted in Fig. 6a with the same encoding circuit (Fig. 6b), and feature map (Fig. 6c) as in DQC1 kernel. The Qiskit simulation and IBM hardware’s results (considering $\gamma =0.01$) demonstrated an accuracy of $100\%$ and $90\%$, respectively.

The role of coherence and the effect of noise

In the following, we explore the role of control qubit’s coherence, hardware noise, coherence consumption and quantum discord production in our setting.

To see the role of the control qubit’s coherence in our implementation, we repeat the learning task with the control qubit prepared in the state $\frac{I_1+\alpha Z}{2}$, where $0\le \alpha \le 1$. In Fig. 7 we show the prediction accuracy in the simulation (blue dots) and the implementation (red dots) as a function of the purity of the control qubit. From Fig. 7 one can see that $\alpha =0$, for which the control qubit is in a maximally mixed state, the accuracy is 0.5, corresponding to randomly guessing the labels. By increasing the purity, however, the accuracy increases until it reaches its maximum value at $\alpha \ge 0.6$. Due to the device noise, the accuracy in the implementation is degraded in comparison to the simulation.

For completeness we repeat the learning process with two well-known datasets called “make-moon”, and “make-circle” from “scikit-learn”, each of them including 800 training data points, and 200 testing data points. For these two datasets we observed an abrupt change in the accuracy for $\alpha \ge 0.2$. Hence the critical value of $\alpha$ depends on the dataset. These results are depicted in Fig. 8. By interpreting $1-\alpha$ as the noise strength, one can see that the accuracy is robust against noise for $\alpha \ge 0.6$ (Fig. 7) and $\alpha \ge 0.2$ (Fig. 8). Likewise, variational quantum circuits are predicted to display similar robustness against noise³⁸. Let us emphasize that in the rest of our paper we use $\alpha =1$.

In Fig. 9a, b we show the absolute value of the kernel obtained from simulation, and IBM hardware respectively. The difference between the two kernels can be attributed to hardware noise. To better show the role of noise in the kernel, we compare the diagonal elements of the kernel obtained from simulation and IBM hardware. As discussed earlier, in the ideal case $K(\textbf{x},\textbf{x}) = 1$ (blue bar) but in practice one has $K(\textbf{x},\textbf{x})<1$. In Fig. 9c, one finds a maximum difference of 0.610 between simulation and implementation, while the mean difference is 0.27.

Having access to the kernel, we can obtain the coherence consumption in our implementation from Eq. (19), as shown in Fig. 10. In accordance with Eq. (19), it can be seen from Fig. 10 that the coherence consumption is minimum (but not equal to zero in the IBM results) along the diagonal axes.

Our next step is to obtain the generated discord in our implementation based on Eq. (12). Equation (12) indicates that for estimating the discord, $\text {tr}(U_{n}^{2})$ must be estimated, which requires successive application of DQC1 evolution Eq. (2). Fig.11 shows the quantum discord. When comparing Fig.11 with Fig.10, it is also evident that the condition (13) is satisfied.

Discussion and conclusion

In³⁹, an upper bound on the generalization error of the fidelity quantum kernel model has been established, which is determined by the average purity of the encoded states. As highlighted in the same work, a noisier encoding process, when transferring data into quantum states, can lead to poorer training performance. Our empirical results, illustrated in Figs. 7 and 8 and based on three distinct datasets, support the idea that improving the purity of the control qubit (parameterized by $\alpha$) enhances the accuracy of DQC1 kernel model. We believe that our DQC1 kernel framework provides a ground for rigorous theoretical studies for relating coherence consumption to generalization error.

We also comment on the connection between our work and the findings in⁴⁰, which investigate the exponential concentration of quantum kernels. Despite our method relying solely on measuring the control qubit, it is essential to emphasize that the number of required measurements scales exponentially with the number of target qubits, as the variance in the kernel exponentially approaches zero. Hence, we expect the same untrainability issues to hold for our kernel function.

In this study, we have investigated the application of the DQC1 model, a restricted computational model, to supervised machine learning tasks. Unlike the standard universal computational model, the DQC1 model relies on mixed states and does not incorporate quantum entanglement into the computation. We have presented a test of the DQC1 model’s ability to solve supervised machine learning problems for some classically difficult kernels⁶. Despite requiring a greater number of gates than a similar protocol described in⁶, since one needs to measure only the control qubit, our protocol still achieved a relatively high level of classification accuracy. Our proposal highlights the potential of utilizing quantum discord over entanglement in the presence of noise.

In a broader context, our work highlights the computational power of a single-qubit as a universal classifier^41,42. It would be interesting to realize our protocol in the NMR setting⁴³. We hope that this study will inspire further research on the integration of quantum coherence and quantum discord in machine learning.

Data availability

The datasets and codes used for the generation of the plots of this manuscript are publicly available at https://github.com/mahsakarimii/The-power-of-one-clean-qubit-in-supervised-machine-learning.

References

Zhong, H.-S. et al. Quantum computational advantage using photons. Science 370, 1460–1463 (2020).
Article ADS CAS PubMed Google Scholar
Madsen, L. S. et al. Quantum computational advantage with a programmable photonic processor. Nature 606, 75–81 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Preskill, J. Quantum computing in the Nisq era and beyond. Quantum 2, 79 (2018).
Article Google Scholar
Huang, H.-Y. et al. Quantum advantage in learning from experiments. Science 376, 1182–1186 (2022).
Article ADS MathSciNet CAS PubMed Google Scholar
Schuld, M., Bergholm, V., Gogolin, C., Izaac, J. & Killoran, N. Evaluating analytic gradients on quantum hardware. Phys. Rev. A 99, 032331 (2019).
Article ADS CAS Google Scholar
Havlíček, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209–212 (2019).
Article ADS PubMed Google Scholar
Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014).
Article ADS PubMed Google Scholar
Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum principal component analysis. Nat. Phys. 10, 631–633 (2014).
Article CAS Google Scholar
Gao, X., Anschuetz, E. R., Wang, S.-T., Cirac, J. I. & Lukin, M. D. Enhancing generative models via quantum correlations. Phys. Rev. X 12, 021037 (2022).
CAS Google Scholar
Knill, E. & Laflamme, R. Power of one bit of quantum information. Phys. Rev. Lett. 81, 5672 (1998).
Article ADS CAS Google Scholar
Ollivier, H. & Zurek, W. H. Quantum discord: A measure of the quantumness of correlations. Phys. Rev. Lett. 88, 017901 (2001).
Article ADS PubMed MATH Google Scholar
Modi, K., Brodutch, A., Cable, H., Paterek, T. & Vedral, V. The classical-quantum boundary for correlations: Discord and related measures. Reviews of Modern Physics 84, 1655 (2012).
Article ADS Google Scholar
Weedbrook, C., Pirandola, S., Thompson, J., Vedral, V. & Gu, M. How discord underlies the noise resilience of quantum illumination. New J. Phys. 18, 043027 (2016).
Article ADS MATH Google Scholar
Park, D. K., Rhee, J.-K.K. & Lee, S. Noise-tolerant parity learning with one quantum bit. Phys. Rev. A 97, 032327 (2018).
Ghobadi, R., Oberoi, J. S. & Zahedinejhad, E. The Power of One Qubit in Machine Learning. arXiv preprint arXiv:1905.01390 (2019).
Vedaie, S. S., Noori, M., Oberoi, J. S., Sanders, B. C. & Zahedinejad, E. Quantum Multiple Kernel Learning. arXiv preprint arXiv:2011.09694 (2020).
Passante, G., Moussa, O., Trottier, D. & Laflamme, R. Experimental detection of nonclassical correlations in mixed-state quantum computation. Phys. Rev. A 84, 044302 (2011).
Lanyon, B. P., Barbieri, M., Almeida, M. P. & White, A. G. Experimental quantum computing without entanglement. Phys. Rev. Lett. 101, 200501 (2008).
Hor-Meyll, M. et al. Deterministic quantum computation with one photonic qubit. Phys. Rev. A 92, 012337 (2015).
Wang, W. et al. Witnessing quantum resource conversion within deterministic quantum computation using one pure superconducting qubit. Phys. Rev. Lett. 123, 220501 (2019).
Avron, H. & Toledo, S. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM (JACM) 58, 1–34 (2011).
Article MathSciNet MATH Google Scholar
Aaronson, S., Bouland, A., Kuperberg, G. & Mehraban, S. The Computational Complexity of Ball Permutations. 317–327 (2017).
Fujii, K. et al. Impossibility of classically simulating one-clean-qubit model with multiplicative error. Phys. Rev. Lett. 120, 200502 (2018).
Poulin, D., Blume-Kohout, R., Laflamme, R. & Ollivier, H. Exponential speedup with a single bit of quantum information: Measuring the average fidelity decay. Phys. Rev. Lett. 92, 177906 (2004).
Datta, A., Flammia, S. T. & Caves, C. M. Entanglement and the power of one qubit. Phys. Rev. A 72, 042316 (2005).
Datta, A., Shaji, A. & Caves, C. M. Quantum discord and the power of one qubit. Phys. Rev. Lett. 100, 050502 (2008).
Ma, J., Yadin, B., Girolami, D., Vedral, V. & Gu, M. Converting coherence to quantum correlations. Phys. Rev. Lett. 116, 160407 (2016).
Baumgratz, T., Cramer, M. & Plenio, M. B. Quantifying coherence. Phys. Rev. Lett. 113, 140401 (2014).
Passante, G., Moussa, O. & Laflamme, R. Measuring geometric quantum discord using one bit of quantum information. Phys. Rev. A 85, 032325 (2012).
Dakić, B., Vedral, V. & Brukner, Č. Necessary and sufficient condition for nonzero quantum discord. Phys. Rev. Lett. 105, 190502 (2010).
Tong, S. & Koller, D. Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2001).
MATH Google Scholar
Hofmann, T., Schölkopf, B. & Smola, A. J. Kernel methods in machine learning. Ann. Stat. 36, 1171–1220 (2008).
Article MathSciNet MATH Google Scholar
Schuld, M. & Killoran, N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 122, 040504 (2019).
Karimi, M. The Power of One Clean Qubit in Supervised Machine Learning. Codes are Available via this Link (2022).
Madden, L. & Simonetto, A. Best approximate quantum compiling problems. ACM Trans. Quantum Comput. 3, 1–29 (2022).
Article MathSciNet Google Scholar
Huang, H.-Y. et al. Power of data in quantum machine learning. Nat. Commun.https://doi.org/10.1038/s41467-021-22539-9 (2021).
Article PubMed PubMed Central Google Scholar
Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
Article ADS CAS PubMed Google Scholar
Liu, J., Lin, Z. & Jiang, L. Laziness, Barren Plateau, and Noise in Machine Learning. arXiv preprint arXiv:2206.09313 (2022).
Heyraud, V., Li, Z., Denis, Z., Le Boité, A. & Ciuti, C. Noisy quantum kernel machines. Phys. Rev. A 106, 052421 (2022).
Thanasilp, S., Wang, S., Cerezo, M. & Holmes, Z. Exponential Concentration and Untrainability in Quantum Kernel Methods. arXiv preprint arXiv:2208.11060 (2022).
Pérez-Salinas, A., Cervera-Lierta, A., Gil-Fuster, E. & Latorre, J. I. Data re-uploading for a universal quantum classifier. Quantum 4, 226 (2020).
Article Google Scholar
Dutta, T., Pérez-Salinas, A., Cheng, J. P. S., Latorre, J. I. & Mukherjee, M. Single-qubit universal classifier implemented on an ion-trap quantum device. Phys. Rev. A 106, 012411 (2022).
Kusumoto, T., Mitarai, K., Fujii, K., Kitagawa, M. & Negoro, M. Experimental quantum kernel trick with nuclear spins in a solid. npj Quantum Inf. 7, 94 (2021).

Download references

Acknowledgements

We acknowledge discussions with Anton Dekusar and Hadi Zadeh-Haghighi. This work was supported by the Alberta Major Innovation Fund (MIF) Quantum Technologies project and by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Author information

Authors and Affiliations

Department of Physics and Astronomy, University of Calgary, Calgary, AB, T2N 1N4, Canada
Mahsa Karimi, Christoph Simon & Roohollah Ghobadi
Institute for Quantum Science and Technology, University of Calgary, Calgary, AB, T2N 1N4, Canada
Mahsa Karimi, Christoph Simon & Roohollah Ghobadi
IBM Quantum, IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Ali Javadi-Abhari

Authors

Mahsa Karimi
View author publications
You can also search for this author in PubMed Google Scholar
Ali Javadi-Abhari
View author publications
You can also search for this author in PubMed Google Scholar
Christoph Simon
View author publications
You can also search for this author in PubMed Google Scholar
Roohollah Ghobadi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

M.K performed simulation analysis. A.J-A wrote the optimization code to run the code on IBM hardware. C.S provided detailed feedback on the manuscript. R.G supervised the entire project and wrote the manuscript with input from M.K, A.J-A and C.S.

Corresponding authors

Correspondence to Mahsa Karimi or Roohollah Ghobadi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Karimi, M., Javadi-Abhari, A., Simon, C. et al. The power of one clean qubit in supervised machine learning. Sci Rep 13, 19975 (2023). https://doi.org/10.1038/s41598-023-46497-y

Download citation

Received: 01 August 2023
Accepted: 01 November 2023
Published: 15 November 2023
DOI: https://doi.org/10.1038/s41598-023-46497-y

This article is cited by

The power of one clean qubit in supervised machine learning
- Mahsa Karimi
- Ali Javadi-Abhari
- Roohollah Ghobadi
Scientific Reports (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.