The power of one clean qubit in supervised machine learning

This paper explores the potential benefits of quantum coherence and quantum discord in the non-universal quantum computing model called deterministic quantum computing with one qubit (DQC1) in supervised machine learning. We show that the DQC1 model can be leveraged to develop an efficient method for estimating complex kernel functions. We demonstrate a simple relationship between coherence consumption and the kernel function, a crucial element in machine learning. The paper presents an implementation of a binary classification problem on IBM hardware using the DQC1 model and analyzes the impact of quantum coherence and hardware noise. The advantage of our proposal lies in its utilization of quantum discord, which is more resilient to noise than entanglement.


I. INTRODUCTION
Recent progress in the control and mitigation of noise and decoherence has paved the way for the development of intermediate-scale quantum devices consisting of hundreds of qubits. Although these devices are currently not fault-tolerant, there is considerable evidence that they possess superior computational capabilities compared to classical supercomputers, as a result of their ability to support quantum entanglement [1,2]. As quantum hardware continues to evolve, it is expected to play a crucial role in various fields such as quantum simulations, quantum chemistry, and quantum machine learning (QML) [3,4].
The use of quantum hardware for complex computations such as kernel function estimation has been proposed as a way to achieve quantum advantage in machine learning [5,6]. Quantum entanglement is considered a key resource for this [7][8][9], but it is highly susceptible to noise, so it is important to explore other forms of quantum correlation that are less sensitive to noise or require less entanglement.
The Deterministic Quantum Computing with One Qubit (DQC1) model is a non-universal quantum computing model that leverages a single qubit as a probe to interact with a highly mixed quantum state and estimate computationally expensive functions. This ability is known as the "power of one qubit" [10]. The DQC1 model generates quantum discord, a resilient type of weak quantum correlation, using the coherence of a pure qubit [11,12]. Quantum discord is more resistant to noise than entanglement and may offer a quantum advantage in noisy conditions for quantum illumination tasks [13].
This paper studies the use of the DQC1 model in supervised machine learning for efficient estimation of complex kernel functions. The study is implemented on IBM hardware and examines the effects of coherence consumption, quantum discord, and hardware noise. The DQC1 protocol reduces measurement errors by measuring only one qubit, achieving high classification accuracy despite requiring more gates than a similar protocol in [6].
Quantum kernel methods can be hindered by a trainability barrier similar to the barren plateau problem when the values of the quantum kernels become highly concentrated, requiring an exponential number of measurements for accurate evaluation [14]. Our method tackles this challenge by reading out only one qubit for kernel estimation, thus maintaining a constant number of measurements, regardless of the complexity of the kernel.
The paper is structured as follows: Section II provides a review of the DQC1 algorithm, quantum coherence and quantum discord, and a brief overview of kernel-based supervised machine learning. In Sec. III, we describe the application of DQC1 for the estimation of arbitrary kernel functions. Sec. IV presents our implementation of supervised machine learning using DQC1 on IBM hardware. Sec. V discusses the role of quantum coherence and the effect of noise in our implementation. Finally, Sec. VI concludes with a summary of our findings.

A. DQC1
The DQC1 model was originally introduced in the context of nuclear magnetic resonance (NMR) quantum information processing and has been implemented in various physical settings [15][16][17][18].
As shown in Fig. 1, the DQC1 circuit consists of one control qubit prepared in the state $(I_1 + \alpha Z)/2$, with $\alpha \in [0, 1]$ and $Z$ the Pauli Z matrix, and $n$ target qubits in the maximally mixed state $I_n/2^n$, where $I_n$ is the $2^n \times 2^n$ identity matrix. One can change the purity of the control qubit by tuning $\alpha$: for $\alpha = 0$ and $\alpha = 1$ the control qubit is in a maximally mixed and a pure state, respectively, while for $0 < \alpha < 1$ it is in a partially mixed state. Once the control qubit is evolved through the Hadamard gate, as shown in Fig. 1, the matrix form of the initial state in the computational basis of the control qubit becomes
$$\rho_{in} = \frac{1}{2^{n+1}} \begin{pmatrix} I_n & \alpha I_n \\ \alpha I_n & I_n \end{pmatrix}. \tag{1}$$
Following the application of the DQC1 circuit evolution
$$|0\rangle\langle 0| \otimes I_n + |1\rangle\langle 1| \otimes U_n, \tag{2}$$
where $U_n$ is an arbitrary $2^n \times 2^n$ unitary matrix that is applied to the $n$ target qubits, the total state is updated to
$$\rho_{f} = \frac{1}{2^{n+1}} \begin{pmatrix} I_n & \alpha U_n^\dagger \\ \alpha U_n & I_n \end{pmatrix}. \tag{3}$$
Tracing out the last $n$ qubits from Eq. (3), the density matrix of the control qubit, denoted by $\rho_{f,c}$, is given by
$$\rho_{f,c} = \frac{1}{2} \begin{pmatrix} 1 & \frac{\alpha}{2^n} \mathrm{tr}(U_n^\dagger) \\ \frac{\alpha}{2^n} \mathrm{tr}(U_n) & 1 \end{pmatrix}. \tag{4}$$
Measuring the off-diagonal elements of the control qubit can therefore be used to calculate the trace of a unitary matrix, as shown by Eq. (4). The number of measurements needed to estimate the normalized trace to within a distance $\epsilon$ and with accuracy $\delta$ is proportional to $\epsilon^{-2} \log(1/\delta)$, regardless of the number of register qubits. For a real positive semi-definite matrix, there is a classical randomized algorithm to estimate the trace [19], but no classical algorithm is known for general matrices.
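The relation between the control qubit and the trace of $U_n$ can be checked numerically. The following is a minimal NumPy sketch, not the implementation used in this work: it builds the input state, applies the controlled-$U_n$ of the DQC1 circuit, and traces out the register. The helper names `random_unitary` and `dqc1_control_state` are hypothetical and introduced only for illustration.

```python
import numpy as np

def random_unitary(dim, rng):
    """Illustrative helper: random unitary from the QR decomposition of a Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale each column by a unit phase; result stays unitary

def dqc1_control_state(U, alpha):
    """Reduced state of the control qubit after the DQC1 circuit (exact, no sampling noise)."""
    dim = U.shape[0]
    Z = np.diag([1.0, -1.0])
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    rho_control = H @ ((np.eye(2) + alpha * Z) / 2) @ H   # control qubit after the Hadamard
    rho = np.kron(rho_control, np.eye(dim) / dim)          # control (x) maximally mixed register
    P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    CU = np.kron(P0, np.eye(dim)) + np.kron(P1, U)         # controlled-U_n
    rho_f = CU @ rho @ CU.conj().T
    # Partial trace over the register: indices are (control, register, control, register).
    return np.trace(rho_f.reshape(2, dim, 2, dim), axis1=1, axis2=3)

rng = np.random.default_rng(0)
U = random_unitary(4, rng)                  # n = 2 register qubits
alpha = 0.7
rho_c = dqc1_control_state(U, alpha)
trace_estimate = 2 * rho_c[1, 0] / alpha    # recovers tr(U_n) / 2^n from the off-diagonal element
print(trace_estimate, np.trace(U) / 4)
```

On hardware the off-diagonal element is obtained from finite-shot measurements of the control qubit, so the recovered trace carries statistical error that this exact simulation does not model.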
The DQC1 class is the complexity class of problems solvable by the DQC1 algorithm with bounded error in the one-clean-qubit model. In relation to bounded-error probabilistic polynomial time (BPP) and bounded-error quantum polynomial time (BQP), it satisfies BPP ⊂ DQC1 ⊂ BQP [20].
For completeness, we now demonstrate that the clean qubit and the register qubits remain in a separable state throughout the computation, as stated in [21]. To do this, we use the eigenvectors and eigenvalues of $U_n$, denoted by $|u_i\rangle$ and $\lambda_i$, respectively. In the basis $\{|u_i\rangle\}$, the mixed state of the target qubits can be represented as
$$\frac{I_n}{2^n} = \frac{1}{2^n} \sum_i |u_i\rangle\langle u_i|.$$
Applying the DQC1 evolution as defined in Eq. (2) results in
$$\rho_f = \frac{1}{2^{n+1}} \sum_i \left( |0\rangle\langle 0| + |1\rangle\langle 1| + \alpha \lambda_i^* |0\rangle\langle 1| + \alpha \lambda_i |1\rangle\langle 0| \right) \otimes |u_i\rangle\langle u_i|,$$
which is a mixture of product states and hence separable, meaning there is no entanglement between the control qubit and the target qubits.

B. DQC1 resource
The DQC1 model has been widely studied to investigate the potential use of quantum resources, other than entanglement, in quantum computation [22,23]. In this section, we briefly review the definitions of quantum coherence and discord, and we show that it is the consumption of coherence that allows for the production of discord [24].
The rigorous definition of coherence was first given in [25]. The coherence is defined as
$$C(\rho) = S(\rho_{diag}) - S(\rho), \tag{5}$$
where $S(\rho) = -\mathrm{tr}(\rho \log \rho)$ is the von Neumann entropy and $\rho_{diag}$ is the diagonal part of $\rho$. One can find the change of coherence for the control qubit, defined as
$$\Delta C = C(\rho_{in,c}) - C(\rho_{f,c}), \tag{6}$$
with $\rho_{in,c}$ and $\rho_{f,c}$ as the input and output states of the control qubit, respectively. Using Eqs. (A.1) and (4) in Eq. (6) one obtains [26]
$$\Delta C = H_2\left(\frac{1 + \alpha\, |\mathrm{tr}(U_n)|/2^n}{2}\right) - H_2\left(\frac{1+\alpha}{2}\right), \tag{7}$$
where $H_2(x) = -x \log x - (1-x) \log(1-x)$ is the binary Shannon entropy (see Appendix for the derivation). From Eq. (7), it is clear that the coherence consumption, which is determined by the parameter $\alpha$ and the trace of $U_n$, can be obtained efficiently by DQC1.
Quantum discord is a generalization of the classical notion of mutual information and is defined as the difference between the total quantum mutual information and the classical mutual information of the subsystems. For a bipartite system in state $\rho_{AB}$, the quantum discord is defined by the difference
$$D(\rho_{AB}) = I(\rho_{AB}) - J(\rho_{AB}), \tag{8}$$
with $I(\rho_{AB})$ and $J(\rho_{AB})$ as the quantum mutual information and the measurement-based mutual information, respectively. The quantum mutual information is given by
$$I(\rho_{AB}) = S(\rho_A) + S(\rho_B) - S(\rho_{AB}). \tag{9}$$
The measurement-based mutual information, on the other hand, is given by
$$J(\rho_{AB}) = S(\rho_B) - \min_{\{\Pi_i\}} \sum_i p_i S(\rho_{B|i}), \tag{10}$$
where the minimum is taken over all possible positive operator-valued measurements (POVM) $\{\Pi_i\}$ on subsystem $A$, $p_i$ is the probability of outcome $i$, and $\rho_{B|i}$ is the corresponding conditional state of $B$.
In the following, we use an alternative definition for discord known as geometric discord, which is easier to calculate and has a closed form for DQC1 [27]. For a given quantum state $\rho$, the geometric discord is defined as [28]
$$D_G(\rho) = \min_{\chi \in \mathcal{C}} ||\rho - \chi||^2, \tag{11}$$
where $\mathcal{C}$ denotes the set of classical zero-discord states and $||A - B||^2 = \mathrm{tr}(A - B)^2$. Evaluating Eq. (11) for the state in Eq. (3) leads to [27]
$$D_G(\rho_f) = \frac{\alpha^2}{2^{n+2}} \left(1 - \frac{|\mathrm{tr}(U_n^2)|}{2^n}\right). \tag{12}$$
Eq. (12) can be evaluated by applying two consecutive controlled-$U_n$ gates in Fig. 1. In [24], the connection between coherence consumption and discord production in DQC1 was examined, where it was demonstrated that quantum discord is bounded by the quantum coherence consumed in the control qubit, i.e.
$$D_G \leq \Delta C. \tag{13}$$
In our implementation, we experimentally validate the relation (13).
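As a numerical illustration of relation (13), the sketch below compares the two quantities for random unitaries. It assumes the closed forms $\Delta C = H_2\big((1+\alpha|\mathrm{tr}\,U_n|/2^n)/2\big) - H_2\big((1+\alpha)/2\big)$ and $D_G = \frac{\alpha^2}{2^{n+2}}\big(1 - |\mathrm{tr}\,U_n^2|/2^n\big)$; the normalization of the geometric discord in particular is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """Binary Shannon entropy H_2(p) in bits, with 0*log(0) taken as 0."""
    p = float(np.clip(p, 0.0, 1.0))
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def coherence_consumption(U, alpha):
    """Coherence consumed by the control qubit (assumed closed form, see lead-in)."""
    tau = abs(np.trace(U)) / U.shape[0]
    return binary_entropy((1 + alpha * tau) / 2) - binary_entropy((1 + alpha) / 2)

def geometric_discord(U, alpha):
    """Geometric discord of the DQC1 output state (assumed closed form, see lead-in)."""
    dim = U.shape[0]
    tau2 = abs(np.trace(U @ U)) / dim   # needs tr(U^2): two consecutive controlled-U gates
    return alpha ** 2 / (4 * dim) * (1 - tau2)

rng = np.random.default_rng(1)
samples = []
for _ in range(5):
    z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    U, _r = np.linalg.qr(z)             # the Q factor of a Gaussian matrix is unitary
    for alpha in (0.2, 0.6, 1.0):
        samples.append((geometric_discord(U, alpha), coherence_consumption(U, alpha)))

for d_g, delta_c in samples:
    print(f"D_G = {d_g:.4f}   Delta C = {delta_c:.4f}")
```

For the identity, both quantities vanish: no coherence is consumed and no discord is produced.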

C. SUPERVISED MACHINE LEARNING: SUPPORT VECTOR MACHINES AND KERNEL METHOD
In this section, we introduce the concepts of support vector machines (SVM) and the kernel method within the context of supervised machine learning. We are given a set of $n$ training data points, $X_{train} = \{(x_i, y_i) : i = 1, 2, ..., n\}$, where each data point has $k$ features, i.e., $x_i \in R^k$, and is labeled by $y_i \in \{1, -1\}$. The task is to use the training data to develop a classifier function that can accurately predict the labels for test (unseen) data. In the simplest scenario, where the data points are linearly separable, the classifier function can be expressed as
$$f(x) = \langle w, x \rangle + b, \tag{14}$$
where $w$ and $b$ are to be determined such that $y_i f(x_i) > 0$. In the SVM, the separating hyperplane defined by $f(x)$ is determined by maximizing the distance from the hyperplane to the nearest data point of each class [29], see Fig. 2(a).
The SVM can be generalized to the case of data points that are not linearly separable by mapping the data points to a higher-dimensional space in which the data is linearly separable. In other words, one considers a non-linear mapping $\phi : X \to H$, so that the decision function can be written as
$$f(x) = \langle w, \phi(x) \rangle + b. \tag{15}$$
In this context, $H$ and $\phi(x)$ are known as the feature space and the feature map, respectively.
It is well known that for data points that are not linearly separable the SVM leads to solutions of the form [30]
$$f(x) = \sum_{i=1}^{n} \beta_i^* y_i K(x, x_i) + b, \tag{16}$$
where $\beta_i^*$ are coefficients to be determined, and we defined the kernel function $K(x, x_i) = \langle \phi(x), \phi(x_i) \rangle$, where $\langle \cdot , \cdot \rangle$ denotes the inner product in the feature space $H$. The coefficients $\beta_i$ in Eq. (16) are found by maximizing
$$L(\beta) = \sum_{i=1}^{n} \beta_i - \frac{1}{2} \sum_{i,j=1}^{n} \beta_i \beta_j y_i y_j K(x_i, x_j) \tag{17}$$
over the training data, subject to $\sum_{i=1}^{n} \beta_i y_i = 0$ and $\beta_i \geq 0$. For a positive definite kernel, Eq. (17) is a concave problem, whose solution $\beta^* = (\beta_1^*, ..., \beta_n^*)$ can be found efficiently.
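To make the dual problem concrete, consider the smallest possible case: one training point per class, $y_1 = +1$ and $y_2 = -1$. The constraint $\sum_i \beta_i y_i = 0$ then forces $\beta_1 = \beta_2 = \beta$, the standard dual objective $2\beta - \tfrac{1}{2}\beta^2 (K_{11} - 2K_{12} + K_{22})$ is maximized at $\beta^* = 2/(K_{11} - 2K_{12} + K_{22})$, and $b$ follows from the margin condition $f(x_1) = 1$. The sketch below is a toy illustration with hypothetical function names, not the solver used in this work; realistic training sets require a quadratic-programming routine.

```python
def fit_two_point_svm(k11, k12, k22):
    """Closed-form dual solution for one training point per class (y1 = +1, y2 = -1)."""
    beta = 2.0 / (k11 - 2.0 * k12 + k22)   # maximizer of the concave dual objective
    b = 1.0 - beta * (k11 - k12)           # from the margin condition f(x1) = +1
    return beta, b

def decision(k_x1, k_x2, beta, b):
    """Decision function f(x) = sum_i beta_i y_i K(x, x_i) + b for the two-point case."""
    return beta * k_x1 - beta * k_x2 + b

def linear_kernel(u, v):
    """Plain inner product, used here only as a stand-in for a quantum kernel."""
    return sum(a * c for a, c in zip(u, v))

x1, x2 = (1.0, 0.0), (-1.0, 0.0)
beta, b = fit_two_point_svm(linear_kernel(x1, x1), linear_kernel(x1, x2), linear_kernel(x2, x2))

x_test = (0.4, 2.0)
score = decision(linear_kernel(x_test, x1), linear_kernel(x_test, x2), beta, b)
label = 1 if score > 0 else -1
print(beta, b, score, label)
```

Only kernel values enter the fit and the prediction, which is why the kernel can be estimated on quantum hardware while the optimization stays classical.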
The basic idea of SVM can be extended to the quantum domain by interpreting the feature map as a quantum state that can be constructed by a quantum circuit, and the kernel function as the inner product between respective quantum states [6,31].

III. SUPERVISED MACHINE LEARNING WITH DQC1
The freedom in choosing the unitary operator $U_n$ in DQC1 allows one to make a connection between DQC1 and the kernel method [32]. To see this, we choose $U_n = u_l(\vec{x})\, u_l(\vec{x}')^\dagger$, where $u_l$ represents $l$ consecutive applications of the unitary operator $u$, with $\vec{x}$ and $\vec{x}'$ the data points encoded in the gate parameters. Next, we note that $\mathrm{tr}(u_l(\vec{x})\, u_l(\vec{x}')^\dagger)$ is positive semidefinite, i.e.
$$\sum_{i,j} c_i c_j^*\, \mathrm{tr}\big(u_l(\vec{x}_i)\, u_l(\vec{x}_j)^\dagger\big) \geq 0 \quad \text{for } c_i \in \mathbb{C}.$$
Rewriting Eq. (4) for $U_n = u_l(\vec{x})\, u_l(\vec{x}')^\dagger$ gives
$$\rho_{f,c} = \frac{1}{2} \begin{pmatrix} 1 & \alpha K(\vec{x}, \vec{x}')^* \\ \alpha K(\vec{x}, \vec{x}') & 1 \end{pmatrix}, \tag{18}$$
where $K(\vec{x}, \vec{x}') = \mathrm{tr}(U_n(\vec{x}, \vec{x}'))/2^n$. From Eq. (18), it follows that the DQC1 model allows for an efficient method for estimating arbitrary, complicated kernel functions.
Interestingly, by comparing Eq. (18) with Eq. (7), we can relate the coherence consumption to the kernel function. For example, by setting $\alpha = 1$ in Eq. (7), we obtain
$$\Delta C = H_2\left(\frac{1 + |K(\vec{x}, \vec{x}')|}{2}\right). \tag{19}$$
In an ideal implementation without hardware noise, the diagonal elements of the kernel matrix are equal to one, as per the definition of the kernel. According to Eq. (19), the coherence consumption in this scenario is zero. For the off-diagonal elements, on the other hand, one can expect the coherence consumption to be non-zero, since $|K(\vec{x}, \vec{x}')| < 1$.
In real situations, where noise cannot be ignored, the diagonal elements of the kernel will be smaller than one, resulting in a loss of coherence as shown by Eq. (19).
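The link between a kernel value and the coherence consumed can be evaluated directly. The sketch below assumes that, for a pure control qubit ($\alpha = 1$), Eq. (19) takes the form $\Delta C = H_2\big((1 + |K|)/2\big)$ with $H_2$ the binary Shannon entropy in bits; the function names and the sample value 0.9 are hypothetical and chosen only for illustration.

```python
import math

def binary_entropy(p):
    """Binary Shannon entropy H_2(p) in bits, with 0*log(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def coherence_consumption_from_kernel(k):
    """Coherence consumed for alpha = 1, given a kernel value with |k| <= 1 (assumed form)."""
    return binary_entropy((1 + abs(k)) / 2)

ideal_diagonal = coherence_consumption_from_kernel(1.0)   # noiseless K(x, x) = 1
noisy_diagonal = coherence_consumption_from_kernel(0.9)   # hypothetical noisy K(x, x) < 1
print(ideal_diagonal, noisy_diagonal)
```

A diagonal kernel element that drops below one because of noise therefore shows up as a strictly positive coherence consumption.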

IV. IMPLEMENTATION ON IBM HARDWARE
In this section, we describe our implementation of supervised machine learning based on the DQC1 model. Our scheme was implemented on the "ibm_perth" quantum processor, as shown in Fig. 3, using IBM's open-source software interface, Qiskit. Fig. 4(a) shows the schematics of our implemented circuit, which is composed of one control qubit, two target qubits and three ancilla qubits. In the first part of the circuit, to the left of the dashed line, the input states for the control and target qubits are prepared. The mixed-state preparation of the target qubits is based on creating Bell states between target and ancilla qubits, followed by ignoring the state of the ancilla qubits. The resulting state of the control and target qubits right before the dashed line is given by $\rho_{control} \otimes I_2/4$, where $\rho_{control} = \mathrm{diag}(\cos^2\frac{\theta}{2}, \sin^2\frac{\theta}{2})$. By choosing $\theta = 2\cos^{-1}\left(\sqrt{\frac{1+\alpha}{2}}\right)$, the state of the control qubit becomes $(I_1 + \alpha Z)/2$.
In our implementation on the IBM real hardware, the purity of the resulting target qubits is tr(ρ 2 target ) = 0.506, which deviates from the ideal mixed state by 6 × 10 −3 .
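The state preparation can be verified with a small state-vector calculation. The sketch below assumes that the control qubit is made mixed in the same entangle-and-discard way as the targets, i.e. an $R_y(\theta)$ rotation followed by a CNOT onto an ancilla that is then ignored; this detail of the control preparation and the function names are assumptions of the sketch.

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def discard_ancilla(psi):
    """Reduced state of the first qubit of a two-qubit pure state (ancilla traced out)."""
    rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
    return np.trace(rho, axis1=1, axis2=3)

def control_state(alpha):
    """Ry(theta) on |0>, CNOT onto an ancilla, then ignore the ancilla."""
    theta = 2 * np.arccos(np.sqrt((1 + alpha) / 2))
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2),  np.cos(theta / 2)]])
    psi = np.kron(ry @ np.array([1.0, 0.0]), np.array([1.0, 0.0]))
    return discard_ancilla(CNOT @ psi)

def target_state():
    """Half of a Bell pair: Hadamard, CNOT onto an ancilla, then ignore the ancilla."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    psi = np.kron(h @ np.array([1.0, 0.0]), np.array([1.0, 0.0]))
    return discard_ancilla(CNOT @ psi)

alpha = 0.3
rho_control = control_state(alpha)   # expected: diag((1 + alpha)/2, (1 - alpha)/2)
rho_target = target_state()          # expected: I/2, the maximally mixed qubit
purity = np.trace(rho_target @ rho_target).real
print(rho_control, purity)
```

In this noiseless calculation each target qubit has purity exactly 0.5, which is the natural reference point for a measured purity.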
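To benchmark the performance of our protocol, we use the encoding map and dataset used in [6]. In [6] the data points $\vec{x}$ and $\vec{x}'$ are mapped onto gate parameters of the unitary matrix $U_2(\vec{x}, \vec{x}') = u_l(\vec{x})\, u_l(\vec{x}')^\dagger$, where
$$u_l(\vec{x}) = \left(U_{\phi(\vec{x})} H^{\otimes 2}\right)^l, \tag{20}$$
with $l$ as the number of iterations of each layer in the feature map, $U_{\phi(\vec{x})}$ the encoding map, and $H^{\otimes 2}$ denoting two Hadamard gates acting on two qubits (see Fig. 4c). The encoding map is
$$U_{\phi(\vec{x})} = \exp\Big(i \sum_{\emptyset \neq S \subseteq \{1,2\}} \phi_S(\vec{x}) \prod_{i \in S} Z_i\Big), \tag{21}$$
where $\phi_{\{i\}}(\vec{x}) = x_i$, $\phi_{\{1,2\}}(\vec{x}) = (\pi - x_1)(\pi - x_2)$, and $Z_i$ denotes the Pauli Z gate acting on qubit $i$. Fig. 4(c) shows the quantum circuit that describes Eq. (20) for $l = 2$. We define our kernel as $K(\vec{x}, \vec{x}') = |\mathrm{tr}(U_2(\vec{x}, \vec{x}'))|/4$. It has been conjectured that approximating the resulting kernel function for the encoding map Eq. (21) with $l = 2$ is classically hard, i.e. the resources required to perform it increase at a non-polynomial rate with respect to the number of qubits [6].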
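For two qubits the kernel can be computed exactly with dense matrices, which is useful as a noiseless reference for the hardware results. The sketch below assumes the feature map of [6] in the form $u_l(\vec{x}) = (U_{\phi(\vec{x})} H^{\otimes 2})^l$ with $\phi_{\{i\}} = x_i$ and $\phi_{\{1,2\}} = (\pi - x_1)(\pi - x_2)$, and the kernel $K = |\mathrm{tr}(u_l(\vec{x})\, u_l(\vec{x}')^\dagger)|/4$; the function names and sample points are hypothetical.

```python
import numpy as np

H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
H2 = np.kron(H1, H1)                      # Hadamard on both target qubits

# Diagonals of Z (x) I, I (x) Z and Z (x) Z in the computational basis.
Z1 = np.array([1.0, 1.0, -1.0, -1.0])
Z2 = np.array([1.0, -1.0, 1.0, -1.0])
Z12 = Z1 * Z2

def u_phi(x):
    """Diagonal encoding gate exp(i * sum_S phi_S(x) * prod_{i in S} Z_i) for two qubits."""
    phase = x[0] * Z1 + x[1] * Z2 + (np.pi - x[0]) * (np.pi - x[1]) * Z12
    return np.diag(np.exp(1j * phase))

def u_l(x, l=2):
    """l repetitions of (Hadamards followed by the encoding gate)."""
    return np.linalg.matrix_power(u_phi(x) @ H2, l)

def dqc1_kernel(x, x_prime, l=2):
    """Noiseless kernel value |tr(u_l(x) u_l(x')^dagger)| / 4."""
    U = u_l(x, l) @ u_l(x_prime, l).conj().T
    return abs(np.trace(U)) / 4

xa, xb = (0.3, 1.2), (2.0, 0.7)           # hypothetical data points
print(dqc1_kernel(xa, xa), dqc1_kernel(xa, xb))
```

The same routine evaluated on all pairs of training points yields the noiseless kernel matrix against which the hardware kernel can be compared.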
Our implementation is divided into three phases. In the first phase, we run the circuit in Fig. 4 for all pairs of training data to obtain the corresponding density matrix of the control qubit, using the quantum state tomography package in Qiskit with each measurement repeated 8000 times (shots), and thereby obtain the corresponding kernel function. Having obtained the kernel function on the quantum hardware, we then apply the classical SVM to obtain the optimal separating hyperplane, or equivalently $\beta^*$, by solving Eq. (17). Finally, in the prediction phase, given test data $\vec{x}$, we run the DQC1 circuit to estimate $K(\vec{x}, \vec{x}_i)$ for all $\vec{x}_i \in X_{train}$ and apply Eq. (16).
Figure 4 (continued): (b) $u_l(\vec{x})$ is the encoding circuit, where $l$ is the number of iterations of this gate decomposition (length of the circuit). (c) The gate components of the unitary operator $u_l(\vec{x})$, adapted from [6], for two target qubits, $n = 2$, and two iterations, $l = 2$. Here $U_{\phi(\vec{x})}$ is the feature map (defined in the text). First, the Hadamard gate is applied to all qubits and then the diagonal gate $U_{\phi(\vec{x})}$ acts on the qubits. (d) The feature encoding circuit $U_{\phi(\vec{x})}$ in Eq. (21); the parametrized single-qubit gate shown is a phase gate.
In Fig. 5, we display the results of applying the above procedure to the classification task on the "adhoc" dataset for the IBM simulator (Qiskit) (left) and the IBM 7-qubit hardware (right), for $l = 2$ and with the control qubit in the pure state, i.e. $\alpha = 1$ [33]. From Fig. 5, it can be seen that the accuracy of the Qiskit simulator is 100%.
On the other hand, the obtained accuracy on the hardware is 90%. The difference between the simulation and hardware performance can be attributed to the effects of hardware noise. It is worth noting that the circuits were optimized using the Qiskit compiler, specifically the Approximate Quantum Compilation method [34]. This method converts the entire circuit (excluding the ancillas) to a 3-qubit unitary matrix, and then re-synthesizes it into a new circuit that approximates the matrix with 0.995 accuracy (synthesis fidelity). A higher synthesis fidelity uses more CNOT gates in the resulting circuit, which reduces approximation error but increases runtime noise. This method reduced the CNOT gate count of the circuit from 177 to 19.

V. THE ROLE OF COHERENCE AND THE EFFECT OF NOISE
In the following, we explore the role of the control qubit's coherence, hardware noise, coherence consumption and quantum discord production in our setting.
To see the role of the control qubit's coherence in our implementation, we repeat the learning task with the control qubit prepared in the state $(I_1 + \alpha Z)/2$, where $0 \leq \alpha \leq 1$. In Fig. 6 we show the prediction accuracy in the simulation (blue dots) and the implementation (red dots) as a function of the purity of the control qubit. From Fig. 6 one can see that for $\alpha = 0$, for which the control qubit is in a maximally mixed state, the accuracy is 0.5, corresponding to randomly guessing the labels. By increasing the purity, however, the accuracy increases until it reaches its maximum value at $\alpha \geq 0.6$. Due to the device noise, the accuracy in the implementation is degraded in comparison to the simulation.
For completeness we repeat the learning process with two well-known datasets, "make_moons" and "make_circles" from scikit-learn, each including 800 training data points and 200 testing data points. For these two datasets we observed an abrupt change in the accuracy for $\alpha \geq 0.2$. Hence the critical value of $\alpha$ depends on the dataset. These results are depicted in Fig. 7. By interpreting $1 - \alpha$ as the noise strength, one can see that the accuracy is robust against noise for $\alpha \geq 0.6$ (Fig. 6) and $\alpha \geq 0.2$ (Fig. 7). Likewise, variational quantum circuits are predicted to display similar robustness against noise [35]. Let us emphasize that in the rest of our paper we use $\alpha = 1$.
In Fig. 8 we show the absolute value of the kernel obtained from simulation (a) and experiment (b). The difference between the two kernels can be attributed to hardware noise. To better show the role of noise in the kernel, we compare the diagonal elements of the kernel obtained from simulation and experiment. As discussed earlier, in the ideal case $K(\vec{x}, \vec{x}) = 1$ (blue bar) but in practice one has $K(\vec{x}, \vec{x}) < 1$. In Fig. 8, one finds a maximum difference of 0.610 between simulation and implementation, while the mean difference is 0.27. Having access to the kernel, we can obtain the coherence consumption in our implementation from Eq. (19), as shown in Fig. 9. In accordance with Eq. (19), it can be seen from Fig. 9 that the coherence consumption is minimal (but not equal to zero in the experiment) along the diagonal. Our next step is to obtain the generated discord in our implementation based on Eq. (12). Eq. (12) indicates that for estimating the discord, $\mathrm{tr}(U_n^2)$ must be estimated, which requires successive application of the DQC1 evolution Eq. (2). Fig. 10 shows the quantum discord. When comparing Fig. 10 with Fig. 9, it is also evident that the condition (13) is satisfied.

Figure 10: Simulation results (left) and experimental results (right) for the geometric discord, for the same dataset as in Fig. 5 and the circuit in Fig. 4.

VI. CONCLUSION
In this study, we have experimentally investigated the application of the DQC1 model, a restricted computational model, to supervised machine learning tasks. Unlike the standard universal computational model, the DQC1 model relies on mixed states and does not incorporate quantum entanglement into the computation. We have presented a test of the DQC1 model's ability to solve supervised machine learning problems for some classically difficult kernels [6]. Despite requiring a greater number of gates than a similar protocol described in [6], our protocol still achieved a relatively high level of classification accuracy, since one needs to measure only the control qubit. Our proposal highlights the potential of utilizing quantum discord over entanglement in the presence of noise, as well as minimizing measurement requirements and avoiding the untrainability issues encountered in other quantum kernel methods.
In a broader context, our work highlights the computational power of a single qubit as a universal classifier [36,37]. There is limited literature on the application of DQC1 in machine learning [26,32]. It would be interesting to realize our protocol in the NMR setting [38]. We hope that this study will inspire further research on the integration of quantum coherence and quantum discord in machine learning.

Figure 1: The circuit representation of the DQC1 algorithm. The input states for the control and target qubits are $(I_1 + \alpha Z)/2$, with $\alpha \in [0, 1]$, and $I_n/2^n$, respectively. H and Z denote the Hadamard and Pauli Z gates, respectively.

Figure 2: (a) A Support Vector Machine (SVM) is a classifier used to separate two linearly separable classes, depicted in black and white. The data points closest to the decision boundary (shown in red), one from each class, are known as support vectors and are indicated by green circles. (b) When data points of two classes cannot be separated by a hyperplane in the original space (left), a non-linear mapping can be applied to project the data points into a higher-dimensional feature space (right) where a hyperplane can be found to separate the classes.

Figure 4: (a) A schematic picture of a three-qubit version of the DQC1 circuit, with one control qubit, two target qubits and three ancilla qubits. The first part of the circuit, before the dashed line, prepares the control qubit in $(I_1 + \alpha Z)/2$ and the target qubits in a mixed state. Here, $R_y(\theta) = \exp(-i \frac{\theta}{2} Y)$ is a rotation gate around the y axis, and Y denotes the Pauli Y gate. (b) The gate decomposition for the unitary matrix $U_n = u_l(\vec{x})\, u_l(\vec{x}')^\dagger$.

Figure 5: The simulation (left) and experimental (right) results for the DQC1 kernel classification with the circuit shown in Fig. 4, with n = 2 and l = 2. We used the "adhoc" dataset, which includes 20 training and 5 test samples per label. The accuracy of classification for the IBM quantum simulator (Qiskit) is 100%, while it is 90% for IBM's real hardware.

Figure 6: The accuracy as a function of the control qubit's purity for the same dataset as in Fig. 5. Note that when α = 0 the control qubit is in a completely mixed state, and when α = 1 it is in a pure state. The blue curve indicates simulation results, and the red curve shows experimental results.

Figure 8: (a) Simulation (left) and (b) experimental (right) results for the DQC1 quantum kernel with n = 2 and l = 2. Experimental results have been obtained from the "ibm_perth" device. (c) Diagonal elements of the simulated (blue bars) and experimental (red bars) kernel matrices. The maximum difference between diagonal elements is 0.610, and the mean difference is 0.27.

Figure 9: Simulation results (left) and experimental results (right) for the coherence consumption, for the same dataset as in Fig. 5 and the circuit in Fig. 4.