Clinical Data Classification With Noisy Intermediate Scale Quantum Computers

Quantum machine learning (QML) has seen significant progress in both software and hardware in recent years and has emerged as an applicable area for near-term quantum computers. In this work, we investigate the feasibility of utilizing QML on real clinical datasets. We propose two QML algorithms for data classification on IBM quantum hardware: a quantum distance classifier (qDC) and a simplified quantum-kernel support vector machine (sqKSVM). We implement these methods using the linear-time quantum data encoding technique (log₂ N qubits for N features) for embedding classical data into quantum states and estimating inner products on the 15-qubit IBMQ Melbourne quantum computer. We compare the predictive performance of our QML approaches with prior QML methods and with their classical counterpart algorithms on three open-access clinical datasets. Our results imply that the qDC outperforms kernel-based methods on datasets with small sample and feature counts. In contrast, quantum kernel approaches outperform the qDC on datasets with high sample and feature counts. We demonstrate that the log₂ N encoding increases predictive performance by up to +2% area under the receiver operating characteristic curve across all quantum machine learning approaches, thus making it ideal for machine learning tasks executed on Noisy Intermediate Scale Quantum computers.


INTRODUCTION
Quantum technologies promise to revolutionize the future of information and computation by using quantum devices to process massive amounts of data. To date, considerable progress has been made from both software and hardware points of view. Much research is underway to simplify quantum algorithms [1][2][3][4][5][6][7][8] in order to implement them on existing, so-called Noisy Intermediate Scale Quantum (NISQ) computers 9. As a result, small quantum devices based on photons, superconductors, or trapped ions are capable of efficiently running scalable quantum algorithms 6,7,10. Quantum Machine Learning (QML) is a particularly interesting approach, as it is suited to existing NISQ architectures [11][12][13][14][15]. While conventional machine learning is generally applied to process large amounts of data, many research fields cannot provide such large datasets. One example is medical research, where collecting cohorts that represent certain characteristics of diseases routinely results in small datasets 16. NISQ devices can efficiently execute algorithms with shallow depth and a low number of qubits 9. Therefore, it appears logical to exploit the potential of QML executed on NISQ devices incorporating clinical datasets.
However, the execution of QML algorithms in the form of practical quantum gate operations is non-trivial. First, the classical data needs to be encoded into quantum states. For this purpose, prior QML algorithms assume that a quantum random access memory (QRAM) device for storing the data is present 17. Nevertheless, to date, no such practical device is available. Second, since the output of a quantum algorithm is itself a quantum state, the classical bits of information must be extracted efficiently through quantum measurements. To date, various classical data encoding approaches have been proposed 6,7,[18][19][20][21]. In particular, encoding classical numerical features into quantum states has the advantage of utilizing log₂ N qubits (a.k.a. linear-time encoding) for N input features [18][19][20][21]. This approach allows NISQ devices with a small number of qubits to be utilized and quantum noise to be minimized, while at the same time maintaining quantum speedup 14. Nevertheless, to date, this approach appears to be underrepresented in combination with quantum machine learning.
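As context for the log₂ N encoding discussed above, the following plain-Python sketch illustrates the qubit-count bookkeeping: N normalized features can serve as the amplitudes of a quantum state on ceil(log₂ N) qubits. This is an illustrative sketch, not the paper's implementation; the function names are our own.

```python
import math

def amplitude_encode(features):
    """Normalize a classical feature vector so it can serve as the
    amplitude vector of a quantum state (amplitude-style encoding)."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [x / norm for x in features]

def qubits_needed(n_features):
    """Amplitude encoding stores N features in ceil(log2 N) qubits."""
    return math.ceil(math.log2(n_features))

# 8 clinical features fit into 3 qubits; 16 features into 4 qubits.
state = amplitude_encode([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
print(qubits_needed(len(state)))  # -> 3
```

The quadratic saving in qubit count (8 features on 3 qubits instead of 8) is precisely what makes this encoding attractive for small NISQ devices.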
In light of the above, we hypothesize that clinically-relevant quantum prediction models can be built on NISQ devices employing the log₂ N encoding, with prediction performances comparable to classic ML approaches.
In our work, we propose two quantum machine learning approaches that rely on the log₂ N encoding. First, we demonstrate a simple and efficient quantum distance classifier (qDC) executable on existing NISQ devices. Second, we present a simplified quantum-kernel SVM (sqKSVM) approach using quantum kernels, which can be executed once without optimization instead of twice with optimization as in the quantum-kernel SVM (qKSVM) approach 6,7.
In order to test our hypothesis, we evaluate the performance of the qDC and the sqKSVM approaches on real clinical data and compare them to the qKSVM, as well as to classic computing counterparts such as k-nearest neighbors 22 and classic support vector machines 23.

Dataset
This study incorporated three open-access clinical datasets that have been presented and evaluated in various contexts [24][25][26]. Each dataset underwent redundancy reduction by correlation matrix analysis 27, followed by a 10-fold cross-validation split with a training-validation ratio of 80%-20% 16. The training sets of the folds were subjected to feature ranking analysis 28, and the highest-ranking eight, as well as 16 (if available), features were selected for further analysis. The resulting dataset configurations were analyzed by class imbalance ratio and by the quantum advantage score (a.k.a. geometric difference) 20 for quantum kernel methods. Table 1 presents the characteristics of the data configurations as well as the imbalance ratios and quantum advantage scores (for the estimation of the quantum advantage scores, see Appendix E of the supplementary material).
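The imbalance ratio reported in Table 1 is straightforward to compute from the class labels; a minimal sketch (function name illustrative, not from the paper's code):

```python
def imbalance_ratio(labels):
    """rho = M_min / M: the minority-class sample count divided by the
    total sample count of a two-class dataset (0.5 means balanced)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return min(counts.values()) / len(labels)

# Example: 3 positive and 1 negative sample -> rho = 0.25
print(imbalance_ratio([1, 1, 1, -1]))  # -> 0.25
```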

Encoding strategies
This study relies on a data encoding strategy that uses sequences of Pauli-Y rotation gates (R_Y) and CNOT gates (see Appendix A of the supplementary material), resulting in log₂ N encoding qubits 18,19,21, where |ψ(x⃗)⟩ = S(x⃗)|0⟩^⊗log₂ N and S(x⃗) is the model circuit encoding the N-feature data.
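The R_Y rotation at the heart of this encoding maps angles to amplitudes as R_Y(θ)|0⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩. As a minimal illustration (not the paper's full circuit, which chains R_Y and CNOT gates across log₂ N qubits), the following sketch shows how a single qubit can encode a normalized two-feature pair by choosing θ = 2·atan2(x₁, x₀):

```python
import math

def ry_state(theta):
    """Amplitudes produced by R_Y(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    return [math.cos(theta / 2.0), math.sin(theta / 2.0)]

def encode_pair(x0, x1):
    """Map a normalized two-feature vector (x0, x1) onto one qubit:
    theta = 2*atan2(x1, x0) reproduces the pair as state amplitudes."""
    theta = 2.0 * math.atan2(x1, x0)
    return ry_state(theta)

amps = encode_pair(0.6, 0.8)  # normalized pair: 0.36 + 0.64 = 1
```

Chaining such rotations with CNOTs generalizes this idea to N features on log₂ N qubits.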
In order to compare the predictive performance of the two data encoding strategies (log₂ N and N qubits), the qDC, the sqKSVM, and the qKSVM (see Appendix C of the supplementary material) approaches were compared utilizing N = 8 features. This analysis was executed in the Pennylane simulator environment 30, while the sqKSVM was also executed on the IBMQ Melbourne machine for reference comparison (see Methods).

Quantum and classic machine learning predictive performance evaluation
The quantum distance classifier (qDC) first calculates the distance between the state vector of the test sample and the state vectors of the training samples. For normalized states, ‖|u⟩ − |v⟩‖ = √(2 − 2Re⟨u|v⟩), where ‖·‖ denotes the ℓ₂ norm of a vector. Therefore, the task is to calculate the inner product ⟨u|v⟩ with a quantum computer.
Two different approaches to estimate ⟨u|v⟩ with quantum computers are the Hadamard Test 31 and the Swap Test 32. For the simplified quantum kernel SVM (sqKSVM), we first note that the standard form of a quantum kernelized binary classifier is ỹ = sign(Σᵢ₌₁^M yᵢ αᵢ* K(x⃗ᵢ, x⃗)) (equation (5)), where ỹ is the unknown label, yᵢ is the label of the i-th training sample, αᵢ* is the i-th component of the support vector, and K(x⃗ᵢ, x⃗) is the kernel of the training-test pairs (see Appendix F of the supplementary material); thresholding the sum yields the binary output. The dataset configurations were utilized to estimate the performance of the quantum and classic machine learning algorithms incorporated in this study. Performance estimation was done by confusion matrix analytics 33. Prediction models were built on the given training subset, followed by evaluation on the respective validation subset of each fold. The average area under the receiver operating characteristic curve (AUC) was calculated across validation cases for each predictive model. To build predictive models, the quantum ML approaches qDC, sqKSVM, and qKSVM (see Appendix C of the supplementary material) were utilized. Classic machine learning approaches were k-nearest neighbors (ckNN) 22 and support vector machines (cSVM) 23. See Table 3 for the comparison of cross-validation AUC performances of the quantum and classic computing algorithms within the dataset configurations.
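To make the qDC decision rule concrete, here is a classical sketch under the assumption of normalized real state vectors, using the identity ‖|u⟩ − |v⟩‖² = 2 − 2⟨u|v⟩. On hardware the inner products would come from the Hadamard or Swap Test rather than a classical sum; the function names are illustrative.

```python
import math

def inner(u, v):
    """Real inner product <u|v> (on hardware: Hadamard/Swap Test)."""
    return sum(a * b for a, b in zip(u, v))

def distance(u, v):
    """|| |u> - |v> || = sqrt(2 - 2 Re<u|v>) for normalized real states."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * inner(u, v)))

def qdc_label(test, set_a, set_b):
    """Assign the test sample the label of the closest training set."""
    d_a = min(distance(test, u) for u in set_a)
    d_b = min(distance(test, v) for v in set_b)
    return "a" if d_a <= d_b else "b"

# A test vector close to class-a training states gets label "a".
label = qdc_label([0.8, 0.6], [[1.0, 0.0]], [[0.0, 1.0]])
```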

DISCUSSION
In this study, we aimed to investigate the effect of two encoding strategies on various quantum machine learning-built clinical prediction models. Next to prior quantum machine learning approaches, we also proposed two methods specifically designed for the log₂ N encoding approach.
Our results demonstrate that the log₂ N encoding in combination with low-complexity quantum machine learning approaches provides comparable or better results than the N encoding approach with previously-proposed quantum machine learning methods. This advantage was demonstrated not only in a simulator environment, but also on NISQ devices. The low algorithmic quantum complexity also aims towards building prediction models that may be easier to interpret in the future, especially in light of the high complexity of classic machine learning approaches 34. It is also important to emphasize that the proposed quantum machine learning processes are applicable to big data, given that calculating the inner product of quantum states on NISQ devices can be done efficiently with the log₂ N encoding approach 21,31.
The log₂ N data encoding we used is also quite robust against noise, since it uses a small part of the entire Hilbert space of the quantum device to estimate the inner product of quantum states 10.
When the feature count increases, the quantum advantage score increases as well, because the quantum state vectors of the input features become closer due to the high dimensionality of the Hilbert space. A higher feature count significantly influences performance in a positive way if the quantum advantage score is < 1 (e.g. +5-6% AUC in the Pediatric bone marrow dataset). It has been shown that classical ML models are competitive with or outperform quantum ML approaches when the quantum advantage score is small 20. Nevertheless, we demonstrated that when the quantum advantage score is > 1, a higher feature count does not contribute much to the performance increase (e.g. a 1% difference in the Wisconsin breast cancer dataset). It is important to point out that a high quantum advantage score (> 1) alone does not mean that the dataset is not ideal for kernel-based quantum machine learning. Specifically, the highest AUC of 0.93 was achieved in the 16-feature Wisconsin breast cancer dataset, which also demonstrated the highest quantum advantage score, confirming prior findings 20. In contrast, the same dataset with the classic SVM resulted in 0.89 AUC. We hypothesize that this phenomenon is due to the high sample count of the Wisconsin breast cancer dataset (M = 569). In general, the imbalance ratio of the datasets did not appear to be correlated with predictive performance. The log₂ N encoding increased AUC by up to 2% compared to the N encoding when comparing the execution of the quantum machine learning approaches in the simulation environment. This behavior was also identifiable in executions on NISQ devices, in the case of the kernel methods and the qDC. We hypothesize that the lower AUC performance of the N encoding method in both the simulator environment and on the NISQ device is due to the higher number of qubits, which likely leads to lower values of the inner products. This is in line with the findings in 20.
In general, the qKSVM demonstrated 2-5% higher AUC compared to the sqKSVM. The relative performance increase of the qKSVM scaled with sample count and feature count.
Specifically, the qKSVM showed an average 2% higher AUC with small sample counts (Heart failure and Pediatric bone marrow datasets), while it had 5% higher AUC in the Wisconsin breast cancer dataset. Nevertheless, both the qKSVM and the sqKSVM increased their AUC with doubled feature counts in the small Pediatric bone marrow dataset. This level of performance increase was not identifiable in the larger Wisconsin breast cancer dataset. The classic SVM demonstrated similar properties in relation to higher feature counts in small datasets 20, while it was outperformed by the qKSVM in the large Wisconsin breast cancer dataset.
In conclusion, quantum SVM approaches benefit from higher feature counts in general, where the qKSVM (due to relying on optimization) has a particular benefit compared to the sqKSVM. In the large Wisconsin breast cancer dataset, the qDC demonstrated higher performance compared to the sqKSVM, especially with small feature counts (0.91 AUC vs. 0.87 AUC for the qDC and the sqKSVM, respectively, with 8 features). The qDC resulted in the highest AUC of 0.60 across all other quantum (0.50-0.51 AUC) and classic machine learning (0.53-0.58 AUC) approaches in the Heart failure dataset. We hypothesize that this is due to the distribution characteristics of the samples belonging to the two subclasses in the feature space, which challenges classification with kernel methods. Generally, the performance of the executed quantum and classic machine learning approaches is comparable within the collected cohorts (Table 3).
According to our findings, quantum distance approaches can provide high performance with small feature and sample counts, which is particularly ideal for NISQ devices. In contrast, quantum kernel methods appear to provide high performance with high feature and sample counts. We demonstrated that the log₂ N encoding strategy allows quantum ML algorithms to be executed for high-dimensional clinical datasets on low-qubit-count NISQ devices. In general, quantum machine learning benefits from the log₂ N encoding strategy, as it increases predictive performance and reduces execution time on NISQ devices, while keeping model complexity lower.
We consider these findings of high importance in relation to building future quantum ML prediction models for clinically-relevant cohorts.

METHODS
All experiments of this study were performed in accordance with the respective guidelines and regulations of the open-access data sources this study relied on.For details, see section "Access".

Estimation of the inner product ⟨u|v⟩ and |⟨u|v⟩|²
Fig. 1 shows the quantum circuit for estimating the real part of ⟨u|v⟩ with the Hadamard Test. To estimate Re⟨u|v⟩ on the quantum computer, the training and test data need to be prepared in the quantum state (1/√2)(|0⟩|u⟩ + |1⟩|v⟩), where |u⟩ and |v⟩ are the quantum states of the training and test data, respectively.
The Hadamard gate on the ancilla qubit then interferes the training vector |u⟩ with the test vector |v⟩, yielding the state (1/2)[|0⟩(|u⟩ + |v⟩) + |1⟩(|u⟩ − |v⟩)] (equation (8)). Finally, measuring the quantum state given in equation (8) in the computational basis |0⟩ₐ gives the probability Pr(|0⟩ₐ) = (1/2)(1 + Re⟨u|v⟩), where Pr(|0⟩ₐ) is the probability of measuring the |0⟩ₐ state of equation (8), with ⟨u|u⟩ = ⟨v|v⟩ = 1. Since our datasets are real-valued, Re(⟨u|v⟩) = ⟨u|v⟩.
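A small numerical sketch of this relation (classical simulation, not a hardware run): the ideal ancilla statistics satisfy Pr(|0⟩ₐ) = (1 + Re⟨u|v⟩)/2, so the inner product is recovered from the measured probability as 2·Pr(|0⟩ₐ) − 1.

```python
import math

def hadamard_test_pr0(u, v):
    """Ideal probability of measuring |0> on the ancilla for normalized
    real states: Pr(0) = (1 + Re<u|v>) / 2."""
    re_uv = sum(a * b for a, b in zip(u, v))  # real data: Re<u|v> = <u|v>
    return (1.0 + re_uv) / 2.0

def inner_product_from_pr0(pr0):
    """Invert the relation to recover Re<u|v> from the measured Pr(0)."""
    return 2.0 * pr0 - 1.0

u = [1.0, 0.0]
v = [math.sqrt(0.5), math.sqrt(0.5)]
pr0 = hadamard_test_pr0(u, v)
# inner_product_from_pr0(pr0) recovers <u|v> = 1/sqrt(2)
```

On real hardware, Pr(|0⟩ₐ) is estimated from repeated shots, so the recovered inner product carries sampling noise on top of device noise.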
The inner product ⟨u|v⟩ can also be estimated on a quantum computer with the Swap Test (see Fig. 2). The Hadamard gate is applied on the ancilla qubit to create a superposition over |u⟩|v⟩, i.e., (1/√2)(|0⟩ + |1⟩)|u⟩|v⟩ (equation (10)).
The application of the singly-controlled swap gates on the state given in equation (10) entangles the ancilla qubit with |u⟩|v⟩. After the final Hadamard gate on the ancilla, the resulting entangled quantum state is (1/2)[|0⟩(|u⟩|v⟩ + |v⟩|u⟩) + |1⟩(|u⟩|v⟩ − |v⟩|u⟩)] (equation (11)). Measuring the quantum state given in equation (11) in the computational basis |0⟩ₐ yields the probability Pr(|0⟩ₐ) = (1/2)(1 + |⟨u|v⟩|²) (equation (12)), where Pr(|0⟩ₐ) is the probability of measuring the |0⟩ₐ state of equation (11).
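Analogously, the Swap Test statistics can be checked classically: Pr(|0⟩ₐ) = (1 + |⟨u|v⟩|²)/2, so the squared overlap is 2·Pr(|0⟩ₐ) − 1. A sketch for normalized real vectors (illustrative names, not the paper's code):

```python
def swap_test_pr0(u, v):
    """Ideal ancilla Pr(0) for the Swap Test on normalized real states:
    Pr(0) = (1 + |<u|v>|^2) / 2."""
    uv = sum(a * b for a, b in zip(u, v))
    return (1.0 + uv * uv) / 2.0

def fidelity_from_pr0(pr0):
    """Recover |<u|v>|^2 from the measured ancilla probability."""
    return 2.0 * pr0 - 1.0

# Orthogonal states give Pr(0) = 0.5 (fidelity 0);
# identical states give Pr(0) = 1.0 (fidelity 1).
```

Note that, unlike the Hadamard Test, the Swap Test yields only |⟨u|v⟩|² and thus loses the sign of the inner product.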

Simplified Quantum Kernel Support Vector Machine
The quantum Support Vector Machine algorithm was proposed in 35 for big data classification, showing exponential speedup via quantum-mechanical access to data.
Nevertheless, this approach is not ideal for NISQ devices 9. To date, two separate qKSVM approaches have been proposed for data classification via classical access to data 6,7. In these approaches, the quantum circuits must run twice on the quantum computer, and a cost function needs to be optimized on a classical computer to compute the support vector 7. We propose a simplified version of the qKSVM, called sqKSVM, as shown in Fig. 3.

Software and Hardware
For classical machine learning algorithms, we use the classical machine learning (CML) libraries of scikit-learn 36; quantum circuits were implemented in the Pennylane-Qiskit environment 30. The single-qubit error rate is the error induced by applying the single-qubit gates, and the CNOT error is the error of the two-qubit CNOT gates. Each circle represents a physical superconducting qubit and each ↔ shows coupling between nearest-neighbor qubits.

Implementation of the Hadamard Test on IBMQ Melbourne machine
The Hadamard Test approach is used to estimate the inner product of two state vectors. To map the quantum circuit of the Hadamard Test (Fig. 1) onto the IBMQ Melbourne machine, we first design the circuit in the Pennylane-Qiskit 0.13.0 environment. As can be seen from Fig. 5, there are Toffoli gates and singly-controlled R_Y gates, which need to be decomposed before mapping the circuit of Fig. 5 onto the IBMQ Melbourne machine architecture. To this end, we decompose each Toffoli gate in Fig. 5 into Hadamard (H), T, T†, and CNOT gates, and each singly-controlled R_Y gate into single-qubit rotation gates and two CNOTs (see Appendix B of the supplementary material). For the Hadamard Test on the datasets with 16 features, CNOT(q₀, q₄) cannot be implemented on the IBMQ Melbourne machine directly due to coupling constraints 37.
For the implementation of SWAP test on IBMQ Melbourne machine see Appendix D in the supplementary material.
In the qDC, the distance between the state vector of the test sample and each state vector of the training samples in set A and set B is calculated, and the test sample is assigned the label of the closest set. We divide the training set, with M samples, based on their labels {a, b ∈ L} into two subsets {A} and {B}, where {A} contains only label a with M_A samples and {B} contains only label b with M_B samples, with M_A + M_B = M. The task is to determine the label L_t of a given test sample, i.e., whether L_t = a or L_t = b. Mathematically, if |t⟩ is the state vector of the test sample, |u⟩ ∈ A, and |v⟩ ∈ B, then L_t = a if ‖|t⟩ − |u⟩‖ ≤ ‖|t⟩ − |v⟩‖, otherwise L_t = b. For normalized states, the distance between the vectors is given by ‖|t⟩ − |u⟩‖ = √(2 − 2⟨t|u⟩) 8.
For the sqKSVM, α⃗* = (α₁*, α₂*, …, α_M*) is the support vector, M is the number of training samples, and K(x⃗ᵢ, x⃗) is the kernel matrix of all training-test pairs. For a given dataset T = {(x⃗ᵢ, yᵢ): x⃗ᵢ ∈ ℝ^N, yᵢ ∈ {−1, 1}}, i = 1, …, M, one option to bypass the drawbacks of the qKSVM algorithm (see Appendix C of the supplementary material) as presented in 6,7 is to set uniform weights αᵢ* = 1 when ρ = 0.5 (balanced dataset); otherwise, αᵢ* = ρ for the majority class and αᵢ* = 1 − ρ for the minority class. Thresholding the value Σᵢ₌₁^M yᵢ αᵢ* K(x⃗ᵢ, x⃗) yields the binary output.
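The sqKSVM decision rule above can be sketched classically as follows. The kernel here is a plain linear kernel standing in for the quantum kernel estimated on hardware, and the weighting follows the uniform / imbalance-ratio scheme described above; the function names are illustrative.

```python
def linear_kernel(a, b):
    """Stand-in for the quantum kernel K(x_i, x) estimated on hardware."""
    return sum(x * y for x, y in zip(a, b))

def sqksvm_predict(train_x, train_y, test_x, kernel, rho=0.5):
    """sqKSVM decision: sign of sum_i y_i * alpha_i * K(x_i, x).
    alpha_i = 1 for a balanced set (rho = 0.5); otherwise the majority
    class gets alpha = rho and the minority class alpha = 1 - rho."""
    n_pos = sum(1 for y in train_y if y == 1)
    majority = 1 if n_pos >= len(train_y) / 2 else -1
    score = 0.0
    for xi, yi in zip(train_x, train_y):
        if rho == 0.5:
            alpha = 1.0
        else:
            alpha = rho if yi == majority else 1.0 - rho
        score += yi * alpha * kernel(xi, test_x)
    return 1 if score >= 0 else -1
```

Because the weights are fixed in advance, no classical optimization loop is needed, and the quantum circuits only have to be run once to estimate the kernel values.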

Figure 1 .
Figure 1. Quantum circuit computing the real part of the inner product ⟨u|v⟩. The Hadamard gate puts the ancilla qubit (q₀) into uniform superposition. A singly-controlled unitary gate entangles the excited state of the ancilla qubit with the training data state vector (|u⟩ = |u₁u₂u₃⟩). The X gate flips the ancilla qubit. Another singly-controlled unitary gate entangles the state vector of the test data (|v⟩ = |v₁v₂v₃⟩) with the excited state of the ancilla qubit. A second X gate flips the ancilla qubit. The Hadamard gate on the ancilla qubit interferes the train and test data state vectors. The ancilla qubit is then measured in the computational basis.

Figure 2 .
Figure 2. Quantum circuit to compute |⟨u|v⟩|². The model circuits encode the train and test data into quantum states |u⟩ = |u₁u₂u₃⟩ and |v⟩ = |v₁v₂v₃⟩. The Hadamard gate on the ancilla qubit (q₀) generates a superposition of the quantum state including the train and test data. The application of the singly-controlled swap gates with the ancilla qubit as the control results in the entangled state of equation (10). Another Hadamard gate on the ancilla qubit interferes |u⟩|v⟩ and |v⟩|u⟩. The ancilla qubit is measured in the |0⟩ state with the Pauli-Z gate. Therefore, the value of |⟨u|v⟩|² can be obtained from equation (12).

Figure 3 .
Figure 3. Schematic of the sqKSVM data classification algorithm. First, the training data vectors x⃗ᵢ and the test vector x⃗ are prepared on a classical computer. Next, the training and test data are encoded into quantum states, followed by computing the kernel of all training-test pairs K(x⃗ᵢ, x⃗) with a NISQ computer. If α⃗* = (α₁*, α₂*, …, α_M*) is considered to be a solution for the support vector, the binary classifier can be constructed based on equation (5).

Figure 5 .
Figure 5. The Hadamard Test circuit to estimate the inner product of the state vectors of the training and test data with 8 features (3+1 qubits). The ancilla qubit (q₀) is measured to estimate ⟨u|v⟩. The blue boxes are the part of the quantum circuit that encodes the train data into a quantum state and entangles the ancilla qubit with the quantum state vector of the train data. The quantum circuit in the green boxes encodes the test data and entangles the quantum state vector of the test data with the ancilla qubit.

Table 1 .
Clinical datasets utilized for the study with their sample and selected feature counts, as well as their imbalance ratios and quantum advantage scores. Given a two-class dataset, the imbalance ratio ρ is ρ = M_min / M, where M_min is the number of minority-class samples and M is the total number of samples. Furthermore, the quantum advantage score measures the similarity of the quantum kernel and the linear classical kernel functions of the same dataset.

Table 2
demonstrates the cross-validation AUC performance values of the quantum ML algorithms in relation to the log₂ N and N encoding qubit strategies.

Table 2 . Comparison of the cross-validation AUC performance for different data encodings
The qDC, qKSVM, and sqKSVM were run on the Pennylane simulator for N = 8. For the log₂ N encoding, N features are encoded into log₂ N qubits with sequences of Pauli-Y rotation gates (R_Y) and CNOTs. In the other strategy, N features are encoded into N qubits with sequences of Hadamard gates and Pauli-Z rotation gates (R_Z) followed by nearest-neighbor CNOTs.

Table 3 . Comparison of the cross-validation AUC performance with QML and ML algorithms
For all QML algorithms, N features are encoded into log₂ N qubits with sequences of Pauli-Y rotation gates (R_Y) and CNOTs. All QML algorithms were executed on the IBMQ Melbourne machine. *Heart failure has no 16-feature variant, since its maximum number of features is 13.
Since NISQ hardware natively supports only a limited set of gate operations, complex gate operations must be decomposed into elementary supported gates before mapping the quantum circuit onto noisy hardware. Owing to the specific architecture of IBM quantum computers, the coupling map must satisfy all two-qubit CNOT gate operations 37, i.e., if q_c is the control qubit and q_t is the target qubit, CNOT(q_c, q_t) can only be applied if there is a coupling between q_c and q_t. For running the QML algorithms on a quantum computer, we chose the 15-qubit IBMQ Melbourne machine with the standard gates I, U₁, U₂, U₃, and CNOT, where I is the identity single-qubit gate, U₁, U₂, and U₃ are single-qubit arbitrary rotation gates, and CNOT is the two-qubit gate.