Quantum discriminator for binary classification

Quantum computers have the unique ability to operate relatively quickly in high-dimensional spaces—this is sought to give them a competitive advantage over classical computers. In this work, we propose a novel quantum machine learning model called the Quantum Discriminator, which leverages the ability of quantum computers to operate in the high-dimensional spaces. The quantum discriminator is trained using a quantum-classical hybrid algorithm in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {O}(N\log N)$$\end{document}O(NlogN) time, and inferencing is performed on a universal quantum computer in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {O}(N)$$\end{document}O(N) time. The quantum discriminator takes as input the binary features extracted from a given datum along with a prediction qubit, and outputs the predicted label. We analyze its performance on the Iris and Bars and Stripes data sets, and show that it can attain 99% accuracy in simulation.


Introduction
Machine learning has become ubiquitous in almost every discipline under the sun (Qiu et al., 2016).While high-quality training data will only continue to increase in availability in the coming decades, it is projected that classical approaches to machine learning will fail to keep pace with this increase owing to the end of Moore's Law (Aimone, 2019;Preskill, 2018).Consequently, we must look towards alternative computing paradigms such as quantum and neuromorphic computing to address these scalability issues and develop more efficient machine learning methods (Biamonte et al., 2017;Date et al., 2021;Date, 2019).
Quantum computing has the potential to significantly speed up machine learning tasks (Biamonte et al., 2017;Ciliberto et al., 2018).Quantum computers use the quantum phenomena of superposition, tunneling and entanglement to per-Proceedings of the 39 th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022.Copyright 2022 by the author(s).
form computations (Nielsen & Chuang, 2002).As a result, they are able to operate in high-dimensional tensor-product spaces much faster than classical computers (Vazirani, 1998;Preskill, 1998).For certain applications such as integer factoring and searching, quantum computers are known to outperform classical computers (Shor, 1994;Grover, 1997).We believe the ability of quantum computers to efficiently operate in these high-dimensional tensor product spaces can be leveraged to design efficient training and inferencing methods.
In this work, we focus on binary classification.We operate within the traditional two-step workflow in machine learning, where the first step is to extract features from the data, and the second step is to perform classification using a discriminant function.We further assume that binary features have been extracted from the data; many approaches for extracting binary features already exist in the literature (Ren et al., 2014;Chalkidis et al., 2017).Therefore, our focus in this paper is to propose the Quantum Discriminator, which is a quantum discriminant model that performs binary classification on a set of binary features.We also present the hybrid quantum-classical training algorithm used to train the quantum discriminator in O(N log N ) time.
As a proof of concept, we demonstrate that our model can be used to completely solve the 2-bit binary classification problem.We also benchmark the quantum discriminator on the Iris and the Bars and Stripes data sets.Our results demonstrate that under a proper feature extraction and training regime, the quantum discriminator can attain a near-perfect (99%) accuracy in simulation on the these data sets.Ideally, we would want to benchmark the quantum discriminator on benchmark data sets such as MNIST.However, we are severely constrained by the small size and noisy nature of today's quantum computers, and can only embed small data sets (such as Iris and Bars and Stripes) reliably.It is expected that future quantum computers would be larger and more reliable, making it possible to embed larger data sets.
Section 2 and 3 cover the related work and notation used in this paper.The quantum discriminator, associated quantumclassical training algorithm, theoretical analysis and notes on generalizability are presented in Section 4. In Section 5 and Appendix A, we apply the quantum discriminator to the 2-bit binary classification problem as a proof of concept.In arXiv: 2009.01235v3 [quant-ph] 28 Jan 2022 Section 6, we benchmark the performance of our model on the Iris and Bars and Stripes data sets.

Related Work
Several machine learning approaches on universal quantum computers have been proposed in the literature (Biamonte et al., 2017).Lloyd and Weedbrook as well as Dallaire-Demers and Killoran derive the theoretical underpinnings of quantum generative adversarial networks (Lloyd & Weedbrook, 2018;Dallaire-Demers & Killoran, 2018) (Quiroga et al., 2021).Apart from universal quantum computing approaches, adiabatic quantum machine learning approaches have also been proposed for traditional machine learning models such as regression and k-means clustering (Date & Potok, 2021;Arthur & Date, 2021).
The above literature describes quantum implementations of conventional machine learning models.Another line of research in quantum machine learning focuses on developing purely quantum or hybrid quantum-classical models that are novel and different from conventional machine learning models.Gambs recasts the quantum discrimination problem within the framework of machine learning and uses the notion of learning reduction to solve different variants of the quantum classification task (Gambs, 2008).Sentis et al. present a quantum learning machine for binary classification of qubit states that does not require quantum memory and produces classifiers that are robust to an arbitrary amount of noise (Sentís et al., 2012).Chen et al. propose a discrimination method for two similar quantum systems and apply it to quantum ensemble classification (Chen et al., 2016).Sergioli et al. propose a new quantum classifier called the Helstrom Quantum Centroid, which acts on density matrices that encode the classical patterns of a data set onto the quantum computer (Sergioli et al., 2019).Park et al. focus on the squared overlap between quantum states as a similarity measure and examine the essential ingredients for the quantum binary classification, advancing the theory of quantum kernel-based binary classification (Park et al., 2020).In order to reduce the number of trainable parameters of a quantum circuit, Li et al. propose the Variational Shadow Quantum Learning (VSQL) framework (Li et al., 2020).Blank et al. present a distance-based quantum classifier whose kernel is based on the quantum state fidelity between training and test data (Blank et al., 2020).They also conduct proof of principle experiments on the IBMQ platform.
A number of approaches to quantum machine learning leverage the two-step workflow followed by traditional machine learning models.Havlicek et al. propose a quantum variational classifier and a quantum kernel estimator for classification problems (Havlíček et al., 2019).Schuld and Killoran propose a nonlinear feature map that maps data to a quantum feature space and discuss two discriminant models for classification (Schuld & Killoran, 2019).Bergou and Hillery propose a quantum discriminator that can distinguish between two unknown quantum states (Bergou & Hillery, 2005).Lloyd et al. present a two-part quantum machine learning model, where the first part of the circuit implements a quantum feature map that encodes classical inputs into quantum states, and the second part of the circuit executes a quantum measurement, which acts as the output of the model.(Lloyd et al., 2020) Both universal as well as adiabatic approaches for quantum machine learning have been proposed in the literature.However, there are several limitations in the current state-ofthe-art.Most of the proposed approaches are based on variational quantum circuits.By definition, these are iterative quantum-classical hybrid approaches, and require multiple data exchanges between the quantum and the classical computer.These data exchanges necessitate frequent measurements of the quantum states, which introduce measurement errors as well as collapse the quantum superposition.When the superposition is collapsed frequently, the ability of quantum computers to operate efficiently in high-dimensional tensor product spaces is curtailed.So, it is desirable to minimize the use of classical computers in quantum machine learning methods.With regards to validating quantum machine learning methods, most of the approaches in the literature are not validated on benchmark data sets such as Iris and MNIST-this is largely due to the small size and noisy nature of today's quantum computers.
The quantum discriminator proposed in this paper mitigates some of these challenges.Firstly, it uses classical computers to process O(N ) data during training only.The classical computer is not used in the inferencing stage.This restricted use of the classical computer enables the quantum discriminator to operate in the high-dimensional tensor-product spaces efficiently.Next, we validate the quantum discriminator on the Iris and the Bars and Stripes data sets, which, in quantum machine learning, are some of the largest data sets that can be embedded on today's quantum computers.Ideally, we would like to validate the quantum discriminator on the MNIST, CIFAR, ImageNet and other benchmark machine learning data sets.However, the small size and low reliability of current quantum computers prevent us from embedding these large data sets onto them.Another

Notation
We use the following notation throughout this paper: • R, N, B: Set of real numbers, natural numbers and binary numbers (B = {0, 1}) respectively.
• N : Number of data points in training data set (N ∈ N).
• d: Dimension of each data point in the training data set (d ∈ N).
• b: Dimension of each data point in the binary feature set of the training data set (b ∈ N).
• B: Number of unique states that can be attained using b bits (B = 2 b ).
• X: The training data set (X ∈ R N ×d ).
• Y : The training labels (Y ∈ B N ).If the i th data point belongs to class 0 (class 1), then y i = 0 (y i = 1).
• X: The binary feature set of the training data set X ( X ∈ B N ×b ).xi ∈ X contains the features corresponding to the i th data point x i ∈ X.
• P : The labels predicted by the quantum binary classification model (P ∈ B N ).Ideally, the predicted labels should be identical to the training labels (Y ).

The Quantum Discriminator
Given a data point x which belongs to one of two classes (Class 0 or Class 1), we would like to predict the correct class for x.Generally, the binary classification model is characterized by a set of model parameters Θ.The workflow used for traditional machine learning classification models-such as support vector machines (SVM) and logistic regression-is comprised of 2-steps: (i) Feature extraction from the data; and (ii) Class determination by application of a discriminant function.The workflow governing . . . . . .our quantum model for binary classification is analogous to this 2-step workflow as shown in Figure 1.
In Figure 1, we are given a data point x ∈ R d whose class needs to be predicted.We first extract the binary features of x, denoted by x ∈ B b .Usually, the extracted features are domain specific, for example, Histogram of Oriented Gradients (HOG) (Shu et al., 2011) and Scale-Invariant Feature Transform (SIFT) (Nguyen et al., 2014) have been widely used in computer vision.The feature space could also originate from dimensionality reduction techniques like Principal Component Analysis (PCA).So, we do not make any assumptions on the extracted features, except that they are binary.This feature set of the training data is denoted by X in Figure 1.Each point x ∈ X is a point on a bdimensional unit hypercube, and there are B such points.Given x, its equivalent quantum feature state is denoted by |x , which is a point in b-dimensional Hilbert space.
In addition to preparing the quantum feature state in Figure 1, we also prepare a qubit in the |0 state, which would serve as our prediction qubit |p .The quantum feature state |x = |x 1 . . .xb , as well as the prediction qubit in the |0 state serve as inputs to the quantum discriminator as shown in Figure 2. The quantum discriminator (U Θ ) is a 2B × 2B matrix, which takes as input |x and |0 , and outputs |x and the prediction |p .We now describe U Θ , which is parameterized by Since all quantum operators are unitary matrices, it is crucial that U Θ be a unitary matrix.We now show that U Θ is unitary by showing that Thus, U Θ is a unitary.

Number of binary features
We would like to extract b binary features so that the N points in the feature set span as much of the feature space as possible.This would ensure that the quantum discriminator trained as a result would be generalizable to any test data point that originates from the same distribution as the training data set.Since the size of the feature space is 2 b , we have:

Training the Quantum discriminator
Figure 3 shows the training workflow for our quantum model for classification.We are given the training data set X and the training labels Y .We compute the binary feature set X and prepare the quantum feature states |x 1 . . .|x N , where xi ∈ X, ∀i = 1, 2, . . ., N .In addition to the quantum feature states, we also prepare the |0 state.Given |x 1 . . .|x N , Y , and |0 , the goal of training the quantum discriminator is to find the model parameters Θ, that minimize a well-defined error function.Some examples of error functions used in classification tasks are the Euclidean error (or L2-norm) in linear regression, negative log likelihood error in logistic regression, and the cross entropy error in deep neural networks.Negative log likelihood and cross entropy errors are generally used in probabilistic machine learning models, whereas, L2-norm is used for non-probabilistic machine learning models.For non-probabilistic machine learning models, the L1-norm is considered to be more robust than the L2-norm (Li et al., 2015).However, L2-norm is generally used in practice because it is differentiable and amenable to gradient computations.Since our quantum discriminator is not probabilistic and does not require gradient computations, we will use the L1-norm as the error function.Thus, the training process can be stated as follows: where p i is the measured value of |p i and the output state, Algorithm 1 Training the quantum discriminator.2, where U Θ = I, by initializing all the model parameters θ j to zero (j = 1, 2, . . ., B).We then look at each feature vector xi (i = 1, 2, . . ., N ).If xi belongs to Class 1 (i.e.y i = 1), then we compute the index j as 1 + τ • xi , and set θ j = 1.We repeat this process for all N points in the training feature set X. When Algorithm 1 terminates, it assigns all points in X to their respective correct classes.
We now shed some light on why Algorithm 1 works.The input state to the quantum discriminator, |x 0 , exists in (b + 1)-dimensional Hilbert space and is in a superposition of all 2B possible states.As such, each of the B quantum feature states |x , occurs twice: as |x 0 and |x 1 .These two states can be interpreted as |x belonging to Class 0 or Class 1 respectively.While training the quantum discriminator, we select the correct class for each |x .The rows and columns of U Θ that correspond to x can be found at indices j = 1 + τ • x and j + 1, which can be leveraged to assign x to Class 0 or Class 1 respectively.Initially, the 2 × 2 submatrix at j th row and j th column of U Θ is an identity matrix because θ j is initialized to 0. If x belongs to Class 0, then this sub-matrix outputs |x 0 for the input |x 0 , which can be interpreted as x being assigned to Class 0. On the other hand, if x belongs to Class 1, then this sub-matrix must be changed to the Pauli-X gate (also called the bit-flip gate or the NOT gate), which is done by setting θ j to 1.The Pauli-X gate at j th row and j th column of U Θ outputs |x 1 for the input |x 0 , which can be interpreted as x being assigned to Class 1.

Theoretical Analysis
We analyze the time and space complexity of training the quantum discriminator here.In Algorithm 1, lines 1 and 2 require O( 1

Generalizability
By generalizability, we refer to the ability of a machine learning model to make predictions on data points not encountered during training.The quantum discriminator has the ability to classify points in the training data set with 100% accuracy owing to an exponential number of model parameters (θ 1 , θ 2 , . . ., θ B ).It is highly complex and highly susceptible to overfitting the training data.This affinity to overfit is kept in check by the feature extraction process.If the extracted binary features are good and small in number (b ≈ O(log N )), the number of model parameters are also small (B ≈ O(N )).The subsequent quantum discriminator would have a lower tendency of overfitting and would be generalizable.

2-Bit Binary Classification
As a proof of concept, we use the quantum discriminator to solve the 2-bit binary classification problem.Because of the large number of figures in this part of the study, we have explained it in Appendix A. The reader is advised to read Appendix A to gain a better understanding of the quantum discriminator and its application.

Hardware and Simulator Details
We validated the performance of the quantum discriminator on the Iris and Bars and Stripes data sets.All our experiments were conducted on the IBM Jakarta processor as well as in numerous noise-free simulations using the QASM simulator in IBM Qiskit.The IBM Jakarta quantum computer consists of 7 qubits with a total quantum volume of 16.The average CNOT and readout errors on this machine were 0.015 and 0.035 respectively.The classical computer used in the training process was a desktop workstation having Intel Core i4-4670K CPU, running at 3.4 GHz, 64-bit operating system and 16 GB RAM.

The Iris data set
The Iris data set is pervasive as a benchmark in machine learning, and is small enough to be embedded on current quantum computers.The data set contains 150 data points gathered from a sample of Iris flowers.Each data point consists of four measurements taken on the given Iris flower along with its species.The recorded attributes for each flower are petal length, petal width, sepal length, and sepal width measured in centimeters.The 150 data points are split evenly between three species of Iris: Iris-Setosa, Iris-Versicolor, and Iris-Virginica.
Using this data set, a model can be created whereby the petal/sepal lengths and widths of a given Iris datum can be used to predict its species.For the purpose of testing the quantum discriminator-which is designed for binary classification-the data set was restricted to just the Iris-Setosa and Iris-Virginica samples, which were labeled as Class 0 and Class 1 respectively.This reduced the size of the data set to 100 data points.

FEATURE EXTRACTION
Before training the quantum discriminator, binary features must be extracted from the data.In this case, three features were gathered from the data by imposing threshold values for the sepal length, sepal width, and petal length.The fourth attribute, petal width, was not considered in this feature extraction process.Specifically, our extracted data consisted 3-tuples of binary numbers, x = (a 1 , a 2 , a 3 ) ∈ B 3 , where a 1 = 1 if the sepal length was recorded to be above 5.50 cm, a 2 = 1 if the sepal width was recorded to be above 3.00 cm, a 3 = 1 if petal length was recorded to be above 3.00 cm, and each a i was set to 0 if it failed to meet these respective threshold values.This feature extraction procedure resulted in a separable data set.Moreover, the extracted data set was found to span only three-quarters of our entire binary feature space, which has a theoretical size of 2 3 = 8, i.e. 8 unique configurations of binary features.

TRAINING
Training experiments were conducted in which the 100 data points were partitioned at random into a training data set of size N , and the remaining 100−N data points were reserved for validation.This parameter N , was varied from N = 4, where at most half of feature space could be sampled, to N = 80, which corresponds to a train/test split scheme commonly used to evaluate machine learning approaches.Training was performed in accordance with Algorithm 1.
The theory behind the quantum discriminator prescribes the number of features (b) to be O(log N ).Since the size of the smallest and largest training sets in our experiments were 4 and 80 respectively, we would need to have between log 2 4 and log 2 80 binary features, i.e., 2-6 binary features.As mentioned above, we selected 3 binary features for this task, which is consistent with the assumptions of the quantum discriminator.
In each trial, a quantum circuit was constructed using IBM's Qiskit software development kit in order to evaluate our model on each point in the validation set.Specifically, for each point in validation, (x i , y), a circuit was constructed which initializes a blank quantum register of 4 qubits into the state |x i 0 before passing into a series of gates equivalent to the unitary transformation obtained from the model parameters which were extracted in training.The predicted label, p, was then recorded for comparison with its true value y.

RESULTS ON THE IRIS DATA SET
The three training sizes tested in this work are N = 80, 8, and 4. In the former two cases, experiments consisting of 300 trials each were conducted, with each experiment being  performed once in the QASM simulator and once again on the IBM Jarkarta processor, meaning there were four experiments in total for these two cases.In each trial, N data points were selected at random to form a training set, which was used to train the quantum discriminator in accordance with Algorithm 1.The trained model was used for inferencing on each point in the validation set for benchmarking purposes.Benchmarking on the N = 4 case was conducted analogously, except just one experiment was conducted (in simulation) and the number of trials was increased to 600.
In the case of N = 80, the discriminator obtained an average validation accuracy of 99.15% with a standard deviation of 1.878% in simulation.On the Jakarta processor, however, the discriminator obtained a much lower accuracy of 89.13% on average with a standard deviation of 6.97%.A histogram depicting the distribution of model accuracies across the 300 trials in both cases is depicted in Figure 4.
When the training set was lowered to size N = 8, the aver-  age validation accuracy dropped to 94.98% with a standard deviation of 9.047% in simulation, whereas the average accuracy on the Jakarta processor fell to 82.37% with a standard deviation of 8.403%.The distribution of model accuracies on validation is similarly displayed in Figure 5. Figure 6 displays a box plot of model accuracies on quantum hardware for N equals 80 and 8.
In the case of N = 4, the average model accuracy across our 600 trials fell to 84.41%, with an increased standard deviation of 15.04%; the distribution of which can be seen in Figure 7.Additional statistics were gathered in this case.Inaccurate predictions made on validation were separated into Type I (false positive) and Type II (false negative) errors.It was found that all errors made by the models belong to Type II, meaning the model had a false positive rate of 0 in each simulated trial.In other words, each simulated model had a precision of 100%.It was found that false negative rate (also called the miss rate) was 0.3074 on average with a standard deviation of 0.2934.This distribution, seen in Figure 8 was heavily skew-right; i.e. the distribution was more concentrated on the side closer to a false-negative rate of 0. Consequently, the average recall (also called sensitivity or hit rate) was 0.6925 on average with the same standard deviation.
While these results are reported for the Iris data set and should be interpreted accordingly, we believe these are still very interesting.The fact that the quantum discriminator can achieve 84% accuracy when trained on just 4% of the data can be used to build approximate machine learning models with moderately high accuracy very quickly.Such models could then be refined quickly in subsequent training iterations, thereby reducing the training times.

The Bars and Stripes Data Set
To further evaluate the performance of our quantum discriminator, a Bars and Stripes data set of size 100 was generated on a 3x3 grid.Each cell in this grid can either be illuminated or not.In this experiment, a grid is said to be a bar if it has a rectangle of illuminated cells whose width is strictly greater than its height.Similarly, a grid is said to be a stripe if it has a rectangle of illuminated cells whose height is strictly greater than its width.Accordingly, all bars and stripes viable under this formulation are displayed in Figure 9.Using this scheme, 100 samples were drawn from a uniform sample of these viable bars and stripes to form the data set used in this experiment.The bars and strips were assigned to class 0 and 1 respectively.The Bars and Stripes experiments were run in simulation only.

METHODOLOGY
Nine binary features were extracted from each data point by reading the cells of each grid in lexicographic order, recording a one if the given cell is illuminated and a zero otherwise.Two experiments were conducted on the Bars and Stripes data set in a similar fashion to the Iris data set in Section 6.2.Specifically, 300 trials (each) were conducted in noise-free simulation whereby the data set was randomly partitioned into a training set of size N and the remaining data were reserved for validation of the resulting model; the training sizes used in this case were N = 80 and N = 11.The latter quantity was chosen as it is the minimal number of points needed to sample all of the Class 1 data in our binary feature space.

RESULTS ON THE BARS AND STRIPES DATA SET
In the case of N = 80, the discriminator obtained an average validation accuracy of 98.38% with a standard deviation of 3.647% in simulation.When N was lowered to 11, the average validation accuracy dropped to 71.02% with a standard deviation of 6.765% in simulation.Figure 10 displays how the distribution of model accuracies on validation varies in simulation with this change in N .These results were very similar to the Iris data set.

Discussion
The Quantum Discriminator fared extremely well in both of the cases where the size of the feature set was O(log N ) for the Iris as well as the Bars and Stripes data sets.There was a noticeable gap (≈ 10%) in both Iris cases (N = 80 and 8) between the discriminator's performance in and on the IBM Jakarta processor.This discrepancy can be attributed to the fact that the state of the qubit as well as the inter-qubit connections on the quantum hardware are extremely sensitive to any kind of noise (thermal, radiation, vibrational etc.) in today's quantum computers.Consequently, it is extremely difficult to maintain the qubit states for longer periods of time, and the qubits have a tendency to lose their state during the computation-this is called decoherence.Qubit decoherence is known to result in considerably poor performance on the hardware as compared to simulation for many quantum algorithms.
To mitigate this issue of decoherence, error correction regimes are often incorporated into quantum algorithms.However, no error-correction was performed in this experiment as it adds considerable overheads in terms of both time and space complexity.It did not seem warranted considering that only four qubits were required for the Iris experiments, and only ten qubits were required for the Bars and Stripes experiments under our feature extraction regime.It stands to reason that the disparity in performance of the simulated and real-world models would increase dramatically as the number of qubits required for inferencing increases when working with today's quantum computers.Having said that, it is expected that with hardware and engineering improvements in the future, quantum computers would become less noisy, more reliable, and larger in size.These fault-tolerant quantum computers are expected to run quantum algorithms at par with (if not better than) the current simulation results.In both data sets, it was observed that the quantum discriminator can obtain moderately high accuracies even when the training set is sparse, meaning approximate models can be quickly and efficiently generated for subsequent refinement.

Conclusion
Alternative computing paradigms, such as quantum computing, present a new frontier in which novel machine learning techniques can be developed to address the current limitations of classical learning techniques.In this work, we outline a quantum machine learning technique for binary classification called the quantum discriminator.The quantum discriminator is used as a discriminant function to infer the label of a given datum from its extracted features.The quantum discriminator is a 2B × 2B unitary matrix, which is parameterized by B parameters.It can be trained in O(N log N ) time, using O(N log N ) classical bits and O(b) qubits.Inferencing on the quantum discriminator can be performed in O(N ) time using O(b) qubits.We demonstrated that this model can be used to completely solve the 2-bit binary classification problem as outlined in Appendix A. Furthermore, we evaluated its performance on the Iris data set, which is a benchmark machine learning data set, and also on a 3x3 Bars and Stripes data set.This empirical evaluation demonstrates the discriminator's potential to generate highly accurate, inherently precise models on a separable data sets when appropriate binary features have been extracted from the data.
In our future work, we would like to evaluate the discriminator's performance on problems involving larger, more complex data sets, such as MNIST.We would also like to extend the quantum discriminator to multi-class classification problems.Lastly, we would like to investigate purely quantum training algorithms for training the quantum discriminator as opposed to the hybrid quantum-classical training algorithm described in this paper.To demonstrate a proof of concept for the proposed quantum discriminator, we use it to classify 2-dimensional binary data points.There are four such data points: (0, 0), (0, 1), (1, 0) and (1, 1), located on the corners of a unit square.There are 2 4 = 16 ways of classifying these points into two classes as shown in Figure 11.For example, one such way could be: (0, 0) and (0, 1) belong to Class 0, and (1, 0) and (1, 1) belong to Class 1, as shown by Case 10 in Figure 11.In Figure 12, we show the quantum circuits that can classify each of the 16 cases in Figure 11.In Table 1, we present the unitary matrices of the quantum discriminator models that classify each of the 16 cases in Figure 11.
We now elaborate Case 4 from Figure 11.In Case 4, the points (0, 0), (0, 1) and (1, 1) belong to Class 0, while the point (1, 0) belongs to Class 1.The corresponding quantum circuit (Case 4 in Figure 12) used for classification takes as input the two quantum data features (|x 1 and |x 2 ) as well as the prediction qubit initialized to |0 .Next, it applies the Pauli-X gate on |x 2 , followed by a Toffoli gate (also called the CCNOT gate) with |x 1 and |x 2 as the control bits and |0 as the target bit, followed by another Pauli-X gate on |x 2 .
The corresponding unitary operator (U Θ ) is shown in the Case 4 of Table 1.It looks like an identity matrix with one caveat.The 2 × 2 sub-matrix starting at the fifth row and fifth column is a Pauli-X operator instead of a 2 × 2 identity matrix.Note that the first and second rows/columns of all U Θ shown in Tables 1 correspond to the data point (0, 0).Similarly, third and fourth rows/columns correspond to the data point (0, 1), fifth and sixth rows/columns correspond to (1, 0), and seventh and eighth rows/columns correspond to (1, 1).The starting indices (1, 3, 5 and 7) of these 2 × 2 sub-matrices can be computed from the equation on line 7 of Algorithm 1.For each data point in each of the 16 cases, we can independently update the classification operator (U Θ ) so that it correctly classifies the said data point.In this way, the quantum discriminator is able to achieve near perfect accuracies on binary classification problems, provided appropriate binary features have been extracted from the data.
Table 1.Unitary matrices for each case of the 2-bit binary classification problem.

1
Algorithm 1 outlines the training process for the quantum discriminator.The inputs to the model are the feature vectors x1 , x2 , . . .xN , and the training labels y 1 , y 2 , . . ., y N .We first initialize b and B. The length(z) function computes the length of z.We then set the vector τ = [2 b−1 , 2 b−2 , . . ., 2 0 ].Next, we setup the quantum circuit shown in Figure ) time and line 3 requires O(b) time.It may seem that line 4 requires O(B) time, but initializing θ j to 0 essentially refers to setting up the quantum circuit with U Θ = I.This entails setting up b + 1 qubits, which takes O(b) time.Computing the dot product on line 7 may take O(b) time and setting θ j to unity on line 8 takes O(1) time.Since we may repeat lines 7 and 8 N -times in the worst case, the time complexity of Algorithm 1 is O(N b), which is the same as the size of the feature set X. Since b is O(log N ) from Section 4.1, the time complexity is O(N log N ).Since we use O(N b) classical bits for storing X, Y and τ , and computing the dot product on line 7, the space complexity of Algorithm 1 is also O(N log N ).The qubit footprint of Algorithm 1 is O(b) because we use b + 1 qubits.Thus, it is possible to train the quantum discriminator shown in Figure 2 in O(N log N ) time, using O(N log N ) classical bits and O(b) qubits.Subsequently, inferencing can be performed on a given input using the quantum circuit seen in Figure 2 in O(N ) time using just O(b) qubits.

Figure 4 .
Figure 4. Histogram of validation accuracies on hardware and simulator in the case of N = 80 for the Iris data set across 300 trials each.

Figure 5 .
Figure 5. Histogram of validation accuracies on hardware and simulator in the case of N = 8 for the Iris data set across 300 trials each.

Figure 6 .
Figure 6.Box plot of model accuracies on the IBM Jakarta processor for N = 80 and N = 8 across 300 trials each.

Figure 7 .
Figure 7. Histogram of validation accuracies on simulator in the case of N = 4 for the Iris data set across 600 trials.

Figure 8 .
Figure 8. Box plot of model accuracies on simulator across 600 trials on the IBM Jakarta processor for N = 4

Figure 9 .
Figure 9.A diagram depicting all possible bars (on the left) and stripes (on the right) in our data set.Here, cells are illuminated in green.Bars and stripes are assigned the classes 0 and 1 respectively.

Figure 10 .
Figure 10.Histogram of validation accuracies on simulator in the case of N = 80 and 8 for the Bars and Stripes data set across 300 trials each.

Figure 11 .
Figure 11.16 cases of the 2-bit binary classification problem.

p) Case 16 Figure 12 .
Figure 12.Quantum circuits for each case of the 2-bit binary classification problem.
. Blance and Spannowsky propose a variational quantum classifier for use in high energy physics applications (Blance & Spannowsky, 2021).Shingu et al. propose a variational quantum algorithm for Boltzmann machine learning (Shingu et al., 2021).Benedetti et al. propose quantum parameterized circuits as machine learning models.Quiroga et al. propose a quantum k-means (QK-Means) clustering technique to discriminate quantum states on the IBM Bogota quantum device