## Introduction

Noisy Intermediate-Scale Quantum (NISQ)1,2,3 devices hold the promise of delivering a practical quantum advantage by harnessing the complexity of quantum systems. Although fault-tolerant quantum computing4,5,6 is still several years away, researchers remain hopeful that this goal can be achieved. Perhaps one of the most exciting breakthroughs in this direction was the demonstration of “quantum supremacy” by Google researchers7 using their programmable superconducting Sycamore chip with 53 qubits, on which average single-qubit gate fidelities of 99.85% and two-qubit gate fidelities of 99.64% were obtained. In that experiment, the task of sampling the output of a pseudo-random quantum circuit was successfully carried out. Quantum supremacy would imply that a universal quantum computer can perform certain tasks exponentially faster than a classical computer8. However, it was later argued that Google’s achievement amounted to a demonstration of a quantum advantage but not of a practical advantage; in other words, the task performed was not useful for any real-life application. Another quantum advantage experiment9 used the Jiuzhang photonic quantum computer to perform Gaussian boson sampling (GBS) with 50 indistinguishable single-mode squeezed states. Here, the quantum advantage was established in the sampling time complexity associated with the Torontonian of a matrix, which scales exponentially with the number of output photon clicks. However, this experiment demonstrates quantum advantage but fails to demonstrate quantum supremacy, as this photonic quantum computer is not programmable. One of the most promising areas of research for obtaining a practical advantage is Quantum Machine Learning (QML)10,11,12, which was born from the cross-fertilisation of ideas between Quantum Computing13,14 and Classical Machine Learning15,16.
QML is similar in spirit to classical machine learning, with the main difference that the classical neurons in the layers of a deep neural network are replaced by qubits and quantum gates acting on them, with quantum measurements playing the role of the activation function. The field of QML provides a new platform for devising algorithms that exhibit quantum speedups. For instance, it has been shown that basic linear algebra subroutines such as solving certain types of linear equations (the quantum version is known in the community as HHL), finding eigenvectors and eigenvalues, and principal component analysis (PCA) exhibit exponential speedups compared to their classical counterparts17,18,19,20,21. However, Ref.22 recently demonstrated that, in the case of the PCA algorithm suggested by Lloyd, Mohseni, and Rebentrost, the exponential speedup attained by the quantum algorithm was simply an artifact of the state preparation assumptions. Since we are dealing with a quantum system, one can use quantum resources such as coherence, entanglement, negativity and contextuality to work towards a practical advantage. However, the role of the different types of resources in harnessing a practical advantage from the available 50 to 100 qubit noisy devices3 is still not completely understood. The three main building blocks of any QML algorithm are data encoding, unitary evolution of the system, and state readout through measurement12. Uploading classical data into the quantum computer is not a trivial task; it can account for most of the complexity of the algorithm and determines what kind of speedups are feasible.
This procedure is called quantum embedding, which can be achieved, for instance, with the help of “quantum feature maps”23,24,25,26,27. These take classical data and map it to a high-dimensional Hilbert space, where one hopes to achieve a higher separation between the data classes than in the original coordinate system. Moreover, one can train the quantum embedding to achieve maximal separation between the data clusters in the Hilbert space (this approach has been coined “quantum metric learning”)26,27, paving the way towards constructing faithful quantum classifiers.

Binary classification is a ubiquitous task in machine learning. Perhaps the most prominent example is the cat recognition algorithm, which gives a flavour of the power of combining basic tools such as logistic regression with deep neural network architectures15. Quantum classifiers hold the promise of feasible speedups compared to their classical counterparts. Several theoretical proposals, combined with actual experimental runs on commercially available backends, have been put forward for realising faithful quantum classifiers23,24,28,29,30,31,32,33,34,35,36,37,38. For instance, the approaches in Refs.36,37 are inspired by kernel methods used in classical machine learning. Refs.23,28,29 combine certain types of quantum embeddings to obtain hybrid quantum neural networks, which are promising candidates for building a faithful classifier. Ref.30 suggests using hypergraph states39, under the assumption that such states can lower the circuit depth of the classifier. Refs.32,33 are based on Grover’s quantum search algorithm.

In this manuscript, we take a rather pragmatic approach and benefit from the plethora of available QML software packages40,41,42,43,44, which make it possible to run a quantum circuit on a quantum simulator or on actual hardware (through platforms such as IBM Quantum Experience, Amazon Braket, Rigetti Computing and Strawberry Fields). Using these tools, we provide new software that is particularly well suited to classification problems on the unbalanced and noisy datasets that are prevalent in the financial industry45.

We first briefly outline and review the three QML architectures that serve as building blocks for our software package: hybrid neural networks23,28,29, parametric quantum circuits2,46,47,48 and data re-uploading24,25.

The metric we use for assessing the performance of our quantum classifiers is the area under the receiver operating characteristic curve (AUC–ROC). The ROC is a probability curve, and the AUC represents the degree of separability between the classes. In general, a good model has an AUC close to 1. We test our FULL HYBRID models and benchmark them against existing QML classifiers, as well as against the best-known classical machine learning counterparts, by running simulations on quantum simulators for three different 2-dimensional non-convex surfaces. Non-convex boundaries are believed to represent more difficult classification problems, since linear regression is bound to fail in these tasks. Then, by introducing asymmetrical Gaussian noise, we study the resilience of our different approaches to noise. This kind of study sheds light on how the learning properties depend on the amount of noise in the dataset. We also perform systematic hyperparameter tuning by studying how the AUC–ROC score changes with the number of repeating units in the data re-uploading approach, the number of qubits, the batch size, the number of epochs and the number of strongly entangling units. We remark that our binary classifiers can be extended to multi-class classification problems using a one-versus-all approach.
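As a rough illustration of the metric, the sketch below computes the AUC through its equivalent rank formulation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are illustrative assumptions and are not taken from the software package.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for large datasets.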

## Results

### Problem setting

We consider a non-trivial classification problem and train single- and multi-qubit variational quantum circuits to solve it. The data is generated as a set of random points in the plane $$(x_{1},x_{2})$$, each labelled 1 (blue) or 0 (red) depending on whether it lies inside or outside a given 2-dimensional non-convex figure. The goal is to train a quantum circuit to predict the label (red or blue) from the coordinates of an input point.

### Comparative study of different quantum and classical classifiers

Here we test several models (including our proposed models) and benchmark them against each other, as well as against the best-known classical machine learning counterparts, by running on simulator backends (such as Aer in Qiskit) for 2-dimensional and 3-dimensional non-convex datasets. We then study the resilience of our different approaches to noise by introducing asymmetrical Gaussian noise and examining the prediction grids and AUC–ROC characteristics.

This kind of study sheds light on the learning properties as a function of the amount of noise in the dataset. These results were obtained through systematic hyperparameter tuning, by observing how the AUC–ROC score changes with the number of repeating units in the data re-uploading approach, the batch size, the number of epochs and the number of strongly entangling units.

To produce datasets with noise, we introduce asymmetrical Gaussian noise (N), where asymmetrical means that the noise is applied to only one class. At the bottom of Fig. 1 we plot the cases N = 0.0, N = 0.6 and N = 1.2. Each dataset has 6000 data points and is split equally into training and testing datasets.
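The data generation described above can be sketched as follows. Several details are assumptions made only for illustration: the non-convex figure is taken to be an annulus with hypothetical radii 0.4 and 0.8, the noise level N is interpreted as the standard deviation of a Gaussian displacement added to the coordinates, and the noisy class is taken to be class 1. The helper names `make_dataset` and `split_half` are likewise hypothetical.

```python
import math
import random


def make_dataset(n_points, noise, seed=0):
    """Random points in [-1, 1]^2 labelled by an annulus; noise perturbs class 1 only."""
    rng_points = random.Random(seed)       # draws the clean coordinates
    rng_noise = random.Random(seed + 1)    # separate stream, so clean points do not depend on noise
    data = []
    for _ in range(n_points):
        x1 = rng_points.uniform(-1.0, 1.0)
        x2 = rng_points.uniform(-1.0, 1.0)
        radius = math.hypot(x1, x2)
        label = 1 if 0.4 <= radius <= 0.8 else 0   # assumed non-convex region (annulus)
        if label == 1 and noise > 0:
            # Asymmetrical noise: only the coordinates of class-1 points are displaced.
            x1 += rng_noise.gauss(0.0, noise)
            x2 += rng_noise.gauss(0.0, noise)
        data.append((x1, x2, label))
    return data


def split_half(data):
    """Equal split into training and testing sets."""
    half = len(data) // 2
    return data[:half], data[half:]


train, test = split_half(make_dataset(6000, noise=0.6))
print(len(train), len(test))  # 3000 3000
```

Using a separate random stream for the noise keeps the underlying clean points identical across noise levels, so that differences between runs at N = 0.0, 0.6 and 1.2 come from the noise alone.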

We refer readers to the respective subsections of the “Methods” section for a detailed description of the different types of quantum classifiers, referred to as DRC (data-reuploading classifier), VC (variational classifier, see the “Variational Quantum Algorithms (VQA)” section), VC–DRC (variational classifier combined with the data-reuploading one), QNODE (quantum node, see the “QNode” section) and our newly designed FULL HYBRID circuit architectures, referred to as FH: VC–DRC/NN and FH: NN/VC–DRC.

To demonstrate the power of the data-reuploading technique combined with the variational classifier in the VC–DRC model, we plot the AUC–ROC score versus noise for different numbers of blocks. The results are shown in Fig. 2. It is apparent from Fig. 2 (left) that, for the DRC classifier, an increasing number of repeating blocks gives a higher AUC–ROC score at every noise level. On the right of Fig. 2 we show the results for VC–DRC, where we obtain an even higher AUC–ROC score than for DRC. We remark that no major improvements are seen for more than six blocks. From now on, throughout this section, we set the number of blocks to six (B = 6). The number of blocks and layers for each classifier is as follows: (1) the single-qubit DRC (B = 6); (2) the 2-qubit VC (with 6 layers, L = 6); (3) VC–DRC (B = 6, L = 1); (4) QNode (B = 6, L = 1); (5) FH: VC–DRC/NN (B = 6, L = 1); (6) FH: NN/VC–DRC (B = 6, L = 1). All models were trained for a maximum of 35 epochs, using the same optimizer and learning rate. The best result obtained during training is shown. On the left of Fig. 3 we compare all the previously mentioned classifiers. As we can see, VC–DRC outperforms both VC and DRC, while VC–DRC and QNode have almost identical performance. FH: NN/VC–DRC outperforms all classifiers, whilst FH: VC–DRC/NN performs slightly worse. On the right of Fig. 3 we show the prediction grids for all classifiers at different noise levels. For low noise levels (Noise/10 = 0), DRC and VC struggle to capture the prediction grid pattern, while VC–DRC and FH almost capture it. For medium noise levels (Noise/10 = 6), DRC tends to capture the noise (overfitting), while VC looks more stable. VC–DRC still captures the main pattern but also shows signs of overfitting. FH performs very well thanks to the classical preprocessing and the power brought by VC–DRC.
For high noise levels (Noise/10 = 12), FH captures the pattern and shows robustness to the noise, while the rest of the classifiers capture the noise. To demonstrate that FULL HYBRID does not perform well only because of the strong classical NN attached to the quantum circuit, we benchmark FH against just the classical part (NN) and against just the quantum part (QNode). From Fig. 4 onwards we show results for two NNs, one trained for 35 epochs (the same number of training epochs as in the FH) and one trained for 3000 epochs, to see the best outcome this NN can produce. We conclude that the FH outperforms both its components (NN and QNode), which shows that the FH is a more powerful classifier than its isolated parts.

To test the FH classifier even further, we benchmark its performance against a large number of classical counterparts, which are specified in the inset of Fig. 5. Interestingly, this figure shows that in the high-noise region the quantum classifier appears to outperform some classical ones, while performing at least equally well in all noise regions. We also see that the other classical approaches (QDA, decision tree, KNN and random forest), which are well suited for non-convex classification problems, show good performance in all noise regimes. In Fig. 6 we show results for a more complicated non-convex classification problem versus noise. In the table on the right we summarize the highest AUC–ROC scores for the respective classifiers. In the left figure we show prediction grids for the respective quantum classifiers. As in the previous case, VC is more stable to noise, while DRC tends to overfit and explores richer prediction grids. That is why VC–DRC, which combines both features, and more complex approaches such as FH give very good results, as is apparent from row number 6. Surprisingly, for this particular dataset FH: NN/VC–DRC fails to capture the pattern of the dataset, while FH: VC–DRC/NN captures the pattern and has the highest AUC–ROC score. It should be noted that the FH models again outperform both their components (NN and QNode).

## Discussion

In this paper, we applied Quantum Machine Learning frameworks to improve binary classification models for noisy datasets, which are prevalent in financial markets. The metric used for assessing the performance of our quantum classifiers is the area under the receiver operating characteristic curve (AUC–ROC). By combining approaches such as hybrid neural networks, parametric circuits and data re-uploading, we created a new approach called Full Hybrid (FH). We tested our models on the classification of 2- and 3-dimensional non-convex datasets and benchmarked them against each other, as well as against the best-known classical machine learning counterparts, by running simulations on quantum simulators. Then, by introducing asymmetrical Gaussian noise in the input datasets, we studied the resilience of our different approaches to noise. This kind of study sheds light on how the learning efficacy depends on the amount of noise in the dataset. Within the scope of the manuscript we also performed systematic hyperparameter tuning by studying how the AUC–ROC score changes with the number of repeating units in the data re-uploading approach, the number of qubits, the batch size, the number of epochs and the number of strongly entangling units.

An extensive benchmarking of our new QML approach against existing quantum and classical classifier models reveals that our novel FH models exhibit better learning properties in the presence of asymmetric Gaussian noise in the dataset than known quantum classifiers, and perform equally well as, or possibly better than, existing classical counterparts. Further understanding of the merits of the FH classifier has been gained through a detailed analysis and comparison of the prediction grids for the VC, DRC, VC–DRC and QNode binary classifiers. We observed that for low noise levels, DRC and VC struggle to capture the prediction grid pattern, while VC–DRC and FH almost fully capture it. For medium noise levels, DRC tends to capture the noise (overfitting), while VC looks more stable. VC–DRC still captures the main pattern but also shows signs of overfitting. FH performs very well thanks to the classical preprocessing and the power brought by VC–DRC. For high noise levels, FH captures the pattern and shows robustness to noise, while the rest of the classifiers capture the noise in the dataset.

It is a well-known fact that one of the bottlenecks for VQAs is the phenomenon called “barren plateau”49. As demonstrated in Ref.49, the cost function of a given spin-spin interacting Hamiltonian may exhibit a barren plateau, associated with an exponentially vanishing variance of its first derivative as the number of qubits increases. Moreover, VQE-based algorithms perform a classical–quantum feedback loop to update the parameters of the parametric quantum circuits. Because the quantum device has to wait until the classical computer has calculated its output, this feedback loop limits the efficiency of the quantum device and slows the execution of the algorithm on current cloud computing frameworks. For future studies, it would therefore be interesting to implement non-VQA algorithms for building more efficient quantum classifiers. Most of the obstacles faced by VQE, such as the barren plateau issue49, the lack of a systematic method for selecting the ansatz and the innate necessity of controlled unitaries, have recently been tackled by the proposal of a quantum assisted simulator (QAS)50,51. Remarkably, the QAS algorithm does not require any classical–quantum feedback loop, can be parallelized, alleviates the barren plateau problem by prescribing a systematic approach to constructing the ansatz, and is not based on the use of complicated unitaries.

For future studies, one has to keep in mind that sensitivity to errors and noise in qubits and quantum gates are the two most prominent obstacles on the way to scalable universal quantum computers. Given that, it would be worthwhile to study how our results are affected if one implements noise models for realistic quantum backends. In general, a noisy quantum system is described by an open-system model, and within the Born-Markov approximation the system’s dynamics is governed by the Lindblad master equation for the system’s density matrix52. Another approach to describing the different noise channels is based on Kraus operators, which represent the most general physical operations acting on density matrices13.
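For reference, the two descriptions mentioned above take the following standard textbook forms. The symbols are introduced here only for illustration: $$\rho$$ is the system’s density matrix, $$H$$ its Hamiltonian, $$L_{k}$$ are jump operators with rates $$\gamma _{k}$$, and $$K_{i}$$ are the Kraus operators of a noise channel.

$$\frac{d\rho }{dt}=-\frac{i}{\hbar }[H,\rho ]+\sum _{k}\gamma _{k}\left( L_{k}\rho L_{k}^{\dagger }-\frac{1}{2}\left\{ L_{k}^{\dagger }L_{k},\rho \right\} \right)$$

$$\rho \mapsto \sum _{i}K_{i}\rho K_{i}^{\dagger },\qquad \sum _{i}K_{i}^{\dagger }K_{i}=\mathbb {1}$$

The first line is the Lindblad master equation and the second is the Kraus (operator-sum) representation of a trace-preserving channel. Which jump or Kraus operators best model a given realistic backend is left open here.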

It has been shown that sensitivity to input errors, such as a lack of adversarial robustness, is a severe problem in quantum classifiers53,54. The robustness of our FULL HYBRID architecture is a topic for future investigation. However, as demonstrated in Ref.54, practical quantum classification tasks classify a subset of encoded states obtained with some commonly used qubit encoding scheme (which is indeed the case here, since in the current article we have used angle embedding for the data encoding). For such tasks, the authors have shown that one can use the concentration of measure phenomenon to derive the robustness of any quantum classifier in situations where the distribution of states to be classified can be smoothly generated from a Gaussian latent space.

Since most of our code is based on PennyLane, it is instructive to mention that PennyLane offers three different ways of implementing noise in quantum circuits: classical parametric randomness, PennyLane’s built-in default.mixed device, and plugins for other platforms. Quantum circuits may be run on a variety of backends, some of which have their own associated programming languages and simulators. PennyLane interfaces with these via plugins, such as those for Cirq and Qiskit.

Finally, it is also worth mentioning that we plan to test our classifiers on real-world financial data. Here we hope to demonstrate that our proposed classifiers have the potential to improve credit scoring accuracy. Credit scoring gives lenders and counterparties better transparency about the credit risk they are taking when dealing with a counterparty. For large companies, this transparency is provided by public credit ratings. Small and medium-sized enterprises (SMEs) are not covered by rating agencies and suffer from reduced availability of credit. These datasets, along with the best classical neural networks, will be provided by the company Tradeteq (a value-added service provider to the Networked Trading Platform (NTP) of Singapore).

In summary, we have demonstrated that the FH architecture outperforms several previously known quantum classifiers, along with some of the best-known classical counterparts. Interestingly, in the FH: VC–DRC/NN case, the power of the approach comes from the fact that the VC–DRC part acts as a quantum embedding.

## Methods

### Review of existing QML frameworks

In this section we briefly review the three QML architectures that serve as building blocks for our software package: hybrid neural networks23,28,29, variational circuits2,47 and data re-uploading24,25.

#### Hybrid classical–quantum classifier (Hybrid)

Recent findings in Refs.55,56 on applying a hybrid quantum-based memory-centric and heterogeneous multiprocessing architecture have revealed the practical advantage of hybrid algorithms over standard classical algorithms in both computational speed and quality of the solution. These findings provide a strong motivation for studying hybrid classical–quantum architectures for obtaining practical advantage.

Hybrid neural networks are formed by concatenating classical and quantum neural networks and can bring a great advantage by having a number of features in the initial classical layers that exceeds the number of qubits in the quantum layer. Normally we assume that in each layer we have one qubit for each feature and a sequence of one and two-qubit gates acting on it.

To create a quantum-classical neural network, a hidden layer is normally implemented utilising a parameterized quantum circuit (Fig. 7). By “parameterized quantum circuit”, we mean a quantum circuit where, for instance, the rotation angles for each gate are trainable parameters, specified by the components of a classical input vector. The outputs from our neural network’s previous layer will be collected and used as the inputs for our parameterized circuit. Normally measurement statistics at the end of the quantum circuit would be fed into the subsequent classical neural network layer. Notice that this kind of approach establishes a link between the classical and quantum neural networks. An important point to note is that a single qubit classifier generates no entanglement, and can therefore be simulated classically. If one hopes to achieve a quantum advantage using hybrid neural networks, one needs to introduce several qubits and consequently entangle them, harnessing that quantum resource.

#### Variational Quantum Algorithms (VQA)

Variational circuits are quantum circuits with learnable parameters that are optimised through classical learning subroutines. In spirit, this kind of approach is reminiscent of the Variational Quantum Eigensolver (VQE)2,47.

As schematically shown in Fig. 8, the first step towards developing a VQA is to define a cost or loss function C which encompasses the solution to the problem. After that, an ansatz is introduced through the quantum operation depending on a set of continuous or discrete parameters that can be optimized. This ansatz is then trained in a hybrid quantum-classical loop to solve the optimization task at hand

$$\begin{aligned} \theta ^{*}=\arg \min _{\theta } C(\theta ). \end{aligned} \qquad (1)$$

The trademark of VQAs is that a quantum computer is used to estimate the cost function $$C(\theta )$$, while the power of classical optimizers is harnessed for training the quantum parameters. A rather crucial assumption here is that one cannot efficiently compute the cost function on a classical computer, as this would imply an absence of quantum advantage in the VQA framework.
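The hybrid loop can be illustrated with a deliberately tiny example. In the sketch below, which is an assumption-laden toy and not the training code of our classifiers, the ansatz is a single rotation $$R_y(\theta )$$ acting on $$|0 \rangle$$ and the cost is $$C(\theta )=\langle Z\rangle =\cos \theta$$. The "quantum" evaluation is simulated classically, which is only possible because the circuit is trivial, and the gradient is obtained with the parameter-shift rule, a commonly used choice that is assumed here. The learning rate and step count are arbitrary.

```python
import math


def cost(theta):
    """<Z> after RY(theta)|0>; stands in for the quantum estimate of C(theta)."""
    amp0 = math.cos(theta / 2)   # amplitude of |0>
    amp1 = math.sin(theta / 2)   # amplitude of |1>
    return amp0 ** 2 - amp1 ** 2


def parameter_shift_grad(theta):
    """Gradient from two cost evaluations at shifted parameters."""
    return 0.5 * (cost(theta + math.pi / 2) - cost(theta - math.pi / 2))


def minimise(theta, lr=0.4, steps=100):
    """Classical optimizer: plain gradient descent on the circuit parameter."""
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(theta)
    return theta


theta_star = minimise(0.3)
print(round(theta_star, 6), round(cost(theta_star), 6))  # 3.141593 -1.0
```

Each optimisation step requires two cost evaluations on the quantum side followed by a classical update, which is the feedback loop referred to in Eq. (1).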

#### Data re-uploading classifier (DRC)

Data re-uploading is a subclass of quantum embedding that is realised by concatenating repeating units in a row. Single-qubit rotations applied several times along the circuit generate the non-linearity needed for engineering a functional neural network. Moreover, it has been demonstrated that a single qubit can act both as a universal quantum classifier24 and as a universal approximant25.

To load $$[x_{1},x_{2}]$$ into the qubit, we start from some initial state vector, $$|0 \rangle$$, apply the unitary operation $$U(x_{1},x_{2},0)$$ and end up at a new point on the Bloch sphere. Here we have padded the third argument with 0, since our data is only 2-dimensional. The authors of Ref.24 discuss how to load a higher-dimensional data point $$[x_{1},x_{2},x_{3},x_{4},x_{5},x_{6}]$$ by breaking it down into sets of three parameters, $$U(x_{1},x_{2},x_{3})$$ and $$U(x_{4},x_{5},x_{6})$$.

After the data loading stage, we want to have a trainable non-linear model, analogous to a deep neural network with a non-linear activation function, in which the weights of the model can be learned. Fig. 9 shows how data re-uploading is implemented as a sequence of B repeating units, which correspond to the layers of a classical neural network; consequently, one expects that increasing B gives a deeper neural network and hence better learning. Each unit is realised as a product of two unitaries, $$U(x_{1},x_{2},0)$$ and $$U(\theta _{1},\theta _{2},\theta _{3})$$, where the second unitary contains the trainable parameters. This approach can be boosted by introducing strongly entangling layers through the use of CNOT gates, as shown in Fig. 10.
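A single-qubit data re-uploading circuit can be simulated by hand with 2×2 matrices, as in the following sketch. The parameterisation of $$U$$ as a Z-Y-Z Euler rotation is one common convention and is assumed here, as is the order in which the data unitary and the trainable unitary are applied within each unit. The function names are hypothetical.

```python
import cmath
import math


def rot(phi, theta, omega):
    """General single-qubit rotation RZ(omega) RY(theta) RZ(phi) (assumed form of U)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [cmath.exp(-0.5j * (phi + omega)) * c, -cmath.exp(0.5j * (phi - omega)) * s],
        [cmath.exp(-0.5j * (phi - omega)) * s, cmath.exp(0.5j * (phi + omega)) * c],
    ]


def apply(gate, state):
    """Multiply a 2x2 gate with a 2-component state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def drc_probability(x, weights):
    """Probability of measuring |1> after B = len(weights) re-uploading units."""
    state = [1 + 0j, 0j]                                    # start in |0>
    for theta1, theta2, theta3 in weights:
        state = apply(rot(x[0], x[1], 0.0), state)          # data unitary U(x1, x2, 0)
        state = apply(rot(theta1, theta2, theta3), state)   # trainable unitary
    return abs(state[1]) ** 2


# Six units (B = 6) with arbitrary, untrained parameters.
weights = [(0.1 * b, 0.2, -0.3) for b in range(6)]
print(drc_probability((0.5, -0.8), weights))
```

The returned probability can be thresholded at 0.5 to obtain a class label. Training would adjust `weights` to minimise a loss over the dataset, which is omitted here.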

As mentioned in the previous section, one can also speculate that multiple qubits with entanglement between them could provide some quantum advantage over classical neural networks.

### FH:NN/VC–DRC and FH:VC–DRC/NN: Full hybrid neural networks enriched with variational and data-reuploading techniques

#### VC–DRC

A variational classifier (VC) circuit consists of a data embedding layer, which loads the classical data into the qubits, followed by entangling layers (CNOT gates that entangle each qubit with its neighbour); the measurement outcome is the expectation value of a Pauli observable for each qubit59. In our case we use an angle embedding with $$R_x$$ gates. To combine a VC circuit with the DRC technique, we define one block (B) as a sequence of a data embedding layer and entangling layers (L). By adding many blocks we re-introduce the input data into the model. In Fig. 11 we illustrate such a VC–DRC circuit for B = 2, L = 1. The trainable parameters are those of the rotation gates $$R_x$$ and R in the angle embedding and entangling layers of each block, respectively.
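To make the block structure concrete, the following plain-Python statevector sketch simulates a 2-qubit VC–DRC circuit. It is not the PennyLane code used for our models and rests on several assumptions: each entangling layer is taken to be a general rotation on every qubit followed by a single CNOT from qubit 0 to qubit 1, the rotation R is parameterised by Z-Y-Z Euler angles, and the readout is $$\langle Z\rangle$$ on qubit 0. All function names are hypothetical.

```python
import cmath
import math


def rx(angle):
    """Angle-embedding gate R_x."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -1j * s], [-1j * s, c]]


def rot(phi, theta, omega):
    """Trainable rotation RZ(omega) RY(theta) RZ(phi) (assumed form of R)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [cmath.exp(-0.5j * (phi + omega)) * c, -cmath.exp(0.5j * (phi - omega)) * s],
        [cmath.exp(-0.5j * (phi - omega)) * s, cmath.exp(0.5j * (phi + omega)) * c],
    ]


def apply_1q(gate, state, wire):
    """Apply a 2x2 gate to one wire of a 2-qubit state (index = 2*q0 + q1)."""
    shift = 1 - wire
    new_state = [0j] * 4
    for idx in range(4):
        row = (idx >> shift) & 1
        for col in (0, 1):
            src = (idx & ~(1 << shift)) | (col << shift)
            new_state[idx] += gate[row][col] * state[src]
    return new_state


def cnot_01(state):
    """CNOT with control qubit 0 and target qubit 1: swaps |10> and |11>."""
    return [state[0], state[1], state[3], state[2]]


def vc_drc_expval(x, weights):
    """<Z> on qubit 0; weights[b][l][q] = (phi, theta, omega) for block b, layer l, qubit q."""
    state = [1 + 0j, 0j, 0j, 0j]
    for block in weights:
        for wire in (0, 1):                       # re-upload the data in every block
            state = apply_1q(rx(x[wire]), state, wire)
        for layer in block:                       # L entangling layers per block
            for wire in (0, 1):
                state = apply_1q(rot(*layer[wire]), state, wire)
            state = cnot_01(state)
    probs = [abs(a) ** 2 for a in state]
    return probs[0] + probs[1] - probs[2] - probs[3]


# B = 2 blocks with L = 1 layer each, using arbitrary untrained parameters.
weights = [[[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]] for _ in range(2)]
print(vc_drc_expval((0.7, -0.2), weights))
```

The output lies in the interval [-1, 1] and can be passed to a classical decision layer. Increasing the length of `weights` adds blocks, so the same input is embedded again before each set of entangling layers.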

#### QNode

PennyLane is an open-source software framework for differentiable programming of quantum computers. All our models are built using this framework. In PennyLane, a QNode object represents a quantum node in the hybrid computational graph. A quantum function is used to create a QNode, which encapsulates the quantum function (corresponding to a variational circuit) and the device used to execute it.

Here we would like to clarify what we call a QNode in the scope of the current manuscript. As depicted in the first row of the Fig. 12, a QNode is a specific circuit where input data are passed to the quantum Node which consists of a VC–DRC and a final classical decision layer.

The input classical data is passed into the quantum circuit as rotation angles $$R_x$$ (“angle embedding”) on the Bloch sphere. After the computation on the quantum node is completed, measurement is performed and the outcome is passed to the classical decision layer which decides the final prediction label of the binary classifier.

#### FH:NN/VC–DRC and FH:VC–DRC/NN

In this section we propose two varieties of a new binary classifier architecture, which share the common name Full Hybrid (FH). On a basic level, FH consists of a VC–DRC combined with classical layers. We came up with two novel architectures for the FH circuits, depending on whether the VC–DRC circuit is at the end or at the beginning of the model (named FH: NN/VC–DRC and FH: VC–DRC/NN, respectively). These architectures are extensively studied in the current manuscript as novel candidates for performing binary classification on noisy datasets (see the second row of Fig. 12). Moreover, we demonstrate in detail in the “Results” section that the FH architectures outperform several previously known quantum classifiers and perform equally well compared to classical counterparts. We comment that, in the FH: VC–DRC/NN case, the power of the approach comes from the fact that the VC–DRC part can act as a quantum embedding, as evidenced by Refs.26,27. In this case the goal is to derive the angle embedding for which the separation of the data labels is maximized in the Hilbert space. The loss function we used is the mean absolute error (MAE), that is, the mean over the seen data of the absolute differences between true and predicted values. In what follows, we provide more technical details on the Full Hybrid architectures, with an emphasis on the Master, Feeding and Decision classical layers, which are depicted in the second row of Fig. 12.

For FH: NN/VC–DRC the first part is a classical neural network (NN), followed by the VC–DRC circuit and a final decision layer, which is just a single-neuron layer with a sigmoid activation function. We use a classical NN that is not fine-tuned for this specific classification task. Moreover, as can be seen in Fig. 12, the classical NN can contain an arbitrary number of layers, and each layer can contain an arbitrary number of neurons, but the last layer (Feeding classical layer) should always have the same number of neurons as the number of qubits. In our 2D case the classical NN consists of a 2-neuron layer with ReLU as the activation function (Master classical layer), followed by a 2-neuron layer with a Leaky ReLU activation function (Feeding classical layer).

For FH: VC–DRC/NN the first part is a VC–DRC circuit, followed by the previously described NN and the same final decision layer. We remark that we also tried sigmoid, tanh and general geometric functions, and the best-performing activation functions were selected.
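The data flow through the classical layers of FH: NN/VC–DRC, together with the MAE loss, can be sketched in a few lines of plain Python. This is an illustrative forward pass only: the weights are arbitrary rather than trained, the Leaky ReLU slope of 0.01 is an assumed default, and the quantum circuit is passed in as a callable. The `stand_in_quantum_layer` used in the demo is a hypothetical placeholder that returns $$\cos$$ of each angle, which equals $$\langle Z\rangle$$ for a single unentangled qubit after $$R_x$$ and is not the VC–DRC circuit itself.

```python
import math


def relu(v):
    return max(0.0, v)


def leaky_relu(v, slope=0.01):   # slope value is an assumption
    return v if v > 0 else slope * v


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one row of weights per output neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def fh_nn_vcdrc(x, params, quantum_layer):
    """Master layer -> Feeding layer -> quantum circuit -> Decision layer."""
    hidden = dense(x, params["master_w"], params["master_b"], relu)
    angles = dense(hidden, params["feeding_w"], params["feeding_b"], leaky_relu)
    q_out = quantum_layer(angles)           # one value per qubit
    return dense(q_out, params["decision_w"], params["decision_b"], sigmoid)[0]


def mae(y_true, y_pred):
    """Mean absolute error over the seen data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def stand_in_quantum_layer(angles):
    """Hypothetical placeholder for the VC-DRC circuit (not the real circuit)."""
    return [math.cos(a) for a in angles]


params = {
    "master_w": [[1.0, 0.0], [0.0, 1.0]], "master_b": [0.0, 0.0],
    "feeding_w": [[1.0, 0.0], [0.0, 1.0]], "feeding_b": [0.0, 0.0],
    "decision_w": [[1.0, 1.0]], "decision_b": [0.0],
}
prediction = fh_nn_vcdrc([0.5, -0.5], params, stand_in_quantum_layer)
print(prediction, mae([1, 0, 1], [0.8, 0.1, 0.4]))
```

For FH: VC–DRC/NN the same building blocks would be composed in the opposite order, with the quantum circuit receiving the raw input and its outputs feeding the Master layer.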