Can Quantum Computers Learn Like Classical Computers? A Co-Design Framework for Machine Learning and Quantum Circuits

Despite the pursuit of quantum supremacy in various applications, the power of quantum computers in machine learning (such as neural network models) has mostly remained unknown, primarily due to a missing link that effectively designs a neural network model suitable for quantum circuit implementation. In this article, we present the first co-design framework, namely QuantumFlow, to fix this missing link. QuantumFlow consists of a novel quantum-friendly neural network (QF-Net) design, an automatic tool (QF-Map) to generate the quantum circuit (QF-Circ) for QF-Net, and a theory-based execution engine (QF-FB) to efficiently support the training of QF-Net on a classical computer. We discover that, in order to make full use of the strength of quantum representation, data in QF-Net are best modeled as random variables rather than real numbers. Moreover, instead of using classical batch normalization (which is key to achieving high accuracy in deep neural networks), a quantum-aware batch normalization method is proposed for QF-Net. Evaluation results show that QF-Net can achieve 97.01% accuracy in distinguishing digits 3 and 6 in the widely used MNIST dataset, which is 14.55% higher than the state-of-the-art quantum-aware implementation. A case study on a binary classification application is conducted: running on the IBM Quantum processor's "ibmq_essex" backend, a neural network designed by QuantumFlow achieves 82% accuracy. To the best of our knowledge, QuantumFlow is the first framework that co-designs both the machine learning model and its quantum circuit.


Introduction
In the past decade, deep neural networks 1-3 have become the mainstream machine learning models and have achieved consistent success in numerous artificial intelligence applications, such as image classification [4][5][6][7], object detection [8][9][10][11], and natural language processing [12][13][14]. The key factor is the significantly improved prediction accuracy obtained by making networks deeper, known as deep learning; however, with the growing depth of neural networks, the storage and computation requirements increase sharply 15, which gradually becomes the performance bottleneck on classical computers (e.g., the well-known memory wall issue 16). Among all computing platforms, the quantum computer is one of the most promising to address such challenges 17,18 and act as a quantum accelerator for deep neural networks [19][20][21]. Unlike a classical computer, whose N digital bits represent one N-bit number at a time, a quantum computer with M qbits can represent 2^M M-bit numbers and manipulate them at the same time 22. Recently, a machine learning programming framework, TensorFlow Quantum, has been proposed for quantum computers 23; however, how to exploit the power of quantum computing in deep learning still remains unknown.
One of the most challenging obstacles to implementing deep learning algorithms on a quantum computer is the missing link between the designs of neural networks and the corresponding quantum circuits. Existing works design neural networks and quantum circuits separately, from two directions. The first direction is to map existing neural networks designed for classical computers to quantum circuits; for instance, recent works [24][25][26][27] map McCulloch-Pitts (MCP) neurons 28 onto quantum circuits. Such an approach can make full use of traditional innovations in machine learning (e.g., stochastic gradient descent in model training), but has difficulty consistently mapping the trained model to quantum circuits. For example, it needs a large number of qbits to realize the multiplication of real numbers. To overcome this problem, some existing works [24][25][26][27] assume a binary representation (i.e., "-1" and "+1") of activations, which cannot adequately represent data as seen in modern machine learning applications. For instance, in computer-vision-related applications, image data are commonly represented as real numbers. In addition, some typical operations in machine learning algorithms cannot be implemented on quantum circuits, leading to inconsistency. For example, batch normalization is a key step in a deep neural network to improve training speed, model performance, and stability; however, directly conducting normalization on an output qbit (say, normalizing the qbit with maximum probability to a probability of 100%) is equivalent to resetting a qbit without measurement, which is simply impossible. Consequently, batch normalization is not applied in the existing multi-layer network implementation 25.
Figure 1. QuantumFlow, an end-to-end co-design framework, provides the missing link between machine learning and quantum circuit designs. It consists of: (a) four sub-components, QF-Net, QF-FB, QF-Circ, and QF-Map, that work collaboratively to design neural networks and their quantum implementations; (b) a representation of a data sample (e.g., from the MNIST dataset) using random variables following a two-point distribution; (c) a quantum-friendly neural network with batch normalization.
The other direction is to design neural networks dedicated to quantum computers, like the tree tensor network (TTN) 29,30. Such an approach has the potential to exploit quantum advantages but suffers from scalability problems. More specifically, it lacks an efficient forward/backward propagation procedure on classical computers, causing a 2-layer neural network to take hundreds of CPU days to train. As such, training a larger network is simply intolerable, which limits the scale of networks. The effectiveness of machine learning algorithms relies on a model trained via forward and backward propagation on large training sets. However, it is too costly to directly train a network by applying thousands of rounds of forward and backward propagation on quantum computers; in particular, there are few quantum computers available for public access at the current stage. An alternative is to run a quantum simulator on a classical computer to train models for quantum circuits, but the time complexity of quantum simulation is O(2^m), where m is the number of qbits. This significantly restricts the trainable network size for quantum circuits.
To address all the above obstacles, we claim that quantum circuit implementation must be taken into consideration when designing neural networks. This paper proposes the first co-design framework, namely QuantumFlow, in which four sub-components (QF-Net, QF-FB, QF-Circ, and QF-Map) work collaboratively to design a neural network and implement it on a quantum computer, as shown in Figure 1(a).
QF-Net is a novel quantum-friendly neural network. In designing QF-Net, we discover that to take full advantage of the quantum representation, the data in a neural network, instead of being treated as real numbers, are best modeled as random variables following a two-point distribution, as shown in Figure 1(b). Neural computation (NC), one key operation in QF-Net, is designed based on such converted random variables. QF-Net also integrates a quantum-friendly batch normalization (BN), as shown in Figure 1(c), which includes additional parameters to normalize the output of a neuron; these parameters are tuned during the training phase. To support both the inference and training of QF-Net, we further develop QF-FB, a forward/backward propagation engine for QF-Net. When QF-FB is integrated into PyTorch to conduct inference and training of QF-Net on classical computers, we denote it as QF-FB(C); its design is based on probability theory, and it is efficient for both inference and training. QF-FB can also be executed on a quantum computer or a quantum simulator. Based on the Qiskit Aer simulator, we implement QF-FB(Q) for inference with or without error models.
For each operation in QF-Net (e.g., neural computation and batch normalization), a corresponding quantum circuit is designed in QF-Circ. In neural computation, an encoder is involved to encode the inputs and weights. The output is sent to batch normalization, which involves additional control qbits to adjust the probability of a given qbit to range from 0 to 1. Based on QF-Net and QF-Circ, QF-Map is an automatic tool that conducts (1) network-to-circuit mapping (from QF-Net to QF-Circ) and (2) virtual-to-physical mapping (from virtual qbits in QF-Circ to physical qbits in quantum processors). Network-to-circuit mapping guarantees consistency between QF-Net and QF-Circ with or without internal measurement, while virtual-to-physical mapping is based on Qiskit and takes error rates into consideration.
As a whole, given a dataset, QuantumFlow can design and train a quantum-friendly machine learning model and automatically generate the corresponding quantum circuit. The proposed co-design framework is evaluated on the IBM Qiskit Aer simulator and IBM Quantum processors.

Results
This section presents the evaluation results of all four sub-components in QuantumFlow. We first evaluate the effectiveness of QF-Net on the commonly used MNIST dataset 31 for the classification task. Then, we show the consistency between QF-FB(C) on classical computers and QF-FB(Q) on the Qiskit Aer simulator. We finally conduct an end-to-end case study on a binary classification test case on IBM quantum processors to test QF-Circ and QF-Map. Figure 2 reports the results of different approaches for the classification of handwritten digits on MNIST. The results clearly show that, with the same network structure (i.e., the same number of layers and the same number of neurons in each layer), the proposed QF-Net achieves higher accuracy than the existing models: (i) a multi-layer perceptron (MLP) with binary weights for the classical computer, denoted as MLP(C); (ii) an MLP with binary inputs and weights designed for the classical computer, denoted as binMLP(C); and (iii) a state-of-the-art quantum-aware neural network with binary inputs and weights 25, denoted as FFNN(Q).

QF-Net Achieves High Accuracy on MNIST
In the experiments, for each network we have two implementations: one with batch normalization (w/ BN) and one without (w/o BN). Kindly note that FFNN 25 does not consider batch normalization between layers; to show the benefits and generality of our newly proposed BN in improving quantum circuits' accuracy, we add the same functionality to FFNN for comparison. From the results, we can see that the proposed "QF-Net w/ BN" (abbr. QF-Net_BN) achieves the highest accuracy among all networks (even higher than MLP running on classical computers). Specifically, for the dataset {3, 6}, the accuracy of QF-Net_BN is 97.01%, which is 1.84% and 14.55% higher than MLP(C) and FFNN(Q), respectively. Similar improvements are achieved by QF-Net_BN on the dataset {3, 8}. Because of the similarity of digits 3 and 8, QF-Net_BN only achieves an accuracy of 86.95%, but this is still the best among all networks. The above results validate that the proposed QF-Net has great potential in solving machine learning problems and that our co-design framework is effective in designing quantum networks with high accuracy.
Furthermore, we make an interesting observation about our proposed batch normalization (BN). For all test cases, BN helps to improve the accuracy of QF-Net, and the most significant improvement is observed for dataset {1, 3, 6}, from less than 60% to 87.08%. Interestingly, BN also significantly improves MLP(C) accuracy for dataset {1, 3, 6}. This shows the importance of batch normalization in improving model performance and confirms that the proposed BN is useful for quantum neural networks.

QF-FB(C) and QF-FB(Q) are Consistent
Next, we evaluate the results of QF-FB(C) for QF-Net on classical computers and those of QF-FB(Q) for the quantum circuit QF-Circ built upon QF-Net via QF-Map. Table 1 reports the comparison in accuracy and elapsed time, where the results under column QF-FB(C) serve as the golden results.
Because the Qiskit Aer simulator (whose backend is "ibmq_qasm_simulator") used in QF-FB(Q) can support at most 32 qbits, we have to measure the results after each neuron. Specifically, in the first hidden layer, a neuron needs 23 qbits (16 input qbits, 4 encoding qbits, and 3 auxiliary qbits) for neural computation, 4 qbits for batch normalization, and 1 output qbit; as a result, it requires 28 qbits in total. Adding a second neuron requires 8 additional qbits for encoding and batch normalization, which exceeds the 32-qbit limit for implementing two neurons in one circuit. Consequently, we simulate each neuron in QF-Net separately and forward the outputs of neurons to the next layer. The number of qbits used for each hidden layer ("L1" and "L2") is reported in column "Qbits", where the numbers in parentheses indicate the number of neurons in a hidden layer. Column "Accuracy" in Table 1 reports the accuracy comparison. We can see that there is a small difference between QF-FB(C) and QF-FB(Q): the results obtained by QF-FB(C) have slightly higher accuracy than those of QF-FB(Q). This is because the Qiskit Aer simulation used in QF-FB(Q) is based on the Monte Carlo method, and the output probabilities of different neurons may be quite close in some cases, leading to deviations. Another potential cause is that the trained model is based on QF-FB(C); as will be shown later, it is not practical to employ Qiskit Aer for training due to the large elapsed time. The above results demonstrate that QF-Net can be consistently implemented on classical and quantum computers.
Table 1 also reports the elapsed time for datasets {3,6}, {3,8}, and {1,3,6}, respectively. As we can see from the table, QF-FB(Q) takes over 2,500 hours to classify 2 digits and 14,000 hours to classify 3 digits, while QF-FB(C) takes less than 16 seconds for all datasets. The speedup is more than six orders of magnitude (i.e., 10^6×). This verifies that QF-FB can provide an efficient forward propagation procedure to support the lengthy training of QF-Net.
In Figure 3, we further verify the accuracy of QF-FB by comparing design 4 from Figure 4(d) on the IBM quantum processor with the "ibm_armonk" backend. Kindly note that different backends are selected by QF-Map. For QF-FB(Q), we have two configurations: (1) QF-FB(Q)-ideal, assuming perfect qbits; and (2) QF-FB(Q)-noise, with error models derived from "ibm_armonk". We launch the simulation or execution for each approach 10 times, each of which is represented by a dot in Figure 3. We observe that the results of QF-FB(Q)-ideal are distributed around those generated by QF-FB(C) within a 1% deviation, while QF-FB(Q)-noise obtains results similar to those on the IBM quantum processor. These results verify that QF-Net on the classical computer achieves results consistent with those of QF-Circ deployed on a quantum computer with perfect qbits.

QF-Circ and QF-Map on IBM Quantum Processor
This subsection further evaluates the efficacy of QuantumFlow on IBM Quantum processors. We first show the importance of quantum circuit optimization in QF-Circ to minimize the number of required qbits. Based on the optimized circuit design, we then deploy a 2-input binary classifier on IBM quantum processors. Figure 4 demonstrates the optimization of a 2-input neuron step by step. All quantum circuits in Figures 4(a)-(d) achieve the same functionality but with different numbers of required qbits. The equivalence of all designs is demonstrated in the Supplementary Information. Design 1 in Figure 4(a) is directly derived from the design methodology presented in the Methods section. To optimize the circuit to use fewer qbits, we first convert it to the circuit in Figure 4(b), denoted as design 2. Since there is only one controlled-Z gate from qbit I0 to qbit E/O, we can merge these two qbits and obtain an optimized design in Figure 4(c) with 2 qbits, denoted as design 3. The circuit can be further optimized to use 1 qbit, as shown in Figure 4(d), denoted as design 4. The function f in design 4 is defined as f(x, y) = 2 × arcsin(√(x + y − 2·x·y)), where x = sin²(α/2) and y = sin²(β/2) represent the input probabilities. To compare these designs, we deploy them onto IBM Quantum processors, where the "ibm_velencia" backend is selected by QF-Map. In the experiments, we use the results from QF-FB(C) as the golden results. Figure 4(e) reports the deviations of all designs against the golden results. The results clearly show that design 4 is more robust than the others because it uses fewer qbits. Specifically, the deviation of design 4 from the golden results is always less than 5%, while it reaches up to 13% for design 1. In the following experiments, design 4 is applied in QF-Circ.
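As a sanity check of design 4's single-qbit formulation, the following sketch (a minimal Python check, assuming only Formula 1 quoted above) computes the rotation angle f and confirms that sin²(f/2) recovers the combined input probability x + y − 2xy; for the inputs x = 0.2, y = 0.6 used later in the case study, f ≈ 1.691, matching the Quirk example referenced in the text.

```python
import math

def f(x, y):
    # Formula 1: design 4 folds the 2-input neuron into one Y-rotation angle,
    # so that sin^2(f/2) equals the combined probability x + y - 2*x*y.
    return 2 * math.asin(math.sqrt(x + y - 2 * x * y))

angle = f(0.2, 0.6)
print(round(angle, 3))                     # 1.691
print(round(math.sin(angle / 2) ** 2, 2))  # 0.56, the recovered probability
```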
Next, we introduce the case study on an end-to-end binary classification problem, as shown in Figure 5. In this case study, we train QF-Net using QF-FB(C). Then, the tuned parameters are applied to generate QF-Circ. Finally, QF-Map optimizes the deployment of QF-Circ to the IBM quantum processor, selecting the "ibmq_essex" backend.
The classification problem is illustrated in Figure 5(a): a binary classification problem (two classes) with two inputs, x and y. For instance, if x = 0.2 and y = 0.6, the input belongs to class 0. QF-Net, QF-Circ, and QF-Map are demonstrated in Figures 5(b)-(d). First, Figure 5(b) shows that QF-Net consists of one hidden layer with one 2-input neuron and batch normalization. The output is the probability p0 of class 0. Specifically, an input is recognized as class 0 if p0 ≥ 0.5; otherwise, it is identified as class 1.
Figure 5 (caption, continued): (e) QF-FB(C) achieves 100% accuracy; (f) QF-FB(Q) achieves 98% accuracy, where the 2 marked error cases have probability deviations within 0.6%; (g) results on "ibmq_essex" using the default mapping, achieving 68% accuracy; (h) results obtained on "ibmq_essex" with the mapping in (d), achieving 82% accuracy. The number of shots in all tests is set to 8,192.
The quantum circuit QF-Circ for the above QF-Net is shown in Figure 5(c). The circuit is composed of three parts: (1) neural computation, (2) batch_adj in batch normalization, and (3) indiv_adj in batch normalization. The neural computation is based on design 4, as shown in Figure 4(d). The parameter of the Y gate in neural computation at qbit q0 is determined by the inputs x and y; specifically, f(x, y) = 2 · arcsin(√(x + y − 2·x·y)), as shown in Formula 1. Then, batch normalization is implemented in two steps, where qbits q2 and q4 are initialized according to the trained BN parameters. During the process, q1 holds the intermediate results after batch_adj, and q3 holds the final results after indiv_adj. Finally, we measure the output on qbit q3. (A Quirk-based example with inputs 0.2 and 0.6, leading to f(x, y) = 1.6910, can be accessed at https://wjiang.nd.edu/quirk_0_2_0_6.html, accessible as of 06-19-2020. The output probability of 60.3% is larger than 50%, implying the inputs belong to class 0.) After building QF-Circ, the next step is to map qbits from the designed circuit to the physical qbits on the quantum processor, and this is achieved through QF-Map. In this experiment, QF-Map selects "ibm_essex" as the backend; its physical properties are shown in Figure 5(d), where the error rates of each qbit and each connection are illustrated by different colors. By following the rules defined by QF-Map (see the Methods section), we obtain the physically mapped QF-Circ shown in Figure 5(d). For example, the input q0 is mapped to the physical qbit labeled 4.
After QuantumFlow goes through all the steps from input data to the physical quantum processor, we can perform inference on the quantum computer. In these experiments, we test 100 combinations of inputs, from (x, y) = (0.1, 0.1) to (x, y) = (1.0, 1.0). First, we obtain the results using QF-FB(C) as the golden results and QF-FB(Q) as the quantum simulation assuming perfect qbits, reported in Figures 5(e) and (f); they achieve 100% and 98% prediction accuracy, respectively. The results verify the correctness of the proposed QF-Net. Second, the results obtained on quantum processors are shown in Figure 5(h), achieving 82% prediction accuracy. For comparison, Figure 5(g) shows the results obtained using the default mapping algorithm in IBM Qiskit, whose accuracy is only 68%. This demonstrates the value of QF-Map in improving the physically achievable accuracy on a quantum processor with errors.

Discussion
In summary, we propose an integrated QuantumFlow framework to co-design machine learning models and quantum circuits. A novel quantum-aware QF-Net is first designed. Then an accurate and efficient inference engine, QF-FB, is proposed to enable the training of QF-Net on classical computers. Based on QF-Net and the training results, a corresponding quantum circuit, QF-Circ, is automatically generated and optimized. Finally, QF-Map maps QF-Circ to a quantum processor with consideration of the error rates of qbits.
Neural computation is one key component in QuantumFlow for achieving state-of-the-art accuracy. We show that the existing quantum-aware neural network 25, which interprets inputs in binary form, degrades network accuracy, as shown in Figure 2. To address this problem, QF-Net models real-number inputs as random variables following a two-point distribution. Details are introduced in the next section.
Batch normalization is another key technique for improving accuracy, particularly as the network grows deeper. This can be seen from the results in Figure 2. One main reason is that data passing through the nonlinear function y² produces outputs that are significantly shrunken to a small range: around 0 for the real-number representation and around 1/m for the two-point distribution representation, where m is the number of inputs. Unlike the straightforward normalization on classical computers, it is non-trivial to normalize a set of qbits. Innovations are made in QuantumFlow to design a quantum-friendly batch normalization.
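The shrinking effect can be illustrated numerically. The Monte Carlo sketch below (a hypothetical setup for illustration, not the paper's code) samples the two-point variables for random inputs and binary weights and estimates E(y²), which concentrates near 1/m as m grows:

```python
import random

def sample_y_squared(probs, weights):
    # One realization of y^2, where x_i = -1 with probability p_i, else +1,
    # and y is the average of the weighted inputs w_i * x_i.
    m = len(probs)
    y = sum(w * (-1 if random.random() < p else 1)
            for p, w in zip(probs, weights)) / m
    return y * y

random.seed(1)
for m in (2, 8, 32):
    probs = [random.random() for _ in range(m)]
    weights = [random.choice((-1, 1)) for _ in range(m)]
    est = sum(sample_y_squared(probs, weights) for _ in range(20000)) / 20000
    print(m, round(est, 3))  # estimates shrink roughly like 1/m
```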
We have experimentally tested QuantumFlow on a 32-qbit Qiskit Aer simulator and a 5-qbit IBM quantum processor based on superconducting technology. We show that the proposed quantum-oriented machine learning model QF-Net obtains state-of-the-art accuracy on the MNIST dataset; it can even outperform a conventional model of similar scale on the classical computer. In the experiments on IBM quantum processors, we demonstrate that, even with the high error rates of current quantum processors, QF-Net can be applied to classification tasks with high accuracy.
To accelerate QF-FB on classical computers to support training, we make the assumption that perfect qbits are used. This enables us to apply theoretical formulations to accelerate the simulation process; however, it leads to some error in predicting the outputs of the corresponding deployment on a physical quantum processor with high error rates (such as current IBM quantum processors, with error rates in the range of 10^−2). We do not deem this a drawback of our approach; rather, it is an inherent problem of the current physical implementation of quantum processors. As error rates decrease with future quantum processors, the gap between what QF-Net predicts and what the quantum processor delivers will narrow. With innovations in reducing the error rates of physical qbits, QF-Net will achieve better results.

Methods
This section introduces QuantumFlow in detail. Neural computation and batch normalization are the two key components of a neural network, and we present the design and implementation of these two components in QF-Net, QF-FB, QF-Circ, and QF-Map, respectively. Figure 6 demonstrates the two fundamental components of the proposed quantum-friendly neural network QF-Net: (1) neural computation in Figure 6(a); and (2) batch normalization in Figure 6(c). We discuss the details of each component as follows.

QF-Net
An m-input neural computation component is illustrated in Figure 6(a), where m input data I_0, I_1, ..., I_{m−1} and m corresponding weights w_0, w_1, ..., w_{m−1} are given. Input data I_i is a real number ranging from 0 to 1, while weight w_i is a binary number in {−1, +1}. Neural computation in QF-Net is composed of 4 operations: i) R: this operation converts the real number p_k of input I_k to a two-point distributed random variable x_k, where P{x_k = −1} = p_k and P{x_k = +1} = 1 − p_k, as shown in Figure 6(b). For example, we treat input I_0's real value p_0 as the probability that x_0 takes the value −1, and q_0 = 1 − p_0 as the probability that it takes +1. ii) C: this operation calculates y as the average sum of weighted inputs, where a weighted input is the product of a converted input (say x_k) and its corresponding weight (i.e., w_k). Since x_k is a two-point random variable taking values −1 and +1, and the weights are binary values −1 and +1, if w_k = −1, then w_k · x_k swaps the probabilities P{x_k = −1} and P{x_k = +1} of x_k. iii) A: we adopt the quadratic function as the non-linear activation function in this work, and the A operation outputs y², where y is a random variable. iv) E: this operation converts the random variable y² to a real number in [0, 1] by taking its expectation. The result is passed to batch normalization and further used as the input to the next layer.
Batch normalization in QF-Net consists of two sub-components: batch adjustment ("batch_adj") and individual adjustment ("indiv_adj"). batch_adj is proposed to prevent the data from being continuously shrunken to a small range (as stated in the Discussion section). This is achieved by normalizing the probability mean of a batch of outputs to 0.5 in the training phase, as shown in Figure 7(c)-(d). In the inference phase, the output ẑ can be computed as follows (Formula 2): ẑ = z · sin²(θ/2) if t = 1, and ẑ = z + (1 − z) · sin²(θ/2) if t = 0. After batch_adj, the outputs of all neurons are normalized around 0.5. To increase the variety of different neurons' outputs for better classification, indiv_adj is proposed.
It contains a trainable parameter λ and a parameter γ (see Figure 7(e)). Since different neurons have different values of λ, the variation of the outputs can be controlled by λ. In the inference phase, its output z̃ can be calculated as follows (Formula 3): z̃ = ẑ · sin²(γ/2).
The determination of the parameters t, θ, and γ is conducted in the training phase, which will be introduced later in QF-FB.
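To make the four operations concrete, the sketch below (an illustrative reconstruction, not the authors' code) evaluates a QF-Net neuron exactly by enumerating all outcomes of the two-point variables; it also shows that a weight of −1 is equivalent to swapping an input's probabilities:

```python
from itertools import product

def qf_neuron(inputs, weights):
    # R: input p_k -> two-point variable x_k with P{x_k = -1} = p_k.
    # C: y = average of w_k * x_k.  A: y^2.  E: return the expectation.
    m = len(inputs)
    expectation = 0.0
    for signs in product((-1, 1), repeat=m):  # all 2^m joint outcomes
        prob = 1.0
        for p, s in zip(inputs, signs):
            prob *= p if s == -1 else (1 - p)
        y = sum(w * s for w, s in zip(weights, signs)) / m
        expectation += prob * y * y
    return expectation

print(round(qf_neuron([0.2, 0.6], [1, 1]), 2))  # 0.44
# w_k = -1 swaps P{x_k = -1} and P{x_k = +1} of input k:
print(abs(qf_neuron([0.2, 0.6], [1, -1]) - qf_neuron([0.2, 0.4], [1, 1])) < 1e-12)
```

This brute-force version costs O(2^m); the closed-form computation in QF-FB below reduces it to O(m).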

QF-FB
QF-FB involves both forward propagation and backward propagation. In forward propagation, all weights and parameters are determined, and we conduct neural computation and batch normalization layer by layer. The neural computation computes y = (1/m) · ∑_i (x_i × w_i) and y², where x_i is a two-point random variable, as shown in Figure 6(b). The distributions of y and y² are illustrated in Figure 7(a)-(b). It is straightforward to get the expectation of y² from the distribution; however, for m inputs, it involves 2^m terms (e.g., ∏ q_i is one term), leading to a time complexity of O(2^m). To reduce the time complexity, QF-FB takes advantage of the independence of the inputs to calculate the expectation as follows: E(y²) = (1/m²) · (E(∑_i [w_i x_i]²) + ∑_{i≠j} E(w_i x_i) · E(w_j x_j)) = (1/m²) · (m + ∑_{i≠j} E(w_i x_i) · E(w_j x_j)), where E(∑_i [w_i x_i]²) = m, since [w_i x_i]² = 1 and there are m inputs in total. The above formula yields the following algorithm to conduct neural computation efficiently.
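A minimal sketch of that O(m) algorithm (our reconstruction from the formula above, using E(w_i·x_i) = w_i·(1 − 2p_i) for a two-point variable and the identity ∑_{i≠j} a_i·a_j = s² − ∑_i a_i² with s = ∑_i a_i):

```python
def neural_computation(probs, weights):
    # E(y^2) in O(m): E(y^2) = (m + s^2 - sum_i a_i^2) / m^2,
    # where a_i = E(w_i * x_i) = w_i * (1 - 2 * p_i) and s = sum_i a_i.
    m = len(probs)
    a = [w * (1 - 2 * p) for p, w in zip(probs, weights)]
    s = sum(a)
    return (m + s * s - sum(ai * ai for ai in a)) / (m * m)

print(round(neural_computation([0.2, 0.6], [1, 1]), 2))  # 0.44
```

For uniform inputs p_i = 0.5, every a_i is 0 and the formula gives exactly 1/m, the shrinkage value quoted in the Discussion.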
The forward propagation of batch normalization can be efficiently implemented based on the output of the neural computation. For backward propagation, we need to determine the weights and parameters (e.g., θ in BN). The typical optimization method (e.g., stochastic gradient descent 32) is applied to determine the weights. In the following, we discuss the determination of the BN parameters t, θ, and γ.
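The BN forward pass can be sketched as follows (a reconstruction, assuming the inference formulas ẑ = z·sin²(θ/2) for t = 1, ẑ = z + (1 − z)·sin²(θ/2) for t = 0, and z̃ = ẑ·sin²(γ/2), consistent with the circuits in Figure 8):

```python
import math

def batch_adj(z, t, theta):
    # batch_adj: move the output probability toward the batch mean of 0.5.
    s = math.sin(theta / 2) ** 2
    return z * s if t == 1 else z + (1 - z) * s

def indiv_adj(z_hat, gamma):
    # indiv_adj: per-neuron adjustment controlled by the trained gamma.
    return z_hat * math.sin(gamma / 2) ** 2

def bn_forward(z, t, theta, gamma):
    return indiv_adj(batch_adj(z, t, theta), gamma)

# theta = gamma = pi makes both rotations the identity (sin^2(pi/2) = 1)
print(round(bn_forward(0.44, 1, math.pi, math.pi), 2))  # 0.44
```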
The batch_adj sub-component involves two parameters, t and θ. During the training phase, a batch of outputs is generated for each neuron. Details are demonstrated in Figure 7(c)-(d) with 6 outputs. In terms of the mean p_mean of the outputs in a batch, there are two possible cases: (1) p_mean ≤ 0.5 and (2) p_mean > 0.5. For the first case, t is set to 0 and θ = 2 × arcsin(√((0.5 − p_mean)/(1 − p_mean))), which can be derived from Formula 2 by setting ẑ to 0.5; similarly, for the second case, t is set to 1 and θ = 2 × arcsin(√(0.5/p_mean)). Kindly note that the training procedure is conducted over multiple iterations of batches. As with the method for batch normalization in conventional neural networks, we employ a moving average to record the parameters. Let x_i be the value of a parameter x (e.g., θ) at the i-th iteration, and x_cur be the value obtained in the current iteration; then x_i = (1 − m) × x_{i−1} + m × x_cur, where m is the momentum, set to 0.1 by default in the experiments.
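The two cases above can be sketched as follows (our reconstruction; the moving-average update follows the PyTorch batch-normalization convention with momentum m = 0.1):

```python
import math

def batch_adj_params(p_mean):
    # Choose (t, theta) so that batch_adj maps the batch mean p_mean to 0.5.
    if p_mean <= 0.5:
        return 0, 2 * math.asin(math.sqrt((0.5 - p_mean) / (1 - p_mean)))
    return 1, 2 * math.asin(math.sqrt(0.5 / p_mean))

def moving_average(prev, cur, momentum=0.1):
    # Running record of a BN parameter across training batches.
    return (1 - momentum) * prev + momentum * cur

# verify: applying batch_adj with the derived parameters yields 0.5
for p_mean in (0.3, 0.8):
    t, theta = batch_adj_params(p_mean)
    s = math.sin(theta / 2) ** 2
    z_hat = p_mean * s if t == 1 else p_mean + (1 - p_mean) * s
    print(t, round(z_hat, 6))
```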
In forward propagation, the sub-module indiv_adj is almost the same as batch_adj with t = 0; however, the determination of its parameter γ differs slightly from that of θ in batch_adj. As shown in Figure 7(e), the initial probability of ẑ after batch_adj is p_z. The basic idea of indiv_adj is to move ẑ by an angle γ. This is conducted in three steps: (1) we move the start point at p_z to point A with probability (p_z/n + 0.5) × λ, where n is the batch size and λ is a trainable variable; (2) we obtain γ by moving point A to p = 0.5; (3) we finally move the solution at p_z by the angle γ to obtain the final result. By replacing p_mean with (p_z/n + 0.5) × λ in batch_adj with t = 1, we can calculate γ. For each batch, we calculate the mean of γ, and we also employ the moving average to record γ.

QF-Circ
We now discuss the corresponding circuit designs for the components in QF-Net. The quantum circuit of the neural computation in Figure 6(a) is illustrated in Figure 8(a), while the designs for the different cases of batch normalization are illustrated in Figures 8(b)-(e). A detailed demonstration of the equivalence between QF-Circ and QF-Net can be found in the Supplementary Information.
The neural computation (NC) circuit in Figure 8(a) is composed of m input (I) qbits, k = log₂(m) encoding (E) qbits, and 1 output (O) qbit. In accordance with the operations of QF-Net, the NC circuit consists of three parts. In the first part, the NC circuit applies m Y gates with parameter θ = 2 × arcsin(√p_k) (see the Supplementary Information for details) to initialize each input qbit I_k according to the input real value p_k, such that the state of I_k is changed from |0⟩ to √(1 − p_k)|0⟩ + √(p_k)|1⟩. The initialized qbits represent the random variables in Figure 6(b), where P{I_k = |1⟩} = p_k and P{I_k = |0⟩} = q_k. The other qbits, including the encoding qbits E and the output qbit O, are initialized to |0⟩. The second part of the NC circuit computes the average sum of weighted inputs with a quadratic activation function, which can be further divided into four steps. In step 1, the NC circuit conducts the dot product of inputs and weights on qbits I_k and prepares the encoding qbits in superposition. For the input qbits, we place a W component to perform the multiplication: specifically, for qbit I_k, an X gate is placed if and only if w_k = −1, which swaps the amplitudes/probabilities of I_k on states |0⟩ and |1⟩. For all encoding qbits, we apply the Hadamard (H) gate to put them into superposition. In step 2, the NC circuit realizes the m−k encoder (e.g., m = 4 and k = 2 gives a 4−2 encoder). During the encoding, all terms in the distributions of y and y² (see Figure 7(a)-(b)) are generated. The intuition is that the k encoding qbits can have 2^k = m states, so we can encode m states into k qbits. Here, the input qbits control the sign flip of the amplitudes of the encoding qbits; controlled-Z gates with solid dots indicate control on |1⟩, while circles on the encoding qbits indicate control on |0⟩. For instance, the control gate from I_{m−1} flips the sign of state |00...0⟩ (all circles).
In step 3, the H gates on the encoding qbits collect the results in the states |I_0 I_1 ⋯ I_{m−1}⟩ ⊗ |00⋯0⟩. Finally, step 4 applies a controlled gate to extract the amplitudes of the states |I_0 I_1 ⋯ I_{m−1}⟩ ⊗ |00⋯0⟩ to qbit O.
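The quantity these steps prepare, E(y²), can be reproduced classically by brute-force enumeration. The sketch below is an illustrative model of the computation, not the circuit itself; it assumes each input is a Bernoulli random variable mapped to +1 (with probability p_k) or −1 (with probability q_k), and takes y as the average of the weighted inputs:

```python
import itertools
import numpy as np

def expected_y_squared(p, w):
    # Enumerate all 2^m input bit patterns; bit b_k = 1 maps input k
    # to +1 (probability p[k]) and b_k = 0 maps it to -1 (probability
    # 1 - p[k]). y is the average of the weighted inputs.
    m = len(p)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=m):
        prob = np.prod([p[k] if b else 1 - p[k] for k, b in enumerate(bits)])
        x = [1 if b else -1 for b in bits]
        y = sum(wk * xk for wk, xk in zip(w, x)) / m
        total += prob * y ** 2
    return total

# Four unbiased inputs with unit weights: E(y^2) = Var(y) = 1/m.
print(expected_y_squared([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1]))  # ~0.25
```

Under this model, E(y²) is exactly the probability of measuring |1⟩ on qbit O, which the enumeration makes easy to cross-check against a circuit simulator.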
After the second part of the NC circuit, the expectation information has been integrated into output qbit O, whose state is |O⟩ = √(1 − E(y²))|0⟩ + √(E(y²))|1⟩. The third part of the NC circuit measures qbit O to obtain the output real number E(y²). Note that for a multi-layer QF-Net, there is no need to perform a measurement at the layer interfaces, because the converting operation R will initialize a qbit to exactly the state |O⟩. In addition, the batch normalization circuit presented below also takes |O⟩ as input. Now, we discuss the implementation of batch normalization in quantum circuits. In these circuits, three qbits are involved: (1) qbit I for the input, which can be the output qbit O of the NC circuit without measurement, or be initialized using a Y gate according to the measurement of qbit O in the NC circuit; (2) qbit P, which conveys the parameter obtained via the training procedure (see details in QF-FB); (3) output qbit O, which can be directly used by the next layer or be measured to convert to a real number. Figures 8(b)-(c) show the circuit designs for the two cases of batch_adj. Since the parameters of batch_adj are determined in the inference phase, if t = 0 we adopt the circuit in Figure 8(b); otherwise, we adopt that in Figure 8(c). Then, Figure 8(d) shows the circuit for indiv_adj. We can see that the circuits in Figures 8(c) and (d) are the same except for the initialization of the parameters θ and γ. For circuit optimization, we can merge the above two circuits into one by changing the input parameter to g(θ, γ), as shown in Figure 8(e). In this circuit, z̄ = z × sin²(g(θ, γ)/2), while applying the circuits in Figures 8(c) and (d) in cascade yields z̄ = z × sin²(θ/2) × sin²(γ/2). To guarantee consistent function, we can derive that g(θ, γ) = 2 × arcsin(sin(θ/2) × sin(γ/2)).
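The merged batch-normalization angle can be checked numerically. The short snippet below verifies that a single rotation by g(θ, γ) = 2 × arcsin(sin(θ/2) × sin(γ/2)) scales z by the same factor as cascading the two separate rotations:

```python
import numpy as np

def g(theta, gamma):
    # Merged angle so that one rotation reproduces the two cascaded ones.
    return 2 * np.arcsin(np.sin(theta / 2) * np.sin(gamma / 2))

theta, gamma = 1.1, 0.7
merged = np.sin(g(theta, gamma) / 2) ** 2          # factor from Figure 8(e)
cascaded = np.sin(theta / 2) ** 2 * np.sin(gamma / 2) ** 2  # 8(c) then 8(d)
print(np.isclose(merged, cascaded))  # True
```

The identity holds because sin(g/2) = sin(θ/2) × sin(γ/2) by construction, and the product of sines stays within arcsin's domain [−1, 1].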

QF-Map
QF-Map is an automatic tool that maps QF-Net to a quantum processor in two steps: network-to-circuit mapping and virtual-to-physical mapping. Figure 9 illustrates an example of mapping from QF-Net to QF-Circ. The QF-Net in Figure 9(a) is a neural network with 2 hidden layers, designed for classical computers based on the neural computation (NC) and batch normalization (BN) subcomponents in Figure 6. In QF-Circ, we notice that the results are stored in state |1⟩ of the output qbits for both NC and BN, while the initialization operation R (using a Y gate) encodes the previous results into state |1⟩. As a result, the NC and BN circuits in QF-Circ can directly take the output qbit of the previous circuit as input, without measurement. Accordingly, Figure 9(b) shows a network corresponding to QF-Circ, where the internal data type conversions in Figure 9(a) are removed. Alternatively, BN can still take a given real number (e.g., a measured result from the previous circuit) as input, but it would then need a Y gate to initialize qbit I. The network-to-circuit mapping from the network in Figure 9(b) to the quantum circuit QF-Circ is demonstrated in Figure 9(c). In QF-Circ, the outputs of the first hidden layer are independent, and therefore the 2 NC components in the first hidden layer are based on independent inputs, initialized with the same Y gates. There is no such independence requirement for the outputs of the second layer, and therefore the 2 NC components share the same inputs. In this example, we assume the parameter t equals 0 in both BN components. Finally, the results can be measured at O_4 and O_5.
After QF-Circ is generated, the final step is to map QF-Circ to quantum processors, called virtual-to-physical mapping. In this article, we deploy QF-Circ to various IBM quantum processors. Virtual-to-physical mapping in QF-Map has two tasks: (1) select a suitable quantum processor backend, and (2) map qbits in QF-Net to physical qbits in the selected backend. For the first task, QF-Map will i) check the number of qbits needed; ii) find the backends with the smallest number of qbits that can accommodate QF-Circ; iii) among backends with the same number of qbits, select the one with the minimum average error rate. The second task of QF-Map is to map qbits in QF-Net to physical qbits, following two rules: (1) a qbit in QF-Net with more gates is mapped to a physical qbit with a lower error rate; and (2) qbits in QF-Net with connections are mapped to physical qbits with the smallest distance.
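Rule (1) can be sketched as a simple greedy assignment: sort the virtual qbits by gate count and the physical qbits by error rate, then pair them off. The function below is a hypothetical illustration of this rule only; it ignores rule (2), the connectivity-distance constraint, which the actual QF-Map also enforces:

```python
def map_qbits(gate_counts, error_rates):
    """Greedy sketch of rule (1): the busier the virtual qbit,
    the more reliable the physical qbit it is assigned to.

    gate_counts: gates per virtual qbit (index = virtual qbit id).
    error_rates: error rate per physical qbit (index = physical qbit id).
    Returns a dict {virtual qbit -> physical qbit}.
    """
    virt = sorted(range(len(gate_counts)), key=lambda v: -gate_counts[v])
    phys = sorted(range(len(error_rates)), key=lambda q: error_rates[q])
    return dict(zip(virt, phys))

mapping = map_qbits([5, 1, 3], [0.02, 0.005, 0.01])
# Virtual qbit 0 (most gates) gets physical qbit 1 (lowest error rate);
# virtual qbit 2 gets physical qbit 2; virtual qbit 1 gets physical qbit 0.
print(mapping)
```

A full mapper would combine this reliability ordering with the distance rule, e.g., by scoring candidate assignments on both error rate and coupling-graph distance.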