Deep quantum neural networks equipped with backpropagation on a superconducting processor

Deep learning and quantum computing have achieved dramatic progress in recent years. The interplay between these two fast-growing fields gives rise to a new research frontier of quantum machine learning. In this work, we report the first experimental demonstration of training deep quantum neural networks via the backpropagation algorithm with a six-qubit programmable superconducting processor. In particular, we show that three-layer deep quantum neural networks can be trained efficiently to learn two-qubit quantum channels with a mean fidelity up to 96.0% and the ground state energy of molecular hydrogen with an accuracy up to 93.3% compared to the theoretical value. In addition, six-layer deep quantum neural networks can be trained in a similar fashion to achieve a mean fidelity up to 94.8% for learning single-qubit quantum channels. Our experimental results explicitly showcase the advantages of deep quantum neural networks, including a quantum analogue of the backpropagation algorithm and a less stringent coherence-time requirement for their constituent physical qubits, thus providing a valuable guide for quantum machine learning applications with both near-term and future quantum devices.

Machine learning has achieved tremendous success in both commercial applications and scientific research over the past decade. In particular, deep neural networks play a vital role in cracking some notoriously challenging problems, ranging from playing Go [1] to predicting protein structures [2]. They contain multiple hidden layers and are believed to be more powerful in extracting high-level features from data than traditional methods [3,4]. The learning process is fueled by updating the parameters through gradient descent, where the backpropagation algorithm enables efficient calculation of gradients via the chain rule [3].
By harnessing the weirdness of quantum mechanics, such as superposition and entanglement, quantum machine learning approaches hold the potential to bring advantages over their classical counterparts. In recent years, exciting progress has been made along this interdisciplinary direction [5][6][7][8][9][10]. For example, rigorous quantum speedups have been proved for classification models [11] and generative models [12] with complexity-theoretic guarantees. In terms of expressive power, there is also preliminary evidence that quantum neural networks hold advantages over comparable feedforward neural networks [13]. Meanwhile, noteworthy progress has also been made on the experimental side [14][15][16][17][18][19][20][21][22]. For example, in Ref. [14] the authors realize a quantum convolutional neural network on a superconducting quantum processor, and in Ref. [15] an experimental demonstration of quantum adversarial learning is reported. Similar to deep classical neural networks with multiple layers, a deep quantum neural network (DQNN) with a layer-by-layer architecture has been proposed [23,24], which can be trained via a quantum analog of the backpropagation algorithm. Under this framework, the quantum analog of a perceptron is a general unitary operator acting on qubits from adjacent layers, whose parameters are updated by multiplying the perceptron by the corresponding updating matrix during the training process.
In this paper, we report the first experimental demonstration of training DQNNs through the backpropagation algorithm on a programmable superconducting processor with six frequency-tunable transmon qubits. We find that a three-layer DQNN can be efficiently trained to learn a two-qubit target quantum channel with a mean fidelity up to 96.0%, and the ground state energy of molecular hydrogen with an accuracy up to 93.3% compared to the theoretical prediction. In addition, we demonstrate that a six-layer DQNN can efficiently learn a one-qubit target quantum channel with a mean fidelity up to 94.8%. Our approach carries over straightforwardly to DQNNs of larger width and depth, thus paving the way towards large-scale quantum machine learning with potential advantages in practical applications.
As sketched in Fig. 1(a), our DQNN has a layer-by-layer structure and maps quantum information layerwise from the input layer state ρ^in, through L hidden layers, to the output layer state ρ^out. Quantum perceptrons are the building blocks of the DQNN. As shown in Fig. 1(b), a single quantum perceptron is defined as a parameterized quantum circuit applied to the corresponding qubit pair at adjacent layers, which is directly implementable in experiments. A sequential combination of the quantum perceptrons constitutes the layerwise operation between adjacent layers. One of the key characteristics of the DQNN is the layer-by-layer quantum state mapping, allowing efficient training via the quantum backpropagation algorithm [23]. We sketch the general experimental training process in Fig. 1(c). When performing the quantum backpropagation algorithm, one only requires information from two adjacent layers, rather than the full DQNN, to evaluate the gradients with respect to all parameters at these two layers. Such a backpropagation-equipped DQNN bears the following merit: it significantly reduces the requirement of maintaining many coherent qubits, since qubits in each layer only need to keep their coherence for no longer than the duration of two layerwise operations, regardless of the depth of the DQNN. This advantage makes it possible to realize DQNNs with a reduced number of layers of physical qubits through qubit reuse [23].
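To make the perceptron concrete, the following numpy sketch (our own illustration of the circuit in Fig. 1(b), not the authors' pulse-level implementation) builds the two-qubit perceptron unitary from two parameterized Rx rotations followed by a fixed controlled-phase (CZ) gate:

```python
import numpy as np

def rx(theta):
    """Single-qubit rotation about the x axis: exp(-i*theta*X/2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

# Fixed two-qubit controlled-phase (CZ) gate, diagonal in the computational basis.
CZ = np.diag([1.0, 1.0, 1.0, -1.0]).astype(complex)

def perceptron(theta1, theta2):
    """Quantum perceptron on a (layer l-1, layer l) qubit pair:
    two parameterized Rx gates followed by a fixed CZ gate."""
    return CZ @ np.kron(rx(theta1), rx(theta2))

U = perceptron(0.3, 1.2)
# A perceptron is unitary: U U† = 1.
assert np.allclose(U @ U.conj().T, np.eye(4))
```

With both angles set to zero, the perceptron reduces to the bare CZ gate, which makes the role of the variational parameters easy to see.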
Our experiment is carried out on a superconducting quantum processor, which possesses six two-junction, frequency-tunable transmon qubits [25][26][27][28][29][30][31][32]. As photographed in Fig. 1(d), the chip is fabricated with the layout of the qubits purposely and carefully optimized for a layer-by-layer structure. Each transmon qubit is coupled to an individual flux control line, an XY control line, and a quarter-wavelength readout resonator. All readout resonators are coupled to a common transmission line, which is connected through a Josephson parametric amplifier for high-fidelity single-shot readout of the qubits [33,34]. In order to implement the two-qubit gates in the quantum perceptrons, two separate half-wavelength bus resonators are used to mediate the interactions among the qubits between layers [16,35,36]. The detailed experimental setup and device parameters can be found in Supplementary Information.
We first consider using DQNNs to learn a two-qubit quantum channel. We experimentally implement a three-layer DQNN with two qubits in each layer, denoted DQNN1. Here, we choose |00⟩, |01⟩, |++⟩, and |+i⟩|+i⟩ as our input states ρ^in_x, where the subscript x = 1, 2, 3, 4 labels the samples, |0⟩ and |1⟩ are the eigenstates of the Pauli Z matrix, |+⟩ (|−⟩) is an eigenstate of the Pauli X matrix, and |+i⟩ is an eigenstate of the Pauli Y matrix. The four pairs (ρ^in_x, τ^out_x) serve as the training dataset, where τ^out_x is the corresponding desired output state produced by the target quantum channel. We learn the target quantum channel by maximizing the mean fidelity between τ^out_x and the measured DQNN output ρ^out_x, averaged over all four input states. The general training procedure goes as follows: 1) Initialization: we randomly choose the initial gate parameters θ for all perceptrons in DQNN1. 2) Forward process (implemented on our quantum processor): for each training sample (ρ^in_x, τ^out_x), we prepare the input layer in ρ^in_x, then apply the layerwise forward channels E^1 and E^out, and extract ρ^1_x and ρ^out_x successively by carrying out quantum state tomography [37]. 3) Backward process (implemented on a classical computer): we initialize the output layer to σ^out_x, which is determined by ρ^out_x and τ^out_x (see Supplementary Note 1), and then apply the backward channels F^out and F^1 on σ^out_x to successively obtain σ^1_x and σ^0_x. 4) Based on {ρ^(l−1)_x, σ^l_x}, we evaluate the gradient of the fidelity with respect to all the variational parameters in the adjacent layers l−1 and l, and then average over the whole training dataset to obtain the final gradient, which is used to update the variational parameters θ. 5) Repeat 2), 3), and 4) for s_0 rounds. The pseudocode for our algorithm is provided in Supplementary Note 1.

Fig. 2. Experimental results for learning a two-qubit quantum channel. We train the three-layer DQNN1 with 30 different initial parameters, and plot the mean fidelity as a function of training epochs for 10 of them for clarity. The upper left inset shows the distribution of the converged mean fidelities for these 30 different initial parameters. We choose one of the learning curves (marked with dark blue triangles), then randomly generate 100 different input quantum states, and test the fidelity between their output states given by the target quantum channel and by the trained (untrained) DQNN1. In the lower left inset, the green (purple) curve shows the distribution of the fidelities for the trained (untrained) DQNN1. The right inset is a schematic illustration of DQNN1. At adjacent layers, we apply the quantum perceptrons in the order indicated by the colors: red, yellow, blue, and purple.
In Fig. 2, we randomly choose 30 different initial parameters θ and train DQNN1 to learn the same target quantum channel. We observe that DQNN1 converges quickly during the training process, with the highest mean fidelity above 96%.
Compared with the numerical simulation results (see Supplementary Note 2), the deviation of the final converged fidelities is due to experimental imperfections, including qubit decoherence and residual ZZ interactions between qubits [38][39][40]. In the upper left inset of Fig. 2, we show the distribution of all the converged fidelities from these 30 repeated experiments. We expect the distribution to concentrate at higher fidelities as the performance of the quantum processor improves.
To evaluate the performance of DQNN1, we choose one training process from the 30 experiments, and refer to the DQNN1 with parameters corresponding to the ending (starting) epoch of the training curve as the trained (untrained) DQNN1. We generate another 100 different input quantum states and experimentally measure their corresponding output states produced by the trained (untrained) DQNN1. We test the fidelity between the output states given by the target channel and by the trained (untrained) DQNN1. As shown in the lower left inset of Fig. 2, for the trained DQNN1, 43% of the fidelities exceed 0.95 (green curve) and 95% of the fidelities are higher than 0.9, well separated from the distribution of the results of the untrained DQNN1 (purple curve). This contrast illustrates the effectiveness of the training process of DQNN1.
Another application of DQNNs is learning the ground state energy of a given Hamiltonian H by minimizing the energy estimate tr(ρ^out H) for the output state of the DQNN. Here we aim to learn the ground state energy of the molecular hydrogen Hamiltonian [41]. By exploiting the Bravyi-Kitaev transformation and certain symmetries, the Hamiltonian of molecular hydrogen can be reduced to an effective Hamiltonian acting on two qubits:

H = g_0 1 + g_1 Z_0 + g_2 Z_1 + g_3 Z_0 Z_1 + g_4 Y_0 Y_1 + g_5 X_0 X_1,

where X_i, Y_i, Z_i are Pauli operators on the i-th qubit, and the coefficients g_j (j = 0, ..., 5) depend on the fixed bond length of molecular hydrogen. We consider the bond length 0.075 nm in this work; the corresponding coefficients g_j can be found in Ref. [41].
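For a classical diagonalization baseline, the sketch below assembles a two-qubit Hamiltonian of this Pauli form and computes its exact ground energy. The coefficients shown are placeholders for illustration only; the bond-length-dependent values used in the experiment are tabulated in Ref. [41].

```python
import numpy as np

# Single-qubit Pauli matrices.
I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def h2_hamiltonian(g):
    """Effective two-qubit Hamiltonian
    H = g0*1 + g1*Z0 + g2*Z1 + g3*Z0Z1 + g4*Y0Y1 + g5*X0X1,
    with qubit 0 as the left tensor factor."""
    return (g[0] * np.kron(I, I) + g[1] * np.kron(Z, I) + g[2] * np.kron(I, Z)
            + g[3] * np.kron(Z, Z) + g[4] * np.kron(Y, Y) + g[5] * np.kron(X, X))

# Placeholder coefficients, NOT the Ref. [41] values.
g = [-0.35, 0.39, -0.39, 0.01, 0.18, 0.18]
H = h2_hamiltonian(g)
ground_energy = np.linalg.eigvalsh(H).min()  # exact-diagonalization reference
```

Comparing the DQNN energy estimate against this exactly diagonalized value is how the accuracy figures quoted above are defined.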
We use DQNN1 again as the variational ansatz to learn the ground state of molecular hydrogen, with a procedure similar to the previous one for learning a quantum channel: 1) Initialization: we prepare the input layer in the fiducial product state |00⟩ and randomly generate initial gate parameters θ for DQNN1. 2) In the forward process (implemented on the quantum processor), we apply the forward channels E^1 and E^out in succession, and extract the quantum states of the hidden layer (ρ^1) and the output layer (ρ^out) by quantum state tomography. 3) In the backward process (implemented on a classical computer), we initialize the quantum state of the output layer to σ^out, and then obtain σ^1 and σ^0 after successively applying the backward channels F^out and F^1 on σ^out. 4) Based on {ρ^(l−1), σ^l}, we calculate the gradient of the energy estimate with respect to all the variational parameters in the adjacent layers l−1 and l, and then update all gate parameters in DQNN1. 5) Repeat 2), 3), 4) for s_0 rounds. The pseudocode for our algorithm is provided in Supplementary Note 1.
We train DQNN1 with 30 different initial parameters and show our experimental results in Fig. 3(a). We observe that DQNN1 converges within 20 epochs. The lowest ansatz energy estimate reaches below −1.727 hartree during the learning process, an accuracy of up to 93.3% compared to the theoretical ground state energy of −1.851 hartree. This shows the good performance of DQNN1 and the accuracy of our experimental system control. The inset of Fig. 3(a) shows the distribution of the converged energies from these 30 repeated experiments with different initial parameters, six of which reach an accuracy above 90%.
To numerically investigate the effects of experimental imperfections on training DQNNs, we consider two possible sources of error: decoherence of the qubits and residual ZZ interactions between qubits. Taking these errors into account, we numerically train DQNN1 with 30 different initial parameters. We find that for four of these initial parameters DQNN1 converges to local minima instead of the global minimum, which is also observed in the experiment, as shown in the inset of Fig. 3(a). Excluding these abnormal instances with local minima, we plot the average energy estimate as a function of the strength of the residual ZZ interaction for different coherence times in Fig. 3(b). We find that increasing the coherence time around the experimental value has a minor effect on learning the ground state energy, while reducing the residual ZZ interactions provides larger improvements in the ground state energy estimation. These experimental imperfections can be suppressed by introducing advanced technologies in the design and fabrication of better superconducting quantum circuits, such as tunable couplers [42][43][44] and tantalum-based qubits [45,46].
To further illustrate the efficiency of the quantum backpropagation algorithm, we construct another DQNN with four hidden layers (denoted DQNN2) by rearranging our six-qubit quantum processor into a six-layer structure, with one qubit in each layer. We focus on the task of learning a one-qubit quantum channel. We choose |0⟩, |1⟩, and |−⟩ as our input states and compare the measured output states of DQNN2 with the desired ones from the target single-qubit quantum channel. The general training procedure is similar to that for DQNN1 discussed above. Our experimental results are summarized in Fig. 4, which shows the learning curves for 10 different initial parameters. We find that DQNN2 can learn the target quantum channel with a mean fidelity up to 94.8%. We notice that the variance among the converged mean fidelities of DQNN2 is smaller than that for DQNN1, which may be attributed to the smaller total circuit depth and thus less error accumulation due to experimental imperfections. To study the learning performance, we choose one of these learning curves (marked in triangles), and refer to the DQNN2 with parameters corresponding to the ending (starting) epoch of the learning curve as the trained (untrained) DQNN2. We then use another 100 different input quantum states to test the trained and untrained DQNN2 by measuring the fidelities between the experimental output states and the corresponding desired ones given by the target quantum channel. As shown in the upper inset of Fig. 4, the fidelity distribution concentrates around 0.92 for the trained DQNN2, which stands in stark contrast to that of the untrained DQNN2 and thus indicates good performance after training.
In summary, we have demonstrated the training of deep quantum neural networks on a six-qubit programmable superconducting quantum processor. We experimentally exhibit their intriguing ability to learn quantum channels and the ground state energy of a given Hamiltonian. The quantum backpropagation algorithm demonstrated in our experiments can be directly applied to DQNNs of extended width and depth. This approach significantly reduces the requirements on the coherence time of superconducting qubits, regardless of how many hidden layers the DQNN includes. With further improvements in experimental conditions, the quantum perceptrons in our DQNNs can be constructed with deeper circuits to improve the expressive capacity, allowing DQNNs to tackle more challenging tasks in the future.

Methods
Framework. We consider a deep quantum neural network (DQNN) that includes L hidden layers, with a total number m_l of qubits in layer l. The qubits in two adjacent layers are connected with quantum perceptrons, and each perceptron consists of two single-qubit rotation gates R_x(θ_1) and R_x(θ_2) along the x axis, with variational angles θ_1 and θ_2 respectively, followed by a fixed two-qubit controlled-phase gate. The unitary of the quantum perceptron that acts on the i-th qubit at layer l−1 and the j-th qubit at layer l in the DQNN is written as U^l_(i,j)(θ^l_(i,j),1, θ^l_(i,j),2). Then the unitary product of all quantum perceptrons acting on the qubits in layers l−1 and l is denoted as

U^l ≡ ∏_(i,j) U^l_(i,j)(θ^l_(i,j),1, θ^l_(i,j),2).

The DQNN acts on the input state ρ^in and produces the output state ρ^out according to

ρ^out = tr_in,hid( U ( ρ^in ⊗ |0⋯0⟩⟨0⋯0| ) U† ),

where U ≡ U^out U^L U^(L−1) ⋯ U^1 is the unitary of the DQNN, and all qubits in the hidden layers and the output layer are initialized to a fiducial product state |0⋯0⟩. The layer-by-layer architecture enables ρ^out to be expressed as a series of maps on ρ^in:

ρ^out = E^out( E^L( ⋯ E^2( E^1( ρ^in ) ) ⋯ ) ),

where

E^l( ρ^(l−1) ) = tr_(l−1)( U^l ( ρ^(l−1) ⊗ |0⋯0⟩⟨0⋯0|_l ) U^(l†) ).

In Supplementary Information, we prove that for the two machine learning tasks in our work, the derivative of the mean fidelity or the energy estimate with respect to θ^l_(i,j),k can be calculated with the information of layers l−1 and l only, and can be written as G(θ^l, ρ^(l−1), σ^l), with θ^l incorporating all parameters in layers l−1 and l. We note that ρ^(l−1) = E^(l−1)(⋯E^2(E^1(ρ^in))⋯) is the quantum state in layer l−1 in the forward process, and σ^l = F^(l+1)(⋯F^out(σ^out)⋯) is the backward term in layer l, with F^l being the adjoint channel of E^l.
Generating random input quantum states. To evaluate the learning performance in the task of learning a target quantum channel, we need to generate many different input quantum states and test the fidelity between their output states produced by DQNN1 and their desired output states given by the target quantum channel.
For the task of learning a two-qubit quantum channel, we generate these input quantum states by separately applying single-qubit rotation gates R_a1(Ω_1) ⊗ R_a2(Ω_2) on the two qubits initialized in |00⟩. Here each rotation gate has a random rotation axis a_i in the x-y plane and a random rotation angle Ω_i.
For the task of learning a one-qubit quantum channel, we generate the input quantum states by applying a single-qubit rotation gate R_b(Φ) on the input qubit initialized in |0⟩, with a random rotation axis b in the x-y plane and a random rotation angle Φ.
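Both constructions can be sketched in a few lines (our own numpy illustration); here the rotation about a random axis in the x-y plane is R = exp(−i(Ω/2)(cos a · X + sin a · Y)), which expands to cos(Ω/2)·1 − i sin(Ω/2)·(cos a · X + sin a · Y) because the axis operator squares to the identity:

```python
import numpy as np

def random_xy_rotation(rng):
    """Rotation by a random angle about a random axis in the x-y plane."""
    a = rng.uniform(0, 2 * np.pi)      # azimuthal angle of the rotation axis
    omega = rng.uniform(0, 2 * np.pi)  # rotation angle
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    n = np.cos(a) * X + np.sin(a) * Y  # unit axis operator, n @ n = identity
    return np.cos(omega / 2) * np.eye(2) - 1j * np.sin(omega / 2) * n

rng = np.random.default_rng(7)
ket0 = np.array([1.0, 0.0], dtype=complex)

# One-qubit test state: R_b(Phi)|0>.
psi1 = random_xy_rotation(rng) @ ket0
# Two-qubit test state: (R_a1(Omega1) x R_a2(Omega2)) |00>.
psi2 = np.kron(random_xy_rotation(rng), random_xy_rotation(rng)) @ np.kron(ket0, ket0)
```

Sampling both the axis and the angle spreads the test states over the Bloch sphere rather than along a single great circle.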

Data availability
The data for the experimental results presented in the figures is provided at https://github.com/luzd19/Deep-quantum-neural-networks_equipped-with-backpropagation

Supplementary Information: Deep quantum neural networks equipped with backpropagation on a superconducting processor

SUPPLEMENTARY NOTE 1: THEORETICAL DETAILS FOR DEEP QUANTUM NEURAL NETWORKS

In classical machine learning, deep neural networks are characterized by their ability to extract high-level features from data. With the rapid development of quantum machine learning [5][6][7][8][9], we expect a quantum generalization of the deep neural network architecture to bring promising insights. Recently, a deep quantum neural network (DQNN) and a quantum analog of the backpropagation algorithm have been proposed [23]. In this ansatz, the quantum analog of a perceptron is a unitary operation acting on qubits in two adjacent layers. During the training process, the unitary operator of a quantum perceptron is updated by multiplying it by the corresponding updating matrix.
In this paper, we experimentally demonstrate the training of parameterized DQNNs with a superconducting quantum processor. Equipped with the backpropagation algorithm, we can efficiently calculate the gradients during the training process. Our scheme is feasible for experimental implementation in the noisy intermediate-scale quantum era. In this section, we introduce the basic structures, optimization strategies, and training procedures for DQNNs.

Basic structures
As mentioned in the main text, our DQNNs have layer-by-layer structures, and qubits in two adjacent layers are connected with quantum perceptrons. In our ansatz, the quantum perceptrons are engineered as parameterized quantum circuits. For simplicity, in this paper each quantum perceptron acts on only two qubits in two adjacent layers. The circuit of a quantum perceptron is composed of two single-qubit rotation gates R_x(θ_1) and R_x(θ_2), with θ_1 and θ_2 as the variational parameters, followed by a fixed two-qubit controlled-phase gate, as shown in Fig. 1(b) in the main text. A sequential combination of the quantum perceptrons constitutes the layer-by-layer transition mapping between adjacent layers. In this way, the DQNN maps the information layerwise from the input layer to the output layer through the hidden layers.
Now we consider a DQNN including L hidden layers. The total number of qubits in layer l is denoted as m_l. The unitary of a quantum perceptron which acts on the i-th qubit at layer l−1 and the j-th qubit at layer l is written as U^l_(i,j)(θ^l_(i,j),1, θ^l_(i,j),2), where θ^l_(i,j),k (k = 1, 2) denote the variational parameters of the two R_x gates in the quantum perceptron U^l_(i,j). The unitary product of all quantum perceptrons acting on the qubits in layers l−1 and l is denoted as

U^l ≡ ∏_(i,j) U^l_(i,j)(θ^l_(i,j),1, θ^l_(i,j),2).

We note that qubits in layer l are initialized to a fiducial product state |0⋯0⟩, and then the quantum state ρ^l of the qubits in layer l can be written as the layer-by-layer transition mapping applied to ρ^(l−1):

ρ^l = E^l( ρ^(l−1) ) = tr_(l−1)( U^l ( ρ^(l−1) ⊗ |0⋯0⟩⟨0⋯0|_l ) U^(l†) ).

In this way, the output state ρ^out can be expressed as a series of maps on ρ^in:

ρ^out = E^out( E^L( ⋯ E^1( ρ^in ) ⋯ ) ).
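As a minimal, classically simulated illustration of the transition map (our own sketch with one qubit per layer, assuming a CZ-based perceptron with hypothetical angles), E^l amounts to attaching a fresh |0⟩ qubit for layer l, applying the layer unitary, and tracing out layer l−1:

```python
import numpy as np

def ptrace_first(M):
    """Partial trace over the first qubit of a two-qubit operator."""
    return np.einsum('abac->bc', M.reshape(2, 2, 2, 2))

def rx(t):
    return np.array([[np.cos(t/2), -1j*np.sin(t/2)],
                     [-1j*np.sin(t/2), np.cos(t/2)]])

def forward_channel(U, rho_prev):
    """E^l: attach a fresh layer-l qubit in |0><0|, apply the layer unitary U
    (the product of perceptrons between layers l-1 and l), trace out layer l-1."""
    proj0 = np.diag([1.0, 0.0]).astype(complex)          # |0><0|
    joint = U @ np.kron(rho_prev, proj0) @ U.conj().T
    return ptrace_first(joint)

# One CZ-based perceptron with hypothetical angles as the layer unitary.
CZ = np.diag([1.0, 1.0, 1.0, -1.0]).astype(complex)
U = CZ @ np.kron(rx(0.7), rx(0.2))
rho_out = forward_channel(U, np.diag([1.0, 0.0]).astype(complex))
```

Because the map is built from unitary conjugation and a partial trace, the output is automatically a valid density matrix (unit trace, Hermitian, positive semidefinite).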

Optimization strategies
With the basic structures discussed above, we can now specify the learning tasks. In this paper, we consider two machine learning tasks. The first task is learning a target quantum channel: we expect the output state given by the DQNN to be as close as possible to the output state given by the target quantum channel for each input state. We aim to maximize the mean fidelity between the output states given by the DQNN (ρ^out_x) and by the target quantum channel (τ^out_x), averaged over N training data:

F̄ = (1/N) ∑_(x=1)^N F( ρ^out_x, τ^out_x ),  with  F(ρ, τ) = tr √( τ^(1/2) ρ τ^(1/2) ).

The second task is learning the ground state of a Hamiltonian H. We aim to minimize the energy estimate Ē of this Hamiltonian computed with the DQNN output state ρ^out: Ē = tr(ρ^out H).
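The mean-fidelity objective can be evaluated classically from tomography data. The sketch below (our own illustration) computes the trace form of the Uhlmann fidelity used here (some references define the fidelity as the square of this quantity), using only a Hermitian eigendecomposition for the matrix square roots:

```python
import numpy as np

def psd_sqrt(M):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def fidelity(rho, tau):
    """Uhlmann fidelity F(rho, tau) = tr sqrt( sqrt(tau) rho sqrt(tau) )."""
    s = psd_sqrt(tau)
    return np.trace(psd_sqrt(s @ rho @ s)).real

# Sanity checks on single-qubit states.
rho0 = np.diag([1.0, 0.0]).astype(complex)   # |0><0|
rho1 = np.diag([0.0, 1.0]).astype(complex)   # |1><1|
mixed = np.eye(2) / 2                        # maximally mixed state
# Mean fidelity over a small "dataset" of target states.
mean_F = np.mean([fidelity(rho0, t) for t in (rho0, rho1, mixed)])
```

Identical states give F = 1 and orthogonal pure states give F = 0, so the objective is bounded in [0, 1] just like the experimental learning curves.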
To maximize the mean fidelity or minimize the energy estimate, we adopt the gradient descent method. In the main text, we mention that the layer-by-layer architecture of our DQNNs allows a quantum backpropagation algorithm. Via this algorithm, one only requires information from two adjacent layers to calculate the gradients with respect to all gate parameters at these two layers. In other words, the derivative of the mean fidelity or the energy estimate with respect to θ^l_(i,j),k can be written as G(θ^l, ρ^(l−1), σ^l), where θ^l incorporates all gate parameters in layers l−1 and l.
Here, we derive the formula for G(θ^l, ρ^(l−1), σ^l). We first consider a function f of the form f(ρ^out, X) = tr(X ρ^out), where X is a Hermitian matrix related to the specific task. Since ρ^out = E^out(⋯E^l(ρ^(l−1))⋯), the derivative of f with respect to θ^l_(i,j),k only involves quantities in layers l−1 and l:

∂f/∂θ^l_(i,j),k = tr( σ^l tr_(l−1)( ∂U^l/∂θ^l_(i,j),k ( ρ^(l−1) ⊗ |0⋯0⟩⟨0⋯0|_l ) U^(l†) ) ) + h.c.,

where h.c. stands for the Hermitian conjugate of the preceding term. Here σ^l obeys the recursive relation

σ^(l−1) = F^l( σ^l ),  with  σ^out = X,

where F^l is the adjoint channel of E^l, defined by tr( σ E^l(ρ) ) = tr( F^l(σ) ρ ). From this recursion we obtain the backward terms layerwise from the output layer to the input layer in the backward process. Specifically, if the gate with parameter θ^l_(i,j),k in the DQNN is of the form e^(−(i/2) θ^l_(i,j),k P_n) with P_n belonging to the Pauli group, we can utilize the "parameter-shift rule" to calculate the gradient of f:

∂f/∂θ^l_(i,j),k = ( h_+ − h_− ) / 2,

where h_± = tr( σ^l tr_(l−1)( U^l_± ( ρ^(l−1) ⊗ |0⋯0⟩⟨0⋯0|_l ) U^(l†)_± ) ), and U^l_± denotes the unitary that replaces the parameter θ^l_(i,j),k in U^l with θ^l_(i,j),k ± π/2.
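The parameter-shift rule is exact for Pauli rotations, which a short numerical check makes evident. The sketch below (our own single-qubit illustration, not the multi-layer experimental circuit) compares the shifted-circuit gradient of f(θ) = tr(Z ρ(θ)) against a central finite difference:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def rx(theta):
    return np.cos(theta/2) * np.eye(2) - 1j * np.sin(theta/2) * X

def f(theta, rho_in, obs):
    """f = tr(obs * U rho U†) for the Pauli rotation U = exp(-i theta X / 2)."""
    U = rx(theta)
    return np.trace(obs @ U @ rho_in @ U.conj().T).real

theta = 0.813
rho = np.diag([1.0, 0.0]).astype(complex)  # |0><0|
# Parameter-shift gradient: (f(theta + pi/2) - f(theta - pi/2)) / 2.
shift_grad = (f(theta + np.pi/2, rho, Z) - f(theta - np.pi/2, rho, Z)) / 2
# Central finite difference for comparison.
eps = 1e-6
fd_grad = (f(theta + eps, rho, Z) - f(theta - eps, rho, Z)) / (2 * eps)
assert abs(shift_grad - fd_grad) < 1e-6
```

For this circuit f(θ) = cos θ, so both estimates agree with the analytic derivative −sin θ; unlike a finite difference, the shifted evaluations use circuits of the same depth as the original, which is what makes the rule experimentally convenient.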
Now we prove that the first term in the parameter-shift expression equals h_+; in the same way, the second term equals h_−. The proof uses the identity e^(−(i/2)(θ ± π/2)P_n) = (1/√2) e^(−(i/2)θP_n) ( 1 ∓ i P_n ): substituting this expansion into h_± and taking the difference, the diagonal terms cancel and the cross terms reproduce the derivative, since ∂/∂θ e^(−(i/2)θP_n) = −(i/2) P_n e^(−(i/2)θP_n).

With the gradients of f obtained above, we derive the gradients of the mean fidelity F̄ and the energy estimate Ē for the two tasks discussed in the main text. In the task of learning a target quantum channel, we consider, for each input state, the derivative of F_x with respect to θ^l_(i,j),k. For convenience, we omit the superscript and subscript of F_x, ρ^out_x, and τ^out_x, use the shorthand B ≡ ( τ^(1/2) ρ^out τ^(1/2) )^(1/2), and further omit the superscript and subscript of θ^l_(i,j),k. This yields

∂F/∂θ = (1/2) tr( τ^(1/2) B^(−1) τ^(1/2) ∂ρ^out/∂θ ),

which has the same form as the derivative of tr(ρ^out X), with τ^(1/2) B^(−1) τ^(1/2) analogous to X. In the task of learning the ground state energy of a Hamiltonian H, the energy estimate tr(ρ^out H) has the same form as tr(ρ^out X), with H analogous to X. So we can derive the gradients of the energy estimate Ē according to G(θ^l, ρ^(l−1), σ^l). With the gradients obtained, we update the variational parameters in the DQNN by gradient descent.

Training procedures
In this section, we give a detailed description of how our DQNNs are trained via the quantum backpropagation algorithm for different tasks.
For the task of learning a quantum channel, we first need to generate the training dataset. Here, we randomly choose parameters θ_t in the DQNN to generate a specific target quantum channel that we aim to learn. Then we apply the target quantum channel to each input state to obtain the corresponding output state; the resulting input-output pairs constitute the training dataset, with N being the size of the training dataset. We assume the DQNN used in this task includes L hidden layers, with a total number m_l of qubits in layer l. The general training procedure is as follows:

1. Initialization: Randomly choose initial gate parameters for all perceptrons in the DQNN, denoted as θ_I.
2. Forward channel E^l: The forward channel E^l applies to the quantum state ρ^(l−1) of the qubits in layer l−1 and produces ρ^l in layer l according to

ρ^l = E^l( ρ^(l−1) ) = tr_(l−1)( U^l ( ρ^(l−1) ⊗ |0⋯0⟩⟨0⋯0|_l ) U^(l†) ).

In our experiment, we first prepare the m_l qubits in layer l in the fiducial product state |0⋯0⟩. Then we apply all quantum perceptrons acting on qubits in layers l−1 and l. Finally, we carry out quantum state tomography to extract ρ^l.
3. Backward channel F^l: The backward channel F^l applies to the backward term σ^l and produces σ^(l−1) according to

σ^(l−1) = F^l( σ^l ) = tr_l( U^(l†) ( 1_(l−1) ⊗ σ^l ) U^l ( 1_(l−1) ⊗ |0⋯0⟩⟨0⋯0|_l ) ).

In this paper, we carry out the backward channel on a classical computer, owing to the experimental challenges in preparing the quantum states for the backward terms σ^l. An efficient proposal for the experimental implementation of the backward process is important and remains future work.
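The defining property of the adjoint channel, tr(σ E^l(ρ)) = tr(F^l(σ) ρ), can be checked numerically. The following sketch (our own illustration with one qubit per layer and a random layer unitary, not the experimental circuit) implements both channels with partial traces and verifies the duality:

```python
import numpy as np

rng = np.random.default_rng(0)

def ptrace_first(M):   # trace out the layer l-1 qubit (first tensor factor)
    return np.einsum('abac->bc', M.reshape(2, 2, 2, 2))

def ptrace_second(M):  # trace out the layer l qubit (second tensor factor)
    return np.einsum('abcb->ac', M.reshape(2, 2, 2, 2))

# Random two-qubit layer unitary via QR decomposition.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(A)

P0 = np.diag([1.0, 0.0]).astype(complex)  # |0><0| on the fresh layer-l qubit

def E_fwd(rho):    # forward channel E^l
    return ptrace_first(U @ np.kron(rho, P0) @ U.conj().T)

def F_bwd(sigma):  # adjoint (backward) channel F^l
    return ptrace_second(U.conj().T @ np.kron(np.eye(2), sigma)
                         @ U @ np.kron(np.eye(2), P0))

# Duality defining the adjoint: tr(sigma E(rho)) = tr(F(sigma) rho).
rho = np.diag([0.7, 0.3]).astype(complex)
sigma = rng.normal(size=(2, 2)); sigma = (sigma + sigma.T) / 2  # Hermitian test operator
lhs = np.trace(sigma @ E_fwd(rho))
rhs = np.trace(F_bwd(sigma) @ rho)
assert np.isclose(lhs, rhs)
```

This duality is what lets the backward terms be propagated layerwise on a classical computer while remaining consistent with the forward pass on the processor.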

4. Evaluate the mean fidelity and the gradients: Compute the mean fidelity F̄ = (1/N) ∑_(x=1)^N F( ρ^out_x, τ^out_x ). Calculate the gradient with respect to θ^l_(i,j),k for each training sample, G(θ^l, ρ^(l−1)_x, σ^l_x), and then take the average over the whole training dataset, (1/N) ∑_x G(θ^l, ρ^(l−1)_x, σ^l_x). Finally, update each θ^l_(i,j),k with the learning rate η according to

θ^l_(i,j),k ← θ^l_(i,j),k + η ∂F̄/∂θ^l_(i,j),k.

5. Repeat 2, 3 and 4 for s_0 steps.

SUPPLEMENTARY NOTE 2: NUMERICAL RESULTS FOR SEVERAL MACHINE LEARNING TASKS
In this section, we simulate the training of DQNNs by realizing the forward channels and the backward channels with matrix calculations on a classical computer, and present some numerical results.
Task: learning a two-qubit quantum channel. Here, we choose DQNN1, introduced in the main text, to learn a two-qubit target quantum channel. The training dataset is the same as that in the main text. We numerically train DQNN1 with 50 different initial parameters and show our numerical results in Supplementary Fig. S1. We observe that DQNN1 converges well, with the average converged mean fidelity above 98%.
We choose one learning curve (marked in triangles in Supplementary Fig. S1) to test the learning performance of DQNN1. We refer to the DQNN1 with parameters corresponding to the ending (starting) epoch of this training curve as the trained (untrained) DQNN1, and then use 100 different input quantum states to test the fidelities between their corresponding output states and the desired output states given by the target quantum channel. As shown in the lower inset of Supplementary Fig. S1, for the trained DQNN1 the mean fidelity exceeds 0.97 (green bars), well separated from the distribution of the results of the untrained DQNN1 (purple bars). This contrast indicates a satisfying performance of DQNN1.

Quantum state tomography
We extract the quantum state ρ^l of the qubits in each layer of the DQNN by carrying out quantum state tomography. To reconstruct a single-qubit state, we perform single-qubit Pauli measurements in the four bases S_1 = {|g⟩, |e⟩, |+⟩, |i⟩}. To reconstruct a two-qubit state, we perform two-qubit Pauli measurements in the 16 bases S_2 = {|v_1⟩ ⊗ |v_2⟩ ; v_1, v_2 ∈ S_1}. In our experiment, we repeat the measurement in each basis 10^4 times to obtain a probability distribution r over the two and four computational basis states for the single-qubit and two-qubit cases, respectively. r is sent to a classical convex optimizer to find the density matrix ρ^l that produces a distribution as close as possible to r.

Table caption: In our experiments for training DQNN1 and DQNN2, we apply the quantum perceptrons in the order from top to bottom in the first column. Each perceptron acts on the qubits listed in the second column. When applying different perceptrons, we set the qubits to the different frequencies listed. We also show the operation time and the rotation angle for the controlled-phase gate in each perceptron.
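The convex-optimization step is not reproduced here; as a simpler, noise-free illustration (our own sketch, not the optimizer used in the experiment), a single-qubit state can be rebuilt from ideal Pauli expectation values by linear inversion:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def linear_inversion(ex, ey, ez):
    """Reconstruct a single-qubit density matrix from Pauli expectation
    values: rho = (I + <X> X + <Y> Y + <Z> Z) / 2."""
    return (I + ex * X + ey * Y + ez * Z) / 2

# Round trip: compute the expectations of a known state, then rebuild it.
rho_true = np.array([[0.8, 0.3 - 0.1j], [0.3 + 0.1j, 0.2]])
ex, ey, ez = (np.trace(P @ rho_true).real for P in (X, Y, Z))
rho_rec = linear_inversion(ex, ey, ez)
assert np.allclose(rho_rec, rho_true)
```

With finite sampling statistics, linear inversion can return a non-positive matrix, which is why the experiment instead uses a convex optimizer constrained to physical density matrices.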

Fig. 1. A schematic of training deep quantum neural networks. (a), Architecture of a general DQNN. Information propagates layerwise from the input layer to the output layer. At two adjacent layers, we apply the quantum perceptrons in the order given by the circuit exhibited in (b). A quantum perceptron is realized by applying two single-qubit rotation gates R_x(θ_1) and R_x(θ_2) (rotations along the x axis with variational angles θ_1 and θ_2, respectively) followed by a fixed two-qubit controlled-phase gate. (c), Illustration of the quantum backpropagation algorithm. We apply forward channels E on ρ^in to successively obtain {ρ^1, ρ^2, ..., ρ^out}, and apply backward channels F to successively obtain {σ^out, σ^L, ..., σ^1} in the backward process. These forward and backward terms are used for the gradient evaluation. (d), A quantum processor with six superconducting transmon qubits, used to experimentally implement the DQNNs. The transmon qubits (Q1-Q6) are marked in red and the bus resonators (B1 and B2) are marked in green.

Fig. 3. Experimental and numerical results for learning the ground state energy of molecular hydrogen. (a) Experimental energy estimate at each epoch of the learning process for different initial parameters. The inset displays the distribution of the converged energy estimates for 30 different initializations. (b) Numerical results for the mean energy estimate under different coherence times and residual ZZ interaction strengths between qubits. Specifically, we scale both the energy relaxation time and the dephasing time by the same ratio T/T0, where T and T0 are the coherence times in the simulation and in the experiment, respectively, and vary the residual ZZ interaction strength.

Fig. 4. Experimental results for learning a single-qubit quantum channel. The mean fidelity during training of the six-layer DQNN2 is plotted as a function of the training epoch for different initial parameters. We randomly generate 100 single-qubit states and evaluate the fidelities between the output states produced by DQNN2 and the desired output states given by the target quantum channel. The upper inset displays this fidelity distribution in two cases: a well-trained DQNN2 (green) and an untrained DQNN2 (purple), both corresponding to the learning curve marked with triangles. The lower inset is a schematic of DQNN2, where the perceptrons are applied in the order indicated by the arrows.
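The fidelities quoted here are state fidelities between density matrices, which can be evaluated with the Uhlmann formula F(ρ, σ) = (Tr √(√ρ σ √ρ))². A small numpy sketch, using an eigendecomposition-based matrix square root for positive semidefinite inputs:

```python
import numpy as np

def psd_sqrt(m):
    """Matrix square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0, None)      # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """Uhlmann fidelity F(rho, sigma) = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2)
```

For pure states this reduces to the squared overlap |⟨ψ|φ⟩|², and the fidelity between a pure state and the maximally mixed qubit state is 1/2.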

For the task of learning the ground state energy of a Hamiltonian H, we provide the pseudocode in Algorithm 2.

Algorithm 2: Training the DQNN to learn the ground state of a Hamiltonian via the quantum backpropagation algorithm
Input: the DQNN model with L hidden layers, initial parameters θ_I, Hamiltonian H, number of iterations s0, learning rate.
Output: the trained DQNN.
for s = 1 to s0 do
    Forward: apply the forward channels E^1, E^2, ..., E^out to the fiducial product state |0⋯0⟩ to obtain ρ^1, ρ^2, ..., ρ^out successively.
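The overall loop structure of Algorithm 2 (minimize the energy estimate by gradient descent on the network parameters) can be illustrated with a deliberately tiny classical simulation. This is only a sketch: a single-qubit toy Hamiltonian and a one-parameter Ry ansatz are assumed, and a finite-difference gradient stands in for the quantum backpropagation gradients used in the experiment:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
H = 0.5 * Z + 0.3 * X          # toy Hamiltonian; ground energy is -sqrt(0.34)

def energy(theta):
    """<psi(theta)| H |psi(theta)> for the toy ansatz |psi(theta)> = Ry(theta)|0>."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
    return float(np.real(psi.conj() @ H @ psi))

# Gradient descent on the energy estimate, mirroring the loop of Algorithm 2;
# the experiment evaluates gradients via quantum backpropagation instead of
# the finite differences used here.
theta, lr, eps = 0.1, 0.4, 1e-4
for step in range(200):
    grad = (energy(theta + eps) - energy(theta - eps)) / (2 * eps)
    theta -= lr * grad
```

After training, the energy estimate converges to the exact ground energy of the toy Hamiltonian, -√0.34 ≈ -0.583, which is the analogue of the converged estimates shown in Fig. 3(a).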
Supplementary Figure S5. Comparison of the swap operations with and without flux compensation. With the frequencies of all other qubits well below 5.8 GHz, we prepare Q4 and Q5 in the |eg⟩ state and apply a step pulse on the flux line to tune the frequency of Q4 down into resonance with the |ge⟩ state. The modulated qubit frequency and the duration of the step pulse are varied in the experiment. Compared with the uncompensated step pulse, the predistortion compensation successfully recovers the chevron pattern of the expected swap between |ge⟩ and |eg⟩.

Calibration of the single-qubit phase induced by the shift of the working frequency. (a) The experimental pulse sequence. (b) The experimental result for the frequency-shift process of Q4. The blue dots denote the probability P|e⟩ of measuring the |e⟩ state. The blue stars denote Pmax, the larger eigenvalue of the single-qubit density matrix; the near-unity values of Pmax indicate that the measured final quantum states are close to pure states. The red circles denote the phase φ extracted from final quantum states of the form |g⟩ + e^{iφ}|e⟩. The dashed line is a linear fit to the red circles, from which the desired frequency shift is inferred.
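The linear-fit step of this phase calibration can be sketched with synthetic data: the accumulated phase grows as φ(t) = 2πΔf·t + φ0, so the slope of a straight-line fit of φ against the pulse duration t yields the frequency shift. The numbers below are hypothetical, not the measured values:

```python
import numpy as np

# Hypothetical calibration data: accumulated phase phi (rad) of the
# |g> + e^{i phi}|e> state versus the duration t of the flux step pulse.
t = np.array([0.0, 10.0, 20.0, 30.0, 40.0])     # ns (illustrative)
phi = 2 * np.pi * 0.05 * t + 0.02               # simulated data, 50 MHz shift

# Straight-line fit phi(t) = slope * t + intercept; the slope equals
# 2*pi times the frequency shift of the qubit's working frequency.
slope, intercept = np.polyfit(t, phi, 1)
delta_f = slope / (2 * np.pi)                   # in GHz when t is in ns
```

In practice the fitted phases would first be unwrapped modulo 2π before the fit, since only φ mod 2π is extracted from tomography of |g⟩ + e^{iφ}|e⟩.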
Algorithm 1: Training the DQNN to learn quantum channels via the quantum backpropagation algorithm
Input: the DQNN model with L hidden layers, initial parameters θ_I, input quantum states {ρ^in_x}_{x=1}^N, number of iterations s0, learning rate.
Output: the trained DQNN.
Generate the training dataset: choose parameters θ_t for the DQNN, which serves as the target quantum channel, and apply it to each input state to obtain the corresponding output state; the pairs constitute the training dataset {ρ^in_x, τ^out_x}_{x=1}^N.
for s = 1 to s0 do
    Forward: for each training pair (ρ^in_x, τ^out_x), apply the forward channels E^1, E^2, ..., E^out to ρ^in_x to obtain ρ^1_x, ρ^2_x, ..., ρ^out_x successively.
    Backward: for each training pair (ρ^in_x, τ^out_x), calculate σ^out_x and apply the backward channels F to obtain σ^L_x, ..., σ^1_x successively.
end for
Output the trained DQNN
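The structure of Algorithm 1 (generate a training set from a target channel, then raise the mean output fidelity by gradient updates) can be illustrated with a deliberately small classical simulation. This is a sketch under strong simplifications: the target channel is taken to be a single-qubit unitary Rx(0.8), the model has a single parameter, and finite-difference gradient ascent stands in for the quantum backpropagation updates:

```python
import numpy as np

def rx(theta):
    """Single-qubit rotation about the x axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

rng = np.random.default_rng(0)

# Training set: random single-qubit pure states and their images under a
# hypothetical target channel (the unitary Rx(0.8) for simplicity).
target = rx(0.8)
states = [v / np.linalg.norm(v)
          for v in rng.standard_normal((20, 2)) + 1j * rng.standard_normal((20, 2))]
dataset = [(psi, target @ psi) for psi in states]

def mean_fidelity(theta):
    """Average output fidelity |<tau_x| U(theta) |psi_x>|^2 over the training pairs."""
    u = rx(theta)
    return float(np.mean([abs(tau.conj() @ u @ psi) ** 2 for psi, tau in dataset]))

# Gradient ascent on the mean fidelity, mirroring the loop of Algorithm 1;
# finite differences stand in for the quantum backpropagation gradients.
theta, lr, eps = 0.0, 0.5, 1e-4
for step in range(300):
    grad = (mean_fidelity(theta + eps) - mean_fidelity(theta - eps)) / (2 * eps)
    theta += lr * grad
```

The model parameter converges to the target value 0.8 and the mean fidelity approaches one, the toy analogue of the learning curves in Fig. 4.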

TABLE S2. Experimental parameters for training DQNNs.