Local-measurement-based quantum state tomography via neural networks

Quantum state tomography is a daunting challenge of experimental quantum computing, even for moderate system sizes. One way to boost the efficiency of state tomography is via local measurements on reduced density matrices, but reconstructing the full state from them is hard. Here, we present a machine-learning method to recover the ground states of k-local Hamiltonians from the local information alone, where a fully connected neural network is built to fulfill the task with up to seven qubits. In particular, we test the neural network model on a practical dataset: in a 4-qubit nuclear magnetic resonance system, our method reconstructs global states from 2-local information with high accuracy. Our work paves the way towards scalable state tomography in large quantum systems.

QST via RDMs is also a useful tool for characterizing ground states of local Hamiltonians. A many-body Hamiltonian H is called k-local if it can be written as H = Σ_i H_i^(k), where each term H_i^(k) acts non-trivially on at most k particles. In practical situations, we are mainly interested in 2-local Hamiltonians, sometimes with certain interaction patterns, such as nearest-neighbor interactions on a lattice. In this case, the unique ground state of a k-local Hamiltonian is uniquely determined by its k-RDMs [18]. Therefore, for these ground states, one only needs k-local measurements (i.e., k-RDMs) for state tomography.
Although local measurements are efficient, reconstructing the state from the measurement results is, in general, known to be hard [20]. To be more precise, it is easy to obtain the k-RDMs of any n-particle quantum state |ψ⟩. However, even if |ψ⟩ is uniquely determined by its k-RDMs, reconstructing |ψ⟩ from its k-RDMs is computationally hard. We remark that this is not because |ψ⟩ needs to be described by exponentially many parameters; in fact, in many cases, ground states of k-local Hamiltonians can be efficiently represented by tensor product states [21].
The state reconstruction problem naturally connects to the regression problem in supervised learning. Regression analysis, in general, seeks to discover the relation between inputs and outputs, i.e., to recover the underlying mathematical model. Machine learning techniques have been applied to QST in various cases, such as in Refs. [22,23]. In our case, as shown in Fig. 1, knowing the Hamiltonian H, it is relatively easy to obtain the ground state |ψ_H⟩, since the ground state is nothing but the eigenvector corresponding to the smallest eigenvalue. We can then naturally obtain the k-local measurements M of |ψ_H⟩. Therefore, the data for tuning our reverse-engineering model are accessible, which allows us to realize QST through supervised learning in practice.
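The step "the ground state is the eigenvector corresponding to the smallest eigenvalue" can be sketched in a few lines of NumPy (the helper name `ground_state` is ours, introduced only for illustration):

```python
import numpy as np

def ground_state(H):
    """Return the ground state of a Hermitian matrix H, i.e. the
    eigenvector belonging to its smallest eigenvalue."""
    # np.linalg.eigh returns eigenvalues in ascending order for Hermitian input,
    # so the first column of the eigenvector matrix is the ground state.
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvecs[:, 0]

# Toy single-qubit check: H = -sigma_z has ground state |0>
Z = np.diag([1.0, -1.0])
psi = ground_state(-Z)
```

For the 4-qubit Hamiltonians considered here, H is a 16 × 16 Hermitian matrix and the same call applies unchanged.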
In this work, we propose local-measurement-based QST via a fully-connected feedforward neural network, in which every neuron connects to every neuron in the next layer and information only passes forward (i.e., the network has no loops). We first build a fully-connected feedforward neural network for 4-qubit ground states of fully-connected 2-local Hamiltonians. Our trained 4-qubit network not only predicts the test dataset with high fidelity but also reconstructs 4-qubit nuclear magnetic resonance (NMR) experimental states accurately. We use the 4-qubit case to demonstrate the potential of using neural networks to realize QST via RDMs. The versatile framework of neural networks for recovering ground states of k-local Hamiltonians can be extended to more qubits and various interaction structures; we then apply our methods to the ground states of seven-qubit 2-local Hamiltonians with nearest-neighbor couplings and to the ground states of translation-invariant 2-local Hamiltonians with up to 15 qubits. In both cases, the neural networks give accurate predictions with high fidelities.

k-Local Hamiltonians

A. Theory
The universal approximation theorem [24] states that every continuous function on a compact subset of R^n can be approximated by a multi-layer feedforward neural network with a finite number of neurons, i.e., computational units. By observing the relation between a k-local Hamiltonian and the local measurements of its ground state, as shown in Fig. 1, we can turn the tomography problem into a regression problem, which fits perfectly into the neural network framework.
In particular, we first construct a deep neural network for 4-qubit ground states of fully-connected 2-local Hamiltonians of the form

H = Σ_{i=1}^{4} Σ_k h_k^(i) σ_k^(i) + Σ_{i<j} Σ_{n,m} J_{nm}^(ij) σ_n^(i) ⊗ σ_m^(j),    (1)

where σ_k, σ_n, σ_m ∈ Δ = {σ_x, σ_y, σ_z}. We denote the set of Hamiltonian coefficients as h = {h_k^(i), J_nm^(ij)}. The coefficient vector h is the vector representation of H in the basis set B of all one- and two-body Pauli products. The configuration of the ground states is illustrated in Fig. 2a.
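As an illustration of how the 66-element operator basis B and the assembly of H from a coefficient vector h might look in code (a sketch under the assumption Δ = {σ_x, σ_y, σ_z}; all helper names are ours):

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices, Delta = {sigma_x, sigma_y, sigma_z}
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
PAULIS = [X, Y, Z]

def embed(ops, n=4):
    """Tensor 2x2 operators placed at the given qubit indices into an n-qubit operator."""
    mats = [I2] * n
    for q, op in ops:
        mats[q] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def basis_operators():
    """The 66 basis operators for 4 qubits: 12 single-body + 54 two-body terms."""
    B = [embed([(q, P)]) for q in range(4) for P in PAULIS]
    B += [embed([(q1, P1), (q2, P2)])
          for q1, q2 in itertools.combinations(range(4), 2)
          for P1 in PAULIS for P2 in PAULIS]
    return B

def hamiltonian(h):
    """Assemble H = sum_a h_a B_a from the 66-entry coefficient vector h."""
    return sum(c * op for c, op in zip(h, basis_operators()))

H = hamiltonian(np.random.default_rng(0).normal(size=66))
```

Since all coefficients are real and all basis operators Hermitian, the assembled H is Hermitian, as a valid Hamiltonian must be.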
The number of parameters in the local observables M of the ground state determines the number of network input units. Concretely, M = {tr(ρ σ_m^(i)), tr(ρ σ_m^(i) σ_n^(j))}, where σ_m, σ_n ∈ Δ and ρ is the density matrix of the ground state. The input layer has 66 neurons, since the cardinality of the set of measurement results is 66. Our network then contains two fully connected hidden layers, in which every neuron in one layer is connected to every neuron in the next layer. The number of output units equals the number of parameters of our 2-local Hamiltonian, which is also 66 in the 4-qubit case. More details of our neural network can be found in the Methods section.
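Computing the 66-entry measurement vector M from a density matrix can be sketched as follows (a NumPy sketch; the function name `local_measurements` is ours):

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def kron_all(mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def local_measurements(rho, n=4):
    """All one- and two-body Pauli expectation values tr(rho O) of an
    n-qubit density matrix -- the input vector M of the network."""
    paulis = [X, Y, Z]
    obs = []
    for q in range(n):                       # single-body terms: 3n
        for P in paulis:
            mats = [I2] * n; mats[q] = P
            obs.append(kron_all(mats))
    for q1, q2 in itertools.combinations(range(n), 2):   # two-body: C(n,2)*9
        for P1 in paulis:
            for P2 in paulis:
                mats = [I2] * n; mats[q1] = P1; mats[q2] = P2
                obs.append(kron_all(mats))
    return np.array([np.trace(rho @ O).real for O in obs])

# For 4 qubits this yields 3*4 + 6*9 = 66 numbers; for the maximally
# mixed state every non-identity Pauli expectation vanishes.
rho = np.eye(16) / 16
M = local_measurements(rho)
```

Only the entries of M supported on a given pair of qubits are needed from each 2-RDM, so this vector is exactly the information provided by 2-local measurements.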
Our training data consist of 120,000 randomly generated 2-local Hamiltonians (as outputs) together with the local measurements of their corresponding ground states (as inputs). The test data comprise 5,000 pairs of Hamiltonians and local measurement results (H_i, M_i).
We train the network with a popular optimizer in the machine learning community called Adam (Adaptive Moment Estimation) [25,26]. For the loss function, we choose the cosine proximity cos(θ) = (h_pred · h)/(‖h_pred‖ ‖h‖), where h_pred is the prediction of the neural network and h is the desired output. We find that the cosine proximity fits our scenario better than more commonly chosen loss functions such as mean squared error or mean absolute error. The reason can be understood as follows. Note that the parameter vector h is the representation of the corresponding Hamiltonian in the space spanned by the local operators B. The angle θ between two such vectors is a distance measure between the two corresponding Hamiltonians [27].
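The cosine proximity above is a one-liner; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def cosine_proximity(h_pred, h):
    """cos(theta) between predicted and true coefficient vectors.
    Training maximizes this quantity (equivalently, minimizes its negative)."""
    return float(np.dot(h_pred, h) /
                 (np.linalg.norm(h_pred) * np.linalg.norm(h)))

a = np.array([1.0, 2.0, 3.0])
```

Because it depends only on the direction of h, this loss is insensitive to the overall scale of the Hamiltonian, which leaves the ground state unchanged anyway.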
As illustrated in Fig. 1, after getting the predicted Hamiltonian from the neural network, we calculate its ground state ρ_nn and take the result as our prediction of the ground state we attempt to recover. We remark that the predicted Hamiltonian is not necessarily exactly the same as the original Hamiltonian; even when the two differ, our numerical results suggest that their ground states are still close.
We use two different fidelities to measure the distance between the randomly generated states ρ_rd and our neural network predicted states ρ_nn:

f_1(ρ_1, ρ_2) = tr(ρ_1 ρ_2) / √(tr(ρ_1²) tr(ρ_2²)),    (2)

f_2(ρ_1, ρ_2) = [tr √(√ρ_1 ρ_2 √ρ_1)]².    (3)

Although the fidelity measure f_2 defined in Eq. (3) is standard [28], in experiments the measure f_1 is more convenient because it does not require the density matrix ρ to be positive definite; in NMR experiments, the density matrix obtained directly from the raw data of a state tomography experiment may not be positive definite. Thus, we use f_1 for comparisons between any pair among ρ_nn, the theoretical states ρ_th, and the experimental states ρ_ml in the experiment section.
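A minimal NumPy sketch of the two measures, assuming f_1 is a normalized trace overlap (which indeed needs no positive definiteness) and f_2 the standard Uhlmann fidelity [28]; all helper names are ours:

```python
import numpy as np

def psd_sqrt(rho):
    """Matrix square root of a Hermitian positive-semidefinite matrix."""
    w, v = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)      # clip tiny negative eigenvalues from noise
    return (v * np.sqrt(w)) @ v.conj().T

def f1(rho1, rho2):
    """Normalized overlap fidelity; well defined even when a raw
    tomography matrix is not positive definite."""
    num = np.trace(rho1 @ rho2).real
    den = np.sqrt(np.trace(rho1 @ rho1).real * np.trace(rho2 @ rho2).real)
    return num / den

def f2(rho1, rho2):
    """Standard (Uhlmann) fidelity [tr sqrt(sqrt(rho1) rho2 sqrt(rho1))]^2."""
    s = psd_sqrt(rho1)
    return np.trace(psd_sqrt(s @ rho2 @ s)).real ** 2

# Sanity check on a pure state |0><0|
psi = np.array([1.0, 0.0])
rho = np.outer(psi, psi)
```

For identical states both measures return 1; for pure states they coincide with the usual squared overlap.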
After supervised learning on the training data, our neural network is capable of predicting the 4-qubit output of the test set with high performance. The fidelity averaged over the whole test set is 97.5% for f_1 and 98.7% for f_2. The maximum, minimum, and standard deviation of the fidelities on the test set are shown in Table I. Fig. 2c illustrates the two fidelities between 100 random states ρ_rd and our neural network predictions ρ_nn. Our framework generalizes directly to more qubits and different interaction patterns. We apply it to recover 7-qubit ground states of 2-local Hamiltonians with nearest-neighbor interactions. The configuration of our 7-qubit states is shown in Fig. 2b. The Hamiltonian of this 7-qubit case is

H = Σ_{i=1}^{7} Σ_k h_k^(i) σ_k^(i) + Σ_{i=1}^{6} Σ_{n,m} J_{nm}^(i) σ_n^(i) ⊗ σ_m^(i+1),    (4)

where h_k^(i) and J_nm^(i) are coefficients. We trained a similar neural network with 250,000 pairs of randomly generated Hamiltonians and the 2-local measurements of their corresponding ground states. The network predicts the 5,000 randomly generated test instances with fidelity f_1 of 95.9% and fidelity f_2 of 97.9%. Further statistics are shown in Table I, and fidelity results for 100 randomly generated states are shown in Fig. 2d.

B. Experiment
So far, our theoretical model is noise-free. To demonstrate that our trained machine learning model is resilient to experimental noise, we experimentally prepare the ground states of random Hamiltonians and then reconstruct the quantum states from their 2-local RDMs on a four-qubit nuclear magnetic resonance (NMR) platform [29-32]. The four-qubit sample is 13C-labeled trans-crotonic acid dissolved in d6-acetone, where C_1 to C_4 are encoded as the four work qubits and the remaining spin-half nuclei are decoupled throughout all experiments. Fig. 3 describes the parameters and structure of this molecule. Under the weak-coupling approximation, the Hamiltonian of the system reads

H_NMR = Σ_j 2π(ν_j − ν_0) I_z^j + Σ_{j<k} 2π J_jk I_z^j I_z^k,

where ν_j are the chemical shifts, J_jk are the J-coupling strengths, and ν_0 is the reference frequency of the 13C channel in the NMR platform. All experiments were carried out on a Bruker AVANCE 400 MHz spectrometer at room temperature. We briefly describe our three experimental steps here and leave the details to the Methods section: (i) Initialization. The pseudo-pure state |0000⟩ [33-35], which serves as the input of the quantum computation, is prepared. (ii) Evolution. Starting from the state |0000⟩, we create the ground state of a random two-body Hamiltonian by applying optimized shaped pulses. (iii) Measurement. We measure the two-body reduced density matrices and perform four-qubit quantum state tomography (QST) to estimate the quality of our implementations.
In the experiments, we created the ground states of 20 random Hamiltonians of the form in Eq. (1) and performed 4-qubit QST after the state preparations. First, we report that the average fidelity between the experimental states ρ_ml (the subscript ml denotes a standard tomography method, maximum likelihood, rather than machine learning) and the target ground states ρ_th is about 98.2%. Second, we used the 2-RDMs of these density matrices to reconstruct the 4-qubit states with our neural-network-based framework, obtaining an average fidelity f_1(ρ_ml, ρ_nn) of 97.9%, where ρ_nn is the neural network predicted state. Fig. 4 shows the fidelity details for these density matrices. The results indicate that the original 4-qubit state can be efficiently reconstructed by our trained neural network using only 2-RDMs, instead of traditional full QST.

A. Machine Learning
In this subsection, we discuss our training/test dataset generation procedure, the structure and hyperparameters of our neural network, and the amount of training data required during training.
The training and test data sets are formed by random k-local Hamiltonians and the k-local measurements of their corresponding ground states. For our 4-qubit case, the 2-local Hamiltonians are defined in Eq. (1). The parameter vectors h of the random Hamiltonians are drawn from normal distributions with randomly chosen mean values and standard deviations, realized with the function np.random.normal in Python. Similarly, for the 7-qubit case, the Hamiltonian is defined in Eq. (4), and the corresponding parameter vector h is generated by the same method. As shown by the blue dashed lines in Fig. 1, after generating random Hamiltonians H, we calculate the ground states |ψ_H⟩ (the eigenvector corresponding to the smallest eigenvalue of H) and then obtain the 2-local measurements M.
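The sampling step can be sketched as follows; the ranges for the random means and standard deviations below are our illustrative assumption, since the text only specifies that np.random.normal is used (we call it through NumPy's `default_rng` interface):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_coefficient_vector(dim=66):
    """Draw one Hamiltonian coefficient vector h: the mean and standard
    deviation of the normal distribution are themselves drawn at random
    (the ranges here are illustrative, not the paper's)."""
    mean = rng.uniform(-1.0, 1.0)
    std = rng.uniform(0.1, 1.0)
    return rng.normal(loc=mean, scale=std, size=dim)

# One 4-qubit sample; for the 7-qubit case, use dim=75
h = random_coefficient_vector()
```

Each sampled h is then turned into a Hamiltonian, its ground state, and the corresponding 2-local measurement vector to form one (M, h) training pair.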
In this work, we use a fully-connected feedforward neural network, the first and simplest type of neural network [36]. Fully-connected means that every neuron is connected to every neuron in the next layer. Feedforward, or acyclic, means that information only passes forward; the network has no cycles. Our machine learning process is implemented in Keras [37], a high-level deep learning library running on top of the popular machine learning framework TensorFlow [38].
As mentioned in the Results section, experimentally accessible data are used as input to our neural network. For the 4-qubit case, it is easy to see that M has 3 × 4 = 12 single-body terms and C(4,2) × 9 = 54 two-body terms. Arranging these 66 elements of M into a row, we take them as the input of our neural network. The output is set to be the vector representation h of the Hamiltonian, which also has 66 entries. For the 7-qubit 2-local case, where two-body terms only appear on nearest-neighbor qubits, the network takes 2-local measurements as input, and the number of neurons in the input layer is 7 × 3 + 6 × 3 × 3 = 75. The number of neurons in the output layer is also 75.
The physical aspect of our problem fixes the input and output layers. The design of the hidden layers is guided by efficiency. When building the networks, inspired by Occam's razor, we choose fewer layers and neurons whenever increasing them does not significantly improve performance but does increase the required training epochs. In our 4-qubit case, two hidden layers of 300 neurons each are inserted between the input layer and the output layer. In the 7-qubit case, we use four fully-connected hidden layers with 150-300-300-150 hidden neurons. The activation function for each layer is the ReLU (Rectified Linear Unit) [39], a widely used non-linear activation function. We also choose the optimizer with the best performance on our problem among almost all the built-in optimizers in TensorFlow: AdamOptimizer (Adaptive Moment Estimation) [25]. The learning rate is set to 0.001. The training dataset is split into two parts: 80% for training and 20% for validation after each epoch. A new set of 5,000 data points is used as the test set after training. The initial batch size is 512. As the amount of training data increases, the average fidelity between the predicted states and the true test states goes up, and the neural network reaches a plateau once it has been fed sufficient training data. More training data requires more training epochs; however, too many epochs degrade the network's performance due to over-fitting. Table II shows the average fidelities for different amounts of training data and numbers of epochs. The first round of training locks down the optimal amount of training data; we then vary the batch size and find the optimal number of epochs. We report the results of this second round of training in Table III.

Table III: Average fidelities on the test set for different batch sizes. The size of the test dataset is 5,000. The optimal batch size is 1024 for the 4-qubit case and 512 for the 7-qubit case.

For the 4-qubit case, appropriately increasing the batch size benefits the stability of the training process and thus improves the performance of the neural network. Although with batch sizes of 512 and 2048 the network can reach the same performance with more epochs, we chose a batch size of 1024, since more epochs require more training time. After the same exploration for the 7-qubit case, we find that 512 is a suitable batch size.
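The 4-qubit architecture above (66-300-300-66, ReLU hidden layers) can be illustrated, independently of Keras, by a NumPy sketch of the forward pass alone; the class name and the weight initialization scheme are our choices, not the paper's:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class FeedForwardNet:
    """Fully-connected feedforward network with the 4-qubit layout
    66 -> 300 -> 300 -> 66 (ReLU hidden layers, linear output)."""

    def __init__(self, sizes=(66, 300, 300, 66), seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialization (our choice; the paper does not specify one)
        self.weights = [rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b))
                        for a, b in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def predict(self, m):
        """Map a 66-entry measurement vector M to a coefficient vector h_pred."""
        x = np.asarray(m, dtype=float)
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)
        return x @ self.weights[-1] + self.biases[-1]

net = FeedForwardNet()
h_pred = net.predict(np.zeros(66))   # untrained net, zero input
```

For the 7-qubit case the same class would be instantiated with sizes=(75, 150, 300, 300, 150, 75); in practice the training itself is done in Keras with the Adam optimizer as described above.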

B. NMR states preparation
Our experiment procedure consists of three steps: initialization, evolution, and measurement.In this subsection, we discuss these three steps in details.
(i) Initialization. The computational basis state |0⟩^⊗n is usually chosen as the input state for quantum computation. Most quantum systems do not start from such an input state, so a special initialization procedure is necessary before applying quantum circuits. In NMR, the sample initially stays in the Boltzmann distribution at room temperature,

ρ_eq ≈ I/16 + ε Σ_{j=1}^{4} I_z^j,

where I is the 16 × 16 identity matrix and ε ≈ 10^{-5} is the polarization. We cannot directly use it as the input state for quantum computation, because such a thermal state is a highly mixed state [33,40]. We instead create a so-called pseudo-pure state (PPS) from this thermal state using the spatial averaging technique [33-35], which consists of applying local unitary rotations and using z-gradient fields to destroy the unwanted coherences. The 4-qubit PPS can be written as

ρ_0000 = (1 − ε) I/16 + ε |0000⟩⟨0000|.

Here, although the PPS ρ_0000 is also a highly mixed state, the identity part I neither evolves under any unitary operation nor influences the experimental signal. This means that we can focus on the deviation part |0000⟩⟨0000| and consider it as the initial state of our quantum system. Finally, 4-qubit QST was performed to evaluate the quality of our PPS. We found that the fidelity between the perfect pure state |0000⟩ and the experimentally measured PPS is about 98.7% in terms of f_1. This sets a solid ground for the subsequent experiments.
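The PPS construction is easy to check numerically; the sketch below assumes the standard pseudo-pure form ρ_0000 = (1 − ε)I/16 + ε|0000⟩⟨0000| (variable names are ours):

```python
import numpy as np

eps = 1e-5                     # polarization at room temperature
dim = 16                       # 4-qubit Hilbert space dimension

ket0000 = np.zeros(dim)
ket0000[0] = 1.0

# Pseudo-pure state: identity background plus a small pure deviation
rho_pps = (1 - eps) * np.eye(dim) / dim + eps * np.outer(ket0000, ket0000)

# Purity tr(rho^2) stays near 1/16, confirming the state is highly mixed
purity = np.trace(rho_pps @ rho_pps).real
```

The near-identity purity illustrates why only the ε-weighted deviation part |0000⟩⟨0000| carries the observable signal.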
(ii) Evolution. In this step, we prepare the ground states of the given Hamiltonians using optimized pulses. The form of the considered Hamiltonian is chosen as in Eq. (1). In experiments, we create the ground states of different Hamiltonians by randomly changing the Hamiltonian parameters. For a given Hamiltonian, the gradient ascent pulse engineering (GRAPE) algorithm [41-44] is adopted to optimize a radio-frequency (RF) pulse realizing the dynamical evolution from the initial state |0000⟩ to the target ground state. The GRAPE pulses are designed to be robust to static field distributions and RF inhomogeneity, and the simulated fidelity is over 0.99 for each dynamical evolution.
(iii) Measurement. In principle, we only need to measure the two-body reduced density matrices (2-RDMs) to determine the original 4-qubit Hamiltonian through our trained network. Experimentally, after preparing these states we performed full 4-qubit QST, which naturally includes the 2-RDMs [45-47], to evaluate the performance of our implementations. Hence, we can estimate the quality of the experimental implementations by computing the fidelity between the target ground state ρ_th = |ψ_th⟩⟨ψ_th| and the experimentally reconstructed density matrix. Considering that the ground state of the target Hamiltonian should be real, the experimental density matrix should also be real. We use a maximum likelihood approach to reconstruct the most likely pure state ρ_ml [48]. By reconstructing states ρ_nn based only on the experimental 2-RDMs, the performance of our trained neural network can be evaluated by comparing the states ρ_ml with the states ρ_nn.
Finally, we evaluate the confidence of the reported results by analyzing the potential error sources in the experiments. The infidelity of the experimental density matrices is mainly caused by a few primary factors, including decoherence effects, imperfections in the PPS preparation, and imprecision of the optimized pulses. From a theoretical perspective, we numerically simulate the influence of the optimized pulses and the decoherence of our qubits, and compare the fidelity computed in this manner with the ideal case to evaluate the quality of the final density matrix. Numerically, about 0.2% infidelity is created on average by these effects, and about 1.2% error is related to the infidelity of the initial state preparation. Additionally, other sources such as imperfections in the readout pulses and spectral fitting can also contribute to the infidelity.

IV. DISCUSSION
As a famous double-edged sword in experimental quantum computing, QST captures the full information of quantum states on the one hand, while on the other hand its implementation consumes a tremendous amount of resources. Unlike traditional QST, which requires exponentially many experiments as the system size grows, the recent approach of measuring RDMs and reconstructing the full state from them opens up a new avenue to realize experimental QST efficiently. However, there is still an obstacle in this approach: it is in general computationally hard to construct the full quantum state from its local information. This is a typical problem that can be tackled by machine learning. In this work, we apply a neural network model to solve this problem and demonstrate the feasibility of our method with up to seven qubits in simulation. It should be noted that 7-qubit QST in experiments is already a significant challenge on many platforms; the largest QST to date is of 10 qubits in superconducting circuits, where the theoretical state is a GHZ state with a rather simple mathematical form [49]. We further demonstrate that our method works well in a 4-qubit NMR experiment, thus validating its usefulness in practice. We anticipate that this method will be a powerful tool for future QST tasks on many qubits due to its accuracy and convenience.
Our framework can be extended in several ways. First, we can consider excited states. As stated in the Results section, the Hamiltonian recovered by our neural network is not necessarily the original Hamiltonian, but their ground states are fairly close. We preliminarily examined the eigenstates of the predicted Hamiltonians: although the ground states have considerable overlap, the excited states are not close to each other. This means that, in this reverse-engineering problem, ground states are numerically more stable than excited states. To recover excited states with our method, one may need more sophisticated neural networks such as convolutional neural networks [50] (CNN) or residual neural networks [51] (ResNet). Second, although we have not included noise in the training and test data, our network predicts the experimental 4-qubit fully-connected 2-local states with high fidelities. This indicates that our method has a certain degree of error tolerance. For future study, one can add different kinds of noise to the training and test data.

Data Availability. All data and code needed to evaluate the conclusions are available from the corresponding authors upon reasonable request.

Figure 1 :
Figure 1: Procedure of our neural-network-based local quantum state tomography method. As shown by the blue dashed arrows, we first construct the training and test datasets by generating random k-local Hamiltonians H, calculating their ground states |ψ_H⟩, and obtaining the local measurement results M. We then train the neural network with the generated training dataset. After training, as represented by the green arrows, we first obtain the Hamiltonian H from the local measurement results M through the neural network, then recover the ground state from the obtained Hamiltonian. In contrast, the black arrow represents the direction of the normal QST process, which is computationally hard.

Figure 2 :
Figure 2: Theoretical results for 4 qubits and 7 qubits. (a) The configuration of our 4-qubit states: each dot represents a qubit, and every qubit interacts with every other. (b) The configuration of our 7-qubit states: only nearest-neighbor qubits interact. (c) The f_1 and f_2 of 100 random 4-qubit states ρ_rd and our neural network predictions ρ_nn. (d) The f_1 and f_2 of 100 random 7-qubit states ρ_rd and our neural network predictions ρ_nn.

Figure 3 :
Figure 3: The molecular structure and Hamiltonian parameters of the 13C-labeled trans-crotonic acid. The atoms C_1, C_2, C_3, and C_4 are used as the four qubits in the experiment, and the atoms M, H_1, and H_2 are decoupled throughout the experiment. In the table, the chemical shifts with respect to the Larmor frequency and the J-coupling constants (in Hz) are listed as the diagonal and off-diagonal entries, respectively. The relaxation timescales T_2 (in seconds) are shown at the bottom.

Figure 4 :
Figure 4: The prediction results with experimental data. Here we list three different fidelities (f_1, Eq. (2)) for 20 experimental instances. The horizontal axis is the dummy label of the 20 experimental states. The cyan bars, f_ml−th, are the fidelities between the theoretical states ρ_th and the experimental states ρ_ml. The blue triangles, f_ml−nn, are the fidelities between our neural network predictions ρ_nn and the experimental states ρ_ml, with an average fidelity over 97.9%. The green dots, f_nn−th, are the fidelities between our neural network predictions and the theoretical states.

Table I :
The statistical performance of our neural networks for 4-qubit and 7-qubit cases.

Table II :
Average fidelities on the test set for different amounts of training data and numbers of epochs. The batch size is 512, and the size of the test dataset is 5,000. As the amount of training data increases, the average fidelity between the predicted states and the true test states goes up, and the neural network reaches a plateau after sufficient training data. We also observe that more training data requires more training epochs.