Adaptive Quantum State Tomography with Neural Networks

Quantum state tomography is the task of determining an unknown quantum state by making measurements on identical copies of the state. Current algorithms are costly both experimentally, requiring vast numbers of measurements, and computationally, in the time needed to analyze those measurements. In this paper, we address the problem of analysis speed and flexibility, introducing \textit{Neural Adaptive Quantum State Tomography} (NA-QST), a machine-learning-based algorithm for quantum state tomography that adapts measurements and provides orders of magnitude faster processing while retaining state-of-the-art reconstruction accuracy. Our algorithm is inspired by particle swarm optimization and Bayesian particle-filter based adaptive methods, which we extend and enhance using neural networks. The resampling step, in which a bank of candidate solutions (particles) is refined, is in our case learned directly from data, removing the computational bottleneck of standard methods. We successfully replace the Bayesian calculation, which requires computational time of $O(\mathrm{poly}(n))$, with a learned heuristic whose time complexity empirically scales as $O(\log(n))$ with the number of copies measured $n$, while retaining the same reconstruction accuracy. This corresponds to a factor-of-a-million speedup for $10^7$ copies measured. We demonstrate that our algorithm learns to work with basis, symmetric informationally complete (SIC), as well as other types of POVMs. We discuss the value of measurement adaptivity for each POVM type, demonstrating that its effect is significant only for basis POVMs. Our algorithm can be retrained within hours on a single laptop for a two-qubit system, which suggests a feasible time cost when extended to larger systems. It can also adapt to a subset of possible states, a choice of the type of measurement, and other experimental details.


I. INTRODUCTION
Quantum state tomography (QST) is the task of estimating the density matrix of an unknown quantum state, through repeated measurements of the source, assumed to put out identical copies of the state. This procedure can be used to characterize not only quantum states, but also processes acting on quantum states, and is an indispensable subroutine in quantum information processing tasks (see, for example, Ref. [1] for a review). QST is, however, a resource-heavy task, as one needs many measurements on many copies of the quantum state to yield sufficient data for a good estimate of the $d^2 - 1$ real parameters needed to describe the state of a $d$-dimensional quantum system. For estimating quantum processes, the number of parameters needed is $d^4$, a much worse scaling. Much of the research in QST is hence about minimizing the resource cost of getting an estimate with a target precision.
One angle of attack is to use adaptive measurements, adjusting the measurement to be done on the next copy of the state, based on the information gathered from the copies measured so far, to maximize (according to some chosen measure) the information gained from that next copy. One approach was proposed in [2] (under the name self-learning) and generalized in [3] as Adaptive Bayesian Quantum Tomography (ABQT). These methods were later experimentally implemented in [4,5]. In this paper, we propose a neural network-aided alternative to adaptive Bayesian particle-filter based schemes in the vein of ABQT [3]. We call our scheme Neural Adaptive Quantum State Tomography (NA-QST).
At its core, ABQT uses Bayes' rule to update the prior distribution on the quantum state space to the posterior according to the outcome of measurements done so far. Then, the adaptation is done using a merit function computed as an integration over the current posterior distribution. In order to make this integration numerically tractable, the state space is represented by discrete samples, known as particles. Accompanying each sample is a weight that represents its relative likelihood, which is sequentially updated in a Bayesian fashion based on measurement results. The adaptivity comes from the option to choose subsequent measurement configurations to maximize the expected information gain, given the updated posterior distribution.
A practical issue encountered in filtering algorithms is the gradual decay of the vast majority of particle weights, which undercuts the efficacy of the weight updates. In ABQT, and other Bayesian-type tomography procedures, this is a particularly acute problem, since the likelihood function that enters the posterior distribution typically becomes very sharply peaked as the number of copies measured grows. The numerical fix is to resample the bank periodically (i.e., the particles and weights must be chosen anew). As the last step of resampling, [5] proposes to use the Metropolis-Hastings algorithm to mutate the particles, or perturb them isotropically in state space. However, running the Metropolis-Hastings step is very computationally intensive, due to the need to perform a number of computations linear in the number of previous measurement results to determine the acceptance probability. Since the resampling must happen before the next measurements are determined, this significantly prolongs the overall runtime of the adaptive QST algorithm.
In this paper, we demonstrate a custom-built recurrent neural network architecture that learns to perform quantum state tomography directly from data. Our approach automatically develops an efficient resampling and mutation step to refine the candidate solutions, as well as a way of incorporating new measurements into its current best estimate of the state. Our algorithm also learns to predict the suitable next measurement that, if performed, would lead to the most information gained given its current knowledge about the state. We implemented the algorithm in an end-to-end differentiable manner, allowing us to train all of its components jointly.
The efficiency of our approach comes from using machine learning to learn an approximate replacement for the Bayesian update rule, learning new weights on the particles after every batch of measurements. This eliminates the problem of weight decay, and therefore the need for time-costly resampling. Although our weights (unlike in ABQT) no longer have the interpretation of being the posterior distribution, they are just as effective as inputs to the adaptive step: computing a heuristic which is maximized to choose the next measurement. All in all, our adaptive algorithm matches the performance of ABQT, while speeding it up significantly, giving a practical, genuinely on-the-fly adaptive QST scheme. Our NA-QST algorithm enjoys the same scaling of accuracy with the number of measurements as ABQT, a state-of-the-art adaptive algorithm, but with a computational speedup of up to a million (for $10^7$ measurements) when run on the same hardware. Our approach is furthermore agnostic to the number of qubits involved and the type of measurements used, and can be retrained within hours on a single laptop to suit a particular experiment's details. During the training phase, our algorithm runs a simulation of quantum mechanical measurements to substitute for the physical measurement results that are to be provided during the algorithm's deployment after training. Some aspects of our approach can be generalized beyond the field of quantum tomography. We have developed a fully differentiable (in the machine learning sense of the word, i.e., exactly differentiable using existing machine learning libraries) implementation of measurement in quantum mechanics, which more generally can simulate any experiment with probabilistic outcomes in TensorFlow. In addition, a technical takeaway from this work is that when there are disjoint components in an algorithmic pipeline -- in this case, particle filters that pass weights to the objective function optimized to choose the next measurement -- an end-to-end machine learning pipeline can be trained jointly to integrate the steps and have one part of the system producing optimal results for the next. This is of interest to the many areas of physics and engineering to which particle filters have been applied [6,7].
We define here some basic terminology needed for discussing the problem of QST. As mentioned earlier, the task of QST is to estimate the state, i.e., the density matrix, $\rho$, of a quantum system. $\rho$ is then a trace-1, Hermitian operator on the $d$-dimensional Hilbert space $\mathcal{H}$ of the quantum system, represented by a $d \times d$ matrix.
We assume that we have access to a source that puts out independent and identical copies of the (unknown) state $\rho$. We are allowed to make measurements on the state, and from the gathered data, estimate $\rho$. Generalized measurements, also often referred to as positive operator-valued measures (POVMs), are permitted. These are describable as a set of outcome operators $\Pi \equiv \{\Pi_y\}$ on $\mathcal{H}$, satisfying $\Pi_y \geq 0\ \forall y$ and $\sum_y \Pi_y = I$, the $d$-dimensional identity operator.
For a chosen $\Pi$, the probability that one gets a click in the detector corresponding to the outcome $\Pi_y$ is given by Born's rule,
$$p_y = \mathrm{Tr}(\Pi_y\,\rho). \qquad (2)$$
The likelihood for getting data $D$, a sequence of detector clicks summarized by $\{n_y\}$, where $n_y$ is the total number of clicks in the detector for $\Pi_y$, is given by
$$p(D|\rho) = \prod_y p_y^{\,n_y}, \qquad (3)$$
for $p_y$s computed from $\rho$ by Born's rule. $p(D|\rho)$ is a probability distribution over the data, i.e., $\sum_D p(D|\rho) = 1$.
We let $\nu_y \equiv n_y/N$, where $N \equiv \sum_y n_y$ is the total number of copies measured. We refer to the $\nu_y$s as the relative frequencies for the different outcomes $\Pi_y$. The relative frequencies $\nu_y$ are estimates of the probabilities $p_y$, but note that while both $\nu_y$s and $p_y$s are nonnegative numbers that sum (over the label $y$) to 1, $p_y$s that come from a quantum state $\rho$ through Born's rule satisfy additional constraints due to the positivity of $\rho$. As such, setting $p_y = \nu_y$ does not always yield a set of Born probabilities for a valid quantum state, and much of the task of quantum state estimation, in converting data to a valid quantum state, is about dealing with the positivity constraints.
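Born's rule, the likelihood, and the relative frequencies above can be sketched concretely in Python/NumPy. The single-qubit state, POVM, and copy count below are illustrative choices, not values from the paper:

```python
import numpy as np

def born_probabilities(rho, povm):
    """Born's rule: p_y = Tr(Pi_y rho) for each outcome operator Pi_y."""
    return np.real(np.array([np.trace(E @ rho) for E in povm]))

# Illustrative single-qubit example: the computational-basis POVM
# {|0><0|, |1><1|} measured on the state rho = diag(0.7, 0.3).
rho = np.diag([0.7, 0.3]).astype(complex)
povm = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
p = born_probabilities(rho, povm)            # Born probabilities: [0.7, 0.3]

# Simulate measuring N copies: the counts n_y are multinomial, and the
# relative frequencies nu_y = n_y / N estimate the p_y.
rng = np.random.default_rng(0)
N = 10_000
counts = rng.multinomial(N, p)
nu = counts / N

# Log-likelihood of the data: log p(D|rho) = sum_y n_y log p_y.
log_likelihood = np.sum(counts * np.log(p))
```

Note that the simulated $\nu_y$s fluctuate around the $p_y$s; it is exactly this noise that makes setting $p_y = \nu_y$ incompatible, in general, with a valid state.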
In Bayesian estimation procedures, one talks about the prior and the posterior distributions on the quantum state space. The prior distribution captures our initial knowledge about the identity of the state prior to any data-taking. We denote it as $\mathrm{d}\rho\, p(\rho)$, for some suitably chosen volume measure $\mathrm{d}\rho$ on the state space. $p(\rho)$ is the prior density, while $\mathrm{d}\rho\, p(\rho)$ is the infinitesimal probability that the true state lies in the volume $\mathrm{d}\rho$, according to our prior expectations. $p(\rho)$ satisfies $\int \mathrm{d}\rho\, p(\rho) = 1$. The posterior distribution, denoted as $\mathrm{d}\rho\, p(\rho|D)$, represents our updated knowledge after obtaining data $D$. The update from prior to posterior densities follows from Bayes' rule,
$$p(\rho|D) = \frac{p(D|\rho)\,p(\rho)}{p(D)},$$
where $p(D)$ is the likelihood of the data $D$, playing the role of the normalization constant: $p(D) \equiv \int \mathrm{d}\rho\, p(\rho)\, p(D|\rho)$.
Note that the ABQT algorithm relies on the posterior distribution to make decisions about the next measurement to make; in our NA-QST, as we explain below, this posterior distribution is replaced by the choices of weights on the particle samples made by the trained neural network.
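For concreteness, the Bayesian particle-weight update that ABQT performs (and which NA-QST replaces with a learned heuristic) can be sketched as follows. The diagonal toy "particles" and the measurement counts below are invented purely for illustration:

```python
import numpy as np

def born_probabilities(rho, povm):
    """Born's rule: p_y = Tr(Pi_y rho)."""
    return np.real(np.array([np.trace(E @ rho) for E in povm]))

def bayes_update(weights, particles, povm, counts):
    """One Bayesian weight update on a particle bank:
    w_i <- w_i * p(D|rho_i), renormalized. Done in log space for stability."""
    log_w = np.log(weights)
    for i, rho in enumerate(particles):
        p = born_probabilities(rho, povm)
        log_w[i] += np.sum(counts * np.log(np.clip(p, 1e-12, 1.0)))
    log_w -= log_w.max()                     # avoid underflow before exp
    w = np.exp(log_w)
    return w / w.sum()

# Toy bank of three diagonal qubit states and a basis POVM.
povm = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
particles = [np.diag([q, 1 - q]).astype(complex) for q in (0.2, 0.5, 0.8)]
weights = np.full(3, 1 / 3)

counts = np.array([80, 20])                  # data favoring the q = 0.8 particle
weights = bayes_update(weights, particles, povm, counts)
```

Repeating this update with more data concentrates essentially all weight on a few particles, which is exactly the weight-decay problem that forces resampling in ABQT.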

B. Machine learning and neural networks
Machine learning has been successfully used to learn approximate solutions to a wide variety of problems. Neural networks, and deep neural networks in particular, are a class of expressive functional approximators capable of learning complicated models across many domains, ranging from image classification [8], to game playing [9], to natural language understanding [10].
Neural networks are a class of functions that can be used as a functional ansatz in situations where a mapping of inputs to outputs is needed, while an explicit algorithm is either unknown, hard to come by, or expensive to run. By means of gradient descent, neural networks are trained to successfully approximate an unknown function, using a large number of examples and a loss function specifying how unhappy one is with a given solution. In cases where an exact input-output mapping is available but expensive to run, neural networks can develop simpler, faster effective descriptions learned directly from data.
By specifying a particular neural architecture -- a particular choice of neurons, their connections, and the way they interact -- we describe a family of functions. The selection of a single algorithm (i.e., a single choice of the network's free parameters) is achieved via training.
Among widely used architectures are fully-connected neural networks, convolutional neural networks, and recurrent neural networks, each of which encodes priors about the function it is trying to approximate in its structure. This makes learning easier and typically leads to faster convergence to a solution. For very specific tasks (such as game playing), a very particular choice of architecture is typically made to narrow down the space of possible functions enough. In this paper, we use our knowledge of traditional quantum state tomography algorithms, as well as quantum mechanics itself, to construct a suitable custom-built architecture for quantum state tomography.

C. Overview of our neural network algorithm
Quantum state tomography is a complicated procedure, and the adaptive mapping from previous measurement outcomes to the next measurement is highly nonlinear. As such, it is hard to learn simply by feeding the raw stochastic measurement outcomes into a generic fully-connected neural network (FC-NN).
We therefore custom-built a recurrent neural architecture that utilizes our prior knowledge of the problem, encoding it explicitly into the network structure. In order to be able to efficiently optimize our neural network, we write down the full computational graph of the problem in TensorFlow [11] (the most commonly used library for deep learning), which in turn handles analytical differentiation internally. As a crucial part of the process, it was necessary for us to develop a differentiable implementation of quantum mechanics in TensorFlow that serves as a quantum simulator during training, mimicking the real physical measurement outcomes that will be provided by an experimentalist during the algorithm's deployment after training.
To train a neural network, a large number of examples and an objective/loss function are needed to determine the suitable choice of free parameters. In particular, we are making an estimate $\hat\rho$ of a state $\rho$ from a sequence of measurements and their outcomes, given a distance measure on density matrices. The measurements are allowed to be generalized measurements, or POVMs, as introduced earlier. The critical task is as follows: given an input (the relative frequencies of POVM outcomes), and given the POVM, output a good estimate of the state which gave rise to those measurement outcome statistics. To allow for POVM adaptivity, we perform this sequentially, enabling the subsequent estimation step to use the previous step's estimate, as well as auxiliary useful information passed between the steps. The full approach is presented in detail in Algorithm 1 and illustrated in FIG. 1.

Figure 1. The recurrent neural network architecture. A recurrent neural network (RNN) cell takes a previously used bank of candidate solutions (particles) and the best guess for the hidden density matrix as inputs, and outputs a refined bank of particles together with a new estimate of the density matrix and an internally generated confidence value. This process is successively repeated, gradually converging to the correct density matrix. During the training period, inside the RNN unit cell, a simulated quantum mechanical measurement is performed on the hidden density matrix, resulting in a noisy estimate of the probability distribution of measurement (POVM) outcomes. Equivalent distributions are calculated for the density matrices in the particle bank. Based on the L1 (total variation distance) and L2 (Hilbert-Schmidt distance) distances between these probability distributions, a neural network assigns weights as well as perturbation sizes $\varepsilon$ to each particle. The weights are used to construct a new estimate of the hidden density matrix, as well as its internal confidence value. Each particle is perturbed by an amount determined by the perturbation size (see main text for details). For each perturbed particle, and the probability distribution induced on it by the POVM, we calculate various distance measures on probability distributions and let a neural network decide which perturbed particle is the closest to the solution. The closest particle is kept and replaces the initial particle in the bank of particles in the next iteration. Iterating on this procedure, the candidate solutions gradually converge on the true solution. During deployment after training, the quantum measurement simulator is replaced by real physical measurements supplied by an experimenter. After each estimate, a new POVM is proposed based on maximization of a heuristic function related to entropies of the particle bank and the current estimate of the density matrix. This allows the algorithm to learn to maximize the information gained from the following measurement. The whole algorithm consists of a large number of the RNN unit cells connected in series, producing an estimate of the state at each step. The computational graph comprising a series of linked RNN unit cells is end-to-end differentiable, allowing us to train it as a single function using existing machine learning libraries. The detailed description of the algorithm is presented in Algorithm 1.
To generate training examples for the neural network, we prepared random true states to be estimated and performed simulated measurements on these states. The neural network then trains on these examples, eventually learning a sequence of functions that map the input to the output. The learning consists in tuning the weights of the neural network architecture such that the network's output is close to the true state in terms of some state distance measure. We chose the Bures distance; however, training using the simple Hilbert-Schmidt distance (Frobenius norm) works equally well. The training stops when the weights are sufficiently tuned and no further improvement takes place.
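Generating random true states for training can be sketched as below. The paper does not specify its sampling measure over states, so the Ginibre construction (Hilbert-Schmidt ensemble) here is an assumption; the Hilbert-Schmidt distance is the Frobenius-norm loss mentioned above:

```python
import numpy as np

def random_density_matrix(d, rng):
    """A random d-dimensional density matrix rho = G G^dag / Tr(G G^dag),
    with G a complex Ginibre matrix (Hilbert-Schmidt ensemble; an assumed
    choice, as the paper does not specify the measure)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T                     # Hermitian, positive semidefinite
    return rho / np.trace(rho).real          # normalize to unit trace

def hilbert_schmidt_distance(rho, sigma):
    """The L2 (Frobenius-norm) distance used as the training loss."""
    return np.linalg.norm(rho - sigma)

rng = np.random.default_rng(3)
rho = random_density_matrix(4, rng)          # a random 2-qubit state
```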
Our algorithm allows for an iteratively improving estimate of the density matrix with every new batch of measurements. In particular, it can be stopped after any number of measurements, and since on average the distance to the true solution decreases, it provides the best estimate up to that moment. Note that even during training, our algorithm does not have direct access to the true density matrix. The only information about it comes indirectly through the relative frequencies of a finite number of simulated measurements on said state, provided by the quantum simulator we implemented. This mimics the situation during deployment in a laboratory, where the measurement data are provided by the experimenter and the true solution is unknown.
In the testing phase, the network with tuned weights is evaluated on how well it estimates unknown states based on their empirical measurement statistics. This is exactly the same task as in the training phase, but the network can no longer adjust its weights based on the error signal coming from knowledge of the true state. If the network performs well in the testing phase, it is considered well-functioning and can be deployed in the laboratory with its newly tuned weights. Note that there are two notions of learning pertaining to our algorithm: the outer loop of learning chooses the right algorithm, while in the inner loop the algorithm itself learns iteratively about a particular density matrix based on the measurement outcomes provided to it. In essence, we are learning to learn about density matrices.

D. Detailed description of our neural network algorithm
The architecture that we use for the task at hand is broadly known as a Recurrent Neural Network (RNN), which features widely in natural language processing tasks due to its ability to model sequences. The recurrence comes from the fact that the network takes in outputs from previous time (iteration) steps as inputs at the current step, together with any new information available at the time. The network also maintains a memory state that stores some information about the states of the network at previous time steps. While retaining this basic common structure, we have changed the architecture of the unit cell -- the particular operation performed at every step -- completely, in order to make it suitable for working in quantum mechanical settings. We also chose specific relevant information to be passed from cell to cell, such as the bank of refined candidate solutions (particles), as well as a number of previously used POVMs and their corresponding empirical counts.
Figure 1a shows an overall schematic of the neural network. Our algorithm shares a basic structural similarity with ABQT's particle filter: a bank $B$ of particles $\{\rho_i\}_{i\in B}$ -- that is, candidate quantum states -- and associated weights $\{w_i\}_{i\in B}$. As in ABQT, our running estimate for the state is the weighted average of these particles. The key difference is the use of neural networks to determine the weights over the particle bank and the particle resampling step, as well as to weigh solutions from all previous steps together into an estimate. This allows us to bypass time-consuming Bayesian calculations and replace them with simple, yet equally powerful, heuristics learned directly from data. The details of the procedure are presented in Algorithm 1.
Figure 1b shows the inner architecture of each RNN unit cell, which during training includes simulated quantum mechanical measurements. During testing, and while deployed in a lab, the black-box measurements are supplied by the experimenter. After every batch of simulated measurements on the true state, the empirical counts of measurement outcomes are collected. These are then compared to the probability distributions induced by the current POVM $\{\Pi_\gamma\}_\gamma$ on the candidate particles in the bank, $\{p^{(i)}_\gamma = \mathrm{Tr}(\Pi_\gamma \rho_i)\}_\gamma$ for all $i \in B$, via the $L_1$ and $L_2$ distances on probability distributions. Based on these distances, a neural network generates a set of weights $\{w_i\}_{i\in B}$ to associate with the corresponding particles, which are in turn used to produce a state reconstruction. In addition, the optimal perturbation size per particle, $\varepsilon_i$ (for use in the resampling step later), as well as an overall confidence score in the reconstruction, are also generated using independent neural networks, as described in detail in Algorithm 1 and illustrated in Figure 1.
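A minimal sketch of this distances-to-weights step follows. In NA-QST the mapping is produced by a trained network; the softmax stand-in below is purely illustrative, and the particles, POVM, and empirical frequencies are invented:

```python
import numpy as np

def outcome_distribution(rho, povm):
    """Born distribution induced by a POVM on a state."""
    return np.real(np.array([np.trace(E @ rho) for E in povm]))

def distance_features(nu, particles, povm):
    """Per-particle [L1, L2] distances between the particle's Born
    distribution and the empirical frequencies nu."""
    feats = []
    for rho in particles:
        p = outcome_distribution(rho, povm)
        feats.append([np.sum(np.abs(p - nu)), np.linalg.norm(p - nu)])
    return np.array(feats)

def heuristic_weights(feats, scale=10.0):
    """Stand-in for the trained network: a softmax favoring particles whose
    predicted statistics are close to the data (illustrative form only)."""
    scores = -scale * feats.sum(axis=1)
    scores -= scores.max()                   # numerical stability
    w = np.exp(scores)
    return w / w.sum()

povm = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
particles = [np.diag([q, 1 - q]).astype(complex) for q in (0.2, 0.5, 0.8)]
nu = np.array([0.78, 0.22])                  # invented empirical frequencies

w = heuristic_weights(distance_features(nu, particles, povm))
estimate = sum(wi * rho for wi, rho in zip(w, particles))  # convex combination
```

Because the estimate is a convex combination of valid states with normalized weights, it is automatically a valid density matrix.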
Before we go on to describe the technically more involved resampling and measurement adaptivity steps, we offer two comments. First, NA-QST is guaranteed to output a valid density matrix, as its running estimate of the state is always a convex linear combination of the particles -- which are themselves valid states -- with normalized weights. Second, unlike methods that rely on Bayesian weight updates (such as ABQT), our weights do not have the interpretation of being a posterior distribution on the particles. Indeed, the Bayesian, principled method of updating weights comes at a heavy computational price. In ABQT, the posterior distribution forms the basis of the adaptation function (see further details below), and an accurate representation of the posterior distribution requires many particles and an accurate update of weights. However, as mentioned earlier, as one accumulates data in QST, the posterior distribution gets more and more sharply peaked. As a result, in ABQT, most of the weights of an initial bank of particles decay to small values, remaining appreciable for only a few particles.
The numerical fix for this is resampling, which consists in choosing a new set of particles out of the old set with probabilities proportional to their old associated weights, perturbing them slightly, and then assigning them uniform weights. Taking a leaf from this book, NA-QST also periodically resamples particles, but significantly faster and more effectively, as the resampling procedure is learned directly from data. In our case, the resampling is not a mere numerical fix (as we do not suffer from the weight decay problem) but the crucial inferential step, as the refinement of the particle bank allows the algorithm to achieve better precision in the reconstruction. The resampling proceeds as follows: from the previous step, every particle has an associated random perturbation magnitude $\varepsilon_i$, which represents the neural network's estimate of how far the $i$-th particle is from the true state. Each particle, generally a mixed state, is purified using a reference system of equal dimension and perturbed by $\varepsilon_i$ in $N_{\mathrm{resample}}$ randomly generated orthogonal directions according to the expression specified in Algorithm 1 (working with purified states allows more effective perturbation). The perturbed particle candidate is the density matrix obtained after partial-tracing away the reference system ("de-purifying"). Of these $N_{\mathrm{resample}}$ candidates, only one is retained: the one whose Born measurement probabilities are closest to the empirical distribution just measured. This procedure is repeated $N_{\mathrm{steps}}$ times. Overall, all particles are perturbed so that they move closer to the true state; these then make up the new bank. A numerical perspective on this is that we have effectively run an approximation to gradient descent on $\mathrm{distance}(p_{\mathrm{particle}}, p_{\mathrm{empirical}})$, within the larger gradient descent training loop. This nesting is called meta-learning [12], and it is an active area of machine learning research.
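The purify-perturb-de-purify cycle can be sketched as follows. The exact perturbation expression from Algorithm 1 is not reproduced here; the random-direction perturbation below is an illustrative stand-in, and only a single candidate is generated rather than $N_{\mathrm{resample}}$:

```python
import numpy as np

def purify(rho):
    """Purify a d-dim mixed state into a pure state on a d*d system:
    |psi> = sum_k sqrt(lambda_k) |v_k> (x) |k>."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0.0, None)          # clip tiny negative round-off
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for k in range(d):
        psi += np.sqrt(vals[k]) * np.kron(vecs[:, k], np.eye(d)[k])
    return psi

def depurify(psi, d):
    """Partial trace over the reference system: back to a d x d matrix."""
    M = psi.reshape(d, d)                    # system index first, reference second
    return M @ M.conj().T

def perturb(rho, eps, rng):
    """Move the purification by eps in a random direction, renormalize,
    and de-purify. The perturbation formula is an illustrative stand-in
    for the expression in Algorithm 1."""
    d = rho.shape[0]
    psi = purify(rho)
    chi = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
    chi /= np.linalg.norm(chi)
    psi_new = psi + eps * chi
    psi_new /= np.linalg.norm(psi_new)
    return depurify(psi_new, d)

rng = np.random.default_rng(1)
rho = np.diag([0.6, 0.4]).astype(complex)
rho_new = perturb(rho, eps=0.1, rng=rng)     # still a valid density matrix
```

The de-purified output is automatically positive semidefinite with unit trace, which is why perturbing in the purified space is convenient.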
Finally, there is the measurement adaptivity step, given by maximizing the function
$$\mathcal{O}(\alpha) = H\big[\{p_y(\alpha, \bar\rho)\}_y\big] \;-\; \mathbb{E}_{q(\rho|D)}\Big[H\big[\{p_y(\alpha, \rho)\}_y\big]\Big], \qquad (4)$$
where $H[\{p_y\}_y] \equiv -\sum_y p_y \log p_y$ is the Shannon entropy of an outcome distribution. $A$ denotes a class of POVMs considered for the adaptation procedure, and $\alpha$ labels a member of that class. As before, $D$ is the measurement data collected thus far. $p_y$ is the probability from Born's rule [Eq. (2)], written here with the arguments $\alpha$, specifying the POVM used, and the state; $\bar\rho$ symbolizes our current best estimate of the state given the data $D$. $\mathbb{E}_q[\cdot]$ denotes the expectation value of the argument with respect to the distribution $q$. $q(\rho|D)$ abstractly denotes some probability distribution over the continuous state space based on the data $D$; in practice, it is the discrete probability distribution specified by the current bank $B$ of particles $\rho_i$, with their weights $w_i$. We use $\bar\rho \equiv \sum_{i\in B} w_i \rho_i$, which depends on the data through the adjustment of the bank as data is gathered. Note that this is the same function as used in the original ABQT paper [3], except that there $q(\rho|D) \equiv p(\rho|D)$ is the posterior distribution. In NA-QST, the distribution $q$ is determined by the bank chosen by the neural network.
The motivation for choosing this function is explained in the Appendix. In essence, we are looking for a POVM, specified by a set of angles, such that the algorithm's ability to distinguish between particles increases, as well as its ability to measure distances to the true density matrix. We experimented with a large number of approaches, including analytical gradient descent; however, the problem proved to be best solved by a simple approach: choose a random set of POVM angles, calculate the heuristic in Eq. (4) from the previous step, repeat several times (we chose 40 as a good trade-off between accuracy and speed), and keep the POVM that maximizes the heuristic. We experimented with different choices of the heuristic, as well as learning it from data; however, as demonstrated by the empirical experiments summarized in Figure 3, the function in Eq. (4) was simple enough for our random method to work well.
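The random-search maximization can be sketched as follows, assuming the entropy-based information-gain form of Eq. (4). For brevity, the candidate class here is single-qubit basis POVMs drawn from random unitaries, rather than the angle parameterization used in the paper:

```python
import numpy as np

def born(rho, povm):
    return np.real(np.array([np.trace(E @ rho) for E in povm]))

def entropy(p):
    """Shannon entropy of an outcome distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def random_basis_povm(d, rng):
    """A random d-outcome basis POVM: projectors onto the columns of the
    QR factor of a complex Gaussian matrix."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(A)
    return [np.outer(Q[:, y], Q[:, y].conj()) for y in range(d)]

def choose_povm(particles, weights, d, rng, n_candidates=40):
    """Random-search maximization of the heuristic of Eq. (4):
    H[p(alpha, rho_bar)] - sum_i w_i H[p(alpha, rho_i)]."""
    rho_bar = sum(w * rho for w, rho in zip(weights, particles))
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        cand = random_basis_povm(d, rng)
        score = entropy(born(rho_bar, cand)) - sum(
            w * entropy(born(rho, cand)) for w, rho in zip(weights, particles))
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

rng = np.random.default_rng(2)
particles = [np.diag([0.9, 0.1]).astype(complex),
             np.diag([0.1, 0.9]).astype(complex)]
weights = np.array([0.5, 0.5])
povm, score = choose_povm(particles, weights, d=2, rng=rng)
```

By the concavity of the Shannon entropy, the heuristic is nonnegative, and it is large precisely when the candidate POVM separates the particles well.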
We round off this section with some remarks. The basic mechanism employed by NA-QST differs quite significantly from that of ABQT. In proposing new weights and perturbations, the neural network does so according to its own, learned, similarity metric between the particles and the true state, based on the distances between the measurement probabilities of each particle and the empirical measurement statistics. In ABQT, by contrast, the size of the perturbation is chosen from a fixed, approximately Gaussian distribution (see [5]). This is because in ABQT the perturbation merely provides variation to the particles in the bank, whereas our perturbations are expressly chosen to bring the particles as close as possible to the current estimate of the state. Secondly, our algorithm does not merely concatenate the disjoint weight-update/resampling and measurement adaptivity steps; by putting them both in the same neural architecture, the measurement adaptivity heuristic influences training, such that the weights output by the two upstream tasks are well-suited for subsequent input into the adaptivity heuristic.

III. METHODOLOGY
As our main benchmark for the performance of our adaptive neural network based algorithm, we use ABQT, the adaptive algorithm first proposed in [3] and computationally refined in [5]. As mentioned earlier, the core idea of the ABQT algorithm is to keep a bank of samples from the state space and construct a posterior on this bank. The posterior is updated in a Bayesian fashion with every new batch of measurements. Subsequently, the measurement configuration for the next batch is chosen by optimizing the heuristic given by Eq. (4). The final state estimate is the weighted average of this bank.
We reimplemented the ABQT algorithm in the Python 3 programming language to be able to compare the runtime with our neural network algorithm, written in TensorFlow and Python.We verified our implementation against results in the original paper.
Following [5], we consider 2-qubit examples. The POVMs we consider here are product POVMs, i.e., $\Pi \equiv \{\Pi^{(1)}_{y_1} \otimes \Pi^{(2)}_{y_2}\}$, where the $\Pi^{(i)}$ are 1-qubit POVMs. Product POVMs are typically the easiest to implement in experiments. In particular, we assume that $\Pi^{(1)}$ and $\Pi^{(2)}$ are the same type of 1-qubit POVM, although we allow the orientation of the POVM outcome operators for each qubit to be chosen independently (see details below). We follow [5] in using the squared Bures distance between the true state $\rho$ and the estimate $\hat\rho$,
$$D_B^2(\rho, \hat\rho) = 2\left(1 - \sqrt{F(\rho, \hat\rho)}\right),$$
where
$$F(\rho, \sigma) = \left[\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right]^2$$
is the squared-fidelity between two states $\rho$ and $\sigma$. The Bures distance between two states has an operational interpretation as the maximum possible Kullback-Leibler (KL) divergence between the output statistics of the same set of quantum measurements on the two states. Intuitively, a large Bures distance between two states indicates that they are easy to distinguish using quantum measurements, and vice versa [13]. Although the Bures distance was the evaluation metric for reconstruction accuracy at test time, for training we used the Hilbert-Schmidt, or $L_2$, distance as the loss function, for computational stability reasons. The choice of Bures or $L_2$ distance in the loss function had no observable effect on the trained solution; for small distances, both metrics are very similar. In addition, the required analysis runtime on a laptop favors NA-QST by a factor of $10^6$ for $10^7$ measurements: several seconds for our algorithm, compared to roughly a week for ABQT, per reconstruction. Comparing the performance of all our algorithms, we conclude that measurement adaptivity does not make a significant difference in performance when using SIC-POVMs.
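The fidelity and squared Bures distance defined above can be computed directly; the sketch below uses an eigendecomposition-based matrix square root, valid for Hermitian positive semidefinite arguments:

```python
import numpy as np

def sqrtm_psd(A):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)          # clip tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """Squared-fidelity F(rho, sigma) = [Tr sqrt(sqrt(rho) sigma sqrt(rho))]^2."""
    s = sqrtm_psd(rho)
    return np.real(np.trace(sqrtm_psd(s @ sigma @ s))) ** 2

def squared_bures(rho, sigma):
    """Squared Bures distance D_B^2 = 2 (1 - sqrt(F))."""
    return 2.0 * (1.0 - np.sqrt(fidelity(rho, sigma)))
```

As sanity checks, $D_B^2(\rho, \rho) = 0$, and two orthogonal pure states attain the maximum value of 2.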
The parameters of variation that we explored are: 1. Reconstruction algorithm.
Our main benchmark is ABQT, but we also implemented what we will refer to as Standard Quantum State Tomography (see Chapter 8 of [14]). Since this method is not guaranteed to give a valid density matrix, we had to implement an additional projection step, following the algorithm of [15], to project the output of this algorithm onto the set of valid density matrices.

FIG. 4. The accuracy of reconstruction is shown as a function of the number of available measurement outcomes. The shading is a 1σ confidence interval. The SIC (4 legs) and 6-leg POVMs offer virtually identical performance, and the effect of measurement adaptivity on them is negligible or non-existent. This is most likely because they cover the whole space evenly, so the POVM orientation does not significantly affect state distinguishability. The basis POVM with random measurements performs significantly worse than the rest on average, and also has a wider uncertainty range, probably due to the effect of randomness on the correct POVM orientation. By using measurement adaptivity, our algorithm with basis POVMs performs as well as with informationally complete POVMs; the effect of measurement adaptivity seems to be to make the basis POVM informationally complete over several subsequent measurement steps.
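Projections of this kind are commonly done by eigenvalue truncation: diagonalize the (Hermitian, unit-trace) estimate, zero out the most-negative eigenvalues, and redistribute their mass over the rest so the trace stays one. The sketch below is our own rendering of this family of algorithms, offered as an illustration rather than a transcription of [15]:

```python
import numpy as np

def project_to_density_matrix(mu):
    """Project a Hermitian, unit-trace matrix onto the set of valid
    density matrices (PSD, trace one) in Hilbert-Schmidt distance,
    via eigenvalue truncation."""
    evals, evecs = np.linalg.eigh(mu)   # eigenvalues in ascending order
    lam = evals[::-1].copy()            # work in descending order
    d = len(lam)
    acc = 0.0
    i = d
    # Zero out negative eigenvalues, spreading their (negative) mass
    # uniformly over the remaining ones to preserve the trace.
    while i > 0 and lam[i - 1] + acc / i < 0:
        acc += lam[i - 1]
        lam[i - 1] = 0.0
        i -= 1
    lam[:i] += acc / i
    lam = lam[::-1]                     # back to ascending, matching evecs
    return (evecs * lam) @ evecs.conj().T
```

For example, the invalid "state" diag(1.2, −0.2) projects to the valid state diag(1, 0): the negative eigenvalue is clipped and its mass subtracted from the positive one.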
We considered several different types of 1-qubit POVMs: (a) Basis POVMs (i.e., projective measurements). Projective measurements, i.e., measurements of a single basis of states (two states for the qubit), are usually the easiest to implement in any experiment. Unfortunately, a single projective measurement is not informationally complete, i.e., one cannot reconstruct the full quantum state just by measuring a single basis all the time. This hints at a possible advantage of an adaptive procedure, where the orientation of the basis is chosen adaptively as one learns more about the state. This was done in [5], and we explore the example using NA-QST. Here, we examined the performance using 5, 30 and 100 particles in the bank for both the adaptive and random (see below) ABQT. Our neural algorithm used only 100 particles, and we experimented with both random and adaptive measurements. The results are shown in Figure 5.
(b) The one-qubit symmetric informationally complete (SIC) POVM, i.e., the tetrahedron POVM [16]. The tetrahedron POVM is thus named because its four outcomes can be represented in the qubit Bloch sphere as four legs extending from the centre of the sphere to the vertices of a regular tetrahedron. Here, we allow the tetrahedron orientation to be chosen by the algorithm. The results of comparing all algorithms restricted to using this POVM are shown in Figure 2.
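In Bloch-vector form each tetrahedron outcome is E_k = (1/4)(I + a_k · σ), with the a_k pointing at the vertices of a regular tetrahedron. A minimal construction (our own sketch; the orientation here is fixed, whereas the algorithm is free to rotate it):

```python
import numpy as np

# Pauli matrices
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def tetrahedron_povm():
    """Four POVM elements E_k = (1/4)(I + a_k . sigma), where the a_k
    are unit vectors to the vertices of a regular tetrahedron."""
    verts = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]],
                     dtype=float) / np.sqrt(3)
    return [0.25 * (I2 + a[0] * SX + a[1] * SY + a[2] * SZ) for a in verts]

povm = tetrahedron_povm()
# Completeness check: the four elements sum to the identity.
assert np.allclose(sum(povm), I2)
```

The vertices sum to zero, which is exactly why the four elements resolve the identity; each element is rank one with eigenvalues {1/2, 0}, so all are positive semidefinite.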
(c) One-qubit POVMs with three or six legs per subspace, to probe the performance of NA-QST for different POVM types. The orientation of the legs relative to each other is fixed; only the overall orientation is allowed to vary. The comparison of reconstruction accuracy for NA-QST using POVMs with different numbers of measurement outcomes is presented in Figure 4.
Random refers to the possibility of switching off measurement adaptivity -- that is, choosing all measurement POVMs randomly before the start of the experiment, as opposed to choosing them adaptively between sequential measurements. Adaptive measurements are chosen by our algorithm according to Eq. (4) as the suitable choice for the next set of measurements, implicitly maximizing the information gained from it. The results of turning adaptivity on and off for the adaptive algorithms are shown in Figure 5.

Size of particle bank in ABQT.
We varied the size of the bank of particles (5, 30 and 100 particles) and examined the impact on performance. Since in ABQT the bank consists of samples of the posterior distribution used to compute the adaptivity heuristic [Eq. (4)], finer sampling should improve performance, but comes at a computational cost. Our computational power limited us to using no more than 100 particles for ABQT as well as NA-QST.
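The Bayesian bookkeeping behind such a particle bank can be sketched as follows: each particle's weight is multiplied by the Born-rule likelihood of the observed outcome counts under that particle's state, then the weights are renormalized. This is a schematic illustration in our own notation, not the authors' ABQT implementation:

```python
import numpy as np

def born_probs(rho, povm):
    """Outcome probabilities p(y|rho) = Tr(rho E_y)."""
    return np.real(np.array([np.trace(rho @ E) for E in povm]))

def update_weights(weights, particles, povm, counts):
    """Bayesian update of particle-bank weights: multiply each weight by
    the likelihood of the observed outcome counts, then renormalize.
    Works in log space for numerical stability."""
    log_w = np.log(np.asarray(weights, dtype=float))
    for i, rho in enumerate(particles):
        p = born_probs(rho, povm)
        log_w[i] += np.sum(counts * np.log(p + 1e-12))
    w = np.exp(log_w - log_w.max())   # shift by max before exponentiating
    return w / w.sum()
```

After many measured copies the likelihood factor concentrates the weights on particles near the true state, which is what makes the exact posterior update increasingly expensive as the measurement record grows.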

IV. RESULTS
A. When is adaptivity helpful?
Using our algorithm's rapid training capabilities, we can shed some light on when adaptivity is useful, for the variables considered here. In all the comparisons below, our metric of 'efficacy' is the scaling of the reconstruction accuracy (measured by the Bures distance from the estimate ρ̂ to the true ρ) with the number of measurements.
Our first comparison, in Figure 2, fixes the POVM to be a SIC (4 legs per qubit) POVM and examines the performance of various reconstruction algorithms. Five algorithms were compared: 1) Standard Quantum State Tomography; ABQT (as in Refs. [3] and [5]) with 2) adaptively- and 3) randomly-chosen measurement configurations; and NA-QST with 4) adaptively- and 5) randomly-chosen measurement configurations. The performances of all algorithms are effectively identical, indicating that when SIC-POVMs are used, adaptivity yields no advantage over standard tomography, and no benefit over random measurement configurations. The situation is different in Figure 5, where projective measurements (2 legs) are used: projective measurements are much less informative than SIC-POVMs, and so the ability to adapt the POVM significantly improves performance.
Our second comparison, in Figure 4, further explores the effect of POVM type on the performance of the NA-QST algorithm. We ran NA-QST on POVMs with different numbers of legs per qubit: 2 (basis POVMs, subtending a line through the centre of the Bloch sphere), 3 (the trine measurement, a flat equilateral triangle), 4 (SIC-POVMs, a regular tetrahedron), and 6 (the standard six-state POVM measuring the three Pauli operators, a diamond). The performance is effectively identical for the last two, and is significantly worse for basis POVMs without measurement adaptivity. Beyond using an informationally complete POVM (≥ 4 legs), the type of POVM appears to make no difference. The effect of measurement adaptivity on the basis POVM (2 legs) seems to be to effectively turn it into an informationally complete POVM over several measurement steps.
Finally, we offer some numerical evidence to back up our claim that adaptivity is unhelpful for SIC POVMs and beyond: the range of variation of the heuristic function [Eq. (4)], optimized in the adaptive step, is drastically smaller (2-3% of the maximum for SIC-POVMs, compared to about 75% for basis POVMs). As we discuss in the Appendix, this heuristic function can in fact be identified with a quantity known as the accessible information of the ensemble of states in the bank, which is the maximal mutual information between the random variable storing the next POVM measurement result and the label of the bank particles -- a random variable distributed according to the bank weights/posterior distribution. The fact that it does not vary much when the optimization is over the class of SIC-POVMs indicates that the amount of information yielded by SIC-POVMs is almost invariant to their actual orientation.

B. Entropy heuristics for POVMs
To choose the POVM that will maximize the information gained from the next measurement, we investigated a number of heuristics that could guide the selection of the measurement angles, and also attempted to learn such a function from data. We found that what works best is the accessible-information heuristic in Eq. (4). We optimize the POVM angles by sampling them at random and selecting the candidate attaining the highest value of the heuristic. This simple approach performed better than more advanced tools, such as meta-learning, which we also experimented with.
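The random-search optimization of the entropy heuristic f = S(p_mean) − Σ_i w_i S(p_i) (from Algorithm 1) can be sketched as follows. This is a simplified single-qubit illustration, with basis POVMs parametrized by one rotation angle; the actual algorithm optimizes over full two-qubit orientations:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    p = np.asarray(p)
    p = p[p > 1e-12]
    return -np.sum(p * np.log(p))

def basis_povm(theta):
    """Projective 1-qubit measurement in the basis rotated by theta about Y."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    vs = (np.array([c, s], dtype=complex), np.array([-s, c], dtype=complex))
    return [np.outer(v, v.conj()) for v in vs]

def heuristic(povm, particles, weights):
    """f = S(p_mean) - sum_i w_i S(p_i): entropy of the weight-averaged
    outcome distribution minus the average entropy of each particle's
    outcome distribution."""
    probs = np.array([[np.real(np.trace(r @ E)) for E in povm]
                      for r in particles])
    p_mean = np.asarray(weights) @ probs
    return shannon_entropy(p_mean) - np.sum(
        np.asarray(weights) * np.array([shannon_entropy(p) for p in probs]))

def choose_angle(particles, weights, n_candidates=50, seed=0):
    """Random search: evaluate the heuristic at random angles, keep the best."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, np.pi, n_candidates)
    scores = [heuristic(basis_povm(t), particles, weights) for t in thetas]
    return float(thetas[int(np.argmax(scores))])
```

For a bank holding the two states |0⟩⟨0| and |1⟩⟨1| with equal weights, the heuristic peaks at the Z-basis measurement (which distinguishes the particles perfectly) and vanishes for the X-basis measurement (whose outcomes are the same for both), matching the intuition that f rewards distinguishability.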

C. Adaptivity with Basis POVMs: accuracy and runtime
Based on the conclusions of the previous part, we specialize to basis POVMs to showcase our adaptive algorithm in this subsection.We emphasize again that basis POVMs are also the easiest to implement in the laboratory.
FIG. 6. Scaling of reconstruction runtime with the number of copies measured. Our neural algorithm's runtime empirically scales logarithmically with the number of copies measured, while ABQT's empirically scales polynomially (approximately as m^0.7). We ran the comparisons on the same hardware; however, for numbers of measurements > 10^5 we had to extrapolate the ABQT runtimes, since it was taking tens of hours per reconstruction. For 10^7 measurements, our algorithm performs a reconstruction in ≈5 seconds, while ABQT would take approximately a week (extrapolated).

Here we explicitly compare the performance of our NA-QST in terms of reconstruction accuracy to ABQT. Figure 5 shows this comparison, using both random and adaptive measurements. Generally, ABQT improves as more particles are used, since better estimates can be made when the posterior is more finely sampled. With the number of particles fixed at 100 for both NA-QST and ABQT, the performance of NA-QST is comparable to ABQT for both types of measurements. In our experiments, the performance of NA-QST also improves with more particles; for computational reasons (limited memory on a laptop), we could not efficiently explore more than 100 particles for NA-QST. ABQT's performance appears to plateau empirically at 30 particles and above.
Our second comparison examines the scaling of computational runtime with the number of measurements to analyze. Figure 6 shows that ABQT requires prohibitively long runtimes, polynomial in the number of measurements (with exponent ≈ 0.7 according to our experiments), because its resampling step requires computing a full posterior over all previous measurements. By contrast, the runtime of NA-QST appears to be logarithmic in the number of measurements. For 10^7 measurements to analyze, our NA-QST converges to a solution in approximately 5 seconds on a laptop, whereas ABQT would take approximately a week to run.

V. CONCLUSIONS
Overall, the neural-network-based algorithm for quantum state tomography (NA-QST) we developed and ran in Figures 2, 4 and 5 performs comparably to ABQT with both adaptive and random measurements, but with a significant reduction in computational time, as evidenced in Figure 6. This is due to our fast implementation of the resampling step, which avoids the computational bottleneck of computing a full posterior and instead learns an effective heuristic directly from data. In the same amount of time, ABQT can keep track of far fewer samples of the posterior, which significantly worsens its performance. Furthermore, because the resampling and adaptive components of our algorithm are trained jointly with the same loss function, the resampled weights chosen by the neural network are near-optimal choices for the ensemble on which the adaptive heuristic is calculated. This allows rapid adaptation to a particular experiment's needs, as we can retrain within a few hours on a laptop.
Our work is part of an early wave of investigations using neural networks to improve algorithms in experimental physics that rely on some aspect of signal processing and control theory. In quantum tomography specifically, efficacy-enhancing neural networks have also started to gain recognition. The work of [17], which reconstructs a many-body quantum wavefunction using Restricted Boltzmann Machines, is one such example, although it differs both in the machine learning architecture used and in the desired outcome. One advantage of neural networks is their versatility: in this paper we have tackled the estimation of density matrices of fully general states, but if one has prior knowledge about the state to be estimated (for instance, that it is a low-rank state [18,19], or a Choi matrix for a particular class of quantum channels), it is a simple matter to retrain our algorithm only on that specific class of states, allowing it to specialize. A further improvement could be achieved by parameterizing the neural network's output appropriately. Alternatively, one might not be interested in a full description of the state's density matrix, but only in functions of that matrix, such as the entanglement entropy. For these purposes, our algorithm would require only minor modifications and would be quick to retrain.
A hurdle to full quantum tomography of high-dimensional states is the fact that computational complexity grows exponentially in the number of qubits to be estimated. This is unavoidable in any tomography algorithm, since the dimension of the required output itself grows exponentially. However, neural networks and machine learning in general could be the key to achieving a favorable pre-factor for this exponential complexity, which would push the current tractable dimension boundary even higher. Our work shows a very significant runtime speedup, which we intend to use to investigate systems of more than 2 qubits.
In this paper, we demonstrated the power of learning heuristic solutions to computationally complex problems directly from data. By doing so, we were able to develop a quantum state tomography algorithm that achieves orders-of-magnitude faster runtime while retaining state-of-the-art reconstruction accuracy. As a by-product, we implemented a fully differentiable (in the machine learning sense) version of many quantum mechanical tools, which can be reused in other machine learning solutions to quantum mechanical problems. We further showed that measurement adaptivity is of significant help only for basis POVMs, where adaptivity improves the reconstruction accuracy to the level of the informationally complete POVMs. We are planning to publish the core part of our code.
Identifying the label of the particle in the bank with X, the posterior probabilities/weights with p_X(x), and the optimal POVM's measurement outcome γ with Y, the maximal value of Equation 4 is the mutual information evaluated at the maximizing POVM, I_acc(ε). This interpretation, where we associate Alice's ensemble with the particle bank and the experimenter's POVM choice with Bob's next measurement, yields some insight into the interplay of the adaptivity and resampling steps in ABQT: in between Bob's adaptive measurement choices, Alice's ensemble gets refined by posterior updates and resampling. This also answers the question of why the resampling step of NA-QST -- which simply moves particles closer to the true state -- works: refining the ensemble makes the adaptivity heuristic I_acc(ε), calculated on the ensemble, more salient to the state tomography task, yielding more information about the true state.
More concretely, in the setting we described, Alice had randomly chosen one of the states in the ensemble to be the true state, and X is the label of the state. However, in ABQT, there is no obligation for the particle bank to contain the true ρ we wish to estimate (although it may contain nearby states). In fact, generically it does not, unless the ensemble is continuous over all of state space (i.e., infinitely many particles in the bank) and therefore contains ρ by default. Therefore, with the discrete sampling of state space needed to form the ensemble ε in ABQT, the adaptivity heuristic I_acc(ε) is at best an approximation to I_acc(ε̃), where ε̃ is an ensemble that contains the true state. Furthermore, the more coarse-grained the sampling (i.e., the fewer particles in the bank), the poorer the approximation. The closer the particles cluster around the true state (the result of resampling), and the more sharply peaked the weights about the particles closest to the true state (the result of weight updates), the better the approximation.
a) Schematic of the recurrent neural network (RNN). b) One unit cell of the RNN, corresponding to the box labelled "RNN cell" in the panel above.

FIG. 3. The entropy-based heuristic used to choose the optimal POVM angles. The figures show examples of two-dimensional sections of the entropy-based heuristic that we use to choose the POVM angles parametrizing its orientation in the respective qubit subspaces, for a basis POVM (two legs lying in opposite directions for each qubit), a SIC POVM (4 legs in the shape of a tetrahedron for each qubit), and a POVM with 6 legs in the ±X, ±Y and ±Z directions for each qubit. The range of values attained in the sections for the 4- and 6-legged POVMs is negligible compared to the basis POVM. This is because they, unlike the basis POVM, are informationally complete, i.e., they provide a similar amount of information regardless of their orientation. The x and y axes are the X-axis rotation angles for the two qubit subspaces.
(FIG. 6 legend: extrapolation t(m) = 8.4 s × m^0.7.)

H[p_{Y|ε}] − Σ_{x∈X} p_X(x) H[p_{Y|X}(·|x)] = − Σ_y p_{Y|ε}(y) log p_{Y|ε}(y) + Σ_x p_X(x) Σ_y p_{Y|X}(y|x) log p_{Y|X}(y|x).

Using Equation A3 and the fact that p_{Y|ε}(y) = p_Y(y), we recover Equation A4.
Algorithm 1 NA-QST training phase with a quantum simulation
Input: A sequence of T numbers of copies to be measured, M_t. An initial bank of N_bank randomly chosen valid density matrices {ρ_i}, i ∈ bank B. The true density matrix ρ_true.
Output: A sequence of T reconstructions {ρ̂_t} and their associated losses. Overall reconstruction loss to use in training.

for time step t from 0 to T − 1 do
    Generate a random POVM given a specification if operating in random-POVM mode, or use the adaptively chosen POVM coming from the previous step.
    Simulate M_t measurement outcomes on the true density matrix, drawn from p(ρ_true, POVM).
    NN(combined distances, weights) → new weights w_i and guess score.
    Best guess from this step: Σ_i w_i ρ_i → ρ_guess.
    Draw a new bank from the bank at random, based on the weights.
    for step s from 1 to N_steps do
        for each member ρ_i of the new particle bank do
            Purify ρ_i → v_i.
            Generate N_resample random vectors {u}.
            for each random vector u do
                Keep the part o of u orthogonal to v_i. Normalize o/|o| → o.
                Combine with the purified bank particle as v_perturbed,i = (1 − ε_i) v_i + ε_i o.
            end for
            Keep the perturbed ρ with the smallest distance.
        end for
        New bank of particles → bank of particles.
    end for
    From step t we have (ρ_guess, guess score, bank). Using the guesses from steps t' ≤ t, calculate guess weights = softmax(guess scores); ρ̂_t = Σ_{t'≤t} guess_weight_{t'} ρ_guess,t'.
    if operating in adaptive mode then
        Obtain the outcome probabilities p_mean of measuring the POVM on the output guess ρ̂_t.
        For all members i of the particle bank, obtain the outcome probabilities p_i of measuring the POVM on the particle ρ_i.
        Calculate the mixed entropy heuristic f = S(p_mean) − Σ_i w_i S(p_i), where S(p) is the entropy of distribution p.
        Choose the POVM corresponding to the maximum f.
    end if
end for
Loss = Σ_t Loss_t
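The purify-and-perturb move inside the resampling loop can be sketched for a single particle as follows. This is our own minimal reading of the pseudocode above (function names and the ancilla bookkeeping are our choices; ε plays the role of ε_i):

```python
import numpy as np

def purify(rho):
    """Purify a d-dim density matrix into a d*d-dim state vector:
    |v> = sum_k sqrt(lam_k) |k> (x) |phi_k>, with an ancilla of dim d."""
    lam, phi = np.linalg.eigh(rho)
    lam = np.clip(lam, 0.0, None)       # guard against tiny negatives
    d = rho.shape[0]
    v = np.zeros(d * d, dtype=complex)
    for k in range(d):
        e_k = np.zeros(d)
        e_k[k] = 1.0
        v += np.sqrt(lam[k]) * np.kron(e_k, phi[:, k])
    return v

def perturb(v, eps, rng):
    """Move a purification a small step in a random direction orthogonal
    to itself, then renormalize -- the perturbation of Algorithm 1."""
    u = rng.standard_normal(len(v)) + 1j * rng.standard_normal(len(v))
    o = u - (v.conj() @ u) * v          # orthogonal part of u w.r.t. v
    o = o / np.linalg.norm(o)
    w = (1.0 - eps) * v + eps * o
    return w / np.linalg.norm(w)

def to_density(v, d):
    """Trace out the ancilla to recover a valid d-dim density matrix."""
    m = v.reshape(d, d)                 # m[k, j]: ancilla index k, system index j
    return m.T @ m.conj()
```

Because the perturbed purification stays normalized, the resulting matrix is automatically a valid density matrix (unit trace, positive semidefinite), so no projection step is needed after each perturbation.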