Rapid experimental progress in realizing quantum-enhanced technologies places increasing demands on methods for validation and testing. Accordingly, various approaches to augment state and process tomography have recently been proposed. A persistent problem faced by these contemporary approaches is systematic error in state preparation and measurement (SPAM). Such errors are notoriously challenging and inevitable in any experimental realization.1,2,3,4,5,6,7,8,9,10,11 Here we develop a data-driven, deep-learning-based approach to augment state and detector tomography that successfully reduces SPAM errors in experimental quantum-optics data.

Several approaches have previously been developed to circumvent the SPAM problem. One line of thought leads to the so-called randomized benchmarking protocols,2,12,13 which were designed to estimate the quality of quantum gates in the quantum circuit model. The idea is to average the error over a large set of randomly chosen gates, thus effectively minimizing the average influence of SPAM. In its initial form, however, randomized benchmarking only allowed one to estimate an average fidelity for the set of gates, so more elaborate and informative procedures were developed.3,14 Another example is gate set tomography.4,15,16 Therein the experimental apparatus is treated as a black box with external controls allowing for (i) state preparation, (ii) application of gates, and (iii) measurement. These unknown components (i)–(iii) are inferred from the measurement statistics. Both approaches require long sequences of gates and are not suited to the simple prepare-and-measure scenario of quantum communication applications. In such a scenario the experimenter instead faces the task of carefully calibrating the measurement setup, in other words quantum detector tomography,5,6,17 which works reliably if known probe states can be prepared.18,19,20,21

As (imperfect) quantum tomography is a data-driven technique, recent proposals suggest that it can naturally benefit from machine-learning methods. Bayesian models have been used to optimize data collection through adaptive measurements in state reconstruction,7,8,22 process tomography,23 Hamiltonian learning,24 and other problems in the experimental characterization of quantum devices.25 Neural networks have been proposed to facilitate quantum tomography in high dimensions. In such approaches, neural networks of different architectures, such as restricted Boltzmann machines,9,10,26 variational autoencoders,11 and other architectures,27 are used for efficient state reconstruction; notably, a model tackling the more realistic scenario of mixed quantum states has also been proposed.28

Our framework differs significantly: it is based on supervised learning specifically tailored to address SPAM errors. Our method thus compensates for the measurement errors of the specific experimental apparatus employed, as we demonstrate on real experimental data from high-dimensional quantum states of single photons encoded in spatial modes. Our approach builds on the well-known class of noise-filtering techniques in machine learning.

Quantum state estimation is the reconstruction of the density matrix ρ of an unknown quantum state from the outcomes of known measurements.29,30,31 In general, a measurement is characterized by a set of positive operator-valued measures (POVMs) \(\{{{\mathbb{M}}}_{\alpha }\}\), where the index \(\alpha \in {\mathcal{A}}\) labels the different configurations of the experimental apparatus. Given the configuration α, the probability of observing an outcome γ is given by

$${\mathbb{P}}(\gamma | \alpha ,\rho )={\rm{Tr}}\,({M}_{\alpha \gamma }\rho ),\tag{1}$$

where \({M}_{\alpha \gamma }\in {{\mathbb{M}}}_{\alpha }\) are POVM elements, i.e., positive operators satisfying the completeness relation \({\sum }_{\gamma }{M}_{\alpha \gamma }={\mathbb{I}}\). A statistical estimator maps the set of all observed outcomes \({{\mathcal{D}}}_{N}={\{{\gamma }_{n}\}}_{n=1}^{N}\) onto an estimate \(\hat{\rho }\) of the unknown quantum state. Quantum process tomography, a more general concept, refers to protocols for estimating an unknown quantum operation acting on quantum states.32,33 Process tomography uses measurements on a set of known test states \(\{\rho_\alpha\}\) to recover the description of the unknown operation. (See Supplementary Material for a thorough discussion of quantum process tomography and its application to the calibration of the measurement setup.)
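To make the Born rule above concrete, the following minimal sketch (NumPy assumed) builds a rank-one POVM with \(d^2 = 36\) elements in \(d = 6\) and evaluates the outcome probabilities. The construction, which whitens random vectors so that the completeness relation holds exactly, is a hypothetical stand-in: it is generically informationally complete but is not the SIC-POVM used in the experiment, and the function names are illustrative.

```python
import numpy as np


def whitened_rank_one_povm(d, m, rng):
    """Return m rank-one POVM elements M_k = |u_k><u_k| in dimension d.

    Hypothetical stand-in for a SIC-POVM: random complex vectors v_k are
    whitened by S^{-1/2}, with S = sum_k |v_k><v_k|, so sum_k M_k = I exactly.
    """
    v = rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))
    S = np.einsum("ki,kj->ij", v, v.conj())           # sum_k |v_k><v_k|
    w, V = np.linalg.eigh(S)
    S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.conj().T
    u = v @ S_inv_sqrt.T                              # row k is S^{-1/2} v_k
    return np.einsum("ki,kj->kij", u, u.conj())


def born_probabilities(povm, rho):
    """P(gamma | rho) = Tr(M_gamma rho) for every POVM element."""
    return np.real(np.einsum("kij,ji->k", povm, rho))


rng = np.random.default_rng(0)
d = 6
povm = whitened_rank_one_povm(d, d * d, rng)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
probs = born_probabilities(povm, np.outer(psi, psi.conj()))
# Completeness check and total probability of all 36 outcomes.
print(np.allclose(povm.sum(axis=0), np.eye(d)), probs.sum())
```

Whitening keeps each element rank one while enforcing \(\sum_\gamma M_\gamma = \mathbb{I}\), so the probabilities are non-negative and sum to one for any density matrix.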

The reconstruction procedure requires knowledge of the measurement operators \(\{M_{\alpha\gamma}\}\), as well as of the test states \(\{\rho_\alpha\}\) in the case of process tomography. However, both tend to deviate from the experimenter's expectations due to stochastic noise and systematic errors. While stochastic noise can to some extent be suppressed by increasing the sample size, systematic errors are notoriously hard to correct. The only known way to make tomography reliable is to explicitly incorporate these errors into Eq. (1). Thus, trial states and measurements should be considered as acted upon by some SPAM processes, \({\tilde{\rho }}_{\alpha }={\mathcal{R}}({\rho }_{\alpha })\) and \({\tilde{M}}_{\alpha \gamma }={\mathcal{M}}({M}_{\alpha \gamma })\), and the models for these processes should be learned independently from a calibration procedure. Such calibration is essentially tomography in its own right. For example, the reconstruction of measurement operators is known as detector tomography5,6,17,34,35 and requires ideal preparation of calibration states. The most straightforward approach is to calibrate the measurement setup with close-to-ideal, easy-to-prepare test states, or to calibrate the preparation setup with known, close-to-ideal measurements. One may then infer the processes \({\mathcal{R}}\) and/or \({\mathcal{M}}\) explicitly, for example in the form of the corresponding operator elements, and incorporate this knowledge into the reconstruction procedure. Ideally, this procedure produces an estimator free from bias caused by systematic SPAM errors. (See Supplementary Material for a detailed description of this procedure applied to our experiment.)
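The calibration idea can be sketched numerically. Below, a hypothetical systematic error rotates an ideal projector by a small fixed unitary, \(\tilde{M}=UMU^\dagger\), and detector tomography recovers \(\tilde{M}\) by linear inversion from \(d^2\) known, perfectly prepared probe states, using \({\rm Tr}(M\rho)=\sum_{ij}M_{ij}\rho_{ji}\). The error model, the random probes and all names are assumptions for illustration, not the calibration procedure described in the Supplementary Material.

```python
import numpy as np


def small_unitary(d, eps, rng):
    """exp(-i eps H) for a random Hermitian H: a hypothetical systematic error."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w, V = np.linalg.eigh((a + a.conj().T) / 2)
    return V @ np.diag(np.exp(-1j * eps * w)) @ V.conj().T


def detector_tomography(probes, probs):
    """Linear-inversion estimate of a POVM element M from p_a = Tr(M rho_a).

    Uses Tr(M rho) = sum_ij M_ij rho_ji, i.e. a linear system in vec(M) whose
    rows are vec(rho_a^T). The probe states rho_a are assumed perfectly known.
    """
    n_probes, d, _ = probes.shape
    A = np.transpose(probes, (0, 2, 1)).reshape(n_probes, d * d)
    m, *_ = np.linalg.lstsq(A, probs.astype(complex), rcond=None)
    M = m.reshape(d, d)
    return (M + M.conj().T) / 2                        # enforce Hermiticity


rng = np.random.default_rng(1)
d = 3
M_ideal = np.zeros((d, d), dtype=complex)
M_ideal[0, 0] = 1.0                                    # ideal projector |0><0|
U = small_unitary(d, 0.1, rng)
M_tilde = U @ M_ideal @ U.conj().T                     # distorted detector
vecs = rng.normal(size=(d * d, d)) + 1j * rng.normal(size=(d * d, d))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
probes = np.einsum("ai,aj->aij", vecs, vecs.conj())    # d^2 known pure probes
probs = np.real(np.einsum("ij,aji->a", M_tilde, probes))   # what is observed
biased = np.real(np.einsum("ij,aji->a", M_ideal, probes))  # what is assumed
M_est = detector_tomography(probes, probs)
print(np.max(np.abs(probs - biased)), np.max(np.abs(M_est - M_tilde)))
```

The first printed number is the bias incurred by assuming the ideal projector; the second is the residual after calibration, which vanishes here only because the probes are exact and there is no shot noise.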


Given the estimates of raw probabilities inferred from the experimental dataset \(\tilde{{\mathbb{P}}}(\gamma | \alpha ,\tilde{\rho })={\rm{Tr}}\,({\tilde{M}}_{\alpha \gamma }\tilde{\rho }),\) one wants to establish a one-to-one correspondence \(\tilde{{\mathbb{P}}}(\gamma | \alpha ,\tilde{\rho })\leftrightarrow {\mathbb{P}}(\gamma | \alpha ,\rho )\) with the ideal probabilities for the measurement setup free from SPAM errors. We use a deep neural network (DNN) to approximate the map from \(\tilde{{\mathbb{P}}}\) to \({\mathbb{P}}\).
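As a hedged point of comparison only (a swapped-in linear baseline, not the DNN used in this work): in the idealized limit without shot noise, where the distorted measurement \(\{\tilde{M}_\gamma\}\) is still informationally complete, both \(\tilde{\mathbb{P}}\) and \(\mathbb{P}\) are linear images of the same ρ, so the correspondence reduces to a linear map that ordinary least squares can fit from simulated pairs. The unitary error model and the random POVM below are hypothetical. Finite statistics, distortions that break informational completeness, and the requirement of normalized non-negative outputs all depart from this idealization, which is where a flexible nonlinear learner with a softmax output may help.

```python
import numpy as np


def whitened_rank_one_povm(d, rng):
    """d^2 rank-one elements summing to the identity (hypothetical IC POVM)."""
    v = rng.normal(size=(d * d, d)) + 1j * rng.normal(size=(d * d, d))
    w, V = np.linalg.eigh(np.einsum("ki,kj->ij", v, v.conj()))
    u = v @ (V @ np.diag(w ** -0.5) @ V.conj().T).T
    return np.einsum("ki,kj->kij", u, u.conj())


def random_pure_states(n, d, rng):
    """n density matrices of random pure states."""
    psi = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    psi /= np.linalg.norm(psi, axis=1, keepdims=True)
    return np.einsum("ni,nj->nij", psi, psi.conj())


def probabilities(povm, rhos):
    """(N, d^2) array of Tr(M_gamma rho_n)."""
    return np.real(np.einsum("kij,nji->nk", povm, rhos))


def small_unitary(d, eps, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w, V = np.linalg.eigh((a + a.conj().T) / 2)
    return V @ np.diag(np.exp(-1j * eps * w)) @ V.conj().T


rng = np.random.default_rng(2)
d = 6
povm = whitened_rank_one_povm(d, rng)
U = small_unitary(d, 0.2, rng)                        # hypothetical SPAM error
povm_tilde = np.einsum("ab,kbc,dc->kad", U, povm, U.conj())  # U M U^dagger

train, test = random_pure_states(500, d, rng), random_pure_states(100, d, rng)
# Fit P ~ P_tilde @ T by ordinary least squares on noise-free simulated pairs.
T, *_ = np.linalg.lstsq(probabilities(povm_tilde, train),
                        probabilities(povm, train), rcond=None)
err = np.abs(probabilities(povm_tilde, test) @ T - probabilities(povm, test)).max()
print(f"max error on held-out states: {err:.2e}")
```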

To train and test the DNN, we prepare a dataset of N Haar-random pure states \({{\mathcal{D}}}_{N}={\{\left|{\psi }_{i}\right\rangle \}}_{i=1}^{N}\). For a d-dimensional Hilbert space, reconstruction of a Hermitian density matrix with unit trace, which has \(d^2-1\) independent real parameters, requires at least \(d^2\) different measurements. The network is trained on a dataset consisting of \(d^2\times N\) frequencies, experimentally obtained by performing the same \(d^2\) measurements \({\{{\tilde{M}}_{\gamma }\}}_{\gamma=1}^{{d}^{2}}\) for all N states (in our experiments d = 6). These frequencies are fed to the input layer of the feed-forward network, which consists of \(d^2=36\) neurons. Training is performed by minimizing the loss function, defined as the sum of Kullback–Leibler divergences between the ideally expected probabilities \({\{{{\mathbb{P}}}_{\gamma }^{i}\}}_{\gamma=1}^{{d}^{2}}\) and the predicted probabilities \({\{{p}_{\gamma }^{i}\}}_{\gamma=1}^{{d}^{2}}\) at the output layer of the network. The ideal probabilities are calculated for the test states as \({{\mathbb{P}}}_{\gamma }^{i}={\rm{Tr}}\,({M}_{\gamma }{\rho }_{i})\), assuming errorless projectors \(M_\gamma\):
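A minimal sketch of the training targets, assuming NumPy: Haar-random pure states are obtained by normalizing vectors of i.i.d. complex Gaussian amplitudes, and the ideal probabilities \(\mathbb{P}^i_\gamma=\langle\psi_i|M_\gamma|\psi_i\rangle\) are stacked into an \(N\times d^2\) array. The function names are hypothetical, and the computational-basis POVM in the usage lines is a toy (non-informationally-complete) check, not the measurement of the experiment, where the matching inputs are measured frequencies rather than simulated ones.

```python
import numpy as np


def haar_random_pure_states(n, d, rng):
    """n Haar-random state vectors: normalized i.i.d. complex Gaussian vectors."""
    psi = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)


def ideal_probabilities(povm, psis):
    """P[i, gamma] = <psi_i| M_gamma |psi_i> = Tr(M_gamma rho_i) for pure states."""
    return np.real(np.einsum("ni,kij,nj->nk", psis.conj(), povm, psis))


rng = np.random.default_rng(4)
psis = haar_random_pure_states(10_500, 6, rng)        # N = 10,500 as in the text
basis_povm = np.array([np.outer(e, e) for e in np.eye(6)]).astype(complex)
P = ideal_probabilities(basis_povm, psis)             # toy check: P = |psi_k|^2
print(P.shape)
```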

$$L=\sum _{i=1}^{N}{D}_{KL}(\{{{\mathbb{P}}}^{i}\}\,\|\,\{{p}^{i}\})=\sum _{i=1}^{N}\sum _{\gamma =1}^{{d}^{2}}{{\mathbb{P}}}_{\gamma }^{i}\log\left(\frac{{{\mathbb{P}}}_{\gamma }^{i}}{{p}_{\gamma }^{i}}\right).\tag{2}$$
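The loss \(L\) above can be written directly. This NumPy sketch uses the convention \(0\log 0 = 0\) and a small floor on the predictions as a hypothetical numerical safeguard; a deep-learning framework's built-in KL-divergence loss would typically be used in practice.

```python
import numpy as np


def kl_loss(P_ideal, p_pred, eps=1e-12):
    """L = sum_i sum_gamma P^i_gamma log(P^i_gamma / p^i_gamma).

    Rows index states i, columns index outcomes gamma. Terms with
    P^i_gamma = 0 contribute nothing (0 log 0 = 0); eps floors the
    predictions to avoid log(0) (an assumed safeguard).
    """
    P = np.asarray(P_ideal, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(p[mask], eps))))


P = np.array([[0.5, 0.5], [1.0, 0.0]])
p = np.array([[0.5, 0.5], [0.5, 0.5]])
print(kl_loss(P, p))      # only the second state contributes: log 2
```

The divergence is taken with the ideal distribution first, matching the order \(D_{KL}(\{\mathbb{P}^i\}\,\|\,\{p^i\})\), so outcomes that ideally never occur do not penalize the network.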

We tested different neural architectures with different configuration parameters; currently, there are few guidelines on how to find a suitable neural network for a specific problem. In general, deeper architectures are more difficult to train because of their larger number of parameters; the best architecture we have found uses two hidden layers, as shown in Fig. 1. The first hidden layer consists of 400 neurons, whilst the second contains 200. (See Supplementary Material for the DNN architecture and the details of the training process.) To prevent overfitting, we applied dropout between the two hidden layers with a drop probability of 0.2, i.e., at each iteration we randomly drop 20% of the neurons of the first hidden layer, which makes the network more robust to variations. We use a rectified linear unit (ReLU) as the activation function after both hidden layers, while in the final \(d^2\)-dimensional output layer we use a softmax function to transform the predicted values into valid, normalized probability distributions. Following the standard paradigm of statistical learning, we divided our dataset of N = 10,500 states (represented by their density matrix elements) into 7000 states for training, 1500 for validation, and 2000 for testing. The validation set is independent and is used to stop training as soon as the error evaluated on this set stops decreasing.
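The architecture of Fig. 1 can be sketched as a NumPy forward pass (36 → 400 → 200 → 36, ReLU after both hidden layers, dropout with probability 0.2 on the first hidden layer, softmax output), together with the 7000/1500/2000 split. The He-style initialization and inverted-dropout scaling are assumptions, and back-propagation and the optimizer are omitted; the actual training details are given in the Supplementary Material.

```python
import numpy as np


def init_params(rng, sizes=(36, 400, 200, 36)):
    """Gaussian He-style initialization (an assumption, not stated in the text)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, x, rng=None, drop=0.2):
    """36 -> 400 (ReLU) -> dropout -> 200 (ReLU) -> 36 (softmax).

    Dropout with inverted scaling is applied only when rng is given
    (training mode); at evaluation time all neurons are kept.
    """
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = np.maximum(0.0, x @ W1 + b1)
    if rng is not None:
        keep = rng.random(h1.shape) >= drop
        h1 = h1 * keep / (1.0 - drop)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)


def split_indices(n_total, rng, n_train=7000, n_val=1500):
    """Random disjoint train / validation / test index sets."""
    idx = rng.permutation(n_total)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


rng = np.random.default_rng(5)
params = init_params(rng)
x = rng.random((5, 36))
x /= x.sum(axis=1, keepdims=True)                 # stand-in measured frequencies
out_eval = forward(params, x)
out_train = forward(params, x, rng=rng)
train_idx, val_idx, test_idx = split_indices(10_500, rng)
print(out_eval.shape, len(train_idx), len(val_idx), len(test_idx))
```

The softmax guarantees that every output row is a valid probability distribution, which is what allows the predicted \(p^i_\gamma\) to be used directly as input to the likelihood reconstruction.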

Fig. 1: The DNN architecture employed in our experiments.

The input and output layers consist of 36 neurons each, and the two hidden layers of 400 and 200 neurons, respectively. The DNN adjusts its internal parameters to find a function \({\mathcal{F}}:\tilde{{\mathbb{P}}}(\gamma | \alpha ,\tilde{\rho })\to {\mathbb{P}}(\gamma | \alpha ,\rho )\) that maps the experimentally estimated probabilities \(\tilde{{\mathbb{P}}}(\gamma | \alpha ,\tilde{\rho })\), subject to SPAM errors, at the input to the ideal probabilities \({\mathbb{P}}(\gamma | \alpha ,\rho )\) at the output. To achieve this, the network is trained to reduce the Kullback–Leibler divergence between the pairs of distributions. An early stopper is applied to avoid overfitting during the training phase.
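The early-stopping rule can be sketched in plain Python. The text only states that training stops once the validation error stops decreasing; the `patience` and `min_delta` parameters and the class name are hypothetical, with `patience=1` stopping at the first epoch without improvement.

```python
class EarlyStopper:
    """Signal a stop once the validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=1, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


val_losses = [0.9, 0.6, 0.55, 0.56, 0.57]   # hypothetical validation losses
stopper = EarlyStopper()
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}, best validation loss {stopper.best}")
        break
```

In a full training loop one would also keep a copy of the weights from the best epoch and restore them after stopping.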

We fix the set of tomographically complete measurements \(\{{{\mathbb{M}}}_{\alpha }\}={\mathbb{M}}\) to estimate all matrix elements of ρ using Eq. (1) and an appropriate estimator. We assume that our POVM \({\mathbb{M}}\) consists of \(d^2\) one-dimensional projectors \({M}_{\gamma }=\left|{\varphi }_{\gamma }\right\rangle \left\langle {\varphi }_{\gamma }\right|\). These projectors are transformed by systematic SPAM errors into some positive operators \({\tilde{M}}_{\gamma }\). The experimental data consist of frequencies \(f_\gamma = n_\gamma/n\), where \(n_\gamma\) is the number of times outcome γ was observed in a series of n measurements on identically prepared states ρ. For the time being, we assume that all SPAM errors can be attributed to the measurement part of the setup and that state preparation can be performed reliably. This is indeed the case in our experimental implementation (see Supplementary Material).
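Shot noise in the frequencies \(f_\gamma=n_\gamma/n\) can be simulated by multinomial sampling. This NumPy sketch assumes the \(d^2\) outcome probabilities have been normalized into a single distribution (POVM elements summing to the identity); how counts are actually accumulated per hologram in the experiment may differ, and the function names are illustrative. The statistical error of each frequency scales as \(\sqrt{f_\gamma(1-f_\gamma)/n}\), which is the part of the error that larger samples suppress, unlike the systematic SPAM bias.

```python
import numpy as np


def sample_frequencies(probs, n, rng):
    """Return f_gamma = n_gamma / n from n multinomially distributed shots."""
    p = np.clip(np.asarray(probs, dtype=float), 0.0, None)
    p = p / p.sum()                       # guard against rounding before sampling
    counts = rng.multinomial(n, p)
    return counts / n


def shot_noise_std(freqs, n):
    """Binomial standard error sqrt(f (1 - f) / n) of each frequency."""
    f = np.asarray(freqs, dtype=float)
    return np.sqrt(f * (1.0 - f) / n)


rng = np.random.default_rng(6)
probs = rng.dirichlet(np.ones(36))        # hypothetical 36-outcome distribution
f = sample_frequencies(probs, 10_000, rng)
print(np.abs(f - probs).max(), shot_noise_std(f, 10_000).max())
```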

We reconstruct high-dimensional quantum states encoded in the spatial degrees of freedom of photons. The most prominent example of such an encoding uses photonic states carrying orbital angular momentum (OAM),36 which are relevant to numerous experiments in quantum optics and quantum information. However, OAM is only one of the two quantum numbers associated with Laguerre–Gaussian modes: their radial degree of freedom37,38 as well as the full set of Hermite–Gaussian (HG) modes39 offer viable alternatives for increasing the accessible Hilbert-space dimensionality. One difficulty with using the full set of orthogonal modes for encoding is the poor quality of projective measurements. Existing methods to remedy this40 improve the measurement quality at the cost of significantly reduced efficiency. Complex high-dimensional projectors are especially vulnerable to measurement errors, and state-reconstruction fidelities are typically at most ~0.9 in high-dimensional tomographic experiments.41 This provides a challenging experimental scenario for our machine-learning-enhanced methods.

Our experiment is illustrated schematically in Fig. 2. We use phase holograms displayed on a spatial light modulator as spatial-mode transformers. At the preparation stage, an initially Gaussian beam is modulated both in phase and in amplitude into an arbitrary superposition of HG modes, which are chosen as the basis of the Hilbert space. At the detection stage, the beam passes through a phase-only mode-transforming hologram and is focused into a single-mode fiber, which selects a single Gaussian mode. This sequence corresponds to a projective measurement in mode space, where the projector \({\tilde{M}}_{\gamma }\) is determined by the phase hologram. (See Supplementary Material for details of the experimental setup and of the state preparation and detection methods.) In dimension d = 6, we are able to prepare an arbitrary superposition of the HG modes of order \(i+j\le 2\), \(\left|\psi \right\rangle ={\sum }_{i+j\le 2}{c}_{ij}\left|{{\rm{HG}}}_{ij}\right\rangle\). In the measurement stage we used a symmetric informationally complete POVM, which is close to optimal for state reconstruction and can be realized relatively easily for spatial modes.41
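The six basis modes and a random superposition can be enumerated as below, assuming the basis consists of the HG modes with \(i+j\le 2\): one mode of order 0, two of order 1 and three of order 2, giving d = 6. The coefficient sampling and function names are illustrative only.

```python
import numpy as np


def hg_mode_indices(max_order=2):
    """(i, j) labels of all HG_ij modes with i + j <= max_order."""
    return [(i, n - i) for n in range(max_order + 1) for i in range(n, -1, -1)]


def random_hg_superposition(rng, max_order=2):
    """Random normalized coefficients c_ij of |psi> = sum_{i+j<=max_order} c_ij |HG_ij>."""
    modes = hg_mode_indices(max_order)
    c = rng.normal(size=len(modes)) + 1j * rng.normal(size=len(modes))
    c /= np.linalg.norm(c)
    return dict(zip(modes, c))


print(hg_mode_indices())  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```

Grouping modes by order \(i+j\) in this way also makes it easy to apply order-dependent effects such as the Gouy phase.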

Fig. 2: Experimental setup for preparation and measurement of spatial qudit states.

In the generation part, single photons from a heralded source are beam-shaped by a single mode fiber (SMF) and then transformed by a hologram displayed on a spatial light modulator. Analogously, the detection part consists of a hologram corresponding to the chosen detection mode, followed by a single mode fiber and a single photon counter. The hologram in the generation part produces high-quality HG modes with the use of amplitude modulation, while a phase-only hologram at the detection part sacrifices projection quality for efficiency.

We performed state reconstruction using maximum likelihood estimation42 for both the raw experimental data and the DNN-processed data. (See also Supplementary Material for additional information on the spatial probability distributions of the reconstructed states.) In the former case, the log-likelihood function to be maximized with respect to ρ was chosen as \({\mathcal{L}}({f}_{\gamma }^{i}| \rho )\propto {\sum }_{\gamma=1}^{36}{f}_{\gamma }^{i}{\log}\,[{\rm{Tr}}\,({M}_{\gamma }\rho )]\), with frequencies \(f_\gamma^i = n_\gamma^i/n\) and i numbering the test-set states. In the latter case, these frequencies were replaced by the predicted probabilities \(p_\gamma^i\). The results for the estimates \({\hat{\rho }}_{(raw)}^{i}={\rm{argmax}}_\rho\;{\mathcal{L}}({f}_{\gamma }^{i}| \rho )\) and \({\hat{\rho }}_{(nn)}^{i}={\rm{argmax}}_\rho\;{\mathcal{L}}({p}_{\gamma }^{i}| \rho )\), compared with the prepared states \(\left|{\psi }^{i}\right\rangle\), are shown in Fig. 3. The average reconstruction fidelity increases from \(F_{(raw)} = 0.82 \pm 0.05\) to \(F_{(nn)} = 0.91 \pm 0.03\), and this increase is uniform over the entire test set. Similar behavior is observed for the purity. Since we did not force the state to be pure in the reconstruction, the average purity of the estimate is less than unity: \(\pi_{(raw)} = 0.78 \pm 0.07\), whereas \(\pi_{(nn)} = 0.88 \pm 0.04\). If the restriction to pure states is explicitly imposed in the reconstruction procedure, the fidelity increase is even more significant, as shown in Fig. 3(c). In this case the already relatively high fidelity of \(F_{(raw)} = 0.94 \pm 0.03\) increases to \(F_{(nn)} = 0.98 \pm 0.02\), a very high value given the dimensionality of the states.
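A minimal sketch of the maximum-likelihood step, assuming NumPy: the iterative \(R\rho R\) algorithm, one standard way to maximize the log-likelihood above (the text does not specify which maximizer was used), updates \(\rho\leftarrow R\rho R/{\rm Tr}(R\rho R)\) with \(R=\sum_\gamma \big(f_\gamma/{\rm Tr}(M_\gamma\rho)\big)M_\gamma\). Fidelity \(\langle\psi|\hat\rho|\psi\rangle\) and purity \({\rm Tr}\,\hat\rho^2\) follow the definitions in Fig. 3. The whitened random POVM is again a hypothetical stand-in for the SIC-POVM, and the pure-state-constrained reconstruction of Fig. 3(c) is not shown.

```python
import numpy as np


def rank_one_povm(v):
    """Hypothetical IC POVM: whiten rows v_k so that sum_k |u_k><u_k| = I."""
    w, V = np.linalg.eigh(np.einsum("ki,kj->ij", v, v.conj()))
    u = v @ (V @ np.diag(w ** -0.5) @ V.conj().T).T
    return np.einsum("ki,kj->kij", u, u.conj())


def mle_rrr(povm, freqs, n_iter=2000, tol=1e-12):
    """Iterative R rho R estimate maximizing sum_gamma f_gamma log Tr(M_gamma rho)."""
    d = povm.shape[1]
    rho = np.eye(d, dtype=complex) / d                     # maximally mixed start
    for _ in range(n_iter):
        p = np.real(np.einsum("kij,ji->k", povm, rho))     # current Tr(M rho)
        R = np.einsum("k,kij->ij", freqs / np.maximum(p, 1e-15), povm)
        new = R @ rho @ R
        new /= np.trace(new).real
        if np.abs(new - rho).max() < tol:
            return new
        rho = new
    return rho


def fidelity(psi, rho):
    return float(np.real(psi.conj() @ rho @ psi))


def purity(rho):
    return float(np.real(np.trace(rho @ rho)))


rng = np.random.default_rng(3)
d = 6
povm = rank_one_povm(rng.normal(size=(d * d, d)) + 1j * rng.normal(size=(d * d, d)))
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
freqs = np.real(np.einsum("kij,ji->k", povm, np.outer(psi, psi.conj())))  # noise-free data
rho_hat = mle_rrr(povm, freqs)
print(fidelity(psi, rho_hat), purity(rho_hat))
```

The same routine accepts either measured frequencies or network-predicted probabilities, which is how the raw and DNN-processed estimates differ only in their input.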

Fig. 3: Results of experimental state reconstruction with phase-only holograms.

a Fidelity \(F=\left\langle {\psi }^{i}\right|{\hat{\rho }}_{(raw/nn)}^{i}\left|{\psi }^{i}\right\rangle\) of the experimentally reconstructed states with the ideal ones for the 2000 test states, reconstructed from raw data (orange bars) and after neural-network processing of the data (blue bars). b The same for the purity of the reconstructed states, \(\pi ={\rm{Tr}}\,{\hat{\rho }}^{2}\). c Fidelity histogram for the case where the state is constrained to be pure in the reconstruction. The effect of the filtering is clearly visible in the changed histogram shapes: besides the shift towards higher values, which shows the average gain on our experimental data, the reduced FWHM reflects the filtering performed by the neural network.


The results above were obtained after an analytical correction for some known SPAM errors had already been applied. In particular, we explicitly took into account the Gouy phase shifts acquired by modes of different order during propagation (see Supplementary Material). This correction is, however, unnecessary for neural-network post-processing. We also trained the DNN on the experimental dataset without any preprocessing, that is, without introducing any phase correction into the initial data, so that the uncorrected errors act as an additional channel process \({\mathcal{E}}\). In this completely agnostic scenario we achieved an average estimation fidelity of \(F_{(nn)} = 0.81 \pm 0.19\), compared with \(F_{(raw)} = 0.54 \pm 0.12\), a dramatic improvement from a straightforward application of the learning approach.
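The Gouy-phase error can be viewed as a simple unitary channel on the HG coefficients. In the standard paraxial result, HG\(_{ij}\) acquires a Gouy phase \((i+j+1)\arctan(z/z_R)\), so modes of different order pick up different relative phases. The NumPy sketch below (sign convention, geometry and names hypothetical) shows that leaving this phase uncorrected lowers the overlap with the intended state, while applying the inverse phase restores it.

```python
import numpy as np

MODES = [(i, n - i) for n in range(3) for i in range(n, -1, -1)]   # i + j <= 2


def gouy_phase(z, z_r):
    """Fundamental-mode Gouy phase arctan(z / z_R) after propagating a distance z."""
    return np.arctan(z / z_r)


def gouy_channel(coeffs, phi, modes=MODES):
    """Apply the phase (i + j + 1) * phi to each HG_ij coefficient.

    Only the relative part (i + j) * phi affects the state; the sign
    convention is an assumption.
    """
    orders = np.array([i + j for i, j in modes])
    return coeffs * np.exp(1j * (orders + 1) * phi)


c = np.ones(len(MODES), dtype=complex) / np.sqrt(len(MODES))   # equal superposition
phi = gouy_phase(z=1.0, z_r=2.0)                  # hypothetical geometry
c_err = gouy_channel(c, phi)                      # uncorrected propagation
overlap = abs(np.vdot(c, c_err)) ** 2             # fidelity with intended state
c_fixed = gouy_channel(c_err, -phi)               # analytical correction
print(f"fidelity without correction: {overlap:.3f}")
```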

In our experiment the prepared input states were close to pure, and we had no controllable way to systematically change their purity. However, it is precisely pure input states that are most seriously affected by SPAM errors, since for them the outcome probabilities are most sensitive to perturbations of the measurement operators. The presented experimental results therefore illustrate the worst-case performance of our method. We examined the case of mixed states by numerical simulations (details and results are provided in the Supplementary Material) and conclude that DNN post-processing enhances the reconstruction quality for any purity of the input state.
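A simple model illustrates why pure states are the worst case. Assume, hypothetically, that the SPAM error rotates every POVM element by the same small unitary, \(\tilde{M}_\gamma=UM_\gamma U^\dagger\). For \(\rho_\lambda=\lambda|\psi\rangle\langle\psi|+(1-\lambda)\mathbb{I}/d\), the maximally mixed part gives \({\rm Tr}(\tilde{M}_\gamma)/d={\rm Tr}(M_\gamma)/d\), which the error leaves unchanged, so the probability bias scales linearly with λ and is largest for pure states. The NumPy sketch checks this with a hypothetical random POVM; other error models need not scale exactly linearly.

```python
import numpy as np


def whitened_rank_one_povm(d, rng):
    """d^2 rank-one elements summing to the identity (hypothetical IC POVM)."""
    v = rng.normal(size=(d * d, d)) + 1j * rng.normal(size=(d * d, d))
    w, V = np.linalg.eigh(np.einsum("ki,kj->ij", v, v.conj()))
    u = v @ (V @ np.diag(w ** -0.5) @ V.conj().T).T
    return np.einsum("ki,kj->kij", u, u.conj())


def small_unitary(d, eps, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w, V = np.linalg.eigh((a + a.conj().T) / 2)
    return V @ np.diag(np.exp(-1j * eps * w)) @ V.conj().T


def bias_tv(povm, povm_tilde, rho):
    """Total-variation distance between ideal and distorted outcome distributions."""
    p = np.real(np.einsum("kij,ji->k", povm, rho))
    q = np.real(np.einsum("kij,ji->k", povm_tilde, rho))
    return 0.5 * np.abs(p - q).sum()


rng = np.random.default_rng(9)
d = 6
povm = whitened_rank_one_povm(d, rng)
U = small_unitary(d, 0.2, rng)                        # hypothetical SPAM error
povm_tilde = np.einsum("ab,kbc,dc->kad", U, povm, U.conj())
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
pure = np.outer(psi, psi.conj())
tv = {lam: bias_tv(povm, povm_tilde, lam * pure + (1 - lam) * np.eye(d) / d)
      for lam in (1.0, 0.5, 0.0)}
print(tv)
```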

DNN processing can be straightforwardly generalized to process tomography, since the latter can always be formulated as state tomography in a higher-dimensional space via the Choi–Jamiołkowski isomorphism. It may prove a valuable tool in cases where randomized benchmarking and similar protocols cannot be applied, such as channel testing for quantum communication.
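The Choi–Jamiołkowski mapping can be sketched as below (NumPy assumed): a channel with Kraus operators \(K_k\) on a d-dimensional system maps to the \(d^2\)-dimensional state \(J=\frac{1}{d}\sum_{ij}|i\rangle\langle j|\otimes\mathcal{E}(|i\rangle\langle j|)\), which could then be reconstructed by state tomography; for d = 6 this would be a 36-dimensional state. The qubit amplitude-damping channel and its rate are illustrative choices only.

```python
import numpy as np


def choi_state(kraus):
    """Normalized Choi state J = (1/d) sum_ij |i><j| (x) E(|i><j|).

    `kraus` is a list of (d, d) Kraus operators with sum_k K^dag K = I.
    """
    d = kraus[0].shape[0]
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E_ij = np.zeros((d, d), dtype=complex)
            E_ij[i, j] = 1.0
            out = sum(K @ E_ij @ K.conj().T for K in kraus)   # E(|i><j|)
            J += np.kron(E_ij, out)
    return J / d


gamma = 0.3                                          # illustrative damping rate
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
J = choi_state(kraus)
print(np.trace(J).real, np.linalg.eigvalsh(J).min())
```

Unit trace and positivity make \(J\) a valid density matrix, and tracing out the output factor always returns \(\mathbb{I}/d\) for a trace-preserving channel.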

Although the number of neurons in the hidden layers is quite large in the current realization, the training procedure is still fast, and we do not expect this to be an issue for reasonable applications. After all, our method is specifically designed for full state tomography, which is itself limited to rather small Hilbert-space dimensions by the fast growth of the number of required measurements. Optimization of the DNN architecture and applications to protocols providing partial information about the state are interesting directions for future work.
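For scale, the number of trainable parameters of a fully connected network can be counted directly. The sketch below assumes, hypothetically, that the hidden widths stay at 400 and 200 while the input and output layers track the \(d^2\) measurements; for the d = 6 network of Fig. 1 it gives 102,236 weights and biases.

```python
def dense_param_count(sizes):
    """Weights plus biases of a fully connected network with layer widths `sizes`."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


for d in (2, 4, 6, 8):
    n_meas = d * d                                    # d^2 measurement settings
    # Hidden widths fixed at 400 and 200 is a hypothetical assumption for d != 6.
    print(d, n_meas, dense_param_count((n_meas, 400, 200, n_meas)))
```

With these assumptions the hidden-to-hidden block (80,200 parameters) dominates at small d, while the number of measurements grows as \(d^2\).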

To conclude, our results unambiguously demonstrate that applying a neural-network architecture to experimental data can provide a reliable tool for quantum state and detector tomography.