Abstract
Quantum tomography is currently ubiquitous for testing any implementation of a quantum information processing device. Various sophisticated procedures for state and process reconstruction from measured data are well developed and benefit from precise knowledge of the model describing the state-preparation-and-measurement (SPAM) apparatus. However, physical models suffer from intrinsic limitations, as actual measurement operators and trial states cannot be known precisely. This inevitably leads to SPAM errors that degrade reconstruction performance. Here we develop a framework based on machine learning which applies to both the tomography and the SPAM mitigation problem, and we implement our method experimentally. We trained a supervised neural network to filter the experimental data and hence uncovered salient patterns that characterize the measurement probabilities for the original state and the ideal experimental apparatus free from SPAM errors. We compared the neural network state reconstruction protocol with a protocol treating SPAM errors by process tomography, as well as with a SPAM-agnostic protocol with idealized measurements. The average reconstruction fidelity is shown to be enhanced by 10% and 27%, respectively. The presented methods apply to the vast range of quantum experiments which rely on tomography.
Introduction
Rapid experimental progress in realizing quantum-enhanced technologies places an increased demand on methods for validation and testing. As such, various approaches to augment state and process tomography have recently been proposed. A persistent problem faced by these contemporary approaches is systematic errors in state preparation and measurement (SPAM). Such notoriously challenging errors are inevitable in any experimental realization.^{1,2,3,4,5,6,7,8,9,10,11} Here we develop a data-driven, deep-learning-based approach to augment state and detector tomography that successfully minimizes SPAM error on quantum optics experimental data.
Several prior approaches have been developed to circumvent the SPAM problem. One line of thought leads to the so-called randomized benchmarking protocols,^{2,12,13} which were designed for quality estimation of quantum gates in the quantum circuit model. The idea is to average the error over a large set of randomly chosen gates, thus effectively minimizing the average influence of SPAM. Randomized benchmarking in its initial form, however, only allowed estimation of an average fidelity for the set of gates, so more elaborate and informative procedures were developed.^{3,14} Another example is gate set tomography.^{4,15,16} Therein the experimental apparatus is treated as a black box with external controls allowing for (i) state preparation, (ii) application of gates, and (iii) measurement. These unknown components (i)–(iii) are inferred from measurement statistics. Both approaches require long sequences of gates and are not suited for a simple prepare-and-measure scenario in quantum communication applications. Indeed, in such a scenario the experimenter faces careful calibration of the measurement setup, or in other words quantum detector tomography,^{5,6,17} which works reliably if known probe states can be prepared.^{18,19,20,21}
As (imperfect) quantum tomography is a data-driven technique, recent proposals suggest a natural benefit offered by machine-learning methods. Bayesian models were used to optimize the data collection process by adaptive measurements in state reconstruction,^{7,8,22} process tomography,^{23} Hamiltonian learning,^{24} and other problems in experimental characterization of quantum devices.^{25} Neural networks were proposed to facilitate quantum tomography in high dimensions. In such approaches neural networks of different architectures, such as restricted Boltzmann machines,^{9,10,26} variational autoencoders,^{11} and other architectures^{27} are used for efficient state reconstruction; interestingly, a model for tackling a more realistic scenario of mixed quantum states has been proposed.^{28}
Our framework differs significantly and is based on supervised learning, specifically tailored to address SPAM errors. Our method hence compensates for measurement errors of the specific experimental apparatus employed, as we demonstrate on real experimental data from high-dimensional quantum states of single photons encoded in spatial modes. The success of our approach builds on the well-known noise filtering class of techniques in machine learning.
Performing quantum state estimation implies the reconstruction of the density matrix ρ of an unknown quantum state given the outcomes of known measurements.^{29,30,31} In general, a measurement is characterized by a set of positive operator-valued measures (POVMs) \(\{{{\mathbb{M}}}_{\alpha }\}\), where the index \(\alpha \in {\mathcal{A}}\) labels the different configurations of the experimental apparatus (set \({\mathcal{A}}\)). Given the configuration α, the probability of observing an outcome γ is given by
\[{\mathbb{P}}(\gamma | \alpha ,\rho )={\rm{Tr}}\,({M}_{\alpha \gamma }\rho ), \tag{1}\]
where \({M}_{\alpha \gamma }\in {{\mathbb{M}}}_{\alpha }\) are POVM elements, i.e., positive operators satisfying the completeness relation \({\sum }_{\gamma }{M}_{\alpha \gamma }={\mathbb{I}}\). A statistical estimator maps the set of all observed outcomes \({{\mathcal{D}}}_{N}={\{{\gamma }_{n}\}}_{n=1}^{N}\) onto an estimate \(\hat{\rho }\) of the unknown quantum state. The more general concept of quantum process tomography denotes a protocol for estimating an unknown quantum operation acting on quantum states.^{32,33} Process tomography uses measurements on a set of known test states {ρ_{α}} to recover the description of an unknown operation. (See Supplementary Material for a thorough discussion of quantum process tomography and its application to calibration of the measurement setup.)
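As a minimal numerical illustration of the definitions above, the following sketch evaluates outcome probabilities as the trace of each POVM element times the state and checks the completeness relation. The qubit measurement and the helper name `born_probabilities` are assumptions chosen for brevity; they are not the d = 6 setting of the experiment.

```python
import numpy as np

def born_probabilities(povm, rho):
    """Return Tr(M_gamma rho) for every POVM element M_gamma."""
    return np.array([np.real(np.trace(M @ rho)) for M in povm])

# Toy qubit example (hypothetical, for illustration only):
# a projective measurement in the computational basis.
ket0 = np.array([1.0, 0.0], dtype=complex)
ket1 = np.array([0.0, 1.0], dtype=complex)
povm = [np.outer(ket0, ket0.conj()), np.outer(ket1, ket1.conj())]

# Completeness relation: the POVM elements sum to the identity.
assert np.allclose(sum(povm), np.eye(2))

# The state |+> = (|0> + |1>)/sqrt(2) written as a density matrix.
plus = (ket0 + ket1) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

probs = born_probabilities(povm, rho)
print(probs)
```

Because the elements sum to the identity and the state has unit trace, the returned probabilities always sum to one.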
The reconstruction procedure requires knowledge of the measurement operators {M_{αγ}}, as well as the test states {ρ_{α}} in the case of process tomography. However, both tend to deviate from the experimenter’s expectations due to stochastic noise and systematic errors. While stochastic noise may to some extent be circumvented by increasing the sample size, systematic errors are notoriously hard to correct. The only known way to make tomography reliable is to explicitly incorporate these errors in Eq. (1). Thus, trial states and measurements should be considered as acted upon by some SPAM processes: \({\tilde{\rho }}_{\alpha }={\mathcal{R}}({\rho }_{\alpha })\) and \({\tilde{M}}_{\alpha \gamma }={\mathcal{M}}({M}_{\alpha \gamma })\), and the models for these processes should be learned independently from a calibration procedure. Such calibration is essentially tomography in its own right. For example, the reconstruction of measurement operators is known as detector tomography^{5,6,17,34,35} and requires ideal preparation of calibration states. The most straightforward approach is calibration of the measurement setup with some close-to-ideal and easy-to-prepare test states, or calibration of the preparation setup with known and close-to-ideal measurements. In this case, one may then infer the processes \({\mathcal{R}}\) and/or \({\mathcal{M}}\) explicitly, for example in the form of the corresponding operator elements, and incorporate this knowledge in the reconstruction procedure. Ideally, this procedure should produce an estimator free from bias caused by systematic SPAM errors. (See Supplementary Material for a detailed description of this procedure applied to our experiment.)
Results
Given the estimates of raw probabilities inferred from the experimental dataset \(\tilde{{\mathbb{P}}}(\gamma | \alpha ,\tilde{\rho })={\rm{Tr}}\,({\tilde{M}}_{\alpha \gamma }\tilde{\rho }),\) one wants to establish a one-to-one correspondence \(\tilde{{\mathbb{P}}}(\gamma | \alpha ,\tilde{\rho })\leftrightarrow {\mathbb{P}}(\gamma | \alpha ,\rho )\) with the ideal probabilities for the measurement setup free from SPAM errors. We use a deep neural network (DNN) to approximate the map from \(\tilde{{\mathbb{P}}}\) to \({\mathbb{P}}\).
To train and test the DNN we prepare a dataset of N Haar-random pure states \({{\mathcal{D}}}_{N}={\{\left|{\psi }_{i}\right\rangle \}}_{i=1}^{N}\). For a d-dimensional Hilbert space, reconstruction of a Hermitian density matrix with unit trace requires at least d^{2} different measurements. The network is trained on the dataset, consisting of d^{2} × N frequencies experimentally obtained by performing the same d^{2} measurements \({\{{\tilde{M}}_{\gamma }\}}_{\gamma=1}^{{d}^{2}}\) for all N states (in our experiments d = 6). These frequencies are fed to the input layer of the feed-forward network, which consists of d^{2} = 36 neurons. Training is performed by minimization of the loss function, defined as the sum of Kullback–Leibler divergences between the distributions of predicted probabilities \({\{{p}_{\gamma }^{i}\}}_{\gamma=1}^{{d}^{2}}\) at the output layer of the network and the ideally expected probabilities \({\{{{\mathbb{P}}}_{\gamma }^{i}\}}_{\gamma=1}^{{d}^{2}}\), which are calculated for the test states as \({{\mathbb{P}}}_{\gamma }^{i}={\rm{Tr}}\,({M}_{\gamma }{\rho }_{i})\) assuming errorless projectors M_{γ}:
\[L=\sum _{i=1}^{N}\sum _{\gamma=1}^{{d}^{2}}{{\mathbb{P}}}_{\gamma }^{i}\,{\log}\,\frac{{{\mathbb{P}}}_{\gamma }^{i}}{{p}_{\gamma }^{i}}. \tag{2}\]
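The two ingredients of this paragraph, Haar-random pure states and a Kullback–Leibler loss summed over states, can be sketched as follows. This is only an illustration under stated assumptions: the measurement vectors `phis` are random unit vectors standing in for the experimental projectors, each row of ideal probabilities is renormalized to sum to one, and the divergence is taken from the ideal distribution to the predicted one.

```python
import numpy as np

def haar_random_state(d, rng):
    """Sample a Haar-random pure state as a normalized complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def kl_loss(ideal, predicted, eps=1e-12):
    """Sum over states i and outcomes gamma of P * log(P / p)."""
    ideal = np.clip(ideal, eps, None)
    predicted = np.clip(predicted, eps, None)
    return float(np.sum(ideal * np.log(ideal / predicted)))

rng = np.random.default_rng(0)
d, N = 6, 5  # d = 6 as in the text; N is tiny here for illustration

# Hypothetical stand-in for the d^2 rank-one projectors |phi><phi|.
phis = [haar_random_state(d, rng) for _ in range(d * d)]
states = [haar_random_state(d, rng) for _ in range(N)]

def ideal_distribution(psi):
    # Tr(M_gamma rho_i) for rank-one M_gamma and pure rho_i is |<phi|psi>|^2.
    p = np.array([abs(np.vdot(phi, psi)) ** 2 for phi in phis])
    return p / p.sum()  # assumption: renormalize to a probability distribution

ideal = np.array([ideal_distribution(psi) for psi in states])
uniform = np.full_like(ideal, 1.0 / d**2)

print(kl_loss(ideal, ideal), kl_loss(ideal, uniform))
```

The loss vanishes when the prediction matches the ideal distribution and is strictly positive otherwise, which is what makes it usable as a training objective.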
We tested different neural architectures with different configuration parameters; currently, there are few guidelines that explain how to find a suitable neural network to solve a specific problem. In general, deeper architectures are more difficult to train due to the increasing number of parameters; the best architecture we have found uses two hidden layers, as shown in Fig. 1. The first hidden layer is chosen to consist of 400 neurons, whilst the second contains 200. (See Supplementary Material for the DNN architecture and the details of the training process.) To prevent overfitting we applied dropout between the two hidden layers with drop probability equal to 0.2, i.e., at each iteration we randomly drop 20% of the neurons of the first hidden layer so that the network becomes more robust to variations. We use a rectified linear unit as an activation function after both hidden layers, while in the final d^{2}-dimensional output layer we use a softmax function to transform the predicted values to valid normalized probability distributions. Following the standard paradigm of statistical learning, we divided our dataset of overall N = 10,500 states (represented by their density matrix elements) into 7000 states for training, 1500 states for validation, and 2000 for testing. The validation set is an independent set and is used to stop the network training as soon as the error evaluated for this set stops decreasing.
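The layer sizes, activations, dropout, and data split described above can be sketched as a plain NumPy forward pass. This is not the authors' implementation (their code is in the repository listed under Code availability): the weights here are untrained and randomly initialized, and the initialization scale and the inverted-dropout rescaling are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [36, 400, 200, 36]  # input, two hidden layers, output (d^2 = 36)

# Untrained parameters; the He-style scaling is an assumption.
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(freqs, train=False, drop_p=0.2):
    """Map a batch of 36 measured frequencies to 36 predicted probabilities."""
    h1 = np.maximum(freqs @ weights[0] + biases[0], 0.0)  # ReLU
    if train:
        # Drop 20% of first-hidden-layer neurons; rescale the survivors.
        mask = rng.random(h1.shape) >= drop_p
        h1 = h1 * mask / (1.0 - drop_p)
    h2 = np.maximum(h1 @ weights[1] + biases[1], 0.0)  # ReLU
    return softmax(h2 @ weights[2] + biases[2])

# 7000 / 1500 / 2000 split of the 10,500 states by shuffled index.
perm = rng.permutation(10_500)
train_idx, val_idx, test_idx = perm[:7000], perm[7000:8500], perm[8500:]

# A dummy batch of four frequency vectors, each summing to one.
batch = rng.dirichlet(np.ones(36), size=4)
out = forward(batch)
out_train = forward(batch, train=True)
```

The softmax output guarantees that every prediction is a valid probability distribution, regardless of the weights, so the result can be passed directly to a likelihood-based estimator.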
We fix the set of tomographically complete measurements \(\{{{\mathbb{M}}}_{\alpha }\}={\mathbb{M}}\) to estimate all matrix elements of ρ using Eq. (1) and an appropriate estimator. We will assume that our POVM \({\mathbb{M}}\) consists of d^{2} one-dimensional projectors \({M}_{\gamma }=\left|{\varphi }_{\gamma }\right\rangle \left\langle {\varphi }_{\gamma }\right|\). These projectors are transformed by systematic SPAM errors into some positive operators \({\tilde{M}}_{\gamma }\). Experimental data consist of frequencies f_{γ} = n_{γ}∕n, where n_{γ} is the number of times an outcome γ was observed in a series of n measurements with identically prepared state ρ. For the time being, we assume that all the SPAM errors can be attributed to the measurement part of the setup, and that the state preparation may be performed reliably. This is indeed the case in our experimental implementation (see Supplementary Material).
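The relation between outcome probabilities and the observed frequencies f_{γ} = n_{γ}∕n can be illustrated with a short simulation. The multinomial counting model and the three-outcome distribution below are assumptions made for the sketch, not a description of the experimental detection statistics.

```python
import numpy as np

def sample_frequencies(probs, n, rng):
    """Draw counts n_gamma for n trials and return f_gamma = n_gamma / n."""
    counts = rng.multinomial(n, probs)
    return counts / n

rng = np.random.default_rng(2)
probs = np.array([0.5, 0.3, 0.2])  # hypothetical outcome probabilities

f_small = sample_frequencies(probs, 100, rng)
f_large = sample_frequencies(probs, 1_000_000, rng)

print(np.abs(f_small - probs).max(), np.abs(f_large - probs).max())
```

Increasing n shrinks the statistical scatter of the frequencies around the underlying probabilities, but it cannot remove a systematic distortion of the probabilities themselves, which is the part the network is trained to undo.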
We reconstruct high-dimensional quantum states encoded in the spatial degrees of freedom of photons. The most prominent example of such encoding uses photonic states with orbital angular momentum (OAM),^{36} as relevant to numerous experiments in quantum optics and quantum information. However, OAM is only one of two quantum numbers associated with orthogonal optical modes, and the radial degree of freedom of Laguerre–Gaussian beams^{37,38} as well as the full set of Hermite–Gaussian (HG) modes^{39} offer viable alternatives for increasing the accessible Hilbert space dimensionality. One of the troubles with using the full set of orthogonal modes for encoding is the poor quality of projective measurements. Existing methods to remedy the situation^{40} trade reconstruction quality for efficiency, significantly reducing the latter. Complex high-dimensional projectors are especially vulnerable to measurement errors, and fidelities of state reconstruction are typically at most ~0.9 in high-dimensional tomographic experiments.^{41} That provides a challenging experimental scenario for our machine-learning-enhanced methods.
Our experiment is schematically illustrated in Fig. 2. We use phase holograms displayed on the spatial light modulator as spatial mode transformers. At the preparation stage an initially Gaussian beam is modulated both in phase and in amplitude to an arbitrary superposition of HG modes, which are chosen as the basis in the Hilbert space. At the detection stage the beam passes through a phase-only mode-transforming hologram and is focused into a single-mode fiber, filtering out a single Gaussian mode. This sequence corresponds to a projective measurement in mode space, where the projector \({\tilde{M}}_{\gamma }\) is determined by the phase hologram. (See Supplementary Material for the details of the experimental setup and the state preparation and detection methods.) In dimension d = 6, we are able to prepare an arbitrary superposition expressed in the basis of HG modes as \(\left|\psi \right\rangle ={\sum }_{i,j=0}^{2}{c}_{ij}\left|{{\rm{HG}}}_{ij}\right\rangle\). In the measurement stage we used a symmetric informationally complete POVM, which is close to optimal for state reconstruction and may be relatively easily realized for spatial modes.^{41}
We performed state reconstruction using maximum likelihood estimation^{42} for both raw experimental data and DNN-processed data. (See also Supplementary Material for extra information on the spatial probability distribution of reconstructed states.) In the former case, the log-likelihood function to be maximized with respect to ρ has been chosen as \({\mathcal{L}}({f}_{\gamma }^{i} | \rho )\propto {\sum }_{\gamma=1}^{36}{f}_{\gamma }^{i}{\log}\,[{\rm{Tr}}\,({M}_{\gamma }\rho )]\), with frequencies f_{γ} = n_{γ}∕n and i numbering the test set states. In the latter case, these frequencies have been replaced with the predicted probabilities p_{γ}. The results for \({\hat{\rho }}_{(raw)}^{i}={\rm{argmax}}\;{\mathcal{L}}({f}_{\gamma }^{i} | \rho )\) and \({\hat{\rho }}_{(nn)}^{i}={\rm{argmax}}\;{\mathcal{L}}({p}_{\gamma }^{i} | \rho )\), compared with the prepared states \(\left|{\psi }^{i}\right\rangle\), are shown in Fig. 3. Interestingly, the average reconstruction fidelity increases from F_{(raw)} = (0.82 ± 0.05) to F_{(nn)} = (0.91 ± 0.03), and this increase is uniform over the entire test set. Similar behavior is observed for the purity: since we did not force the state to be pure in the reconstruction, the average purity of the estimate is less than unity, π_{(raw)} = (0.78 ± 0.07), whereas π_{(nn)} = (0.88 ± 0.04). If the restriction to pure states is explicitly imposed in the reconstruction procedure, the fidelity increase is even more significant, as shown in Fig. 3(c). In this case the initially relatively high fidelity of F_{(raw)} = (0.94 ± 0.03) increases to F_{(nn)} = (0.98 ± 0.02), a very high value given the states' dimensionality.
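A compact way to see how such a likelihood can be maximized, and how fidelity and purity are then evaluated, is the sketch below. The text does not say which optimizer was used; the iterative R-rho-R update shown here is one common choice and is an assumption, as are the toy qubit SIC POVM (tetrahedron of Bloch vectors), the noiseless frequencies, and all function names.

```python
import numpy as np

def mle_rrr(freqs, povm, iters=500):
    """Maximize sum_g f_g log Tr(M_g rho) with the iterative R-rho-R update."""
    d = povm[0].shape[0]
    rho = np.eye(d, dtype=complex) / d  # start from the maximally mixed state
    for _ in range(iters):
        probs = [max(np.real(np.trace(M @ rho)), 1e-12) for M in povm]
        R = sum((f / p) * M for f, p, M in zip(freqs, probs, povm))
        rho = R @ rho @ R
        rho = rho / np.real(np.trace(rho))  # restore unit trace
    return rho

def fidelity_with_pure(psi, rho):
    """Fidelity <psi|rho|psi> between a pure target and an estimate."""
    return float(np.real(np.vdot(psi, rho @ psi)))

def purity(rho):
    """Purity Tr(rho^2)."""
    return float(np.real(np.trace(rho @ rho)))

# Toy qubit SIC POVM: M_k = (I + s_k . sigma) / 4 for tetrahedron vectors s_k.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
s = np.sqrt(2.0)
bloch = [(0, 0, 1), (2 * s / 3, 0, -1 / 3),
         (-s / 3, np.sqrt(2 / 3), -1 / 3), (-s / 3, -np.sqrt(2 / 3), -1 / 3)]
povm = [(I2 + x * X + y * Y + z * Z) / 4 for x, y, z in bloch]

psi = np.array([1.0, 0.0], dtype=complex)
rho_true = np.outer(psi, psi.conj())
freqs = [np.real(np.trace(M @ rho_true)) for M in povm]  # noiseless input

rho_hat = mle_rrr(freqs, povm)
F = fidelity_with_pure(psi, rho_hat)
P = purity(rho_hat)
```

For a pure target state this update converges only slowly toward the boundary of the state space, so the fidelity approaches one gradually with the number of iterations; replacing `freqs` with network-predicted probabilities leaves the estimator itself unchanged.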
Discussion
Our results were obtained with analytical correction for some known SPAM errors already performed. In particular, we have explicitly taken into account the Gouy phase shifts acquired by modes of different order during propagation (see Supplementary Material). This correction is, however, unnecessary for neural network post-processing. The DNN has also been trained without any preprocessing of the experimental dataset, that is, without introducing any phase correction in our initial data, thereby leaving the effect of a channel process \({\mathcal{E}}\) in the data. In this completely agnostic scenario we achieved average estimation fidelities of F_{(nn)} = (0.81 ± 0.19) as compared with F_{(raw)} = (0.54 ± 0.12), showing a dramatic improvement by straightforward application of a learning approach.
In our experiment the prepared input states were close to pure and we did not have a controllable way to systematically change their purity. However, it is exactly the case of pure input states that is most seriously affected by SPAM errors, since in this case the outcome probabilities are the most sensitive to perturbations in the measurement operators. The presented experimental results therefore illustrate the worst-case performance of our method. We have examined the case of mixed states by numerical simulations (details and results of the simulation are provided in Supplementary Material), and conclude that DNN post-processing enhances the reconstruction quality for any purity of the input state.
DNN processing may be straightforwardly generalized to process tomography, since the latter may always be formulated as state tomography in a higher-dimensional space by virtue of the Choi–Jamiołkowski isomorphism. It may present a valuable tool in cases where randomized benchmarking and similar protocols cannot be applied, such as in channel testing for quantum communication.
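The isomorphism invoked here can be made concrete with a short sketch that builds the Choi state of a channel from its Kraus operators; tomography of this d^{2}-dimensional state is then equivalent to process tomography of the channel. The qubit depolarizing channel and the function name `choi_state` are illustrative assumptions, and the helper assumes square Kraus operators.

```python
import numpy as np

def choi_state(kraus_ops):
    """Choi state sum_K (I x K)|Omega><Omega|(I x K)^dagger of a channel.

    |Omega> = sum_i |i>|i> / sqrt(d) is the maximally entangled state.
    """
    d = kraus_ops[0].shape[1]
    omega = np.eye(d, dtype=complex).reshape(d * d) / np.sqrt(d)
    J = np.zeros((d * d, d * d), dtype=complex)
    for K in kraus_ops:
        v = np.kron(np.eye(d), K) @ omega
        J += np.outer(v, v.conj())
    return J

# Hypothetical example: qubit depolarizing channel with parameter p.
p = 0.2
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
kraus = [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * S for S in (X, Y, Z)]

J = choi_state(kraus)

# Tracing out the output subsystem gives I/d for a trace-preserving channel.
reduced = np.einsum('ijkj->ik', J.reshape(2, 2, 2, 2))
eigs = np.linalg.eigvalsh(J)
```

The Choi state is a valid density matrix (unit trace, positive semidefinite), so any state reconstruction pipeline, including DNN-filtered maximum likelihood, can in principle be applied to it.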
Although the number of neurons in the hidden layers is quite large in the current realization, the training procedure is still fast and we do not believe it will be an issue for reasonable applications. In the end, our method is specifically designed for full state tomography, which itself is limited to rather small Hilbert space dimensionalities due to fast growth of the number of measurements required. Optimization of DNN architecture and applications to protocols providing partial information about the state are interesting directions for future work.
To conclude, our results demonstrate that applying a neural-network architecture to experimental data can provide a reliable tool for quantum state-and-detector tomography.
Data availability
The data that support this study are available at https://github.com/QuantumMachineLearningInitiative/dnnquantumtomography.
Code availability
The code that supports this study is available at https://github.com/QuantumMachineLearningInitiative/dnnquantumtomography.
References
 1.
Rosset, D., Ferretti-Schöbitz, R., Bancal, J.-D., Gisin, N. & Liang, Y.-C. Imperfect measurement settings: Implications for quantum state tomography and entanglement witnesses. Phys. Rev. A 86, 062325 (2012).
 2.
Knill, E. et al. Randomized benchmarking of quantum gates. Phys. Rev. A 77, 012307 (2008).
 3.
Merkel, S. T. et al. Self-consistent quantum process tomography. Phys. Rev. A 87, 062119 (2013).
 4.
Blume-Kohout, R. et al. Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography. Nat. Commun. 8, 14485 (2017).
 5.
Lundeen, J. et al. Tomography of quantum detectors. Nat. Phys. 5, 27 (2009).
 6.
Brida, G. et al. Ancilla-assisted calibration of a measuring apparatus. Phys. Rev. Lett. 108, 253601 (2012).
 7.
Huszár, F. & Houlsby, N. M. T. Adaptive Bayesian quantum tomography. Phys. Rev. A 85, 052120 (2012).
 8.
Kravtsov, K. S. et al. Experimental adaptive Bayesian tomography. Phys. Rev. A 87, 062122 (2013).
 9.
Torlai, G. et al. Neural-network quantum state tomography. Nat. Phys. 14, 447 (2018).
 10.
Carleo, G., Nomura, Y. & Imada, M. Constructing exact representations of quantum many body systems with deep neural network. Nat. Commun. 9, 5322 (2018).
 11.
Rocchetto, A., Grant, E., Strelchuk, S., Carleo, G. & Severini, S. Learning hard quantum distributions with variational autoencoders. npj Quantum Inf. 4, 28 (2018).
 12.
Magesan, E., Gambetta, J. M. & Emerson, J. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett. 106, 180504 (2011).
 13.
Wallman, J. J. & Flammia, S. T. Randomized benchmarking with confidence. New J. Phys. 16, 103032 (2014).
 14.
Roth, I. et al. Recovering quantum gates from few average gate fidelities. Phys. Rev. Lett. 121, 170502 (2018).
 15.
Blume-Kohout, R. et al. Robust, self-consistent, closed-form tomography of quantum logic gates on a trapped ion qubit (2013). https://arxiv.org/pdf/1310.4492.pdf.
 16.
Dehollain, J. P. et al. Optimization of a solid-state electron spin qubit using gate set tomography. New J. Phys. 18, 103018 (2016).
 17.
Bobrov, I. B., Kovlakov, E. V., Markov, A. A., Straupe, S. S. & Kulik, S. P. Tomography of spatial mode detectors. Optics Express 23, 649–654 (2015).
 18.
Mogilevtsev, D., Rehacek, J. & Hradil, Z. Self-calibration for self-consistent tomography. New J. Phys. 14, 095001 (2012).
 19.
Brańczyk, A. M. et al. Self-calibrating quantum state tomography. New J. Phys. 14, 085003 (2012).
 20.
Straupe, S. S. et al. Self-calibrating tomography for angular Schmidt modes in spontaneous parametric downconversion. Phys. Rev. A 87, 042109 (2013).
 21.
Jackson, C. & van Enk, S. J. Detecting correlated errors in state-preparation-and-measurement tomography. Phys. Rev. A 92, 042312 (2015).
 22.
Granade, C., Ferrie, C. & Flammia, S. T. Practical adaptive quantum tomography. New J. Phys. 19, 113017 (2017).
 23.
Pogorelov, I. A. et al. Experimental adaptive process tomography. Phys. Rev. A 95, 012302 (2017).
 24.
Granade, C. E., Ferrie, C., Wiebe, N. & Cory, D. G. Robust online Hamiltonian learning. New J. Phys. 14, 103013 (2012).
 25.
Lennon, D. et al. Efficiently measuring a quantum device using machine learning (2018). https://arxiv.org/abs/1810.10042.
 26.
Carrasquilla, J., Torlai, G., Melko, R. G. & Aolita, L. Reconstructing quantum states with generative models. Nat. Mach. Intell. 1, 155 (2019).
 27.
Xin, T. et al. Local-measurement-based quantum state tomography via neural networks (2018). https://arxiv.org/pdf/1807.07445.pdf.
 28.
Torlai, G. & Melko, R. G. Latent space purification via neural density operator. Phys. Rev. Lett. 120, 240503 (2018).
 29.
Banaszek, K., D’Ariano, G. M., Paris, M. G. A. & Sacchi, M. F. Maximum-likelihood estimation of the density matrix. Phys. Rev. A 61, 010304 (1999).
 30.
James, D. F. V., Kwiat, P. G., Munro, W. J. & White, A. G. Measurement of qubits. Phys. Rev. A 64, 052312 (2001).
 31.
Paris, M. & Řeháček, J. (eds) Quantum State Estimation, vol. 649 of Lecture Notes in Physics (Springer-Verlag, 2004). http://www.springer.com/gp/book/9783540223290.
 32.
Chuang, I. L. & Nielsen, M. A. Prescription for experimental determination of the dynamics of a quantum black box. J. Mod. Optics 44, 2455–2467 (1997).
 33.
Poyatos, J. F., Cirac, J. I. & Zoller, P. Complete characterization of a quantum process: the two-bit quantum gate. Phys. Rev. Lett. 78, 390–393 (1997).
 34.
Fiurášek, J. Maximum-likelihood estimation of quantum measurement. Phys. Rev. A 64, 024102 (2001).
 35.
D’Ariano, G. M., Maccone, L. & Presti, P. L. Quantum calibration of measurement instrumentation. Phys. Rev. Lett. 93, 250407 (2004).
 36.
Molina-Terriza, G., Torres, J. P. & Torner, L. Twisted photons. Nat. Phys. 3, 305 (2007).
 37.
Salakhutdinov, V. D., Eliel, E. R. & Löffler, W. Full-field quantum correlations of spatially entangled photons. Phys. Rev. Lett. 108, 173604 (2012).
 38.
Krenn, M. et al. Generation and confirmation of a (100 × 100)-dimensional entangled quantum system. Proc. Natl Acad. Sci. 111, 6243–6247 (2014).
 39.
Kovlakov, E. V., Bobrov, I. B., Straupe, S. S. & Kulik, S. P. Spatial Bell-state generation without transverse mode subspace postselection. Phys. Rev. Lett. 118, 030503 (2017).
 40.
Bouchard, F. et al. Measuring azimuthal and radial modes of photons. Optics Express 26, 31925–31941 (2018).
 41.
Bent, N. et al. Experimental realization of quantum tomography of photonic qudits via symmetric informationally complete positive operator-valued measures. Phys. Rev. X 5, 041006 (2015).
 42.
Hradil, Z. Quantum-state estimation. Phys. Rev. A 55, R1561–R1564 (1997).
Acknowledgements
The authors acknowledge financial support under the Russian National Technological Initiative via MSU Quantum Technologies Centre and RFBR grant #195280034, and thank Timur Tlyachev and Dmitry Dylov for helpful suggestions on an early version of this study. E.K. acknowledges support from the BASIS Foundation.
Author information
Contributions
A.M.P., F.B. and D.Y. developed the neural network architecture and the training procedure. E.K. and S.S. devised the experiment, E.K. gathered the data. J.B. and S.K. supervised the project. All authors discussed the results and contributed to writing the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Palmieri, A.M., Kovlakov, E., Bianchi, F. et al. Experimental neural network enhanced quantum tomography. npj Quantum Inf 6, 20 (2020). https://doi.org/10.1038/s4153402002486