Crystallizing highly-likely subspaces that contain an unknown quantum state of light

In continuous-variable tomography, with finite data and limited computation resources, reconstruction of a quantum state of light is performed on a finite-dimensional subspace. In principle, the data themselves encode all information about the relevant subspace that physically contains the state. We provide a straightforward and numerically feasible procedure to uniquely determine the appropriate reconstruction subspace by extracting this information directly from the data for any given unknown quantum state of light and measurement scheme. This procedure makes use of the celebrated statistical principle of maximum likelihood, along with other validation tools, to grow an appropriate seed subspace into the optimal reconstruction subspace, much like the nucleation of a seed into a crystal. Apart from using the available measurement data, no other assumptions about the source or preconceived parametric model subspaces are invoked. This ensures that no spurious reconstruction artifacts are present in state reconstruction as a result of inappropriate choices of the reconstruction subspace. The procedure can be understood as the maximum-likelihood reconstruction for quantum subspaces, which is an analog to, and fully compatible with that for quantum states.

Recently, methods employed in classical statistical-model selection were used to localize the signal (see, for example, refs [23][24][25][26][27][28]. These methods involve the consideration of the popular Akaike criterion and the Bayesian information criterion to penalize the likelihood function for the problem and restrict models for up to a certain number of parameters. These strategies depend on the parametric models selected (which in our case corresponds to some pre-chosen reconstruction subspace) to optimize the signal locations, the accuracy of which depend heavily on the correctness of the chosen models. An average of these models may be carried out over the quantum state space to mitigate possible model inaccuracies 29 . Other methods of choosing reconstruction subspaces include the utilization of other prior knowledge about the source and assigning a partial dependence of the subspace dimension D rec on the number of measurement settings or groups of outcomes 30 .
In what follows, we shall present a systematic and practical procedure to locate subspaces that highly-likely contain a given unknown quantum state of light that is completely free of model (preconceived subspace) considerations and hard-to-justify assumptions about the source. This strategy converges to the appropriate "model" subspace based on information encoded in the collected measurement data alone. In a nutshell, the procedure makes use of the ML strategy to define an initial reconstruction subspace of low-dimension and gradually evolve the seed subspace to a reconstruction subspace of a stipulated dimension D rec -much like a typical nucleation process in the formation of crystals. The termination of the ML nucleation process, and the subsequent determination of D rec , is governed by the procedure of cross-validation, which is a prototypical statistical validation tool that ensures the reliability and predictive power of the resulting ML state estimator. This numerical nucleation process, which is naturally compatible with ML state estimation 1,2 , makes use of only the acquired measurement data in an experiment. The underlying physical reason is that all encoded information in the data reflects the features of the unknown quantum state, albeit with some statistical fluctuation, and can thus be systematically extracted to obtain the optimal reconstruction subspace and state estimator.
Without loss of generality, we shall assume here that the data associated with the continuous-variable quantum measurement, although finite, are sufficiently large enough such that statistical fluctuation is minimized within typical experimental means. In this situation, the relevant reconstruction error of interest is primarily influenced by the choice of reconstruction subspace.

Results
The ML subspace nucleation process. Suppose that the observer chooses to reconstruct the true quantum state ρ true of the source using the ML method from a set of data. In CV tomography, the data are event occurrences ∑ = n N j j of N sampling events (say voltage detection) collected with a measurement described by a set of probability operator measurement (POM) ∑ Π = = 1 j M j 1 consisting of M outcomes Π j ≥ 0. In this scenario, it is natural to consider an assignment of the reconstruction subspace that is compatible with the ML principle. Clearly, if the desired ML estimator ρ ML is the one that maximizes the log-likelihood function of ρ for the data, j j M j j j j 1 the reconstruction subspace should then be a subspace of a certain dimension D rec that optimizes this log-likelihood function. The problem now reduces to deciding the appropriate value of D rec ≥ 2 and searching for the optimal subspace of this dimension.
In real situations where detection losses are present, such that the efficiency of the overall quantum detection is less than unity, there exist many ways to cope with this additional detail depending on specific experimental situations. The complete likelihood is now one that accounts for both the detected and missing copies of quantum systems, with the sum of the measured probabilities ∑ < ′ ′ p 1 j j . If the source (usually photonic in this case) has a known Poissonian prior distribution for the total number of quantum systems (photons), then an effective likelihood function for the measured sampling events may be defined as an average of the complete likelihood over the missing quantum systems because of losses with this known prior distribution. Alternatively, we may carry on the maximum-likelihood philosophy and instead maximize the complete likelihood over the missing copies, and the result is an effective log-likelihood function of the form in Eq. (1) with p j replaced by ∑ ′ ′ p p / j j j 1,21 . In any case, the effective log-likelihood function is now the proper log-likelihood function for subsequent optimizations.
Since all we have are the measurement data {n j }, the most straightforward way to carry out the subspace search is subspace nucleation. Such a numerical nucleation process involves the surveillance of all possible d-dimensional discrete subspaces of some large Hilbert space of dimension D lim that defines some limit for the state reconstruc- that are diagonal in the computational basis and consist of D lim − d zeros and d ones. For any given operator A, its suboperator A l,d in the lth subspace is conveniently expressed as A l,d = S l,d AS l,d . Analogous to nucleation in crystal formation, subspace nucleation begins with a seed subspace of a certain smallest pre-chosen dimension d. For the purpose of illustration, we take d = 2. The qubit subspace S 2 (ML) appropriate for seeding the nucleation process is the one corresponding to the two-dimensional ρ ML of the largest maximal log-likelihood out of all possible projectors S l,2 . The subspace begins to grow along the trajectory of largest likelihood increment. The next optimal subspace to choose would be the subspace = + S S S 4 (ML) 2 (ML) 2 , where S 2 is the optimal orthogonal subspace to S 2 , such that the corresponding four-dimensional ρ ML yields the largest maximal log-likelihood. Nucleation continues, this time establishing the next larger optimal subspace = + S S S 6 (ML) 4 (ML) 2 such that, again, = S S 0 2 4 (ML) and the corresponding six-dimensional ρ ML gives the largest maximal log-likelihood, and so on. In this way, the reconstruction subspace matures in the direction of maximal sequential increase in the log-likelihood. Since the process evaluates the (log-)likelihood and maximizes it over subspaces, this process is entirely equivalent to a ML subspace estimation, which is fully analogous to a ML state estimation. The data alone contain all hidden signatures of the relevant subspace segments and the structures thereon, all of which are revealed by nothing else but the log-likelihood function. In the limit of large N of sampling events, which is an achievable commodity in homodyne tomography, for instance, these signatures accurately reflect those of ρ true . This function thus serves as the only important objective function following which subspace crystallization takes place. No additional parametric model selection or spurious assumptions about the source are necessary. We have therefore established a fully objective numerical procedure for assigning reconstruction subspaces that is compliant with ML state estimation.
Computationally, the ML subspace nucleation process is a continuous iteration of the following simple numerical steps over k, starting with the smallest optimal d-dimensional seed subspace defined by S d (ML) at k = 1 and proceeding till k = κ that defines the final reconstruction-subspace dimension D rec : 1. In the kth step, look for the full set of operators = The Methods section provides more explicit details on the numerical procedure.
Criterion for nucleation termination. The final task is now to decide on the reasonable value of D rec .
Various statistical tools are available for this purpose, the choice of which depends on the application of the statistical operator ρ ML . Typically, the estimator ρ ML is used for statistical prediction of probability distributions for future measurement schemes. Some measure of predictive power for the estimator is hence necessary to judge if the related subspace acquired from the nucleation process is sufficiently accurate in data prediction. Physically, the reconstruction subspace that best predicts data should be the largest possible subspace that tomographically covers all the possible datasets (infinite-dimensional in principle). In practice, however, all resources are finite and some sort of statistical certification is necessary to judge if the estimator of finite data that resides in a finite subspace is predictive enough. Cross-validation [31][32][33] (and a simplified variant discussed in ref. 34) is a decent certification tool of choice to judge if ρ ML resides in a large enough reconstruction subspace of reasonable coverage relative to the data of a given POM. Typically, when the estimator ρ ML fits the data from a POM according to the log-likelihood function, the estimator may not necessarily predict other data from the same POM, or any other POM for that matter, especially in a situation where the reconstruction subspace does not tomographically include the measurement data sufficiently. If, on average, ρ ML predicts different sets of data of the same POM, then the subspace yields a predictive ρ ML . For demonstrating the principles of cross-validation, we consider a two-fold cross-validation strategy and split the M measurement data into two datasets of equal size. Borrowing the language of machine learning, one dataset, the training set, is used to obtain ρ ML , and the other testing set is used to test the predictive power of ρ ML . The roles of both datasets are then switched, and training and testing are performed again. The average chi-square metric between the test data and the ML probabilities, th testing set describes the predictive power for ρ ML in terms of the prediction error for the given measurement scheme. Like all numerical algorithms, there are many ways to terminate the nucleation procedure. The observer may choose to set a pre-chosen tolerance level for PrErr beyond which the procedure stops; or compare the change in the current PrErr value relative to the preceding value and accept the reconstruction if the change falls below certain threshold; or simply repeat the procedure a pre-chosen number of times. The numerical stabilization of PrErr can serve as an indication that continuing the procedure will not give appreciable improvement in the resulting ML estimator and ML subspace. Since the value of PrErr fluctuates for every experimental run, its value for each reconstruction-subspace dimension D rec should be accompanied by a statistical quantifier for its reliability. As a typical choice, we shall assign confidence intervals to reflect the level of confidence (or signifcance) for these values. These confidence intervals are calculated using a known method of bootstrapping on the PrErr values (see Methods).

Numerical Experiments.
To put the ML subspace nucleation procedure to the test, for a given true state ρ true , we simulate an experimental run for a CV POM involving M = 1000 random rank-one POM outcomes distributed uniformly according to the Haar measure 35 . In this run, a total of N = 10 7 sampling events is measured with the POM and the resulting data are accumulated through Monte Carlo methods. To demonstrate the proposed numerical method, we investigate three examples, namely a coherent state, even coherent state and squeezed coherent state. Figures 1, 2 and 3 provide a visualization of the respective nucleation processes for these three states. Figure 4 presents the results of the nucleation process with statistical descriptions for the PrErr values.
Scientific RepoRts | 6:38123 | DOI: 10.1038/srep38123 The results obtained with simulated experiments verify the decreasing behavior for the values of the prediction error PrErr with increasing reconstruction-subspace dimension D rec . This behavior confirms that, logically, a larger subspace would more adequately accomodate the measurement outcomes and more accurately predict any data derived from these outcomes. Practical Aspects Of The Nucleation Methodology. Subspace coverage. In the usual situation, the observer has already an intended target quantum state ρ targ for the source in mind before setting up the experiment for a particular quantum protocol. Owing to experimental imperfections, the target state she intends to prepare is never the same as the true state ρ true to which she asymptotically measures. Nevertheless, if the control of the source is done well, the observer may have reasons to believe that ρ true should be close to ρ targ . For a given basis, the reconstruction dimension D rec , and hence the limit dimension D lim , should at least be large enough to encompass all the significant matrix elements of ρ true . Usually, the choice of D lim is decided from ρ targ by trusting that it is close enough to the unknown ρ true .
(e) (f) Figure 2. Subspace nucleation process from one set of data for (a) the even coherent state defined by  α α + − ( ) with α = 5 and a proper normalization. All other figure specifications are as described in Fig. 1. For this state, the eight-dimensional optimal ML subspace is sufficient for a rather accurate ML reconstruction.
However, such a gut feeling is not a necessary ingredient to pick the value of D lim , for the data themselves already contain all encoded information about the quantum-state features. If D lim is too small to cover the significant features of ρ true , the data should be able to tell us just that, which they do indeed. One way of capturing the tell-tale signs from the data is to inspect the PrErr values. If the PrErr saturates at a value that is large, then this is an indication that the subspace does not cover the state features very well, and the corresponding estimator does not explain the data obtained and will have limited predictive power. In this case, one would need to increase the value of D lim . The behavior of PrErr with D rec would eventually stabilize for sufficiently large D lim .
As an example, Fig. 5 shows the behavior of PrErr for different values of D lim , obtained from a fixed set of simulated data of a coherent state with mean photon number 30. As D lim increases, the saturation of PrErr lowers and vanishes for sufficiently large D lim . In this way, the choice of D lim is optimized without the need for a prior belief. Subspace truncation and rate of convergence. If the observer insists, she can certainly make use of an educated prior belief for the true state to enhance the ML subspace nucleation procedure in an objective way. To understand how, consider the simple case where ρ true is the single-photon Fock state |n = 1〉 〈 n = 1|. If one carries out the nucleation procedure in the computational basis with (hypothetical) noiseless data, then the procedure will terminate after just one step. This is because already after the very first step, the optimal qubit subspace is, of course, any of the subspaces that covers the n = 1 sector, and the resulting ML estimator ρ ML is precisely ρ true since all other matrix elements are zero in the computational basis. More generally, for (hypothetical) noiseless data and a rank-one ρ true , if the basis in which subspace truncation of the Hilbert space is performed happens to be the basis with one of the basis kets being the ket of ρ true , then the  nucleation procedure yields ρ true after the very first step, because in this basis all matrix elements are zero except for the one corresponding to the ket of ρ true . This argument is easily extended to any ρ true , in which case truncation in a basis containing all eigenstates of ρ true will give ρ true as the estimator in no more than rank {ρ true } steps. The exact number of steps would depend on the dimension d of the seed subspaces. Even for a realistic situation when the observer has access to only the target state ρ targ , not ρ true , and real data with statistical fluctuation, finding the right basis with a reasonable ρ targ to perform subspace truncation for the nucleation procedure can greatly speed up the nucleation procedure if ρ true is reasonably close to the target state, especially in the limit of large number of sampling events N.
It is now clear how the prior belief enters the nucleation procedure-it is simply used to set up the appropriate basis for subspace truncation in order to carry out subspace nucleation with significantly fewer steps. In no way is the final state estimator ρ ML dependent on the prior belief, only the rate of convergence to ρ ML , for the entire nucleation process is still controlled by data inspection alone once the basis is set up. Mathematically, if U is the unitary operator that converts the Fock basis to the appropriate basis, then a basis transformation ρ → UρU † on quantum states in the optimization routine is entirely equivalent to an inverse basis transformation Π j → U † Π j U on measurement outcomes due to the symmetry in the Born rule. If the observer believes that the target state ρ targ = | 〉 〈 | is the likely candidate for describing the source, she may take this and generate a D lim -dimensional eigenbasis of ρ targ and construct U out of this eigenbasis. The results in Fig. 6 further confirms the possibility of a significant improvement in nucleation convergence to the final optimal subspace and state estimator for a given set of data after a basis transformation.

Discussions
We have shown, from these findings, that the maximum-likelihood subspace nucleation procedure is a numerically feasible procedure for obtaining the valid optimal reconstruction subspace that contains the unknown quantum state and, at the same time, maximizes the (log-)likelihood with respect to the measured data. Throughout the procedure, no other assumptions about the source are required. The complete elimination of this requirement turns our proposed procedure into an extremely robust method for real experiments, where such assumptions are sometimes difficult to justify precisely. The reporting of all results on experimental state reconstructions and diagnostics using continuous-variable measurement schemes can now be done more reliably once this restriction is lifted, since the concern of reconstruction artifacts that typically arise from an unsuitable or a suboptimal choice of reconstruction subspace is now out of the picture.
The methods of cross-validation and bootstrapping are used to justify the appropriate size of the optimal reconstruction subspace by investigating its predictive power of future data from the same measurement scheme. Other statistical tools can also be invoked depending on the way the observer uses the resulting quantum-state estimator. In general, all these statistical tools would have to be improved in order to address statistical problems related to the quantum-state space, as the positivity constraint plays an important role in altering the probability distribution of any set of data generated from a quantum state, which would in general be different from its classical counterpart. The study of the implications of the positivity constraint on these statistical methods is beyond the scope of this article.
It should be emphasized that the nucleation methodology is completely general and applicable to quantum-state estimation strategies that are not necessarily invoking the maximum-likelihood principle. Very similar nucleation procedures may be implemented for strategies such as linear-inversion or weighted linear-inversion, for instance. The only difference is that the objective function is no longer the likelihood function, but some other function compatible with the chosen estimation strategy, and the quantum positive constraint can additionally be imposed on all such strategies. The subspace nucleation procedures for these strategies proceed as usual otherwise. The bottom line-the set of data obtained with any CV measurement scheme is the only essential element for an accurate subspace and state reconstruction.

Methods
Detailed numerical procedure for the ML subspace nucleation process. In this section, we shall also assume that the largest possible subspace for an efficient reconstruction has dimension defined by some large integer  D d lim -the limit for the state reconstruction. The "lth (reconstruction) subspace of dimension d" can therefore be synonymously understood as the D lim -dimensional projection operator S l,d .
The nucleation process for a particular CV measurement scheme makes use of a list of L seed subspaces of a pre-chosen dimension d. As an example, we shall take d = 2 and D lim = 16, which are the settings for the simulations. Each (D lim = 16)-dimensional projector S l,2 is used to compute the maximum log-likelihood with respect to the data. The operators involved in this computation are the state ρ l,d=2 and all the POM outcomes Π = j l d ( , 2) on this particular qubit subspace. More explicitly, for a given l, the two-dimensional ρ l,2 is simply represented as a two-dimensional positive, unit-trace matrix defined as  14 2 91 projectors which now corresponds to a set of subspaces that are orthogonal to the optimal subspace. The next larger (d = 4)-dimensional ML subspace is built from this seed by accomodating the optimal qubit seed subspace that is both orthogonal to the current subspace and maximizes the log-likelihood. The subsequent computation is very similar to that described for the previous case, only that ρ l,4 and Π j l ( ,4) are now four-dimensional operators. After this computation, the set of = ( ) 14 2 91 projectors is then reduced to the set of = ( ) 12 2 66 projectors that are orthogonal to all the selected projectors. The computation rate for this numerical scheme increases with each step as the set of seed subspaces on which ML estimation is performed decreases in size. The procedure continues in this manner until ρ ML fulfils some fixed criterion that would eventually terminate the nucleation process.

Cross-validation and bootstrapping. Cross-validation.
If the observer wants to use ρ ML to predict future measurement data, then the technique of cross-validation is a suitable approach to verify if this ML estimator is predictive. Here, cross-validation is used to verify its predictive power on at least the same measurement scheme. A common technique known as K-fold cross-validation involves the splitting of a set of M data into K datasets of equal size. A total of K − 1 datasets are chosen as training sets to obtain an ML estimator ρ ML . The remaining dataset, the testing set, is then used to test whether ρ ML gives ML probabilities that are close to these data on average. Other variants of cross-validation exists, some of which possess high computational complexities 32 . So far, no systematic studies of cross-validation has been performed for quantum tomography, as such the implications of the positivity constraint, if any, on the quantum-state space are not known. For the simulations, K is set to two to ensure that both the training set and testing set are equally large enough. For the specifications of typical homodyne experiments, the binned data are suitable for numerical computation for this value of K.
The predictive power of ρ ML is summarized by the prediction error where, without loss of generality, we have assumed that M is divisible by K. For a sufficiently large reconstruction subspace, PrErr would in principle approach zero if not for the slight statistical fluctuations of the measurement data. We mention in passing that in the case where a source drift is present, the true state of the source is no longer stable and describing the source with a single ML estimator using all the measurement data would result in an average bias for PrErr.
Bootstrapping. Since PrErr is statistical, it is in principle necessary to assign some statistical quantifier to it. Any statistical quantifier that describes the reliability of PrErr would generally require a sample of PrErr values for each D rec . With only one set of data, a viable option is to perform bootstrapping on this set of data to generate new sets of pseudodata for the construction of the quantifier. Without the assumption of a model for bootstrapping, the non-parametric bootstrap method is suitable and has been proven to give sample points that follow a distribution close to the population distribution. However, this convergence comes with strings attached, such as the adherence to a list of other assumptions, and these assumptions are not always satisfied for some cases, especially in the presence of the quantum positivity constraint. A workaround is to suggest that since the PrErr decreases with increasing D rec (if N is large enough that is), we may take the ρ ML estimator with the smallest PrErr for bootstrapping-the parametric bootstrapping strategy. This choice of model for the bootstrap data asymptotically guarantees that the resulting bootstrap distribution of random PrErr values converges to the actual population distribution from the true state as long as  N 1 (typical situation in CV experiments) and  PrErr 1. The procedure for generating a PrErr value from a set of pseudodata obtained from a run of parametric bootstrapping is exactly the same as in the case of real data. Parametric bootstrapping is then repeated to accumulate a sample of bootstrap PrErr values for each D rec .
The quantifier chosen as an example is the confidence interval that representatively quantifies the confidence level for each PrErr value. For a given significance level 0 < α < 1 that is small, we first compute the 1 − α/2 and α/2 percentiles from each bootstrap sample. Upon denoting the percentiles respectively by PrErr 1−α/2 and PrErr α/2 , the confidence interval is defined as the percentile interval [2 PrErr − PrErr 1−α/2 , 2 PrErr − PrErr α/2 ]. The advantage of this interval is that it is computationally efficient and captures approximately some essence of the sample dstributions. A more accurate interval can be acquired by performing a second-level bootstrapping for the standard deviation of each sample, which is often computationally intractable. As an estimate for the spread of PrErr, the percentile confidence interval provides sufficiently reliable information for general purposes. Besides, other statistical information is usually needed to supplement this interval for a more thorough data analysis.