Stochastic optimization on complex variables and pure-state quantum tomography

Real-valued functions of complex arguments violate the Cauchy-Riemann conditions and, consequently, do not have a Taylor series expansion. Therefore, optimization methods based on derivatives cannot be directly applied to this class of functions. This is usually circumvented by mapping the problem to the field of the real numbers, considering the real and imaginary parts of the complex arguments as the new independent variables. We introduce a stochastic optimization method that works within the field of the complex numbers. This has two advantages: equations on complex arguments are simpler and easier to analyze, and the use of the complex structure leads to performance improvements. The method produces a sequence of estimates that converges asymptotically in mean to the optimizer. Each estimate is generated by evaluating the target function at two different randomly chosen points. Thereby, the method allows the optimization of functions with unknown parameters. Furthermore, the method exhibits a large performance enhancement. This is demonstrated by comparing its performance with other algorithms in the case of quantum tomography of pure states. The method provides solutions which can be two orders of magnitude closer to the true minima, or achieves results similar to those of other methods with three orders of magnitude fewer resources.

The method is based on an estimation of the Wirtinger complex gradient of the target function, which is subsequently used to generate a sequence of complex estimates approaching the minimizer. The magnitude of the gradient estimate is calculated from the difference between the target function evaluated at two different points, and its direction is given by a complex vector whose components are randomly and independently generated. Thereby, all calculations are carried out within the field of complex numbers. The estimation of the complex gradient is asymptotically unbiased and the sequence of complex estimates converges to the solution of the minimization problem.
CSPSA enables the optimization of functions with unknown parameters, since the only input it requires is evaluations of the target function. For instance, in quantum tomography the value of the infidelity I can be obtained by measuring, on the system described by state |ψ〉, an observable that contains the state |φ〉 in its spectral decomposition. Thus, we can obtain the values of I for any |φ〉 as long as |ψ〉 is an unknown but fixed parameter. Determining the amount of geometric entanglement 3,4 of an unknown state is also within reach of CSPSA. In this case the infidelity of an unknown multipartite pure quantum state is minimized with respect to the set of separable states, which requires the measurement of local observables. The violation of the Clauser-Horne-Shimony-Holt 5 (CHSH) inequality with an unknown state, pure or mixed, can also be studied. In this case CSPSA maximizes the violation by driving the measurement bases to the optimal measurement setting.
CSPSA exhibits a large performance boost in comparison to stochastic optimization algorithms for functions of real variables. We show this by applying CSPSA to the tomography of pure quantum states. Extensive numerical simulations via random sampling show that CSPSA achieves values of the mean infidelity orders of magnitude smaller than the ones provided by self-guided quantum tomography (SGQT), a quantum tomographic scheme based on a stochastic minimization method for functions of real variables 1. These simulations consider the same amount of resources for both methods, that is, the number of equally prepared quantum systems and the total number of measurement outcomes or, equivalently, the number of iterations and evaluations of the target function. Consequently, CSPSA leads to a considerable reduction in the resources required to estimate an unknown pure quantum state and provides a clear indication that optimization on complex variables can lead to higher performance methods. Furthermore, it has been shown that the use of resources by SGQT compares favorably to other quantum tomographic schemes 10. Thus, CSPSA-based quantum tomography provides a further improvement in the search for the efficient use of resources [11][12][13][14][15][16] in the determination of quantum states.

Method
The problem of optimizing a real-valued function $f(z, z^*)$ of complex variables $(z, z^*) \in \mathcal{C}^n$, where $\mathcal{C}^n$ is the set of pairs $(z, z^*)$ with $z \in \mathbb{C}^n$, can be completely stated within the field of complex numbers. This requires the definition of Wirtinger derivatives 6, which for $n = 1$ and $z = x + iy$ read

$$\partial_z = \tfrac{1}{2}\left(\partial_x - i\,\partial_y\right), \qquad \partial_{z^*} = \tfrac{1}{2}\left(\partial_x + i\,\partial_y\right).$$

The Cauchy-Riemann equations establish necessary and sufficient conditions for the existence of the complex derivative $f'(z) = df/dz$. Given the function $f(z) = u(x, y) + i v(x, y)$ of $z = x + iy \in \mathbb{C}$ with $x$, $y$, $u$ and $v \in \mathbb{R}$, the Cauchy-Riemann conditions are $\partial_x u = \partial_y v$ and $\partial_y u = -\partial_x v$. Thus, in terms of the Wirtinger derivatives, the Cauchy-Riemann conditions are equivalent to $\partial_{z^*} f = 0$, in which case the (standard) complex derivative $f'(z)$ agrees with the definition of $\partial_z f$. However, the Wirtinger derivatives $\partial_z f$ and $\partial_{z^*} f$ might exist even when the Cauchy-Riemann conditions do not hold. For example, for $f = |z|^2$ we have $\partial_z f = z^*$ and $\partial_{z^*} f = z$, while in this case the function $f$ is non-holomorphic. Let us note that one of the advantages of Wirtinger derivatives is that they can be manipulated as real partial derivatives, where $z$ and $z^*$ are treated as independent variables since $\partial_z z^* = 0 = \partial_{z^*} z$.
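The Wirtinger derivatives quoted above for $f = |z|^2$ can be checked numerically. The following is a minimal sketch, assuming the standard definitions $\partial_z = (\partial_x - i\partial_y)/2$ and $\partial_{z^*} = (\partial_x + i\partial_y)/2$; the helper name `wirtinger`, the step `h` and the test point `z0` are illustrative choices, not part of the original method.

```python
import numpy as np

def wirtinger(f, z, h=1e-6):
    """Central-difference estimate of (df/dz, df/dz*) at a complex point z."""
    # Real partial derivatives along x = Re(z) and y = Im(z).
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    # Wirtinger combinations of the two real partial derivatives.
    return 0.5 * (df_dx - 1j * df_dy), 0.5 * (df_dx + 1j * df_dy)

z0 = 0.3 - 1.2j  # arbitrary test point (hypothetical)

# Non-holomorphic example: f = |z|^2, expected (z*, z).
d_z, d_zc = wirtinger(lambda z: abs(z) ** 2, z0)

# Holomorphic example: f = z^2, expected (2z, 0), i.e. Cauchy-Riemann holds.
h_z, h_zc = wirtinger(lambda z: z ** 2, z0)
```

For the first function both derivatives are non-zero, which reflects that $|z|^2$ is not holomorphic; for the second the derivative with respect to $z^*$ vanishes up to rounding.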
The search for stationary points of real-valued functions of complex variables cannot be carried out with the help of the standard complex derivative, which in this case does not exist. Therefore, the problem is usually studied at the level of the field of the real numbers by calculating the points at which the real gradient vanishes. Nevertheless, it is possible to define a complex vector gradient operator which allows for the search of stationary points easily and with mathematical rigor. For a complex-valued function $f(\mu)$ with $\mu = (z, z^*) \in \mathcal{C}^n$ and an infinitesimal change $\delta\mu = (\delta z, \delta z^*)$, the change $\delta f$ in the value of the function $f$ is given by 17

$$\delta f = (\partial_z f)^T \delta z + (\partial_{z^*} f)^T \delta z^*.$$

In the case of a real-valued function $f$, we have that

$$\delta f = 2\,\mathrm{Re}\left[(\partial_{z^*} f)^\dagger\, \delta z\right].$$

Thereby, stationary points are completely characterized by the vanishing of the gradient $\partial_{z^*} f = 0$ or, equivalently, by $\partial_z f = 0$ 18,19. Furthermore, for a given magnitude of $\delta z$, the maximum increase in $f$ arises when $\delta z$ is in the direction of $\partial_{z^*} f$. This approach to the optimization of functions of complex variables, holomorphic or not, allows one to keep all manipulations within the field of complex numbers as well as to obtain simpler expressions.
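As a worked illustration (added here, not taken from the original derivation), consider the single-variable function $f = |z|^2$, for which $\partial_z f = z^*$ and $\partial_{z^*} f = z$. Writing the perturbation as $\delta z = \epsilon\, e^{i\theta}$ with a fixed magnitude $\epsilon > 0$, a symbol introduced only for this example, the first-order change is:

```latex
\begin{aligned}
\delta f &= (\partial_z f)\,\delta z + (\partial_{z^*} f)\,\delta z^*
          = z^*\,\delta z + z\,\delta z^*
          = 2\,\mathrm{Re}\!\left(z^*\,\delta z\right), \\
\delta z &= \epsilon\, e^{i\theta} \;\Rightarrow\;
\delta f = 2\,\epsilon\,|z|\cos\!\left(\theta - \arg z\right).
\end{aligned}
```

The increase is largest for $\theta = \arg z$, that is, when $\delta z$ points along $\partial_{z^*} f = z$, and the only stationary point is $z = 0$, where $\partial_{z^*} f$ vanishes.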
The Complex simultaneous perturbation stochastic approximation (CSPSA) generates a sequence of estimates $\hat z_k$ of the minimizer $\tilde z$ of the target function $f$. The estimate $\hat z_k$ of $\tilde z$ at the k-th iteration is updated according to the iterative rule

$$\hat z_{k+1} = \hat z_k - a_k\, \hat g_k(\hat z_k, \hat z_k^*), \qquad (3)$$

where $a_k$ is a positive gain coefficient. This rule has the form of a gradient descent update, which on the field of the real numbers requires the evaluation of the real-valued gradient. Instead, CSPSA is based on an estimation $\hat g_k$ of the gradient $g$ of $f$ with respect to $z^*$. The i-th component of $\hat g_k$ is

$$\hat g_{k,i} = \frac{f(\hat z_{k,+}, \hat z_{k,+}^*) + \varepsilon_{k,+} - f(\hat z_{k,-}, \hat z_{k,-}^*) - \varepsilon_{k,-}}{2\, c_k\, \Delta_{k,i}^*}, \qquad (4)$$

with $\hat z_{k,\pm} = \hat z_k \pm c_k \Delta_k$ and $c_k$ a positive gain coefficient. The vector $\Delta_k \in \mathbb{C}^n$ is randomly generated and $\varepsilon_{k,\pm}$ describe the presence of noise in the values of $f(\hat z_{k,\pm}, \hat z_{k,\pm}^*)$. The estimation of $g$ by means of evaluations of $f$ becomes an advantage when $g$ is not readily available. This is the case, for instance, when the evaluation of $g$ is computationally resource intensive, when $g$ cannot be directly inferred from measurements in real-time applications, when the exact functional relationship between $f$ and $z$ is unknown, or when $f$ depends on a set of unknown parameters. The estimation $\hat g_k$ requires the evaluation of $f$ at two different vectors $\hat z_{k,\pm}$ regardless of the underlying dimension of the optimization problem. These evaluations are carried out by simultaneously varying all components of the vector $\hat z_k$ through the addition and subtraction of the randomly generated components of the vector $\Delta_k$. CSPSA also allows for the presence of noise in the evaluations of $f$, which might occur due to experimental inaccuracies in the acquisition of the values of $f$ or due to finite sample size effects. Other optimization methods have similar properties, for instance Simultaneous perturbation methods (SPM) 20 and the Finite difference stochastic approximation (FDSA) 21, which unlike CSPSA work on the field of the real numbers. SPM and FDSA are employed to optimize real-valued functions $f(x)$ with $x \in \mathbb{R}^n$ and are based on the update rule $\hat x_{k+1} = \hat x_k - a_k\, \hat g_k(\hat x_k)$, where $\hat g_k$ is an estimation of the real gradient of $f$. This estimation is calculated on a point $\hat x_k$, which is generated by means of a stochastic process. However, CSPSA maintains all calculations and updates of $\hat z_k$, $f$ and $\hat g_k$ within the field of complex numbers.
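The two-evaluation gradient estimate described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the estimate is taken to be the difference of the two function values divided by $2 c_k \Delta_{k,i}^*$, the components of the perturbation are assumed to be drawn uniformly from $\{1, -1, i, -i\}$, noise is omitted, and the function name `cspsa_gradient` is hypothetical. The test function $f(z) = \sum_i |z_i|^2$ has the known gradient $\partial_{z^*} f = z$, so the average of many estimates should approach `z`.

```python
import numpy as np

def cspsa_gradient(f, z, c, rng):
    """Two-evaluation estimate of the gradient of f with respect to z*."""
    # Assumption: each component is drawn uniformly from {1, -1, 1j, -1j}.
    delta = rng.choice(np.array([1, -1, 1j, -1j]), size=z.shape)
    f_plus = f(z + c * delta)    # evaluation at z + c * delta
    f_minus = f(z - c * delta)   # evaluation at z - c * delta
    # The same real difference is shared by all components; only the
    # divisor conj(delta_i) changes from one component to the next.
    return (f_plus - f_minus) / (2 * c * np.conj(delta))

rng = np.random.default_rng(0)
z = np.array([1 + 1j, -0.5j, 0.2 + 0j])
f = lambda v: float(np.sum(np.abs(v) ** 2))  # gradient w.r.t. z* is z

# Averaging many independent estimates illustrates (near) unbiasedness.
samples = [cspsa_gradient(f, z, 0.1, rng) for _ in range(20000)]
mean_estimate = np.mean(samples, axis=0)
```

A single estimate is very noisy, since it points along the random vector `delta`; only its average over the random perturbations tracks the gradient, which is why the method relies on decreasing gain coefficients.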
Stochastic optimization algorithms, such as SPM and FDSA, which are characterized by an iterative rule as in Eq. (3) but on the field of the real numbers, have been intensively studied 22,23 and conditions to guarantee local convergence have been firmly established. This can be suitably extended to encompass optimization on the field of the complex numbers by means of CSPSA. This is introduced in detail in the Supplementary Information by means of two theorems. In particular, it is possible to show that the sequence $\hat z_k - \tilde z$ as well as the conditional bias of $\hat g_k$ given $\hat z_k$ vanish asymptotically. Thereby, the sequence of estimates $\hat z_k$ provided by Eq. (3) converges almost surely to the minimizer $\tilde z$ of the optimization problem and $\hat g_k$ defined by Eq. (4) is an asymptotically unbiased estimation of the gradient $g$ of $f$. A property is satisfied almost surely if it is satisfied with probability one. Equivalently, the property fails at most on a set of measure zero.
Convergence and unbiasedness of CSPSA require conditions on $\Delta_k$, $a_k$, $c_k$ and $f$ that can be fulfilled with particular choices. The components of $\Delta_k$ are independent and identically generated by selecting at each iteration, with equal probability, values in a prescribed set. There is still, however, considerable freedom in the choice of $\Delta_k$, which also allows for improving the rate of convergence. Our choice of $\Delta_k$ is given by $\nu_p = \{0, \pi/4, \pi/2, 3\pi/4\}$. This corresponds to a vector in $\mathbb{R}^{2n}$ with vanishing components, which does not satisfy the conditions for the convergence of SPM and thus cannot be employed as the direction of the estimation of a real gradient. The gain coefficients $a_k$ and $c_k$ control the convergence of CSPSA and are chosen as

$$a_k = \frac{a}{(k + 1 + A)^s}, \qquad c_k = \frac{b}{(k + 1)^r}.$$

This choice is also employed in SPM. The values of $a$, $A$, $s$, $b$ and $r$ are adjusted to optimize the rate of convergence and depend on the target function. In the case of CSPSA, these are chosen as the values which optimize the convergence of SPM in the asymptotic regime, that is, for a large number of iterations. Interestingly, these values lead to a much higher rate of convergence of CSPSA in the regime of a few iterations, when compared to SPM with standard (s = 0.602, r = 0.101, A = 0, a = 3, b = 0.1) or asymptotic (s = 1, r = 0.166, A = 0, a = 3, b = 0.1) gains. In the case of SPM, standard gains provide a faster convergence than the asymptotic gains in the regime of a small number of iterations.
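The two sets of gain parameters quoted above can be compared with a short sketch. It assumes the power-law form commonly used for simultaneous perturbation methods, $a_k = a/(k+1+A)^s$ and $c_k = b/(k+1)^r$, with the iteration index starting at $k = 0$; the function name `gains` and the indexing convention are assumptions made for illustration.

```python
def gains(k, a=3.0, A=0.0, s=0.602, b=0.1, r=0.101):
    """Gain coefficients (a_k, c_k) at iteration k (k starts at 0, assumed).

    The defaults are the "standard" values quoted in the text.
    """
    a_k = a / (k + 1 + A) ** s   # step size of the update
    c_k = b / (k + 1) ** r       # size of the random perturbation
    return a_k, c_k

# "Standard" versus "asymptotic" gains over the first 100 iterations.
standard = [gains(k) for k in range(100)]
asymptotic = [gains(k, s=1.0, r=0.166) for k in range(100)]
```

Both sequences start from the same values, and the asymptotic gains then decay faster, so late iterations take smaller steps with smaller perturbations.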
An unknown pure quantum state |ψ〉 can be completely determined by minimizing the infidelity $I(z) = 1 - |\langle\psi|\phi(z)\rangle|^2$ with respect to the complex variables $z_i$ that define the known pure quantum state |φ(z)〉. The complex coefficients of state |ψ〉 entering in $I(z)$ are considered to be unknown but fixed parameters, and the global minimum $I = 0$ is achieved when |φ〉 = |ψ〉, for any |ψ〉. The optimization of $I(z)$ by means of CSPSA, Eqs (3) and (4), requires at each iteration the values $I(z_{k,\pm})$, which are experimentally obtained by projecting the system in the unknown state |ψ〉 onto a basis containing the state |φ〉. The values $I(z_{k,\pm})$ are then estimated as $1 - n_{k,\pm}/N$, where $n_{k,\pm}$ is the number of times the state |φ〉 is detected and $N$ is the total number of detections. Thereby, the total number of available copies $N_{\mathrm{tot}}$ of the quantum system in the unknown state is distributed among the experiments for estimating two values of $I$ at each iteration and the total number of iterations $k$, that is, $N_{\mathrm{tot}} = 2Nk$. The noise tolerance of CSPSA guarantees convergence even when projecting onto states slightly different from |φ〉. The optimization of the infidelity can also be carried out on the field of the real numbers. In this case, the components of $z$ are mapped onto the real numbers with the help of polar angles entering in hyper-spherical coordinates and arguments of complex phases. The infidelity becomes $I(x)$ and it is now possible to apply an optimization algorithm in the SPM family, for instance the Simultaneous perturbation stochastic approximation (SPSA) 24. This employs an estimation of the real gradient and is described by Eqs (3) and (4) but replacing the complex vector $z_k$ by the real vector $x_k$. The components of $\Delta_k$ are independently and identically distributed and randomly selected from the set {+1, −1}.
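The measurement procedure described above can be simulated with a short sketch, given here as an illustration under several assumptions rather than as the authors' code. The trial state is taken to be the normalized vector $z/\lVert z\rVert$, the estimate is renormalized after every update, the perturbation components are drawn from $\{1, -1, i, -i\}$, and the gains follow $a_k = a/(k+1+A)^s$, $c_k = b/(k+1)^r$ with the asymptotic values quoted in the text. The function names and the example states are hypothetical.

```python
import numpy as np

def normalize(z):
    return z / np.linalg.norm(z)

def measured_infidelity(psi, z, shots, rng):
    """Estimate I(z) = 1 - |<psi|phi(z)>|^2 as 1 - n/N from simulated counts."""
    p = abs(np.vdot(psi, normalize(z))) ** 2   # probability of detecting |phi>
    p = min(max(p, 0.0), 1.0)                  # guard against rounding errors
    n = rng.binomial(shots, p)                 # number of detections of |phi>
    return 1.0 - n / shots

def cspsa_tomography(psi, z0, iterations, shots, rng,
                     a=3.0, A=0.0, s=1.0, b=0.1, r=0.166):
    """Iterate the CSPSA update using two noisy infidelity values per step."""
    z = normalize(z0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** s
        c_k = b / (k + 1) ** r
        delta = rng.choice(np.array([1, -1, 1j, -1j]), size=z.shape)
        i_plus = measured_infidelity(psi, z + c_k * delta, shots, rng)
        i_minus = measured_infidelity(psi, z - c_k * delta, shots, rng)
        g = (i_plus - i_minus) / (2 * c_k * np.conj(delta))
        z = normalize(z - a_k * g)   # renormalization is an assumption
    return z

rng = np.random.default_rng(1)
psi = np.array([1, 0], dtype=complex)      # "unknown" qubit state (example)
z0 = np.array([1, 1], dtype=complex)       # initial guess, infidelity 0.5
estimate = cspsa_tomography(psi, z0, iterations=200, shots=1000, rng=rng)
final_infidelity = 1.0 - abs(np.vdot(psi, estimate)) ** 2
```

In this sketch the simulation consumes `2 * shots * iterations` copies of the state, matching the resource count $N_{\mathrm{tot}} = 2Nk$; the state `psi` enters only through the simulated detection counts, never through the update itself.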
The application of SPSA to the determination of pure states has been introduced in the literature as SGQT and experimentally demonstrated 10. Since CSPSA and SPSA (or SGQT) require at each iteration exactly the same number and type of measurements, they are a perfect match for a comparative performance analysis. Figure 1 shows the mean infidelity $\bar I$, obtained by sampling according to the Haar distribution an ensemble of $10^4$ pairs of unknown states and initial guess states, as a function of $N$ and $k$ for the quantum tomography of a single qubit via CSPSA and SPSA. For k = 100 and a fixed amount of resources $N_{\mathrm{tot}}$, CSPSA achieves a mean infidelity which is at least one order of magnitude smaller than that of SPSA. Thus, CSPSA clearly leads to an enhancement of the performance. The best mean infidelity achieved by SPSA at k = 100 is $\bar I \approx 5 \times 10^{-4}$ with $N = 10^4$, that is, $N_{\mathrm{tot}} = 2 \times 10^6$. This mean infidelity value can be achieved by CSPSA at k = 40 with $N = 10^2$, that is, $N_{\mathrm{tot}} = 8 \times 10^3$. Thereby, CSPSA offers a performance comparable to SPSA but with a large reduction in the amount of resources. The inset in Fig. 1 reproduces our performance analysis by means of the median and the interquartile range for both methods, where CSPSA still exhibits a performance boost over SPSA. At this point we note that there is no known proof for the convergence of the median for SPSA or CSPSA. In the case of CSPSA, median and mean infidelity exhibit close values, while SPSA shows a large difference between these figures. This is an indication that SPSA produces an asymmetric distribution for the infidelity which is much wider than the one generated by CSPSA. Figure 2 shows the mean infidelity generated by CSPSA as a function of the number of iterations and the dimension. To achieve a predefined mean infidelity, the number of required iterations increases with the dimension.
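The resource figures quoted above follow from the relation $N_{\mathrm{tot}} = 2Nk$, and the arithmetic can be made explicit with a few lines. The function name `total_copies` is introduced only for this illustration.

```python
def total_copies(shots_per_evaluation, iterations):
    """N_tot = 2 N k: two infidelity evaluations of N shots per iteration."""
    return 2 * shots_per_evaluation * iterations

spsa_copies = total_copies(10**4, 100)   # SPSA: N = 10^4, k = 100
cspsa_copies = total_copies(10**2, 40)   # CSPSA: N = 10^2, k = 40
ratio = spsa_copies / cspsa_copies       # reduction factor in state copies
```

With these values, reaching the same mean infidelity takes 2 × 10^6 copies with SPSA and 8 × 10^3 copies with CSPSA, a reduction by a factor of 250.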
Numerical simulations indicate that in the regime of a small number of iterations, that is, k ≤ 100, and for the inspected dimensions, that is, d ≤ 32, CSPSA surpasses SPSA, both in mean and in median.

Results
CSPSA has other feasible applications to target functions with unknown parameters. One example is the geometric measure of entanglement 2 of a pure n-partite state |ψ〉, defined as $E_{\sin^2}(|\psi\rangle) = \min_{|\phi\rangle} \left(1 - |\langle\psi|\phi\rangle|^2\right)$, where the minimum is taken over the product states $|\phi\rangle = |\phi_1\rangle \otimes \cdots \otimes |\phi_n\rangle$. As in quantum tomography via the optimization of $I$, CSPSA can be employed to obtain the value of $E_{\sin^2}$ for an unknown pure n-partite state by independently varying the local variables $z_i$. The violation of Bell-like inequalities 5 also provides an interesting application of CSPSA. These are functions of a quantum state ρ, pure or mixed, and of measurement settings, typically observables. The maximal violation is obtained by optimizing with respect to the observables to be measured, which assumes the state ρ is known. If this is not the case, then we can apply CSPSA to the bases defining the observables in order to optimize the violation of the inequality. Thereby, the measurement of entanglement and the violation of Bell-like inequalities with unknown states can be implemented with the help of local measurements driven by CSPSA. The determination of the ground state energy of complex physical systems 25 and the post-processing of quantum tomographic data via maximum-likelihood estimation [26][27][28] are difficult optimization problems due to the large number of variables involved. Since CSPSA requires two evaluations of the target function independently of the number of complex variables, these problems might benefit from CSPSA. The utility of this method goes beyond quantum mechanics and quantum information theory. Radio interferometric

Discussion
In summary, CSPSA allows the optimization of real-valued functions of complex variables. This makes it unnecessary to recast the problem as the minimization of a more convoluted function of real variables. CSPSA shares several properties with the family of SPM: no need to evaluate the gradient of the target function, a reduced number of evaluations of the target function, noise tolerance, asymptotic unbiasedness and convergence in mean to the minimizer. However, CSPSA can achieve a large performance enhancement when compared with methods within this family, for instance SPSA. We show this by means of an important problem: tomography of pure quantum states. Here, CSPSA outperforms SPSA when employing the same resources, or provides a similar performance with far fewer resources. Thus, CSPSA constitutes a clear indication that optimization methods formulated within the field of complex numbers can lead to higher performance, and provides a guideline for generalizing other optimization methods to the field of complex numbers, for instance preconditioned gradient methods 33. There are several scenarios where the performance of quantum tomography via CSPSA can be enhanced. For instance, CSPSA requires two values of the infidelity at each iteration. These are obtained by projecting onto two orthonormal bases, which generates 2d − 2 probabilities. Only two of them are employed by CSPSA. It is thus possible that the concatenation of CSPSA with an inference method, such as maximum likelihood estimation or Bayesian inference, leads to a further speed-up of the convergence of the tomographic method. This is a very interesting possibility. As Fig. 1 suggests, the mean infidelity provided by CSPSA seems to enter into an asymptotic regime, that is, $\bar I \approx \alpha(d)/N$, where $\alpha(d)$ is a function of the dimension $d$. A suitable choice of the inference method might lead to $\alpha(d) \approx d - 1$.
Thereby, the tomographic method would reach the Gill-Massar lower bound for the estimation accuracy of pure states [34][35][36][37][38]. We have based the tomographic method on the measurement of the infidelity. It is, however, possible to employ other metrics, for instance the mean squared error, that can be measured in interferometric experiments. We can also consider an extension of the present results to the case of reconstructing unknown coherent states and Schrödinger cat states of the electromagnetic field, where the infidelity can be measured as the probability of projecting a displaced coherent state onto the vacuum state. Finally, we mention that an experimental demonstration of CSPSA in higher dimensions is within reach of current experimental setups 11,39-41 based on single photons and concatenated spatial light modulators.