Determination of the asymptotic limits of adaptive photon counting measurements for coherent-state optical phase estimation

Physical realizations of the canonical phase measurement for the optical phase are unknown. Single-shot phase estimation, which aims to determine the phase of an optical field in a single shot, is critical in quantum information processing and metrology. Here we present a family of strategies for single-shot phase estimation of coherent states based on adaptive non-Gaussian, photon counting, measurements with coherent displacements that maximize information gain as the measurement progresses, which have higher sensitivities over the best known adaptive Gaussian strategies. To gain understanding about their fundamental characteristics and demonstrate their superior performance, we develop a comprehensive statistical analysis based on the Bayesian optimal design of experiments, which provides a natural description of these non-Gaussian strategies. This mathematical framework, together with numerical analysis and Monte Carlo methods, allows us to determine the asymptotic limits in sensitivity of strategies based on photon counting designed to maximize information gain, which up to now had been a challenging problem. Moreover, we show that these non-Gaussian phase estimation strategies have the same functional form as the canonical phase measurement in the asymptotic limit differing only by a scaling factor, thus providing the highest sensitivity among physically-realizable measurements for single-shot phase estimation of coherent states known to date. This work shines light into the potential of optimized non-Gaussian measurements based on photon counting for optical quantum metrology and phase estimation.


I. INTRODUCTION
Optical phase estimation is ubiquitous in many fundamental and practical problems ranging from quantum state preparation [1][2][3], sensing [4], communications [5][6][7][8][9][10][11][12][13][14], and information processing [15]. In a photonic metrology problem [16][17][18][19], an optical probe state interacts with a physical system to interrogate its properties. This interaction maps parameters of the system to the state of the optical probe, where an optimal readout can be performed [15,[17][18][19]. When the physical property of the system is mapped onto the phase of the optical probe, the optimal quantum measurement is the canonical phase measurement [20], which consists of projections onto phase eigenstates [21]. However, while theoretically the canonical phase measurement exists, the physical realization of projections onto phase eigenstates are not physically known [22]. Thus in practical estimation problems in quantum metrology one seeks to determine the limits in precision of physically realizable measurements, and the degree with which they approach to the fundamental quantum limit in sensitivity [17,[23][24][25][26][27].
Physically realizable measurements of the optical phase have been widely investigated [20,21,28] for diverse metrological problems with quantum and classical fields [29][30][31][32][33] including sensing small deviations from a known phase [29][30][31][32][33][34] and estimation with repeated * fbecerra@unm.edu sampling [35,36] and measurements [30,[37][38][39][40][41]. Beyond these specific estimation problems, measurements of the phase of a single optical mode in a single-shot are central for quantum state preparation and detection [42,43], waveform estimation and sensing [44][45][46][47], and quantum information processing [48][49][50]. The standard measurement for optical phase estimation is the heterodyne measurement [21], which samples both quadratures of the electromagnetic field simultaneously from which the phase can be estimated. However, the achievable sensitivity of the heterodyne measurement [21] is far below the ultimate measurement sensitivity allowed by physics, given by the canonical phase measurement [21,51]. Adaptive measurement techniques based on homodyne detection, a Gaussian measurement, can be used to align the phase quadrature of the optical field with the measurement setting where the homodyne measurement provides maximum sensitivity [32]. Adaptive homodyne has been theoretically shown to surpass the heterodyne limit and get closer to the canonical phase measurement for optical phase estimation of coherent states [20,21], providing the most sensitive Gaussian measurement of the optical phase so far [21].
In a complementary measurement paradigm, quantum measurements of coherent states based on photon counting, displacement operations, and feedback [2,6,[8][9][10][11][12][13][14]52] have enabled state discrimination below the Gaussian sensitivity limits and approaching the Helstrom bound [5]. Recently, some of the authors proposed and demonstrated a non-Gaussian measurement strategy for single-shot phase estimation of coherent states, able to surpass the heterodyne limit and approach the sensitivity limit of a canonical phase measurement in the presence of loss and noise of real systems [2]. These measurement strategies are based on realizing coherent displacements of the input field and monitoring the output field with photon number resolving detection. The information from the detection outcomes is then used to implement real time feedback of displacement operations optimized to maximize the measurement sensitivity of the phase of the input state. This estimation strategy is the most sensitive single-shot non-Gaussian measurement of a completely unknown phase encoded in optical coherent states so far [2]. In this strategy the displacement operation optimization is realized by maximizing either the information gain in subsequent adaptive steps of the measurement or the sharpness of the posterior phase distribution. While these cost functions are functionally different, both perform similarly and get close to the ultimate sensitivity allowed by physics, the Crámer-Rao lower bound (CRLB), in the limit of many photons and many adaptive measurements. While the work in Ref. [2] demonstrated the potential of non-Gaussian measurements for single-shot phase estimation, the superiority over adaptive homodyne detection was not proven. A deeper understanding of the properties of convergence and ultimate limits of the estimators produced by non-Gaussian measurements is still missing. This is an open problem due to the complexity associated with the analysis of these adaptive non-Gaussian strategies.
In this article we investigate a family of adaptive non-Gaussian strategies based on photon counting for singleshot optical phase estimation, and assess their performance compared to Gaussian measurements and the canonical phase measurement. To analyze these non-Gaussian strategies, we use the mathematical framework of Bayesian optimal design of experiments, which provides a natural description of non-Gaussian adaptive strategies, allowing us to investigate their fundamental characteristics and determine the limits in sensitivity, which up to now has been a challenging problem. Our work provides a comprehensive statistical analysis of adaptive non-Gaussian measurements for parameter estimation, and the requirements to approach optimal bounds in the asymptotic limit. We show that strategies based on photon counting and feedback for single-shot phase estimation of coherent states provide superior sensitivity over the best known adaptive Gaussian strategies, having the same functional form as the canonical phase measurement in the asymptotic limit, differing only by a scaling factor. This work provides a deep insight into the potential of optimized non-Gaussian measurements for quantum communication, metrology, sensing, and information processing.

A. Holevo Variance of Non-Gaussian Estimation Strategies
The non-Gaussian phase estimation strategies investigated here are based on photon counting, displacement operations, and feedback, and are optimized by maximizing a specific cost function. These strategies maximize either the estimation precision (by minimizing the Holevo variance [51]), or the information gain about the unknown parameter based on entropy measures including mutual information, the Kullback-Lieber divergence, and the conditional entropy [53,54]. We note that these cost functions produce non-identifiable likelihood functions that do not allow to correctly estimate a cyclic parameter, such as the phase [55]. To address this problem, these non-Gaussian strategies use the Fisher information to optimize the displacement operations, which are the dynamical control variable in the strategy, to guarantee that these cost functions provide identifiable likelihood functions, and to enable optical phase estimation with near optimal performance.
In the problem of single-shot phase estimation with coherent states, an electromagnetic field in a coherent state ρ 0 = |α α| interacts with a physical system and experiences a unitary transformation e iφn , wheren is the number operator. The phase φ induced in the coherent state carries information about the system, which can be extracted by a measurement of the output state ρ(φ) = e iφn ρ 0 e −iφn = e iφ α e −iφ α .
Measurements onto ρ(φ), together with an estimator φ on the measurement outcomes, provide an estimate of φ, and a measurement strategy aims to obtain the best estimation of the physical parameter. The efficiency of such a strategy is characterized by the estimation variance as a function of the number of photons in the coherent state. The most efficient strategy provides a variance at the highest convergence rate towards zero as the number of photons increase.
The standard measurement paradigm for phase estimation of Gaussian states is the heterodyne measurement (a Gaussian measurement), with an estimator variance of Var φ Het = 1/2|α| 2 . Within the paradigm of Gaussian measurements, adaptive homodyne strategies optimized to minimize the Holevo variance have achieved the best performance among Gaussian measurements for single-shot phase estimation of coherent states [20]. The best adaptive Gaussian measurement reported to date, termed the Adaptive Mark-II (MKII), achieves a Holevo variance in the limit of large number of photons of: While this optimized Gaussian strategy surpasses the heterodyne limit, it has an error of order 1/|α| 3 above the Cramér-Rao Lower Bound (1/4|α| 2 ), and does not reach the performance of the canonical phase measurement [21]: In this work, we numerically show that non-Gaussian strategies for single-shot phase estimation based on photon counting, optimized displacement operations, and real-time feedback achieve an estimator variance smaller than Gaussian strategies with an asymptotic scaling in the limit of high mean photon numbers of: Figure 1A summarizes the main result comparing the three asymptotic variances for the canonical phase measurement (solid blue), MKII (solid red), and non-Gaussian strategies (solid green and points). Figure 1B, shows the excess Holevo variance compared to the canonical phase measurement (Var[·] − Var[φ CPM ]) for Heterodyne (purple), MKII (red), and non-Gaussian strategies (green). These non-Gaussian strategies implement a series of adaptive steps with displacement operations optimized to maximize information gain, while ensuring efficient phase estimators in the asymptotic limit. These strategies surpass the best known Gaussian strategy in Eq. (1), and have the same functional form as the canonical phase measurement in the asymptotic limit, differing only by a scaling factor, thus providing the highest sensitivity among physically-realizable measurements for single-shot phase estimation of coherent states known to date.
Achieving this performance with non-Gaussian estimation strategies, however, requires a deep understanding of the measurement process. To gain this understanding, we use the mathematical framework of Bayesian optimal experimental design, which provides a natural description of adaptive non-Gaussian measurements. This allows us to optimize these strategies for single-shot phase estimation with a Holevo variance given by Eq. (3).

B. Bayesian optimal design of experiments
Phase estimation of coherent states based on photon counting with adaptive coherent displacement operations can be defined in the context of Bayesian optimal design of experiments. Optimal design of experiments allows for improving statistical inferences about quantities of interest by appropriately selecting the values of control variables known as designs [54,56,57]. In this framework, it is assumed that the experimental data y (the measurement outcomes) can be modeled as an element of the set where d is a parameter called design chosen from some set D called design space, φ ∈ Φ is an unknown parameter to be estimated, and the data y comes from a sample space Y ⊆ R. In this paradigm, the experimenter has full control over the designs d and the ability to adjust them prior to making a measurement. This allows for optimizing such measurement for estimating the unknown parameter φ. Bayesian optimal design of experiments goes beyond standard methods for parameter estimation based on Bayesian statistical inference [58][59][60][61][62], by providing the suitable mathematical framework to ensure optimal designs to find efficient estimators for a general parameter space [63][64][65][66].
In the Bayesian approach of optimal design [54,56], the initial lack of knowledge about φ is modeled as a prior probability distribution p(φ). The measurement aims to reduce the uncertainty of φ as much as possible by the use of Bayes' theorem over the prior distribution. In an estimation problem, the optimal choice for the designs d ∈ D maximize the expected value of a cost function U (d, φ, y) with respect to the possible outcomes of y and φ: A standard approach in optimal design of experiments considers choosing d opt at the beginning of the experiment an then sample data from p(y | d opt , φ) for all subsequent trials. An alternative approach considers dynamically updating d opt on each trial, as more data is collected. The advantage of this approach is that adaptive estimation strategies are never less efficient than the non-adaptive ones [67].
The implementation of Bayesian optimal design of experiments requires a cost function, such as the Kullback-Lieber divergence between the prior and posterior distributions [68]: The Kullback-Lieber divergence provides a distance between probability distribution p(φ | d, y) and p(φ) [53]. If p(φ | d, y) is equal to p(φ) then U (d, y) = 0 and there is not any gain of information about φ by measuring with design d and outcome y.
Another cost function considered in optimal design is the conditional entropy between the plausible values of φ and y which is a measure of how much information is needed to describe the outcomes of the random variable φ given that the value of another random variable Y is known [53]. However, we note that cost functions Eq. (6) and Eq. (7) can be related to the mutual information [53]: As a result, designs d maximizing any of these cost functions in Eq. (6), Eq. (7), or Eq. (8) are equivalent [54,56,69]. Moreover, in the asymptotic limit, maximizing these cost functions is equivalent to minimizing the determinant of the covariance matrix (D-optimality design criteria), that is, in the asymptotic limit, optimizing these cost functions is equivalent to optimizing any member within the family of D-optimal designs [54,68].
Our goal is to apply the theory of Bayesian optimal design of experiments to the problem of phase estimation of coherent states with photon counting and adaptive coherent displacement operations. The adaptive non-Gaussian estimation strategy consists of several parts: i) in the first adaptive step, it uses an specific cost function and the prior information to choose the design; ii) then, it performs a measurement; iii) based on the measurement outcome, it uses Bayes' theorem to update the probability distribution; iv) and lastly, it uses a recursive approach, where this posterior probability distribution becomes the prior of the subsequent measurement step i). The estimation of φ at each adaptive step is obtained from the maximum posterior estimator (MAP) of the posterior probability distribution. This approach requires that the MAP converge to the true value of the parameter when the number of measurements increases.
In the adaptive mathematical framework of optimal experimental design, Paninski [67] proved that under a set of regular modelling conditions and in the case when φ ∈ R, cost functions based on mutual information can allow for designs that lead to asymptotically consistent and efficient MAP estimators with variance Here co (F (φ; d)) denotes the convex closure of the set of Fisher information functions F (φ; d). In other words, the estimations produced by φ converge to φ (consistency), and the distribution of φ tends to a normal distribution with mean φ and variance given by Eq. (9) when the number of adaptive steps tends to infinity (efficiency). Formally, the regularity conditions introduced in [67] can be stated as follows: 6. The maximum of the convex closure of the set of Fisher information functions We note that in the case of estimation of a scalar parameter, the maximization of mutual information is equivalent to the minimization of the mean square error (MSE) [67]: This shows that the MSE is a trade-off between the estimator's variance and its bias. As a result, since the phase is a scalar quantity, in the asymptotic limit both the mutual information Eq. (8) and the mean square error Eq. (10) are appropriate cost functions to find optimal estimation strategies. Even more, for unbiased estimators (such as the MAP estimator on the asymptotic limit), the MSE corresponds to the estimator variance, then the maximization of mutual information is equivalent to the minimization of estimator variance. In practice, however, an estimation strategy would use a cost function that can be calculated efficiently and with a high rate of convergence.
Phase estimation in coherent states. Optimal phase estimation of coherent states of light aims to obtain the best estimate, from the outcomes of a physical measurement, of an unknown phase φ ∈ [0, 2π) encoded in a coherent state ρ(φ) = e iφ α e −iφ α . The most general description of a physical measurement is given by a POVM, a positive operator-valued measure.
According to information theory, if an estimator φ of φ is constructed using a sample from a POVM M , the limit in the accuracy of φ is given by the Crámer-Rao Bound [71,72] Here F M (φ) is the Fisher information of M about φ, which quantifies how much information about φ is carried in a sample from M : Since the Fisher information in Eq. (13) depends on the POVM M , the maximization of the Fisher information over all POVMs provides the lowest possible Cramér-Rao bound. This maximum Fisher information over all POVMs is known as the quantum Fisher information F Q (QFI), and the lowest possible bound in the accuracy of φ is known as the quantum Cramér-Rao bound (QCRB) [5,72,73]. In the case of phase estimation of coherent states F Q = 4|α| 2 , and the QCRB is: In the limit of large photon number |α| 1, the QCRB is saturated by the canonical phase measurement (the optimal phase measurement), which is described by the POVM [51]: where |n is an eigenstate of the number operatorn. The operator M ( φ) is an element of the canonical phase measurement whose outcome is a number φ ∈ [0, 2π), which can be used as an estimation for φ. The optimality of the canonical phase measurement indicates that its Holevo variance is the fundamental bound of estimation precision [20,51].
Although there are proposals that attempt to implement this POVM, they have not been able to reach the fundamental bound Eq. (16), or Eq. (2). For instance, in [74] it was possible to obtain the canonical measurement distribution as the marginal of a joint measurement in phase space producing a worse performance in the context of phase estimation. Moreover, the canonical phase measurement was implemented for the case of one-photon wave packet using quantum feedback [22]. However, for the case of higher dimensional states, such as coherent states, this problem remains open. While there is not a satisfactory known method to implement the canonical phase measurement, Gaussian strategies serve as a standard of physically realizable measurement techniques for phase estimation of coherent states. The natural benchmark in the Gaussian strategies is the heterodyne detection, whose variance is lower bounded by Var φ Het = 1/2|α| 2 [21]. Several adaptive Gaussian schemes have been shown to exceed the lower bound for heterodyne detection. The most efficient Gaussian measurement reported to date, termed the Adaptive Mark-II (MkII) strategy [20], has a variance in the limit of |α| 1 given by Eq. (1). Nonetheless, these adaptive Gaussian strategies cannot reach the performance for the canonical phase measurement in Eq. (2) [21].
The proposed non-Gaussian strategies for single-shot phase estimation of coherent states are based on optimized adaptive measurements with photon number resolving detection. These non-Gaussian strategies are able to outperform the best known Gaussian strategies and closely follow the performance of the canonical phase measurement in the limit of large photon number.

C. Adaptive non-Gaussian phase estimation
The proposed non-Gaussian adaptive estimation strategies based on adaptive photon counting [2] aim to estimate the phase φ ∈ Φ = [0, 2π) of a coherent state ρ(φ) = e iφ α e −iφ α with mean photon number E [n] = |α| 2 using a finite number of adaptive measurement steps, and based on the prior information p(φ) about φ. In every adaptive step l = 1, 2, · · · , L, the input coherent state with energy |α| 2 /L interferes with a local oscillator field, which implements a displacement operationD (β) |α = |α + β , β ∈ C, with phase and amplitude chosen by some rule, in general depending on previous measurement outcomes. This is followed by a photon number detection measurement with a given photon number resolution (PNR) m of the detectors [6], m ∈ N. In practice, since the energy in each adaptive step is |α| 2 /L, the strategy will only require moderate PNR resolution (m < 10) as L increases.
In the first adaptive measurement l = 1 [2], the strategy makes a random guess hypothesis β 0 ∈ C about the optimal β, and applies the POVM over the state e iφ α/ √ L . In the POVM in Eq. (17), the sum over Fock states |n n| models the photon detection on the displaced state with a detector with PNR(m) [6]. The corresponding Wigner function describing a photonnumber resolving detector shows non-Gaussian features with negative values. For this reason, these adaptive estimation techniques are called "non-Gaussian", despite that the estimator produced is asymptotically normal (this result will be proved in the remainder of this section).
Given the detection outcome n 1 in l = 1, the posterior probability distribution becomes where L (φ | n 1 ; β 0 ) is the likelihood function given by The phase estimate φ 1 in this adaptive step corresponds to the MAP estimator φ MAP , φ 1 = φ MAP (n 1 ), with the posterior probability distribution in Eq. (18). Using the posterior phase distribution in Eq. (18) as the prior for the next adaptive step l = 2, the strategy optimizes a cost function U (β) to obtain the next value of β denoted as β 1 , and implements the POVM in Eq. (17) with β 1 . The Bayesian updating procedure is repeated at each step l ≥ 2. After l adaptive measurements the posterior probability distribution becomes Here n i is the observed photon detection at step i. Using the MAP on this phase distribution, we obtain the lth estimation φ l . The procedure is repeated until the last measurement step L. This parameter estimation strategy is a particular case of Bayesian optimal design of experiments, where the parameters β ∈ C are the designs, and which are optimized to estimate a phase φ ∈ [0, 2π). Since the optimal value for β on each adaptive step depends on all previous detection results, the cost function to be optimized is a function of the posterior distribution in Eq. (20). A suitable choice of cost function, such as the mutual information or the estimator variance, can provide a sequence of estimations φ n that approaches the true value of φ [2].
In the case of estimation of cyclic parameters, such as phase estimation, the posterior distribution in Eq. (20) is 2π periodic, and the moments of φ cannot be calculated as in the linear case [51]. In such situations, the first moment of the cyclic random variable X is defined as E e iX , and the dispersion of an estimator φ is calculated using the Holevo Variance [51]: which is the analogous to the mean square error. The minimization of the uncertainty about φ (positive square root of Eq. (21)), requires maximization of S(β, m) = E e i φ , known as the sharpness of the distribution. Then, the suitable cost function for the adaptive protocol is the average sharpness: Identifiability of Likelihood.
To guarantee a consistent asymptotic estimator the optimized estimation strategies require to satisfy the regularity conditions 1-6 described in Sec. II B. For phase estimation, the regularity conditions 1 and 2 are satisfied given that φ is an interior point of Φ = [0, 2π).
Moreover, given that the probability is well defined and two times differentiable, the conditions 4, 5, and 6 are directly satisfied. However, the condition 3 is not satisfied in general. If one chooses β as the value that maximizes the mutual information (8) or the average sharpness (22), the resulting likelihood function can have in general two maxima, that is, a non-identifiable likelihood function [30]. In that case it is not possible to guarantee the existence of a consistent estimator.
To address the challenge of working with nonidentifiable likelihood functions, we use designs with a fixed relation between the phase of β and the amplitude |β|, given by β = f (θ)e iθ , with θ ∈ [0, 2π) and f (θ) a real function. These experimental designs result in a cost function U that is a function of θ.
To see how this method solves the problem of nonidentifiability, we observe that the probability p(n | φ; β) in Eq. (23) is Poisson distributed, Then, for L adaptive steps with results n = (n 1 , ..., n L ), the likelihood function is the product of L probability functions of the form of Eq. (24): Here, each θ i depends on the cost function and previous POVM outcomes. The choice of the experimental designs with |β| = f (θ) can force the adaptive strategy to change θ i in each step. In this case, the likelihood function in Eq.
(25) becomes identifiable because the product of probability functions with different θ i produces a likelihood with a global maximum. To see this, note that if θ i = θ is fixed, even if the parameter |β| is different in each adaptive step of the protocol, the likelihood function from a sequence of independent random variables with probability function given by Eq. (24) has two maxima over Φ. One of the maxima will always be around φ and the other will depend on the value of θ. On the other hand, if the strategy allows θ to change at each adaptive step, the second maximum is suppressed, since the functions whose product constitutes the likelihood Eq. (25) will each have a second maxima at different positions θ i = θ j (i = j). As a result, with the experimental designs β = f (θ) exp(iθ), the likelihood functions satisfy all the regularity conditions described in Sec. II B.
Bayesian optimal design of |β|. Any viable estimation strategy should aim to achieve the QCRB (14). Therefore, natural choice for |β| is the one that maximizes the Fisher information. For a discrete random variable, the Fisher information is given by [72,75]: The Fisher information for a particular design β = |β|e iθ for Poisson distributions Eq. (24) results in Optimizing over |β|, the Fisher information becomes: where is the value of |β| that maximizes the Fisher information. Unfortunately, since β opt depends on φ, its implementation is not practical, because it would be required to already know a priori the very same parameter that one wants to estimate. To address this problem, the optimal Bayesian design β opt can be estimated as: where φ is the MAP estimator. As a result, the non-Gaussian estimation strategy has now only one design, the phase θ. With β = β opt , the Fisher information becomes: F (∆, βopt) = 4 sin 2 (∆) α 2 L(cos 2 (δ + ∆) − 2 cos(∆) cos (δ + ∆) + 1) , where δ = φ − φ and ∆ = φ − θ.
Note that F (∆, β opt ) becomes a random variable with outcomes depending on φ through β opt . Moreover, for a random initial design θ 1 , a cost function given by the expected sharpness or the mutual information with β opt in Eq. (30), the likelihood function in Eq. (25) becomes identifiable. As a result, the non-Gaussian strategy then leads to consistent and efficient MAP estimators [67].

Asymptotic behavior.
In general δ = φ − φ = 0, and the Fisher information has a strong dependence on θ. For example, if θ → φ then F (∆ = 0, β opt ) → 0, and negligible information is gained when the system is measured. Therefore, the optimal value of θ should result in the value of ∆ that maximizes the expected value of F (∆, β opt ). By observing that the expected value E[δ 2n+1 ] = 0, with n ∈ N, we see that the optimal value of ∆ is π/2. As a result, an efficient adaptive strategy would make ∆ L = φ − θ L tend to π/2 as L increases. However, in the limit of ∆ → π/2 and δ → 0, | β opt | → ∞ (see Eq. (30)), and we expect that when the strategy is implemented ∆ < π/2. These findings are consistent with our numerical simulations of the non-Gaussian adaptive strategy, where we observe that for L 1, ∆ → π/2 − , with a small positive real number, and |β| stays finite around a value that the PNR detector in the strategy can resolve. Note thatβ opt , which maximizes the Fisher information given θ, does not necessarily maximizes the cost function U (β) for β ∈ C. However restricting β to the set that maximizes the Fisher information makes the phase of β change in each step forcing the likelihood to be identifiable. Moreover, whenφ → φ,β opt tends to the value that maximizes U [2].
Given that the Fisher Information is additive, for any L measurements made using the optimal design, β opt in Eq. (29), the total Fisher information for this design corresponds to the quantum Fisher information 4|α| 2 . However, since β opt is not known, it is not possible to choose a design β for which its Fisher information equals the quantum Fisher information. This is already implied by the fact that the canonical phase measurement does not saturate the Cramér-Rao bound. To show this, we observe that for and estimator very close to the true value δ = φ − φ 1 for L adaptive measurements, the Fisher information (Eq. (31)) is On the other hand, the best possible estimator of δ in each step satisfies E δ 2 ≥ 1/4|α| 2 , so that independently of L. We conclude that the adaptive non-Gaussian strategies do not saturate the Cramér-Rao bound for finite |α|. Nevertheless, these adaptive estimation schemes outperform the most sensitive Gaussian strategy known to date, and show a similar asymptotic scaling as the canonical phase measurement.

D. Performance of Non-Gaussian Adaptive Strategies
We numerically investigate the performance of the estimator produced by non-Gaussian adaptive strategies for phase estimation for different number of adaptive steps L, photon number resolution PNR(m), and average photon number |α| 2 . To assess the performance of these strategies, we compare our results with the best known Gaussian measurement for phase estimation, termed Mark II (MKII) strategy [21], and with the canonical phase measurement. As discussed in Sec. II B, the performance of non-Gaussian adaptive strategies using cost functions including the Kullback-Lieber divergence Eq. (6), conditional entropy Eq. (7), mutual information Eq. (8), and expected sharpness Eq. (22) are equivalent in the asymptotic limit. In our numerical simulations, however, we use the expected sharpness in Eq. (22) as the cost function for the optimization of the strategy, because it substantially reduces the number of operations for this optimization and the computational overhead.
In our numerical analysis, we use Monte Carlo simulations and generate sufficient numerical data samples to reduce statistical uncertainties. Fig. 2A shows the Holevo variance for the non-Gaussian adaptive strategy, as a function of the number of adaptive steps L for |α| 2 = 1, for different PNR: m = 1, 3, 6. We observe that the adaptive non-Gaussian scheme with PNR(1) and L ≥ 100 (green dots with error bars) outperforms the most sensitive Gaussian strategy known to date, the MKII, (red dashed line). Moreover, strategies with PNR(3) (yellow dots with error bars) and PNR(6) (light blue dots with error bars) outperform the MKII strategy with only L ≈ 30 and L ≈ 20, respectively, achieving a smaller Holevo variance with fewer adaptive measurements compared to PNR(1). We have observed similar behavior for non-Gaussian strategies optimizing different cost functions, such as the mutual information.
To investigate the asymptotic behavior of the adaptive non-Gaussian strategy, we assume an exponential dependence for the Holevo variance with L (solid lines in Fig. 2A): The exponential model is a few parameter model that allows us to quantify asymptotic trends when the datasets have rapidly decaying tails. This is a widely used model for studying the asymptotic scaling of estimators as a function of resources in diverse metrological problems [64,76,77].
We fit the numerical data from Monte Carlo simulations to Eq. (34) to estimate the constants A, B, and C. Our results show that the asymptotic constant C = 0.751 ± 0.002 for PNR(1), C = 0.719 ± 0.004 for PNR(3), and C = 0.714 ± 0.003 for PNR (6). We observe that these values are smaller than the asymptote of the MKII Gaussian strategy, but larger than the one for the canonical phase measurement (0.673, blue dashed line). We note that while C for PNR(3) is larger than for PNR(6), they are statistically equal due to their uncertainties. This prevents us from drawing any conclusions for larger values of m. Figure 2b shows the design ∆ = φ − θ, the phase of the displacement field, as a function of L. We observe that for non-Gaussian adaptive strategies with increasing PNR, ∆ tends to π/2 for large L (L = 200). This observation is consistent with the theoretical framework of optimal design of experiments (Sec. II C), which states that ∆ optimal = π/2 for L → ∞. Moreover, as PNR increases, the strategies show a faster convergence to the asymptotic value of the Holevo variance, which translates in a smaller error in the estimation (see Fig. 2a.) These observations further support our theoretical model of non-Gaussian strategies for phase estimation. We investigate the asymptotic performance of the estimator variance produced by the non-Gaussian strategy for large |α| 2 . Figure 3 shows the Holevo variance for the non-Gaussian adaptive strategy as a function of |α| 2 for different L normalized to the QCRB. We observe that for large |α| 2 (with L = 100), the adaptive non-Gaussian strategy tends to the CRLB.
To build a model for the performance of this strategy for large |α| 2 , we propose three candidate models for the Holevo variance for L 1: to describe our numerical observations in Fig. 3 based on the Monte Carlo simulations. The model that best describes our observations allows us to determine, with a certain degree of confidence, the performance of non-Gaussian adaptive strategies, and compare them with the best Gaussian strategy, Eq. (1), and the canonical phase measurement, Eq. (2).
We discriminate among plausible candidate models using the technique of backward elimination [78]. We start with the candidate y 1 , Eq. (35a), and test the deletion of each variable A i using the p-value of a hypothesis testing procedure (H 0 : A i = 0, H A : A i = 0). Given that p > 0.1 for A 2 and p < 0.001 for A 1 and A 3 , we conclude, with confidence larger than 99% , that the model reflecting the behavior of the data presented in Fig. 3 is y 3 , Eq. (35c).
To find A 1 and A 3 in the limit L → ∞ in model y 3 , we fit the Holevo variance as a function of |α| 2 for |α| 2 > 5 to the model y 3 for different values of L. Given a number of adaptive steps L, each fitting provides a set of coefficients {A 1 , A 3 }. Hence, to obtain the trend of each coefficient A 1 and A 2 as L increases, we fit them to an exponential model of the form Fig. 4 shows the coefficients A 1 and A 3 as a function of L. Adjusting this exponential model and observing that in the limit L 1 A i → F i , we obtain A 1 = 0.250 ± 0.001 and A 3 = 0.52 ± 0.01. Therefore, we conclude with a 99% confidence level that the Holevo variance for the adaptive non-Gaussian strategy in the limit L → ∞ for large |α| is: This equation is the main result of this work, also shown in Eq. (3). We observe that the Holevo variance for the adaptive non-Gaussian strategy has a similar dependance with |α| as the canonical phase measurement, differing only in the scaling of the correction term of order 1/|α| 4 , see Eq. (2). Moreover, we note that the best Gaussian strategy known to date, the MKII adaptive homodyne, has a correction term of order 1/|α| 3 [20], see Eq. (1). Then, for large |α|, the non-Gaussian estimation strategies show a much better scaling in the Holevo variance than the best known Gaussian strategy, and closely follows the canonical phase measurement. Furthermore, these non-Gaussian phase estimation strategies can be implemented with current technologies [2], and our work demonstrates their superior performance over all the physically realizable strategies for single-shot phase estimation of coherent states reported to date.
The optimized non-Gaussian adaptive strategies based on photon counting analyzed in this work are not the only possible strategies for single-shot phase estimation, and there could be other strategies based on photon counting with better performances. In this work, we studied estimation strategies using cost functions that are consistent with D-optimal designs. However, there may be other cost functions that could provide a further improvements to non-Gaussian strategies. Moreover, we note that the relation in Eq. (30) between the phase and the magnitude of the displacement field was used to obtain identifiable likelihood functions and ensure efficient estimators in the asymptotic limit. While we chose the relation in Eq. (30) because it maximizes the Fisher information, there is no mathematical proof that this choice is optimal, or that other choices for this relation would not provide higher sensitivities. Finally, in the presented adaptive non-Gaussian strategies for phase estimation, the displacement operations were optimized in every adaptive step at a time, i. e. using local optimizations in the adaptive steps. We note, however, that local optimal strategies do not necessarily ensure global optimality [13]. Strategies with global optimizations, where all adaptive steps are optimized simultaneously, could probably lead to better performances. However, the computational overhead required for performing global optimizations beyond L = 10 adaptive steps prevents us from being able to investigate global optimized strategies. To overcome these limitations, future investigations could make use of machine learning methods, such as neural networks and reinforcement learning [79], to lower the complexity of these calculations.

III. DISCUSSION
Non-Gaussian measurement strategies for phase estimation approaching the quantum limits in sensitivity, as set by the canonical phase measurement, can become a tool for enhancing the performance of diverse protocols in quantum sensing, communications, and information processing. Optical phase estimation approaching the quantum limit can be used to prepare highly squeezed atomic spin states based on measurement backaction [42] for quantum sensing and metrology [80]; enhance phase contrast imaging of biological samples with optical probes at the few photon level to avoid photodamage and ensure integrity of the sample; and to enhance fidelities in quantum communications with phase coherent states that require phase tracking close to the quantum level between receiver and transmitter with few-photon pulses [81,82], while allowing for quantum receivers to decode information encoded in coherent states below the quantum noise limit [13,52,83].
As a direct application for quantum information processing, non-Gaussian measurements based on photon counting and displacement operators can be used for full reconstruction of quantum states with on-off detectors [84] and PNR detectors [85] in a multi-copy state setting by sampling phase space with non-Gaussian projections. The theoretical methods to assess the performance of adaptive non-Gaussian measures for phase estimation described in this work could be used to study methods for adaptive quantum tomography [86,87] based on photon counting for high dimensional quantum states, and investigate their asymptotic advantages over adaptive homodyne tomography [88].
From the practical point of view, our work provides insight into the design of practical, highly efficient mea- surement strategies for phase estimation based on photon counting. It shows that non-Gaussian strategies optimizing any cost function within the family of D-optimal designs are equally efficient, and demonstrates the advantages of higher photon number resolution in the strategies to reduce estimation errors. This understanding allows the experimenter to chose cost functions that can be efficiently calculated and optimized to achieve higher convergence rates, while selecting a PNR given the desired/target error budget in the estimation problem for specific applications. This knowledge will be critical for the design and development of future sensors based on photon counting for diverse applications in communication, phase-contrast imaging, metrology, and information processing.
In conclusion, we investigate a family of non-Gaussian strategies for single-shot phase estimation of optical coherent states. These strategies are based on adaptive photon counting measurements with a finite number of adaptive steps implementing coherent displacement operations, optimized to maximize information gain as the measurement progresses [2]. We develop a comprehensive statistical analysis based on Bayesian optimal design of experiments that provides a natural description of adaptive non-Gaussian strategies. This theoretical framework gives a fundamental understanding on how to optimize these strategies to enable efficient estimators with a high degree of convergence towards the ultimate limit, the Cramér Rao lower bound.
We use numerical simulations to show that optimized adaptive non-Gaussian strategies producing an asymptotically efficient normal estimator achieve a much higher sensitivity than the best Gaussian strategy known to date, which is based on adaptive homodyne [20]. Moreover, we show that the Holevo variance of the estimator for the adaptive non-Gaussian strategy has a similar dependence as the canonical phase measurement in the asymptotic limit of large photon numbers, differing by a scaling factor in the second-order correction term. Our work complements the work in single-shot phase measurements for single-photon wave packets in two dimensions [22] using quantum feedback with Gaussian measurements, and paves the way for the realization of optimized feedback measurements approaching the canonical phase measurement for higher dimensional states based on non-Gaussian operations.

DATA AVAILABILITY
The data that support the findings of this study are available from the authors upon request.

CODE AVAILABILITY
Our numerical results have been implemented using MATLAB and Julia language. A functions to reproduce the numerical results can be publicly found in the repository [89]. To obtain a set of estimations produced by the non Gaussian strategy, use the program: Bootstrap_phase_estimation.jl, setting the desired values of L, Bootstrap size, MPN, in the Adapt_Steps, Boot_Reps, and MPNS variables.

ACKNOWLEDGMENTS
We thank Laboratorio Universitario de Cómputo de Alto Rendimiento (LUCAR) of IIMAS-UNAM for their service on information processing. This research was supported by the Grant No. UNAM-DGAPA-PAPIIT IG101421 and the National Science Foundation (NSF) grans # PHY-1653670 and PHY-2210447.

AUTHOR CONTRIBUTIONS
F.E.B. and P.B. conceived the idea and supervised the work. M.A.R. and P.B. conducted the theoretical and statistical study. M.T.D. and M.A.R. contributed in the code for simulations. All authors contributed to the analysis of the theoretical and numerical results and contributed to writing the manuscript.