Determination of the asymptotic limits of adaptive photon counting measurements for coherent-state optical phase estimation

Physical realizations of the canonical phase measurement for the optical phase are unknown. Single-shot phase estimation, which aims to determine the phase of an optical field in a single shot, is critical in quantum information processing and metrology. Here we present a family of strategies for single-shot phase estimation of coherent states based on adaptive non-Gaussian, photon counting, measurements with coherent displacements that maximize information gain as the measurement progresses, which have higher sensitivities over the best known adaptive Gaussian strategies. To gain understanding about their fundamental characteristics and demonstrate their superior performance, we develop a comprehensive statistical analysis based on Bayesian optimal design of experiments, which provides a natural description of these non-Gaussian strategies. This mathematical framework, together with numerical analysis and Monte Carlo methods, allows us to determine the asymptotic limits in sensitivity of strategies based on photon counting designed to maximize information gain, which up to now had been a challenging problem. Moreover, we show that these non-Gaussian phase estimation strategies have the same functional form as the canonical phase measurement in the asymptotic limit differing only by a scaling factor, thus providing the highest sensitivity among physically-realizable measurements for single-shot phase estimation of coherent states known to date. This work shines light into the potential of optimized non-Gaussian measurements based on photon counting for optical quantum metrology and phase estimation.


INTRODUCTION
Optical phase estimation is ubiquitous in many fundamental and practical problems ranging from quantum state preparation 1-3 , sensing 4 , communications [5][6][7][8][9][10][11][12][13][14] , and information processing 15 . In a photonic metrology problem [16][17][18][19] , an optical probe interacts with a physical system to interrogate its properties. This interaction maps parameters of the system to the state of the optical probe, where an optimal readout can be performed 15,[17][18][19] . When the physical property of the system is mapped onto the phase of the optical probe, the optimal quantum measurement is the canonical phase measurement 20 , which consists of projections onto phase eigenstates 21 . However, while theoretically the canonical phase measurement exists, the physical realization of projections onto phase eigenstates are not physically known 22 . Thus in practical estimation problems in quantum metrology one seeks to determine the limits in precision of physically realizable measurements, and the degree with which they approach to the fundamental quantum limit in sensitivity 17,[23][24][25][26][27] .
Physically realizable measurements of the optical phase have been widely investigated 20,21,28 for diverse metrological problems with quantum and classical fields [29][30][31][32][33] including sensing small deviations from a known phase [29][30][31][32][33][34] and estimation with repeated sampling 35,36 and measurements 30,[37][38][39][40][41] . Beyond these specific estimation problems, measurements of the phase of a single optical mode in a single-shot are central for quantum state preparation and detection 42,43 , waveform estimation and sensing [44][45][46][47] , and quantum information processing [48][49][50] . The standard measurement for optical phase estimation is the heterodyne measurement 21 , which samples both quadratures of the electromagnetic field simultaneously from which the phase can be estimated. However, the achievable sensitivity of the heterodyne measurement 21 is far below the ultimate measurement sensitivity allowed by physics, given by the canonical phase measurement 21,51 . Adaptive measurement techniques based on homodyne detection, a Gaussian measurement, can be used to align the phase quadrature of the optical field with the measurement setting where the homodyne measurement provides maximum sensitivity 32 . Adaptive homodyne has been theoretically shown to surpass the heterodyne limit and get closer to the canonical phase measurement for optical phase estimation of coherent states 20,21 , providing the most sensitive Gaussian measurement of the optical phase so far 21 .
In a complementary measurement paradigm, quantum measurements of coherent states based on photon counting, displacement operations, and feedback 2,6,[8][9][10][11][12][13][14]52 have enabled state discrimination below the Gaussian sensitivity limits and approaching the Helstrom bound 5 . Recently, some of the authors proposed and demonstrated a non-Gaussian measurement strategy for single-shot phase estimation of coherent states, able to surpass the heterodyne limit and approach the sensitivity limit of a canonical phase measurement in the presence of loss and noise of real systems 2 . These measurement strategies are based on realizing coherent displacements of the input field and monitoring the output field with photon number resolving detection. The information from the detection outcomes is then used to implement real-time feedback of displacement operations optimized to maximize the measurement sensitivity of the phase of the input state. This estimation strategy is the most sensitive single-shot non-Gaussian measurement of a completely unknown phase encoded in optical coherent states so far 2 . In this strategy the optimization of the displacement operation is realized by maximizing either the information gain in subsequent adaptive steps of the measurement or the sharpness of the posterior phase distribution. While these cost functions are functionally different, both perform similarly and get close to the ultimate sensitivity allowed by physics, the Crámer-Rao lower bound (CRLB), in the limit of many photons and many adaptive measurements. While the work in ref. 2 demonstrated the potential of non-Gaussian measurements for single-shot phase estimation, the superiority over adaptive homodyne detection was not proven. A deeper understanding of the properties of convergence and ultimate limits of the estimators produced by non-Gaussian measurements is still missing. This is an open problem due to the complexity associated with the analysis of these adaptive non-Gaussian strategies.
In this article we investigate a family of adaptive non-Gaussian strategies based on photon counting for single-shot optical phase estimation, and assess their performance compared to Gaussian measurements and the canonical phase measurement. To analyze these non-Gaussian strategies, we use the mathematical framework of Bayesian optimal design of experiments, which provides a natural description of non-Gaussian adaptive strategies, allowing us to investigate their fundamental characteristics and determine the limits in sensitivity, which up to now has been a challenging problem. Our work provides a comprehensive statistical analysis of adaptive non-Gaussian measurements for parameter estimation, and the requirements to approach optimal bounds in the asymptotic limit. We show that strategies based on photon counting and feedback for single-shot phase estimation of coherent states provide superior sensitivity over the best known adaptive Gaussian strategies, having the same functional form as the canonical phase measurement in the asymptotic limit, differing only by a scaling factor. This work provides a deep insight into the potential of optimized non-Gaussian measurements for quantum communication, metrology, sensing, and information processing.

Holevo variance of non-Gaussian estimation strategies
The non-Gaussian phase estimation strategies investigated here are based on photon counting, displacement operations, and feedback, and are optimized by maximizing a specific cost function. These strategies maximize either the estimation precision (by minimizing the Holevo variance 51 ), or the information gain about the unknown parameter based on entropy measures, including mutual information, the Kullback-Lieber divergence, and the conditional entropy 53,54 . We note that these cost functions produce non-identifiable likelihood functions that do not allow to correctly estimate a cyclic parameter, such as the phase 55 . To address this problem, these non-Gaussian strategies use the Fisher information to optimize the displacement operations, which are the dynamical control variable in the strategy, to guarantee that these cost functions provide identifiable likelihood functions, and to enable optical phase estimation with near-optimal performance.
In the problem of single-shot phase estimation with coherent states, an electromagnetic field in a coherent state ρ 0 ¼ α j i α h j interacts with a physical system and experiences a unitary transformation e iϕn , wheren is the number operator. The phase ϕ induced in the coherent state carries information about the system, which can be extracted by a measurement of the output state ρðϕÞ ¼ e iϕn ρ 0 e Àiϕn ¼ e iϕ α e Àiϕ α .
Measurements onto ρ(ϕ), together with an estimator b ϕ on the measurement outcomes, provide an estimate of ϕ, and a measurement strategy aims to obtain the best estimation of the physical parameter. The efficiency of such a strategy is characterized by the estimator variance as a function of the number of photons in the coherent state. The most efficient strategy provides a variance with the highest convergence rate towards zero as the number of photons increase.
The standard measurement paradigm for phase estimation of Gaussian states is the heterodyne measurement (a Gaussian measurement), with an estimator variance of Var½ b ϕ Het ¼ 1=2jαj 2 . Within the paradigm of Gaussian measurements, adaptive homodyne strategies optimized to minimize the Holevo variance have achieved the best performance among Gaussian measurements for single-shot phase estimation of coherent states 20 . The best adaptive Gaussian measurement reported to date, termed the Adaptive Mark-II (MKII), achieves a Holevo variance in the limit of large number of photons of: While this optimized Gaussian strategy surpasses the heterodyne limit, it has an error of order 1/|α| 3 above the Cramér-Rao Lower Bound (1/4|α| 2 ), and does not reach the performance of the canonical phase measurement 21 : In this work, we numerically show that non-Gaussian strategies for single-shot phase estimation based on photon counting, optimized displacement operations, and real-time feedback achieve an estimator variance smaller than Gaussian strategies with an asymptotic scaling in the limit of high mean photon numbers of: h i % 0:250 ± 0:001 jαj 2 þ 0:520 ± 0:010 jαj 4 : (3) Figure 1a summarizes the main result comparing the three asymptotic variances for the canonical phase measurement (solid blue), MKII (solid red), and non-Gaussian strategies (solid green and points). Figure 1b, shows the excess Holevo variance compared to the canonical phase measurement (Var½Á À Var½ b ϕ CPM ) for Heterodyne (purple), MKII (red), and non-Gaussian strategies (green).
These non-Gaussian strategies implement a series of adaptive steps with displacement operations optimized to maximize information gain, while ensuring efficient phase estimators in the asymptotic limit. These strategies surpass the best known Gaussian strategy in Eq. (1), and have the same functional form as the canonical phase measurement in the asymptotic limit, differing only by a scaling factor, thus providing the highest sensitivity among physically-realizable measurements for singleshot phase estimation of coherent states known to date.
Achieving this performance with non-Gaussian estimation strategies, however, requires a deep understanding of the measurement process. To gain this understanding, we use the mathematical framework of Bayesian optimal experimental design, which provides a natural description of adaptive non-Gaussian measurements. This allows us to optimize these strategies for single-shot phase estimation with a Holevo variance given by Eq. (3).
Bayesian optimal design of experiments Phase estimation of coherent states based on photon counting with adaptive coherent displacement operations can be defined in the context of Bayesian optimal design of experiments. Optimal design of experiments allows for improving statistical inferences about quantities of interest by appropriately selecting the values of control variables known as designs 54,56,57 . In this framework, it is assumed that the experimental data y (the measurement outcomes) can be modeled as an element of the set where d is a parameter called design chosen from some set D called design space, ϕ ∈ Φ is an unknown parameter to be estimated, and the data y comes from a sample space Y R. In this paradigm, the experimenter has full control over the designs d and the ability to adjust them prior to making a measurement. This allows for optimizing such measurement for estimating the unknown parameter ϕ. Bayesian optimal design of experiments goes beyond standard methods for parameter estimation based on Bayesian statistical inference [58][59][60][61][62] , by providing the suitable mathematical framework to ensure optimal designs to find efficient estimators for a general parameter space [63][64][65][66] .
In the Bayesian approach of optimal design 54,56 , the initial lack of knowledge about ϕ is modeled as a prior probability distribution p(ϕ). The measurement aims to reduce the uncertainty of ϕ as much as possible by the use of Bayes' theorem over the prior distribution. In an estimation problem, the optimal choice for the designs d 2 D maximize the expected value of a cost function U(d, ϕ, y) with respect to the possible outcomes of y and ϕ: A standard approach in optimal design of experiments considers choosing d opt at the beginning of the experiment an then sample data from p(y|d opt , ϕ) for all subsequent trials. An alternative approach considers dynamically updating d opt on each trial, as more data is collected. The advantage of this approach is that adaptive estimation strategies are never less efficient than the non-adaptive ones 67 .
The implementation of Bayesian optimal design of experiments requires a cost function, such as the Kullback-Lieber divergence between the prior and posterior distributions 68 : The Kullback-Lieber divergence provides a distance between probability distribution p(ϕ|d, y) and p(ϕ) 53 . If p(ϕ|d, y) is equal to p(ϕ) then U(d, y) = 0 and there is not any gain of information about ϕ by measuring with design d and outcome y.
Another cost function considered in optimal design is the conditional entropy between the plausible values of ϕ and y UðdÞ ¼ ÀHðϕjYÞ ¼ À P y2Y pðyÞ R Φ pðϕjd; yÞ log pðϕjd; yÞ ½ dϕ; which is a measure of how much information is needed to describe the outcomes of the random variable ϕ given that the value of another random variable Y is known 53 . However, we note that cost functions Eq. (6) and Eq. (7) can be related to the mutual information 53 : As a result, designs d maximizing any of these cost functions in Eq. (6), Eq. (7), or Eq. (8) are equivalent 54,56,69 . Moreover, in the asymptotic limit, maximizing these cost functions is equivalent to minimizing the determinant of the covariance matrix (D-optimality design criteria), that is, in the asymptotic limit, optimizing these cost functions is equivalent to optimizing any member within the family of D-optimal designs 54,68 .
Our goal is to apply the theory of Bayesian optimal design of experiments to the problem of phase estimation of coherent states with photon counting and adaptive coherent displacement operations. The adaptive non-Gaussian estimation strategy consists of several parts: i) in the first adaptive step, it uses an specific cost function and the prior information to choose the design; ii) then, it performs a measurement; iii) based on the measurement outcome, it uses Bayes' theorem to update the probability distribution; iv) and lastly, it uses a recursive approach, where this posterior probability distribution becomes the prior of the subsequent measurement step i). The estimation of ϕ at each adaptive step is obtained from the maximum posterior estimator (MAP) of the posterior probability distribution. This approach requires that the MAP converges to the true value of the parameter when the number of measurements increases.
In the adaptive mathematical framework of optimal experimental design, Paninski 67 proved that under a set of regular modelling conditions and in the case when ϕ 2 R, cost functions based on mutual information can allow for designs that lead to asymptotically consistent and efficient MAP estimators with variance Here co Fðϕ; dÞ ð Þ denotes the convex closure of the set of Fisher information functions F(ϕ; d). In other words, the estimations produced by b ϕ converge to ϕ (consistency), and the distribution of b ϕ tends to a normal distribution with mean ϕ and variance given by Eq. (9) when the number of adaptive steps tends to infinity (efficiency).
Formally, the regularity conditions introduced in 67 can be stated as follows: 1. The parameter space Φ is a compact metric space. 2. The log likelihood, log pðyjϕ; dÞ ð Þis uniformly Lipschitz in ϕ with respect to some dominating measure on Y.
3. The likelihood function is identifiable for ϕ, that is, the likelihood function has a unique global maximum. 4. The prior distribution, assigns a positive probability to any neighborhood of the real value of ϕ. 5. The Fisher information functions F(ϕ; d) are well defined for any ϕ ∈ Φ and d 2 D. 6. The maximum of the convex closure of the set of Fisher information functions Fðϕ; dÞjd 2 D; ϕ 2 Φ f gmust be positive-definite, i.e., max C2co Fðϕ;dÞ ð Þ C j j > 0.
We note that in the case of estimation of a scalar parameter, the maximization of mutual information is equivalent to the minimization of the mean square error (MSE) 67 : This shows that the MSE is a trade-off between the estimator's variance and its bias. As a result, since the phase is a scalar quantity, in the asymptotic limit both the mutual information Eq. (8) and the mean square error Eq. (10) are appropriate cost functions to find optimal estimation strategies. Even more, for unbiased estimators (such as the MAP estimator on the asymptotic limit), the MSE corresponds to the estimator variance, then the maximization of mutual information is equivalent to the minimization of estimator variance. In practice, however, an estimation strategy would use a cost function that can be calculated efficiently and with a high rate of convergence.
Phase estimation in coherent states. Optimal phase estimation of coherent states of light aims to obtain the best estimate, from the outcomes of a physical measurement, of an unknown phase ϕ ∈ [0, 2π) encoded in a coherent state ρðϕÞ ¼ je iϕ αihe Àiϕ αj.
According to information theory, if an estimator b ϕ of ϕ is constructed using a sample from a POVM M, the limit in the accuracy of b ϕ is given by the Crámer-Rao Bound 71,72 Here F M (ϕ) is the Fisher information of M about ϕ, which quantifies how much information about ϕ is carried in a sample from M: Since the Fisher information in Eq. (13) depends on the POVM M, the maximization of the Fisher information over all POVMs provides the lowest possible Cramér-Rao bound. This maximum Fisher information over all POVMs is known as the quantum Fisher information F Q (QFI), and the lowest possible bound in the accuracy of b ϕ is known as the quantum Cramér-Rao bound (QCRB) 5,72,73 . In the case of phase estimation of coherent states F Q = 4|α| 2 , and the QCRB is: In the limit of large photon number |α| ≫ 1, the QCRB is saturated by the canonical phase measurement (the optimal phase measurement), which is described by the POVM 51 : where n j i is an eigenstate of the number operatorn. The operator Mð b ϕÞ is an element of the canonical phase measurement whose outcome is a number b ϕ 2 0; 2π ½ Þ, which can be used as an estimation for ϕ.
The optimality of the canonical phase measurement indicates that its Holevo variance is the fundamental bound of estimation precision 20,51 . Although there are proposals that attempt to implement this POVM, they have not been able to reach the fundamental bound Eq. (16), or Eq. (2). For instance, in 74 it was possible to obtain the canonical measurement distribution as the marginal of a joint measurement in phase space producing a worse performance in the context of phase estimation. Moreover, the canonical phase measurement was implemented for the case of one-photon wave packet using quantum feedback 22 . However, for the case of higher dimensional states, such as coherent states, this problem remains open. While there is not a satisfactory known method to implement the canonical phase measurement, Gaussian strategies serve as a standard of physically realizable measurement techniques for phase estimation of coherent states. The natural benchmark in the Gaussian strategies is the heterodyne detection, whose variance is lower bounded by Var½ b ϕ Het ¼ 1=2jαj 221 . Several adaptive Gaussian schemes have been shown to exceed the lower bound for heterodyne detection. The most efficient Gaussian measurement reported to date, termed the Adaptive Mark-II (MKII) strategy 20 , has a variance in the limit of |α| ≫ 1 given by Eq. (1). Nonetheless, these adaptive Gaussian strategies cannot reach the performance for the canonical phase measurement in Eq. (2) 21 .
The proposed non-Gaussian strategies for single-shot phase estimation of coherent states are based on optimized adaptive measurements with photon number resolving detection. These non-Gaussian strategies are able to outperform the best known Gaussian strategies and closely follow the performance of the canonical phase measurement in the limit of large photon number.

Adaptive non-Gaussian phase estimation
The proposed non-Gaussian adaptive estimation strategies based on adaptive photon counting 2 aim to estimate the phase ϕ ∈ Φ = [0, 2π) of a coherent state ρðϕÞ ¼ e iϕ α e Àiϕ α with mean photon number En ½ ¼ jαj 2 using a finite number of adaptive measurement steps, and based on the prior information p(ϕ) about ϕ. In every adaptive step l = 1, 2, ⋯ , L, the input coherent state with energy |α| 2 /L interferes with a local oscillator field, which implements a displacement operationD β ð Þ α j i ¼ α þ β j i; β 2 C; with phase and amplitude chosen by some rule, in general depending on previous measurement outcomes. This is followed by a photon number detection measurement with a given photon number resolution (PNR) m of the detectors 6 , m 2 N. In practice, since the energy in each adaptive step is |α| 2 /L, the strategy will only require moderate PNR resolution (m < 10) as L increases.
In the first adaptive measurement l = 1 2 , the strategy makes a random guess hypothesis β 0 2 C about the optimal β, and applies the POVM over the state e iϕ α= ffiffi ffi L p . In the POVM in Eq. (17), the sum over Fock states n j i n h j models the photon detection on the displaced state with a detector with PNR(m) 6 . The corresponding Wigner function describing a photon-number resolving detector shows non-Gaussian features with negative values. For this reason, these adaptive estimation techniques are called "non-Gaussian", despite that the estimator produced is asymptotically normal (this result will be proved in the remainder of this section).
Given the detection outcome n 1 in l = 1, the posterior probability distribution becomes p ϕjn 1 ; β 0 ð Þ/L ð ϕjn 1 ; β 0 ÞpðϕÞ; where L ϕjn 1 ; β 0 ð Þis the likelihood function given by The phase estimate ϕ 1 in this adaptive step corresponds to the MAP estimator b ϕ MAP , ϕ 1 ¼ b ϕ MAP ðn 1 Þ, with the posterior probability distribution in Eq. (18). Using the posterior phase distribution in Eq. (18) as the prior for the next adaptive step l = 2, the strategy optimizes a cost function U(β) to obtain the next value of β denoted as β 1 , and implements the POVM in Eq. (17) with β 1 . The Bayesian updating procedure is repeated at each step l ≥ 2. After l adaptive measurements the posterior probability distribution becomes pðϕjn; βÞ ¼ pðϕjn l ; ; n 1 ; β lÀ1 ; ; β 0 Þ / Q l i¼1 pðn i jϕ; β iÀ1 ÞpðϕÞ: Here n i is the observed photon detection at step i. Using the MAP on this phase distribution, we obtain the lth estimation b ϕ l . The procedure is repeated until the last measurement step L.
This parameter estimation strategy is a particular case of Bayesian optimal design of experiments, where the parameters β 2 C are the designs, and which are optimized to estimate a phase ϕ ∈ [0, 2π). Since the optimal value for β in each adaptive step depends on all previous detection results, the cost function to be optimized is a function of the posterior distribution in Eq. (20). A suitable choice of cost function, such as the mutual information or the estimator variance, can provide a sequence of estimations b ϕ n that approaches the true value of ϕ 2 .
In the case of estimation of cyclic parameters, such as phase estimation, the posterior distribution in Eq. (20) is 2π periodic, and the moments of b ϕ cannot be calculated as in the linear case 51 . In such situations, the first moment of the cyclic random variable X is defined as E e iX ½ , and the dispersion of an estimator b ϕ is calculated using the Holevo Variance 51 : which is the analogous to the mean square error. The minimization of the uncertainty about ϕ (positive square root of Eq. (21)), requires maximization of Sðβ; mÞ ¼ jE½e i b ϕ j, known as the sharpness of the distribution. Then, the suitable cost function for the adaptive protocol is the average sharpness: Identifiability of likelihood. To guarantee a consistent asymptotic estimator the optimized estimation strategies require to satisfy the regularity conditions 1-6 described in section "Bayesian optimal design of experiments". For phase estimation, the regularity conditions 1 and 2 are satisfied given that ϕ is an interior point of Φ = [0, 2π). Moreover, given that the probability is well defined and two times differentiable, the conditions 4, 5, and 6 are directly satisfied. However, the condition 3 is not satisfied in general. If one chooses β as the value that maximizes the mutual information (8) or the average sharpness (22), the resulting likelihood function can have in general two maxima, that is, a non-identifiable likelihood function 30 . In that case it is not possible to guarantee the existence of a consistent estimator.
To address the challenge of working with non-identifiable likelihood functions, we use designs with a fixed relation between the phase of β and the amplitude |β|, given by β = f(θ)e iθ , with θ ∈ [0, 2π) and f(θ) a real function. These experimental designs result in a cost function U that is a function of θ.
To see how this method solves the problem of nonidentifiability, we observe that the probability p(n|ϕ; β) in Eq. (23) is Poisson distributed, with λ ¼ jαj 2 =L þ jβj 2 À 2jαjjβj cos ϕ À θ ð Þ= ffiffi ffi L p . Then, for L adaptive steps with results n = (n 1 , …, n L ), the likelihood function is the product of L probability functions of the form of Eq. (24): Here, each θ i depends on the cost function and previous POVM outcomes. The  produces a likelihood with a global maximum. To see this, note that if θ i = θ is fixed, even if the parameter |β| is different in each adaptive step of the protocol, the likelihood function from a sequence of independent random variables with probability function given by Eq. (24) has two maxima over Φ. One of the maxima will always be around ϕ and the other will depend on the value of θ. On the other hand, if the strategy allows θ to change at each adaptive step, the second maximum is suppressed, since the functions whose product constitutes the likelihood Eq. (25) will each have a second maxima at different positions θ i ≠ θ j (i ≠ j). As a result, with the experimental designs β ¼ f ðθÞ expðiθÞ, the likelihood functions satisfy all the regularity conditions described in section "Bayesian optimal design of experiments".
Bayesian optimal design of |β|. Any viable estimation strategy should aim to achieve the QCRB (14). Therefore, natural choice for |β| is the one that maximizes the Fisher information. For a discrete random variable, the Fisher information is given by 72,75 : The Fisher information for a particular design β = |β|e iθ for Poisson distributions Eq. (24) results in Optimizing over |β|, the Fisher information becomes: where is the value of |β| that maximizes the Fisher information. Unfortunately, since β opt depends on ϕ, its implementation is not practical, because it would be required to already know a priori the very same parameter that one wants to estimate.
To address this problem, the optimal Bayesian design β opt can be estimated as: where b ϕ is the MAP estimator. As a result, the non-Gaussian estimation strategy has now only one design, the phase θ. With β ¼ b β opt , the Fisher information becomes: where δ ¼ b ϕ À ϕ and Δ = ϕ − θ. Note that FðΔ; b β opt Þ becomes a random variable with outcomes depending on b ϕ through b β opt . Moreover, for a random initial design θ 1 and a cost function given by the expected sharpness or the mutual information with b β opt in Eq. (30), the likelihood function in Eq. (25) becomes identifiable. As a result, the non-Gaussian strategy then leads to consistent and efficient MAP estimators 67 .
Asymptotic behavior. In general δ ¼ b ϕ À ϕ≠0, and the Fisher information has a strong dependence on θ. For example, if θ → ϕ then FðΔ ¼ 0; b β opt Þ ! 0, and negligible information is gained when the system is measured. Therefore, the optimal value of θ should result in the value of Δ that maximizes the expected value of FðΔ; b β opt Þ. By observing that the expected value E[δ 2n+1 ] = 0, with n 2 N, we see that the optimal value of Δ is π/2. As a result, an efficient adaptive strategy would make Δ L = ϕ − θ L tend to π/2 as L increases. However, in the limit of Δ → π/2 and δ → 0, j b β opt j ! 1 (see Eq. (30)), and we expect that when the strategy is implemented Δ < π/2. These findings are consistent with our numerical simulations of the non-Gaussian adaptive strategy, where we observe that for L ≫ 1, Δ → π/2 − ϵ, with ϵ a small positive real number, and |β| stays finite around a value that the PNR detector in the strategy can resolve. Note thatβ opt , which maximizes the Fisher information given θ, does not necessarily maximize the cost function U(β) for β 2 C. However restricting β to the set that maximizes the Fisher information makes the phase  (1). The shaded regions represent 1-σ standard deviation. b Optimal design as a function of L. As L increases the optimal design tends to π/2. Numerical data are obtained with 10000 Monte Carlo sequences. Error bars represent a 1-σ standard deviation.
of β change in each step forcing the likelihood to be identifiable. Moreover, whenφ ! ϕ,β opt tends to the value that maximizes U 2 .
Given that the Fisher Information is additive, for any L measurements made using the optimal design, β opt in Eq. (29), the total Fisher information for this design corresponds to the quantum Fisher information 4|α| 2 . However, since β opt is not known, it is not possible to choose a design β for which its Fisher information equals the quantum Fisher information. This is already implied by the fact that the canonical phase measurement does not saturate the Cramér-Rao bound. To show this, we observe that for and estimator very close to the true value δ ¼ b ϕ À ϕ ( 1 for L adaptive measurements, the Fisher information (Eq. (31)) is On the other hand, the best possible estimator of δ in each step satisfies E δ 2 Â Ã ! 1=4jαj 2 , so that independently of L. We conclude that the adaptive non-Gaussian strategies do not saturate the Cramér-Rao bound for finite |α|. Nevertheless, these adaptive estimation schemes outperform the most sensitive Gaussian strategy known to date, and show a similar asymptotic scaling as the canonical phase measurement.

Performance of non-Gaussian adaptive strategies
We numerically investigate the performance of the estimator produced by non-Gaussian adaptive strategies for phase estimation for different numbers of adaptive steps L, photon number resolution PNR(m), and average photon number |α| 2 . To assess the performance of these strategies, we compare our results with the best-known Gaussian measurement for phase estimation, termed Mark II (MKII) strategy 21 , and with the canonical phase measurement. As discussed in section "Bayesian optimal design of experiments", the performance of non-Gaussian adaptive strategies using cost functions including the Kullback-Lieber divergence Eq. (6), conditional entropy Eq. (7), mutual information Eq. (8), and expected sharpness Eq. (22) are equivalent in the asymptotic limit. In our numerical simulations, however, we use the expected sharpness in Eq. (22) as the cost function for the optimization of the strategy, because it substantially reduces the number of operations for this optimization and the computational overhead. In our numerical analysis, we use Monte Carlo simulations and generate sufficient numerical data samples to reduce statistical uncertainties. Figure 2a shows the Holevo variance for the non-Gaussian adaptive strategy, as a function of the number of adaptive steps L for |α| 2 = 1, for different PNR: m = 1, 3, 6. We observe that the adaptive non-Gaussian scheme with PNR(1) and L≥100 (green dots with error bars) outperforms the most sensitive Gaussian strategy known to date, the MKII, (red dashed line). Moreover, strategies with PNR(3) (yellow dots with error bars) and PNR(6) (light blue dots with error bars) outperform the MKII strategy with only L ≈ 30 and L ≈ 20, respectively, achieving a smaller Holevo variance with fewer adaptive measurements compared to PNR(1). We have observed similar behavior for non-Gaussian strategies optimizing different cost functions, such as the mutual information.
To investigate the asymptotic behavior of the adaptive non-Gaussian strategy, we assume an exponential dependence for the Holevo variance with L (solid lines in Fig. 2a): The exponential model is a few parameter model that allows us to quantify asymptotic trends when the datasets have rapidly decaying tails. This is a widely used model for studying the asymptotic scaling of estimators as a function of resources in diverse metrological problems 64,76,77 .
We fit the numerical data from Monte Carlo simulations to Eq. (34) to estimate the constants A, B, and C. Our results show that the asymptotic constant C = 0.751 ± 0.002 for PNR(1), C = 0.719 ± 0.004 for PNR(3), and C = 0.714 ± 0.003 for PNR (6). We observe that these values are smaller than the asymptote of the MKII Gaussian strategy, but larger than the one for the canonical phase measurement (0.673, blue dashed line). We note that while C for PNR(3) is larger than for PNR(6), they are statistically equal due to their uncertainties. This prevents us from drawing any conclusions for larger values of m. Figure 2b shows the design Δ ¼ b ϕ À θ, the phase of the displacement field, as a function of L. We observe that for non-Gaussian adaptive strategies with increasing PNR, Δ tends to π/2 for large L (L = 200). This observation is consistent with the theoretical framework of optimal design of experiments (see section Bayesian optimal design of |β|), which states that Δ optimal = π/2 for L → ∞. Moreover, as PNR increases, the strategies show a faster convergence to the asymptotic value of the Holevo variance, which translates in a smaller error in the estimation (see Fig. 2a.) These observations further support our theoretical model of non-Gaussian strategies for phase estimation.
We investigate the asymptotic performance of the estimator variance produced by the non-Gaussian strategy for large |α| 2 . Figure 3 shows the Holevo variance for the non-Gaussian adaptive strategy as a function of |α| 2 for different L normalized to the QCRB. We observe that for large |α| 2 (with L = 100), the adaptive non-Gaussian strategy tends to the CRLB.
To build a model for the performance of this strategy for large |α| 2 , we propose three candidate models for the Holevo variance for L ≫ 1: (35c) to describe our numerical observations in Fig. 3 based on the Monte Carlo simulations. The model that best describes our observations allows us to determine, with a certain degree of confidence, the performance of non-Gaussian adaptive strategies, and compare them with the best Gaussian strategy, Eq. (1), and the canonical phase measurement, Eq. (2). We discriminate among plausible candidate models using the technique of backward elimination 78 . We start with the candidate b y 1 , Eq. (35a), and test the deletion of each variable A i using the p-value of a hypothesis testing procedure (H 0 : A i = 0, H A : A i ≠ 0). Given that p > 0.1 for A 2 and p < 0.001 for A 1 and A 3 , we conclude, with confidence larger than 99%, that the model reflecting the behavior of the data presented in Fig. 3 is b y 3 , Eq. (35c).
To find A 1 and A 3 in the limit L → ∞ in model b y 3 , we fit the Holevo variance as a function of |α| 2 for |α| 2 > 5 to the model b y 3 for different values of L. Given a number of adaptive steps L, each fitting provides a set of coefficients A 1 ; A 3 f g. Hence, to obtain the trend of each coefficient A 1 and A 2 as L increases, we fit them to an exponential model of the form Figure  4 shows the coefficients A 1 and A 3 as a function of L. Adjusting this exponential model and observing that in the limit L ≫ 1 A i → F i , we obtain A 1 = 0.250 ± 0.001 and A 3 = 0.52 ± 0.01. Therefore, we conclude with a 99% confidence level that the Holevo variance for the adaptive non-Gaussian strategy in the limit L → ∞ for large |α| is: This equation is the main result of this work, also shown in Eq. (3). We observe that the Holevo variance for the adaptive non-Gaussian strategy has a similar dependance with |α| as the canonical phase measurement, differing only in the scaling of the correction term of order 1/|α| 4 , see Eq. (2). Moreover, we note that the best Gaussian strategy known to date, the MKII adaptive homodyne, has a correction term of order 1/|α| 320 , see Eq. (1). Then, for large |α|, the non-Gaussian estimation strategies show a much better scaling in the Holevo variance than the best known Gaussian strategy, and closely follows the canonical phase measurement. Furthermore, these non-Gaussian phase estimation strategies can be implemented with current technologies 2 , and our work demonstrates their superior performance over all the physically realizable strategies for single-shot phase estimation of coherent states reported to date.
The optimized non-Gaussian adaptive strategies based on photon counting analyzed in this work are not the only possible strategies for single-shot phase estimation, and there could be other strategies based on photon counting with better performances. In this work, we studied estimation strategies using cost functions that are consistent with D-optimal designs. However, there may be other cost functions that could provide a further improvements to non-Gaussian strategies. Moreover, we note that the relation in Eq. (30) between the phase and the magnitude of the displacement field was used to obtain identifiable likelihood functions and ensure efficient estimators in the asymptotic limit. While we chose the relation in Eq. (30) because it maximizes the Fisher information, there is no mathematical proof that this choice is optimal, or that other choices for this relation would not provide higher sensitivities. Finally, in the presented adaptive non-Gaussian strategies for phase estimation, the displacement operations were optimized in every adaptive step at a time, i. e. using local optimizations in the adaptive steps. We note, however, that local optimal strategies do not necessarily ensure global optimality 13 . Strategies with global optimizations, where all adaptive steps are optimized simultaneously, could probably lead to better performances. However, the computational overhead required for performing global optimizations beyond L = 10 adaptive steps prevents us from being able to investigate global optimized strategies. To overcome these limitations, future investigations could make use of machine learning methods, such as neural networks and reinforcement learning 79 , to lower the complexity of these calculations.

DISCUSSION
Non-Gaussian measurement strategies for phase estimation approaching the quantum limits in sensitivity, as set by the canonical phase measurement, can become a tool for enhancing the performance of diverse protocols in quantum sensing, communications, and information processing. Optical phase estimation approaching the quantum limit can be used to: prepare highly squeezed atomic spin states based on measurement backaction 42 for quantum sensing and metrology 80 ; enhance phase contrast imaging of biological samples with optical probes at the few photon level to avoid photodamage and ensure the integrity of the sample; and enhance fidelities in quantum communications with phase coherent states that require phase tracking close to the quantum level between receiver and transmitter with few-photon pulses 81,82 , while allowing for quantum receivers to decode information encoded in coherent states below the quantum noise limit 13,52,83 .
As a direct application for quantum information processing, non-Gaussian measurements based on photon counting and displacement operators can be used for full reconstruction of quantum states with on-off detectors 84 and PNR detectors 85 in a multi-copy state setting by sampling phase space with non-Gaussian projections. The theoretical methods to assess the performance of adaptive non-Gaussian measures for phase estimation described in this work could be used to study methods for adaptive quantum tomography 86,87 based on photon counting for high dimensional quantum states, and investigate their asymptotic advantages over adaptive homodyne tomography 88 .
From the practical point of view, our work provides insight into the design of practical, highly efficient measurement strategies for phase estimation based on photon counting. It shows that non-Gaussian strategies optimizing any cost function within the family of D-optimal designs are equally efficient, and demonstrates the advantages of higher photon number resolution in the strategies to reduce estimation errors. This understanding allows the experimenter to chose cost functions that can be efficiently calculated and optimized to achieve higher convergence rates, while selecting a PNR given the desired/target error budget in the estimation problem for specific applications. This knowledge will be critical for the design and development of future sensors based on photon counting for diverse applications in communication, phase-contrast imaging, metrology, and information processing.
In conclusion, we investigate a family of non-Gaussian strategies for single-shot phase estimation of optical coherent states. These strategies are based on adaptive photon counting measurements with a finite number of adaptive steps implementing coherent displacement operations, optimized to maximize information gain as the measurement progresses 2 . We develop a comprehensive statistical analysis based on Bayesian optimal design of experiments that provides a natural description of adaptive non-Gaussian strategies. This theoretical framework gives a fundamental understanding on how to optimize these strategies to enable efficient estimators with a high degree of convergence towards the ultimate limit, the Cramér Rao lower bound.
We use numerical simulations to show that optimized adaptive non-Gaussian strategies producing an asymptotically efficient normal estimator achieve a much higher sensitivity than the best Gaussian strategy known to date, which is based on adaptive homodyne 20 . Moreover, we show that the Holevo variance of the estimator for the adaptive non-Gaussian strategy has a similar dependance as the canonical phase measurement in the asymptotic limit of large photon numbers, differing only by a scaling factor in the second-order correction term. Our work complements the work in single-shot phase measurements for single-photon wave packets in two dimensions 22 using quantum feedback with Gaussian measurements, and paves the way for the realization of optimized feedback measurements approaching the canonical phase measurement for higher dimensional states based on non-Gaussian operations.

DATA AVAILABILITY
The data that support the findings of this study are available from the authors upon request.

CODE AVAILABILITY
Our numerical results have been implemented using MATLAB and Julia language. A functions to reproduce the numerical results can be publicly found in the repository 89 . To obtain a set of estimations produced by the non Gaussian strategy, use the program: Bootstrap_phase_estimation.jl, setting the desired values of L, Bootstrap size, MPN, in the Adapt_Steps, Boot_Reps, and MPNS variables.