Iterative Quantum Amplitude Estimation

We introduce a new variant of Quantum Amplitude Estimation (QAE), called Iterative QAE (IQAE), which does not rely on Quantum Phase Estimation (QPE) but is only based on Grover's Algorithm, which reduces the required number of qubits and gates. We provide a rigorous analysis of IQAE and prove that it achieves a quadratic speedup compared to classical Monte Carlo simulation. Furthermore, we show with an empirical study that our algorithm outperforms other known QAE variants without QPE, some even by orders of magnitude, i.e., our algorithm requires significantly fewer samples to achieve the same estimation accuracy and confidence level.


I. INTRODUCTION
Quantum Amplitude Estimation (QAE) [1] is a fundamental quantum algorithm with the potential to achieve a quadratic speedup for many applications that are classically solved through Monte Carlo (MC) simulation. It has been shown that we can leverage QAE in the financial services sector, e.g., for risk analysis [2,3] or option pricing [4-6], and also for generic tasks such as numerical integration [7]. While the estimation error of classical MC simulation scales as O(1/√M), where M denotes the number of (classical) samples, QAE achieves a scaling of O(1/M) for M (quantum) samples, which implies the aforementioned quadratic speedup.
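As a point of reference, the classical MC baseline is easy to sketch (illustrative Python, not from the paper; `mc_estimate` is our own helper name): the amplitude a is estimated from M Bernoulli samples, and the standard error shrinks as O(1/√M), so quadrupling the number of samples only halves the error.

```python
import random

def mc_estimate(a, m, seed=None):
    """Classical Monte Carlo estimate of a from m Bernoulli(a) samples.

    The standard error of this estimator scales as O(1/sqrt(m)), which is
    the classical baseline that QAE improves upon quadratically.
    """
    rng = random.Random(seed)
    return sum(rng.random() < a for _ in range(m)) / m
```

With m = 100,000 samples the estimate of a = 0.3 is typically accurate to a few parts in a thousand, consistent with the 1/√M scaling.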
The canonical version of QAE is a combination of Quantum Phase Estimation (QPE) [8] and Grover's Algorithm. Since other QPE-based algorithms are believed to achieve exponential speedups - most prominently Shor's Algorithm for factoring [9] - it has been speculated whether QAE can be simplified such that it uses only Grover iterations without a QPE dependency. Removing the QPE dependency would help to reduce the resource requirements of QAE in terms of qubits and circuit depth and lower the bar for practical applications of QAE.
Recently, several approaches have been proposed in this direction. In [10], the authors show how to replace QPE by a set of Grover iterations combined with Maximum Likelihood Estimation (MLE), in the following called Maximum Likelihood Amplitude Estimation (MLAE). In [11], QPE is replaced by the Hadamard test, analogous to Kitaev's Iterative QPE [12,13].
Both [10] and [11] propose potential simplifications of QAE, but do not provide rigorous proofs of the correctness of the proposed algorithms. In [11], it is not even clear how to control the accuracy of the algorithm other than by possibly increasing the number of measurements of the involved quantum circuits. Thus, the potential quantum advantage is difficult to compare and we will not discuss it in the remainder of this paper.
In [14], another variant of QAE was proposed. There, for the first time, it was rigorously proven that QAE without QPE can achieve a quadratic speedup over classical MC simulation. Following [14], we call this algorithm QAE, Simplified (QAES). Although this algorithm achieves the desired asymptotic complexity, the involved constants are very large, and likely to render this algorithm impractical unless further optimized - as shown later in this manuscript.
In the following, we propose a new version of QAE - called Iterative QAE (IQAE) - that achieves better results than all other tested algorithms. It provably has the desired asymptotic behavior up to a multiplicative log(2/α log₃(3π/20ε)) factor, where ε > 0 denotes the target accuracy and 1 − α the resulting confidence level.
Like in [14], our algorithm requires iterative queries to the quantum computer to achieve the quadratic speedup and cannot be parallelized. Only MLAE allows the parallel execution of the different queries as the estimate is derived via classical MLE applied to the results of all queries. Although parallelization is a nice feature, the potential speedup is limited. Assuming the length of the queries is doubled in each iteration (like for canonical QAE and MLAE) the speedup is at most a factor of two, since the computationally most expensive query dominates all the others.
With MLAE, QAES, and IQAE we have three promising variants of QAE that do not require QPE, and it is of general interest to empirically compare their performance. Of similar interest is the question whether the canonical QAE with QPE - while being (quantum) computationally more expensive - might lead to some performance benefits. To be able to better compare the performance of canonical QAE with MLAE, QAES, and IQAE, we extend QAE by classical MLE postprocessing based on the observed results. This improves the results without additional queries to the quantum computer and allows us to derive proper confidence intervals.
The remainder of this paper is organized as follows. Sec. II introduces QAE in its canonical form, its considered variants, as well as the proposed MLE postprocessing. In Sec. III, we introduce IQAE and provide the corresponding theoretical results. Empirical results, comparing the performance of the different algorithms on various test cases, are reported in Sec. IV and illustrate the efficiency of our new algorithm. To conclude, we discuss our results and open questions in Sec. V.

II. QUANTUM AMPLITUDE ESTIMATION
QAE was first introduced in [1] and assumes the problem of interest is given by an operator A acting on n + 1 qubits such that

A|0⟩_{n+1} = √(1−a) |ψ₀⟩_n |0⟩ + √a |ψ₁⟩_n |1⟩,

where a ∈ [0, 1] is the unknown, and |ψ₀⟩_n and |ψ₁⟩_n are two normalized states, not necessarily orthogonal. QAE allows us to estimate a with high probability such that the estimation error scales as O(1/M), where M corresponds to the number of applications of A. To this end, an operator Q := A S₀ A† S_{ψ₀} is defined, where S₀ and S_{ψ₀} denote reflections about the states |0⟩_{n+1} and |ψ₀⟩_n |0⟩, respectively [1]. In the following, we denote applications of Q as quantum samples or oracle queries. The canonical QAE follows the form of QPE: it uses m ancilla qubits - initialized in equal superposition - to represent the final result, it defines the number of quantum samples as M = 2^m, and it applies geometrically increasing powers of Q controlled by the ancillas. Eventually, it performs a QFT on the ancilla qubits before they are measured, as illustrated in Fig. 1. Subsequently, the measured integer y ∈ {0, ..., M−1} is mapped to an angle θ̃_a = yπ/M. Thereafter, the resulting estimate of a is defined as ã = sin²(θ̃_a). Then, with a probability of at least 8/π² ≈ 81%, the estimate ã satisfies

|a − ã| ≤ 2π√(a(1−a))/M + π²/M²,

which implies the quadratic speedup over a classical MC simulation, i.e., the estimation error ε = O(1/M). The success probability can quickly be boosted to close to 100% by repeating this multiple times and using the median estimate [2]. These estimates ã are restricted to the grid {sin²(yπ/M) : y = 0, ..., M/2} through the possible measurement outcomes of y. Alternatively, and similarly to MLAE, it is possible to apply MLE to the observations of y. For a given θ_a, the probability of observing |y⟩ when measuring the ancilla qubits is derived in [1] and given by

P[|y⟩] = sin²(M∆π) / (M² sin²(∆π)),

where ∆ is the minimal distance on the unit circle between the angles θ_a and πỹ/M, and ỹ = y if y ≤ M/2 and ỹ = M − y otherwise. Given a set of y-measurements, this can be leveraged in an MLE to obtain an estimate of θ_a that is not restricted to grid points.
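The mapping from a measured ancilla integer y to the gridded estimate ã can be sketched as follows (illustrative Python; `qae_estimate_from_measurement` is our own helper name, not from [1]):

```python
import math

def qae_estimate_from_measurement(y, m):
    """Map a measured ancilla integer y in {0, ..., M-1}, with M = 2**m,
    to the canonical QAE amplitude estimate a~ = sin^2(y*pi/M)."""
    M = 2 ** m
    return math.sin(y * math.pi / M) ** 2
```

Since y and M − y yield the same value, only M/2 + 1 distinct estimates are reachable, e.g., five grid points for m = 3; this is the grid restriction that the MLE postprocessing lifts.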
Furthermore, it allows us to use the likelihood ratio to derive confidence intervals [15]. This is discussed in more detail in Appendix A. In our tests, the likelihood-ratio confidence intervals were always more reliable than other possible approaches, such as the (observed) Fisher information. Thus, in the following, we use the term QAE for the canonical QAE with MLE applied to the y measurements to derive an improved estimate and confidence intervals based on the likelihood ratio.
All variants of QAE without QPE - including ours - are based on the fact that

Q^k A|0⟩_{n+1} = cos((2k+1)θ_a) |ψ₀⟩_n |0⟩ + sin((2k+1)θ_a) |ψ₁⟩_n |1⟩,

where θ_a is defined via a = sin²(θ_a). In other words, the probability of measuring |1⟩ in the last qubit is given by

P[|1⟩] = sin²((2k+1)θ_a).

The algorithms mainly differ in how they derive the different values for the powers k of Q and how they combine the results into a final estimate of a. MLAE first approximates P[|1⟩] for k = 2^j and j = 0, 1, 2, ..., m−1, for a given m, using N_shots measurements from a quantum computer for each j, i.e., in total, Q is applied N_shots(M − 1) times, where M = 2^m. It has been shown in [10] that the corresponding Fisher information scales as O(N_shots M²), which implies a lower bound on the estimation error scaling as O(1/(√N_shots M)). Crucially, [10] does not provide an upper bound for the estimation error. Confidence intervals can be derived from the measurements using, e.g., the likelihood ratio approach, see Appendix A.
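A minimal sketch of the MLAE idea (our own illustrative Python, with a brute-force grid search standing in for the numerical maximizer used in [10]; we also include k = 0 in the schedule, i.e., a plain A measurement, to resolve ambiguities in θ):

```python
import math

def mlae_log_likelihood(theta, schedule, shots, hits):
    """Log-likelihood of observing hits[j] |1>-outcomes out of `shots`
    measurements of Q^k A|0> for each power k in `schedule`, using
    P[|1>] = sin^2((2k+1)*theta)."""
    ll = 0.0
    for k, h in zip(schedule, hits):
        p = math.sin((2 * k + 1) * theta) ** 2
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        ll += h * math.log(p) + (shots - h) * math.log(1 - p)
    return ll

def mlae_estimate(schedule, shots, hits, grid_size=10000):
    """Brute-force MLE over theta in (0, pi/2); returns a~ = sin^2(theta~)."""
    thetas = [(i + 0.5) * (math.pi / 2) / grid_size for i in range(grid_size)]
    theta_hat = max(thetas,
                    key=lambda t: mlae_log_likelihood(t, schedule, shots, hits))
    return math.sin(theta_hat) ** 2
```

With noiseless counts generated from a known θ_a, the estimate recovers a up to the grid resolution; a real MLAE implementation would use a proper numerical optimizer instead of a grid.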
In contrast to MLAE, QAES requires the different powers of Q to be evaluated iteratively and cannot be parallelized. It iteratively adapts the powers of Q to successively improve the estimate, carefully determining the next power in each step. However, in contrast to MLAE's lower bound, a rigorous upper bound on the estimation error is provided. QAES achieves the optimal asymptotic query complexity O(log(1/α)/ε), where α > 0 denotes the probability of failure. In contrast to the other algorithms considered, QAES provides a bound on the relative estimation error. Although the algorithm achieves the desired asymptotic scaling, the constants involved are very large - likely too large for practical applications unless they can be further reduced.
In the following, we introduce a new variant of QAE without QPE. As for QAES, we provide a rigorous performance proof. Although our algorithm only achieves the quadratic speedup up to a multiplicative factor log(2/α log₃(3π/20ε)), the constants involved are orders of magnitude smaller than for QAES. Note that in practice this factor is small for any reasonable target accuracy ε and confidence level 1 − α, as we show in Sec. IV.
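To make the size of this factor concrete, a quick numerical check (illustrative Python; `iqae_log_factor` is our own helper name, and we assume the logarithm is natural):

```python
import math

def iqae_log_factor(eps, alpha):
    """Multiplicative overhead factor log(2/alpha * log_3(3*pi/(20*eps)))
    appearing in IQAE's query complexity (natural log assumed)."""
    return math.log(2 / alpha * math.log(3 * math.pi / (20 * eps), 3))
```

For example, even at a demanding target accuracy ε = 10⁻⁶ with 95% confidence (α = 0.05), the factor is only single-digit.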

III. ITERATIVE QUANTUM AMPLITUDE ESTIMATION
IQAE leverages similar ideas as [10,11,14] but combines them in a different way, which results in a more efficient algorithm while still allowing for a rigorous upper bound on the estimation error and computational complexity. As mentioned before, we use the quantum computer to approximate P[|1⟩] = sin²((2k+1)θ_a) for the last qubit in Q^k A|0⟩_n|0⟩ for different powers k. In the following, we outline the rationale behind IQAE, which is formally given in Alg. 1. The main sub-routine FindNextK is outlined in Alg. 2.
Suppose we are given a confidence interval [θ_l, θ_u] ⊆ [0, π/2] for θ_a, a power k of Q, and an estimate for sin²((2k+1)θ_a). By exploiting the trigonometric identity sin²(x) = (1 − cos(2x))/2, we can translate our estimate for sin²((2k+1)θ_a) into an estimate for cos((4k+2)θ_a). Unlike in Kitaev's Iterative QPE, we cannot estimate the sine (only its square), and the cosine alone is invertible without ambiguity only if we know that the argument is restricted to either [0, π] or [π, 2π], i.e., the upper or lower half-plane. Thus, we want to find the largest k such that the scaled interval [(4k+2)θ_l, (4k+2)θ_u] mod 2π is fully contained either in [0, π] or in [π, 2π]. If this is the case, we can invert cos((4k+2)θ_a) and improve our estimate for θ_a with high confidence. This requirement implies an upper bound on k, and the heart of the algorithm is the procedure used to find the next k given [θ_l, θ_u], which is formally introduced in Alg. 2 and illustrated in Fig. 2. In the following theorem, we provide convergence results for IQAE that imply the aforementioned quadratic speedup. The respective proof is given in Appendix B.
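The half-plane test at the heart of FindNextK can be sketched as follows (a hedged Python sketch of Alg. 2, not the paper's exact pseudocode; we search downwards from the largest K of the form 4k + 2 that keeps the scaled interval shorter than π, and fall back to the old power if no admissible K at least doubles K_i):

```python
import math

def find_next_k(k_i, theta_l, theta_u, r=2.0):
    """Hedged sketch of FindNextK: return the largest power k with
    K = 4k + 2 >= r * K_i such that [K*theta_l, K*theta_u] mod 2*pi lies
    entirely in the upper ([0, pi]) or lower ([pi, 2*pi]) half-plane.
    Assumes theta_u > theta_l. Returns (k, half_plane_flag); the flag is
    True/False for upper/lower, or None if k_i is kept unchanged."""
    K_i = 4 * k_i + 2
    K_max = int(math.pi / (theta_u - theta_l))  # scaled interval must stay shorter than pi
    K = K_max - (K_max - 2) % 4                 # round down to the form 4k + 2
    while K >= r * K_i:
        lo = (K * theta_l) % (2 * math.pi)
        hi = lo + K * (theta_u - theta_l)
        if hi <= math.pi:                        # fully in the upper half-plane
            return (K - 2) // 4, True
        if lo >= math.pi and hi <= 2 * math.pi:  # fully in the lower half-plane
            return (K - 2) // 4, False
        K -= 4
    return k_i, None                             # keep the old power
```

The returned K always satisfies the invariant that the scaled interval fits into a single half-plane, so cos(Kθ_a) can be inverted unambiguously on it.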
Theorem 1 (Correctness of IQAE). Suppose a confidence level 1 − α ∈ (0, 1), a target accuracy ε > 0, and a number of shots N_shots ∈ {1, ..., N_max(ε, α)}, where N_max(ε, α) denotes the maximal number of shots per round, derived in Appendix B, which satisfies N_max(ε, α) < 100520 log(2/α log₃(3π/20ε)). Then, IQAE (Alg. 1) terminates after a maximum of ⌈log₂(π/4ε)⌉ rounds, where we define one round as a set of iterations with the same k_i, and each round consists of at most ⌈N_max(ε, α)/N_shots⌉ iterations. IQAE returns [a_l, a_u] with a_u − a_l ≤ 2ε and P[a ∈ [a_l, a_u]] ≥ 1 − α. Thus, ã = (a_l + a_u)/2 leads to an estimate for a with |a − ã| ≤ ε with a confidence of 1 − α. Furthermore, the total number of Q-applications, N_oracle, satisfies N_oracle = O((1/ε) log(2/α log₃(3π/20ε))).

Note that the maximum number of applications of Q given in Thm. 1 is a loose upper bound, since the proof uses the Chernoff bound to estimate intermediate confidence intervals in Alg. 1. Using more accurate techniques instead - such as the Clopper-Pearson confidence interval for Bernoulli distributions [17] - can significantly lower the constant in N_max but is more complex to analyze theoretically. In Sec. IV, we demonstrate this empirically and provide an average estimate for N_max(ε, α) based on the Clopper-Pearson instead of the Chernoff bound. In Alg. 2, we require that K_{i+1}/K_i ≥ 2; otherwise, we continue with the old value K_i. The choice of this lower bound is somewhat arbitrary, and our proof can be adjusted to any value strictly larger than one. However, in practice, the chosen lower bound worked very well.
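To illustrate how a single round tightens the confidence interval, the following hedged sketch combines a Chernoff-Hoeffding interval for the measured probability with the cosine inversion described above (our own helper names; a real implementation would take the half-plane flag from FindNextK rather than recomputing it from θ_l):

```python
import math

def chernoff_interval(a_hat, n_shots, alpha_i):
    """Two-sided Chernoff-Hoeffding confidence interval for a Bernoulli mean."""
    eps = math.sqrt(math.log(2 / alpha_i) / (2 * n_shots))
    return max(0.0, a_hat - eps), min(1.0, a_hat + eps)

def update_theta_interval(K, theta_l, theta_u, n_ones, n_shots, alpha_i):
    """Tighten [theta_l, theta_u] for theta_a from measurements at scaling
    K = 4k + 2, using P[|1>] = (1 - cos(K*theta_a))/2. Assumes the scaled
    interval lies in a single half-plane, as certified by FindNextK."""
    a_min, a_max = chernoff_interval(n_ones / n_shots, n_shots, alpha_i)
    turns = int((K * theta_l) // (2 * math.pi))   # completed full turns
    upper = ((K * theta_l) % (2 * math.pi)) <= math.pi
    if upper:                                     # K*theta mod 2pi in [0, pi]
        phi_min = math.acos(1 - 2 * a_min)
        phi_max = math.acos(1 - 2 * a_max)
    else:                                         # K*theta mod 2pi in [pi, 2pi]
        phi_min = 2 * math.pi - math.acos(1 - 2 * a_max)
        phi_max = 2 * math.pi - math.acos(1 - 2 * a_min)
    new_l = (2 * math.pi * turns + phi_min) / K
    new_u = (2 * math.pi * turns + phi_max) / K
    return max(theta_l, new_l), min(theta_u, new_u)
```

For example, with θ_a = 0.3 and k = 1 (K = 6), roughly 61% |1⟩-outcomes over 1000 shots tighten a prior interval [0.25, 0.35] to a strict sub-interval that still contains the true angle.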
Thm. 1 provides a bound on the query complexity, i.e., the total number of oracle calls with respect to the target accuracy. However, it is important to note that the computational complexity, i.e., the overall number of operations, including classical steps such as all applications of FindNextK and computing the intermediate confidence intervals, scales in exactly the same way.

IV. RESULTS
In this section, we empirically compare IQAE, MLAE, QAES, QAE, and classical MC with each other and determine the total number of oracle queries necessary to achieve a particular accuracy. We are only interested in measuring the last qubit of Q^k A|0⟩_n|0⟩ for different powers k, and we know that P[|1⟩] = sin²((2k+1)θ_a). Thus, for a given θ_a and k, we can consider a Bernoulli distribution with the corresponding success probability, or a single-qubit R_y rotation with angle 2(2k+1)θ_a, to generate the required samples. All algorithms mentioned in this paper are implemented and tested using Qiskit [18] in order to be run on simulators or real quantum hardware, e.g., as provided via the IBM Q Experience.
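Such a Bernoulli model of the measurements can be sketched in a few lines (illustrative Python; no quantum simulator is needed for this test case, and `sample_grover_measurements` is our own helper name):

```python
import math
import random

def sample_grover_measurements(theta_a, k, n_shots, seed=None):
    """Simulate measuring the last qubit of Q^k A|0...0>: n_shots Bernoulli
    trials with success probability sin^2((2k+1)*theta_a), equivalent to
    measuring a single qubit after an Ry(2*(2k+1)*theta_a) rotation."""
    p = math.sin((2 * k + 1) * theta_a) ** 2
    rng = random.Random(seed)
    return sum(1 for _ in range(n_shots) if rng.random() < p)
```

For a = 1/2 (θ_a = π/4) and k = 0, roughly half of the shots yield |1⟩, as expected.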
For IQAE and MC, we compute the (intermediate) confidence intervals based on Clopper-Pearson [17]. For QAE and MLAE, we use the likelihood ratio [15], cf. Appendix A. For QAES, we report the chosen (multiplicative) target accuracy.
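For reference, the Clopper-Pearson interval can be computed exactly from the binomial CDF via bisection (a stdlib-only sketch of the standard construction; Qiskit and common statistics packages provide equivalent routines):

```python
import math

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(hits, n, alpha, tol=1e-9):
    """Exact (Clopper-Pearson) two-sided confidence interval for a
    Bernoulli parameter, via bisection on the binomial CDF."""
    def solve(target_cdf, k):
        # Find p with P[X <= k] = target_cdf; the CDF is decreasing in p.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target_cdf:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if hits == 0 else solve(1 - alpha / 2, hits - 1)
    upper = 1.0 if hits == n else solve(alpha / 2, hits)
    return lower, upper
```

For 50 successes out of 100 shots at 95% confidence this yields approximately [0.398, 0.602], noticeably tighter than the corresponding Chernoff bound used in the proof of Thm. 1.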
To compare all algorithms, we estimate a = 1/2 with a 1 − α = 95% confidence interval. For IQAE, MLAE, and QAE, we set N_shots = 100. As shown in Fig. 3, IQAE outperforms all other algorithms. QAES, even though it achieves the best asymptotic behavior, performs worst in practice. On average, QAES requires about 10⁸ times more oracle queries than IQAE, which is even more than classical MC simulation for the tested target accuracies. MLAE performs comparably to IQAE; however, the MLE becomes numerically challenging with increasing m, which results in a rather limited number of data points compared to the other algorithms. Lastly, QAE with MLE postprocessing performs a bit worse than IQAE and MLAE, which answers the question raised at the beginning: applying QPE in the QAE setting does not lead to any advantage but only increases the complexity, even with MLE postprocessing. Thus, using IQAE instead not only reduces the required number of qubits and gates, it also improves the performance. Note that the MLE problem resulting from canonical QAE is significantly easier to solve than the problem arising in MLAE, since the solution can be efficiently computed with a bisection search, see Appendix A. However, to evaluate QAE we need to simulate an increasing number of (ancilla) qubits, even for the very simple problem considered here, which makes the simulation of the quantum circuits more costly.
In the remainder of this section we analyze the performance of IQAE in more detail. In particular, we empirically analyze the total number of oracle queries when using the Clopper-Pearson confidence interval as well as the resulting k-schedules.
More precisely, we run IQAE for all a ∈ {i/100 | i = 0, ..., 100} discretizing [0, 1], for all ε ∈ {10⁻ⁱ | i = 3, ..., 6}, and for all α ∈ {1%, 5%, 10%}. We choose N_shots = 100 for all experiments. For each combination of parameters, we evaluate the resulting total number of oracle calls N_oracle and compute the overhead factor

C := N_oracle · ε / log(2/α log₃(3π/20ε)),

i.e., the constant factor of the scaling with respect to ε and α. We evaluate the average as well as the worst case over all considered values of a. The results are illustrated in Fig. 4 and show that the worst-case overhead in this experiment is below ten, while the average overhead is below five. Thus, the empirical complexity analysis of IQAE leads to

N_oracle^avg < (5/ε) log(2/α log₃(3π/20ε)) and N_oracle^wc < (10/ε) log(2/α log₃(3π/20ε)),

where N_oracle^avg denotes the average and N_oracle^wc the worst-case complexity, respectively; both are much smaller than the bound derived in Thm. 1. Furthermore, the worst-case factors seem to decrease with α, while the average factors seem to be more stable.
To analyze the k-schedule, we set a = 1/2, ε = 10⁻⁶, α = 5%, and again N_shots = 100. Fig. 5 shows, for each iteration, the resulting average, standard deviation, minimum, and maximum of K_{i+1}/K_i over 1,000 repetitions of the algorithm, for the K_i defined in Alg. 2. It can be seen that N_shots = 100 seems to be too small for the first round, i.e., another iteration with the same K_i is necessary before approaching an average growth rate slightly larger than four.

V. CONCLUSION AND OUTLOOK
We introduced Iterative Quantum Amplitude Estimation, a new variant of QAE that realizes a quadratic speedup over classical MC simulation. Our algorithm does not require QPE, i.e., it is solely based on Grover iterations, but still allows us to prove rigorous error and convergence bounds. We demonstrated empirically that our algorithm outperforms the other existing variants of QAE, some even by several orders of magnitude. This development is an important step towards applying QAE on quantum hardware to practically relevant problems and achieving a quantum advantage.
Our algorithm achieves the quadratic speedup up to a log(2/α log₃(3π/20ε)) factor. In contrast, QAES, the other known variant of QAE without QPE and with a rigorous convergence proof, achieves optimal asymptotic complexity at the cost of very large constants. It is an open question for future research whether there exists a variant of QAE without QPE that is practically competitive while having an asymptotically optimal performance bound. Another difference between IQAE and QAES is the type of error bound: IQAE provides an absolute and QAES a relative bound. Both types are relevant in practice; however, in the context of QAE, where problems often need to be normalized, a relative error bound is sometimes more appropriate. We leave the question of a relative error bound for IQAE open to future research.
Another open question for further investigation is the optimal choice of parameters for IQAE. We can set the required minimal growth rate for the oracle calls as well as the number of classical shots per iteration and both affect the performance of the algorithm. Determining the most efficient setting may further reduce the required number of oracle calls for a particular target accuracy.
As we have demonstrated in Sec. IV, there is a significant gap between the bound on the total number of oracle calls provided in Thm. 1 and the actual performance due to the different methods used to estimate confidence intervals. This raises the question whether a tighter analytic bound is possible.
To summarize, we introduced and analyzed a new variant of QAE without QPE that outperforms the other known variants, and we provided a rigorous convergence theory. This helps to reduce the requirements on quantum hardware and is an important step towards leveraging quantum computing for real-world applications.

VI. CODE AVAILABILITY
The mentioned algorithms are available open source as part of Qiskit and can be found at https://github.com/Qiskit/qiskit-aqua/. Tutorials explaining the algorithm and its application are located at https://github.com/Qiskit/qiskit-iqx-tutorials.

Appendix A: Maximum Likelihood Estimation

Given a probability distribution that depends on an unknown parameter θ, and data {x_i}, i = 1, ..., N, sampled from it, MLE is a method to obtain an estimate θ̂ for θ. This is done by maximizing the likelihood L, which is a measure of how likely it is to observe the data {x_i}_i given that θ is the true parameter. Numerically, it is often favourable to maximize the log-likelihood log L.
In QAE, we try to approximate a with the MLE â. The probability distribution f describes the probability of sampling a certain grid point in one measurement; it is derived in [1] and stated in Sec. II. See Fig. 6a for a visualization of the probability distribution fitted to the QAE samples and Fig. 6b for the respective log-likelihood function log L.
To find the maximum of log L without much overhead, we exploit the information given by the QAE output: if QAE is successful, its estimate is the closest grid point to a. Thus, we only have to search the intervals neighbouring this grid point to find the maximum, which is done using a bisection search. Note that for N → ∞, this search would return the exact amplitude, i.e., â = a, independent of the number of qubits m.
Confidence intervals for the MLE can be derived using the Fisher information in combination with the central limit theorem, or with the likelihood ratio (LR) [15]. In our tests, the LR was more reliable than other approaches such as the (observed) Fisher information. Due to the data-based definition of the LR confidence intervals, the better performance fits the expectations [19,20]. The LR confidence interval uses the fact that for large sample numbers N, the LR statistic 2(log L(â) − log L(a)) is approximately χ²-distributed with one degree of freedom. The statistic can be used to conduct a two-sided hypothesis test with the null hypothesis H₀: â = a, which is rejected at the α level if the LR statistic exceeds the (1 − α) quantile of the χ² distribution, q_{χ²₁}(1 − α). The corresponding confidence interval is {a' ∈ [0, 1] : log L(a') ≥ log L(â) − q_{χ²₁}(1 − α)/2}. Using the likelihood function for the QAE samples, we obtain confidence intervals for the MLE â of a. We apply the same approach to derive confidence intervals for MLAE.
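The LR interval construction can be sketched as follows (illustrative Python, not the paper's implementation; we compute q_{χ²₁}(1 − α) as the squared standard-normal quantile and scan a grid of candidate values, here for a simple Bernoulli likelihood):

```python
import math
from statistics import NormalDist

def lr_confidence_interval(log_l, a_hat, alpha, grid_size=10000):
    """Likelihood-ratio confidence set {a : log L(a) >= log L(a_hat) - q/2},
    with q the (1 - alpha) quantile of chi^2 with one degree of freedom,
    computed as the squared standard-normal quantile. Returns the hull of
    the accepted grid points (an interval for unimodal likelihoods)."""
    q = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    cutoff = log_l(a_hat) - q / 2
    accepted = [i / grid_size for i in range(grid_size + 1)
                if log_l(i / grid_size) >= cutoff]
    return min(accepted), max(accepted)

def bernoulli_log_l(h, n):
    """Log-likelihood for h successes out of n Bernoulli trials."""
    def log_l(a):
        a = min(max(a, 1e-12), 1 - 1e-12)
        return h * math.log(a) + (n - h) * math.log(1 - a)
    return log_l
```

For 60 successes out of 100 trials at 95% confidence, this yields an interval of roughly [0.50, 0.69] around â = 0.6; for QAE one would plug in the likelihood built from the y-measurement distribution instead.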
Appendix B: Proof of Theorem 1

Proof. Let us outline the strategy first. We use the union bound to combine the estimates from the different rounds of the algorithm, together with an upper bound T on the actual number of rounds t, to derive an upper bound on the total query complexity in terms of the desired precision ε for the parameter a and the confidence level 1 − α.
Suppose a given confidence interval [a_l, a_u] for a, and recall that a = sin²(θ_a). This implies

|a_u − a_l| = |sin²(θ_u) − sin²(θ_l)| = |sin(θ_u + θ_l)| |sin(θ_u − θ_l)| ≤ |θ_u − θ_l|. (B1)

Thus, to achieve |a_u − a_l|/2 ≤ ε, it suffices to achieve |θ_u − θ_l|/2 ≤ ε for our estimate of θ_a. Suppose that in round i our knowledge of θ_a is the confidence interval [θ_l, θ_u], and we just applied the Grover operator k_i times. Recall that the application of Q^{k_i} to A|0⟩ effectively multiplies the angle θ_a by K_i = 4k_i + 2. Observe that if there exists j ∈ {0, ..., 5} such that

[K_i θ_l, K_i θ_u] mod 2π ⊆ [jπ/3, (j+1)π/3], (B2)

then we can always find a new K_{i+1} such that q_i := K_{i+1}/K_i ≥ 3, since tripling maps [jπ/3, (j+1)π/3] onto the half-plane [jπ, (j+1)π] mod 2π. Analogously, if we know that there exists j ∈ {0, ..., 9} such that

[K_i θ_l, K_i θ_u] mod 2π ⊆ [jπ/5, (j+1)π/5], (B3)

then we can always find a new K_{i+1} such that q_i ≥ 5.
In both cases, we have q_i ≥ 3. The only case where we cannot be sure that q_i ≥ 3 is when the boundaries of the interval [θ_i^min, θ_i^max] lie strictly in different intervals of both partitions, e.g., in [0, π/3] and [π/3, 2π/3], and correspondingly in [0, π/5] and [2π/5, 3π/5], which is depicted by the red interval in Fig. 7. In order to ensure that at least one of the conditions (B2) or (B3) holds, it is sufficient that for all i we have

K_i (θ_u − θ_l) ≤ 2L, with L := π/30, (B4)

since the minimal distance between a multiple of π/3 and a distinct multiple of π/5 is π/15 = 2L, so a scaled interval of length at most 2L cannot straddle a boundary of both partitions simultaneously. We will determine the maximally required number of shots N_max to ensure this condition later in the proof. Since a ∈ [0, 1], we know that θ_a ∈ [0, π/2]. In addition, let us assume condition (B4) is always satisfied. Then, the length of the confidence interval of θ_a decreases by a multiplicative factor of q_i ≥ 3 in every round i. Note, however, that at the very beginning we may need to keep k₁ = 0, i.e., K₁ = 2, for a sufficient number of shots, such that the width of the scaled confidence interval of θ_a drops below 2L.
This allows us to derive an upper bound T on the actually required number of rounds t to achieve the desired absolute error for θ_a. First, the actual schedule K₁ < K₂ < ... < K_t satisfies

K_t = K₁ ∏_{i=1}^{t−1} q_i ≥ 3^{t−1} K₁. (B5)

Secondly, we define T as follows:

T := ⌈log₃(3L/(K₁ε))⌉ = ⌈log₃(π/20ε)⌉.

Note that this definition, together with (B5) and q_i ≥ 3, leads to the following estimate: as long as the algorithm has not terminated in round t − 1, the width of the θ_a-interval still exceeds 2ε, so by (B4) we have K_{t−1} < L/ε, and hence

3^{T−1} ≥ L/(K₁ε) > K_{t−1}/K₁ ≥ 3^{t−2},

which implies T ≥ t, as it should be.
Let us introduce at this point, for each round i, the probability a_i that we approximate and the corresponding confidence interval [a_i^min, a_i^max], estimated with confidence level 1 − α_i. If we require

∑_{i=1}^{T} α_i ≤ α, e.g., α_i := α/T, (B9)

then the union bound asserts that the main condition of the theorem - the guarantee that the total error probability is bounded by α - is satisfied:

P[a ∉ [a_l, a_u]] ≤ ∑_{i=1}^{t} α_i ≤ α.

From (B9) we deduce the accuracy to which each a_i has to be estimated: if we require the half-width ε_{a_i} of each intermediate confidence interval to satisfy

2ε_{a_i} = sin²(L), (B12)

then from (B12) and from the fact that sin²(ε_{θ_i}) ≤ sin²(θ_i), where ε_{θ_i} denotes the half-width of the corresponding scaled angle interval, it follows that

sin⁴(ε_{θ_i}) ≤ sin²(θ_i) sin²(ε_{θ_i}) = 4ε²_{a_i} = sin⁴(L), (B14)

which implies ε_{θ_i} ≤ L for ε_{θ_i}, L ≤ π/2, i.e., condition (B4) is satisfied. It then follows from the Chernoff bound that the maximum number of shots required is given by

N_max(ε, α) = (12/sin⁴(L)) log(2T/α) < 100520 log(2/α log₃(3π/20ε)).
Remark 1. In Thm. 1 and Alg. 1, we used T := ⌈log₂(π/4ε)⌉, which differs from the value appearing in the proof. There are two reasons for this: the stronger requirement in the proof (q_i ≥ 3, while in Alg. 2 we only require q_i ≥ 2), and the fact that, in practice, we do not always have to shrink the confidence interval [θ_i^min, θ_i^max] to a length smaller than 2L. In fact, in Alg. 2 we can require q_i ≥ r for any r > 1. This modifies the formula for T, and we can set T := ⌈log_r(rπ/8ε)⌉. The bounds and parameters given in the proof are worst-case bounds, i.e., they provide the maximum number of shots such that we can derive an upper bound on the number of rounds as well as the number of iterations per round.
Remark 2. We emphasize that the proof provides a loose upper bound on the number of shots N_max. This was essentially done to ensure that the condition q_i ≥ 3 is satisfied. In practice, we can take significantly fewer measurements N_shots < N_max in a given round until Alg. 2 finds a sufficiently large q_i. In that case, a given round may consist of several iterations of the while-loop in Alg. 1. Furthermore, we can obtain better constants in the proof if we use the Clopper-Pearson interval bound for i.i.d. Bernoulli random variables instead of the Chernoff bound. However, an analytic treatment of this case is either impossible or highly involved, so we instead provide numerical evidence for the better performance of the Clopper-Pearson variant in Sec. IV.