A Quantum Solution for Efficient Use of Symmetries in the Simulation of Many-Body Systems

A many-body Hamiltonian can be block-diagonalized by expressing it in terms of symmetry-adapted basis states. Finding the group orbit representatives of these basis states and their corresponding symmetries is currently a memory/computational bottleneck on classical computers during exact diagonalization. We apply Grover's search in the form of a minimization procedure to solve this problem. Our quantum solution provides an exponential reduction in memory, and a quadratic speedup in time over classical methods. We discuss explicitly the full circuit implementation of Grover minimization as applied to this problem, finding that the oracle only scales as polylog in the size of the group, which acts as the search space. Further, we design an error mitigation scheme that, with no additional qubits, reduces the impact of bit-flip errors on the computation, with the magnitude of mitigation directly correlated with the error rate, improving the utility of the algorithm in the Noisy Intermediate Scale Quantum era.


I. INTRODUCTION
As several quantum computing platforms become available for general use, finding practical applications for quantum computers is a key driver for the development and adoption of quantum computing technology. Additionally, since the field is expected to remain in the Noisy Intermediate Scale Quantum (NISQ) era [1] for the next few decades, designing error mitigation strategies for these algorithms is essential. In this paper, we identify a new application for quantum computers and show how the algorithm should be implemented in the NISQ era.
Much of the excitement around quantum computing started with the introduction of two algorithms: Shor's factorization algorithm [2] and Grover's search algorithm [3]. Though the former represents the paradigmatic example of quantum speed-up, the latter has been criticized as often showing only a nominal speed-up. The criticism stems from the fact that although the oracle-query scaling is polynomially reduced, any quantum oracle which contains all the information of the database must scale with the size of the database [4]. This suggests we must look to problems where the oracle in Grover's search can be implemented efficiently, treating it as a means to invert a Boolean function.
Dürr and Høyer [5] suggested using Grover's algorithm to find the minimal element of a database. The general idea is to hold the best-known minimum value and search for a member less than that. If a better value is found, the best-known value is updated and the process is repeated for a set number of oracle calls. Even assuming the oracle can be implemented efficiently, such a process might not be ideal in all cases, as it still scales exponentially compared to approximation schemes such as adiabatic evolution and related minimization processes such as the quantum approximate optimization algorithm (QAOA) [6]. However, as the names suggest, these are only approximate methods. Furthermore, adiabatic evolution is sensitive to phase transitions due to a closing gap, and QAOA may require significant classical computational overhead. These limitations ultimately stem from the fact that such methods are sensitive not just to order, but also to 'distance.' Grover minimization (Gmin), on the other hand, depends only on the order: it treats the minimum the same whether it is separated from the next largest value by 1 or by 100. This suggests that in special cases where an exact minimum is required, or where we wish to ignore distance (or there is no notion of distance), Gmin is a good alternative.
We present one such problem, which occurs in the simulation of strongly-correlated materials, such as when performing exact diagonalization. A model Hamiltonian for some material often contains symmetries such as translation or rotation, which can be formalized as a group. One can leverage these symmetries by using group representation theory to block-diagonalize the full Hamiltonian [7], making the remaining diagonalization computationally cheaper. However, to calculate the block-diagonal matrix elements, each basis state must be associated with an orbit representative, which for convenience is chosen as the one labeled with the smallest integer value. One must also know the group operator connecting a basis state to its representative [8]. Solving this problem has become a serious bottleneck for using symmetries in exact diagonalization, as one has to either store these values explicitly, which becomes costly in terms of memory, or calculate them on the fly, which is computationally expensive. An alternative divide & conquer method based upon sub-lattice coding is proposed in Ref. [8]. This splits the costs between memory and computational time, but only reduces the time by a constant factor and the memory by a polynomial amount.
In this paper, we consider the use of Gmin for this problem, which results in a quadratic speed-up over the classical algorithms and requires virtually no classical memory and relatively little quantum memory. We improve upon the textbook version of Gmin to optimize the number of oracle calls and reduce the number of qubits required to implement the oracle. Furthermore, we show that for many reasonable problem instances, the oracle is polylog in the size of the group and in the dimension of the Hamiltonian's Hilbert space, assuming the group action generators can be efficiently simulated on a quantum computer, making this a practical use for Grover's algorithm. We consider the full circuit implementation for a benchmark case as well as the effects of error on the performance of the algorithm. Our error-mitigation scheme, based on real-time post-selection on measurement results between coherent steps of the algorithm, represents a near-term use for pre-fault-tolerant quantum computing. Furthermore, using Gmin as a subroutine in classical exact diagonalization is an example of the power of interfacing quantum and classical machines in hybrid algorithms. Alternatively, we envision that this algorithm could be used as a subroutine which generates the matrix entries for a larger quantum algorithm that uses symmetry-adapted basis states to simulate a strongly-correlated quantum system.
The remainder of the paper is structured as follows. In Section II, we introduce the problem of finding the orbit representative, give an overview of the existing classical solutions, and then describe our quantum algorithm in detail, including the full circuit description and an analysis of its running time. Section III shows results from the simulation on the Intel Quantum Simulator [9]. Section IV discusses our error mitigation strategies in the presence of noise and their numerical simulation. We conclude in Section VI.

II. OVERVIEW OF THE PROBLEM AND THE QUANTUM SOLUTION
In this section, we describe the problem of finding the group orbit representative, with some comments on the classical methods which are used to solve it. We then propose a quantum method based on Gmin which exponentially reduces the memory cost while yielding a quadratic reduction in computational time.

A. Orbit Representative Problem Statement
For a detailed discussion of the use of the following procedure, we refer the reader to [8] and references therein.
Problem statement: suppose we have some finite group G with a group action G × V → V such that (g, v) ↦ gv. We shall refer to V as the position set and to its members as positions, though they need not correspond to physical positions; rather, they index some basis set for a Hamiltonian's Hilbert space. Furthermore, we have some function int(v) which totally orders the set V. We assume int maps v to the integer value used to label it. Define the orbit of a position as orbit(v) = {gv : g ∈ G}, which is represented by ṽ ∈ orbit(v) such that int(ṽ) ≤ int(u) for all u ∈ orbit(v), i.e., ṽ is the smallest element.
Given a member v ∈ V, we wish to find the orbit representative ṽ as well as the group element which gives that representative, i.e., find $g_v$ such that $g_v v = \tilde{v}$.
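As a concrete reference point, the sketch below solves the problem statement by brute force (the on-the-fly strategy discussed in the next subsection) for one assumed instance: V is the set of n-site spin-1/2 configurations encoded as n-bit integers, int(v) is that integer, and G is the cyclic translation group, with g(x) translating by x sites. The function names are hypothetical and chosen only for illustration.

```python
def translate(v: int, k: int, n: int) -> int:
    """Translate the n-bit spin configuration v by k sites (bit i -> bit (i + k) mod n)."""
    k %= n
    mask = (1 << n) - 1
    return ((v << k) | (v >> (n - k))) & mask


def orbit_representative(v: int, n: int) -> tuple[int, int]:
    """Return (representative, x) such that translate(v, x, n) is the smallest element of orbit(v).

    The whole orbit is scanned, so this costs O(|G|) group actions but O(1) memory.
    """
    best_rep, best_x = v, 0  # x = 0 is the identity translation
    for x in range(1, n):
        u = translate(v, x, n)
        if u < best_rep:
            best_rep, best_x = u, x
    return best_rep, best_x


# 0b0110 on a 4-site ring: translating by 3 sites gives 0b0011.
print(orbit_representative(0b0110, 4))  # (3, 3)
```

Storing the output of `orbit_representative` for every v would give the look-up strategy instead, trading the O(|G|) scan for O(|V|) memory.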
In the general case, one expects that $\log|G| \ll \log|V| \le |G| \ll |V|$. Table I lists the solutions to this problem, including Gmin, and compares their costs. We denote the classical time-complexity cost of computing the group action on an arbitrary member of V by C(G) and, in general, the quantum time-complexity cost of implementing an operator A on a quantum computer by C(A).

B. Classical Solutions
There are three classical means of addressing this problem:
1. Look-up: Store the orbit representative corresponding to every element of V, together with the connecting group element, in a look-up table. This can then be searched efficiently when needed, but it requires O(|V|) memory.
2. On-the-fly: When needed, calculate the full orbit to find the smallest element and the connecting group element. This is efficient in terms of memory, but the computation scales as O(|G|).
3. Divide & conquer: There exist sub-lattice coding methods [8] which allow one to split the costs between memory and computation (see Table I for these costs).

Table I: List of the different methods for solving the group representative problem. We generally expect that $\log|G| \ll \log|V| \le |G| \ll |V|$. C(G) is the classical cost to calculate the action of G on an arbitrary v, while C(Ĝ) is the quantum cost to calculate the action of G on an arbitrary v. Columns: classical memory (Cl. Mem), quantum memory (Q Mem), and time.
While the divide & conquer method represents a significant reduction in the resources needed, this bottleneck can still be prohibitively expensive. To the best of our knowledge, quantum methods have not previously been considered for solving this problem; we discuss one in the next section.

C. Overview of the Grover Minimization Algorithm
In this section we look to use the Gmin algorithm to solve the problem. We first review the algorithm as given in Ref. [5] and then adapt it for this problem which includes modifications to optimize the memory and time costs.
Gmin utilizes the function $f_v : G \to V$ such that $g \mapsto f_v(g) = \mathrm{int}(gv)$, acting on an unsorted database of |G| items; g acts as an index, and we want to find the index which points to the smallest value of $f_v$. To encode the group, we also introduce an index on the group elements, $g : \mathbb{N}_{<|G|} \to G$ such that $x \mapsto g(x)$. The number of bits (qubits) needed to index all members of the group is then $m = O(\log|G|)$. The original algorithm proceeds as follows. Let α be some real, positive number, which we refer to as the oracle budget parameter, from which we define $\alpha\sqrt{|G|}$ as the oracle budget. Using two quantum registers, each of size m (referred to as the group registers), choose an index $0 \le y \le |G| - 1$ randomly, and repeat the following, using no more than $\alpha\sqrt{|G|}$ Grover steps:
1. Initialize the two registers in the state $\frac{1}{\sqrt{|G|}}\sum_x |x\rangle|y\rangle$.
2. Mark every element x for which $f_v(x) < f_v(y)$.
3. Apply a "Grover search with an unknown number of marked elements" (Gsun) [10] to the first register.
4. Measure the first register with outcome y′; if $f_v(y') < f_v(y)$, set $y \leftarrow y'$.
It is argued in the reference that for α = 45/2, the second register holds the minimum value with a probability of at least 50%. Below, we discuss how to relate the success rate and α using numerical methods. Appendix A gives a modified analytic derivation which yields the better value α = 45/8 for a success rate of at least 50%.
To make this algorithm more explicit, we must address how to implement the second and third steps, which is equivalent to a method for implementing Gsun and its oracle. In general for Grover search, if the number of marked elements is known, one can apply the exact number of Grover steps needed to reach one of the marked states with high probability. However, this probability is not monotonic in the number of oracle calls: one can "overshoot" the target state and reduce the probability of reaching the answer with additional oracle calls. Thus, not knowing the number of marked elements could be problematic without some additional procedure. We refer those unfamiliar with Grover's search algorithm to Refs. [3,11] for details. Ref. [10] provides a solution, Gsun, which iterates the search and randomly draws the number of Grover steps from a growing interval. Its authors prove that the probability of selecting a marked element is asymptotically bounded below by 1/4, thus ensuring we can find a marked element with probability greater than 50% after a number of oracle calls that still scales as $\sqrt{|G|}$.
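The overshoot can be made concrete with the textbook closed form for Grover search: starting from the uniform superposition over N items with k marked, p Grover steps yield a marked outcome with probability sin²((2p + 1)θ), where sin²θ = k/N. The short sketch below (function name hypothetical) evaluates this standard formula and shows how a step count tuned for one value of k fails for another.

```python
import math


def grover_success_probability(p: int, k: int, n_items: int) -> float:
    """Probability of measuring a marked item after p Grover steps.

    Starts from the uniform superposition over n_items with k marked items and
    uses the closed form sin^2((2p + 1) * theta), where sin^2(theta) = k / n_items.
    """
    if k == 0:
        return 0.0
    theta = math.asin(math.sqrt(k / n_items))
    return math.sin((2 * p + 1) * theta) ** 2


# With 1 marked item out of 64, about 6 steps are close to optimal ...
print(grover_success_probability(6, 1, 64))
# ... but the same 6 steps overshoot badly when 4 items are marked.
print(grover_success_probability(6, 4, 64))
```

Gsun sidesteps the unknown k by drawing p at random from an interval whose upper end grows geometrically, so that on average a constant fraction of searches lands near a peak.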
To mark elements as in step 2, we must define the oracle. According to Refs. [3,10], marking an element means the oracle produces the following action on any computational basis state |x⟩:
$$O|x\rangle = \begin{cases} -|x\rangle & \text{if } f_v(x) < f_v(y), \\ |x\rangle & \text{otherwise.} \end{cases} \qquad (1)$$
Note that the second step requires us to calculate $f_v(x)$ and $f_v(y)$, which implies we also require quantum registers to hold these values. There may exist multiple methods for implementing such an oracle, but the simplest and perhaps cheapest method for our problem is to further hold the value v in a quantum register of size $n = O(\log|V|)$, which we refer to as the first position register. Furthermore, we replace the second group register with a second position register of size n. So our method is not to store the best-known value for the group index (y in the above algorithm), as was done in previous implementations of Grover minimization, but rather to store $\tilde{v}_{best} = f_v(y)$ in a quantum register. y can then be stored classically and updated whenever $\tilde{v}_{best}$ is updated. This innovation reduces the number of gates and qubits required for the oracle.
The oracle is then implemented as follows. We first implement the group action operator Ĝ on the group register and the first position register, which has been initialized with v, such that
$$\hat{G}|x\rangle|v\rangle = |x\rangle|g(x)v\rangle. \qquad (2)$$
We then apply a quantum circuit that, in general, acts on two quantum registers of equal size and applies a negative sign to the state if the computational basis state of the first register is less than that of the second. We refer to this circuit as the phase comparator (PhComp), which has the behavior
$$\mathrm{PhComp}\,|a\rangle|b\rangle = \begin{cases} -|a\rangle|b\rangle & \text{if } a < b, \\ |a\rangle|b\rangle & \text{otherwise.} \end{cases} \qquad (3)$$
So after applying the group action operator, we apply PhComp to the two position registers and then uncompute the group action operator. This completes the oracle, as shown in Fig. 1. To complete one Grover step (Grov), we then apply the usual reflection operator, defined as
$$U_s = 2|s\rangle\langle s| - I = V\left(2|0\rangle\langle 0| - I\right)V^\dagger, \qquad (4)$$
where $|s\rangle = \frac{1}{\sqrt{|G|}}\sum_x |x\rangle$ and V is any unitary such that $V|0\rangle = |s\rangle$. For completeness, the circuit for Grov is shown in Fig. 2.
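To illustrate how the phase oracle and the reflection U_s combine into Grov, the following sketch simulates the group register as a real state vector. It is a classical emulation under stated assumptions, not a circuit: the position registers are not modeled explicitly (ideally they return to |v⟩|ṽ_best⟩ after each call), and `f` is a hypothetical stand-in for x ↦ f_v(x).

```python
import math


def grov(amps: list[float], marked: set[int]) -> list[float]:
    """One Grover step on the group register: phase oracle, then U_s = 2|s><s| - I."""
    # Oracle: flip the sign of every basis state |x> with f_v(x) < v_best.
    amps = [-a if x in marked else a for x, a in enumerate(amps)]
    # U_s reflects about the uniform state |s>: a_x -> 2 * mean(a) - a_x.
    mean = sum(amps) / len(amps)
    return [2 * mean - a for a in amps]


def grover_search(f, n_items: int, v_best: int, p: int) -> float:
    """Apply Grov p times to |s> and return the probability of a marked outcome."""
    marked = {x for x in range(n_items) if f(x) < v_best}
    amps = [1 / math.sqrt(n_items)] * n_items  # |s>, i.e. H on every qubit when n_items = 2^m
    for _ in range(p):
        amps = grov(amps, marked)
    return sum(a * a for x, a in enumerate(amps) if x in marked)


def f(x: int) -> int:
    """Hypothetical f_v for a 16-element group; only x = 11 satisfies f(x) < 1."""
    return (3 * x + 15) % 16


print(grover_search(f, 16, v_best=1, p=3))
```

The result agrees with the closed form sin²(7θ) with sin θ = 1/4, as expected for a single marked element among 16.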
If we unpack Gsun and integrate it into our modified version of Gmin, the resulting pseudo-code is shown in Algorithm 1.
Algorithm 1 Grover Minimization
…
10: (Grov)^p |ψG⟩|ψ1⟩|ψ2⟩
11: Measure(x ← |ψG⟩)
12: if f_v(x) < v_best then
13:     v_best ← f_v(x)
14:     x_best ← x
15:     t ← max(1, βt)
16: else
17:     t ← min(γt, √|G|)
18: end if
19: end while
20: return v_best, x_best

Note that we have chosen the initial best guess $v_{best} = v$, as we assume v is effectively random. Also, we count the check step in line 12 as an effective oracle call so that the classical and quantum solutions can be compared more accurately. γ ∈ (1, 4/3) and β ∈ [0, 1] are additional parameters which we use to minimize α. γ is discussed in Ref. [10] and controls the rate of the exponential "ramp-up" of the parameter t, which in turn sets the ceiling of the random sampling of the number of oracle calls used in the Grover search step of the algorithm. In principle, a large γ reduces the time to reach $t \sim \sqrt{|G|}$, which is optimal if $v_{best}$ is near the minimum (the number of marked elements is small, so the search takes longer). However, if $v_{best}$ is far from the minimum, a γ that is too large, so that $t \sim \sqrt{|G|}$ early on, increases the chance that we apply too many oracle calls and dramatically overshoot a state of high overlap with a marked element. Thus, we need to balance the rate at which t increases by optimizing γ. β is a parameter which we introduce here. As the algorithm was originally written, after a better value of $v_{best}$ is found in line 13 of Algorithm 1, Gsun effectively ends and is re-called on the next cycle. Gsun then assumes it knows nothing about how close we are to the minimum, resetting t back to 1 (as would be the case for β = 0). However, we do know something, namely that we are closer to the minimum than in the previous iteration (the number of marked elements has decreased). Thus we do not need the full ramp-up time for t, which is only included to handle the case where we are far from the minimum. By including the β parameter, we exploit this limited knowledge about the number of marked elements.
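Section II E relies on simulating the classical control flow of Algorithm 1 with exact single-search probabilities. A minimal sketch of such an emulation follows, under stated assumptions rather than as the authors' simulation code: `f` is a hypothetical stand-in for x ↦ f_v(x), x = 0 indexes the identity so that f(0) = int(v), p is drawn uniformly from the integers below t as in Gsun, and a measurement returns a uniformly random marked element with the Grover success probability and a uniformly random unmarked element otherwise.

```python
import math
import random


def grover_success_probability(p: int, k: int, n_items: int) -> float:
    """sin^2((2p + 1) theta) with sin^2(theta) = k / n_items (0 if nothing is marked)."""
    if k == 0:
        return 0.0
    theta = math.asin(math.sqrt(k / n_items))
    return math.sin((2 * p + 1) * theta) ** 2


def gmin_emulated(f, n_items, alpha, beta=0.95, gamma=1.15, rng=None):
    """Emulate the classical control flow of Algorithm 1; returns (v_best, x_best, calls)."""
    rng = rng or random.Random()
    v_best, x_best = f(0), 0              # x = 0 is the identity, so f(0) = int(v)
    t, calls = 1.0, 0
    root = math.sqrt(n_items)
    while calls < alpha * root:           # oracle budget alpha * sqrt(|G|)
        p = rng.randrange(math.ceil(t))   # number of Grover steps, 0 <= p < t
        marked = [x for x in range(n_items) if f(x) < v_best]
        unmarked = [x for x in range(n_items) if f(x) >= v_best]
        if marked and rng.random() < grover_success_probability(p, len(marked), n_items):
            x = rng.choice(marked)
        else:
            x = rng.choice(unmarked)      # never empty: it contains x_best
        calls += p + 1                    # p oracle calls plus the check in line 12
        if f(x) < v_best:                 # lines 12-15
            v_best, x_best = f(x), x
            t = max(1.0, beta * t)
        else:                             # line 17
            t = min(gamma * t, root)
    return v_best, x_best, calls


def f(x: int) -> int:
    """Hypothetical f_v on a 32-element group; its minimum, 0, is at x = 25."""
    return (5 * x + 3) % 32


print(gmin_emulated(f, 32, alpha=40, rng=random.Random(1)))
```

Repeating such runs many times and recording when the known minimum is first reached is one way to estimate P_success as a function of oracle calls.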
We discuss the exact values chosen for these parameters in Sec. II E.

D. Circuit Implementation of Grov and its Cost
We now discuss a full circuit implementation of all subroutines of Grov. As Ĝ is specified by the problem instance, we only give an explicit implementation for the group $G^N_{add}$, which represents addition modulo $N = 2^n$, or translation on a cycle of N positions. For more complicated, realistic groups, we discuss a general strategy.
The simplest part of Grov to implement is the standard $U_s$ operator defined in Eq. (4). As discussed in Ref. [10], if $|G| = 2^m$ for some m, then $V = H^{\otimes m}$ is given by the Hadamard gate acting on every qubit of the group register. The remaining reflection is implemented by a controlled π phase gate on a computational 0 input, i.e., apply NOT to all qubits and then apply the multi-controlled Z gate. Finally, we uncompute everything but the multi-controlled Z gate. An example of this circuit is shown in Fig. 3. If |G| is not a power of 2, we only have to replace the change of basis given by V with some other change-of-basis operator, such as the quantum Fourier transform (QFT). The cost of the …
We next consider an implementation of PhComp as defined in Eq. (3), based on a bitwise comparison of the input registers. We start with the most significant bit of the binary expansion of a computational input value and proceed to the least significant. At the i-th bit, we need to calculate two binary values: the first represents whether or not we should apply the π phase at the current bit, and the second represents whether or not we should continue to compare the remaining lesser bits. That is, if the two bits differ, the value containing 1 is greater, so we need to prevent any additional phases from being applied on lesser bits. A truth table for this calculation is given in Table II for input bits $a_i$ and $b_i$. From this, we find that $(\text{apply phase})_i = \bar{a}_i b_i$, AND-ed with all the more significant $(\text{continue})_j = \bar{a}_j \oplus b_j$ bits for j > i. So our method of implementing PhComp is to apply NOT to all qubits of the first register a and then compare from the most to the least significant qubit. At the i-th qubit, we calculate $(\text{continue})_i$ on $b_i$ using a CNOT, but only after calculating $(\text{apply phase})_i$ in the phase with a multi-controlled Z gate between $a_i$, $b_i$, and all the $b_j$ for j > i (which now contain the (continue) bits). Finally, we uncompute the CNOT and NOT gates.
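The bitwise rule for PhComp can be checked exhaustively on classical inputs. The sketch below (function name hypothetical) mirrors the circuit: it negates the bits of a, scans from the most significant bit, applies the phase when the negated a-bit and the b-bit are both 1 and all more significant (continue) bits are 1, and then records the (continue) bit that the CNOT onto b_i would produce. It only tracks the sign acquired by a computational-basis input; it is not a full quantum simulation.

```python
def phcomp_sign(a: int, b: int, n: int) -> int:
    """Sign picked up by |a>|b> under PhComp, following the bitwise NOT/CNOT/MCZ scheme."""
    sign = 1
    equal_so_far = 1  # AND of the (continue)_j bits for all j > i
    for i in reversed(range(n)):          # most significant bit first
        a_bar = 1 - ((a >> i) & 1)        # register a after the NOT layer
        b_i = (b >> i) & 1
        if a_bar & b_i & equal_so_far:    # multi-controlled Z on a_i, b_i and all b_j, j > i
            sign = -sign
        equal_so_far &= a_bar ^ b_i       # CNOT a_i -> b_i: 1 iff a_i == b_i
    return sign


# PhComp flips the sign exactly when a < b.
assert all(phcomp_sign(a, b, 4) == (-1 if a < b else 1) for a in range(16) for b in range(16))
```

At most one phase is ever applied: once the first differing bit is reached, the (continue) bit drops to 0 and blocks all later gates.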
An example circuit is shown in Fig. 4. Assuming the cost of a multi-controlled Z gate scales linearly with the number of controls, the cost of PhComp is $C(\mathrm{PhComp}) \sim O(\log^2|V|)$, since each of the n comparisons uses O(n) controls. If there are additional ancilla qubits available, we can use these to reduce the cost to $C(\mathrm{PhComp}) \sim O(\log|V|)$. See Appendix B for details.
The form of the group action operator is entirely dependent on the group. We take the simplest case first, which is an abelian group with a single cycle, whereby $g(x) = g^x$ for the group generator g. We assume we can form a circuit for the operator ĝ acting on a position register which achieves
$$\hat{g}|u\rangle = |gu\rangle \quad \text{for all } u \in V.$$
We then control $\hat{g}^{2^i}$ on the i-th qubit of the group register, as shown in Fig. 5. This method can be generalized to multi-cycle abelian groups by subdividing the group register so that there is one subregister for each cycle and generating a circuit similar to Fig. 5 for each cycle. If the group is non-abelian, one has to consider a strategy for indexing powers of the generators and their order. For example, suppose the group is generated by two non-commuting operators $g_1$ and $g_2$. Each generator forms its own abelian subgroup, so we can use the same strategy for each separately, with its own subgroup register. Furthermore, the order for applying these operators can be controlled by a single qubit, |order⟩, using the circuit in Fig. 6. If |order⟩ = |0⟩, then the group operator applied is $g_2^{x_2} g_1^{x_1}$, and if |order⟩ = |1⟩, then the group operator applied is $g_1^{x_1} g_2^{x_2}$. We note this may not be the most efficient method in terms of qubit use for the group register. For example, if $x_2 = 0$, then the state of |order⟩ does not matter, and so there are redundant index states in the group register. This is also the case if there are redundancies in the order of non-zero powers of the generators, i.e., $g_1^{x_1} g_2^{x_2} = g_2^{x_2} g_1^{x_1}$ for some values of the indices.
The most efficient method depends on the group, but we include this example to demonstrate that, in principle, one can handle non-abelian groups using roughly the same strategy as was used for abelian groups.
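For the single-cycle abelian case, the controlled-power construction can be emulated classically, as in the sketch below. Here `g_pow(u, k)` is a hypothetical routine returning g^k u in one shot, standing in for a direct circuit for ĝ^k; the example instance (translation on an 8-site ring, encoded as a bit rotation) is an assumption for illustration. Because each set bit i of the group index contributes g^(2^i), at most m generator powers are applied per call.

```python
from typing import Callable


def group_action(x: int, u: int, m: int, g_pow: Callable[[int, int], int]) -> int:
    """Emulate G|x>|u> = |x>|g^x u> with one controlled g^(2^i) per group-register qubit."""
    for i in range(m):
        if (x >> i) & 1:              # qubit i of the group register controls g^(2^i)
            u = g_pow(u, 2 ** i)
    return u


def ring_translate(u: int, k: int, n: int = 8) -> int:
    """Hypothetical generator power: translate an n-site configuration by k sites."""
    k %= n
    return ((u << k) | (u >> (n - k))) & ((1 << n) - 1)


# |G| = 8 translations, so m = 3 group-register qubits suffice.
print(group_action(5, 0b00000001, 3, ring_translate))  # 32
```

The logarithmic cost only materializes if `g_pow(u, 2**i)` is as cheap as a single generator; implementing it as 2^i repeated calls would bring back a cost linear in |G|.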
The scaling of Ĝ is highly dependent on the group being used, but in many reasonable cases the scaling should be $C(\hat{G}) \sim \log(|G|)\,C(\hat{g})$, where we assume the generators can be implemented at cost $C(\hat{g}) \sim C(\hat{g}^n) \sim \mathrm{polylog}(|V|)$ for any power n. That is, the cost of implementing a power of a generator must not scale with that power. To demonstrate the importance of this, consider the single-cycle abelian case. If we implement $\hat{g}^2$ with two copies of ĝ, and so on for the other powers, then
$$C(\hat{G}) \sim \sum_{i=0}^{m-1} 2^i\, C(\hat{g}) = (2^m - 1)\,C(\hat{g}) \sim |G|\,C(\hat{g}).$$
Clearly, this is not efficient, and our oracle scales with the size of the search space. However, if the implementation of the powers of ĝ can be simplified so as to scale on the order of ĝ or less, then we achieve our desired scaling $C(\hat{G}) \sim C(\hat{g})\log|G|$. It is reasonable to believe this is possible in the general case. Suppose we take for granted that the complexity of a quantum circuit corresponding to a periodic operator scales with the size of its period. $\hat{g}^2$ has half the period of ĝ, $\hat{g}^4$ has half the period of $\hat{g}^2$, and so on. So one would expect ĝ to be the most expensive power to implement.
To make this discussion more concrete, consider the example of the group representing addition mod $N = 2^n$ for some n, which we denote $G^N_{add}$, i.e.,
$$G^N_{add}|x\rangle|y\rangle = |x\rangle|x + y\rangle,$$
where mod N is implicit. Implementing this operator using the methods discussed here, one can use a $\hat{g}_{add}$ consisting of a sequence of multi-controlled NOT gates, as shown in Fig. 7 (where we recall that $\hat{g}_{add}|y\rangle = |y + 1\rangle$). It is clear that $\hat{g}^2_{add}$ is obtained by removing all gate action and control lines on the least significant bit, and so on for the other powers. In such a simple case this simplification is easy to see, but a compiler which only moves commuting gates and considers local pattern matching might leave this dramatic reduction unexploited, so a manually optimized implementation might be preferred.
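The simplification of the powers of ĝ_add can be checked on classical bit strings. In the sketch below (names hypothetical), ĝ_add^(2^k) is a cascade of multi-controlled NOTs that flips bit j when bits k, ..., j − 1 are all 1, processed from the most significant bit down so that each gate sees the pre-increment values of its controls; k = 0 gives ĝ_add itself, and larger k simply drops the gates and control lines on the k least significant bits. The gate counts contrast this with the naive construction that repeats ĝ_add 2^k times.

```python
def add_power_of_two(y: int, n: int, k: int) -> tuple[int, int]:
    """Apply the MCX cascade for g_add^(2^k) to the n-bit value y; return (result, gate count)."""
    gates = 0
    for j in reversed(range(k, n)):                 # most significant target first
        if all((y >> c) & 1 for c in range(k, j)):  # controls on bits k .. j-1
            y ^= 1 << j                             # NOT on bit j
        gates += 1
    return y, gates


n = 8  # |G| = N = 2^n, so the group register also has m = n qubits
efficient = sum(add_power_of_two(0, n, k)[1] for k in range(n))  # one cascade per power
naive = sum(2 ** k * n for k in range(n))                         # 2^k copies of g_add
print(efficient, naive)  # 36 2040
```

The efficient count grows as n(n + 1)/2, i.e. polylog in |G|, whereas the naive count grows linearly in |G|.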
It is worth considering the specific case of G for spin Hamiltonians, as this is the most natural use for a quantum solution to this problem. The natural mapping of the problem assigns one qubit to each spin in the physical system, and most geometric symmetry generators, such as those for translation or rotation (as opposed to spin symmetries), can be simulated by a $\hat{g}_{spin}$ consisting of swap gates. For example, translation on a spin chain would use a $\hat{g}_{spin}$ consisting of a cascade of nearest-neighbor swaps. Note that here, and in general, $C(\hat{g}_{spin}) \sim O(|G_{spin}|) = O(\log|V|)$, but this is because the group is already exponentially small in the size of V. So in general we expect $C(\hat{G}_{spin}) \sim |G_{spin}|\log|G_{spin}|$.
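As a small illustration of a swap-based generator, the sketch below (function name hypothetical) applies a cascade of n − 1 nearest-neighbor swaps to a list of spin values and checks that it realizes a translation by one site on a periodic chain, so the gate count of this ĝ_spin grows linearly with the chain length. Representing the state as a classical list is an assumption that suffices for computational-basis inputs.

```python
def translate_by_swaps(spins: list[int]) -> tuple[list[int], int]:
    """Translate a periodic chain by one site using nearest-neighbor swaps.

    Swapping (n-2, n-1), then (n-3, n-2), ..., then (0, 1) moves the spin on
    site i to site i + 1 and the last spin to site 0. Returns (spins, swap count).
    """
    spins = list(spins)
    swaps = 0
    for i in reversed(range(len(spins) - 1)):
        spins[i], spins[i + 1] = spins[i + 1], spins[i]
        swaps += 1
    return spins, swaps


print(translate_by_swaps([1, 0, 0, 1, 1, 0]))  # ([0, 1, 0, 0, 1, 1], 5)
```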
In terms of comparing costs with the classical on-the-fly method, we expect C(G) to be of the order of C(Ĝ), in which case the quantum solution outperforms the classical one. The classical divide & conquer method has a smaller constant coefficient, so the quantum solution outperforms it only for sufficiently large group sizes, while always using exponentially less memory.

E. Oracle Budget and Probability of Success
To complete the algorithm, we need to determine the constants α, β, and γ. As we have the exact solution for the probability of success of a single Grover search [3,10] (lines 9-11 of Algorithm 1), we are able to simulate the classical parts of the algorithm using $G^N_{add}$ as the group in order to determine the behavior of these parameters. Note that, without error, the oracle query complexity is unaffected by the details of the group (aside from its size), so the following results should be general. As we know the solution to the orbit representative problem for this trivial example, we can run the simulation until the correct answer is obtained. This allows us to empirically determine the probability of success as a function of the total number of calls. For a window of probabilities $P_{success} \in [0.2, 0.995]$, we find that the asymptotic form of the probability for large N is given by Eq. (8), where T is the number of oracle calls and a is the rate parameter, which is a function of only β and γ and is determined empirically. By linearizing Eq. (8) with 1/a as the slope, we can calculate the rate parameter, as is done in Fig. 8, as well as demonstrate that this is the correct asymptotic form. One can see that the $R^2$-value of the linear regression asymptotically approaches 1 and the rate parameter approaches a constant for fixed β, γ. This allows us to determine the oracle budget parameter α. For a given application, if we allow for a probable error ε > 0 in the solution, then α is given by Eq. (9). So we want to determine the values of β and γ which minimize a. Figure 9 shows a survey of a as a function of β and γ. From this, we have chosen γ = 1.15 and β = 0.95 as near-optimal values. This value of γ is near previously discussed values; Ref. [10] suggests 6/5. However, β being near one suggests we gain a good deal of information from knowing that the number of marked items has decreased from one call of Gsun to the next. For comparison, if we use ε = 0.5 and a ≈ 2-4, the resulting oracle budget parameter is α ≈ 1.6-3.3, which is a considerable reduction compared to the analytic value α ≈ 5.6 found in Appendix A. For applications which require a high probability of success, e.g., ε = 0.01, we obtain α ∼ 4.3-8.6.

III. FULL SIMULATION FOR A PERFECT QUANTUM MACHINE
To check the behavior of Algorithm 1, we implement a full quantum simulation using the Intel Quantum Simulator (Intel-QS) [9] with $G^{2^n}_{add}$ as our group for n = 4-8. This requires 12-24 qubits, using no additional ancillas to reduce the depth of the quantum circuits. Although $G^{2^n}_{add}$ is not a useful problem instance, it does maximize the group size relative to the number of positions, i.e., log|G| = log|V|, and so it is the most efficient benchmark in terms of qubit count. Furthermore, just as with the purely classical simulation of the last section, knowing the correct answer allows us to avoid choosing an oracle budget and instead run the algorithm until the solution is found, to determine the probability of success as a function of the total number of calls. As our simulation is exact, i.e., we treat the quantum machine as perfect, the details of the group do not affect the results. All quantum subroutines are implemented according to the discussion in Section II D, where multi-controlled gates have been broken down into one- and two-qubit gates using methods from Ref. [13]. This was done to better simulate the algorithm acting on real hardware once noise is added in Section IV C. Figure 10 shows the probability of success as a function of oracle calls, where the inset shows an effective rate parameter. We note that the probability in Eq. (8) is asymptotically correct in the limit of large N, and as such, the curves for these smaller group sizes do not fit this form well. Instead, we define the effective rate parameter $a_{eff}$ via Eq. (10), where we treat $P_{success}$ as a function of the number of oracle calls T, and $\mathrm{diff}\,T$ is the difference between two successive values of T. Despite the poor fit, $a_{eff}$ is still indicative of the trends. We then determine error bars for $a_{eff}$ via Eq. (11), where $\sigma_{eff}$ is the standard deviation of the expression which is averaged in Eq. (10) and M is the number of trials.
From Fig. 10, we find the behavior expected from the classical simulations. We notice, however, that the effective rate parameter is higher for the full simulation than for the classical simulation. Although not optimal, the effective rate parameter still suffices. For example, if we desired a 99% chance of success and chose the rate parameter to be a = 4 (α ≈ 5.7), then the oracle budgets would be 23, 32, 45, 64, and 91, respectively, for the group sizes shown in Fig. 10. From the figure, we see that we would achieve nearly or more than our target 99% chance of success.
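The quoted oracle budgets follow from the budget α√|G| of Algorithm 1 for |G| = 2^4, ..., 2^8. The sketch below reproduces them under two assumptions of ours: the budget is rounded to the nearest integer, and α = 5.66, which rounds to the quoted α ≈ 5.7 (with α = 5.70 exactly, the |G| = 64 entry would round to 46 instead of 45). The function name is hypothetical.

```python
import math


def oracle_budget(alpha: float, group_size: int) -> int:
    """Oracle budget alpha * sqrt(|G|), rounded to the nearest integer."""
    return round(alpha * math.sqrt(group_size))


# Group sizes |G| = 2^n for n = 4..8, as in the Intel-QS runs.
print([oracle_budget(5.66, 2 ** n) for n in range(4, 9)])  # [23, 32, 45, 64, 91]
```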

IV. ERROR MITIGATION STRATEGIES AND THEIR SIMULATION
One of the benefits of Gmin is that the best-known value for the minimum always decreases monotonically with the number of oracle calls. Unlike for Grover search, this holds even for a faulty implementation on an imperfect quantum machine. Furthermore, the vagaries of a faulty implementation are partially compensated for by the classical random sampling of the number of oracle calls in any single coherent Grover search. Put differently, although we allot a set oracle budget, not all of these calls are implemented in a single coherent step. This suggests Grover minimization is a reasonable use for near-term, noisy hardware. Still, noise has its costs. In this section, we describe some strategies for mitigating the cost of errors. We then simulate some of these methods to determine their effectiveness.

A. Strategies
We start by describing two error mitigation strategies. As mentioned, the approach to a solution is monotonic regardless of the error rates. Thus the most obvious method is to simply increase the oracle budget, leaving all else the same, a method we refer to as static error mitigation (SEM). The obvious downside of this method is that the increase in the oracle budget would reasonably need to scale with the size of the system (assuming roughly independent error rates for each qubit), in which case we may lose our quantum advantage. This is supported by analytic results on Grover search with a faulty oracle in Refs. [14,15], where for certain toy error models the polynomial quantum speed-up is either partially or entirely lost.
The other strategy takes advantage of the additional qubits which do not hold the search space. The two position registers are included only as a means of marking elements of the search space and implementing the oracle. As such, they should hold the same computational basis values at the beginning and end of a single call to Grov. This allows us to measure these registers without disturbing the coherence of the group register, which is responsible for the quantum speed-up. Moreover, any terms in the full state of the system (as expanded in the computational basis) whose position registers hold values differing from v and $v_{best}$ are in error, and measuring the correct values projects the system back onto an un-errored, or at least less-errored, state. Thus we suggest the following: at the end of any call to Grov, measure the two position registers. If their measured values differ from the classically stored values v and $v_{best}$, we abort the remaining Grover steps on line 10 of Algorithm 1 and go back to step 8; the errored oracle calls do not count against our oracle budget. It is important to note that we do not randomly sample p again, as this would introduce a bias toward smaller values of p, which are less likely to experience an error. We call this strategy active error mitigation (AEM), because the total number of oracle calls, both errored and un-errored, is not fixed but depends on the error rate.
The downside of this method is that all the oracle calls up to the point an error is found still cost time which is now wasted due to the error state. To mitigate this waste, before restarting the Grover search, we measure the group register and continue to check to see if a better value is found. To do so is practically free (up to one additional effective oracle call to perform the check) and it can only increase our chances of finding the minimum, even if by a minuscule amount. Moreover, there is reason to believe this increase is significant and simulations bear this out. All together, the AEM version of the Gmin algorithm is presented in Algorithm 2.
Note that we have added a hard stop on the total number of oracle calls, set by δ, to avoid an unbounded run-time. Ideally, AEM "protects" the probability of success for a fixed oracle budget over a large range of error rates. That is, P_success as a function of un-errored oracle calls (i.e. as a function of the count c1 in Algorithm 2) takes the form of Eq. (8) with a rate parameter that depends only weakly on the error rates. Again, the downside is the non-deterministic run-time, which can bloat if the error rate is too high.
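The interplay between the oracle budget and the hard stop can be illustrated with a simplified accounting loop. This is a sketch under stated assumptions, not Algorithm 2 itself: it ignores the restart structure of Grover steps, treats every call independently, and assumes that each call is flagged as errored with a fixed probability q. The name `aem_counters` is hypothetical; c1, c2, α and δ play the roles they have in the text.

```python
import math
import random


def aem_counters(group_size, q, alpha, delta, seed=0):
    """Toy accounting of the counters c1 and c2 (assumed semantics).

    c1: calls after which the position-register check passed (charged to budget)
    c2: all calls, errored or not (only used for the hard stop)
    q : probability that a single call is flagged as errored (assumption)
    """
    rng = random.Random(seed)
    budget = alpha * math.sqrt(group_size)      # alpha * sqrt(|G|) un-errored calls
    hard_stop = delta * math.sqrt(group_size)   # delta * sqrt(|G|) calls in total
    c1 = c2 = 0
    while c1 < budget and c2 < hard_stop:
        c2 += 1                 # every call to Grov costs time
        if rng.random() >= q:   # position registers matched v and v_best
            c1 += 1             # only un-errored calls are charged to the budget
    return c1, c2


for q in (0.0, 0.5, 0.9):
    print(q, aem_counters(1024, q, alpha=5, delta=20))
```

For moderate q the budget on c1 is still exhausted, at the cost of a larger c2 (the run-time bloat). Only for large q does the hard stop on c2 end the search before the budget is spent.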

B. Heuristic Argument for Necessary Scaling of Qubit Lifetime with AEM
In this section, we give a heuristic argument for the necessary scaling of the coherence time such that we maintain the quantum speed-up.
In order for AEM Gmin to produce a quantum speed-up, there must be a significant chance that a run of p ∼ √|G| oracle calls experiences no detected error, i.e.
Prob(Grov^p succeeds) > s,   (12)

for some constant s ∈ (0, 1). The key for AEM is that after a measurement in which no error is detected, both position registers are projected onto their correct values. Thus on the next call, the probability of finding an error in those registers after the call is complete should be roughly the probability of an error occurring during that single call. For a single call to Grov, we expect the probability of success to decay exponentially in the depth d of Grov, measured in units of some hardware-dependent coherence time t (which might be the average of the T1 and T2 parameters discussed below); that is, for a single call, Prob(Grov succeeds without AEM) ≈ e^(−d/t). Suppose AEM does not detect a fraction f of errors. Then only the undetected errors reduce Prob(Grov^p succeeds), which allows us to give a general scaling for the coherence time needed to maintain the value of s. Asymptotically, for large p ∼ √|G|, the required coherence time is smaller than without AEM by a constant factor. Thus AEM leads to a constant-factor reduction in the required qubit lifetime.

Algorithm 2 AEM Grover Minimization
1: Allocate QRegister |ψ_G⟩ of size m
2: Allocate QRegister |ψ_1⟩ of size n
3: Allocate QRegister |ψ_2⟩ of size n
4: v_best ← v, x_best ← 0, good ← true
5: c1 ← 0, c2 ← 0, t ← 1
6: while c1 < α√|G| AND c2 < δ√|G| do
7:     if good then
8:         p ← rand(0, t − 1)
       …
       (Grov) |ψ_G⟩|ψ_1⟩|ψ_2⟩
       …
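One way to make the heuristic of this section quantitative is sketched below. The assumptions are ours, not the paper's: each call to Grov has depth d, errors in different calls are independent once AEM has projected the position registers, and a fraction f of errors goes undetected. We write τ for the coherence time t of the text to avoid a clash with the counter t of Algorithm 2. Conditioned on a call passing the AEM check, the chance that it still carries an undetected error is fε/(1 − (1 − f)ε) ≈ fε to first order in ε, which gives the second line.

```latex
\begin{align}
  \varepsilon &= 1 - e^{-d/\tau}
    && \text{(error probability of a single call)}\\
  \Pr(\mathrm{Grov}^p \text{ succeeds without AEM}) &\approx e^{-p d/\tau} > s
    \;\Longrightarrow\; \tau > \frac{p\,d}{\ln(1/s)},\\
  \Pr(\mathrm{Grov}^p \text{ succeeds with AEM}) &\approx (1 - f\varepsilon)^{p}
    \approx e^{-f p d/\tau} > s
    \;\Longrightarrow\; \tau > \frac{f\,p\,d}{\ln(1/s)} .
\end{align}
```

With p ∼ √|G| and d growing only polylogarithmically in |G|, both bounds scale as √|G| up to polylog factors. Under these assumptions AEM changes only the prefactor, by the undetected fraction f, consistent with the constant-factor reduction stated above.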

C. Simulation of Error Mitigation Strategies
To simulate noisy hardware, we used the error model included in the Intel-QS package, which is based on the Pauli-twirling approximation [16]. In this model, before a gate is applied, a random single-qubit rotation is applied to each qubit acted on by that gate. The error unitary is a rotation generated by the single-qubit Pauli operators X, Y and Z, with rotation parameters v_x, v_y and v_z drawn at random from Gaussian distributions whose variances grow with the time since the last gate action, in units of the hardware-dependent parameters T1, Tφ and T2, respectively. As X, Y and Z are not independent of one another (for example, Y = iXZ), these time parameters are related to one another as well. Because T1 is associated with the X Pauli operator, which flips the computational state, we can think of T1 as setting the "bit-flip" error rate. Likewise, T2 is associated with the Z Pauli operator, which applies a π phase, so we can think of it as setting the "phase-flip" error rate. To accommodate this nondeterministic, measurement-based algorithm, some modifications had to be made to the Intel-QS; see Appendix C for details. Simulations for both SEM and AEM are shown in Fig. 11 for log|G| = 4, 5, where we have fixed either T1 or T2 to a large, effectively infinite constant and varied the other. This allows us to isolate the effect of each kind of error. T1 and T2 are measured in units of the single-qubit gate time (SQGT); see Appendix C for details.
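To make the error model concrete, the sketch below samples one such random rotation. It is a hedged illustration, not the Intel-QS code: the choice of zero-mean Gaussians with variance (idle time)/T for the three axes, and the convention U = exp(−i(v_x X + v_y Y + v_z Z)), are our assumptions. It uses the identity exp(−i v·σ) = cos θ I − i (sin θ/θ) v·σ with θ = |v|. The function name `error_unitary` is hypothetical; passing an infinite T switches that channel off, mirroring how T1 or T2 is fixed to an effectively infinite value in Fig. 11.

```python
import math
import random


def error_unitary(idle_time, t1, t_phi, t2, rng):
    """Sample one random single-qubit error rotation (toy Pauli-twirl-style model).

    Assumption (ours, not Intel-QS's exact parameterization): v_x, v_y, v_z are
    Gaussian with zero mean and variance idle_time / T1, idle_time / Tphi and
    idle_time / T2. All times are in SQGT units; pass float('inf') to switch a
    channel off. Returns U = exp(-i (v_x X + v_y Y + v_z Z)) as a 2x2 nested list.
    """
    vx = rng.gauss(0.0, math.sqrt(idle_time / t1))
    vy = rng.gauss(0.0, math.sqrt(idle_time / t_phi))
    vz = rng.gauss(0.0, math.sqrt(idle_time / t2))
    theta = math.sqrt(vx * vx + vy * vy + vz * vz)
    if theta == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    # (v.sigma)^2 = theta^2 I, so exp(-i v.sigma) = cos(theta) I - i sin(theta)/theta v.sigma
    c, s = math.cos(theta), math.sin(theta) / theta
    return [[c - 1j * s * vz, -1j * s * (vx - 1j * vy)],
            [-1j * s * (vx + 1j * vy), c + 1j * s * vz]]


rng = random.Random(42)
# bit-flip channel only: T2 and Tphi effectively infinite, 3 SQGT since the last gate
print(error_unitary(3.0, 700.0, float("inf"), float("inf"), rng))
```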
In terms of bit-flip error, we can see that AEM does protect the rate parameter over the values of T1 shown in Figs. 11a and 11b, as evidenced by the flatness of the curves for AEM with T2 = ∞. However, AEM only partially protects the rate parameter against phase-flip error. This should not be surprising, as phase error persists even after the projection due to measurement at the end of a call to Grov. That is, phase error tends to accumulate in the superposition of the group register and is not corrected by the AEM strategy. Still, the SEM results show that the algorithm is altogether less susceptible to phase-flip error.
The protection of the rate parameter by AEM is important, as it means our choice of the oracle budget parameter is less dependent on knowing the error rate. However, the rate parameter is no longer directly proportional to the run-time of the algorithm, since errored calls to Grov are not counted against the oracle budget. Thus we have to evaluate whether the total run-time, which includes not only the errored calls but also the time to perform the measurements, is better or worse under AEM. Figs. 11c and 11d plot the average run-time as a function of either T1 or T2 for a fixed, large value of the other parameter. By average run-time, we mean the total run-time (to find the correct answer) of the quantum computation cycles of the algorithm, including all measurements and gates, averaged over all trials and expressed in units of the SQGT. This does not include the time to perform the classical computation cycles of the algorithm. From this figure, we see that AEM does not bloat the run-time for bit-flip error and, as desired, significantly decreases the run-time for small T1. It adds only a modest, roughly constant increase for phase-flip error. Note that for higher T1 and T2 there is a crossover beyond which SEM has a smaller average run-time. This is due to the additional time needed to perform the measurements, which adds only a constant time to each call to Grov. From this analysis, we see that AEM is preferred over SEM, as it both protects the rate parameter and decreases the run-time, except when coherence times are sufficiently long, in which case its extra cost is only a constant per call to Grov.

D. Reducing Phase-flip Error
AEM is effective against bit-flip error, but less so against phase-flip error. Even though the algorithm is less susceptible to this kind of error, it is worth considering a method for reducing it. This can be achieved using simple fault-tolerant methods: since we only need to correct a single error channel, we can use simple, essentially classical error-correcting codes such as a repetition code [17]. To reduce the qubit overhead, it should be sufficient to encode only the group register. With enough physical qubits to form robust logical qubits, we could achieve an effective T2 ∼ ∞, in which case AEM should fully protect the rate parameter.
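As a rough illustration of why a repetition code suffices for a single error channel, the sketch below computes the logical failure rate of the three-qubit code under majority-vote decoding, both exactly and by Monte Carlo. For phase flips the code words are |+++⟩ and |−−−⟩, and in the Hadamard basis Z errors act as bit flips, so the classical calculation carries over. The independent per-qubit flip probability p is an assumption, and the function names are hypothetical.

```python
import random


def logical_flip_rate_exact(p):
    """Failure probability of a 3-qubit repetition code with majority vote.

    Decoding fails when 2 or 3 of the 3 qubits flip: 3p^2(1-p) + p^3 = 3p^2 - 2p^3.
    """
    return 3 * p**2 - 2 * p**3


def logical_flip_rate_mc(p, trials=200_000, seed=7):
    """Monte Carlo estimate of the same quantity, assuming independent flips."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        flips = sum(rng.random() < p for _ in range(3))
        fails += flips >= 2          # majority vote decodes to the wrong value
    return fails / trials


print(logical_flip_rate_exact(0.1))  # about 0.028, well below the physical 0.1
print(logical_flip_rate_mc(0.1))
```

Below p = 1/2 the encoded rate is always lower than the physical one, and adding more copies suppresses it further, which is the sense in which enough physical qubits push the effective T2 toward infinity.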

E. Simulation for Realistic Hardware
AEM Gmin requires interaction between quantum and classical instructions, but unlike similar hybrid computations such as decoding an error-correcting code or the variational quantum eigensolver (VQE), its classical computation cycles are simple and should not take a significant amount of time between coherent quantum steps. Thus AEM Gmin could serve as a good test of real-time hybrid quantum-classical computation. For this reason, we simulate AEM Gmin with realistic T1 and T2 times using addition groups with n = log|G| = 4, 5 and 6. To increase the chances of a successful run, we use the maximum number of ancilla qubits to reduce the depth of the circuit. The total number of qubits used is therefore 3n + (n − 2) = 4n − 2, i.e. 14, 18 and 22 for our three cases. Methods for using the ancilla to reduce the depth are given in Appendix B. We used T1 = T2 = 700 SQGTs, extracted from Ref. [18] for superconducting qubits. Fig. 12 plots the rate parameter and average run-time for AEM Gmin, together with SEM Gmin and noiseless Gmin for comparison. For these realistic hardware parameters, we see that the rate parameter is well protected by AEM and that the increase in run-time over noiseless conditions is still within reason, whereas the run-time for SEM is beyond reasonable. Observing the simulation in real time, we find that for n = 6 the probability of failure for a single oracle call is high, implying that a test of any larger group would require longer T1 and T2 times, as argued in Section IV B.
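The qubit count quoted for these runs follows from three n-qubit registers (the group register and the two position registers) plus the maximum of n − 2 ancilla. A one-line check, with a hypothetical function name and the register sizes taken as counted in the text:

```python
def aem_qubit_count(n):
    """Qubits for AEM Gmin on the addition group with n = log|G>, as counted in the text:
    three n-qubit registers plus the maximum of n - 2 ancilla."""
    if n < 2:
        raise ValueError("need n >= 2 for n - 2 ancilla")
    return 3 * n + (n - 2)


print([aem_qubit_count(n) for n in (4, 5, 6)])  # [14, 18, 22]
```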

V. CONCLUSIONS
In this work, we have identified a new application for the Grover minimization algorithm, and provided a full quantum solution for the problem. Since Grover's search often comes with the caveat of not having an efficiently implementable oracle, our work is notable for finding a practical use for Grover's algorithm as the oracle is expected to scale polylogarithmically with the size of the group. We have discussed both the structure of the algorithm and refinements to the original version, as well as a full gate decomposition for the simplest group given by modular addition. We discussed how we can leverage the intermediate measurement steps to mitigate the effects of error, increasing the likelihood of the algorithm being useful in the NISQ era.
In addition to being a sub-routine in classical exact diagonalization, our algorithm could also be called by a larger quantum algorithm which is performing a simulation of a many-body quantum system using symmetry-adapted basis states.
The algorithm discussed is far more general than what has been presented here. We achieve a reasonably sized oracle by leveraging the structure of the group, whereas the unstructured nature of the search is encapsulated in the arbitrary labeling of positions/basis states. Similarly, we can envisage using Gmin to find/prepare the ground state of some Hamiltonian. In such a case, one leverages the structure of Hamiltonian dynamics by replacing the group action operator with phase estimation. We hope to explore this more in future work.
The error mitigation scheme we have designed is also likely to be generally applicable to oracles using ancilla qubits, and thus could be used in a much wider context to improve the accuracy of quantum oracles.
Let N be the size of the database. We follow the exact method used in Ref. [5] but use a tighter bound from Ref. [10] for the average number of oracle calls for Gsun to find the solution to a search among k marked elements. In particular, we use the exact expression for the number of calls to reach the critical stage of the algorithm, with which we achieve a tighter bound for Gsun than the (9/2)√(N/k) used in Ref. [5]. Taking Lemma 1 from Ref. [5] for granted, we follow the procedure for Lemma 2 using this tighter bound to obtain a correspondingly tighter upper bound on the average number of oracle calls needed to reach the minimum.

To reduce the cost of PhComp, we avoid recalculating the AND of (continue) bits, i.e. we remove the multi-controlled Z gates. This is done by storing the AND of two (continue) bits in an ancilla initialized in the zero computational state, which we then pass down the circuit as shown in Fig. 13. The most significant and least significant bits do not benefit from having an ancilla, so we can use any number of ancilla up to log|V| − 2. With the maximum number, the cost of PhComp scales as C(PhComp) ∼ O(log|V|). A similar method can be used to reduce the cost of Ĝ^N_add, as shown in Fig. 14. With the maximum number of ancilla, which is again log|V| − 2, this reduces the cost to C(Ĝ^N_add) ∼ O(log|G|). The resulting adder is on par with the ripple-carry adder from Ref. [12], but uses far more qubits. We include our version here to demonstrate that ancilla can be useful for reducing the cost of the group action operator. Furthermore, the ancilla can be shared between PhComp and the group action operator and measured along with the position registers in the AEM scheme. This was done for the data in Fig. 12.
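A classical analogue may help show why caching running ANDs of (continue) bits needs exactly log|V| − 2 ancilla. The sketch below is hypothetical and is not the circuit of Fig. 13: it compares two bit strings, most significant bit first. The "continue" bit at a position is 1 when the two bits agree, and each cached prefix AND plays the role of an ancilla, so every position needs only one two-input AND rather than a multi-controlled gate. We assume the comparison marks values strictly smaller than the reference.

```python
def less_than_with_ancilla_chain(a_bits, b_bits):
    """Classical analogue of a comparator that caches running ANDs (hypothetical sketch).

    Bits are given most significant first. The "continue" bit at position i is 1
    when a_i == b_i. Each cached prefix AND stands in for one ancilla.
    Returns (int(a < b), number_of_ancilla_used).
    """
    n = len(a_bits)
    cont = [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]  # equality ("continue") bits
    ancilla = []
    prefix = 1            # AND of the continue bits of all more significant positions
    result = 0
    for i in range(n):
        # a < b is decided at the first position where the bits differ
        result |= prefix & (1 - a_bits[i]) & b_bits[i]
        if i < n - 1:     # the least significant bit never feeds a later position
            if i == 0:
                prefix = cont[0]          # most significant bit: no ancilla needed
            else:
                prefix &= cont[i]
                ancilla.append(prefix)    # cache the running AND for the next position
    return result, len(ancilla)


def to_bits(x, n):
    """Most-significant-first bit list of x using n bits."""
    return [(x >> (n - 1 - k)) & 1 for k in range(n)]


print(less_than_with_ancilla_chain(to_bits(5, 4), to_bits(9, 4)))  # (1, 2)
```

For 4-bit values the chain uses 4 − 2 = 2 cached ANDs, matching the count of log|V| − 2 ancilla given above.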

Appendix C: Details of the Noisy Simulation
In this appendix, we discuss some of the details of the noisy simulation. Relative gate times are extracted from Ref. [18], which uses data for superconducting qubits. All single-qubit gate times (SQGT) are assumed to be equal, and all other simulation times are measured in units of this time. All two-qubit gates are assumed to take twice the SQGT, and all gates are decomposed into one- and two-qubit gates. The Intel-QS does not have a feature to simulate measurements, so our source code has been altered to include measurement simulation capabilities. A Mersenne Twister random number generator is added specifically to simulate the probabilistic nature of quantum measurement. Furthermore, a measurement time of 10 SQGTs is added to simulate the accumulation of error that would occur in a real system while a measurement is being performed. We do not consider the possibility of error in the measured value as compared to the resulting quantum state, though this is an important source of error to consider in a real system. Finally, the way the Intel-QS accounts for the time between gate actions has been altered to include parallelization. A sequence of gates with disjoint support on the qubits is assumed to be applied in parallel, in which case time is only incremented by the largest gate time in that sequence. No error is accumulated during classical computation cycles, though this too is an important source of error for real systems. All these considerations are used to calculate the total run-time for a single trial of Gmin. We also note that we currently do not use a compiler to reduce the number of gates; therefore the error rates for all simulations are higher than they would be with such optimizing software.

[Figure caption (fragment): … circuit which uses two additional ancilla to reduce the number of gates needed for implementation. We show the inverse, as the method for using the ancilla is clearer. G^{2^n}_add is then given by reversing the order of these gates.]
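The parallel time accounting described in this appendix can be sketched as follows. This is a hypothetical illustration, not the modified Intel-QS code. We assume operations arrive in program order as (kind, qubits) pairs, and that consecutive operations with disjoint qubit support form one parallel layer whose duration is its longest operation. Durations follow the appendix: 1 SQGT for one-qubit gates, 2 SQGT for two-qubit gates and 10 SQGT for a measurement.

```python
MEASURE_TIME = 10            # SQGT units, as in this appendix
GATE_TIME = {1: 1, 2: 2}     # one-qubit and two-qubit gate times in SQGT units


def circuit_time(ops):
    """Total run-time in SQGT units of a list of (kind, qubits) operations.

    Hypothetical sketch: consecutive operations with disjoint qubit support are
    grouped into one parallel layer; the layer costs its longest operation.
    kind is "gate" (one- or two-qubit, after decomposition) or "measure".
    """
    total = 0
    layer_qubits = set()
    layer_time = 0
    for kind, qubits in ops:
        duration = MEASURE_TIME if kind == "measure" else GATE_TIME[len(qubits)]
        if layer_qubits & set(qubits):      # overlap: the current layer must finish first
            total += layer_time
            layer_qubits, layer_time = set(), 0
        layer_qubits |= set(qubits)
        layer_time = max(layer_time, duration)
    return total + layer_time


example = [("gate", (0,)), ("gate", (1,)), ("gate", (0, 1)), ("gate", (2,))]
print(circuit_time(example))  # 3: a 1-SQGT layer followed by a 2-SQGT layer
```

A greedy scheduler that moves each gate as early as its qubits allow would sometimes give shorter times; the consecutive grouping above is simply the most literal reading of the rule stated in the text.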