Hamiltonian simulation algorithms for near-term quantum hardware

The quantum circuit model is the de-facto way of designing quantum algorithms. Yet any level of abstraction away from the underlying hardware incurs overhead. In this work, we develop quantum algorithms for Hamiltonian simulation "one level below” the circuit model, exploiting the underlying control over qubit interactions available in most quantum hardware and deriving analytic circuit identities for synthesising multi-qubit evolutions from two-qubit interactions. We then analyse the impact of these techniques under the standard error model where errors occur per gate, and an error model with a constant error rate per unit time. To quantify the benefits of this approach, we apply it to time-dynamics simulation of the 2D spin Fermi-Hubbard model. Combined with new error bounds for Trotter product formulas tailored to the non-asymptotic regime and an analysis of error propagation, we find that e.g. for a 5 × 5 Fermi-Hubbard lattice we reduce the circuit depth from 1, 243, 586 using the best previous fermion encoding and error bounds in the literature, to 3, 209 in the per-gate error model, or the circuit-depth-equivalent to 259 in the per-time error model. This brings Hamiltonian simulation, previously beyond reach of current hardware for non-trivial examples, significantly closer to being feasible in the NISQ era.


Supplementary Methods The Sub-Circuit Model and Error Models
In this section we introduce the sub-circuit model, which we employ throughout this paper. We analyse it under two different error models; these respective models are applicable to NISQ devices with differing capabilities. Before defining these we introduce the mathematical definition of the sub-circuit model.
Definition 1 (Sub-circuit Model). Given a set of qubits Q, a set I ⊆ Q × Q specifying which pairs of qubits may interact, a fixed two qubit interaction Hamiltonian h, and a minimum switching time t min , a sub-circuit pulse-sequence C is a quantum circuit of L pairs of alternating layer types C = L l U l V l with U l = i∈Q u l i being a layer of arbitrary single qubit unitary gates, and V l = ij∈Γ l v ij t l ij being a layer of non-overlapping, variable time, two-qubit unitary gates: v ij (t) = e ith ij (1) with the set Γ l ⊆ I containing no overlapping pairs of qubits, and t ≥ t min . Throughout this paper we assume h ij = Z i Z j . As all σ i σ j are equivalent to Z i Z j up to single qubit rotations this can be left implicit and so we take h ij = σ i σ j .
The traditional quantum circuit model measures its run-time in layer count. This also applies in the sub-circuit-model.
or simply the circuit depth.
However, unlike the traditional quantum circuit model, the sub-circuit-model also allows for a different run-time metric for any given circuit C. Depending on the details of the underlying hardware, it can be appropriate to measure run-time as the total physical duration of the two-qubit interaction layers. This is justified for many implementations: for example superconducting qubits have interaction time scales of ∼ 50 − 500ns [1], while the single qubit energy spacing is on the order of ∼ 5Ghz, which gives a time scale for single qubit gates of ∼ 0.2ns. The run-time is normalised to the physical interaction strength, so that |h| = 1.

4
For both run-time and circuit depth we assume single qubit layers contribute a negligible amount to the total time duration of the circuit and we can cost standard gates according to both metrics as long as they are written in terms of a sub-circuit pulse-sequence. For example, according to Supplementary Definition 3 a CNOT gate has T cost = π/4 as it is equivalent to e −i π 4 ZZ up to single qubit rotations. How does this second cost model affect the time complexity of algorithms? I.e., given a circuit C, does T cost (C) ever deviate so significantly from C's gate depth count that the circuit would have to be placed in a complexity class lower? Under reasonable assumptions on the shortest pulse time we prove in the following that this is not the case. Proof. Clear since m(x) = O(T cost (C)/δ 0 (x)). The second claim is trivial.
An immediate consequence of using the cost model metric and the overhead of counting gates from Supplementary Remark 4 can be summarised as follows. Indeed, we can take this further and show that complexity classes like BQP are invariant under an exchange of the two metrics "circuit depth" and "T cost "; if e.g. δ 0 (x) = 1/ poly x, then again invoking Solovay-Kitaev lets one upper-bound and approximate any circuit while only introducing an at most poly-logarithmic overhead in circuit depth. However, a stronger result than this is already known, independent of any lower bound on pulse times, which we cite here for completeness.
Remark 6 (Poulin et. al. [2]). A computational model based on simulating a local Hamiltonian with arbitrarily-quickly varying local terms is equivalent to the standard circuit model.

Sub-Circuit Synthesis of Multi-Qubit Interactions Analytic Pulse Sequence Identities
In this section we introduce the analytic pulse sequence identities we use to decompose local Trotter steps e −iδh . Their recursive application allows us to establish, that for a k-qubit Pauli interaction h, there exists sub-circuit pulse-sequence C := L l U l V l which implements the evolution operator e −iδh . Most importantly, for any target time δ ≥ 0 the run-time of that circuit is bounded as according to the notion of run-time established in Supplementary Definition 3. For k = 2 n + 1 where n ∈ Z as noted by [3], this can be done inexactly using a well know identity from Lie algebra. For Hermitian operators A and B we have We make this exact for all t ∈ [0, 2π] for anti-commuting Pauli interactions in Supplementary Lemma 7, and use Supplementary Lemma 8 to extend it to all k ∈ Z.
Proof. Follows similarly to Supplementary Lemma 27.
with pulse times t 1 , t 2 , ϕ given by where c ∈ Z, and corresponding signs are taken in the two expressions.
Proof. Follows similarly to Supplementary Lemma 27.

Pulse-Time Bounds on Analytic Decompositions
In our later analysis we apply these methods to the interactions in the Fermi-Hubbard Hamiltonian. Depending on the fermionic encoding used, these interaction terms are at most 3-local or 4-local. Supplementary We have labelled the functions t i (t) from Supplementary Lemma 7 as t a i (t) in order to distinguish them from those given in Supplementary Lemma 8, which are now labelled t b i (t). This is to avoid confusion when using both identities in the one circuit, such as in circuit C b where we use Supplementary Lemma 7 to decompose the remaining 3-local gates. The exact run-time of the circuit C b -defined in Supplementary Figure 3b -is left in terms of T cost (C a ) and again follows directly from Supplementary Definition 3 as Supplementary Lemmas 9 and 10 bound these two functions and determine the optimal choice of the free pulse-time ϕ. Inserting these bounds into the above T cost expressions gives The pulse times t a i (t) are defined in Supplementary Lemma 7 and the run-time is bounded as Supplementary Figure 3: The definitions of circuits C a (t) and C b (t) -which respectively generate evolution under a three and four local Pauli interaction for target time t ≥ 0.
As Z ⊗k is equivalent to any k-local Pauli term up to single qubit rotations, these bounds hold for any three or four local Pauli interactions.
Lemma 9. Let H be as in Supplementary Lemma 7. For 0 ≤ t ≤ π/2, the pulse times t i in Supplementary Lemma 7 can be bounded by Proof. Choosing the negative t 1 (t) and corresponding t 2 (t) solution from Supplementary Lemma 7 and Taylor expanding about t = 0 gives Basic calculus shows that t 1 is always negative and t 2 is always positive for 0 ≤ t ≤ π/2, thus 8 Then it can be shown that the Taylor remainders R 1 and R 12 are positive and negative, respectively, giving the stated bounds.
Lemma 10. Let H be as in Supplementary Lemma 8. For 0 ≤ t ≤ t c and ϕ = (ct) 1/3 , the pulse times t i in Supplementary Lemma 8 can be bounded by where c = 1 4 (3 + 2 √ 2) and t c ∼ 0.33.
Proof. This follows similarly to Supplementary Lemma 9. We choose the positive branch of the ± solutions for pulse times with t 1 (t) and t 2 (t) given in Supplementary Lemma 8, and freely set ϕ = (ct) 1/3 for some positive constant c ∈ R. Within the range 0 ≤ t ≤ t c we have real pulse times t 1 ≤ 0 and t 2 ≥ 0. We can then Taylor expand the following about t = 0 to find Choosing c to minimise the first term in this expansion, and again showing that R ≤ 0, leads to the stated result where ϕ = (ct) 1/3 and c = 1 4 (3 + 2 √ 2). This is valid only with the region 0 ≤ t ≤ t c where t c ≈ 0.33.
Theorem 11. For a set of qubits Q, a set I ⊆ Q × Q specifying which pairs of qubits may interact, and a fixed two qubit interaction Hamiltonian h ij , if H is a k-body Pauli Hamiltonian then the following holds: For all t there exists a quantum circuit of L pairs of alternating layer types C = L l U l V l with U l = i∈Q u l i being a layer of arbitrary single qubit unitary gates, and V l = ij∈Γ l v ij t l ij being a layer of non-overlapping, variable time, two-qubit unitary gates v ij (t) = e ith ij with the set Γ l ⊆ I containing no overlapping pairs of qubits such that C = e itH and where 9 Proof. The proof of this claim follows from first noting that for any t < 0 one can conjugate e −itH by a single Pauli operator which anti-commutes with H in order to obtain e itH . Therefore we can consider w.l.o.g. we can take t > 0 as we have done up until now. The sub-circuit C which implements e itH is constructed recursively using the Depth 5 decomposition. We note that the Depth 5 decomposition has an important feature. The free choice of ϕ allows us to avoid incurring a fixed root overhead with every iterative application of this decomposition. That is when using it to decompose any e itZ ⊗k , we can always choose h 1 as a 2-local interaction and h 2 as a (k −1)-local interaction. We can choose ϕ ∝ t 1 k−1 and a similar analysis as in Supplementary Lemma 10 will show that this leaves the remaining pulse-times as t i ∝ t 1− 1 k−1 . This can be iterated to decompose the remaining gates, all of the form of evolution under (k − 1)-local interactions for times ∝ t 1− 1 k−1 . At each iteration we choose to h 1 as a 2-local interaction and ϕ ∝ t

Optimality
An obvious question to ask at this point is whether the proposed decompositions are optimal, in the sense that they minimise the total run-time T cost while reproducing the target gate h exactly. A closely related question is then whether relaxing the condition that we want to simulate the target gate without any error allows us to reduce the scaling of T cost with regards to the target time δ.
In this section we perform a series of numerical studies which indicate that the exact decompositions described in this section are indeed optimal within some parameter bounds, and that relaxing the goal to approximate implementations gives no benefit.
The setup is precisely as outlined in before: for U target = exp(iT Z ⊗k ) for some locality k > 1 and time T > 0, we iterate over all possible gate sequences of width k and length n, the set of which we call U n,k . For each sequence U ∈ U n,k , we perform a grid search over all parameter tuples (t 1 , . . . , t n ) ∈ [−π/2, π/2] n and δ ∈ [0, π/10], and calculate the parameter tuple (ϵ(U ), T cost ), where T cost is given in Supplementary Definition 3, and The results are binned into brackets over (δ, T cost ) ∈ [π/10, nπ/2] and their minimum within each bracket is taken. This procedure yields two outcomes: 1. For each target time δ and each target error ϵ > 0, it yields the smallest T cost , depth n circuit with error less than ϵ, and 2. for each target time δ and each T cost , the smallest error possible with any depth n gate decomposition and total pulse time less than T cost .
This algorithm scales exponentially both in k and n, and polynomial in the number of grid search subdivisions. The following optimisations were performed. 1. We remove duplicate gate sequences under permutations of the qubits (since U target is permutation symmetric).
2. We restrict ourselves to two-local Pauli gates, since any one-local gate can always be absorbed by conjugations, and 3. We remove mirror-symmetric sequences (since Paulis are self-adjoint).
4. For n > 4 we switch to performing a random sampling algorithm instead of grid search, since the number of grid points becomes too large.
Results for k = 3 and n = 3, 4, 5 are plotted in Supplementary Figures 4 and 5. As can be seen (plotted as red line), for n = 3 the optimal zero-error decomposition has T cost = π + δ from CNOT conjugation. For n = 4, the optimal decomposition is given by the implicitlydefined solution in Supplementary Lemma 7, with a T cost ∝ √ δ dependence. For the depth 5 sequences, it appears that the same optimality as for depth 4 holds. In contrast to n = 3 and n = 4, there is now a zero error solution for all T cost greater than the optimum threshold.

Suzuki-Trotter Formulae Error Bounds Existing Trotter Bounds
Trotter error bounds have seen a spate of dramatic and very exciting improvements in the past few years [4,5,6]. However, among these recent improvements we could not find a bound that was exactly suited to our purpose. We wanted bounds which took into account the commutation relations between interactions in the Hamiltonian, as we know this leads to tighter error bounds [5] [4]. However, we needed exact constants in the error bound when applied to 2D lattice Hamiltonians, such as the 2D Fermi-Hubbard model. For this reason we could not directly apply the results of [5] which only explicitly obtains constants for 1D lattice Hamiltonians.
Additionally, we needed to be able to straightforwardly compute the bound for any higher order Trotter formula. This ruled out using the commutator bounds of [4] as they become difficult to compute at higher orders. Furthermore these bounds require each Trotter layer to consist of a single interaction, meaning we wouldn't be able to exploit the result of Supplementary Theorem 23.
We followed the notation and adapted the methods of [5] to derive bounds that meet the above criteria. Additionally we incorporate our own novel methods to tighten our bounds in Supplementary Corollaries 16 and 22 and Supplementary Theorem 23.
The authors of [5] have recently extended their work further in [6]. We have not yet seen whether they will further tighten our analysis, though we are keen to do this in future work.

Hamiltonian Simulation by Trotterisation
In this section we derive our bounds for Trotter error. The standard approach to implementing time-evolution under a local Hamiltonian H = i h i on a quantum computer is to "Trotterise" the time evolution operator U (T ) = e −iHT . Assuming that the Hamiltonian breaks up into M mutually non-commuting layers and then implementing the approximation P 1 (δ) T /δ as a quantum circuit. Here R 1 (T, δ) denotes the error term remaining from the approximate decomposition into a product of individual terms. R 1 (T, δ) := U (T ) − P 1 (δ) T /δ is simply defined as the difference to the exact evolution U (T ). For M mutually non-commuting layers of interactions H i , we must perform M sequential layers per Trotter step. Supplementary Equation (33) is an example of a first-order product formula, and is derived from the Baker-Campbell-Hausdorff identity e A+B = e A e B e [A,B]/2 · · · and e A+B = e (δA+δB)/δ = e δA+δB 1/δ .
Choosing δ small in Supplementary Equation (33) means that corrections for every factor in this formula come in at O δ 2 i.e. in the form of a commutator, and since we have to perform 1/δ many rounds of the sequence e δA e δB the overall error scales roughly as O (δ). Since its introduction in [7], there have been a series of improvements, yielding higherorder expansions with more favourable error scaling. For a historical overview of the use of Suzuki-Trotter formulas in the context of Hamiltonian simulation, we direct the reader to the extensive overview given in [8, sec. 2.2.1]. In the following, we discuss the most recent developments for higher order product formulas, and analyse whether they yield an improved overall time and error scaling with respect to our introduced cost model.
To obtain higher-order expansions, Suzuki et. al. derived an iterative expression for product formulas in [9,10]. For the (2k) th order, it reads [5] where the coefficients are given by a k := 1/ 4 − 4 1/(2k−1) . The product limits indicate in which order the product is to be taken. The terms in the product run from right to left, as gates in a circuit would be applied, so that L j=1 A j = A L · · · A 1 .

Error Analysis of Higher-Order Formulae
We need an expression for the error R p (T, δ) arising from approximating the exact evolution U (T ) by a p th order product formula P p (δ) repeated T /δ times. As a first step, we bring the latter into the form: As before, M denotes the number of non-commuting layers of interactions in the local Hamiltonian. S = S p is the number of stages; the number of P p,j (δ) in a p th order decomposition from Supplementary Equation (35) , so that a second order formula is composed of 2 stages. (40) Proof. The first claim is obviously true for the first order formula in Supplementary Equation (33). For higher orders, by [5,Th. 3] and Supplementary Equation (33), we have that the first derivative Similarly, from Supplementary Equations (35) and (36), we have that Equating both expressions for the first derivative of P p (x) at x = 0 and realising that they have to hold for any H i yields the claim.
The second claim is again obviously true for a first order expansion, and follows immediately from Supplementary Equation (35) for p = 2. Expanding Supplementary Equation (36) for P 2k (δ) all the way down to a product of P 2 terms, the argument of each of the resulting factors will be a product of k − 1 terms of a k ′ or 1 − 4a k ′ for k ′ ≤ k. We further note that for k ≥ 2, |a k | ≤ |1 − 4a k |, as well as |a k | ≤ 1/2 and |1 − 4a k | ≤ 2/3, which can be shown easily. The b ji can thus be upper-bounded by B p , which in turn is upper-bounded by (1/2) (2/3) k−1 -where the final factor of (1/2) is obtained from the definition of P 2 .
Since we are working with a fixed product formula order p for the remainder of this section, we will drop the order subscript in the following and write P p = P, R p = R for simplicity. Assuming ∥H i ∥ ≤ Λ for all i = 1, . . . , M , and setting the error we can derive an expression for the p th order error term. First, note that approximation errors in circuits accumulate at most linearly in Supplementary Equation (43). Thus it suffices to analyse a single δ step of the approximation, i.e. U (δ) = P (δ) + R (δ, δ). Then We will denote ϵ p (δ) simply by ϵ in the following.
To obtain a bound on P (δ), we apply the variation of constants formula with the condition that P (0) = I, which always holds. As in [5, sec. 3.2], for δ ≥ 0, we obtain where the integrand R (τ ) is defined as Now, if P (δ) is accurate up to p th order -meaning that R (δ) = O δ p+1 -it holds that the integrand R (δ) = O (δ p ). This allows us to restrict its partial derivatives, as the following shows.
Lemma 13. For a product formula accurate up to p th order -i.e. for which Proof. We note that R (δ) is analytic, which means that we can expand it as a Taylor series R (δ) = ∞ j=0 a j δ i . We proceed by induction. If a 0 ̸ = 0, then clearly R (0) ̸ = 0, which contradicts the assumption that R (δ) is accurate up to p th order. Now assume for induction that ∀j < j ′ < p − 1 : a j = 0 and a j ′ ̸ = 0. Then which again contradicts that R (0) = O (δ p ). The claim follows.
Performing a Taylor expansion of R (τ ) around τ = 0, the error bound ϵ given in Supplementary Equation (44) simplifies to Further by Supplementary Lemma 13 all but the p th or higher remainder terms S (τ, 0) equal zero, so where we used the integral representation for the Taylor remainder S (τ, 0). Motivated by this, we look for a simple expression for the p th derivative of the integrand R (τ ), which capture this in the following technical lemma.
Lemma 14. For a product formula accurate to p th order, having S = S p stages for M non-commuting Hamiltonian layers with the upper-bound ∥H i ∥ ≤ Λ, the error term R (τ ) satisfies Proof. We first express P (τ ) from Supplementary Equations (37) Then the (p + 1) th derivative of this with respect to τ is where α is a multiindex on Σ, and |α| = I∈Σ α I . Following standard convention, the multinomial coefficient for a multiindex is defined as We can similarly express H with the same index set σ, and as a derivative of U via where we used the fact that S j=1 b ji = 1 by Supplementary Lemma 12, and the exponential expression of U I from Supplementary Equation (38). Now we can combine Supplementary Equations (54) and (56) as in Supplementary Equation (51) to obtain the p th derivative of the integrand R (τ ): Noting that ∥U We can therefore bound the norm of R (p) as follows: By Supplementary Lemma 12, we know that |b ji | = 1 when p = 1 and |b ji | ≤ (2/3) p/2−1 /2 for all j, i when p = 2k for k ≥ 1. Hence for p = 1 and for p = 2k for k ≥ 1 where C p (S, M ) is the sum of the multinomial coefficients of length p ∈ N; a simple expression can be obtained by reversing the multinomial theorem, since To obtain the final error bounds, we combine Supplementary Lemma 14 with the integral representation in Supplementary Equation (51).
Theorem 15 (Trotter Error). For a p th order product formula P p for p = 1 or p = 2k, k ≥ 1, with the same setup as in Supplementary Lemma 14, a bound on the approximation error for the exact evolution U (T ) with T /δ rounds of the product formula P p (δ) is given by Proof. We can use the bound on R (p) derived in Supplementary Lemma 14 and perform the integration over τ and x in Supplementary Equation (51), to obtain We remark that tighter bounds than the ones in Supplementary Theorem 15 are achievable for any given product formula, where the form of its coefficients b ji are explicitly available and not merely bounded as in Supplementary Lemma 12. Summing up these stage times exactly is therefore an immediate way to obtain an improved error bound. Furthermore, the triangle inequality on ∥R (p) (τ ) ∥ in the proof of Supplementary Lemma 14 is a crude overestimate: it looses information about (i). terms that could cancel between the two multiindex sums, and (ii.) any commutation relations between the individual Trotter stages.
In the following subsection, we will provide a tighter error analysis, featuring more optimal but less clean analytical expressions which we can nonetheless evaluate efficiently numerically.

Explicit Summation of Trotter Stage Coefficients
For the recursive Suzuki-Trotter formula in Supplementary Equation (36) we can immediately improve the error bound by summing the stage coefficients b ij up exactly, instead of bounding them as in Supplementary Lemma 12.
Corollary 16 (Trotter Error). For the recursive product formula in Supplementary Equation (36) and p = 2k for k ≥ 1, Proof. This follows from explicitly summing up the magnitudes of all the b ji 's obtained by solving the recursive definition of the product formula, which can easily be verified to satisfy Then from Supplementary Lemma 14, and the claim follows as before.
For later reference, we note that it is straightforward to generalise the error bound in Supplementary Corollary 16 for the case of a higher derivative R (q) , q ≥ p, but still for a p th order formula: the bound simply reads

Commutator Bounds
Our analysis thus far has completely neglected the underlying structure of the Hamiltonian. In this subsection we establish commutator bounds which are easily applicable to D-dimensional lattice Hamiltonians. We begin with the following technical lemmas.
Lemma 17. For a product formula accurate to p th order, having S = S p stages for M non-commuting Hamiltonian layers with the upper-bound ∥H i ∥ = Λ I , the error term R (τ ) satisfies Proof. As shown in Supplementary Lemma 14, We begin by commuting every U (1) (τ ). Consider this for some fixed J in the sum of over J. That is consider rewriting a particular summand from the second term above to obtain Now, by inserting this into the full expression for R (p) (τ ), we obtain Taking the norm of this expression gives This completes the proof.
Lemma 18. If every pair of Hamiltonians can be written as where for any i we have ∥h I i ∥ = ∥h J i ∥ = 1 and for any fixed term h J there are at most n terms in H I which do not commute with that specific term, then 20 Proof. First note that and Consider a fixed term in U (1) where we have dropped the subscript i. As there are N of these, we can bound the norm of the commutator as follows Where we have also used the triangle inequality and the fact that b J ≤ B p . Now consider fully expanding the U Here we have assumed that if any of the n non-commuting terms appear at any point in the expansion (the n), then that term will not commute with h J regardless of whatever other terms appear (the N β I −1 ). We can over-count by repeating this for each term expanded (the β I ). This gives The extra factor of 2 comes from bounding the commutators of the norm 1 Hamiltonians via triangle inequality.
Proof. We must obtain a simplified form for the bounded commutator appearing in Supplementary Lemma 17. We can sequentially expand this commutator and use the triangle inequality to write it as We can use Supplementary Lemma 18 to bound the first term. The commutator in the second term can be bounded as follows: Where we have used Supplementary Lemma 18 to bound the norm of the commutator of U Now by using this to bound the result of Supplementary Lemma 17 we obtain To simplify this expression, we must simplify an expression of the form where in our case x = N/Λ. This can be done by rewriting this expression in terms of a derivative with respect to x and reversing the multinomial theorem, which gives Using this and performing the summation over J and I simplifies the expression for R (p) (τ ) to Now we can use the preceding lemmas to establish a commutator bound for higher order Trotter formulae. Although it is cumbersome looking, it is easy to evaluate.
Assume that for any i, ∥h I i ∥ = ∥h J i ∥ ≤ 1. Additionally, assume that for any fixed term h J there exist at most n terms h I which do not commute with h J .
Then, for a p th order product formula P p with p = 1 or p = 2k, k ≥ 1 used to approximate the evolution operator under H, the approximation error for the exact evolution U (T ) with T /δ rounds of the product formula P p (δ) is bounded by with Proof. The error formula for a single Trotter step is given by Supplementary Equation (51) as Evaluating this using Supplementary Lemma 19 and then substituting the resultant expression in ϵ p (T, δ) ≤ (T /δ)ϵ p (δ) gives the stated expression.
For later reference, we note that it is straightforward to generalise the error bound in Supplementary Theorem 20, by incorporating similar techniques to Supplementary Corollary 16 in order to sum up the |b ij | exactly, instead of simply bounding them by B p . Additionally, we can also generalise to the case of a higher derivative R (q) , q ≥ p, but still for a p th order formula: with these two generalisations the bound simply reads with where Proof. The expression for ϵ stems from Taylor-expanding Supplementary Equation (51) to order q instead of p, and integrating over τ . The last term ϵ q+1,p is then simply the overall remainder S, as before; and we can use Supplementary Equation (70) to obtain a bound on it. The bound on ∥R(0) (l) ∥ is an immediate consequence of Supplementary Equation (57), where we set τ = 0.
This allows us to calculate a numerical bound on ∥R(0) (l) ∥, by bounding ∥H i ∥ ≤ Λ and allowing terms within the two sums over α and β to cancel. The benefit of this approach is that it is generically applicable to any given Trotter formula, and only depends on the non-commuting layers of H.

Proof. Follows immediately from Supplementary Lemma 21.
A selection of the series coefficients f (p, M, l) can be found in Supplementary Table 1. Supplementary Corollary 22 can then be applied in conjunction with e.g. the commutator error bound given in Supplementary Theorem 20 for the remaining term ϵ q+1 (δ, δ).

Spectral Norm of Fermionic Hopping Terms
Let a † and a be the standard fermionic creation and annihilation operators.
Theorem 23. Let Ω = {ij} be a set of pairs of indices such that no two pairs share an index. Define: Given a normalized fermionic state |ψ⟩ such that N |ψ⟩ = n |ψ⟩: Where M is the number of fermionic modes. This bound is tight.
Proof. Consider that h ij has eigenvalues in {−1, 0, 1}, since h 2 ij = (N i − N j ) 2 which has eigenvalues {0, 1}. Suppose there existed a normalised state |ψ⟩ such that N |ψ⟩ = n |ψ⟩ and H Ω |ψ⟩ = λ |ψ⟩ where |λ| > min(n, M − n, |Ω|) . Since H Y , N and all h ij are all mutually commuting, we may choose |ψ⟩ to be an eigenstate of all h ij wlog (by convexity). Then it must be the case that h 2 ij |ψ⟩ = |ψ⟩ for at least |λ| pairs ij, which implies that in the Fock basis |ψ⟩ = a |0 i , 1 j , ..⟩ + b |1 i , 0 j , ..⟩. Therefore for at least |λ| pairs ij ∈ Ω we have ⟨ψ| (N i + N j ) |ψ⟩ = 1. So ⟨ψ| N |ψ⟩ ≥ |λ| and M − ⟨ψ| N |ψ⟩ ≥ |λ|. If min(n, M − n, |Ω|) = n then ⟨ψ| N |ψ⟩ > n which is a contradiction. If min(n, M − n, |Ω|) = M − n then M − ⟨ψ| N |ψ⟩ > M − n which is a contradiction. If min(n, M − n, |Ω|) = |Ω| then |λ| > |Ω| which is a contradiction. This proves the bound. Now we need only show the bound is tight. Consider the following state: With Γ s composed of creation and annihilation operators which do not include i or j. This state is an eigenstate of h ij = a † i a j + a † j a i : Observe: 26 Consider a set of pairs of indices ω ⊆ Ω. Choose an ordering on ω and define with b a bit-string indexed by ij. Note that N ϕ b ω = |ω| ϕ b ω . We now argue that b can always be chosen such that: Choose a pair ij ∈ ω, the state ϕ b ω can be expressed as: With Γ s composed of creation and annihilation operators which do not include i or j. So Let us choose b ij such that ∆ ij = 1. Noting that ∆ ij is independent of ∆ pq when pq ̸ = ij we can do the same for all other b pq . This gives: Note that n = |ω| < |Ω| and |Ω| < M/2 and so the bound is shown to be tight in the case where min(n, M − n, |Y |) = n.
If we consider the case where min(n, M − n, |Ω|) = |Ω|, then we may always choose ω such that it is composed of a set of pairs of indices such that no two pairs share an index, and such that Ω ⊆ ω. In this case, by a similar argument Finally, in the case where min(n, M − n, |Ω|) = M − n one may choose the particle-hole symmetric state and a similar argument follows by particle hole symmetry.

Simulating Fermi-Hubbard via Sub-Circuit Algorithms Overview and Benchmarking of Analysis
In the following sections we primarily adopt the per-time error model and associated metric for costing circuits, Supplementary Definition 3. We first establish asymptotic bounds on the run-time T cost of performing a time-dynamics simulation of a 2D spin Fermi-Hubbard Hamiltonian using a p th -order Trotter formula with M = 5 Trotter layers, for a target time T and target error ϵ t . We perform this analysis for both the compact and VC encodings and the results are summarised in Supplementary Corollary 26.
We first want to compare using sub-circuit vs standard circuit decompositions in a pertime error model. We do this in conjunction with our Trotter bounds. To this end we establish the analytic bounds for the same simulation task, first using the standard conjugation method to generate evolution under higher weight interactions using only standard CNOT gates and single qubit rotations as opposed to a sub-circuit pulse sequence. We choose this method as it doesn't introduce any unfair and needless analytic error into the comparison. We decompose the Trotter steps into a standard gate set of CNOTs and singlequbit rotations which are gates of the form e ±iπ/4ZZ up to single qubit rotations. We cost this with the same metric using a per-time error model, but do not allow the comparison to contain any gates of the form e ±iδZZ as this would constitute a sub-circuit gate.
In this comparative analytic expression we still use our Trotter error bounds, so Supplementary Corollary 25 only serves to evaluate the impact of differing Trotter step synthesis methods on the asymptotic scaling of the run-time T cost in a per-time error model.
Later in this section we perform a tighter numerical analysis of both our proposal and our standard circuit model comparison. In these numerics we compare our Trotter bounds to readily applicable bounds from the literature [4, Prop. F.4.]. We point out that these bounds do not exploit the underlying structure of the Hamiltonian or make use of the recent advances of [5], [6]. However these bounds contained all constants, were applicable to 2D lattices and could easily be evaluated for arbitrary p and allowed us to make use of Supplementary Theorem 23. We were able to compare our bounds to [5] for the case of a simple 1D lattice and establish that our bounds are preferable for medium system sizes, not in asymptotic limits of system size, as was our intention in reformulating bounds for NISQ applications.
After this comparison in the framework of a per-time error model, we numerically analyse the impact of sub-circuit techniques in the per-gate error model, calculating the circuit depth of sub-circuit Trotter simulations. This is shown in Supplementary Figures 16 and 18. opposite spin. In terms of fermionic creation and annihilation operators the Hamiltonian for this system is

The Fermi-Hubbard Hamiltonian and Fermionic Encodings
where σ ∈ {↑, ↓} and the sum over hopping terms runs over all nearest neighbour fermionic lattice sites i and j. The interaction strengths are u and v and we assume that v = 1, and that they are bounded as |v|, |u| ≤ r. Before we proceed we have to choose how to encode this Hamiltonian in terms of spin operators. The choice of encoding has a significant impact on the run-time of the simulation. There are many encodings in the literature [12] but we will only analyse two, the Verstraete-Cirac (VC) encoding [13], and the recent compact encoding from [14]. We choose our encoding in order to minimise the maximum Pauli weight of the encoded interaction terms. Using the VC and compact encodings this is constant at weight-4 and weight-3 respectively. In comparison the Jordan-Wigner encoding results in a maximum Pauli weight of the encoded interaction terms that scales with the lattice size as O (L), the Bravyi-Kitaev encoding [15] has interaction terms of weight O(log L), and the Bravyi-Kitaev superfast encoding [15] results in weight-8.
The encodings require the addition of ancillary qubits as well as two separate lattices encoding spin up and spin down fermions. For VC 4L 2 qubits are needed to encode L 2 fermionic sites. In contrast compact requires (L − 1) 2 ancillary qubits and 2L 2 data qubits.  The layout of these ancillary qubits are indicated in Supplementary Figure 7. Note that we must also choose an ordering of the lattice sites. This is also indicated in Supplementary  Figure 7.
The two encodings map the Fermi-Hubbard Hamiltonian terms to interactions between qubits. In both encodings, on-site interaction terms become Only the encoded hopping terms differ. The exact expressions for hopping interactions depend on whether two nearest neighbour fermionic sites are horizontally or vertically connected on the lattice. The horizontally connected hopping terms are encoded as while the vertically connected hopping terms are encoded as In this notation i labels the data qubit for lattice site i and σ its spin lattice. Dashed indices such as i ′ refer to ancillary qubits. These are illustrated in grey in Supplementary  Figure 7. In the VC encoding there is an ancillary qubit for every site on each spin lattice. In compact these ancillary qubits are laid out in a checker-board pattern on the faces on each spin lattice. Here f ′ ij labels the ancillary qubit to i and j. There is also a sign determined by g(i, j) = 0, 1. The details of this can be found in [14]. These are not the only choices of Trotter layers. In a later Supplementary Method called "Regrouping Interaction Terms" we show that we can implement a p th order product formula with only 3 Trotter layers. We do this for the compact encoding only as it is particularly neat. This is despite grouping the interactions in a way where not all interactions within a layer commute with one another. A combination of this and the previous results still enable us to directly implement each layer without incurring any further analytic error.
The norm of these layers appears in the Trotter bounds we derive. We bound these as ∥H i ∥ ≤ Λ for all i. In Supplementary Theorem 23 it is shown that Λ can be related to fermion number and this fact is used to obtain tighter bounds on the Trotter error in the numerics we perform. We confine ourselves to a sector of 5 fermions in all our numerical calculations. We do this both for pragmatic reasons, since the Hilbert space dimension is just large enough to be classically hard, and because for a 5 × 5 lattice a roughly quarterfilling already implies interesting crossover phenomena appear [16,17]. We also leave Λ explicit in our analytic bounds so that we can explore different parameter regions in later work.  We proceed by obtaining bounds on the run-time of the most costly Trotter layer in each encoding. This expression depends on whether we use our methods (summarised in the main text in Supplementary Equations (11) and (12)) or the conjugation method (see Supplementary Equation (9) in the main text) to implement each Trotter step.
For the VC encoding the most costly layers will be the the vertical hopping layers H 3 and H 4 . As they are both a sum of disjoint terms which we assume can be performed in parallel this is simply given by the cost of implementing a single vertical hopping interaction. Remembering that the interaction strengths satisfy |v|, |u| ≤ r, we bound this as The second simplification follows from both terms in the interaction commuting, thus allowing them to be performed sequentially. The same is true for the compact encoding and so we have The final expressions now depend only on how we decompose local Trotter steps, either in terms of CNOT gates and single qubit rotations or using circuits such as those in Supplementary  (11) and (12) in the main text, respectively. As both of these expressions diverge as δ → 0 it is optimal to maximise δ with respect to an allowable analytic Trotter error ϵ t . This is captured in the following lemma which uses the simplest bounds on Trotter error established previously.
Lemma 24 (Optimal FH δ). For a target error rate ϵ t , the maximum Trotter step for a p th formula saturating the error bound in Supplementary Theorem 15 is Proof. Follows from Supplementary Theorem 15 by solving for δ.
Now we can obtain the final analytic bounds on the total run-time of simulating the Fermi-Hubbard Hamiltonian for each of these four cases.
Corollary 25 (Standard-circuit Minimised Run-time). If standard synthesis techniques are used to implement local Trotter steps in terms of CNOT gates and single-qubit rotations with an optimal Trotter step size δ 0 saturating Supplementary Lemma 24, the simulation cost for the Fermi-Hubbard Hamiltonian with a p th order Trotter formula with maximum error ϵ t is as follows and Proof. The proof follows by choosing δ such that the bound obtained in Supplementary Lemma 24 is saturated and substituting this and the respective expressions in Supplementary Figure 13 into Supplementary Equation (143).
Corollary 26 (Sub-Circuit Minimised Run-time). If sub-circuit synthesis techniques are used to implement local Trotter steps with an optimal Trotter step size δ 0 saturating Supplementary Lemma 24, the simulation cost for the Fermi-Hubbard Hamiltonian with a p th order Trotter formula with maximum error ϵ t is as follows

Trivial Stochastic Error Bound
So far we have only considered the unitary error introduced by approximating the real Hamiltonian evolution with a Trotterized approximation. However, in a near-term quantum device without error correction in place, we expect the simulated evolution to be noisy. We model the noise by interspersing each circuit gate in the product formula by an iid channel E; for simplicity we will assume that E is a single qubit depolarising channel E = N q , defined as for a noise parameter q ∈ [0, 1]. 1 Here I denotes the identity channel and T takes any state to the maximally mixed state τ = I/d. Figure 14 can then be found by just calculating the probability of no error occuring at all; disregarding the beneficial effects of a causal lightcone behind the observable O, and denoting with U := U † circ · U circ the clean circuit, and with U ′ the circuit saturated with intermediate errors, we get the expression

A trivial error bound for a circuit as in Supplementary
where V is the circuit's volume (i.e. the number of E interspersed in U ′ ). It is clear to see that this error bound asymptotically approaches 1, and does so exponentially quickly. Thus, to stay below a target error rate ϵ tar , a sufficient condition is that or alternatively Instead of assuming that each error channel E in Supplementary Figure 14 has the same error probability q, we can analyse the case where q is proportional to the pulse length of the preceding or anteceding gate; corresponding relations as given in Supplementary Equations (159) and (160) can readily be derived numerically.

Error Mapping under Fermionic Encodings
In [18], the authors analyse how noise on the physical qubits translates to errors in the fermionic code space. To first order and in the W3 encoding, all of {X, Y, Z} errors on the face, and {X, Y } on the vertex qubits can be detected. Z errors on the vertex qubits -as evident from the form of h on−site from Supplementary Equation (140) -result in an undetectable error; as shown in [18,Sec. 3.2], this Z error induces fermionic phase noise.
It is therefore a natural extension to the notion of simulation to allow for some errors occur -if they correspond to physical noise in the fermionic space. And indeed, as discussed more extensively in [18,Sec. 2.4], phase noise is a natural setting for many fermionic condensed matter systems coupled to a phonon bath [19,20,21,22,23,24,25] and [26,Ch. 6.1&eq. 6.17].
We further assume that we can measure all stabilisers (including a global parity operator) once at the end of the entire circuit. We could imagine measuring these stabilisers after each individual gate of the form e iδh i -where h i is any term in the Hamiltonian. However, as every stabiliser commutes with every term in the Hamiltonian the outcome of the stabiliser measurement is unaffected and so we need only measure all stabilisers once at the end of the entire circuit. It is evident that measuring the stabilisers can be done by dovetailing an at most depth 4 circuit to the end of our simulation -much like measuring the stabilisers of the Toric code. It is thus a negligible overhead to the cost of simulation T cost .
However, errors may occur within the decomposition of gates of the form e iδh i into single qubit rotations and two-qubit gates of the form e itZZ . The stabilisers do not generally commute with these one-and two-local gates. In spite of this we can commute a Pauli error which occurs within a decomposition past the respective gates that make up that decomposition. For example, consider h i = X 1 Z 2 X 3 . Then we would decompose e iδh i first via e iδX 1 Z 2 X 3 ≈ e iδ 1/2 X 1 X 2 e −iδ 1/2 Y 2 X 3 e −iδ 1/2 X 1 X 2 e iδ 1/2 Y 2 X 3 and then furthermore using identities such as e iδ 1/2 X 1 X 2 = H 1 H 2 e iδ 1/2 Z 1 Z 2 H 1 H 2 . Suppose a Pauli X error were to occur on the second qubit during one of these steps, say for example instead leading us to perform the circuit H 1 H 2 e iδ 1/2 Z 1 Z 2 X 2 H 1 H 2 e −iδ 1/2 Y 2 X 3 e −iδ 1/2 X 1 X 2 e iδ 1/2 Y 2 X 3 . By commuting the error X 2 to the end of the circuit we can see that this is still O(δ 1/2 ) close to performing Z 2 e iδX 1 Z 2 X 3 as HZ = XH, which is an error we can detect as previously discussed. As a Pauli Z error is equally likely to occur this doesn't introduce any noise bias. Therefore in cases where δ is sufficiently small, we can use error detection to reduce the fidelity requirements needed in the device by accounting for this additional error term. This is done numerically and shown in the dashed lines of Supplementary Figures 18 and 19.
This means that while the fermionic encodings do not provide error correction, they do allow error detection to some extent; we summarise all first order error mappings in Supplementary Table 2. This means we can numerically simulate the occurrence of depolarising noise throughout the circuit, map the errors to their respective syndromes, and classify the resulting detectable errors, as well as non-detectable phase, and non-phase noise.
This means we can analyse T cost with a demonstrably suppressed error, by allowing the non-detectable non-phase noise to saturate a target error bound. The resulting simulation is such that it corresponds to a faithful simulation of the fermionic system, but where we allow fermionic phase error occurs -where we emphasise that since detectable non-phase errors occur roughly with the same probability as non-detectable phase errors we know that, in expectation, only O(1) phase error occurs throughout the simulation; in brief, it is not a very noisy simulation after all.
The resulting required depolarising noise parameters for various FH setups we summarise in Supplementary Figures 16 to 19, and the resulting post-selection probabilities in Supplementary Figure 20.

Numeric Results
We can tighten the preceding analysis in several ways. First of, instead of crudely upper bounding the cost of individual gates we can sum these pulse times exactly. To this end, we use both the explicitly defined Trotter formulae coefficients h ij , and also the exact formulae for the pulse times derived previously.
Secondly, we can use tighter bounds for Trotter error, which take into account the commutation relations between pairs of interactions across Trotter layers H i , the coefficients defined in Supplementary Lemma 12. Additionally, we obtain a bound which rewrites the Trotter error as a Taylor series, and then bound the Taylor remainder using methods which  [18]. All but Z errors on the faces are detectable; the latter result in fermionic phase noise.
usually bound the total Trotter error; which bound is tighter depends on the order p of the formula, and the target simulation time. As explained in Supplementary Lemma 24, a tighter Trotter error allows us to choose a larger δ while achieving the same error ϵ t , reducing the total cost of the simulation. Finally, as explained in the Supplementary Method entitled "Trivial Stochastic Error Bound", a certain simulation circuit size will determine the amount of stochastic error present within the circuit. We assume that the depolarising noise precedes each Trotter layer; in the error-per-time model we assume that the noise parameter scales with the pulse length; in the more traditional error-per-gate model we assume that the noise is independent of the pulse length.
Furthermore, instead of taking the analytic Trotter bounds, we can also numerically calculate the optimal Trotter pulse length by calculating ϵ p (T, δ) from Supplementary Equation (43) explicitly, and maximizing δ till ϵ p saturates an upper target error rate. Naturally, this is computationally costly; so we perform this calculation for FH lattices up to a size 3 × 3, and extrapolate these numbers by qubit count to the desired lattice lengths up to 10 × 10; the dependence of δ 0 on the target time and number of qubits was extracted from the asymptotic Trotter bound in Supplementary Lemma 24, i.e.
for four fit parameters a 0 , a 1 , b 0 , b 1 ; all other dependencies are assumed constant. We show such numerically fitted data in Supplementary Figure 15; a similar analysis was made for Trotter orders p = 1, 2, 4, 6, and target error rates ϵ t = 0.1, 0.05, 0.01. These numerical bounds are much tigher than the analytical bounds, but can still be assumed to overestimate the actual error: the operator norm distance between Trotter evolution and true evolution likely overestimates the error that would occur if starting from a particular initialized state.
Supplementary Figure 15: Extraploated optimal Trotter error step size δ 0 . The data points stem from the simulation of FH lattices up to size 3 × 3, for a target error rate ϵ t = 0.1, Trotter order p = 2, and for times T = 0.5, 1.0, . . . , 15.0. The fitted surface follows the formula in Supplementary Equation (161).
We found that randomizing the Trotter layer order does not yield any advantage over keeping it fixed, and similarly we did not obtain an advantage by permuting the fixed order further. Supplementary Table 3 compares these numerics for the case of a lattice with L = 5 and for a sufficiently long time in terms of units set by the lattice spacing as we assume v = 1. So for any L we choose T = ⌊ √ 2L⌋. We choose an analytic error of ϵ t = 0.1 as there is no point making the analytic error smaller than the experimental error present in NISQ-era gates. As the compact encoding results in the smallest run-time we investigate how T cost varies with ϵ t for L = 3, 5 and 10 below. In these numerics we choose the order p which minimise T cost at each value of T .

Simulating Fermi-Hubbard with Three Trotter Layers Further Circuit Decompositions
In this section we show that we can actually simulate a 2D spin Fermi-Hubbard model with M = 3 Trotter layers as opposed to the previous M = 5. First we need to introduce another circuit decomposition in the same spirit as before.
Equating these and solving for t 1 and t 2 gives the expressions in the Supplementary Lemma.
We then need to establish the overhead associated with implementing this decomposition. We will see that for a target time t, the pulse times in Supplementary Lemma 27 are as t i (t) ∝ t.
Lemma 28. Let H = cos(θ)h 1 +sin(θ)h 2 be as in Supplementary Lemma 27. For 0 ≤ t ≤ π/2 and 0 < θ < π/2, the pulse times t 1 , t 2 in Supplementary Lemma 27 can be bounded by Feasible simulation time as a function of hardware error rate In this Appendix we analyse the simulation performance from the perspective of hardware error rate, rather than circuit depth. Specifically, we analyse the following question: given a maximum allowable simulation error ϵ tar and a given hardware noise rate q, for how long can we simulate a given Fermi Hubbard Hamiltonian? That is, what is the maximum simulation time T tar such that the total simulation error remains below the target threshold, ϵ ≤ ϵ tar ? In Supplementary Table 5 we consider this question for an L = 5 Fermi Hubbard model with target error ϵ tar = 10%. When error mitigation is unused the only contribution to the total error ϵ is from Trotter error ϵ t and error due to stochastic noise ϵ s . That is we have ϵ = ϵ 2 t + ϵ 2 s ≤ ϵ tar . When mitigation is used there is an additional contribution ϵ c from commuting errors past short-time gates. That is ϵ = ϵ 2 t + ϵ 2 s + ϵ 2 c ≤ ϵ tar . All Trotter bounds used to produce Supplementary Table 5 are numeric. We are considering a fermion occupation number of Λ = 5.
In Supplementary Table 5 we consider both per-time and per-gate error models, as well as standard circuit decompositions and subcircuit decompositions. Recall that the former only allows a standard gate set, whereas the latter allows access to a continuous family of two-qubit gates.
In the per-gate error model, the error mitigation techniques described in the Supplementary Method entitled "Trivial Stochastic Error Bound" do not apply, and we choose the decomposition which yields the lowest depth circuit. In both the standard and subcircuit decompositions, this implies using the conjugation technique Supplementary Equation (9) rather than the decomposition techniques of before. For example, when decomposing 3local terms, using a standard gate set this conjugation decomposition must be applied twice to leave us with only single qubit rotations and gates that are equivalent to CNOT gates (up to single-qubit rotations). Whereas if we are in a per-gate error model but allowing a subcircuit gate-set, then we only need to decompose via conjugation once.
Where error mitigation can be applied in the per-gate model, we decompose all gates via our subcircuit decompositions as otherwise error mitigation is not possible. Similarly, in the per-time error model, we always use these decompositions, as they are always more efficient than conjugation decompositions in that model. We see that in the per-time error model using the compact encoding, using our subcircuit synthesis with the error mitigation techniques they enable yields the best performance across the hardware error rates considered. For q = 10 −6 admitting subcircuit gates and allowing for error mitigation takes us from a maximum simulation time of T tar = 3.17 to T tar = 896. For the VC encoding, error mitigation does not yield an improvement. However subcircuit gates and our subcircuit synthesis techniques do, taking us from T tar = 2.25 to T tar = 3.81 for q = 10 −6 .
For both encodings smaller improvements are seen for q = 10 −5 and q = 10 −4 . In the per-gate error model for the compact encoding, we find that our subcircuit decomposition techniques and error mitigation strategy yields an improvement, but over error rates of q = 10 −6 and q = 10 −5 . For the VC encoding we see that error mitigation does not help. However for q = 10 −6 subcircuit gates still yield an improvement, taking us from T tar = 1.63 to T tar = 2.2.
In cases where error mitigation is used we include the classical post-selection overhead. We have deliberately capped this at a maximum of ≈ 10 4 , to disallow excessive post-selection overhead. In some cases this cap is reached, indicating that our error mitigation techniques could yield further improvement still, but at a potentially unreasonably high post-selection overhead.