Introduction

One main reason that led Feynman1 and others to propose the idea of quantum computers was the fact that problems like simulating the dynamics of quantum systems are intractable on a classical computer. Starting from the seminal work of Lloyd2, much research3 has been done to develop algorithms for simulating Hamiltonians, culminating in various techniques like product formulas4,5, quantum walks6, linear combination of unitaries7, truncated Taylor series8, and quantum signal processing9. Special techniques have been developed for simulating particular physical systems10,11,12,13,14,15,16,17, which might find applications in developing new pharmaceuticals, catalysts and materials. Phase estimation can be combined with quantum simulation to find the ground state energy18 and excited state energies19,20,21 of the Hamiltonian. This is called the electronic structure problem14, which is important in chemistry and material science. Research in quantum simulation has also inspired the development of quantum algorithms for various other problems22,23,24,25,26.

One main challenge for digital quantum simulation is the implementation of efficient circuits that can produce reliable results. Without such circuits, a theoretical exponential speedup may not lead to a useful algorithm if a typical practical application requires an amount of time and memory that is beyond the reach of even a quantum computer. A number of factors can affect the efficiency of a quantum circuit, i.e. its running time and error; examples include the number of qubits, the depth and the gate count. So, depending upon the application or hardware constraints, one can design algorithms that optimize or reduce the count or depth of one particular type of quantum gate or other resource. For example, there are algorithms that perform T-count- and T-depth-optimal synthesis27,28,29 of a given unitary, or that re-synthesize a given circuit with reduced T-count or T-depth30,31,32 or CNOT-count33,34,35. The non-Clifford T gate has known constructions in most error correction schemes, and the cost of implementing it fault-tolerantly exceeds the cost of the Clifford group gates by as much as a factor of a hundred or more36,37,38. Quantum error correction and fault tolerance are especially significant for large quantum circuits; otherwise the accumulation of errors makes any output highly unreliable and hence useless. The minimum number of T gates required to implement certain unitaries is also a quantifier of difficulty in many algorithms39,40 that try to classically simulate quantum computation. So, even though alternative fault-tolerance methods such as the completely transversal Clifford+T scheme41 and anyonic quantum computing42 are also being explored, minimization of the number of T gates in quantum circuits remains an important and widely studied goal. Multi-qubit gates like CNOT introduce more error than single-qubit gates, so reducing the CNOT count is also important, especially for noisy intermediate-scale quantum (NISQ) computers.

Our contributions

(I) One main result in this paper is Lemma 2.4, which shows that it is possible to implement an exponentiated sum of commuting Paulis with at most m (controlled)-rotation gates, where m is the number of distinct non-zero eigenvalues (ignoring sign). For illustration, we consider the Hamiltonian for the Heisenberg model and we show that it is possible to achieve about 50% reduction in the rotation gate cost and for certain underlying graphs this reduction can be about 75%. However, the cost of Toffolis may increase. We have given explicit circuits for 4-qubit and 6-qubit chain (or cycle), where we attempt to reduce both the rotation and Toffoli gate cost.

(II) In most previous works, circuits for individual exponentiated Paulis are synthesized and combined. We show that it is possible to reduce the gate count (not only of non-Clifford gates) if we instead consider groups of commuting Paulis. As a practical demonstration, we consider the qDRIFT Hamiltonian simulation algorithm43. We call the error introduced by the algorithm the ‘simulation error’. We take the 1-D 4-qubit and 6-qubit Heisenberg Hamiltonians (Fig. 7) and also 4-qubit Hamiltonians for H2 and LiH (with freezing in the STO-3G basis) (Fig. 8), and compare the case where a single Pauli term is selected with the case where a set of commuting Pauli terms is selected for implementation at each iteration of qDRIFT. We observe that the accumulated error is smaller for multiple terms and that the rotation gate cost is also lower in this error regime. In the case of multiple terms, the number of Toffoli pairs is roughly equal to the number of Rz/cRz gates used. So overall, we obtain a lower T-count when implementing multiple commuting Paulis per iteration of qDRIFT. This adds to the motivation for building efficient circuits for such Hamiltonians.

(III) In subsection ‘Optimized Circuits for Quantum Chemistry’ we derive explicit quantum circuits for the two-body excitation terms appearing in the Coulomb Hamiltonian in quantum chemistry. We mainly use the Clifford+T universal fault-tolerant gate set to implement unitaries. We design efficient circuits for a different grouping of commuting Pauli operators. It is evident (Table 2) that the rotation gate cost depends on the coefficients of the Pauli summands. For some combinations of coefficients the circuits derived here are optimal, in the sense that they have the minimum number (i.e. 1) of cRz gates. Though our focus is on reducing the non-Clifford gate count, most of the quantum circuits derived here have an overall reduced gate count, including a reduction in 2-qubit gates like CNOT. In Table 1 we have compared the number of gates required to implement one of the Hamiltonians considered in this paper with a previous construction. For the remaining Hamiltonians we did not find any compact previous construction to compare with. In short, our approach can be useful not only in the fault-tolerant regime, but also in the NISQ era.

Table 1 Comparison of gate counts required to simulate \({e}^{iHt}\) (Eqs. (19), (20)) using the circuit synthesized by us with the circuit in ref. 50.

(IV) In Algorithm 1, we describe a greedy method for grouping terms into commuting Paulis, where the objective is to optimize the number of non-Clifford gates. A host of works tackle the question of how to group commuting Paulis and, to the best of our knowledge, most (if not all) of them aim to reduce the number of measurements required to make an estimation44. The latter problem is especially important for variational quantum eigensolvers. The grouping that optimizes the non-Clifford gates may not optimize the number of measurements. In most cases, finding the optimal grouping is difficult. But we can always ask whether, given a grouping (for whatever objective), it is possible to compile efficient circuits. In this case, we can use our techniques (Lemma 2.4) to reduce the gate count. Thus our methods can also be used to design circuits for the measurement problem.

In this paper, we use the Jordan–Wigner (JW) transformation45 to map from the fermionic to the qubit space, and then we group the resulting terms into commuting Paulis. Other transformations like the Bravyi–Kitaev and parity transformations46 can also be used and may be beneficial in circumstances where Clifford operations are costly or inherent quantum error correction is desirable. We focus on Jordan–Wigner for two reasons. First, in this paper, we focus on the synthesis of efficient quantum circuits for exponentiated commuting Paulis, and the techniques hold whichever mapping is considered. Second, previous work has not shown obvious advantages for Bravyi–Kitaev transformations within the domain of fault-tolerant quantum computing.

How we compare the cost of non-Clifford resources

In all the constructions discussed in this paper, two approximately implementable gates are used, Rz and controlled Rz (cRz), whose T-count increases as the synthesis error decreases. From the results given in ref. 29 and from the implementations performed here down to error \(1{0}^{-6}\), we believe that the T-count of cRz can be less than that of Rz for most modestly small rotation angles. However, for convenience, we assume these have equal cost and, with some abuse of terms, we refer to the T-count of Rz/cRz as the ‘(non-Clifford) rotation gate cost’. The only exactly implementable non-Clifford gate considered in the constructions is the Toffoli, with T-count 7 (ref. 27) or 4 (ref. 47). In the low-error regime, the T-count of the approximately implementable Rz/cRz will dominate, while in a high-error regime the T-count of the Toffolis may matter if we use a lot of them. To reduce the T-count of compute-uncompute Toffoli pairs, we can use the temporary logical AND gadget proposed by Gidney48. In fact, in our circuits, we use Rz gates controlled on n qubits (n > 1), each of which can be decomposed into a (compute–uncompute) pair of NOT gates controlled on n qubits and a cRz gate. Each such multi-controlled NOT can be implemented with n−1 Toffolis or 4n−4 T gates48. If we combine compute-uncompute pairs, the overall T-count of the circuit can be reduced further by using the logical AND gadget. We must keep in mind that the implementations in refs. 47,48 use classical resources and measurements, and it is not straightforward to argue that they give an advantage in spite of using fewer T gates. We can also use the construction in ref. 49, which implements an n-controlled NOT gate using 4n−4 T gates, 4n−3 CNOT gates and n−1 ancilla qubits. In our paper, we have expressed the non-Clifford T-gate cost in terms of the rotation gate cost and the number of Toffoli pairs used.

Related work

In ref. 50 the authors studied the non-Clifford resource cost required to simulate the chemical process of biological nitrogen fixation by nitrogenase. In ref. 51 the authors developed algorithms to synthesize circuits for the Clifford operators that diagonalize a group of commuting Paulis. The goal was to reduce the two-qubit CNOT gate count, motivated by the low fidelity of CNOT gates and the limited qubit connectivity of near-term quantum computer architectures. A similar diagonalization algorithm has been used in ref. 52 for efficient simulation of Hamiltonian dynamics. Much work has been done on the construction of quantum circuits for the evolution of molecular systems16,53,54,55,56,57,58,59 and the Heisenberg model60.

Results

Notation

In many places we write G(q) to denote that the gate or operator G acts on qubit q. For multi-qubit gates we write CNOT(c, t) to denote a CNOT with control at qubit c and target at qubit t. For convenience, we have removed the parentheses in the subscript whenever there is no ambiguity. We write [K] = {1, 2, …, K}. We denote the n × n identity matrix by \({{\mathbb{I}}}_{n}\) or \({\mathbb{I}}\) if the dimension is clear from the context. We denote the set of n-qubit unitaries by \({{{{\mathcal{U}}}}}_{n}\). The size of an n-qubit unitary is N × N where \(N={2}^{n}\). We have given a detailed description of the n-qubit Pauli operators (\({{{{\mathcal{P}}}}}_{n}\)), the Clifford group (\({{{{\mathcal{C}}}}}_{n}\)) and the group (\({{{{\mathcal{J}}}}}_{n}\)) generated by the Clifford and T gates in Supplementary Note 1.

Optimizing Trotter-decompositions

The time evolution of a quantum system described by a Hamiltonian H is \({e}^{-iHt}\). Most often the Hamiltonian H can be decomposed as the sum \(H=\mathop{\sum }\nolimits_{j = 1}^{m}{\alpha }_{j}{H}_{j}\), where each \({H}_{j}\) is Hermitian. There can be more than one decomposition of H and we select one such that, for each \({H}_{j}\), the unitary \({e}^{-i\tau {H}_{j}}\) is efficiently implementable on a quantum computer for any τ. The goal of the Hamiltonian simulation problem is to approximate \({e}^{-iHt}\) by a sequence of \({e}^{-i\tau {H}_{j}}\), up to some desired precision. For example, using the Lie-Trotter formula5 we have that

$${e}^{-iHt}=\mathop{\lim }\limits_{k\to \infty }{\left(\mathop{\prod}\limits_{j}{e}^{-i(t/k){\alpha }_{j}{H}_{j}}\right)}^{k}.$$

In the non-asymptotic regime, the Trotter scheme provides a first-order approximation, with the norm of the difference between the exact and approximate time evolution scaling as \(O({t}^{2}/k)\). More advanced higher order schemes3,4 are also available. Alternatively, a randomized approach called qDRIFT can be used in place of a Trotter formula, wherein the quantum state is evolved according to the probabilistic channel

$$\rho \,\mapsto \mathop{\sum}\limits_{j}\frac{{\alpha }_{j}}{| \alpha {| }_{1}}{e}^{-i| \alpha {| }_{1}{H}_{j}t}\rho {e}^{i| \alpha {| }_{1}{H}_{j}t}.$$
(1)

Note the error here is also \(O({t}^{2})\); however, in this case, a single exponential is performed rather than O(m) as would be needed for the comparable Trotter formula. The cost of such an approach scales as \(O(| \alpha {| }_{1}^{2}{t}^{2}/\epsilon )\) for error ϵ and does not directly depend on m.
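The sampling rule behind Eq. (1) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the implementation used for the numerical results: the function name `qdrift_schedule`, the seed argument and the convention of splitting the total time t into `n_steps` equal segments are hypothetical, and the weights \({\alpha }_{j}\) are assumed positive (signs absorbed into \({H}_{j}\)).

```python
import random

def qdrift_schedule(alphas, t, n_steps, seed=0):
    """Draw a qDRIFT gate sequence (hypothetical helper, for illustration).

    Each step picks term j with probability alpha_j / |alpha|_1 and evolves
    under H_j for the same rescaled time tau = |alpha|_1 * t / n_steps.
    Returns a list of (term index, evolution time) pairs.
    """
    lam = sum(alphas)                      # one-norm |alpha|_1 (alphas assumed > 0)
    rng = random.Random(seed)              # local generator, reproducible draws
    indices = rng.choices(range(len(alphas)), weights=alphas, k=n_steps)
    tau = lam * t / n_steps
    return [(j, tau) for j in indices]
```

The number of exponentials in the returned sequence is `n_steps`, independent of the number of terms m, which is the feature highlighted in the text.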

The approximation errors arising in the use of product formulas are caused by non-commuting terms in the Hamiltonian. For example, see ref. 61 for a detailed exposition on Trotter errors. Given any set of mutually commuting operators P1, …, Pm we have the following:

$${e}^{-it\mathop{\sum }\nolimits_{j = 1}^{m}{P}_{j}}=\mathop{\prod }\limits_{j=1}^{m}{e}^{-it{P}_{j}}$$
(2)

Thus, the operators are partitioned into mutually commuting subsets. Time evolution for the sum of mutually commuting operators in each such subset is trivial, and the product formulas can be applied to the sum of Hamiltonians formed as the sum of each subset. This approach becomes especially applicable in scenarios where the Hamiltonian can be expressed as a sum of Pauli operators, for which the commutation relations can easily be evaluated.

As a specific example, consider the case where H = aZZZ. Since the Hamiltonian is diagonal, \({e}^{-iaZZZt}\) has the computational basis vectors \(\left\vert {b}_{1},{b}_{2},{b}_{3}\right\rangle\) as eigenvectors, with eigenvalues \({e}^{-i{(-1)}^{{b}_{1}\oplus {b}_{2}\oplus {b}_{3}}at}\). Thus the eigenvalues are determined by the parity of the bit strings, which can be computed using CNOT gates. From this reasoning the following quantum circuit will perform the simulation of this Pauli operator exactly.

(3)

As every Pauli operator of weight 3 can be diagonalized by Clifford conjugation, this circuit up to an elementary basis transformation, will simulate any weight 3 Pauli Hamiltonian. The exact same strategy of diagonalizing and simulating the Pauli operator in the eigenbasis shows that each exponential of a weight ν Pauli operator Hamiltonian requires 2(ν−1) CNOT operators and one rotation gate. This strategy is at the heart of most elementary networks for simulating chemistry and spin models53,62.
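The parity argument for H = aZZZ can be checked directly on basis states. The sketch below is our own illustration of one standard parity-compute, rotate, uncompute layout (two CNOTs onto the last qubit, a Z rotation, two CNOTs back); the qubit ordering and the helper names are assumptions and need not match the circuit of Eq. (3). Because the whole circuit is diagonal in the computational basis, comparing phases on the eight basis states is a complete check.

```python
import cmath
from itertools import product

def cnot(bits, control, target):
    """Apply a CNOT to a computational basis state given as a tuple of bits."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)

def circuit_phase(bits, a, t):
    """Phase picked up by |bits> under CNOT(0,2) CNOT(1,2) Rz CNOT(1,2) CNOT(0,2)."""
    state = cnot(cnot(bits, 0, 2), 1, 2)                  # qubit 2 now holds the parity
    phase = cmath.exp(-1j * a * t * (-1) ** state[2])     # exp(-i a t Z) on qubit 2
    state = cnot(cnot(state, 1, 2), 0, 2)                 # uncompute the parity
    assert state == tuple(bits)                           # basis state is restored
    return phase

def exact_phase(bits, a, t):
    """Eigenvalue of exp(-i a t ZZZ) on |bits>."""
    parity = bits[0] ^ bits[1] ^ bits[2]
    return cmath.exp(-1j * a * t * (-1) ** parity)

def max_deviation(a, t):
    """Largest phase mismatch over all eight basis states."""
    return max(abs(circuit_phase(b, a, t) - exact_phase(b, a, t))
               for b in product((0, 1), repeat=3))
```

The layout uses four CNOTs and one rotation, matching the 2(ν−1) CNOT count quoted above for ν = 3.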

The work of ref. 50 provided another way of thinking about these decompositions by showing an explicit method that can diagonalize sums of commuting operators that appear in chemistry simulations by transforming into a simultaneous eigenbasis of such terms. In full generality, such transformations reduce the circuit depth but need not reduce the circuit size. However, we will see here that for some Hamiltonians these transformations can reduce the circuit size as well.

As a motivating example, consider the Hamiltonian H = XX + YY + ZZ. This Hamiltonian can be simulated, up to a global phase, by

(4)

This can be implemented using two Toffoli gates and a single-qubit rotation. In contrast, the standard approach of refs. 53,62 would use three single-qubit rotations and no Toffoli gates. As rotation synthesis is often 10 times more expensive than a Toffoli gate27,29,50, this will almost always be a favorable way of performing the simulation. Conversely, if this symmetry is broken then the Hamiltonian term will be more expensive to simulate. Thus it can be favorable to introduce such symmetries artificially as needed. For example, consider

$$H=XX+YY+(3/2)ZZ=(XX+YY+ZZ)+ZZ/2.$$
(5)

Such a simulation can be performed using two rotation gates rather than the 3 naïvely needed and so it makes sense to compile the Hamiltonian terms this way to reduce the overall complexity.
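One way to see why a single rotation suffices for XX + YY + ZZ is the identity XX + YY + ZZ = 2 SWAP − I, so that the evolution is an exponentiated SWAP up to a global phase. The following check is our own sanity test, written with plain nested lists; the helper names are hypothetical. It also confirms the regrouping used in Eq. (5).

```python
def kron(A, B):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def add(*mats):
    """Entrywise sum of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * x for x in row] for row in A]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

# Heisenberg interaction and its SWAP form.
heisenberg = add(kron(X, X), kron(Y, Y), kron(Z, Z))
swap_form = add(scale(2, SWAP), scale(-1, I4))

# Eq. (5): XX + YY + (3/2) ZZ regrouped as (XX + YY + ZZ) + ZZ/2.
eq5_lhs = add(kron(X, X), kron(Y, Y), scale(1.5, kron(Z, Z)))
eq5_rhs = add(heisenberg, scale(0.5, kron(Z, Z)))
```

Both comparisons, `heisenberg == swap_form` and `eq5_lhs == eq5_rhs`, hold exactly, since all entries are small integers or half-integers.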

As another example, not all rotations are equally expensive and so we should also combine terms in such a way as to minimize the cost. For example consider the time-evolution operator

$$U(t)={e}^{-i(\pi /4\sqrt{2}-\epsilon )Z-i(\pi /4\sqrt{2}+\epsilon )X}\approx {e}^{-i\pi /4(X+Z)/\sqrt{2}}{e}^{-i(X-Z)\epsilon }.$$
(6)

While the first operation in this Trotterization is not a Clifford operation, it is a simulation of a Hadamard gate for time π/4. As this corresponds to a special angle and since the Hadamard gate can be diagonalized using a constant size H and T circuit, the cost of implementing this first term is O(1) and thus the dominant cost is the remaining rotation. In contrast, if this property were not used then we would have two arbitrary rotations in the Trotterization which would be nearly twice the cost of this simplified approach. These ideas can further be used in concert: remainder terms that arise from inexactly rounding a Hamiltonian evolution to a known cheap simulation can be absorbed into other terms or even other Trotter steps.

Algorithm 1

Hamiltonian compilation using Greedy 1-norm minimization

We propose an algorithm in Algorithm 1 that exploits this intuition through a greedy decomposition of the Hamiltonian into sums of commuting terms. These mutually commuting terms, or fragments, are chosen such that the ratio of the fraction of the Hamiltonian that is simulated by the term to the cost of the term is maximized. This choice is motivated in part by the fact that the query complexity of a quantum simulation is lower bounded by \(\Omega (| \alpha {| }_{1}t)\) (ref. 63), and thus designing circuits that simulate as large a fraction of this one-norm as possible per quantum gate operation is a sensible optimization heuristic for our greedy algorithm. Unlike traditional approaches to partitioning the Hamiltonian, our approach allows partial allocation of Hamiltonian terms to multiple commuting sets. Further, the allocation can be negative in our approach. This negative allocation is important because we will see that in some cases the introduction of more Hamiltonian weight on some terms can be more than offset by the reduced cost of simulating the fragment.

The number of optimization steps required for our greedy algorithm is at most \(O({m}^{2}\log (1/\epsilon ))\), i.e. \(O({m}^{2})\) up to a logarithmic factor in the target residual one-norm ϵ. To see this, assume that the optimal strategy involves μ iterations of the outer loop for \(\mu \in \Omega (m)\) and assume that the inner loop optimization requires ν iterations. Since COST ≥ 1 it holds that \({\Gamma }_{\max }\le {\sum }_{j}| {\alpha }_{j}^{{\prime} }| -{\sum }_{j}| {\alpha }_{j}^{{\prime} }-{\beta }_{j}|\). Assume that \({\sum }_{j}| {\alpha }_{j}^{{\prime} }| -{\sum }_{j}| {\alpha }_{j}^{{\prime} }-{\beta }_{j}| < | {\alpha }^{{\prime} }{| }_{\infty }\). In this case there exists a trivial solution that outperforms this one, in which the largest term is simulated in isolation at cost 1. Therefore we must have that \({\sum }_{j}| {\alpha }_{j}^{{\prime} }| -{\sum }_{j}| {\alpha }_{j}^{{\prime} }-{\beta }_{j}| \ge | {\alpha }^{{\prime} }{| }_{\infty }\). Then from standard norm inequalities we have that \(| {\alpha }^{{\prime} }{| }_{\infty }\ge | {\alpha }^{{\prime} }{| }_{1}/m\). Thus the one-norm of the remaining coefficient vector obeys a first-order difference inequality of the form \(| {\alpha }^{(j+1)}{| }_{1}\le (1-1/m)| {\alpha }^{(j)}{| }_{1}\). The general solution to this is \({(1-1/m)}^{j}| \alpha {| }_{1}\), which is at most ϵ for \(j\in O(\log (1/\epsilon )/\log (1/(1-1/m)))\subseteq O(m\log (1/\epsilon ))\). This implies that \(\mu \in O(m\log (1/\epsilon ))\). Next, ν is the maximum number of iterations for the inner loop. Since each iteration continues until the total number of terms remaining is reduced by one, we have that \(\nu \in O(m)\). Thus the total number of iteration steps is \(\mu \nu \in O({m}^{2}\log (1/\epsilon ))\). This shows that the algorithm scales polynomially with the number of terms if the optimization process is also efficient.
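The outer-loop bound can be illustrated numerically. The sketch below simply iterates the worst-case recurrence in which the remaining one-norm shrinks by a factor (1 − 1/m) per outer iteration, and compares the step count with the closed form m ln(|α|₁/ϵ); it does not implement Algorithm 1 itself, and the function names and the default normalization |α|₁ = 1 are our assumptions.

```python
import math

def iterations_to_reach(m, eps, alpha1=1.0):
    """Steps of the worst-case recurrence until the remaining one-norm is <= eps."""
    norm, steps = alpha1, 0
    while norm > eps:
        norm *= 1.0 - 1.0 / m      # each outer iteration removes at least a 1/m fraction
        steps += 1
    return steps

def outer_loop_bound(m, eps, alpha1=1.0):
    """Closed-form bound m * ln(alpha1 / eps), using log(1/(1 - 1/m)) >= 1/m."""
    return math.ceil(m * math.log(alpha1 / eps))
```

For example, with m = 10 and ϵ = 10⁻³ the recurrence stops within the closed-form bound, consistent with \(\mu \in O(m\log (1/\epsilon ))\).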

The cost of optimization can vary strongly depending on the continuity and convexity of the objective function, and without making further assumptions we cannot assume that the optima over \(\overrightarrow{\beta }\) can be found in polynomial time. If we assume, however, that the optimizer works by considering one of a polynomial number of potential circuits for simulating the terms and then uses linear programming to find the optimal value of \(\overrightarrow{\beta }\), then the optimization problem can be solved in polynomial time on a classical computer. Such a choice corresponds exactly to the discussion in the next sections, where we propose the use of a discrete set of optimization strategies for simulating chemistry that can then be used within Algorithm 1 to greedily find the best possible simulation circuit, given this discrete set of optimizations, for the value of \(\overrightarrow{\beta }\) chosen.

Truncating Hamiltonian

We can terminate Algorithm 1 before all terms are allocated, i.e. we output \(\{{H}_{j}={\sum }_{i}{h}_{ji}{P}_{i}:j=1,\ldots ,{m}^{{\prime\prime} }\}\) such that \(\mathop{\sum }\nolimits_{j = 1}^{{m}^{{\prime\prime} }}{H}_{j}=\tilde{H}\,\ne \,H\). This leads to truncation errors in our simulation algorithm that will be present even if an algorithm such as qDRIFT is used for the simulation. We show here that if we truncate some terms of the Hamiltonian, then the additional error incurred is at most twice the error incurred from the complete Hamiltonian simulation by qDRIFT, provided that the distance between the truncated and the given Hamiltonian is at most the square root of the qDRIFT simulation error. We do this because in some cases we may be able to simulate the truncated Hamiltonian with fewer gates.

Suppose we write the given Hamiltonian as follows.

$$H=\mathop{\sum }\limits_{j=1}^{M}{w}_{j}{H}_{j}+\delta H=\tilde{H}+\delta H\qquad [\parallel H\parallel \le 1]$$
(7)

Here each \({H}_{j}\) is a Hermitian matrix for which an efficient simulation circuit exists. The protocol working with the truncated Hamiltonian \(\tilde{H}\) samples each \({H}_{j}\) independently with probability \({p}_{j}=\frac{{w}_{j}}{\lambda }\) (where \(\lambda ={\sum }_{i}{w}_{i}\)) in each iteration.

The error per iteration of qDRIFT, i.e. \({\epsilon }_{N}\), is given by bounding the diamond distance between the channel \({{{{\mathcal{U}}}}}_{N}(\rho )\) corresponding to the Hamiltonian H and the channel \(\tilde{{{{\mathcal{E}}}}}(\rho )\) implemented by the protocol.

Lemma 2.1

The error per step \({\epsilon }_{N}\), observed when N time steps are taken using a qDRIFT channel and quantified by the diamond distance, is bounded in terms of the truncation error δ in the Hamiltonian as

$${\epsilon }_{N}\le \parallel \tilde{{{{\mathcal{E}}}}}({\rho} )-{{{{\mathcal{U}}}}}_{N}({\rho} ){\parallel }_{\Diamond}\le {\epsilon }_{qDRIFT}+2\delta \sqrt{{\epsilon }_{qDRIFT}}$$

where \({\epsilon }_{qDRIFT}\lessapprox \frac{2{\lambda }^{2}{t}^{2}}{{N}^{2}}\) and \(\lambda ={\sum }_{i}{w}_{i}\).

The proof has been given in Supplementary Method 4 (Lemma 8). Thus the total error after all repetitions is as follows.

$$\epsilon \, \le \, N{\epsilon }_{N}\lessapprox N\frac{2{\lambda }^{2}{t}^{2}}{{N}^{2}}+2\sqrt{2}\delta N\frac{\lambda t}{N}=\frac{2{\lambda }^{2}{t}^{2}}{N}+2\sqrt{2}\delta \lambda t$$
(8)

This shows that if \(\delta \in O(\sqrt{{\epsilon }_{qDRIFT}})\) then the asymptotic scaling is not impacted by the exclusion of the terms from the Hamiltonian.
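Eq. (8) can be turned into a small calculator for the number of qDRIFT repetitions. This is a sketch under the assumption that the bound of Eq. (8) is used as an equality; the function names are hypothetical. It makes explicit that the truncation term \(2\sqrt{2}\delta \lambda t\) acts as an error floor that no choice of N can remove.

```python
import math

def total_error_bound(n_steps, lam, t, delta):
    """Right-hand side of Eq. (8): qDRIFT term plus truncation term."""
    return 2 * lam**2 * t**2 / n_steps + 2 * math.sqrt(2) * delta * lam * t

def steps_needed(eps, lam, t, delta):
    """Smallest N for which the bound of Eq. (8) is at most eps."""
    floor = 2 * math.sqrt(2) * delta * lam * t      # error that remains as N grows
    if eps <= floor:
        raise ValueError("target error is below the truncation floor")
    return math.ceil(2 * lam**2 * t**2 / (eps - floor))
```

For instance, with λ = 2, t = 1, δ = 10⁻³ and a target error of 0.1, the floor is about 0.0057 and `steps_needed` returns the first N at which the bound drops below the target.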

Expected cost

Let the cost of implementing the unitary \({e}^{it{w}_{j}{L}_{j}/N}\) be \({c}_{j}\). Cost can be defined in many ways, such as the total number of gates, the number of non-Clifford gates like T or Toffoli gates, or the number of multi-qubit gates like CNOT. In our paper we focus mainly on the number of non-Clifford gates. Let \({{{{\mathcal{C}}}}}_{N}\) be the random variable denoting the cost per repetition of our protocol. Then the expected cost and the variance per repetition are as follows.

$${\mathbb{E}}[{{{{\mathcal{C}}}}}_{N}]=\mathop{\sum }\limits_{j=1}^{M}{p}_{j}{c}_{j}=\frac{1}{\lambda }\mathop{\sum }\limits_{j=1}^{M}{w}_{j}{c}_{j}={\mu }_{N}\quad {{{\rm{and}}}}\quad \,{{\mbox{Var}}}\,[{{{{\mathcal{C}}}}}_{N}]=\frac{1}{{\lambda }^{2}}\left(\lambda \mathop{\sum }\limits_{j=1}^{M}{w}_{j}{c}_{j}^{2}-{\left(\mathop{\sum }\limits_{j = 1}^{M}{w}_{j}{c}_{j}\right)}^{2}\right)={\sigma }_{N}^{2}$$
(9)

By Chebyshev’s inequality (Supplementary Note 1) we have the following for some real number k > 0.

$$\Pr \left[| {{{{\mathcal{C}}}}}_{N}-{\mu }_{N}| \ge k{\sigma }_{N}\right]\le \frac{1}{{k}^{2}}$$
(10)

The cost per repetition of our protocol is a bounded variable i.e. \(a\le {{{{\mathcal{C}}}}}_{N}\le b\), for some real numbers a, b. If \({{{\mathcal{C}}}}\) is the variable denoting the cost of all repetitions of our protocol, then

$${\mathbb{E}}[{{{\mathcal{C}}}}]=N{\mu }_{N}$$
(11)

and since each repetition is independent, the corresponding per-repetition cost variables are independent and identically distributed, so we can apply Hoeffding’s inequality (Supplementary Note 1) and obtain the following.

$$\Pr \left[\left\vert {{{\mathcal{C}}}}-N{\mu }_{N}\right\vert \ge cN{\mu }_{N}\right]\le 2\exp \left(-\frac{2{c}^{2}{N}^{2}{\mu }_{N}^{2}}{N{(b-a)}^{2}}\right)=2\exp \left(-\frac{2{c}^{2}N{\mu }_{N}^{2}}{{(b-a)}^{2}}\right)={\epsilon }_{c}\qquad [c > 0]$$
(12)

Thus with probability at least \(1-{\epsilon }_{c}\), the cost of all repetitions of the protocol is at most \(\frac{(c+1)N}{\lambda }\mathop{\sum }\nolimits_{j = 1}^{M}{w}_{j}{c}_{j}\), where solving Eq. (12) for c gives \(c=\frac{b-a}{{\mu }_{N}\sqrt{2N}}\sqrt{\log \left(\frac{2}{{\epsilon }_{c}}\right)}\).
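The cost statistics of Eqs. (9)–(12) are straightforward to evaluate for a given set of weights and per-term costs. The sketch below is illustrative only: the function names are hypothetical, the bounds a and b are taken to be the smallest and largest per-term costs, and c is obtained by solving \({\epsilon }_{c}=2\exp (-2{c}^{2}N{\mu }_{N}^{2}/{(b-a)}^{2})\) from Eq. (12), which gives \(c=\frac{b-a}{{\mu }_{N}\sqrt{2N}}\sqrt{\log (2/{\epsilon }_{c})}\).

```python
import math

def cost_statistics(weights, costs):
    """Mean and variance of the per-repetition cost, as in Eq. (9)."""
    lam = sum(weights)
    probs = [w / lam for w in weights]                 # p_j = w_j / lambda
    mean = sum(p * c for p, c in zip(probs, costs))
    var = sum(p * c * c for p, c in zip(probs, costs)) - mean**2
    return mean, var

def hoeffding_cost_bound(weights, costs, n_steps, fail_prob):
    """Total cost not exceeded with probability >= 1 - fail_prob, from Eq. (12)."""
    mean, _ = cost_statistics(weights, costs)
    a, b = min(costs), max(costs)                      # assumed range of the cost variable
    c = (b - a) / (mean * math.sqrt(2 * n_steps)) * math.sqrt(math.log(2 / fail_prob))
    return (1 + c) * n_steps * mean
```

With two equally weighted terms of cost 1 and 3, the mean is 2 and the variance is 1; for N = 100 and failure probability 0.05 the high-probability bound lies between the expected total cost 200 and the worst case 300.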

Error in simulation while sampling multiple Paulis

We consider the qDRIFT protocol43 for simulating Hamiltonians. If \(H={\sum }_{j}{h}_{j}{H}_{j}\), then in each iteration we sample \({H}_{j}\) with probability proportional to \({h}_{j}\) and then simulate it for a short time period. Now \({H}_{j}\) can be a single Pauli operator or a sum of commuting Paulis, as is achieved in Algorithm 1, to optimize the cost of simulation. Here we derive a bound on the difference in simulation error for these two cases.

Let \({H}_{j}=\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{P}_{{i}_{j}}\) be a sum over commuting Paulis and let M be the total number of Pauli operators. So the Hamiltonian can be written as \(H=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}{H}_{j}=\mathop{\sum }\nolimits_{j = 1}^{L}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{h}_{j}{P}_{{i}_{j}}\). We assume the most general case, where a single Pauli can be shared between multiple commuting groups \({H}_{j}\).

In the first case, a group of commuting Paulis, i.e. one of the \({H}_{j}\), is selected independently with probability \({q}_{j}=\frac{{h}_{j}}{{\sum }_{j}{h}_{j}}\). In the second case, one single Pauli operator \({P}_{k}\) is sampled independently with probability \({p}_{k}^{{\prime} }=\frac{{\sum }_{{j}^{{\prime} }}{h}_{{j}^{{\prime} }}}{{\sum }_{i}{h}_{i}{L}_{i}}\), where in the numerator the sum is over all the commuting Pauli groups in which \({P}_{k}\) appears. Let \(\lambda ={\sum }_{j}{h}_{j}\) and \({\lambda }^{{\prime} }={\sum }_{j}{h}_{j}{L}_{j}\). We define the Liouvillians that generate the unitary evolution under the Hamiltonians \({H}_{j}\) and \({P}_{{i}_{j}}\) so that

$${{{{\mathcal{L}}}}}_{j}=i({H}_{j}\rho -\rho {H}_{j})\quad {{{\rm{and}}}}\quad {{{{\mathcal{L}}}}}_{{i}_{j}}=i({P}_{{i}_{j}}\rho -\rho {P}_{{i}_{j}}).$$
(13)

Thus if \({{{\mathcal{L}}}}=i(H\rho -\rho H)\), then \({{{\mathcal{L}}}}=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}{{{{\mathcal{L}}}}}_{j}=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{{{{\mathcal{L}}}}}_{{i}_{j}}\). We define two channels \({{{{\mathcal{E}}}}}_{1}=\mathop{\sum }\nolimits_{j = 1}^{L}{q}_{j}{e}^{\tau {{{{\mathcal{L}}}}}_{j}}\) and \({{{{\mathcal{E}}}}}_{2}=\mathop{\sum }\nolimits_{j = 1}^{L}{p}_{j}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{e}^{{\tau }^{{\prime} }{{{{\mathcal{L}}}}}_{{i}_{j}}}\), where \({p}_{j}=\frac{{h}_{j}}{{\lambda }^{{\prime} }}\), that evolve under the superoperators \({{{{\mathcal{L}}}}}_{j}\) and \({{{{\mathcal{L}}}}}_{{i}_{j}}\) for time intervals \(\tau =\frac{\lambda t}{N}\) and \({\tau }^{{\prime} }=\frac{{\lambda }^{{\prime} }t}{N}\) respectively. Here we note that for the second channel, for each Pauli \({P}_{k}\), we have expanded the sum \({p}_{k}^{{\prime} }={\sum }_{{j}^{{\prime} }}\frac{{h}_{{j}^{{\prime} }}}{{\lambda }^{{\prime} }}\) to reflect the commuting groups to which it belongs. Thus \(\mathop{\sum }\nolimits_{k = 1}^{M}{p}_{k}^{{\prime} }=\mathop{\sum }\nolimits_{j = 1}^{L}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{p}_{j}\). Then we can prove the following.

Lemma 2.2

The distance between the qDRIFT channel with single and grouped Hamiltonian terms for simulation time t using N time steps obeys

$$\parallel {{{{\mathcal{E}}}}}_{2}-{{{{\mathcal{E}}}}}_{1}{\parallel }_{\Diamond}\le \frac{4{t}^{2}{\lambda }^{{\prime} 2}}{{N}^{2}}$$

The proof has been given in Supplementary Method 4 (Lemma 7).
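The two sampling distributions compared in Lemma 2.2 can be made concrete with a short sketch. The data layout (a list of (weight, Pauli labels) pairs), the function names and the example groups used in the note below are our own illustrative assumptions; the Pauli labels are treated as opaque strings and commutation within a group is taken for granted.

```python
def sampling_probabilities(groups):
    """Return (q, p_prime, lam, lam_prime) for grouped and single-Pauli sampling.

    groups: list of (h_j, [Pauli labels in H_j]); a label may occur in several groups.
    q[j]           = h_j / lambda                      (sample a whole group)
    p_prime[label] = sum of h_j over groups containing the label, over lambda'
    """
    lam = sum(h for h, _ in groups)
    lam_prime = sum(h * len(paulis) for h, paulis in groups)
    q = [h / lam for h, _ in groups]
    p_prime = {}
    for h, paulis in groups:
        for label in paulis:
            p_prime[label] = p_prime.get(label, 0.0) + h / lam_prime
    return q, p_prime, lam, lam_prime

def lemma_2_2_bound(t, lam_prime, n_steps):
    """Right-hand side of Lemma 2.2: 4 t^2 lambda'^2 / N^2."""
    return 4 * t**2 * lam_prime**2 / n_steps**2
```

As an example, for the groups (0.5, {XX, YY, ZZ}) and (0.25, {ZZ, ZI}) we get λ = 0.75 and λ′ = 2, and the shared term ZZ is sampled with probability 0.375 in the single-Pauli scheme; both distributions sum to one.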

Optimized circuits for quantum chemistry

In this section we review quantum algorithms for quantum chemistry and design efficient circuits that are useful for quantum chemistry simulation within the Trotter–Suzuki formalism. The electronic structure problem has emerged as a central application of quantum computers in recent years, with quantum algorithms providing potential exponential speedups relative to the best known classical algorithms50,53. More specifically, the electronic structure problem is to find, for a fixed set of nuclear positions and a fixed number of electrons, the configuration of electrons that minimizes the total energy. The properties of materials, molecules and atoms at low temperatures emerge from these energies. In the non-relativistic case, the dynamics of these electrons are governed by the Coulomb Hamiltonian

$$H=-\mathop{\sum}\limits_{i}\frac{{\nabla }_{i}^{2}}{2}-\mathop{\sum}\limits_{i,j}\frac{{\zeta }_{j}}{| {R}_{j}-{r}_{i}| }+\mathop{\sum}\limits_{i < j}\frac{1}{| {r}_{i}-{r}_{j}| }+\mathop{\sum}\limits_{i < j}\frac{{\zeta }_{i}{\zeta }_{j}}{| {R}_{i}-{R}_{j}| }$$

where we have used atomic units, ri represent the positions of electrons, Ri represent the positions of nuclei, and ζi are the charges of nuclei.

Following the strategy outlined in ref. 13, we select the second quantization and discretize the Hamiltonian by representing it within some canonical basis such as a Gaussian basis or a planewave basis. Under the above assumptions, the electronic Hamiltonian can be represented in terms of creation and annihilation operators as follows64,65. Each spin orbital is assigned a (distinct) qubit where the state \(\left\vert 1\right\rangle\) corresponds to an occupied orbital and \(\left\vert 0\right\rangle\) to an unoccupied orbital. Specifically, let \({a}_{p}^{{\dagger} }\) and \({a}_{p}\) be the fermionic raising and lowering operators acting on spin-orbital p, satisfying the anti-commutation relations \(\{{a}_{p}^{{\dagger} },{a}_{q}\}={\delta }_{pq}\) and \(\{{a}_{p},{a}_{q}\}=\{{a}_{p}^{{\dagger} },{a}_{q}^{{\dagger} }\}=0\). Then

$$H=\mathop{\sum}\limits_{p,q}{h}_{pq}{a}_{p}^{{\dagger} }{a}_{q}+\frac{1}{2}\mathop{\sum}\limits_{p,q,r,s}{h}_{pqrs}{a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}$$
(14)

where the coefficients \({h}_{pq}\), \({h}_{pqrs}\) are determined by the discrete basis set chosen, and the sums run over the number of discretization elements or basis set for a single particle. From inspection, we can see that the number of terms in Equation (14) is \(O({N}^{4})\), where N is the size of the discrete representation. The molecular orbitals are one widely used basis set. These, in turn, can be expressed as linear combinations of atomic basis functions66,67. The coefficients of this expansion are obtained by solving the set of Hartree–Fock equations that arise from the variational minimization of the energy using a single determinant wave function. Thus in this representation the locations of the (indistinguishable) electrons are specified by the occupations of the discrete basis.

The Jordan–Wigner45 or Bravyi–Kitaev46 transformations are commonly used to convert the fermionic creation and annihilation operators into Pauli operators. For example, within the Jordan–Wigner encoding, \({a}_{p}\) and \({a}_{p}^{{\dagger} }\) can be written in terms of qubit operators as follows.

$${a}_{p}={Q}_{(p)}\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}=\frac{1}{2}({X}_{(p)}+i{Y}_{(p)})\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}\quad {{{\rm{and}}}}\quad {a}_{p}^{{\dagger} }={Q}_{(p)}^{{\dagger} }\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}=\frac{1}{2}({X}_{(p)}-i{Y}_{(p)})\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}$$

Here \({Q}_{(p)}^{{\dagger} }=\frac{1}{2}({X}_{(p)}-i{Y}_{(p)})\) and \({Q}_{(p)}=\frac{1}{2}({X}_{(p)}+i{Y}_{(p)})\) are the qubit creation and annihilation operators respectively. The product \({\prod }_{j}{Z}_{(j)}\) acts as an exchange-phase factor, accounting for the anti-commutation relations of \({a}_{p}^{{\dagger} }\) and \({a}_{p}\).
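The Jordan–Wigner encoding above can be checked numerically on a few qubits. The following is a minimal sketch, assuming NumPy and dense matrices; the helper names `kron_all`, `jw_lowering` and `anticomm` are hypothetical and introduced only for illustration. It builds the image of each lowering operator and confirms the fermionic anti-commutation relations.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def kron_all(ops):
    """Tensor product of a list of single-qubit operators (qubit 0 first)."""
    return reduce(np.kron, ops)


def jw_lowering(p, n):
    """Jordan-Wigner image of a_p on n qubits: Z on qubits 0..p-1, Q on qubit p."""
    Q = 0.5 * (X + 1j * Y)  # equals |0><1|: maps occupied |1> to unoccupied |0>
    return kron_all([Z] * p + [Q] + [I2] * (n - p - 1))


def anticomm(A, B):
    """Anti-commutator {A, B}."""
    return A @ B + B @ A


n = 3  # small illustrative register size (an assumption)
a = [jw_lowering(p, n) for p in range(n)]
dim = 2 ** n
for p in range(n):
    for q in range(n):
        # {a_p^dagger, a_q} = delta_pq  and  {a_p, a_q} = 0
        assert np.allclose(anticomm(a[p].conj().T, a[q]), (p == q) * np.eye(dim))
        assert np.allclose(anticomm(a[p], a[q]), np.zeros((dim, dim)))
```

Without the Z string the operators on different qubits would commute rather than anti-commute, which is why the exchange-phase factor is needed.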

With these tools in place, the first-order Trotter–Suzuki approximation reads

$${e}^{-iHt}=\mathop{\prod}\limits_{p,q}{e}^{-it({h}_{pq}{a}_{p}^{{\dagger} }{a}_{q}+{h}_{qp}{a}_{q}^{{\dagger} }{a}_{p})/2}\mathop{\prod}\limits_{p,q,r,s}{e}^{-it({h}_{pqrs}{a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}+{h}_{srqp}{a}_{s}^{{\dagger} }{a}_{r}^{{\dagger} }{a}_{q}{a}_{p})/4}+O({t}^{2})$$
(15)

Such a simulation can then be performed by substituting the Pauli representation given by the Jordan–Wigner transformation. Higher-order versions of this formula are also known63 and achieve error scaling \(O({t}^{2k+1})\); we do not focus on them here, since the optimizations of the operator exponentials that we consider apply equally in all such cases.
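The \(O({t}^{2})\) error of a product of term-wise exponentials, as in Equation (15), can be observed on a toy example. The sketch below assumes NumPy and a hypothetical two-term Hamiltonian built from single-qubit Paulis, not a chemistry Hamiltonian; the names `evolve` and `product_formula_error` are illustrative. Halving t should reduce the error by a factor close to 4.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def evolve(H, t):
    """exp(-i H t) for a Hermitian matrix H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T


def product_formula_error(A, B, t):
    """Spectral-norm distance between exp(-i(A+B)t) and exp(-iAt) exp(-iBt)."""
    exact = evolve(A + B, t)
    split = evolve(A, t) @ evolve(B, t)
    return np.linalg.norm(exact - split, 2)


# Toy Hamiltonian with two non-commuting terms (an assumption for illustration).
A, B = 0.7 * X, 0.4 * Z
err_t = product_formula_error(A, B, 1e-2)
err_half = product_formula_error(A, B, 5e-3)
ratio = err_t / err_half  # close to 4 when the error scales as t^2
print(f"error(t)={err_t:.3e}, error(t/2)={err_half:.3e}, ratio={ratio:.2f}")
```

If the two terms commuted, the split would be exact and the error would vanish, which is the reason for grouping commuting Paulis in the constructions that follow.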

Optimizing two-body operator exponentials

The two-body terms are the most common, and often the most significant, contribution to the complexity of a simulation of the Coulomb Hamiltonian in second quantization10. To reduce this dominant cost for the simulation of chemistry, in this section we consider the general two-body double excitation terms, which, when expressed using the Jordan–Wigner transformation, can be written as a product of X, Y, Z operators as follows53. For convenience, we have dropped the parentheses in the subscripts.

$${h}_{pqrs}{a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}+{h}_{srqp}{a}_{s}^{{\dagger} }{a}_{r}^{{\dagger} }{a}_{q}{a}_{p}=\left(\mathop{\bigotimes }\limits_{k=s+1}^{r-1}{Z}_{k}\right)\left(\mathop{\bigotimes }\limits_{k=q+1}^{p-1}{Z}_{k}\right)\left(\frac{\Re \{{h}_{pqrs}\}}{8}{H}_{r}+\frac{\Im \{{h}_{pqrs}\}}{8}{H}_{i}\right)$$
(16)
$$\begin{array}{l}\,{{\mbox{where}}}\,{H}_{r}\,=\,{X}_{s}{X}_{r}{X}_{q}{X}_{p}-{X}_{s}{X}_{r}{Y}_{q}{Y}_{p}+{X}_{s}{Y}_{r}{X}_{q}{Y}_{p}+{Y}_{s}{X}_{r}{X}_{q}{Y}_{p}\\ \qquad \qquad \qquad+\,{Y}_{s}{X}_{r}{Y}_{q}{X}_{p}-{Y}_{s}{Y}_{r}{X}_{q}{X}_{p}+{X}_{s}{Y}_{r}{Y}_{q}{X}_{p}+{Y}_{s}{Y}_{r}{Y}_{q}{Y}_{p}\\ \quad{{\mbox{and}}}\,{H}_{i}\,=\,{Y}_{s}{X}_{r}{X}_{q}{X}_{p}+{X}_{s}{Y}_{r}{X}_{q}{X}_{p}-{X}_{s}{X}_{r}{Y}_{q}{Y}_{p}-{X}_{s}{Y}_{r}{Y}_{q}{Y}_{p}\\ \qquad \qquad \qquad-\,{Y}_{s}{X}_{r}{Y}_{q}{Y}_{p}+{Y}_{s}{Y}_{r}{X}_{q}{X}_{p}+{Y}_{s}{Y}_{r}{X}_{q}{Y}_{p}+{Y}_{s}{Y}_{r}{Y}_{q}{X}_{p}\end{array}$$
(17)

Note that if a Gaussian orbital basis is chosen then the values of hpqrs are typically real, so that the Hi contribution in Equation (16) vanishes. We will assume in the remainder of the discussion that such terms are zero and focus our attention on only the real part of the Hamiltonian.

If we define \({h}_{1}=({h}_{pqrs}{\delta }_{{X}_{p}{X}_{s}}{\delta }_{{X}_{q}{X}_{r}}-{h}_{qprs}{\delta }_{{X}_{p}{X}_{r}}{\delta }_{{X}_{q}{X}_{s}})\), \({h}_{2}=({h}_{psqr}{\delta }_{{X}_{p}{X}_{r}}{\delta }_{{X}_{q}{X}_{s}}-{h}_{spqr}{\delta }_{{X}_{p}{X}_{q}}{\delta }_{{X}_{r}{X}_{s}})\) and \({h}_{3}=({h}_{prsq}{\delta }_{{X}_{p}{X}_{q}}{\delta }_{{X}_{r}{X}_{s}}-{h}_{prqs}{\delta }_{{X}_{p}{X}_{s}}{\delta }_{{X}_{q}{X}_{r}})\), for distinct p, q, r, s then we have the following53.

$$\begin{array}{lll}&&\frac{1}{2}\mathop{\sum}\limits_{p,q,r,s}{h}_{pqrs}\left({a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}+{a}_{s}^{{\dagger} }{a}_{r}^{{\dagger} }{a}_{q}{a}_{p}\right)\\ &=&\frac{1}{8}\left(\mathop{\bigotimes }\limits_{k=p+1}^{q-1}\mathop{\bigotimes }\limits_{k=r+1}^{s-1}{Z}_{k}\right)\left(({X}_{p}{X}_{q}{X}_{r}{X}_{s}+{Y}_{p}{Y}_{q}{Y}_{r}{Y}_{s})(-{h}_{1}-{h}_{2}+{h}_{3})\right.\\ &&+({X}_{p}{X}_{q}{Y}_{r}{Y}_{s}+{Y}_{p}{Y}_{q}{X}_{r}{X}_{s})({h}_{1}-{h}_{2}+{h}_{3})+({Y}_{p}{X}_{q}{Y}_{r}{X}_{s}+{X}_{p}{Y}_{q}{X}_{r}{Y}_{s})(-{h}_{1}-{h}_{2}-{h}_{3})\\ &&+\left.({Y}_{p}{X}_{q}{X}_{r}{Y}_{s}+{X}_{p}{Y}_{q}{Y}_{r}{X}_{s})(-{h}_{1}+{h}_{2}+{h}_{3})\right)\end{array}$$
(18)

Thus, conventionally, the part of the Hamiltonian that can be expressed in the form of Equation (16) is broken down into groups of at most 8 commuting operators that act on the qubits in question. Each term is diagonalized by a Clifford circuit and the evolution is then performed with Rz gates. In50 the authors diagonalize all 8 terms in their simultaneous eigenbasis and parallelize all 8 Rz gates. This reduces the number of Clifford gates and the depth, but comes at the cost of 4 extra ancillae. Excluding the diagonalizing circuits on both sides, they use 32 CNOT and 8 Rz gates. Each diagonalizing circuit uses 3 CNOT gates and 1 H gate. Our goal in this section is to design more efficient quantum circuits for the double excitation terms. In Table 1 we compare the gate costs of the circuit in50 with those of the circuits we derive in each of the 3 cases we consider. Fermionic SWAP gates55,68 can be used to make the orbitals neighboring and hence get rid of the tensor product of Z terms, so from here on we ignore these terms.

Let q1, q2, q3, q4 be the qubits to which the fermions in the orbitals p, q, r, s are mapped, respectively. We follow the technique used in50. W = CNOT(1, 2)CNOT(1, 3)CNOT(1, 4)H(1) is the unitary diagonalizing the 8 terms in the simultaneous eigenbasis. We rewrite the Hamiltonian H with general coefficients \({a}_{0},\ldots ,{a}_{7}\in {\mathbb{R}}\). Unless mentioned otherwise, the leftmost operator acts on qubit q1, the next ones on q2 and q3, and the rightmost on qubit q4.

$$\begin{array}{ll}H\,=\,{a}_{0}XXXX+{a}_{1}YYXX+{a}_{2}YXYX+{a}_{3}YXXY\\ \qquad\quad+\,{a}_{4}XYYX+{a}_{5}XYXY+{a}_{6}XXYY+{a}_{7}YYYY\end{array}$$
(19)

Then following the arguments in50 we have the following.

$$\begin{array}{ll}{e}^{-iHt}\,=\,W\left({e}^{-i{a}_{0}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{2}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{a}_{3}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{i{a}_{4}ZZZ{\mathbb{I}}t}{e}^{i{a}_{5}ZZ{\mathbb{I}}Zt}{e}^{i{a}_{6}Z{\mathbb{I}}ZZt}{e}^{-i{a}_{7}ZZZZt}\right){W}^{{\dagger} }\\ \qquad\quad=\,W{e}^{i(-{a}_{0}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}+{a}_{2}Z{\mathbb{I}}Z{\mathbb{I}}+{a}_{3}Z{\mathbb{I}}{\mathbb{I}}Z+{a}_{4}ZZZ{\mathbb{I}}+{a}_{5}ZZ{\mathbb{I}}Z+{a}_{6}Z{\mathbb{I}}ZZ-{a}_{7}ZZZZ)t}{W}^{{\dagger} }\end{array}$$
(20)
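The diagonalization behind Equation (20) can be verified directly with small dense matrices. The sketch below assumes NumPy; the helpers `pauli` and `cnot` and the table `TERMS` are hypothetical names used only for this check. It confirms that each Pauli string of Equation (19) equals plus or minus W applied to a Z-string, with the signs that fix whether the exponent in Equation (20) carries −i or +i.

```python
import numpy as np
from functools import reduce

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HAD = (PAULI["X"] + PAULI["Z"]) / np.sqrt(2)


def pauli(label):
    """Tensor product for a label such as 'YYXX'; leftmost letter acts on q1."""
    return reduce(np.kron, [PAULI[c] for c in label])


def cnot(control, target, n=4):
    """CNOT on n qubits, numbered from 1 with q1 the leftmost tensor factor."""
    keep = [PAULI["I"]] * n
    flip = [PAULI["I"]] * n
    keep[control - 1] = np.diag([1, 0]).astype(complex)  # control in |0>
    flip[control - 1] = np.diag([0, 1]).astype(complex)  # control in |1>
    flip[target - 1] = PAULI["X"]
    return reduce(np.kron, keep) + reduce(np.kron, flip)


# W = CNOT(1,2) CNOT(1,3) CNOT(1,4) H(1); the rightmost factor is applied first.
W = cnot(1, 2) @ cnot(1, 3) @ cnot(1, 4) @ np.kron(HAD, np.eye(8))

# (Pauli term, Z-string, sign s) such that  Pauli = s * W Zstring W^dagger.
TERMS = [
    ("XXXX", "ZIII", +1), ("YYXX", "ZZII", -1), ("YXYX", "ZIZI", -1),
    ("YXXY", "ZIIZ", -1), ("XYYX", "ZZZI", -1), ("XYXY", "ZZIZ", -1),
    ("XXYY", "ZIZZ", -1), ("YYYY", "ZZZZ", +1),
]
diagonalized = [
    np.allclose(pauli(p), s * W @ pauli(z) @ W.conj().T) for p, z, s in TERMS
]
print(diagonalized)
```

A sign of +1 corresponds to a factor with −i in the exponent (the XXXX and YYYY terms), and a sign of −1 to a factor with +i.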

The terms in between W and \({W}^{{\dagger} }\) add an overall phase ϕ. We denote the state of the qubits q1, …, q4 after application of W by variables x1, …, x4 respectively. It is sufficient to analyse the phase when the state is in the standard basis. Consider \({e}^{-i{a}_{0}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}\): this term contributes a phase of −a0t if x1 = 0 and a0t if x1 = 1. Similarly, \({e}^{i{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}t}\) contributes a phase of a1t if x1 ⊕ x2 = 0 and −a1t otherwise. Thus we obtain the following expression for the overall phase.

$$\begin{array}{rcl}\phi &=&\left(-{(-1)}^{{x}_{1}}{a}_{0}+{(-1)}^{{x}_{1}\oplus {x}_{2}}{a}_{1}+{(-1)}^{{x}_{1}\oplus {x}_{3}}{a}_{2}+{(-1)}^{{x}_{1}\oplus {x}_{4}}{a}_{3}+{(-1)}^{{x}_{1}\oplus {x}_{2}\oplus {x}_{3}}{a}_{4}+{(-1)}^{{x}_{1}\oplus {x}_{2}\oplus {x}_{4}}{a}_{5}\right.\\ &&\left.+\,{(-1)}^{{x}_{1}\oplus {x}_{3}\oplus {x}_{4}}{a}_{6}-{(-1)}^{{x}_{1}\oplus {x}_{2}\oplus {x}_{3}\oplus {x}_{4}}{a}_{7}\right)t\end{array}$$
(21)

Different values of a0, …, a7 give different values of the overall phase and hence different circuits. We consider the following three cases. It is easy to see that \({\phi }_{{x}_{1} = 1}=-{\phi }_{{x}_{1} = 0}\), so in all the cases below it is sufficient to calculate the phase while setting x1 = 0.

Case I

Let a1t = a6t = −θ and the remaining a0t = a2t = … = a5t = a7t = θ. Then we can verify that ϕ = 8θ if x1 = 1, x2 = 0, x3 = x4 = 1, ϕ = −8θ if x1 = 0, x2 = 0, x3 = x4 = 1 and ϕ = 0 for the remaining values of x1, …, x4. The quantum circuit simulating \({e}^{-iHt}\) is shown in Fig. 1a and b.

Fig. 1: Quantum circuit simulating \({e}^{-iHt}\).

(a) Circuit when a1t = a6t = −θ and a0t = a2t = … = a5t = a7t = θ. (b) Circuit in (a) with the multi-controlled rotation implemented with Toffoli and controlled rotation. (c) Circuit when a0t = … = a7t = θ. (d) Circuit when the coefficients are as in Equation (18). (e) Circuit in (d) with the multi-controlled rotations in the boxed section implemented with Toffolis and controlled rotations. (f) Circuit in (e) with some further reduction of the intermediate Toffoli gates.

Case II

Let a0t = … = a7t = θ. If x2 = 1 and x3 ⊕ x4 = 0 then ϕ = 0, and if x2 = 0 and x3 ⊕ x4 = 0 then \(\phi ={(-1)}^{{x}_{3}}4\theta\). When x3 ⊕ x4 = 1, \(\phi =-2\theta +{(-1)}^{{x}_{2}}2\theta\), which equals −4θ if x2 = 1 and 0 if x2 = 0. The quantum circuit simulating \({e}^{-iHt}\) is shown in Fig. 1c.

Case III

Let a0t = a7t = −h1 − h2 + h3, a1t = a6t = h1 − h2 + h3, a2t = a5t = −h1 − h2 − h3 and a3t = a4t = −h1 + h2 + h3, as shown in Equation (18). It can be verified that \({\phi }_{{x}_{2} = {x}_{3} = 1,{x}_{4} = 0}=8{h}_{2}\), \({\phi }_{{x}_{2} = 0,{x}_{3} = {x}_{4} = 1}=8{h}_{1}\) and \({\phi }_{{x}_{2} = {x}_{4} = 1,{x}_{3} = 0}=-8{h}_{3}\). For all other values of x2, x3, x4, ϕ = 0. The corresponding quantum circuit simulating \({e}^{-iHt}\) is shown in Fig. 1d, e and f.
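The phase values quoted in Cases I to III follow from evaluating Equation (21) on all 16 basis states. A minimal sketch in plain Python, with hypothetical helper names (`sign`, `phase`, `nonzero_phases`) and arbitrary integer test values for θ and h1, h2, h3 chosen only so that the three contributions are distinguishable:

```python
from itertools import product


def sign(*bits):
    """(-1) raised to the XOR of the given bits."""
    return -1 if sum(bits) % 2 else 1


def phase(x, at):
    """Overall phase of Eq. (21); x = (x1, x2, x3, x4), at = [a_0 t, ..., a_7 t]."""
    x1, x2, x3, x4 = x
    return (-sign(x1) * at[0] + sign(x1, x2) * at[1] + sign(x1, x3) * at[2]
            + sign(x1, x4) * at[3] + sign(x1, x2, x3) * at[4]
            + sign(x1, x2, x4) * at[5] + sign(x1, x3, x4) * at[6]
            - sign(x1, x2, x3, x4) * at[7])


def nonzero_phases(at):
    """Map each basis state with a non-zero phase to that phase."""
    return {x: phase(x, at) for x in product((0, 1), repeat=4) if phase(x, at) != 0}


theta = 1                    # phases are reported in units of theta
h1, h2, h3 = 1, 10, 100      # arbitrary distinct test values (assumption)
A, B = -h1 - h2 + h3, h1 - h2 + h3
C, D = -h1 - h2 - h3, -h1 + h2 + h3

cases = {
    "I": [theta, -theta, theta, theta, theta, theta, -theta, theta],
    "II": [theta] * 8,
    "III": [A, B, C, D, D, C, B, A],
}
for name, at in cases.items():
    print(name, nonzero_phases(at))
```

The number of distinct non-zero phases in each case indicates how many controlled rotations the corresponding circuit needs, which is the dependence on the coefficients exploited in Fig. 1.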

We already remarked that we can ignore the product of Z terms in Equations (16) and (18) by using fermionic SWAP gates. Now if we take two Hamiltonians of the form (19) that share some qubits, then we can obtain different Hamiltonians by rearranging the commuting Paulis. In the next few subsections we design circuits for the exponentials of these Hamiltonians. Throughout the following subsections, P0 = X, P1 = Y and \(\overline{i}=i+1\,{{\mathrm{mod}}}\,\,2\). Table 2 summarizes the number of non-Clifford gates used to implement the various circuits. All rotation gates with n (>1) controls can be decomposed into cRz (single control) and NOT with n controls, each of which can be decomposed into n − 1 Toffolis (as shown in Fig. 1b). In the ‘Introduction’ we discussed special gadgets that can be used to further reduce the T-count of the circuits. In Fig. 1e and f we show how Toffolis can be reduced in segments of the circuits. Our circuits have fewer gates (including Clifford gates) than those of50 or of approaches that synthesize a circuit for each exponentiated Pauli and then concatenate them. In fact, we show how the circuit size, or the Clifford and non-Clifford gate cost, depends on the coefficients of the commuting Paulis in the Hamiltonian expression.

Table 2 The first table shows the number of Rz, cRz and Toffoli (Toff.) pairs used to design the circuits implementing \({e}^{-i{H}^{{\prime} }t}\), where \({H}^{{\prime} }\) are the Hamiltonians (Ham.) described in Section ‘Results’.

Overlap on 1 qubit

Previously, we analysed the circuits for cases where many of the Hamiltonian coefficients follow regular patterns and saw that the cost of the simulation can be reduced through the use of these techniques. Here we pursue a more aggressive strategy, wherein we combine multiple commuting terms and find particular combinations of angles for which the simulation circuits are efficient. The results are summarized in Table 2. We first consider the case of overlap on 1 qubit, for which we can have the following sets of commuting Paulis.

$${G}_{1y}=\{{P}_{i}{P}_{j}{P}_{k}Y{\mathbb{I}}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Y{P}_{a}{P}_{b}{P}_{c}:i+j+k\equiv 1\,{{\mathrm{mod}}}\,\,2,a+b+c\equiv 1\,{{\mathrm{mod}}}\,\,2\}$$
(22)
$${G}_{1x}=\{{P}_{i}{P}_{j}{P}_{k}X{\mathbb{I}}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}X{P}_{a}{P}_{b}{P}_{c}:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2,a+b+c\equiv 0\,{{\mathrm{mod}}}\,\,2\}$$
(23)

Without loss of generality, we assume that the leftmost operator acts on qubit q1, the next one on q2 and so on, with the rightmost one acting on qubit q7. We denote a state vector as \(\left\vert {Q}_{1}v{Q}_{2}\right\rangle\) where \({Q}_{1}=\left\vert {q}_{1}{q}_{2}{q}_{3}\right\rangle\), \({Q}_{2}=\left\vert {q}_{5}{q}_{6}{q}_{7}\right\rangle\) and v, q1, …, q7 ∈ {0, 1}. We can have the following Hamiltonian terms, expressed as sums of commuting Paulis from the above two sets.

$$\begin{array}{rcl}{H}_{1y}&=&{a}_{3}YXXY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ &&+{b}_{1}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YYXX+{b}_{2}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YXYX+{b}_{3}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YXXY+{b}_{7}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YYYY\end{array}$$
(24)
$$\begin{array}{rcl}{H}_{1x}&=&{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{2}YXYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ &&+{b}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{4}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYYX+{b}_{5}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYXY+{b}_{6}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXYY\end{array}$$
(25)

Circuit for simulating \({e}^{-i{H}_{1y}t}\)

Let W1y be the unitary consisting of the following sequence of gates. The rightmost one is the first to be applied. With a slight abuse of notation we denote CNOT(4, 1)CNOT(4, 2)CNOT(4, 3) by CNOT(4, I) and CNOT(4, 5)CNOT(4, 6)CNOT(4, 7) by CNOT(4, II).

$${W}_{1y}=CNO{T}_{(4,I)}{H}_{(4)}{Z}_{(4)}CNO{T}_{(4,I)}CNO{T}_{(4,II)}{H}_{(4)}CNO{T}_{(4,I)}$$

In the following theorem we show that this is a diagonalizing circuit for the set of Paulis in G1y.

Theorem 2.1

For each \(i,j,k,a,b,c\in {{\mathbb{Z}}}_{2}\), such that \({P}_{i}{P}_{j}{P}_{k}Y{\mathbb{I}}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Y{P}_{a}{P}_{b}{P}_{c}\in {G}_{1y}\) we have the following.

$$\begin{array}{rcl}&&{\sqrt{-1}}^{i+j+k+1}{W}_{1y}\left({Z}_{(1)}^{i}{Z}_{(2)}^{j}{Z}_{(3)}^{k}{Z}_{(4)}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\right){W}_{1y}^{{\dagger} }={P}_{i}{P}_{j}{P}_{k}Y{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ \,{{\mbox{and}}}\,&&{\sqrt{-1}}^{a+b+c+1}{W}_{1y}\left({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}{Z}_{(4)}{Z}_{(5)}^{a}{Z}_{(6)}^{b}{Z}_{(7)}^{c}\right){W}_{1y}^{{\dagger} }={\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Y{P}_{a}{P}_{b}{P}_{c}\end{array}$$

We prove this theorem by showing that the operators on the LHS and RHS have equivalent actions on the eigenstates corresponding to an eigenbasis for the Paulis in G1y. The proof of this theorem has been given in Theorem 1 of Supplementary Method 1. The eigenbasis has been shown in Lemma 1 of Supplementary Method 1.
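Independently of the proof in the Supplementary Method, the statement of Theorem 2.1 can be checked by brute force on 7 qubits. The sketch below assumes NumPy and dense 128 × 128 matrices; all helper names (`on_qubit`, `cnot`, `fan_out`, `theorem_holds`) are hypothetical. It builds W1y from the gate sequence given above, with the rightmost gate applied first, and tests both identities for every triple of odd parity.

```python
import numpy as np
from functools import reduce
from itertools import product

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HAD = (PAULI["X"] + PAULI["Z"]) / np.sqrt(2)
N = 7  # qubits q1..q7, with q1 the leftmost tensor factor


def pauli(label):
    """Tensor product for a 7-letter label such as 'YXXYIII'."""
    return reduce(np.kron, [PAULI[c] for c in label])


def on_qubit(op, k):
    """Single-qubit operator op acting on qubit k (1-based)."""
    ops = [PAULI["I"]] * N
    ops[k - 1] = op
    return reduce(np.kron, ops)


def cnot(control, target):
    """CNOT as |0><0| (x) I + |1><1| (x) X on the chosen qubits."""
    p0 = np.diag([1, 0]).astype(complex)
    p1 = np.diag([0, 1]).astype(complex)
    return on_qubit(p0, control) + on_qubit(p1, control) @ on_qubit(PAULI["X"], target)


def fan_out(control, targets):
    """Product of CNOTs sharing one control; they commute, so order is irrelevant."""
    return reduce(np.matmul, [cnot(control, t) for t in targets])


C_I, C_II = fan_out(4, (1, 2, 3)), fan_out(4, (5, 6, 7))
H4, Z4 = on_qubit(HAD, 4), on_qubit(PAULI["Z"], 4)
W1y = C_I @ H4 @ Z4 @ C_I @ C_II @ H4 @ C_I  # rightmost factor is applied first


def theorem_holds(bits, left):
    """Check one identity for bits (i, j, k) if left, else (a, b, c)."""
    z3 = "".join("Z" if b else "I" for b in bits)
    p3 = "".join("Y" if b else "X" for b in bits)  # P_0 = X, P_1 = Y
    z_label, p_label = (z3 + "ZIII", p3 + "YIII") if left else ("IIIZ" + z3, "IIIY" + p3)
    lhs = (1j ** (sum(bits) + 1)) * (W1y @ pauli(z_label) @ W1y.conj().T)
    return np.allclose(lhs, pauli(p_label))


odd_triples = [b for b in product((0, 1), repeat=3) if sum(b) % 2 == 1]
results = {(b, left): theorem_holds(b, left) for b in odd_triples for left in (True, False)}
print(all(results.values()))
```

A dense-matrix check like this is only practical for a handful of qubits; tracking Pauli operators through the Clifford gates symbolically would scale better.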

Thus we have the following.

$$\begin{array}{rcl}{e}^{-i{H}_{1y}t}&=&{e}^{-i(-{a}_{3}{W}_{1y}(Z{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }-{a}_{5}{W}_{1y}({\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }-{a}_{6}{W}_{1y}({\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }+{a}_{7}{W}_{1y}(ZZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} })t}\\ &&\cdot {e}^{-i(-{b}_{1}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }-{b}_{2}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}){W}_{1y}^{{\dagger} }-{b}_{3}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Z){W}_{1y}^{{\dagger} }+{b}_{7}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZZZ){W}_{1y}^{{\dagger} })t}\\ &=&{W}_{1y}{e}^{i{a}_{3}Z{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{5}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{6}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{-i{a}_{7}ZZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{1}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{2}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{b}_{3}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{-i{b}_{7}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZZZt}{W}_{1y}^{{\dagger} }\end{array}$$

The state of the qubits q1, …, q7 after the application of W1y is denoted by variables x1, …, x7 respectively. We have the following expression for the overall phase incurred between W1y and \({W}_{1y}^{{\dagger} }\).

$$\begin{array}{rcl}\phi ={(-1)}^{{x}_{4}\oplus {x}_{1}}{a}_{3}t+{(-1)}^{{x}_{4}\oplus {x}_{2}}{a}_{5}t+{(-1)}^{{x}_{4}\oplus {x}_{3}}{a}_{6}t-{(-1)}^{{x}_{4}\oplus {x}_{1}\oplus {x}_{2}\oplus {x}_{3}}{a}_{7}t\\ +{(-1)}^{{x}_{4}\oplus {x}_{7}}{b}_{3}t+{(-1)}^{{x}_{4}\oplus {x}_{6}}{b}_{2}t+{(-1)}^{{x}_{4}\oplus {x}_{5}}{b}_{1}t-{(-1)}^{{x}_{4}\oplus {x}_{5}\oplus {x}_{6}\oplus {x}_{7}}{b}_{7}t\end{array}$$

It is easy to check that \({\phi }_{\overline{{x}_{4}}}=-{\phi }_{{x}_{4}}\). We consider the following three cases and it is sufficient to check the phase values when x4 = 0.

Case I

Let a6t = −θ1, b1t = −θ2, a3t = a5t = a7t = θ1 and b2t = b3t = b7t = θ2. Following the conventions used for Case I above, we evaluate the overall phase after the application of W1y. We can write ϕ = f1(θ1) + f2(θ2), for two functions f1 and f2. The following can be verified.

  1. If x1 = x2 = 0 and x3 = 1 then ϕ = 4θ1 + f2(θ2). Analogously, ϕ = f1(θ1) + 4θ2 when x7 = x6 = 0 and x5 = 1.

  2. If x1 = x2 = 1 and x3 = 0 then ϕ = −4θ1 + f2(θ2), and if x7 = x6 = 1 and x5 = 0 then ϕ = f1(θ1) − 4θ2.

  3. For any other values of x1, x2, x3, ϕ = f2(θ2) and analogously, for any other values of x7, x6, x5, ϕ = f1(θ1).

A quantum circuit simulating \({e}^{-i{H}_{1y}t}\) is shown in Fig. 2a.

Fig. 2: Quantum circuit for \({e}^{-i{H}_{1y}t}\) and \({e}^{-i{H}_{1x1}t}\).

(a–c) Circuit simulating \({e}^{-i{H}_{1y}t}\): (a) when a6t = −θ1, b1t = −θ2, a3t = a5t = a7t = θ1 and b2t = b3t = b7t = θ2; (b) when a0t = … = a7t = θ1 and b0t = … = b7t = θ2; (c) when the coefficients are as in Equation (18). (d–f) Circuit simulating \({e}^{-i{H}_{1x1}t}\): (d) when a1t = −θ1, b6t = −θ2, a0t = a2t = a4t = θ1 and b0t = b4t = b5t = θ2; (e) when a0t = … = a7t = θ1 and b0t = … = b7t = θ2; (f) when the coefficients are as in Equation (18).

Case II

Now we consider the case when a6t = a3t = a5t = a7t = θ1 and b1t = b2t = b3t = b7t = θ2. Here also ϕ can be written as a sum of two functions: ϕ = f1(θ1) + f2(θ2). We can make the following observations.

  1. If exactly one of x1, x2, x3 is 1 then ϕ = 2θ1 + f2(θ2) and analogously, if exactly one of x5, x6, x7 is 1 then ϕ = f1(θ1) + 2θ2.

  2. If exactly two of x1, x2, x3 are 1 then ϕ = −2θ1 + f2(θ2). Similarly, if exactly two of x5, x6, x7 are 1 then ϕ = f1(θ1) − 2θ2.

  3. If x1 = x2 = x3 = 0 then ϕ = 2θ1 + f2(θ2) and similarly, if x5 = x6 = x7 = 0 then ϕ = f1(θ1) + 2θ2.

  4. If x1 = x2 = x3 = 1 then ϕ = −2θ1 + f2(θ2) and analogously, if x5 = x6 = x7 = 1 then ϕ = f1(θ1) − 2θ2.

A circuit simulating \({e}^{-i{H}_{1y}t}\) in this case, has been shown in Fig. 2b.

Case III

Let a3t = −h1 + h2 + h3, a5t = −h1 − h2 − h3, a6t = h1 − h2 + h3, a7t = −h1 − h2 + h3 and b3t = −g1 + g2 + g3, b2t = −g1 − g2 − g3, b1t = g1 − g2 + g3, b7t = −g1 − g2 + g3 (Equation (18)). Let h = (h1, h2, h3) and g = (g1, g2, g3). We can write ϕ = f1(h) + f2(g). We can make the following observations.

  1. If x1 = x2 = x3 then ϕ = f2(g) and analogously, if x5 = x6 = x7 then ϕ = f1(h).

  2. Suppose xi = xj and xk ≠ xi, where i, j, k ∈ {1, 2, 3} are distinct. Then flipping the values changes the sign of f1(h). For example, if \({\phi }_{{x}_{1} = {x}_{2} = 0,{x}_{3} = 1}={f}_{1}({{{\bf{h}}}})+{f}_{2}({{{\bf{g}}}})\), then \({\phi }_{{x}_{1} = {x}_{2} = 1,{x}_{3} = 0}=-{f}_{1}({{{\bf{h}}}})+{f}_{2}({{{\bf{g}}}})\). A similar phenomenon occurs if i, j, k ∈ {7, 6, 5}, except that this time the sign of f2(g) flips. So it is sufficient to consider the case when two variables are 1.

    $$\begin{array}{rcl}{\phi }_{{x}_{1} = {x}_{2} = 1,{x}_{3} = 0}=4{h}_{1}+{f}_{2}({{{\bf{g}}}}),&&{\phi }_{{x}_{7} = {x}_{6} = 1,{x}_{5} = 0}={f}_{1}({{{\bf{h}}}})+4{g}_{1}\\ {\phi }_{{x}_{2} = {x}_{3} = 1,{x}_{1} = 0}=4{h}_{2}+{f}_{2}({{{\bf{g}}}}),&&{\phi }_{{x}_{6} = {x}_{5} = 1,{x}_{7} = 0}={f}_{1}({{{\bf{h}}}})+4{g}_{2}\\ {\phi }_{{x}_{3} = {x}_{1} = 1,{x}_{2} = 0}=-4{h}_{3}+{f}_{2}({{{\bf{g}}}}),&&{\phi }_{{x}_{5} = {x}_{7} = 1,{x}_{6} = 0}={f}_{1}({{{\bf{h}}}})-4{g}_{3}\end{array}$$

A circuit simulating \({e}^{-i{H}_{1y}t}\) in this case, has been shown in Fig. 2c.
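As a worked check of the first entry of the Case III array above, using only the phase expression for \({e}^{-i{H}_{1y}t}\) and the Case III coefficients: setting x4 = 0, x1 = x2 = 1 and x3 = 0 makes the signs of the a3, a5, a6 and a7 terms −, −, + and − respectively, so that

$$\begin{array}{rcl}{f}_{1}({{{\bf{h}}}})&=&-{a}_{3}t-{a}_{5}t+{a}_{6}t-{a}_{7}t\\ &=&({h}_{1}-{h}_{2}-{h}_{3})+({h}_{1}+{h}_{2}+{h}_{3})+({h}_{1}-{h}_{2}+{h}_{3})+({h}_{1}+{h}_{2}-{h}_{3})=4{h}_{1}\end{array}$$

The h2 and h3 contributions cancel pairwise. The remaining entries follow from the same substitution with the corresponding bit values.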

Circuit for simulating \({e}^{-i{H}_{1x}t}\)

An eigenbasis for the Paulis in G1x is given in Lemma 2 of Supplementary Method 1. However, we were unable to find (by hand) a unitary, analogous to W1y, that diagonalizes the set of commuting Paulis in G1x, as we did in the previous subsection for G1y. So we divide the commuting Paulis into two groups of 4-qubit Paulis, i.e. we consider the following two sets.

$$\begin{array}{ll}{G}_{1x1}\,=\,\{{P}_{i}{P}_{j}{P}_{k}X{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2\}\\ {G}_{1x2}\,=\,\{{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}X{P}_{a}{P}_{b}{P}_{c}:a+b+c\equiv 0\,{{\mathrm{mod}}}\,\,2\}\end{array}$$

and the following two Hamiltonians

$$\begin{array}{ll}{H}_{1x1}\,=\,{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{2}YXYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ {H}_{1x2}\,=\,{b}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{4}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYYX+{b}_{5}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYXY+{b}_{6}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXYY\end{array}$$

We can use the diagonalizing circuit of50 and have the following.

$$\begin{array}{ll}{e}^{-i{H}_{1x1}t}\,=\,{W}_{1x1}{e}^{-i{a}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{1}ZZ{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{2}Z{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{4}{\mathbb{I}}ZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{W}_{1x1}^{{\dagger} }\\ {e}^{-i{H}_{1x2}t}\,=\,{W}_{1x2}{e}^{-i{b}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{4}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZZ{\mathbb{I}}t}{e}^{i{b}_{5}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}Zt}{e}^{i{b}_{6}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}ZZt}{W}_{1x2}^{{\dagger} }\end{array}$$

where W1x1 = CNOT(4, 1)CNOT(4, 2)CNOT(4, 3)H(4) and W1x2 = CNOT(4, 5)CNOT(4, 6)CNOT(4, 7)H(4), where the rightmost gate is the first one to be applied. We denote the state of the qubits q1, …, q4 after the application of W1x1 by the variables x1, …, x4 respectively. Also, the variables \({x}_{4}^{{\prime} },\ldots ,{x}_{7}^{{\prime} }\) denote the state of the qubits q4, …, q7, respectively, after the application of W1x2. We have the following expression for the overall phase incurred between W1x1, \({W}_{1x1}^{{\dagger} }\) and between W1x2, \({W}_{1x2}^{{\dagger} }\).

$$\begin{array}{ll}{\phi }_{1}\,=\,-{(-1)}^{{x}_{4}}{a}_{0}t+{(-1)}^{{x}_{4}\oplus {x}_{1}\oplus {x}_{2}}{a}_{1}t+{(-1)}^{{x}_{4}\oplus {x}_{1}\oplus {x}_{3}}{a}_{2}t+{(-1)}^{{x}_{4}\oplus {x}_{2}\oplus {x}_{3}}{a}_{4}t\\ {\phi }_{2}\,=\,-{(-1)}^{{x}_{4}^{{\prime} }}{b}_{0}t+{(-1)}^{{x}_{4}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{6}^{{\prime} }}{b}_{4}t+{(-1)}^{{x}_{4}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{7}^{{\prime} }}{b}_{5}t+{(-1)}^{{x}_{4}^{{\prime} }\oplus {x}_{6}^{{\prime} }\oplus {x}_{7}^{{\prime} }}{b}_{6}t\end{array}$$

We consider the following three cases, in each of which \({\phi }_{1,\overline{{x}_{4}}}=-{\phi }_{1,{x}_{4}}\) and \({\phi }_{2,\overline{{x}_{4}^{{\prime} }}}=-{\phi }_{2,{x}_{4}^{{\prime} }}\). It is therefore sufficient to state the phase values for x4 = 0 and \({x}_{4}^{{\prime} }=0\).

Case I

Let a1t = − θ1, b6t = − θ2, a0t = a2t = a4t = θ1 and b0t = b4t = b5t = θ2. It is easy to verify that a non-zero phase ϕ1 = − 4θ1 exists if and only if x1 = x2 ≠ x3. Analogously, ϕ2 = − 4θ2 if \({x}_{7}^{{\prime} }={x}_{6}^{{\prime} }\ne {x}_{5}^{{\prime} }\), else it is 0.

Case II

Now we consider the case when a0t = a1t = a2t = a4t = θ1, b0t = b4t = b5t = b6t = θ2. If x1 = x2 = x3 then ϕ1 = 2θ1, else it is − 2θ1. Similarly for ϕ2.

Case III

Next we consider the case where a0t = −h1 − h2 + h3, a1t = h1 − h2 + h3, a2t = −h1 − h2 − h3, a4t = −h1 + h2 + h3, and b0t = −g1 − g2 + g3, b4t = −g1 + g2 + g3, b5t = −g1 − g2 − g3, b6t = g1 − g2 + g3. Here, a non-zero phase arises when exactly two of the three variables have the same value.

$$\begin{array}{rcl}{\phi }_{1}({x}_{1}={x}_{2}\,\ne\, {x}_{3})=4{h}_{1};&&{\phi }_{1}({x}_{2}={x}_{3}\,\ne\, {x}_{1})=4{h}_{2};\quad {\phi }_{1}({x}_{1}={x}_{3}\,\ne\, {x}_{2})=-4{h}_{3};\\ {\phi }_{2}({x}_{5}^{{\prime} }={x}_{6}^{{\prime} }\,\ne\, {x}_{7}^{{\prime} })=4{g}_{2};&&{\phi }_{2}({x}_{6}^{{\prime} }={x}_{7}^{{\prime} }\,\ne\, {x}_{5}^{{\prime} })=4{g}_{1};\quad {\phi }_{2}({x}_{5}^{{\prime} }={x}_{7}^{{\prime} }\,\ne\, {x}_{6}^{{\prime} })=-4{g}_{3};\end{array}$$

Circuits simulating \({e}^{-i{H}_{1x1}t}\) in Case I, II and III have been shown in Fig. 2d, e and f respectively. Circuits for \({e}^{-i{H}_{1x2}t}\) are similar. Circuit for \({e}^{-i{H}_{1x}t}\) in each case is obtained by concatenating the corresponding circuits.
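The phases ϕ1 for the three cases can also be read off the diagonal operator that appears between W1x1 and its adjoint. The sketch below assumes NumPy; the helpers `zstring` and `phi1` are hypothetical names, the register is restricted to qubits q1, …, q4, and the integer values standing in for θ1 and h1, h2, h3 are arbitrary test values.

```python
import numpy as np
from functools import reduce
from itertools import product

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])


def zstring(label):
    """Diagonal of a tensor product of I/Z factors; leftmost letter is q1."""
    return np.diag(reduce(np.kron, [Z if c == "Z" else I2 for c in label]))


def phi1(a0t, a1t, a2t, a4t):
    """Phase phi_1 for every basis state (x1, x2, x3, x4), read off the diagonal
    exponent -a0 Z4 + a1 Z1Z2Z4 + a2 Z1Z3Z4 + a4 Z2Z3Z4 (each multiplied by t)."""
    diag = (-a0t * zstring("IIIZ") + a1t * zstring("ZZIZ")
            + a2t * zstring("ZIZZ") + a4t * zstring("IZZZ"))
    # Basis state x1 x2 x3 x4 sits at index 8*x1 + 4*x2 + 2*x3 + x4.
    return {x: diag[int("".join(map(str, x)), 2)] for x in product((0, 1), repeat=4)}


theta1 = 1                   # phases in units of theta_1
h1, h2, h3 = 1, 10, 100      # arbitrary distinct test values (assumption)

case1 = phi1(theta1, -theta1, theta1, theta1)
case2 = phi1(theta1, theta1, theta1, theta1)
case3 = phi1(-h1 - h2 + h3, h1 - h2 + h3, -h1 - h2 - h3, -h1 + h2 + h3)

for x in product((0, 1), repeat=3):
    key = x + (0,)           # fix x4 = 0; x4 = 1 flips every sign
    print(x, case1[key], case2[key], case3[key])
```

The same function applies to ϕ2 after relabelling the qubits, since H1x2 has the same structure on q4, …, q7.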

Overlap on 2 qubits

In general, the more options that we have for grouping mutually commuting terms the more effective our compilation strategy will be. While the most natural case to examine is the case where all of the Hamiltonian terms act on disjoint sets of qubits, Hamiltonian terms can commute if they overlap on only two qubits as well. For example, we can have the following sets of commuting Pauli operations

$${G}_{21}=\{{P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}:i+j\equiv 1\,{{\mathrm{mod}}}\,\,2,k,l=i,j\,{{\mbox{or}}}\,\overline{i},\overline{j}\,\,{{\mbox{respectively}}}\,\}$$
(26)
$${G}_{20}=\{{P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}:i+j\equiv 0\,{{\mathrm{mod}}}\,\,2,k,l=i,j\,{{\mbox{or}}}\,\overline{i},\overline{j}\,\,{{\mbox{respectively}}}\,\}$$
(27)

Without loss of generality, we assume that the leftmost operator acts on qubit q1, next one on q2 and so on - rightmost one acts on qubit q6. We denote a state vector as \(\left\vert {Q}_{1}{Q}_{2}{Q}_{3}\right\rangle\) where \({Q}_{1}=\left\vert {q}_{1}{q}_{2}\right\rangle\), \({Q}_{2}=\left\vert {q}_{3}{q}_{4}\right\rangle\) and \({Q}_{3}=\left\vert {q}_{5}{q}_{6}\right\rangle\) are the first, second and third pairs of qubits respectively. We can have the following Hamiltonian terms, expressed as sums of commuting Paulis from the above two sets.

$$\begin{array}{ll}{H}_{21}\,=\,{a}_{2}YXYX{\mathbb{I}}{\mathbb{I}}+{a}_{3}YXXY{\mathbb{I}}{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}{\mathbb{I}}\\ \qquad\quad+\,{b}_{2}{\mathbb{I}}{\mathbb{I}}YXYX+{b}_{3}{\mathbb{I}}{\mathbb{I}}XYYX+{b}_{4}{\mathbb{I}}{\mathbb{I}}YXXY+{b}_{5}{\mathbb{I}}{\mathbb{I}}XYXY\end{array}$$
(28)
$$\begin{array}{ll}{H}_{20}\,=\,{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}{\mathbb{I}}\\ \qquad\quad+\,{b}_{0}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{6}{\mathbb{I}}{\mathbb{I}}YYXX+{b}_{1}{\mathbb{I}}{\mathbb{I}}XXYY+{b}_{7}{\mathbb{I}}{\mathbb{I}}YYYY\end{array}$$
(29)

Circuit for simulating \({e}^{-i{H}_{21}t}\)

As before, our simulation strategy involves diagonalizing the Hamiltonian using a Clifford circuit and then implementing the resulting diagonal evolution. Let W1 be the unitary consisting of the following sequence of gates. The rightmost one is the first to be applied.

$${W}_{1}=CNO{T}_{(3,1)}CNO{T}_{(3,4)}{H}_{(3)}{Z}_{(3)}CNO{T}_{(3,1)}CNO{T}_{(3,5)}{H}_{(3)}CNO{T}_{(3,1)}$$

The following theorem shows that this is a diagonalizing circuit for the set of Paulis in G21.

Theorem 2.2

For each \(i,j,k,l\in {{\mathbb{Z}}}_{2}\), such that \({P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}\in {G}_{21}\) we have the following.

$$\begin{array}{rcl}&&{\sqrt{-1}}^{i+j+k+l}{W}_{1}\left({Z}_{(1)}^{k}{Z}_{(2)}^{l}{Z}_{(3)}{Z}_{(4)}^{j}{\mathbb{I}}{\mathbb{I}}\right){W}_{1}^{{\dagger} }={P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}}\\ \,{{\mbox{and}}}\,&&{\sqrt{-1}}^{i+j+k+l}{W}_{1}\left({\mathbb{I}}{\mathbb{I}}{Z}_{(3)}{Z}_{(4)}^{j}{Z}_{(5)}^{k}{Z}_{(6)}^{l}\right){W}_{1}^{{\dagger} }={\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}\end{array}$$

The proof is similar to Theorem 2.1 and has been given in Supplementary Method 2. Theorem 2.2 then gives us the following.

$$\begin{array}{ll}{e}^{-i{H}_{21}t}\,=\,{e}^{-i(-{a}_{2}{W}_{1}(Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} }-{a}_{3}{W}_{1}(Z{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} }-{a}_{4}{W}_{1}({\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} }-{a}_{5}{W}_{1}({\mathbb{I}}ZZZ{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} })t}\\ \qquad\qquad\,\cdot {e}^{-i(-{b}_{2}{W}_{1}({\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}){W}_{1}^{{\dagger} }-{b}_{3}{W}_{1}({\mathbb{I}}{\mathbb{I}}ZZZ{\mathbb{I}}){W}_{1}^{{\dagger} }-{b}_{4}{W}_{1}({\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Z){W}_{1}^{{\dagger} }-{b}_{5}{W}_{1}({\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}Z){W}_{1}^{{\dagger} })t}\\ \qquad\quad=\,{W}_{1}{e}^{i{a}_{2}Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{3}Z{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{4}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{5}{\mathbb{I}}ZZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{2}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{b}_{3}{\mathbb{I}}{\mathbb{I}}ZZZ{\mathbb{I}}t}{e}^{i{b}_{4}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{i{b}_{5}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}Zt}{W}_{1}^{{\dagger} }\end{array}$$

We denote the state of the qubits q1, …, q6 after the application of W1 by the variables x1, …, x6 respectively. We have the following expression for the overall phase incurred between W1 and \({W}_{1}^{{\dagger} }\).

$$\begin{array}{ll}\phi \,=\,{(-1)}^{{x}_{3}\oplus {x}_{1}}{a}_{2}t+{(-1)}^{{x}_{3}\oplus {x}_{2}}{a}_{4}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{1}}{a}_{3}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{2}}{a}_{5}t\\ \qquad\,+\,{(-1)}^{{x}_{3}\oplus {x}_{5}}{b}_{2}t+{(-1)}^{{x}_{3}\oplus {x}_{6}}{b}_{4}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{5}}{b}_{3}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{6}}{b}_{5}t\end{array}$$

It is easy to check that \({\phi }_{\overline{{x}_{3}}}=-{\phi }_{{x}_{3}}\). We consider the following cases and it is sufficient to check the phase values when x3 = 0.

Case I (II)

We consider the case when a2t = a3t = a4t = a5t = θ1, b2t = b3t = b4t = b5t = θ2. There are no a1, a6, b1, b6 in the expression of the Hamiltonian. So, for consistency with the previous and following subsection, we can consider this as either Case I or II.

We can write ϕ = f1(θ1) + f2(θ2). We can verify that when x1 ⊕ x2 = x4 = 0 then \(\phi ={(-1)}^{{x}_{1}}4{\theta }_{1}+{f}_{2}({\theta }_{2})\) and analogously, when x5 ⊕ x6 = x4 = 0 then \(\phi ={f}_{1}({\theta }_{1})+{(-1)}^{{x}_{5}}4{\theta }_{2}\). For all other values of x1, x2, x4, ϕ = f2(θ2) and for all other values of x5, x6, x4, ϕ = f1(θ1). A quantum circuit for simulating \({e}^{-i{H}_{21}t}\) in this case is shown in Fig. 3a.

Fig. 3: Quantum circuit for \({e}^{-i{H}_{21}t}\) and \({e}^{-i{H}_{201}t}\).

(a, b) Circuit simulating \({e}^{-i{H}_{21}t}\): (a) when a2t = a3t = a4t = a5t = θ1 and b2t = b3t = b4t = b5t = θ2; (b) when the coefficients are as in Equation (18). (c–e) Circuit simulating \({e}^{-i{H}_{201}t}\): (c) when a1t = a6t = −θ1, b1t = b6t = −θ2, a0t = a7t = θ1 and b0t = b7t = θ2; (d) when a0t = … = a7t = θ1 and b0t = … = b7t = θ2; (e) when the coefficients are as in Equation (18).

Case III

Let a2t = a5t = − h1 − h2 − h3 and a3t = a4t = − h1 + h2 + h3, b2t = b5t = − g1 − g2 − g3 and b3t = b4t = − g1 + g2 + g3. We can write ϕ = f1(h) + f2(g), where h = (h1, h2, h3) and g = (g1, g2, g3). When q1 ⊕ q2 = q4 = 0, then \(\phi =-{(-1)}^{{q}_{1}}4{h}_{1}+{f}_{2}({{{\bf{g}}}})\) and when q1 ⊕ q2 = q4 = 1, then \(\phi =-{(-1)}^{{q}_{1}}4({h}_{1}+{h}_{3})+{f}_{2}({{{\bf{g}}}})\). For all other values of q1, q2, q4, ϕ = f2(g). Similarly, when q5 ⊕ q6 = q4 = 0, then \(\phi ={f}_{1}({{{\bf{h}}}})-{(-1)}^{{q}_{5}}4{g}_{1}\) and when q5 ⊕ q6 = q4 = 1, then \(\phi ={f}_{1}({{{\bf{h}}}})-{(-1)}^{{q}_{5}}4({g}_{1}+{g}_{3})\). For all other values of q5, q6, q4, ϕ = f1(h). A quantum circuit simulating \({e}^{-i{H}_{21}t}\) in this case is shown in Fig. 3b.

Circuit for simulating \({e}^{-i{H}_{20}t}\)

An eigenbasis for the Paulis in G20 is shown in Lemma 4 of Supplementary Method 2. Since we have been unable to derive a diagonalizing circuit, we divide the commuting Paulis into two groups of 4-qubit Paulis as follows.

$$\begin{array}{ll}{G}_{201}\,=\,\{{P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}}:i+j,k+l\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\\ {G}_{202}\,=\,\{{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}:i+j,k+l\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\end{array}$$

We get the following two Hamiltonians.

$$\begin{array}{ll}{H}_{201}\,=\,{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}{\mathbb{I}}\\ {H}_{202}\,=\,{b}_{0}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{6}{\mathbb{I}}{\mathbb{I}}YYXX+{b}_{1}{\mathbb{I}}{\mathbb{I}}XXYY+{b}_{7}{\mathbb{I}}{\mathbb{I}}YYYY\end{array}$$

Using the diagonalizing circuit of ref. 50, we have the following.

$$\begin{array}{ll}{e}^{-i{H}_{201}t}\,=\,{W}_{01}{e}^{-i{a}_{0}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{1}ZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{6}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{-i{a}_{7}ZZZZ{\mathbb{I}}{\mathbb{I}}t}{W}_{01}^{{\dagger} }\\ {e}^{-i{H}_{202}t}\,=\,{W}_{02}{e}^{-i{b}_{0}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{6}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{1}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}ZZt}{e}^{-i{b}_{7}{\mathbb{I}}{\mathbb{I}}ZZZZt}{W}_{02}^{{\dagger} }\end{array}$$

where W01 = CNOT(3, 1)CNOT(3, 2)CNOT(3, 4)H(3) and W02 = CNOT(3, 4)CNOT(3, 5)CNOT(3, 6)H(3), where the rightmost gate is the first one to be applied. We denote the state of the qubits q1, …, q4 and q3, …, q6 after the application of W01 and W02 by the variables x1, …, x4 and \({x}_{3}^{{\prime} },\ldots ,{x}_{6}^{{\prime} }\) respectively. We have the following expression for the overall phase incurred between W01, \({W}_{01}^{{\dagger} }\) and between W02, \({W}_{02}^{{\dagger} }\).

$$\begin{array}{ll}{\phi }_{1}\,=\,-{(-1)}^{{x}_{3}}{a}_{0}t+{(-1)}^{{x}_{3}\oplus {x}_{1}\oplus {x}_{2}}{a}_{1}t+{(-1)}^{{x}_{3}\oplus {x}_{4}}{a}_{6}t-{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{1}\oplus {x}_{2}}{a}_{7}t\\ {\phi }_{2}\,=\,-{(-1)}^{{x}_{3}^{{\prime} }}{b}_{0}t+{(-1)}^{{x}_{3}^{{\prime} }\oplus {x}_{4}^{{\prime} }}{b}_{6}t+{(-1)}^{{x}_{3}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{6}^{{\prime} }}{b}_{1}t-{(-1)}^{{x}_{3}^{{\prime} }\oplus {x}_{4}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{6}^{{\prime} }}{b}_{7}t\end{array}$$

In all the cases considered below it is easy to verify that \({\phi }_{1,\overline{{x}_{3}}}=-{\phi }_{1,{x}_{3}}\) and \({\phi }_{2,\overline{{x}_{3}^{{\prime} }}}=-{\phi }_{2,{x}_{3}^{{\prime} }}\). So it is enough to consider \({x}_{3}={x}_{3}^{{\prime} }=0\).

Case I

Assume a1t = a6t = − θ1, b1t = b6t = − θ2, a0t = a7t = θ1 and b0t = b7t = θ2. If x1 ⊕ x2 = x4 = 0 then ϕ1 = − 4θ1, else it is 0. Similar conclusions follow for ϕ2 if we replace x1, x2, x3, x4 by \({x}_{6}^{{\prime} },{x}_{5}^{{\prime} },{x}_{3}^{{\prime} },{x}_{4}^{{\prime} }\) respectively.

Case II

Let a0t = a1t = a6t = a7t = θ1, b0t = b1t = b6t = b7t = θ2. If x1 ⊕ x2 = x4 = 1 then ϕ1 = − 4θ1, else it is 0. Similar conclusions follow for ϕ2 if we replace x1, x2, x3, x4 by \({x}_{6}^{{\prime} },{x}_{5}^{{\prime} },{x}_{3}^{{\prime} },{x}_{4}^{{\prime} }\) respectively.

Case III

Let a0t = a7t = − h1 − h2 + h3, a1t = a6t = h1 − h2 + h3 and b0t = b7t = − g1 − g2 + g3, b1t = b6t = g1 − g2 + g3. If x1 ⊕ x2 = x4 = 0 then ϕ1 = 4h1 and if x1 ⊕ x2 = x4 = 1 then ϕ1 = 4(h2 − h3); otherwise ϕ1 = 0. Similar conclusions follow for ϕ2 if we replace x1, x2, x4 by \({x}_{6}^{{\prime} },{x}_{5}^{{\prime} },{x}_{4}^{{\prime} }\) respectively.

Circuits simulating \({e}^{-i{H}_{201}t}\) in Case I, II and III are shown in Fig. 3c, d and e respectively. Circuits for \({e}^{-i{H}_{202}t}\) are similar. The circuit for \({e}^{-i{H}_{20}t}\) in each of these cases is obtained by concatenating the corresponding circuits.
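The case analysis for ϕ1 can be checked by brute force over the Boolean variables. The following Python sketch evaluates the expression for ϕ1 given above for Cases I and II; the function name `phi1` and the test angle are illustrative assumptions, not part of the original construction.

```python
from itertools import product

def phi1(x1, x2, x4, a0t, a1t, a6t, a7t, x3=0):
    """Phase phi_1 incurred between W01 and its inverse, as a function of the bits x_j."""
    s = lambda *bits: (-1) ** (sum(bits) % 2)   # (-1)^(XOR of the given bits)
    return (-s(x3) * a0t + s(x3, x1, x2) * a1t
            + s(x3, x4) * a6t - s(x3, x4, x1, x2) * a7t)

theta1 = 0.25  # arbitrary test angle (assumption); exactly representable in binary
case_I = dict(a0t=theta1, a1t=-theta1, a6t=-theta1, a7t=theta1)
case_II = dict(a0t=theta1, a1t=theta1, a6t=theta1, a7t=theta1)

for name, coeffs in (("Case I", case_I), ("Case II", case_II)):
    for x1, x2, x4 in product((0, 1), repeat=3):
        print(name, (x1, x2, x4), phi1(x1, x2, x4, **coeffs))
```

Only the assignments with x1 ⊕ x2 = x4 = 0 (Case I) or x1 ⊕ x2 = x4 = 1 (Case II) give a non-zero phase, and setting `x3=1` flips the sign of every entry.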

Overlap on 3 qubits

Now we consider the case when there is overlap on 3 qubits. We can have the following sets of commuting Paulis.

$${G}_{3y}=\{Y{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}},{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}Y:i+j+k\equiv 1\,{{\mathrm{mod}}}\,\,2\}$$
(30)
$${G}_{3x}=\{X{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}},{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}X:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2\}$$
(31)

Without loss of generality, we assume that the leftmost operator acts on qubit q1, the next one on q2 and so on, with the rightmost one acting on qubit q5. We denote a state vector as \(\left\vert {Q}_{1}{q}_{2}{q}_{3}{q}_{4}{Q}_{2}\right\rangle\) where \({Q}_{1}=\left\vert {q}_{1}\right\rangle\), \({Q}_{2}=\left\vert {q}_{5}\right\rangle\) and q1, …, q5 ∈ {0, 1}. We can have the following Hamiltonian terms, expressed as sums of commuting Paulis from the above two sets.

$$\begin{array}{ll}{H}_{3y}\,=\,{a}_{1}YYXX{\mathbb{I}}+{a}_{2}YXYX{\mathbb{I}}+{a}_{3}YXXY{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}\\ \qquad\quad\,+\,{b}_{3}{\mathbb{I}}YXXY+{b}_{5}{\mathbb{I}}XYXY+{b}_{6}{\mathbb{I}}XXYY+{b}_{7}{\mathbb{I}}YYYY\\ \end{array}$$
(32)
$$\begin{array}{ll}{H}_{3x}\,=\,{a}_{0}XXXX{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}\\ \qquad\qquad+\,{b}_{0}{\mathbb{I}}XXXX+{b}_{1}{\mathbb{I}}YYXX+{b}_{2}{\mathbb{I}}YXYX+{b}_{4}{\mathbb{I}}XYYX\end{array}$$
(33)

Circuit for simulating \({e}^{-i{H}_{3y}t}\)

Let W3y be the unitary consisting of the following sequence of gates. The rightmost one is the first to be applied. With a slight abuse of notation we denote \(CNO{T}_{(c,{t}_{1})}CNO{T}_{(c,{t}_{2})}CNO{T}_{(c,{t}_{3})}\ldots\) by \(CNO{T}_{(c;{t}_{1},{t}_{2},{t}_{3},\ldots )}\) (multi-target CNOT).

$${W}_{3y}=CNO{T}_{(2;1,3,4)}{H}_{(2)}{Z}_{(2)}CNO{T}_{(2;1,5)}{H}_{(2)}CNO{T}_{(2,1)}$$

Theorem 2.3

For each \(i,j,k\in {{\mathbb{Z}}}_{2}\), such that \(Y{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}},{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}Y\in {G}_{3y}\) we have the following.

$$\begin{array}{rcl}&&{\sqrt{-1}}^{i+j+k+1}{W}_{3y}\left({Z}_{(1)}{Z}_{(2)}{Z}_{(3)}^{j}{Z}_{(4)}^{k}{\mathbb{I}}\right){W}_{3y}^{{\dagger} }=Y{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}}\\ &\,{{\mbox{and}}}\,&{\sqrt{-1}}^{i+j+k+1}{W}_{3y}\left({\mathbb{I}}{Z}_{(2)}{Z}_{(3)}^{j}{Z}_{(4)}^{k}{Z}_{(5)}\right){W}_{3y}^{{\dagger} }={\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}Y\end{array}$$

The proof is similar to Theorem 2.1 and has been shown in Supplementary Method 3. Thus we have the following.

$$\begin{array}{ll}{e}^{-i{H}_{3y}t}\,=\,{e}^{-i(-{a}_{1}{W}_{3y}(ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{3y}^{{\dagger} }-{a}_{2}{W}_{3y}(ZZZ{\mathbb{I}}{\mathbb{I}}){W}_{3y}^{{\dagger} }-{a}_{3}{W}_{3y}(ZZ{\mathbb{I}}Z{\mathbb{I}}){W}_{3y}^{{\dagger} }+{a}_{7}{W}_{3y}(ZZZZ{\mathbb{I}}){W}_{3y}^{{\dagger} })t}\\ \qquad\qquad\,\cdot {e}^{-i(-{b}_{3}{W}_{3y}({\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Z){W}_{3y}^{{\dagger} }-{b}_{5}{W}_{3y}({\mathbb{I}}ZZ{\mathbb{I}}Z){W}_{3y}^{{\dagger} }-{b}_{6}{W}_{3y}({\mathbb{I}}Z{\mathbb{I}}ZZ){W}_{3y}^{{\dagger} }+{b}_{7}{W}_{3y}({\mathbb{I}}ZZZZ){W}_{3y}^{{\dagger} })t}\\ \qquad\quad=\,{W}_{3y}{e}^{i{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{2}ZZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{3}ZZ{\mathbb{I}}Z{\mathbb{I}}t}{e}^{-i{a}_{7}ZZZZ{\mathbb{I}}t}{e}^{i{b}_{3}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{i{b}_{5}{\mathbb{I}}ZZ{\mathbb{I}}Zt}{e}^{i{b}_{6}{\mathbb{I}}Z{\mathbb{I}}ZZt}{e}^{-i{b}_{7}{\mathbb{I}}ZZZZt}{W}_{3y}^{{\dagger} }\end{array}$$

We denote the state of the qubits q1, …, q5 after the application of W3y by the variables x1, …, x5 respectively. We have the following expression for the overall phase incurred between W3y and \({W}_{3y}^{{\dagger} }\).

$$\begin{array}{lll}\phi \,=\,{(-1)}^{{x}_{2}\oplus {x}_{1}}{a}_{1}t+{(-1)}^{{x}_{2}\oplus {x}_{1}\oplus {x}_{3}}{a}_{2}t+{(-1)}^{{x}_{2}\oplus {x}_{1}\oplus {x}_{4}}{a}_{3}t-{(-1)}^{{x}_{2}\oplus {x}_{1}\oplus {x}_{3}\oplus {x}_{4}}{a}_{7}t\\ \qquad\quad+\,{(-1)}^{{x}_{2}\oplus {x}_{5}}{b}_{3}t+{(-1)}^{{x}_{2}\oplus {x}_{5}\oplus {x}_{3}}{b}_{5}t+{(-1)}^{{x}_{2}\oplus {x}_{5}\oplus {x}_{4}}{b}_{6}t-{(-1)}^{{x}_{2}\oplus {x}_{5}\oplus {x}_{3}\oplus {x}_{4}}{b}_{7}t\end{array}$$

It is easy to verify that \({\phi }_{\overline{{x}_{2}}}=-{\phi }_{{x}_{2}}\). We consider the following cases and it is sufficient to check the phase values when x2 = 0.

Case I

We consider the case when a1t = − θ1, b6t = − θ2, a2t = a3t = a7t = θ1 and b3t = b5t = b7t = θ2. We can write ϕ = f1(θ1) + f2(θ2). It can be verified that \({\phi }_{\overline{{x}_{1}},{x}_{5}}=-{f}_{1}({\theta }_{1})+{f}_{2}({\theta }_{2})\) and \({\phi }_{{x}_{1},\overline{{x}_{5}}}={f}_{1}({\theta }_{1})-{f}_{2}({\theta }_{2})\). So we concentrate on x1 = x5 = 0. If x3 = x4 = 1 then ϕ = − 4θ1, if x3 = 0, x4 = 1 then ϕ = 4θ2, and ϕ = 0 otherwise. A quantum circuit simulating \({e}^{-i{H}_{3y}t}\) is shown in Fig. 4a. If θ1 = θ2 then we can have a further reduction of controlled rotation gates, as shown in Fig. 4b.

Fig. 4: Quantum circuit for \({e}^{-i{H}_{3y}t}\) and \({e}^{-i{H}_{3x1}t}\).

a–e: Circuit simulating \({e}^{-i{H}_{3y}t}\) (a) when a1t = − θ1, b6t = − θ2, a2t = a3t = a7t = θ1 and b3t = b5t = b7t = θ2; (b) with the coefficient values in (a), except here θ1 = θ2; (c) when a1t = a2t = a3t = a7t = θ1 and b3t = b5t = b6t = b7t = θ2; (d) when the coefficients are as in Equation 18; (e) Circuit with the coefficients in (d), except in this case h2 = g1, h3 = g3, h1 = g2. f–h: Circuit simulating \({e}^{-i{H}_{3x1}t}\) (f) when a6t = − θ1, b1t = − θ2, a0t = a5t = a4t = θ1 and b0t = b2t = b4t = θ2; (g) when a0t = … = a7t = θ1 and b0t = … = b7t = θ2; (h) when the coefficients are as in Equation 18.

Case II

Next we consider the case when a1t = a2t = a3t = a7t = θ1, b6t = b3t = b5t = b7t = θ2. In this case ϕ = 0 whenever x1 ⊕ x5 = 1. Else, as before \({\phi }_{\overline{{x}_{1}},{x}_{5}}=-{f}_{1}({\theta }_{1})+{f}_{2}({\theta }_{2})\) and \({\phi }_{{x}_{1},\overline{{x}_{5}}}={f}_{1}({\theta }_{1})-{f}_{2}({\theta }_{2})\). So it is enough to consider x1 = x5 = 0. When x3 = x4 = 1 then ϕ = − 2(θ1 + θ2), else ϕ = 2(θ1 + θ2). Thus we can have a quantum circuit simulating \({e}^{-i{H}_{3y}t}\), as shown in Fig. 4c.

Case III

Now we consider the case when a1t = h1 − h2 + h3, a2t = − h1 − h2 − h3, a3t = − h1 + h2 + h3, a7t = − h1 − h2 + h3 and b3t = − g1 + g2 + g3, b5t = − g1 − g2 − g3, b6t = g1 − g2 + g3, b7t = − g1 − g2 + g3. If we denote h = (h1, h2, h3) and g = (g1, g2, g3), then we can write ϕ = f(h) + f(g). Here too, \({\phi }_{\overline{{x}_{1}},{x}_{5}}=-f({{{\bf{h}}}})+f({{{\bf{g}}}})\) and \({\phi }_{{x}_{1},\overline{{x}_{5}}}=f({{{\bf{h}}}})-f({{{\bf{g}}}})\). So let us consider x1 = x5 = 0. Then we have the following phase values.

$$\begin{array}{r}{\phi }_{{x}_{3} = {x}_{4} = 0}=0,\quad {\phi }_{{x}_{3} = 0,{x}_{4} = 1}=-4({h}_{2}+{g}_{1}),\quad {\phi }_{{x}_{3} = 1,{x}_{4} = 0}=4({h}_{3}+{g}_{3}),\quad {\phi }_{{x}_{3} = {x}_{4} = 1}=4({h}_{1}+{g}_{2})\end{array}$$

A circuit simulating \({e}^{-i{H}_{3y}t}\) in this case is shown in Fig. 4d. If h2 = g1, h3 = g3, h1 = g2 then we can have a simpler circuit, as shown in Fig. 4e.
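The four Case III phase values can be reproduced numerically from the phase expression for H3y. The sketch below is a plain Python check at x1 = x5 = 0 and x2 = 0; the function name `phi_3y` and the integer test values for h and g are assumptions chosen only so that each term can be identified in the output.

```python
def phi_3y(x1, x3, x4, x5, a, b, x2=0):
    """Phase between W3y and its inverse.

    a = (a1 t, a2 t, a3 t, a7 t), b = (b3 t, b5 t, b6 t, b7 t).
    """
    s = lambda *bits: (-1) ** (sum(bits) % 2)   # (-1)^(XOR of the given bits)
    a1, a2, a3, a7 = a
    b3, b5, b6, b7 = b
    return (s(x2, x1) * a1 + s(x2, x1, x3) * a2 + s(x2, x1, x4) * a3
            - s(x2, x1, x3, x4) * a7
            + s(x2, x5) * b3 + s(x2, x5, x3) * b5 + s(x2, x5, x4) * b6
            - s(x2, x5, x3, x4) * b7)

h1, h2, h3 = 1, 2, 4      # arbitrary integer test values (assumption)
g1, g2, g3 = 8, 16, 32
a = (h1 - h2 + h3, -h1 - h2 - h3, -h1 + h2 + h3, -h1 - h2 + h3)
b = (-g1 + g2 + g3, -g1 - g2 - g3, g1 - g2 + g3, -g1 - g2 + g3)

for x3, x4 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print((x3, x4), phi_3y(0, x3, x4, 0, a, b))
```

With these test values the four printed phases are 0, −4(h2 + g1), 4(h3 + g3) and 4(h1 + g2), in that order.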

Circuit for simulating \({e}^{-i{H}_{3x}t}\)

The diagonalizing transformation for the Pauli operators in G3x is shown in Lemma 6 of Supplementary Method 3. Since we have been unable to find a diagonalizing circuit, we divide the commuting Paulis into two groups of 4-qubit Paulis,

$$\begin{array}{ll}{G}_{3x1}\,=\,\{X{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}}:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\\ {G}_{3x2}\,=\,\{{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}X:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\end{array}$$

and have the following two Hamiltonians.

$$\begin{array}{ll}{H}_{3x1}\,=\,{a}_{0}XXXX{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}\\ {H}_{3x2}\,=\,{b}_{0}{\mathbb{I}}XXXX+{b}_{1}{\mathbb{I}}YYXX+{b}_{2}{\mathbb{I}}YXYX+{b}_{4}{\mathbb{I}}XYYX\end{array}$$

Using the diagonalizing circuit of ref. 50, we have the following.

$$\begin{array}{ll}{e}^{-i{H}_{3x1}t}\,=\,{W}_{3x1}{e}^{-i{a}_{0}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{6}{\mathbb{I}}ZZZ{\mathbb{I}}t}{e}^{i{a}_{5}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{a}_{4}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{W}_{3x1}^{{\dagger} }\\ {e}^{-i{H}_{3x2}t}\,=\,{W}_{3x2}{e}^{-i{b}_{0}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{1}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{2}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{b}_{4}{\mathbb{I}}ZZZ{\mathbb{I}}t}{W}_{3x2}^{{\dagger} }\end{array}$$

where W3x1 = CNOT(2; 1, 3, 4)H(2) and W3x2 = CNOT(2; 3, 4, 5)H(2), where the rightmost gate is the first one to be applied. We denote the state of the qubits q1, …, q4 and q2, …, q5 after the application of W3x1 and W3x2 by the variables x1, …, x4 and \({x}_{2}^{{\prime} },\ldots ,{x}_{5}^{{\prime} }\) respectively. We have the following expression for the overall phase incurred between W3x1, \({W}_{3x1}^{{\dagger} }\) and between W3x2, \({W}_{3x2}^{{\dagger} }\).

$$\begin{array}{ll}{\phi }_{1}\,=\,-{(-1)}^{{x}_{2}}{a}_{0}t+{(-1)}^{{x}_{2}\oplus {x}_{3}\oplus {x}_{4}}{a}_{6}t+{(-1)}^{{x}_{2}\oplus {x}_{4}}{a}_{5}t+{(-1)}^{{x}_{2}\oplus {x}_{3}}{a}_{4}t\\ {\phi }_{2}\,=\,-{(-1)}^{{x}_{2}^{{\prime} }}{b}_{0}t+{(-1)}^{{x}_{2}^{{\prime} }\oplus {x}_{3}^{{\prime} }}{b}_{1}t+{(-1)}^{{x}_{2}^{{\prime} }\oplus {x}_{4}^{{\prime} }}{b}_{2}t+{(-1)}^{{x}_{2}^{{\prime} }\oplus {x}_{3}^{{\prime} }\oplus {x}_{4}^{{\prime} }}{b}_{4}t\end{array}$$

In all the cases considered below it is easy to verify that \({\phi }_{1,\overline{{x}_{2}}}=-{\phi }_{1,{x}_{2}}\) and \({\phi }_{2,\overline{{x}_{2}^{{\prime} }}}=-{\phi }_{2,{x}_{2}^{{\prime} }}\). So it is enough to consider \({x}_{2}={x}_{2}^{{\prime} }=0\).

Case I

We consider the case when a6t = − θ1, b1t = − θ2, a0t = a5t = a4t = θ1 and b0t = b4t = b2t = θ2. Then ϕ1 = − 4θ1 when x3 = x4 = 1, else it is 0. Similarly, ϕ2 = − 4θ2 when \({x}_{3}^{{\prime} }=0,{x}_{4}^{{\prime} }=1\), else it is 0.

Case II

Let a0t = a6t = a5t = a4t = θ1, b0t = b4t = b2t = b1t = θ2. ϕ1 = 2θ1 when x3 = x4 = 0, else ϕ1 = − 2θ1. Similarly for ϕ2.

Case III

Assume a0t = − h1 − h2 + h3, a6t = h1 − h2 + h3, a5t = − h1 − h2 − h3, a4t = − h1 + h2 + h3, and b0t = − g1 − g2 + g3, b4t = − g1 + g2 + g3, b2t = − g1 − g2 − g3, b1t = g1 − g2 + g3. We have the following phases.

$$\begin{array}{rcl}{\phi }_{1}({x}_{3}=0,{x}_{4}=1)=4{h}_{2};&&{\phi }_{1}({x}_{3}=1,{x}_{4}=0)=-4{h}_{3};\quad {\phi }_{1}({x}_{3}={x}_{4}=1)=4{h}_{1};\\ {\phi }_{2}({x}_{3}^{{\prime} }=0,{x}_{4}^{{\prime} }=1)=4{g}_{1},&&{\phi }_{2}({x}_{3}^{{\prime} }=1,{x}_{4}^{{\prime} }=0)=-4{g}_{3};\quad {\phi }_{2}({x}_{3}^{{\prime} }={x}_{4}^{{\prime} }=1)=4{g}_{2};\end{array}$$

Circuits simulating \({e}^{-i{H}_{3x1}t}\) in Case I, II and III have been shown in Fig. 4f, g and h respectively. Circuits for \({e}^{-i{H}_{3x2}t}\) are similar. Circuit for \({e}^{-i{H}_{3x}t}\) in each case is obtained by concatenating the corresponding circuits.
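As a sanity check of the Case III phase table, the expressions for ϕ1 and ϕ2 can be evaluated directly. The Python sketch below does this at \({x}_{2}={x}_{2}^{{\prime} }=0\); the function names and the integer test values are illustrative assumptions.

```python
def sgn(*bits):
    """(-1)^(XOR of the given bits)."""
    return (-1) ** (sum(bits) % 2)

def phi1_3x(x3, x4, a0t, a6t, a5t, a4t, x2=0):
    """Phase phi_1 between W3x1 and its inverse."""
    return (-sgn(x2) * a0t + sgn(x2, x3, x4) * a6t
            + sgn(x2, x4) * a5t + sgn(x2, x3) * a4t)

def phi2_3x(x3, x4, b0t, b1t, b2t, b4t, x2=0):
    """Phase phi_2 between W3x2 and its inverse (primed variables)."""
    return (-sgn(x2) * b0t + sgn(x2, x3) * b1t
            + sgn(x2, x4) * b2t + sgn(x2, x3, x4) * b4t)

h1, h2, h3 = 1, 2, 4      # arbitrary integer test values (assumption)
g1, g2, g3 = 8, 16, 32
a = dict(a0t=-h1 - h2 + h3, a6t=h1 - h2 + h3, a5t=-h1 - h2 - h3, a4t=-h1 + h2 + h3)
b = dict(b0t=-g1 - g2 + g3, b1t=g1 - g2 + g3, b2t=-g1 - g2 - g3, b4t=-g1 + g2 + g3)

for x3, x4 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print((x3, x4), phi1_3x(x3, x4, **a), phi2_3x(x3, x4, **b))
```

Both phases vanish at (0, 0), so each of ϕ1 and ϕ2 has three non-zero values in this case.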

Circuit for arbitrary exponentiated Hamiltonians

Our previous discussion focuses on the case of fermionic simulation within a Jordan–Wigner representation using Hamiltonian terms that are fermionically swapped to be adjacent to each other. While these simulation circuits are among the most important for applications in chemistry, they do not necessarily represent all cases of physical interest, let alone chemistry. Here we address this by discussing ways to synthesize circuits for arbitrary exponentiated Hamiltonians in \({{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\), expressible as a sum of Pauli operators, with the aim of reducing the number of non-Clifford resources. For reasons discussed previously, it is enough to consider a Hamiltonian H expressed as a sum of commuting Pauli operators.

$$H=\mathop{\sum}\limits_{i}{\alpha }_{i}{P}_{i}\qquad {P}_{i}\in {{{{\mathcal{P}}}}}_{n}$$

In most cases one synthesizes a circuit for each \({e}^{-i{\alpha }_{i}{P}_{i}t}\) using a number of CNOT gates and one Rz gate. Thus the number of Rz gates required is equal to the number of summands. Here we describe a procedure to synthesize a circuit for \({e}^{iHt}\) as a whole, i.e. considering multiple summands or Pauli operators together.

We diagonalize H, for example, by using the algorithms in ref. 51. In the previous section we have constructed explicit eigenbases for the diagonalization of some specific Hamiltonians. Then we get the following.

$$H=W\left(\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}\right){W}^{{\dagger} }$$

Here \({Q}_{i}={\otimes }_{j = 1}^{n}{Q}_{ij}\), a tensor product of Z and \({\mathbb{I}}\) i.e. \({Q}_{ij}\in \{Z,{\mathbb{I}}\}\). W is a diagonalizing Clifford circuit. Thus we get the following.

$${e}^{-iHt}=W{e}^{-it{\sum }_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}}{W}^{{\dagger} }$$
(34)

Lemma 2.3

Let \({{{\mathcal{H}}}}={\sum }_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}\), such that \({Q}_{i}={\otimes }_{j = 1}^{n}{Q}_{ij}\), where \({Q}_{ij}\in \{Z,{\mathbb{I}}\}\). With each Qi we associate an n-length vector \({{{{\bf{y}}}}}_{i}=({y}_{i1},\ldots ,{y}_{in})\in {\{0,1\}}^{n}\) such that \({\left({{{{\bf{y}}}}}_{i}\right)}_{j}={y}_{ij}=1\) if Qij = Z, else yij = 0. Let x1, …, xn ∈ {0, 1} and \(\left\vert 0\right\rangle ={\left[1,0\right]}^{T}\), \(\left\vert 1\right\rangle ={\left[0,1\right]}^{T}\). The eigenvectors of \({{{\mathcal{H}}}}\) are of the form \(\left\vert v\right\rangle ={\bigotimes }_{j = 1}^{n}\left\vert {x}_{j}\right\rangle\) and the corresponding eigenvalue is

$${\phi }_{v}=\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{(-1)}^{{\oplus }_{j = 1}^{n}{y}_{ij}{x}_{j}}$$
(35)

Proof

The summands in \({{{\mathcal{H}}}}\) are mutually commuting and so they have a common eigenbasis. Let us first consider Qi. Since \({Q}_{ij}\left\vert x\right\rangle =\left\vert x\right\rangle\) if \({Q}_{ij}={\mathbb{I}}\) and \({Q}_{ij}\left\vert x\right\rangle ={(-1)}^{x}\left\vert x\right\rangle\) if Qij = Z, where x ∈ {0, 1}, we have the following.

$${Q}_{i}\left\vert v\right\rangle =\left(\mathop{\bigotimes }\limits_{j=1}^{n}{Q}_{ij}\right)\left(\mathop{\bigotimes }\limits_{j=1}^{n}\left\vert {x}_{j}\right\rangle \right)=\mathop{\bigotimes }\limits_{j=1}^{n}{Q}_{ij}\left\vert {x}_{j}\right\rangle ={(-1)}^{{\oplus }_{j = 1}^{n}{y}_{ij}{x}_{j}}\left\vert v\right\rangle$$

This implies that

$${{{\mathcal{H}}}}\left\vert v\right\rangle =\left(\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}\right)\left\vert v\right\rangle =\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{(-1)}^{{\oplus }_{j = 1}^{n}{y}_{ij}{x}_{j}}\left\vert v\right\rangle .$$

We can also interpret ϕ as the overall phase incurred between W and W†. For given values of \({\alpha }_{i}^{{\prime} }\), ϕ is a real-valued function of the n Boolean variables xj. We can evaluate a truth table and get all the \({2}^{n}\) values of ϕ for the different assignments \(({x}_{1},\ldots ,{x}_{n})\in {\{0,1\}}^{n}\). For each distinct non-zero absolute phase value θ (ignoring sign), we can have a sub-circuit \({{{{\mathcal{C}}}}}_{| \theta | }\) that has only one controlled rotation cRz(2θ) gate. The complete circuit can be obtained by combining these different sub-circuits (one for each θ ≠ 0), in between the diagonalizing Clifford circuits W, W†. The ordering of the sub-circuits does not matter.
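The truth table of ϕ can be generated directly from Equation (35). The following Python sketch does so for a small hypothetical example (three Z-type terms on three qubits with equal coefficients) and cross-checks the result against the diagonal of the operator built from Kronecker products; all names and values in it are illustrative assumptions.

```python
from itertools import product

def phase(alphas, ys, x):
    """Eigenvalue of sum_i alpha_i Q_i on |x_1 ... x_n> (Equation 35)."""
    return sum(a * (-1) ** (sum(yj * xj for yj, xj in zip(y, x)) % 2)
               for a, y in zip(alphas, ys))

def diagonal(alphas, ys):
    """Diagonal of sum_i alpha_i Q_i, built from Kronecker products of diag(Z) and diag(I)."""
    n = len(ys[0])
    total = [0] * (2 ** n)
    for a, y in zip(alphas, ys):
        diag = [1]
        for yj in y:
            factor = [1, -1] if yj else [1, 1]
            diag = [d * f for d in diag for f in factor]  # leftmost qubit varies slowest
        total = [t + a * d for t, d in zip(total, diag)]
    return total

alphas = [1, 1, 1]                        # hypothetical coefficients alpha'_i
ys = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]    # Q_1 = ZZI, Q_2 = IZZ, Q_3 = ZIZ
table = {x: phase(alphas, ys, x) for x in product((0, 1), repeat=3)}
rotation_angles = sorted({abs(p) for p in table.values() if p != 0})
print(table)
print(rotation_angles)
```

In this example there are three summands but only two distinct non-zero absolute phase values, so two sub-circuits would suffice.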

Now we discuss how to synthesize sub-circuit \({{{{\mathcal{C}}}}}_{| \theta | }\), for one such distinct absolute value of ϕ. Let \({{{{\mathcal{M}}}}}_{\theta }\) be the set of binary values for variables x1, x2, …, xn, such that ϕ computes to θ in Equation (35).

$${{{{\mathcal{M}}}}}_{\theta }=\{({x}_{1},\ldots ,{x}_{n})\in {\{0,1\}}^{n}:{\phi }_{{x}_{1},\ldots ,{x}_{n}}=\theta \}$$
(36)

Analogously we can define \({{{{\mathcal{M}}}}}_{-\theta }\). We can also associate \({{{{\mathcal{M}}}}}_{\theta }\) and \({{{{\mathcal{M}}}}}_{-\theta }\) with \({S}_{\theta }\) and \({S}_{-\theta }\), the sets of eigenvectors with eigenvalues θ and − θ respectively, as obtained from Lemma 2.3. We define the following operators, which act on the input vector space and the space of two ancillae c and r, the latter being initialized to 0.

$$\begin{array}{ll}{V}_{\theta }\,=\,\mathop{\sum}\limits_{\left\vert v\right\rangle \in {S}_{\theta }}\left\vert v,c\oplus 1,0\right\rangle \left\langle v,c,0\right\vert +\mathop{\sum}\limits_{\left\vert w\right\rangle \notin {S}_{\theta }}\left\vert w,c,0\right\rangle \left\langle w,c,0\right\vert \\ {V}_{-\theta }\,=\,\mathop{\sum}\limits_{\left\vert v\right\rangle \in {S}_{-\theta }}\left\vert v,c\oplus 1,1\right\rangle \left\langle v,c,0\right\vert +\mathop{\sum}\limits_{\left\vert w\right\rangle \notin {S}_{-\theta }}\left\vert w,c,0\right\rangle \left\langle w,c,0\right\vert \end{array}$$
(37)

The circuit is \({{{{\mathcal{C}}}}}_{| \theta | }={V}_{\theta }{V}_{-\theta }{\left(c{R}_{Z}(2\theta )\right)}_{cr}{V}_{-\theta }^{{\dagger} }{V}_{\theta }^{{\dagger} }\). If the input vector is in \({S}_{\theta }\) or \({S}_{-\theta }\) then \({V}_{\theta }\) or \({V}_{-\theta }\), respectively, flips the control ancilla qubit c. Additionally, if the vector is in \({S}_{-\theta }\) then the second ancilla r is flipped. We apply a cRz(2θ) gate on r, controlled on c. Thus if the input vector is in \({S}_{-\theta }\) then we effectively apply cRz( − 2θ). The ancillae c, r can be flipped using multi-controlled X gates, which can be further decomposed in terms of Toffoli and CNOT gates69,70. For example, let \({{{{\mathcal{M}}}}}_{\theta }=\{(0,0,1,1),(0,1,1,1)\}\) and \({{{{\mathcal{M}}}}}_{-\theta }=\{(1,1,0,0)\}\). The two Boolean min-terms of \({{{{\mathcal{M}}}}}_{\theta }\) can be compressed into a single term because when x1 = 0, x3 = x4 = 1 then ϕ = θ, irrespective of the value of x2. We call it the ‘don’t care condition’ for x2. So, equivalently we can write \({{{{\mathcal{M}}}}}_{\theta }=\{(0,* ,1,1)\}\), where * denotes the don’t-care condition. In general, techniques such as Karnaugh maps71 and ESPRESSO72 can be used to obtain a compact set of Boolean min-terms. A circuit \({{{{\mathcal{C}}}}}_{| \theta | }\) is shown in Fig. 5.
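The compression of min-terms can be illustrated with a few lines of Python. The sketch below repeatedly merges terms that differ in exactly one fixed position, which is the basic step behind Karnaugh-map and Quine–McCluskey style simplification; it is a simplified stand-in chosen for illustration, not the ESPRESSO heuristic, and it returns merged implicants without searching for a minimum cover.

```python
def merge_minterms(minterms):
    """Repeatedly merge terms that differ in exactly one non-'*' position."""
    terms = set(minterms)
    while True:
        merged, used = set(), set()
        for s in terms:
            for t in terms:
                diff = [k for k in range(len(s)) if s[k] != t[k]]
                if len(diff) == 1 and '*' not in (s[diff[0]], t[diff[0]]):
                    k = diff[0]
                    merged.add(s[:k] + ('*',) + s[k + 1:])  # x_k becomes a don't-care
                    used.update((s, t))
        if not merged:
            return terms
        terms = (terms - used) | merged

M_theta = {(0, 0, 1, 1), (0, 1, 1, 1)}
M_minus_theta = {(1, 1, 0, 0)}
print(merge_minterms(M_theta))
print(merge_minterms(M_minus_theta))
```

Each surviving term corresponds to one multi-controlled X gate whose controls are the positions that are not marked `*`.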

Fig. 5: Circuit \({{{{\mathcal{C}}}}}_{| \theta | }\).

The circuit when \({{{{\mathcal{M}}}}}_{\theta }=\{(1,1,0,0),(1,1,1,0)\}\) and \({{{{\mathcal{M}}}}}_{-\theta }=\{(0,0,1,1)\}\).

Hence, due to the invariance of the point spectrum of unitarily equivalent operators we have the following.

Lemma 2.4

Let H = ∑iαiPi, where Pi are mutually commuting n-qubit Pauli operators. We can implement a circuit for \({e}^{iHt}\) with at most m (controlled)-rotations, where m is the number of distinct non-zero eigenvalues (ignoring sign) of H.

Illustration—Quantum Heisenberg and quantum Ising model

We consider the problem of designing quantum circuits for simulating the quantum Heisenberg and Ising models with Hamiltonians HH and HI respectively. The Heisenberg Hamiltonian is widely used to study magnetic systems, where the magnetic spins are treated quantum mechanically60,73,74,75,76. Let G = (V, E) be the underlying graph with vertex set V and edge set E.

$${H}_{H}=\mathop{\sum}\limits_{(i,j)\in E}\left({J}_{x}{X}_{(i)}{X}_{(j)}+{J}_{y}{Y}_{(i)}{Y}_{(j)}+{J}_{z}{Z}_{(i)}{Z}_{(j)}\right)+\mathop{\sum}\limits_{i\in V}{d}_{h}{Z}_{(i)}$$
(38)
$${H}_{I}=\mathop{\sum}\limits_{(i,j)\in E}{J}_{z}{Z}_{(i)}{Z}_{(j)}+\mathop{\sum}\limits_{i\in V}{d}_{h}^{{\prime} }{Z}_{(i)}$$
(39)

In the above Jx, Jy, Jz are coupling parameters, denoting the exchange interaction between nearest-neighbor spins along the X, Y, Z-directions respectively, and \({d}_{h},{d}_{h}^{{\prime} }\) each denote the time amplitude of the external magnetic field along the Z-direction. The sets of commuting Paulis are {X(i)X(j): (i, j) ∈ E}, {Y(i)Y(j): (i, j) ∈ E}, {Z(i)Z(j): (i, j) ∈ E} and {Z(i): i ∈ V}.

Let us first consider the set {Z(i): i ∈ V}. Following the previous discussions and Lemma 2.3, the overall phase incurred or the eigenvalues are as follows.

$${\phi }^{{\prime} }={d}_{h}\mathop{\sum}\limits_{i\in V}{(-1)}^{{x}_{i}}\qquad {x}_{i}\in \{0,1\}$$
(40)

For \({{{\bf{x}}}}\in {\{0,1\}}^{| V| }\), one particular assignment of values to the Boolean variables, let T0 = {i ∈ V: xi = 0} and T1 = {i ∈ V: xi = 1}. So |T0| + |T1| = |V| and

$${\phi }_{{{{\bf{x}}}}}^{{\prime} }={d}_{h}\left(| {T}_{0}| -| {T}_{1}| \right)={d}_{h}\left(| V| -2| {T}_{1}| \right).$$
(41)

So the number of distinct non-zero absolute values of \({\phi }^{{\prime} }\) is about |V|/2. Implementing each \({e}^{-i{d}_{h}{Z}_{(i)}t}\) separately would require |V| rotation gates. Thus, using Lemma 2.4, we have about 50% reduction in the rotation gate cost.
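The count of distinct phase values for the field term can be confirmed by enumeration. The short Python sketch below evaluates Equation (40) over all assignments; the function name and the choice d_h = 1 are assumptions made for illustration.

```python
from itertools import product

def distinct_abs_field_phases(n_vertices, d_h=1):
    """Distinct non-zero |phi'| of Equation (40) over all 2^|V| assignments."""
    values = {abs(d_h * sum((-1) ** xi for xi in x))
              for x in product((0, 1), repeat=n_vertices)}
    values.discard(0)
    return sorted(values)

for n in (4, 5, 6):
    print(n, distinct_abs_field_phases(n))
```

For even |V| the enumeration gives exactly |V|/2 values, and for odd |V| it gives (|V| + 1)/2.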

Now, let us consider the other commuting sets. Since HXH = Z and (HSX)Y(HSX)† = Z, each of the above sets can be diagonalized and we can focus on the problem of designing a quantum circuit for the Hamiltonian \(H=J{\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}\), where J is a constant. Our aim is to derive an upper bound on the number of controlled rotations required to simulate \({e}^{iHt}\). Following the previous discussions, the overall phase incurred between the diagonalizing Cliffords W, W† is as follows (Lemma 2.3).

$$\phi =J\mathop{\sum}\limits_{(i,j)\in E}{(-1)}^{{x}_{i}\oplus {x}_{j}}$$
(42)

where xi, xj ∈ {0, 1} are variables denoting the state of the qubits after application of W. The quantum circuit has |V| qubits, one for each vertex of G. Let \({{{\bf{x}}}}\in {\{0,1\}}^{| V| }\) denote one particular assignment of values to the variables \({x}_{1},\ldots ,{x}_{| V| }\). Let S0 = {(i, j) ∈ E: xi = xj = 1} and S1 = {(i, j) ∈ E : exactly one of xi, xj is 1}. If \({S}^{{\prime} }=\{(i,j)\in E:{x}_{i}={x}_{j}=0\}\), then \(| {S}^{{\prime} }| =| E| -| {S}_{0}| -| {S}_{1}|\). Let ϕx be the value of the phase for this particular assignment.

$${\phi }_{{{{\bf{x}}}}}=J\left(| {S}^{{\prime} }| +| {S}_{0}| -| {S}_{1}| \right)=J\left(| E| -| {S}_{0}| -| {S}_{1}| +| {S}_{0}| -| {S}_{1}| \right)=J\left(| E| -2| {S}_{1}| \right)$$
(43)

Let \({V}_{1{{{\bf{x}}}}}=\{i\in V:{x}_{i}=1\}\), \({V}_{0{{{\bf{x}}}}}=\{i\in V:{x}_{i}=0\}\) and \({{{\mathcal{N}}}}(k)\) be the set of neighbouring vertices of k in G. Then

$$| {S}_{1}| =\mathop{\sum}\limits_{k\in {V}_{1{{{\bf{x}}}}}}\left\vert \left\{{{{\mathcal{N}}}}(k)\setminus {V}_{1{{{\bf{x}}}}}\right\}\right\vert$$
(44)

Now for any assignment |S1| is at most |E|. So the number of distinct non-zero values of |ϕ| is at most |E|/2, and hence we need at most |E|/2 controlled-Rz gates in the circuit simulating \({e}^{iHt}\). Had we simulated each \({e}^{-iJ{Z}_{(i)}{Z}_{(j)}t}\) separately, we would have required |E| Rz gates. So we can achieve about 50% reduction in the rotation gate cost under the assumption that a controlled-Rz gate costs the same to implement as a single Rz gate.

G is a cycle

This is basically the translationally invariant 1-D spin chain. Let \({G}_{{V}_{1{{{\bf{x}}}}}}\) be the subgraph induced by \({V}_{1{{{\bf{x}}}}}\), which is a union of paths. For each path p, let \({S}_{1p}={\bigcup }_{k\in p}\{{{{\mathcal{N}}}}(k)\setminus {V}_{1{{{\bf{x}}}}}\}\subseteq {S}_{1}\) be the part of S1 contributed by this path. Each of the terminal vertices has one neighbour in \(V\setminus {V}_{1{{{\bf{x}}}}}\). So |S1p| = 2. Thus if \({{{\mathcal{P}}}}\) is the set of all such paths in \({G}_{{V}_{1{{{\bf{x}}}}}}\), then

$${\phi }_{{{{\bf{x}}}}}=J\left(| E| -2\mathop{\sum}\limits_{p\in {{{\mathcal{P}}}}}| {S}_{1p}| \right)=J\left(| E| -4| {{{\mathcal{P}}}}| \right)$$
(45)

Now \(| {{{\mathcal{P}}}}|\) can vary from 1, …, |E|/4 to give |E|/4 distinct values of ϕ. This implies that a quantum circuit synthesizing \({e}^{iHt}\) will require at most |E|/4 cRz gates. This is about 75% reduction in the cost of rotation gates, compared to synthesizing each \({e}^{-iJ{Z}_{(i)}{Z}_{(j)}t}\).

G is a complete graph

In this case for each \(k\in {V}_{1{{{\bf{x}}}}}\), we have \({{{\mathcal{N}}}}(k)\setminus {V}_{1{{{\bf{x}}}}}={V}_{0{{{\bf{x}}}}}\). So we have,

$${\phi }_{{{{\bf{x}}}}}=J\left(| E| -2| {V}_{1{{{\bf{x}}}}}| | {V}_{0{{{\bf{x}}}}}| \right)=J\left(| E| -2\parallel {{{\bf{x}}}}{\parallel }_{1}(| V| -\parallel {{{\bf{x}}}}{\parallel }_{1})\right)$$
(46)

So there can be |V|/2 distinct values of ϕx as \(\parallel {{{\bf{x}}}}{\parallel }_{1}\) varies from 1, …, |V|/2. Hence we require at most |V|/2 cRz gates for simulating \({e}^{iHt}\). If we simulate each \({e}^{-iJ{Z}_{(i)}{Z}_{(j)}t}\) then we require \(| E| =\frac{| V| (| V| -1)}{2}\) Rz gates. This indicates about \(100\left(1-\frac{2}{| V| -1}\right) \%\) reduction in the cost of rotation gates.
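For small graphs the number of controlled rotations suggested by Lemma 2.4 can be found by enumerating Equation (42) over all assignments. The Python sketch below does this for a few cycles, complete graphs and a 3-point line; the helper names are assumptions, and the bounds quoted in the text hold only up to rounding for such small instances.

```python
from itertools import product

def distinct_abs_phases(n_vertices, edges, J=1):
    """Distinct non-zero |phi_x| of Equation (42) over all 2^|V| assignments."""
    values = set()
    for x in product((0, 1), repeat=n_vertices):
        phi = J * sum((-1) ** (x[i] ^ x[j]) for i, j in edges)
        if phi != 0:
            values.add(abs(phi))
    return sorted(values)

def cycle(n):
    """Edge list of the cycle on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]

def complete(n):
    """Edge list of the complete graph on vertices 0..n-1."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

print("3-point line:", distinct_abs_phases(3, [(0, 1), (1, 2)]))
print("4-cycle:", distinct_abs_phases(4, cycle(4)))
print("6-cycle:", distinct_abs_phases(6, cycle(6)))
print("triangle:", distinct_abs_phases(3, complete(3)))
print("complete graph on 4 points:", distinct_abs_phases(4, complete(4)))
```

The length of each returned list is the number of cRz gates needed for that graph when the whole sum is synthesized at once.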

In Fig. 6 we have shown quantum circuits simulating \({e}^{it\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}}\) for some simple graphs G = (V, E). The circuits have been designed to optimize the number of Toffoli gates, as well.

Fig. 6: Quantum circuit simulating \({e}^{it\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}}\).

When the underlying graph G = (V, E) is (a) 4-point circle, (b) 3-point line, (c) triangle, (d) 4-point complete graph, (e) 6-point circle.

Reducing the number of Toffoli gates

We discussed before that the T-count from the Toffolis may be a significant factor in the high-error regime, as the logarithmic cost of rotation synthesis may not dominate the additive constant that arises from the Toffoli gates needed. In order to reduce the number of Toffolis we can do the following. We design circuits with few Toffolis for Hamiltonians over smaller graphs, such as in Fig. 6a–e. Then we decompose a Hamiltonian over a large graph into Hamiltonians over these smaller graphs. For example, consider a 1-D cycle on N points and \({H}_{z}=\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}\). We break this cycle into smaller chains of length 3, i.e. \({H}_{z}=\theta ({Z}_{(1)}{Z}_{(2)}+{Z}_{(2)}{Z}_{(3)})+\theta ({Z}_{(3)}{Z}_{(4)}+{Z}_{(4)}{Z}_{(5)})+\ldots ={\sum }_{i}{H}_{zi}\). We have a quantum circuit that synthesizes each \({e}^{i{H}_{zi}t}\) with only one cRz gate (Fig. 6b). So to synthesize \({e}^{i{H}_{z}t}\) we require approximately N/2 cRz gates. This is about twice the number of controlled rotations required had we synthesized without decomposing, but it does not require any extra Toffoli-pairs. We still get approximately 50% reduction compared to synthesizing each summand \({e}^{it{Z}_{(i)}{Z}_{(j)}}\) separately.

Now consider a large N × N lattice which has \({N}^{2}\) vertices and 2N(N − 1) edges, and the Hamiltonian \({H}_{z}=\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}\). We can decompose this into \({(N-1)}^{2}\) smaller interior cycles of 4 points and a bigger outer cycle with 2N + 2(N − 2) = 4(N − 1) points. From Fig. 6a, we know that we can design a circuit simulating the exponentiated Hamiltonian corresponding to each interior cycle with 1 cRz and 1 Toffoli pair. We can further decompose the outer cycle (as explained in the previous paragraph) and have a circuit with approximately 2(N − 1) cRz gates. Thus we require \(\approx {(N-1)}^{2}+2(N-1)=(N-1)(N+1)\) cRz gates and \({(N-1)}^{2}\) Toffoli-pairs. We have discussed before that for general graphs the number of cRz gates required is ≈ |E|/2 = N(N − 1), so we use ≈ (N − 1) more cRz gates by decomposing, but the Toffoli cost reduces a lot. Had we synthesized each \({e}^{it{Z}_{(i)}{Z}_{(j)}}\) separately, we would have used 2N(N − 1) Rz gates. Thus we manage to get a reduction of \(\approx {(N-1)}^{2}\) in the number of Rz/cRz gates.
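The gate counts quoted for the lattice can be tabulated with a small helper. The Python sketch below simply encodes the approximate counts stated in the paragraph above; the function name and dictionary keys are assumptions introduced for illustration, and the numbers inherit the approximations made in the text.

```python
def lattice_counts(N):
    """Approximate rotation and Toffoli-pair counts for the N x N lattice decomposition."""
    edges = 2 * N * (N - 1)
    interior_cycles = (N - 1) ** 2           # one cRz and one Toffoli pair each
    outer_points = 4 * (N - 1)
    crz_outer = outer_points // 2            # chains of length 3, one cRz per chain
    crz_decomposed = interior_cycles + crz_outer
    return {
        "rz_per_term": edges,                # one Rz per ZZ summand
        "crz_general_bound": edges // 2,     # |E|/2 bound for general graphs
        "crz_decomposed": crz_decomposed,
        "toffoli_pairs": interior_cycles,
        "saving_vs_per_term": edges - crz_decomposed,
    }

for N in (3, 5, 10):
    print(N, lattice_counts(N))
```

For every N the saving relative to per-term synthesis equals (N − 1)², and the decomposed circuit uses N − 1 more cRz gates than the general |E|/2 bound.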

In Fig. 6e we gave a circuit for simulating \({e}^{it\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}}\), when the underlying graph is a 6-point cycle. We reduced the Toffoli-pairs by decomposing the graph into smaller cycles.

Application : Simulating with qDRIFT

In this section we consider one simulation algorithm, qDRIFT. We focus on qDRIFT rather than Trotter for our experiments because qDRIFT is easier to analyze numerically, since Trotter errors subtly depend on operator ordering. Specifically we consider a Hamiltonian \(H=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}{P}_{j}\) and sample Pauli operators to apply in each short time step, as described earlier in the paper and in ref. 43. We can assume each hj > 0, since negating a coefficient only changes the angle of the corresponding rotation gate. In qDRIFT, one Pauli term is sampled in each iteration, up to a total of N samples. The term Pj is sampled with probability hj/∑ihi and is then simulated for a short time period. We consider another procedure where we re-write the Hamiltonian as \(H=\mathop{\sum }\nolimits_{j = 1}^{L{\prime} }{h}_{j}^{{\prime} }{H}_{j}\), where each Hj = ∑iPi is a sum of commuting Paulis. In each iteration one Hj is sampled with probability \(\frac{{h}_{j}^{{\prime} }}{{\sum }_{i}{h}_{i}^{{\prime} }}\) and simulated for a short time period. Then we compare the growth of the error and the number of Rz/cRz and Toffoli gates used in these two procedures: (i) one Pauli sampled in each iteration, (ii) a group of commuting Paulis sampled in each iteration.

For our first set of experiments we examine the simulation of 4- and 6-qubit Heisenberg models. The coefficients Jx, Jy, Jz, dh have been sampled from a zero-mean normal distribution with variance 1. In Fig. 7 we show that we achieve better scaling of Rz/cRz when multiple commuting Pauli operators are sampled and evolved in each iteration. In fact, the error also scales well with the number of iterations: we can achieve the same error in fewer iterations or, equivalently, a much lower error in the same number of iterations when multiple operators are sampled. We calculate the error as:

$$\,{{\mbox{Error}}}\,={{\mathbb{E}}}_{\rho }(\parallel {{{{\mathcal{E}}}}}_{2}(\rho )-{{{{\mathcal{E}}}}}_{1}(\rho ){\parallel }_{{l}_{2}})$$

where \({{{{\mathcal{E}}}}}_{1}(\rho )={e}^{iHt}\rho {e}^{-iHt}\), \(\tau =t\cdot \left({\sum }_{j}{h}_{j}\right)/N\) is the time step used in Vk below, \(\parallel \cdot {\parallel }_{{l}_{2}}\) is the induced Euclidean norm on matrices, and \({{\mathbb{E}}}_{\rho }\) is the Haar average over input states. We obtain \({{{{\mathcal{E}}}}}_{2}\) by averaging over M random qDRIFT protocols, where M varies from 100 to 3000 for our purposes. These values are chosen to ensure that the sampling error is small at the scale of the plots generated.

$${V}_{k}=\mathop{\prod}\limits_{{j}_{{i}_{k}}}{e}^{i{H}_{{j}_{{i}_{k}}}\tau }$$
$${{{{\mathcal{E}}}}}_{2}(\rho )=\frac{1}{M}\mathop{\sum }\limits_{k=1}^{M}{V}_{k}\rho {V}_{k}^{{\dagger} }$$
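The error estimate defined by these equations can be sketched numerically for a small example. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it handles procedure (i) only (single Paulis with positive coefficients), it evaluates the error for one Haar-random pure state rather than averaging over ρ, and the function names `exact_unitary`, `pauli_exp` and `qdrift_error` are hypothetical. It uses the identity \({e}^{iP\tau }=\cos \tau \,I+i\sin \tau \,P\), valid for any Pauli operator P.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def exact_unitary(H, t):
    """Return e^{iHt} for Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(1j * evals * t)) @ evecs.conj().T


def pauli_exp(P, tau):
    """Return e^{iP tau} for an operator P with P @ P = I."""
    return np.cos(tau) * np.eye(P.shape[0]) + 1j * np.sin(tau) * P


def qdrift_error(coeffs, paulis, t, N, M, rng):
    """Spectral-norm distance between the averaged qDRIFT channel and exact evolution."""
    h = np.asarray(coeffs, dtype=float)
    lam = h.sum()
    tau = t * lam / N                       # time step per sampled term
    dim = paulis[0].shape[0]
    H = sum(c * P for c, P in zip(h, paulis))
    U = exact_unitary(H, t)

    # One Haar-random pure input state (the paper averages over such states)
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    psi /= np.linalg.norm(psi)
    rho = np.outer(psi, psi.conj())

    E1 = U @ rho @ U.conj().T
    steps = [pauli_exp(P, tau) for P in paulis]
    E2 = np.zeros((dim, dim), dtype=complex)
    for _ in range(M):
        V = np.eye(dim, dtype=complex)
        for j in rng.choice(len(h), size=N, p=h / lam):
            V = steps[j] @ V
        E2 += V @ rho @ V.conj().T
    E2 /= M
    return np.linalg.norm(E2 - E1, 2)


rng = np.random.default_rng(1)
err = qdrift_error([1.0, 0.5], [X, Z], t=1.0, N=50, M=200, rng=rng)
```

A useful sanity check is a single-term Hamiltonian, for which every sampled sequence reproduces the exact evolution and the error vanishes up to floating-point precision.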

In our experiments ρ is randomly drawn rather than chosen to maximize the diamond distance. As a result, this does not give a tight upper bound on the error quantified by any induced channel norm. Further, all evolution is done using t = 1 and the groupings are hand-optimized using the counts given in Supplementary Method 5. The data shown in Fig. 7 indicate that the number of iterations of the qDRIFT channel needed to bound the simulation error below a particular value is reduced by factors of 2.34 and 2.8 through grouping commuting terms, for the randomly chosen 4- and 6-qubit Heisenberg Hamiltonians respectively. The number of rotations is found to be reduced by a factor of roughly 2.34 for the 4-qubit ensemble but only 1.8 for the 6-qubit case. This suggests that, while the groupings that we consider are highly successful at reducing the number of qDRIFT iterations needed, the number of gates per iteration increases from the 4-qubit to the 6-qubit examples. Further computer-aided optimization may therefore be needed in order to see the full benefit of such groupings as we increase the size of the models.

Fig. 7: Simulation of 4 and 6-qubit Heisenberg Hamiltonian.

Log-log plots showing the number of iterations (a, d), Rz/cRz (b, e) and Toffoli pairs (c, f) as a function of error, while simulating the 4-qubit (a–c) and 6-qubit (d–f) quantum Heisenberg Hamiltonian (HH) with qDRIFT. The red and blue curves show the variation when sampling single and multiple commuting Paulis per iteration, respectively. The Y-axis label in all plots is \({\log }_{10}\,{{\mbox{Error}}}\,\). The X-axis label of (a), (d) is \({\log }_{10}\,{{\mbox{Iterations}}}\,\), of (b), (e) is \({\log }_{10}\,{{\mbox{Rotations}}}\,\) and of (c), (f) is \({\log }_{10}\,{{\mbox{Toffoli Cost}}}\,\).

Similar observations can be made for our second set of experiments where we simulate the Hamiltonian of H2 and LiH (with freezing in the STO-3G basis). The plots in Fig. 8 show that in case of H2, the number of iterations of the qDRIFT channel needed to simulate the dynamics to bound the error below a particular value, is reduced by a factor of 4 through the use of grouping commuting terms. For LiH this factor is nearly 2.1. The number of rotations is found to be reduced by a factor of roughly 3.2 for H2 and 2 for LiH.

Fig. 8: Simulation of H2 and LiH Hamiltonian.

Log-log plots showing the number of iterations (a, d), Rz/cRz (b, e) and Toffoli pairs (c, f) as a function of error, while simulating the H2 (a–c) and LiH (d–f) Hamiltonians with qDRIFT. The red and blue dots show the variation when sampling single and multiple commuting Paulis per qDRIFT iteration, respectively. The Y-axis label in all plots is \({\log }_{10}\,{{\mbox{Error}}}\,\). The X-axis label of (a), (d) is \({\log }_{10}\,{{\mbox{Iterations}}}\,\), of (b), (e) is \({\log }_{10}\,{{\mbox{Rotations}}}\,\) and of (c), (f) is \({\log }_{10}\,{{\mbox{Toffoli Cost}}}\,\).

For all the experiments that we consider, the Toffoli-pair count is comparable with the Rz/cRz count, so the Toffoli pairs do not contribute significantly to the overall T-count compared to the rotation gates. The number of gates depends on the diagonalizing circuits and on the grouping into commuting Paulis. In this paper we have shown the results for the eigenbasis or grouping that performed best among the options we considered. In Supplementary Method 5 we explicitly list the Hamiltonians and the groupings, and give a short description of how we obtained the rotation and Toffoli costs.
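To illustrate what a grouping into commuting Paulis involves, the sketch below checks commutation of Pauli strings and forms groups with a simple first-fit heuristic. This is an assumption-laden illustration only: the groupings used in our experiments were hand-optimized, and the function names `paulis_commute` and `greedy_commuting_groups` are hypothetical. Two Pauli strings commute exactly when they differ on an even number of qubits where both act non-trivially.

```python
def paulis_commute(p: str, q: str) -> bool:
    """True if two Pauli strings over the alphabet {I, X, Y, Z} commute."""
    if len(p) != len(q):
        raise ValueError("Pauli strings must act on the same number of qubits")
    # Count qubits where both strings act non-trivially with different Paulis
    anticommuting = sum(
        1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b
    )
    return anticommuting % 2 == 0


def greedy_commuting_groups(terms):
    """First-fit grouping: put each term in the first group it fully commutes with."""
    groups = []
    for term in terms:
        for group in groups:
            if all(paulis_commute(term, other) for other in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


# Terms of a 2-qubit Heisenberg-type Hamiltonian with a field along Z
groups = greedy_commuting_groups(["XX", "YY", "ZZ", "ZI", "IZ"])
```

A first-fit heuristic depends on the order of the terms and need not minimize the rotation or Toffoli cost, which also depends on the diagonalizing circuit chosen for each group.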

All plots, code, and data can be found online in our public repository https://github.com/SNIPRS/hamiltonian. All code was written in Python. Our results were obtained partly with computing resources in the Cedar cluster of Compute Canada. Specifically, our code was run on an Intel(R) Xeon(R) E5-2683 v4 CPU at 2.10 GHz, utilizing 48 cores and up to 12 GB of RAM, and running Gentoo Linux 2.6. For the Heisenberg Hamiltonians, our results were obtained using 12 cores of an Intel(R) Core(TM) i5-12600K CPU at 3.6 GHz with up to 32 GB of RAM, running Ubuntu 20.04.4.

Discussion

In this paper, we have considered the problem of designing efficient quantum circuits for the exponentials of Hamiltonians that can be expressed as a sum of Paulis. In contrast with most previous approaches, we synthesize a circuit for the exponentiated sum of commuting Paulis, rather than concatenating circuits for each exponentiated Pauli. The resulting circuits are observed, for some parameter combinations, to require far fewer non-Clifford operations than the standard circuits. We therefore propose an algorithm for greedily compiling a Trotter or qDRIFT simulation into a sequence of such simulations, and observe that when multiple rotations are grouped, a factor of roughly 1.8 to 3.2 fewer rotations is needed at fixed error to simulate the 6- and 4-qubit Heisenberg models, LiH and H2. Also, for simulation protocols like qDRIFT, it is possible to achieve better performance, in the sense that the error accumulated per iteration is lower if we sample multiple commuting Paulis. The overall non-Clifford gate cost of the entire protocol is also lower.

There are a number of interesting avenues that are revealed by this work. The first is that a more complete set of rules for compiling Hamiltonian terms into sets that can be easily exponentiated could enable more efficient compilation of Hamiltonian simulations. These replacement rules, once identified, can be used inside a more systematic Hamiltonian compiler package that would allow more substantial optimizations of the Hamiltonian for the given simulation method. This raises a second issue: while in this work we focus on optimizing Trotter and related simulation methods, similar considerations could be applied to optimizing the prepare and select circuits used in LCU/qubitization simulation algorithms. Such procedures are harder to optimize as the simulation algorithm does not factorize as nicely into independent simulations; however, the importance of these simulation methods makes the development of compilation strategies essential.

Finally, an important avenue hinted at by this work is the possibility that approximate unitary synthesis methods can be combined with quantum simulation routines to further reduce the cost. If fermionic swaps are used, for example, simulation reduces to implementing a series of 4-local Hamiltonians, and optimal circuits can in principle be constructed for such Hamiltonians using existing approaches. The computational overheads required for optimal (approximate) synthesis of these unitaries make this a daunting task; however, if a sufficient lexicon of cheap unitaries is found for such simulation, then it will not only lead to lower costs for quantum simulation using Trotter/qDRIFT: it will also unify Hamiltonian compilation with circuit synthesis into a single conceptual framework.