Synthesizing efficient circuits for Hamiltonian simulation

Mukhopadhyay, Priyanka; Wiebe, Nathan; Zhang, Hong Tao

doi:10.1038/s41534-023-00697-6

Article
Open access
Published: 03 April 2023

npj Quantum Information volume 9, Article number: 31 (2023)

Abstract

We provide an approach for compiling quantum simulation circuits that appear in Trotter, qDRIFT and multi-product formulas to Clifford and non-Clifford operations that can reduce the number of non-Clifford operations. The total number of gates, especially CNOT gates, is also reduced in many cases. We show that it is possible to implement an exponentiated sum of commuting Paulis with at most m (controlled)-rotation gates, where m is the number of distinct non-zero eigenvalues (ignoring sign). Thus we can collect mutually commuting Hamiltonian terms into groups satisfying one of several symmetries identified in this work. This allows an inexpensive simulation of the entire group of terms. We further show that the cost can in some cases be reduced by partially allocating Hamiltonian terms to several groups, and we provide a polynomial-time classical algorithm that greedily allocates the terms to appropriate groupings.

Introduction

One main reason that led Feynman¹ and others to propose the idea of quantum computers was the fact that problems like simulating the dynamics of quantum systems are intractable on a classical computer. Starting from the seminal work of Lloyd², much research³ has been done to develop algorithms for simulating Hamiltonians, culminating in various techniques like product formulas^4,5, quantum walks⁶, linear combination of unitaries⁷, truncated Taylor series⁸, and quantum signal processing⁹. Special techniques have been developed for simulating particular physical systems^{10,11,12,13,14,15,16,17}, which might find applications in developing new pharmaceuticals, catalysts and materials. Phase estimation can be combined with quantum simulation to find the ground state energy¹⁸ and excited state energies^19,20,21 of the Hamiltonian. This is called the electronic structure problem¹⁴, which is important in chemistry and material science. Research in quantum simulation has also inspired the development of quantum algorithms for various other problems^{22,23,24,25,26}.

One main challenge for digital quantum simulation is the implementation of efficient circuits that can produce reliable results. Without it, a theoretical exponential speedup may not lead to a useful algorithm if a typical practical application requires an amount of time and memory that is beyond the reach of even a quantum computer. A number of factors affect the efficiency of a quantum circuit, i.e. its running time and error, for example the number of qubits, the depth and the gate count. So, depending upon the application or hardware constraints, one can design algorithms that optimize or reduce the count or depth of one particular type of quantum gate or other resource. For example, there are algorithms that perform T-count- and T-depth-optimal synthesis^27,28,29 of a given unitary, or that re-synthesize a given circuit with reduced T-count, T-depth^30,31,32 or CNOT-count^33,34,35. The non-Clifford T gate has known constructions in most error correction schemes, and the cost of implementing it fault-tolerantly exceeds the cost of the Clifford group gates by as much as a factor of a hundred or more^36,37,38. Quantum error correction and fault tolerance are especially significant for large quantum circuits, since otherwise the accumulation of errors makes any output highly unreliable and hence useless. The minimum number of T gates required to implement certain unitaries is a quantifier of difficulty in many algorithms^39,40 that try to classically simulate quantum computation. So, even though alternative fault-tolerance methods such as the completely transversal Clifford+T scheme⁴¹ and anyonic quantum computing⁴² are also being explored, minimizing the number of T gates in quantum circuits remains an important and widely studied goal. Multi-qubit gates like CNOT introduce more error than single-qubit gates, so reducing the CNOT count is also important, especially for noisy intermediate-scale quantum (NISQ) computers.

Our contributions

(I) One main result in this paper is Lemma 2.4, which shows that it is possible to implement an exponentiated sum of commuting Paulis with at most m (controlled)-rotation gates, where m is the number of distinct non-zero eigenvalues (ignoring sign). For illustration, we consider the Hamiltonian for the Heisenberg model and we show that it is possible to achieve about 50% reduction in the rotation gate cost and for certain underlying graphs this reduction can be about 75%. However, the cost of Toffolis may increase. We have given explicit circuits for 4-qubit and 6-qubit chain (or cycle), where we attempt to reduce both the rotation and Toffoli gate cost.

(II) In most previous works, circuits for individual exponentiated Paulis are synthesized and combined. We show that it is possible to reduce the gate count (not only of non-Clifford gates) if we instead consider groups of commuting Paulis. As a practical demonstration, we consider the qDRIFT Hamiltonian simulation algorithm⁴³. We refer to the error introduced by the algorithm as the ‘simulation error’. We take the 1-D 4-qubit and 6-qubit Heisenberg Hamiltonians (Fig. 7) and also 4-qubit Hamiltonians for H₂ and LiH (with freezing in the STO-3G basis) (Fig. 8), and compare the case where a single Pauli term is selected with the case where a set of commuting Pauli terms is selected for implementation at each iteration of qDRIFT. We observe that the accumulated error is smaller when multiple terms are selected, and that the rotation gate cost is also lower in this error regime. With multiple terms, the number of Toffoli pairs is roughly equal to the number of R_z/cR_z gates used. So overall, the T-count is lower when implementing multiple commuting Paulis per iteration of qDRIFT. This adds to the motivation for building efficient circuits for such Hamiltonians.

(III) In subsection ‘Optimized circuits for quantum chemistry’ we derive explicit quantum circuits for the two-body excitation terms appearing in the Coulomb Hamiltonian in quantum chemistry. We mainly use the Clifford+T universal fault-tolerant gate set to implement unitaries. We design efficient circuits for a different grouping of commuting Pauli operators. It is evident (Table 2) that the rotation gate cost depends on the coefficients of the Pauli summands. For some combinations of coefficients the circuits derived here are optimal, in the sense that they have the minimum number (i.e. 1) of cR_z gates. Although our focus is on reducing the non-Clifford gate count, most of the quantum circuits derived here have an overall reduced gate count, including a reduction in 2-qubit gates like CNOT. In Table 1 we compare the number of gates required to implement one of the Hamiltonians considered in this paper with a previous construction. For the remaining Hamiltonians we did not find any compact previous construction to compare with. In short, our approach can be useful not only in the fault-tolerant regime, but also in the NISQ era.

Table 1 Comparison of gate counts required to simulate e^−iHt (Eqs. (19), (20)) using the circuit synthesized by us with the circuit in ref. ⁵⁰.

(IV) In Algorithm 1, we describe a greedy method for grouping terms into commuting Paulis, where the objective is to optimize the number of non-Clifford gates. A host of previous work tackles the question of how to group commuting Paulis, and to the best of our knowledge most (if not all) of it aims to reduce the number of measurements required to make an estimation⁴⁴. The latter problem is especially important for variational quantum eigensolvers. The grouping that optimizes the non-Clifford gates may not optimize the number of measurements. In most cases, finding the optimal grouping is difficult. But we can always ask whether, given a grouping (for whatever objective), it is possible to compile efficient circuits. In this case, we can use our techniques (Lemma 2.4) to reduce the gate count. Thus our methods can also be used to design circuits for the measurement problem.

In this paper, we use the Jordan–Wigner (JW) transformation⁴⁵ to map from the fermionic to the qubit space, and then group the resulting terms into commuting Paulis. Other transformations like the Bravyi–Kitaev and parity transformations⁴⁶ can also be used and may be beneficial in circumstances where Clifford operations are costly or inherent quantum error correction is desirable. We focus on Jordan–Wigner for two reasons. First, in this paper, we focus on the synthesis of efficient quantum circuits for exponentiated commuting Paulis, and the techniques hold whichever mapping is considered. Second, previous work has not shown obvious advantages for Bravyi–Kitaev transformations within the domain of fault-tolerant quantum computing.

How we compare the cost of non-Clifford resources

In all the constructions discussed in this paper, two approximately implementable gates are used: R_z and controlled R_z (cR_z), whose T-count varies inversely with precision or synthesis error. From the results given in²⁹ and from the implementations performed here down to error 10⁻⁶, we believe that the T-count of cR_z can be less than that of R_z for most modestly small rotation angles. However, for convenience, we assume these have equal cost and, with some abuse of terminology, we refer to the T-count of R_z/cR_z as the ‘(non-Clifford) rotation gate cost’. The only exactly implementable non-Clifford gate considered in the constructions is the Toffoli, with T-count 7²⁷ or 4⁴⁷. In the low-error regime, the T-count of the approximately implementable R_z/cR_z will dominate, while in a high-error regime the T-count of the Toffolis may matter if we use a lot of them. To reduce the T-count of compute-uncompute Toffoli pairs, we can use the temporary logical AND gadget proposed by Gidney⁴⁸. In fact, in our circuits, we use R_z gates controlled on n qubits (n > 1), each of which can be decomposed into (compute–uncompute) pairs of NOT gates controlled on n qubits and a cR_z gate. Each such multi-controlled NOT can be implemented with n−1 Toffoli or 4n−4 T-gates⁴⁸. If we combine compute-uncompute pairs, then the overall T-count of the circuit can be reduced further by using the logical AND gadget. We must keep in mind that the implementations in^47,48 use classical resources and measurements, and it is not straightforward to argue that they give an advantage, in spite of using fewer T-gates. We can also use the construction in⁴⁹ that implements an n-controlled NOT gate using 4n−4 T gates, 4n−3 CNOT gates and n−1 ancilla qubits. In our paper, we have expressed the non-Clifford T-gate cost in terms of the rotation gate cost and the number of Toffoli pairs used.

Related work

In ref. ⁵⁰ the authors studied the non-Clifford resource cost required to simulate the chemical process of biological nitrogen fixation by nitrogenase. In ref. ⁵¹ the authors developed algorithms to synthesize circuits for the Clifford operators that diagonalize a group of commuting Paulis. The goal was to reduce the two-qubit CNOT gate count because of its low fidelity and limited qubit connectivity of near-term quantum computer architectures. A similar diagonalization algorithm has been used in⁵² for efficient simulation of Hamiltonian dynamics. Much work has been done for the construction of quantum circuits for the evolution of molecular systems^{16,53,54,55,56,57,58,59} and the Heisenberg model⁶⁰.

Results

Notation

In many places we write G_(q) to denote that the gate or operator G acts on qubit q. For multi-qubit gates we write CNOT_(c, t) to denote a CNOT with control at qubit c and target at qubit t. For convenience, we drop the parentheses in the subscript whenever this causes no ambiguity. We write [K] = {1, 2, …, K}. We denote the n × n identity matrix by ${{\mathbb{I}}}_{n}$, or by ${\mathbb{I}}$ if the dimension is clear from the context. We denote the set of n-qubit unitaries by ${{{{\mathcal{U}}}}}_{n}$. The size of an n-qubit unitary is N × N where N = 2ⁿ. A detailed description of the n-qubit Pauli operators (${{{{\mathcal{P}}}}}_{n}$), the Clifford group (${{{{\mathcal{C}}}}}_{n}$) and the group (${{{{\mathcal{J}}}}}_{n}$) generated by the Clifford and T gates is given in Supplementary Note 1.

Optimizing Trotter-decompositions

The time evolution of a quantum system described by a Hamiltonian H is e^−iHt. Most often the Hamiltonian H can be decomposed as the sum $H=\mathop{\sum }\nolimits_{j = 1}^{m}{\alpha }_{j}{H}_{j}$, where each H_j is Hermitian. There can be more than one decomposition of H, and we select one such that for each H_j the unitary ${e}^{-i\tau {H}_{j}}$ is efficiently implementable on a quantum computer for any τ. The goal of the Hamiltonian simulation problem is to approximate e^−itH by a sequence of unitaries of the form ${e}^{-i\tau {H}_{j}}$, up to some desired precision. For example, using the Lie-Trotter formula⁵ we have that

$${e}^{-iHt}=\mathop{\lim }\limits_{k\to \infty }{\left(\mathop{\prod}\limits_{j}{e}^{-i(t/k){\alpha }_{j}{H}_{j}}\right)}^{k}.$$

In the non-asymptotic regime, the Trotter scheme provides a first-order approximation, with the norm of the difference between the exact and approximate time evolution scaling as O(t²/k). More advanced higher order schemes^3,4 are also available. Alternatively, a randomized approach called qDRIFT can be used in place of a Trotter formula wherein the quantum state is evolved according to the probabilistic channel

$$\rho \,\mapsto \mathop{\sum}\limits_{j}\frac{{\alpha }_{j}}{| \alpha {| }_{1}}{e}^{-i| \alpha {| }_{1}{H}_{j}t}\rho {e}^{i| \alpha {| }_{1}{H}_{j}t}.$$

(1)

Note the error here is also O(t²); however, in this case, a single exponential is performed rather than O(m) as would be needed for the comparable Trotter formula. The cost of such an approach scales as $O(| \alpha {| }_{1}^{2}{t}^{2}/\epsilon )$ for error ϵ and does not directly depend on m.
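As a rough illustration of the sampling behind Eq. (1), the following Python sketch draws a qDRIFT gate sequence. It assumes that the channel is applied N times with t replaced by t/N, that the coefficients are non-negative (signs absorbed into H_j), and that the constant in the step count is 2, taken from the per-step bound quoted later in the text. The function names are hypothetical and are not taken from the paper.

```python
import math
import random

def qdrift_steps(alphas, t, eps):
    """Number of samples N suggested by the O(|alpha|_1^2 t^2 / eps) scaling.

    The prefactor 2 is an assumption taken from the bound 2*lam^2*t^2/N.
    """
    lam = sum(abs(a) for a in alphas)          # one-norm |alpha|_1
    return math.ceil(2 * lam**2 * t**2 / eps)

def qdrift_schedule(alphas, t, n_steps, seed=0):
    """Sample a gate sequence for H = sum_j alpha_j H_j.

    Returns a list of (j, tau): apply exp(-i * tau * H_j) in the listed order.
    Each index j is drawn with probability |alpha_j| / |alpha|_1.
    """
    lam = sum(abs(a) for a in alphas)
    tau = lam * t / n_steps                    # evolution time per sampled term
    rng = random.Random(seed)                  # seeded for reproducibility
    weights = [abs(a) / lam for a in alphas]
    picks = rng.choices(range(len(alphas)), weights=weights, k=n_steps)
    return [(j, tau) for j in picks]
```

Note that the length of the sequence depends on the one-norm and on t, but not on the number of terms m, in line with the remark above.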

The approximation errors arising in the use of product formulas are caused by non-commuting terms in the Hamiltonian. For example, see ref. ⁶¹ for a detailed exposition on Trotter errors. Given any set of mutually commuting operators P₁, …, P_m we have the following:

$${e}^{-it\mathop{\sum }\nolimits_{j = 1}^{m}{P}_{j}}=\mathop{\prod }\limits_{j=1}^{m}{e}^{-it{P}_{j}}$$

(2)

Thus, the operators can be partitioned into mutually commuting subsets. Time evolution under the sum of the mutually commuting operators in each such subset incurs no product-formula error, by Eq. (2), and the product formulas need only be applied to the Hamiltonian written as a sum over the subsets. This approach becomes especially applicable in scenarios where the Hamiltonian can be expressed as a sum of Pauli operators, for which the commutation relations can easily be evaluated.
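The contrast between commuting and non-commuting terms can be checked numerically. The sketch below (NumPy, with hypothetical helper names) builds the first-order product formula and compares it with the exact evolution. The commuting set {XX, YY, ZZ} should be reproduced to machine precision with a single step, whereas for the non-commuting pair {X, Z} the error should shrink roughly like 1/k.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(H, t):
    """exp(-i t H) for Hermitian H, via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def trotter(terms, t, k):
    """First-order product formula (prod_j exp(-i (t/k) H_j))^k."""
    step = np.eye(terms[0].shape[0], dtype=complex)
    for H in terms:
        step = step @ evolve(H, t / k)
    return np.linalg.matrix_power(step, k)

def trotter_error(terms, t, k):
    """Spectral-norm distance between the product formula and exp(-i t sum_j H_j)."""
    exact = evolve(sum(terms), t)
    return np.linalg.norm(trotter(terms, t, k) - exact, 2)

commuting = [np.kron(P, P) for P in (X, Y, Z)]   # XX, YY, ZZ commute pairwise
noncommuting = [X, Z]                            # single-qubit X and Z do not

print(trotter_error(commuting, 1.0, 1))          # numerically zero
print(trotter_error(noncommuting, 1.0, 10),
      trotter_error(noncommuting, 1.0, 20))      # roughly halves when k doubles
```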

As a specific example, consider the case where H = aZ ⊗ Z ⊗ Z. Since the Hamiltonian is diagonal, e^{−iaZ⊗Z⊗Zt} has the computational basis vectors $\left\vert {b}_{1},{b}_{2},{b}_{3}\right\rangle$ as eigenvectors, with eigenvalues ${e}^{-i{(-1)}^{{b}_{1}\oplus {b}_{2}\oplus {b}_{3}}at}$. Thus the eigenvalues are determined by the parity of the bit strings, which can be computed using CNOT gates. From this reasoning the following quantum circuit will perform the simulation of this Pauli operator exactly.

(3)

As every Pauli operator of weight 3 can be diagonalized by Clifford conjugation, this circuit up to an elementary basis transformation, will simulate any weight 3 Pauli Hamiltonian. The exact same strategy of diagonalizing and simulating the Pauli operator in the eigenbasis shows that each exponential of a weight ν Pauli operator Hamiltonian requires 2(ν−1) CNOT operators and one rotation gate. This strategy is at the heart of most elementary networks for simulating chemistry and spin models^53,62.
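The parity argument can be traced on classical bit strings. The sketch below assumes the standard construction: a ladder of CNOTs that accumulates the parity on the last qubit, one Z rotation there, and the ladder reversed. The circuit of Eq. (3) is assumed, not confirmed, to have this form. The sketch checks that every basis state picks up the phase stated above and counts the 2(ν−1) CNOTs.

```python
import cmath
from itertools import product

def exact_phase(bits, a, t):
    """Eigenvalue of exp(-i a t Z⊗...⊗Z) on the basis state |bits>."""
    parity = sum(bits) % 2                     # b1 XOR b2 XOR ... XOR b_nu
    return cmath.exp(-1j * (-1) ** parity * a * t)

def ladder_phase(bits, a, t):
    """Phase from CNOT ladder + exp(-i a t Z) on the last qubit + uncompute.

    Returns (phase, number of CNOTs used).
    """
    b = list(bits)
    cnots = 0
    for q in range(len(b) - 1):                # CNOT(q -> q+1) accumulates parity
        b[q + 1] ^= b[q]
        cnots += 1
    # exp(-i a t Z) acts as e^{-iat} on |0> and e^{+iat} on |1>
    phase = cmath.exp(-1j * a * t * (1 - 2 * b[-1]))
    cnots += len(b) - 1                        # reversed ladder restores the bits
    return phase, cnots

for bits in product((0, 1), repeat=3):
    phase, cnots = ladder_phase(bits, a=0.3, t=0.7)
    assert abs(phase - exact_phase(bits, 0.3, 0.7)) < 1e-12
    assert cnots == 2 * (3 - 1)
```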

The work of⁵⁰ provided another way of thinking about these decompositions by showing an explicit method that can diagonalize sums of commuting operators that appear in chemistry simulations by transforming into a simultaneous eigenbasis of such terms. In full generality, such transformations reduce the circuit depth but need not reduce the circuit size. However, we will see here that for some Hamiltonians these transformations can reduce the circuit size as well.

As a motivating example, consider the Hamiltonian H = XX + YY + ZZ. This Hamiltonian can be simulated, up to a global phase, by

(4)

This can be implemented using two Toffoli gates and a single qubit rotation. In contrast, the standard approach from^53,62 would use three single qubit rotations and no Toffoli gates. As rotation synthesis often is 10 times more expensive than Toffoli gates^27,29,50, this will almost always be a favorable way of performing the simulation. In contrast, if this symmetry is broken then the Hamiltonian term will be more expensive to simulate. Thus it can be favorable to introduce such symmetries as needed artificially. For example, consider

$$H=XX+YY+(3/2)ZZ=(XX+YY+ZZ)+ZZ/2.$$

(5)

Such a simulation can be performed using two rotation gates rather than the 3 naïvely needed and so it makes sense to compile the Hamiltonian terms this way to reduce the overall complexity.
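One way to see why the symmetric Hamiltonian needs only one rotation is to look at its spectrum. The NumPy sketch below (variable names are ours) checks that XX + YY + ZZ equals the identity minus four times the projector onto the singlet state, so that up to a global phase the evolution applies a single phase to a single state. It also shows that breaking the symmetry, as in Eq. (5), produces three distinct eigenvalues.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
XX, YY, ZZ = (np.kron(P, P) for P in (X, Y, Z))

def distinct_eigenvalues(H, decimals=9):
    """Sorted list of the distinct eigenvalues of a Hermitian matrix."""
    vals = np.round(np.linalg.eigvalsh(H), decimals)
    return sorted(set(float(v) for v in vals))

H_sym = XX + YY + ZZ                 # symmetric case
H_broken = XX + YY + 1.5 * ZZ        # Eq. (5)

# Singlet state (|01> - |10>)/sqrt(2) and its projector
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
P_singlet = np.outer(singlet, singlet.conj())

def evolve_sym(t):
    """exp(-i t H_sym), using H_sym = I - 4 * P_singlet."""
    return np.exp(-1j * t) * (np.eye(4) + (np.exp(4j * t) - 1) * P_singlet)

print(distinct_eigenvalues(H_sym))      # two values: -3 and 1
print(distinct_eigenvalues(H_broken))   # three values: -3.5, 0.5 and 1.5
print(np.allclose(H_sym, np.eye(4) - 4 * P_singlet))
print(np.allclose(H_broken, H_sym + 0.5 * ZZ))
```

The last check restates the split in Eq. (5): the symmetric fragment costs one rotation and the leftover ZZ/2 costs a second.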

As another example, not all rotations are equally expensive and so we should also combine terms in such a way as to minimize the cost. For example consider the time-evolution operator

$$U(t)={e}^{-i(\pi /4\sqrt{2}-\epsilon )Z-i(\pi /4\sqrt{2}+\epsilon )X}\approx {e}^{-i\pi /4(X+Z)/\sqrt{2}}{e}^{-i(X-Z)\epsilon }.$$

(6)

While the first operation in this Trotterization is not a Clifford operation, it is a simulation of a Hadamard gate for time π/4. As this corresponds to a special angle and since the Hadamard gate can be diagonalized using a constant size H and T circuit, the cost of implementing this first term is O(1) and thus the dominant cost is the remaining rotation. In contrast, if this property were not used then we would have two arbitrary rotations in the Trotterization which would be nearly twice the cost of this simplified approach. These ideas can further be used in concert: remainder terms that arise from inexactly rounding a Hamiltonian evolution to a known cheap simulation can be absorbed into other terms or even other Trotter steps.

Algorithm 1

Hamiltonian compilation using Greedy 1-norm minimization

We propose an algorithm in Algorithm 1 that exploits this intuition through a greedy decomposition of the Hamiltonian into sums of commuting terms. These mutually commuting terms, or fragments, are chosen such that the ratio of the fraction of the Hamiltonian that is simulated by the term to the cost of the term is maximized. This choice is motivated in part by the fact that the query complexity of a quantum simulation is lower bounded by Ω(∣α∣₁t)⁶³ and thus designing circuits that simulate as large of a fraction of this one-norm as possible per quantum gate operation is a sensible optimization heuristic for our greedy algorithm. Unlike traditional approaches to partitioning the Hamiltonian, our approach allows partial allocation of Hamiltonian terms to multiple commuting sets. Further, the allocation can be negative in our approach. This negative allocation is important because we will see that in some cases the introduction of more Hamiltonian weights on some terms can be more than offset by the reduced costs of simulating the fragment.
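The selection criterion can be illustrated on a toy instance. The pseudocode of Algorithm 1 is not reproduced in this text, so the score below is an assumption inferred from the quantity bounded in the complexity argument: the reduction in the one-norm of the residual coefficients, divided by the cost of the fragment. The coefficients and the unit costs (one rotation per fragment, Toffolis ignored) are hypothetical.

```python
def one_norm(v):
    """Sum of absolute values of the coefficients."""
    return sum(abs(x) for x in v)

def fragment_score(alpha, beta, cost):
    """One-norm removed from the residual alpha - beta, per unit cost (assumed form)."""
    residual = [a - b for a, b in zip(alpha, beta)]
    return (one_norm(alpha) - one_norm(residual)) / cost

# Hypothetical residual coefficients of (XX, YY, ZZ)
alpha = [1.0, 1.0, 0.5]

candidates = {
    "XX+YY+ZZ (one rotation)": ([1.0, 1.0, 1.0], 1),
    "XX alone (one rotation)": ([1.0, 0.0, 0.0], 1),
    "ZZ alone (one rotation)": ([0.0, 0.0, 0.5], 1),
}

scores = {name: fragment_score(alpha, beta, cost)
          for name, (beta, cost) in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print(best)
```

Here the symmetric fragment wins even though it over-allocates ZZ, which leaves a residual coefficient of −0.5 on that term. This is the kind of negative allocation described above.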

The number of optimization steps required for our greedy algorithm is at most O(m² log(1/ϵ)), where ϵ is the fraction of the initial one-norm that remains unallocated. To see this, assume that the optimal strategy involves μ iterations of the outer loop for μ ∈ Ω(m) and assume that the inner loop optimization requires ν iterations. Since COST ≥ 1 it holds that ${\Gamma }_{\max }\le {\sum }_{j}| {\alpha }_{j}^{{\prime} }| -{\sum }_{j}| {\alpha }_{j}^{{\prime} }-{\beta }_{j}|$. Assume that ${\sum }_{j}| {\alpha }_{j}^{{\prime} }| -{\sum }_{j}| {\alpha }_{j}^{{\prime} }-{\beta }_{j}| < | {\alpha }^{{\prime} }{| }_{\infty }$. In this case, there exists a trivial solution that outperforms this one, in which the largest term is simulated in isolation at cost 1. Therefore we must have that ${\sum }_{j}| {\alpha }_{j}^{{\prime} }| -{\sum }_{j}| {\alpha }_{j}^{{\prime} }-{\beta }_{j}| \ge | {\alpha }^{{\prime} }{| }_{\infty }$. Then from standard norm inequalities we have that $| {\alpha }^{{\prime} }{| }_{\infty }\ge | {\alpha }^{{\prime} }{| }_{1}/m$. Thus the one-norm of the vector obeys a first-order difference inequality of the form ∣α^(j+1)∣₁ ≤ (1 − 1/m)∣α^(j)∣₁. Its solution is bounded by (1−1/m)^j∣α∣₁, which falls to ϵ∣α∣₁ for $j\in O(\log (1/\epsilon )/\log (1/(1-1/m)))\in O(m\log (1/\epsilon ))$. This implies that $\mu \in O\left(m\log (1/\epsilon )\right)$. Next, ν is the maximum number of iterations for the inner loop. Since each iteration continues until the total number of terms remaining is reduced by one, we have that ν ∈ O(m). Thus the total number of iteration steps is $\mu \nu \in O({m}^{2}\log (1/\epsilon ))$. This shows that the algorithm scales polynomially with the number of terms if the optimization process is also efficient.
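The last step of the argument, that geometric decay by a factor (1 − 1/m) per outer iteration reaches a fraction ϵ within O(m log(1/ϵ)) iterations, can be checked directly. The helper below is a hypothetical illustration. It uses only the inequality log(1/(1 − 1/m)) ≥ 1/m.

```python
import math

def outer_iterations(m, eps):
    """Smallest j with (1 - 1/m)^j <= eps, for m >= 2 and 0 < eps < 1."""
    return math.ceil(math.log(1 / eps) / math.log(1 / (1 - 1 / m)))

for m in (2, 10, 100, 1000):
    for eps in (1e-2, 1e-6):
        j = outer_iterations(m, eps)
        # j never exceeds m * ln(1/eps), up to rounding to an integer
        print(m, eps, j, math.ceil(m * math.log(1 / eps)))
```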

The cost of optimization can vary strongly depending on the continuity/convexity of the objective function, and without making further assumptions we cannot assume that the optimum over $\overrightarrow{\beta }$ can be found in polynomial time. If we assume, however, that the optimizer works by considering one of a polynomial number of potential circuits for simulating the terms and then uses linear programming to find the optimal value of $\overrightarrow{\beta }$, then the optimization problem can be solved in polynomial time on a classical computer. Such a choice corresponds exactly to the discussion in the next sections, where we propose the use of a discrete set of optimization strategies for simulating chemistry that can then be used within Algorithm 1 to greedily find the best possible simulation circuit, given this discrete set of optimizations, for the value of $\overrightarrow{\beta }$ chosen.

Truncating Hamiltonian

We can terminate Algorithm 1 before all terms are allocated, i.e. we output {H_j = h_j∑_iP_i: j = 1, …, m^″} such that $\mathop{\sum }\nolimits_{j = 1}^{{m}^{{\prime\prime} }}{H}_{j}=\tilde{H}\,\ne \,H$. This leads to truncation errors in our simulation algorithm that will be present even if an algorithm such as qDRIFT is used for the simulation. We show here that if we truncate some terms of the Hamiltonian, then the additional error incurred is at most twice the error incurred from the complete Hamiltonian simulation by qDRIFT, provided that the distance between the truncated and the given Hamiltonian is at most the square root of the qDRIFT simulation error. We do this because in some cases we may be able to simulate the truncated Hamiltonian with fewer gates.

Suppose we write the given Hamiltonian as follows.

$$H=\mathop{\sum }\limits_{j=1}^{M}{w}_{j}{H}_{j}+\delta H=\tilde{H}+\delta H\qquad [\parallel H\parallel \le 1]$$

(7)

Here each H_j is a Hermitian matrix for which an efficient simulation circuit exists. The protocol working with the truncated Hamiltonian $\tilde{H}$, samples each H_j independently with probability ${p}_{j}=\frac{{w}_{j}}{\lambda }$ (where λ = ∑_i∣w_i∣), in each iteration.

The error per iteration of qDRIFT, i.e. ϵ_N, is given by bounding the diamond distance between the channel ${{{{\mathcal{U}}}}}_{N}(\rho )$ corresponding to the Hamiltonian H and the channel $\tilde{{{{\mathcal{E}}}}}(\rho )$ implemented by the protocol.

Lemma 2.1

The error observed when there are N time-steps taken using a qDRIFT channel, ϵ_N, as quantified by the diamond distance as a function of the truncation error in the Hamiltonian δ is

$${\epsilon }_{N}\le \parallel \tilde{{{{\mathcal{E}}}}}({\rho} )-{{{{\mathcal{U}}}}}_{N}({\rho} ){\parallel }_{\Diamond}\le {\epsilon }_{qDRIFT}+2\delta \sqrt{{\epsilon }_{qDRIFT}}$$

where ${\epsilon }_{qDRIFT}\lessapprox \frac{2{\lambda }^{2}{t}^{2}}{{N}^{2}}$ and λ = ∑_i∣w_i∣.

The proof has been given in Supplementary Method 4 (Lemma 8). Thus the total error after all repetitions is as follows.

$$\epsilon \, \le \, N{\epsilon }_{N}\lessapprox N\frac{2{\lambda }^{2}{t}^{2}}{{N}^{2}}+2\sqrt{2}\delta N\frac{\lambda t}{N}=\frac{2{\lambda }^{2}{t}^{2}}{N}+2\sqrt{2}\delta \lambda t$$

(8)

This shows that if $\delta \in O(\sqrt{{\epsilon }_{qDRIFT}})$ then the asymptotic scaling is not impacted by the exclusion of the terms from the Hamiltonian.
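For a feel of the numbers, the bound of Eq. (8) can be evaluated directly. The following sketch (function names are ours) returns the total error bound and the number of steps needed for a target error. It treats the ≲ in Eq. (8) as an equality, which is an assumption. Note that the truncation term 2√2 δλt does not decrease with N, so a target below that floor is unreachable.

```python
import math

def qdrift_total_error(lam, t, n_steps, delta=0.0):
    """Right-hand side of Eq. (8): 2 lam^2 t^2 / N + 2 sqrt(2) delta lam t."""
    return 2 * lam**2 * t**2 / n_steps + 2 * math.sqrt(2) * delta * lam * t

def steps_for_target(lam, t, eps, delta=0.0):
    """Smallest N for which the bound of Eq. (8) is at most eps."""
    floor = 2 * math.sqrt(2) * delta * lam * t   # error that N cannot remove
    if eps <= floor:
        raise ValueError("target error is below the truncation floor")
    return math.ceil(2 * lam**2 * t**2 / (eps - floor))

print(qdrift_total_error(2.0, 1.0, 800))             # no truncation
print(qdrift_total_error(2.0, 1.0, 800, delta=1e-3)) # with truncation
print(steps_for_target(2.0, 1.0, 0.02, delta=1e-3))
```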

Expected cost

Let the cost of implementing the unitary ${e}^{it{w}_{j}{L}_{j}/N}$ be c_j. Cost can be defined in many ways, such as the total number of gates, the number of non-Clifford gates like T or Toffoli gates, or the number of multi-qubit gates like CNOT. In our paper we focus mainly on the number of non-Clifford gates. Let ${{{{\mathcal{C}}}}}_{N}$ be the random variable denoting the cost per repetition of our protocol. Then the expected cost and the variance per repetition are as follows.

$${\mathbb{E}}[{{{{\mathcal{C}}}}}_{N}]=\mathop{\sum }\limits_{j=1}^{M}{p}_{j}{c}_{j}=\frac{1}{\lambda }\mathop{\sum }\limits_{j=1}^{M}{w}_{j}{c}_{j}={\mu }_{N}\quad {{{\rm{and}}}}\quad \,{{\mbox{Var}}}\,[{{{{\mathcal{C}}}}}_{N}]=\frac{1}{{\lambda }^{2}}\left(\lambda \mathop{\sum }\limits_{j=1}^{M}{w}_{j}{c}_{j}^{2}-{\left(\mathop{\sum }\limits_{j = 1}^{M}{w}_{j}{c}_{j}\right)}^{2}\right)={\sigma }_{N}^{2}$$

(9)

By Chebyshev’s inequality (Supplementary Note 1) we have the following for some real number k > 0.

$$\Pr \left[| {{{{\mathcal{C}}}}}_{N}-{\mu }_{N}| \ge k{\sigma }_{N}\right]\le \frac{1}{{k}^{2}}$$

(10)

The cost per repetition of our protocol is a bounded variable i.e. $a\le {{{{\mathcal{C}}}}}_{N}\le b$, for some real numbers a, b. If ${{{\mathcal{C}}}}$ is the variable denoting the cost of all repetitions of our protocol, then

$${\mathbb{E}}[{{{\mathcal{C}}}}]=N{\mu }_{N}$$

(11)

and, since the repetitions are independent, the per-repetition cost variables are independent and identically distributed. We can therefore apply Hoeffding’s inequality (Supplementary Note 1) to obtain the following.

$$\Pr \left[\left\vert {{{\mathcal{C}}}}-N{\mu }_{N}\right\vert \ge cN{\mu }_{N}\right]\le 2\exp \left(-\frac{2{c}^{2}{N}^{2}{\mu }_{N}^{2}}{N{(b-a)}^{2}}\right)=2\exp \left(-\frac{2{c}^{2}N{\mu }_{N}^{2}}{{(b-a)}^{2}}\right)={\epsilon }_{c}\qquad [c > 0]$$

(12)

Thus with probability at least 1−ϵ_c, the cost of all repetitions of the protocol is at most $\frac{(c+1)N}{\lambda }\mathop{\sum }\nolimits_{j = 1}^{M}{w}_{j}{c}_{j}$, where, solving Eq. (12) for c, $c=\frac{b-a}{{\mu }_{N}\sqrt{2N}}\sqrt{\log \left(\frac{2}{{\epsilon }_{c}}\right)}$.
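The quantities in Eqs. (9) and (12) are simple to evaluate for a given set of weights and costs. The sketch below uses hypothetical weights w_j and costs c_j, and assumes positive weights. The relative deviation c is obtained by inverting the right-hand side of Eq. (12) directly for a chosen failure probability ϵ_c.

```python
import math

def cost_statistics(weights, costs):
    """Mean and variance of the per-repetition cost, as in Eq. (9)."""
    lam = sum(weights)
    probs = [w / lam for w in weights]          # p_j = w_j / lambda
    mean = sum(p * c for p, c in zip(probs, costs))
    var = sum(p * c * c for p, c in zip(probs, costs)) - mean**2
    return mean, var

def hoeffding_relative_deviation(mean, n_steps, eps_c, a, b):
    """Solve eps_c = 2 exp(-2 c^2 N mean^2 / (b - a)^2) for c > 0."""
    return (b - a) / (mean * math.sqrt(2 * n_steps)) * math.sqrt(math.log(2 / eps_c))

weights = [1.0, 1.0, 2.0]     # hypothetical w_j
costs = [1, 3, 2]             # hypothetical c_j, e.g. rotation gates per fragment
mean, var = cost_statistics(weights, costs)
c = hoeffding_relative_deviation(mean, n_steps=1000, eps_c=0.01, a=min(costs), b=max(costs))
# With probability at least 1 - eps_c the total cost is at most (1 + c) * N * mean
print(mean, var, c, (1 + c) * 1000 * mean)
```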

Error in simulation while sampling multiple Paulis

We consider the qDRIFT protocol⁴³ for simulating Hamiltonians. If H = ∑_jh_jH_j, then in each iteration we sample H_j with probability proportional to h_j and then simulate it for a short time period. Now H_j can be a single Pauli operator or a sum of commuting Paulis, as is achieved in Algorithm 1, to optimize the cost of simulation. Here we derive a bound on the difference in simulation error for these two cases.

Let ${H}_{j}=\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{P}_{{i}_{j}}$—sum over commuting Paulis and M be the total number of Pauli operators. So the Hamiltonian can be written as $H=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}{H}_{j}=\mathop{\sum }\nolimits_{j = 1}^{L}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{h}_{j}{P}_{{i}_{j}}$. We assume the most general case where a single Pauli can be shared between multiple commuting groups i.e. H_j.

In the first case, a group of commuting Paulis, i.e. one of the H_j, is selected independently with probability ${q}_{j}=\frac{{h}_{j}}{{\sum }_{j}{h}_{j}}$. In the second case, one single Pauli operator P_k is sampled independently with probability ${p}_{k}^{{\prime} }=\frac{{\sum }_{{j}^{{\prime} }}{h}_{{j}^{{\prime} }}}{{\sum }_{i}{h}_{i}{L}_{i}}$, where in the numerator the sum is over all the commuting Pauli groups in which P_k appears. Let λ = ∑_jh_j and ${\lambda }^{{\prime} }={\sum }_{j}{h}_{j}{L}_{j}$. We define the Liouvillians that generate the unitary evolution under the Hamiltonians H_j and ${P}_{{i}_{j}}$ as

$${{{{\mathcal{L}}}}}_{j}=i({H}_{j}\rho -\rho {H}_{j})\quad {{{\rm{and}}}}\quad {{{{\mathcal{L}}}}}_{{i}_{j}}=i({P}_{{i}_{j}}\rho -\rho {P}_{{i}_{j}}).$$

(13)

Thus if ${{{\mathcal{L}}}}=i(H\rho -\rho H)$, then ${{{\mathcal{L}}}}=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}{{{{\mathcal{L}}}}}_{j}=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{{{{\mathcal{L}}}}}_{{i}_{j}}$. We define two channels ${{{{\mathcal{E}}}}}_{1}=\mathop{\sum }\nolimits_{j = 1}^{L}{q}_{j}{e}^{\tau {{{{\mathcal{L}}}}}_{j}}$ and ${{{{\mathcal{E}}}}}_{2}=\mathop{\sum }\nolimits_{j = 1}^{L}{p}_{j}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{e}^{{\tau }^{{\prime} }{{{{\mathcal{L}}}}}_{{i}_{j}}}$, where ${p}_{j}=\frac{{h}_{j}}{{\lambda }^{{\prime} }}$, which evolve under the superoperators ${{{{\mathcal{L}}}}}_{j}$ and ${{{{\mathcal{L}}}}}_{{i}_{j}}$ for time intervals $\tau =\frac{\lambda t}{N}$ and ${\tau }^{{\prime} }=\frac{{\lambda }^{{\prime} }t}{N}$ respectively. Here we note that for the second channel, for each Pauli P_k, we have expanded the sum ${p}_{{k}^{{\prime} }}={\sum }_{{j}^{{\prime} }}\frac{{h}_{{j}^{{\prime} }}}{{\lambda }^{{\prime} }}$ to reflect the commuting groups to which it belongs. Thus $\mathop{\sum }\nolimits_{k = 1}^{M}{p}_{{k}^{{\prime} }}=\mathop{\sum }\nolimits_{j = 1}^{L}\mathop{\sum }\nolimits_{{i}_{j} = 1}^{{L}_{j}}{p}_{j}$. Then we can prove the following.

Lemma 2.2

The distance between the qDRIFT channel with single and grouped Hamiltonian terms for simulation time t using N time steps obeys

$$\parallel {{{{\mathcal{E}}}}}_{2}-{{{{\mathcal{E}}}}}_{1}{\parallel }_{\Diamond}\le \frac{4{t}^{2}{\lambda }^{{\prime} 2}}{{N}^{2}}$$

The proof has been given in Supplementary Method 4 (Lemma 7).

Optimized circuits for quantum chemistry

In this section we review quantum algorithms for quantum chemistry and design efficient circuits that are useful for quantum chemistry simulation within the Trotter–Suzuki formalism. The electronic structure problem has emerged as a central application of quantum computers in recent years, with quantum algorithms providing potential exponential speedups relative to the best known classical algorithms^50,53. The electronic structure problem more specifically is, for a fixed set of positions of the nuclei, find the configuration of electrons that minimizes the total energy for a fixed number of electrons. The properties of materials, molecules and atoms at low temperatures emerge from these energies. In the non-relativistic case, the dynamics of these electrons are governed by the Coulomb Hamiltonian.

$$H=-\mathop{\sum}\limits_{i}\frac{{\nabla }_{i}^{2}}{2}-\mathop{\sum}\limits_{i,j}\frac{{\zeta }_{j}}{| {R}_{j}-{r}_{i}| }+\mathop{\sum}\limits_{i < j}\frac{1}{| {r}_{i}-{r}_{j}| }+\mathop{\sum}\limits_{i < j}\frac{{\zeta }_{i}{\zeta }_{j}}{| {R}_{i}-{R}_{j}| }$$

where we have used atomic units, r_i represent the positions of electrons, R_i represent the positions of nuclei, and ζ_i are the charges of nuclei.

Following the strategy outlined in¹³, we select the second quantization and discretize the Hamiltonian by representing it within some canonical basis such as a Gaussian basis or a planewave basis. Under the above assumptions, the electronic Hamiltonian can be represented in terms of creation and annihilation operators as follows^64,65. Each spin orbital is assigned a (distinct) qubit where the state $\left\vert 1\right\rangle$ corresponds to an occupied orbital and $\left\vert 0\right\rangle$ an unoccupied orbital. Specifically, let ${a}_{p}^{{\dagger} }$ and a_p be the fermionic raising and lowering operators acting on spin-orbital p satisfying the anti-commutation relation $\{{a}_{p}^{{\dagger} },{a}_{q}\}={\delta }_{pq}$ and $\{{a}_{p},{a}_{q}\}=\{{a}_{p}^{{\dagger} },{a}_{q}^{{\dagger} }\}=0$,

$$H=\mathop{\sum}\limits_{p,q}{h}_{pq}{a}_{p}^{{\dagger} }{a}_{q}+\frac{1}{2}\mathop{\sum}\limits_{p,q,r,s}{h}_{pqrs}{a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}$$

(14)

where the coefficients h_pq, h_pqrs are determined by the discrete basis set chosen, and the sums run over the number of discretization elements or basis set for a single particle. From inspection, we can see that the number of terms in Equation (14) is O(N⁴), where N is the size of the discrete representation. The molecular orbitals are one widely used basis set. These, in turn, can be expressed as linear combinations of atomic basis functions^66,67. The coefficients of this expansion are obtained by solving the set of Hartree–Fock equations that arise from the variational minimization of the energy using a single determinant wave function. Thus in this representation the location of (indistinguishable) electrons are specified by the occupations of the discrete basis.

The Jordan–Wigner⁴⁵ or Bravyi–Kitaev⁴⁶ transformations are commonly used to convert the fermionic creation and annihilation operators into Pauli operators. For example, within the Jordan–Wigner encoding, a and a^† can be written in terms of qubit operators as follows.

$${a}_{p}={Q}_{(p)}\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}=\frac{1}{2}({X}_{(p)}+i{Y}_{(p)})\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}\quad {{{\rm{and}}}}\quad {a}_{p}^{{\dagger} }={Q}_{(p)}^{{\dagger} }\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}=\frac{1}{2}({X}_{(p)}-i{Y}_{(p)})\mathop{\prod }\limits_{j=0}^{p-1}{Z}_{(j)}$$

Here ${Q}_{(p)}^{{\dagger} }=\frac{1}{2}({X}_{(p)}-i{Y}_{(p)})$ and ${Q}_{(p)}=\frac{1}{2}({X}_{(p)}+i{Y}_{(p)})$ are the qubit creation and annihilation operators respectively. ∏_jZ_(j) acts as an exchange-phase factor, accounting for the anti-commutation relations of a and a^†.
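The Jordan–Wigner expressions above can be checked numerically on a small register. The following is a minimal NumPy sketch, assuming three spin orbitals and the convention that qubit 0 is the leftmost tensor factor; the helper names (`kron_all`, `annihilation`, `anticommutator`) are illustrative and not taken from any library.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    """Tensor product of single-qubit operators (qubit 0 is the leftmost factor)."""
    return reduce(np.kron, ops)

def annihilation(p, n):
    """Jordan-Wigner a_p on n qubits: Q_(p) times a Z on every qubit j < p."""
    Q = 0.5 * (X + 1j * Y)  # equals |0><1|: maps occupied |1> to unoccupied |0>
    return kron_all([Z] * p + [Q] + [I2] * (n - p - 1))

def anticommutator(A, B):
    return A @ B + B @ A

n = 3  # assumed (illustrative) number of spin orbitals
a = [annihilation(p, n) for p in range(n)]
identity = np.eye(2 ** n)

# Check {a_p^dagger, a_q} = delta_pq and {a_p, a_q} = 0 for all pairs.
relations_hold = all(
    np.allclose(anticommutator(a[p].conj().T, a[q]), (p == q) * identity)
    and np.allclose(anticommutator(a[p], a[q]), 0)
    for p in range(n) for q in range(n)
)
print(relations_hold)
```

Dropping the Z string from `annihilation` makes the check fail for p ≠ q, which is one way to see the role of the exchange-phase factor.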

With these tools in place, the second-order Trotter–Suzuki approximation reads

$${e}^{-iHt}=\mathop{\prod}\limits_{p,q}{e}^{-it({h}_{pq}{a}_{p}^{{\dagger} }{a}_{q}+{h}_{qp}{a}_{q}^{{\dagger} }{a}_{p})/2}\mathop{\prod}\limits_{p,q,r,s}{e}^{-it({h}_{pqrs}{a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}+{h}_{srqp}{a}_{s}^{{\dagger} }{a}_{r}^{{\dagger} }{a}_{q}{a}_{p})/4}+O({t}^{2})$$

(15)

Such a simulation can then be performed by substituting in the Pauli representation yielded by the Jordan–Wigner transformation. Higher-order versions are also known⁶³ that achieve error scaling $O({t}^{2k+1})$; however, we do not focus on them here, since the optimizations to the operator exponentials that we consider apply in all such cases.
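The O(t²) error term of a product of exponentials can be illustrated on a toy example. The sketch below is not the chemistry Hamiltonian: it assumes a single-qubit Hamiltonian X + Z, chosen only because its two terms do not commute, and compares the exact evolution with the product of the two individual exponentials. The function names are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(H, t):
    """exp(-i H t) for a Hermitian matrix H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def product_formula_error(t):
    """Spectral-norm distance between exp(-i(X+Z)t) and exp(-iXt) exp(-iZt)."""
    exact = evolve(X + Z, t)
    split = evolve(X, t) @ evolve(Z, t)
    return np.linalg.norm(exact - split, ord=2)

# Halving t should reduce the error by roughly a factor of 4 if it scales as t^2.
ratio = product_formula_error(0.1) / product_formula_error(0.05)
print(ratio)
```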

Optimizing two-body operator exponentials

The two-body terms are the most common, and often the most significant, contribution to the complexity of a simulation of the Coulomb Hamiltonian in second quantization¹⁰. To reduce this dominant cost for the simulation of chemistry, in this section we consider the general two-body double excitation terms. Expressed using the Jordan–Wigner transformation, these can be written in terms of products of X, Y, Z operators as follows⁵³. We have removed the parentheses in the subscripts, for convenience.

$${h}_{pqrs}{a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}+{h}_{srqp}{a}_{s}^{{\dagger} }{a}_{r}^{{\dagger} }{a}_{q}{a}_{p}=\left(\mathop{\bigotimes }\limits_{k=s+1}^{r-1}{Z}_{k}\right)\left(\mathop{\bigotimes }\limits_{k=q+1}^{p-1}{Z}_{k}\right)\left(\frac{\Re \{{h}_{pqrs}\}}{8}{H}_{r}+\frac{\Im \{{h}_{pqrs}\}}{8}{H}_{i}\right)$$

(16)

$$\begin{array}{l}\,{{\mbox{where}}}\,{H}_{r}\,=\,{X}_{s}{X}_{r}{X}_{q}{X}_{p}-{X}_{s}{X}_{r}{Y}_{q}{Y}_{p}+{X}_{s}{Y}_{r}{X}_{q}{Y}_{p}+{Y}_{s}{X}_{r}{X}_{q}{Y}_{p}\\ \qquad \qquad \qquad+\,{Y}_{s}{X}_{r}{Y}_{q}{X}_{p}-{Y}_{s}{Y}_{r}{X}_{q}{X}_{p}+{X}_{s}{Y}_{r}{Y}_{q}{X}_{p}+{Y}_{s}{Y}_{r}{Y}_{q}{Y}_{p}\\ \quad{{\mbox{and}}}\,{H}_{i}\,=\,{Y}_{s}{X}_{r}{X}_{q}{X}_{p}+{X}_{s}{Y}_{r}{X}_{q}{X}_{p}-{X}_{s}{X}_{r}{Y}_{q}{Y}_{p}-{X}_{s}{Y}_{r}{Y}_{q}{Y}_{p}\\ \qquad \qquad \qquad-\,{Y}_{s}{X}_{r}{Y}_{q}{Y}_{p}+{Y}_{s}{Y}_{r}{X}_{q}{X}_{p}+{Y}_{s}{Y}_{r}{X}_{q}{Y}_{p}+{Y}_{s}{Y}_{r}{Y}_{q}{X}_{p}\end{array}$$

(17)

Note that if a Gaussian orbital basis is chosen then the values of h_pqrs are typically real, resulting in H_i = 0. We will assume in the remainder of the discussion that such terms are zero and focus our attention on only the real part of the Hamiltonian.

If we define ${h}_{1}=({h}_{pqrs}{\delta }_{{X}_{p}{X}_{s}}{\delta }_{{X}_{q}{X}_{r}}-{h}_{qprs}{\delta }_{{X}_{p}{X}_{r}}{\delta }_{{X}_{q}{X}_{s}})$, ${h}_{2}=({h}_{psqr}{\delta }_{{X}_{p}{X}_{r}}{\delta }_{{X}_{q}{X}_{s}}-{h}_{spqr}{\delta }_{{X}_{p}{X}_{q}}{\delta }_{{X}_{r}{X}_{s}})$ and ${h}_{3}=({h}_{prsq}{\delta }_{{X}_{p}{X}_{q}}{\delta }_{{X}_{r}{X}_{s}}-{h}_{prqs}{\delta }_{{X}_{p}{X}_{s}}{\delta }_{{X}_{q}{X}_{r}})$, for distinct p, q, r, s then we have the following⁵³.

$$\begin{array}{lll}&&\frac{1}{2}\mathop{\sum}\limits_{p,q,r,s}{h}_{pqrs}\left({a}_{p}^{{\dagger} }{a}_{q}^{{\dagger} }{a}_{r}{a}_{s}+{a}_{s}^{{\dagger} }{a}_{r}^{{\dagger} }{a}_{q}{a}_{p}\right)\\ &=&\frac{1}{8}\left(\mathop{\bigotimes }\limits_{k=p+1}^{q-1}\mathop{\bigotimes }\limits_{k=r+1}^{s-1}{Z}_{k}\right)\left(({X}_{p}{X}_{q}{X}_{r}{X}_{s}+{Y}_{p}{Y}_{q}{Y}_{r}{Y}_{s})(-{h}_{1}-{h}_{2}+{h}_{3})\right.\\ &&+({X}_{p}{X}_{q}{Y}_{r}{Y}_{s}+{Y}_{p}{Y}_{q}{X}_{r}{X}_{s})({h}_{1}-{h}_{2}+{h}_{3})+({Y}_{p}{X}_{q}{Y}_{r}{X}_{s}+{X}_{p}{Y}_{q}{X}_{r}{Y}_{s})(-{h}_{1}-{h}_{2}-{h}_{3})\\ &&+\left.({Y}_{p}{X}_{q}{X}_{r}{Y}_{s}+{X}_{p}{Y}_{q}{Y}_{r}{X}_{s})(-{h}_{1}+{h}_{2}+{h}_{3})\right)\end{array}$$

(18)

Conventionally, the part of the Hamiltonian that can be expressed in the form of Equation (16) is broken down into groups of at most 8 commuting operators that act on the qubits in question. Each term is diagonalized by a Clifford circuit and the evolution is then performed with some R_z gates. In⁵⁰ the authors diagonalize all 8 terms in the simultaneous eigenbasis and parallelize all 8 R_z gates. This reduces the number of Clifford gates and the depth, but comes at the cost of 4 extra ancillae. Excluding the diagonalizing circuits on both sides, they use 32 CNOTs and 8 R_z gates. Each diagonalizing circuit uses 3 CNOT gates and 1 H gate. Our goal in this section is to design more efficient quantum circuits for the double excitation terms. In Table 1 we compare the gate costs of the circuit in⁵⁰ with those of the circuits we derive in each of the 3 cases we consider. Fermionic SWAP gates^55,68 can be used to make the orbitals neighboring and hence remove the tensor product of Z terms. So from here on, we ignore these terms.

Let q₁, q₂, q₃, q₄ be the qubits to which the fermions in the orbitals p, q, r, s are mapped respectively. We follow the technique used in⁵⁰. W = CNOT_(1, 2)CNOT_(1, 3)CNOT_(1, 4)H₍₁₎, where the rightmost gate is the first one to be applied, is the unitary diagonalizing the 8 terms in the simultaneous eigenbasis. We rewrite the Hamiltonian H with general coefficients ${a}_{0},\ldots ,{a}_{7}\in {\mathbb{R}}$. Unless mentioned otherwise, the leftmost operator acts on qubit q₁, the next ones on q₂ and q₃, and the rightmost on qubit q₄.

$$\begin{array}{ll}H\,=\,{a}_{0}XXXX+{a}_{1}YYXX+{a}_{2}YXYX+{a}_{3}YXXY\\ \qquad\quad+\,{a}_{4}XYYX+{a}_{5}XYXY+{a}_{6}XXYY+{a}_{7}YYYY\end{array}$$

(19)

Then following the arguments in⁵⁰ we have the following.

$$\begin{array}{ll}{e}^{-iHt}\,=\,W\left({e}^{-i{a}_{0}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{2}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{a}_{3}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{i{a}_{4}ZZZ{\mathbb{I}}t}{e}^{i{a}_{5}ZZ{\mathbb{I}}Zt}{e}^{i{a}_{6}Z{\mathbb{I}}ZZt}{e}^{-i{a}_{7}ZZZZt}\right){W}^{{\dagger} }\\ \qquad\quad=\,W{e}^{i(-{a}_{0}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}+{a}_{2}Z{\mathbb{I}}Z{\mathbb{I}}+{a}_{3}Z{\mathbb{I}}{\mathbb{I}}Z+{a}_{4}ZZZ{\mathbb{I}}+{a}_{5}ZZ{\mathbb{I}}Z+{a}_{6}Z{\mathbb{I}}ZZ-{a}_{7}ZZZZ)t}{W}^{{\dagger} }\end{array}$$

(20)
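The signs in Equation (20) can be cross-checked numerically. The sketch below is a sanity check under stated assumptions, not part of the derivation in⁵⁰: it assumes that CNOT_(c, t) denotes control c and target t, that q₁ is the leftmost tensor factor, and that gates in W are multiplied in the order written. The helper names (`pauli`, `on_qubit`, `cnot`) are illustrative.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
PAULI = {
    "I": I2,
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HAD = (PAULI["X"] + PAULI["Z"]) / np.sqrt(2)
P0 = np.diag([1, 0]).astype(complex)  # projector onto |0>
P1 = np.diag([0, 1]).astype(complex)  # projector onto |1>
N = 4

def pauli(word):
    """Matrix of a Pauli word such as 'YYXX' (leftmost letter acts on q1)."""
    return reduce(np.kron, [PAULI[c] for c in word])

def on_qubit(gate, q):
    """Single-qubit gate acting on qubit q (1-based) of an N-qubit register."""
    ops = [I2] * N
    ops[q - 1] = gate
    return reduce(np.kron, ops)

def cnot(control, target):
    return on_qubit(P0, control) + on_qubit(P1, control) @ on_qubit(PAULI["X"], target)

W = cnot(1, 2) @ cnot(1, 3) @ cnot(1, 4) @ on_qubit(HAD, 1)

# (Pauli word of Eq. (19), diagonal word of Eq. (20), sign s with W D W^dagger = s P)
TERMS = [
    ("XXXX", "ZIII", +1), ("YYXX", "ZZII", -1), ("YXYX", "ZIZI", -1),
    ("YXXY", "ZIIZ", -1), ("XYYX", "ZZZI", -1), ("XYXY", "ZZIZ", -1),
    ("XXYY", "ZIZZ", -1), ("YYYY", "ZZZZ", +1),
]
all_match = all(
    np.allclose(W @ pauli(d) @ W.conj().T, s * pauli(p)) for p, d, s in TERMS
)
print(all_match)
```

A sign of +1 corresponds to an exponent −i a_k(…)t in Equation (20) and a sign of −1 to an exponent +i a_k(…)t.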

The terms in between W and W^† add an overall phase ϕ. We denote the state of the qubits q₁, …, q₄ after application of W by variables x₁, …, x₄ respectively. It is sufficient to analyse the phase when the state is in the standard basis. Consider ${e}^{-i{a}_{0}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}$: this term contributes a phase of −a₀t if x₁ = 0 and a₀t if x₁ = 1. Similarly, ${e}^{i{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}t}$ contributes a phase of a₁t if x₁ ⊕ x₂ = 0 and −a₁t if x₁ ⊕ x₂ = 1. Thus we have the following expression for the overall phase.

$$\begin{array}{rcl}\phi &=&\left(-{(-1)}^{{x}_{1}}{a}_{0}+{(-1)}^{{x}_{1}\oplus {x}_{2}}{a}_{1}+{(-1)}^{{x}_{1}\oplus {x}_{3}}{a}_{2}+{(-1)}^{{x}_{1}\oplus {x}_{4}}{a}_{3}+{(-1)}^{{x}_{1}\oplus {x}_{2}\oplus {x}_{3}}{a}_{4}+{(-1)}^{{x}_{1}\oplus {x}_{2}\oplus {x}_{4}}{a}_{5}\right.\\ &&\left.+\,{(-1)}^{{x}_{1}\oplus {x}_{3}\oplus {x}_{4}}{a}_{6}-{(-1)}^{{x}_{1}\oplus {x}_{2}\oplus {x}_{3}\oplus {x}_{4}}{a}_{7}\right)t\end{array}$$

(21)

Different values of a₀, …, a₇ give different values of the overall phase and hence different circuits. We consider the following three cases. It is easy to see that ${\phi }_{{x}_{1} = 1}=-{\phi }_{{x}_{1} = 0}$, since x₁ appears in every exponent of Equation (21). So in all the cases below it is sufficient to calculate the phase while setting x₁ = 0.

Case I

Let a₁t = a₆t = − θ and the remaining a₀t = a₂t = … = a₅t = a₇t = θ. Then we can verify that ϕ = 8θ if x₁ = 1, x₂ = 0, x₃ = x₄ = 1, ϕ = − 8θ if x₁ = 0, x₂ = 0, x₃ = x₄ = 1 and ϕ = 0 for the remaining values of x₁, …, x₄. Then the quantum circuit simulating e^−iHt is shown in Fig. 1a and b.

**Fig. 1: Quantum circuit simulating e^−iHt.**

Case II

Let a₀t = … = a₇t = θ. If x₂ = 1, x₃ ⊕ x₄ = 0 then ϕ = 0 and if x₂ = 0, x₃ ⊕ x₄ = 0 then $\phi ={(-1)}^{{x}_{3}}4\theta$. When x₃ ⊕ x₄ = 1 then $\phi =-2\theta +{(-1)}^{{x}_{2}}2\theta$. This is equal to − 4θ if x₂ = 1. The quantum circuit simulating e^−iHt is shown in Fig. 1c.

Case III

Let a₀t = a₇t = − h₁ − h₂ + h₃, a₁t = a₆t = h₁ − h₂ + h₃, a₂t = a₅t = − h₁ − h₂ − h₃ and a₃t = a₄t = − h₁ + h₂ + h₃, as shown in Equation (18). It can be verified that, for x₁ = 0, ${\phi }_{{x}_{2} = {x}_{3} = 1,{x}_{4} = 0}=8{h}_{2}$, ${\phi }_{{x}_{2} = 0,{x}_{3} = {x}_{4} = 1}=8{h}_{1}$ and ${\phi }_{{x}_{2} = {x}_{4} = 1,{x}_{3} = 0}=-8{h}_{3}$. For all other values of x₂, x₃, x₄, ϕ = 0. The corresponding quantum circuit simulating e^−iHt is shown in Fig. 1d, e, f.
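The phase values quoted in Cases I, II and III can be reproduced by evaluating Equation (21) directly. The sketch below is illustrative only: the factor t is absorbed into the coefficients, the numerical values of θ and h₁, h₂, h₃ are arbitrary integers chosen for the check, and the function names are our own.

```python
from itertools import product

def phase(a, x):
    """Eq. (21) with t absorbed into a[0..7]; x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    sgn = lambda *bits: (-1) ** (sum(bits) % 2)
    return (-sgn(x1) * a[0] + sgn(x1, x2) * a[1] + sgn(x1, x3) * a[2]
            + sgn(x1, x4) * a[3] + sgn(x1, x2, x3) * a[4] + sgn(x1, x2, x4) * a[5]
            + sgn(x1, x3, x4) * a[6] - sgn(x1, x2, x3, x4) * a[7])

def nonzero_table(a):
    """Non-zero phases at x1 = 0, keyed by (x2, x3, x4)."""
    out = {}
    for rest in product((0, 1), repeat=3):
        value = phase(a, (0,) + rest)
        if value != 0:
            out[rest] = value
    return out

theta = 1              # illustrative value of theta
h1, h2, h3 = 1, 2, 4   # illustrative values of h1, h2, h3

case1 = [theta, -theta, theta, theta, theta, theta, -theta, theta]
case2 = [theta] * 8
A, B, C, D = -h1 - h2 + h3, h1 - h2 + h3, -h1 - h2 - h3, -h1 + h2 + h3
case3 = [A, B, C, D, D, C, B, A]  # a0 = a7, a1 = a6, a2 = a5, a3 = a4

print(nonzero_table(case1))
print(nonzero_table(case2))
print(nonzero_table(case3))
```

In each case the number of distinct non-zero magnitudes in the table indicates how many rotation angles the corresponding circuit needs.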

We already remarked that we can ignore the product of Z terms in Equations (16) and (18) by using fermionic SWAP gates. Now if we take two Hamiltonians of the form (19) with some overlapping qubits, then we can get different Hamiltonians by rearranging the commuting Paulis. In the next few subsections we design circuits for the corresponding exponentials of these Hamiltonians. In the following subsections, P₀ = X, P₁ = Y and $\overline{i}=i+1\,{{\mathrm{mod}}}\,\,2$. Table 2 summarizes the number of non-Clifford gates used to implement the various circuits. All rotation gates with n (>1) controls can be decomposed into cR_z (single control) and NOT gates with n controls, each of which can be decomposed into n − 1 Toffolis (as shown in Fig. 1b). In the ‘Introduction’ we discussed special gadgets that can be used to further reduce the T-count of the circuits. In Fig. 1e, f we show how the number of Toffolis can be reduced in segments of the circuits. Our circuits have fewer gates (including Clifford gates) than those of⁵⁰ or of approaches that synthesize a circuit for each exponentiated Pauli and then concatenate them. In fact, we show how the circuit size, i.e. the Clifford and non-Clifford gate cost, depends on the coefficients of the commuting Paulis in the Hamiltonian expression.

Table 2 The first table shows the number of R_z, cR_z and Toffoli (Toff.) pairs used to design the circuits implementing ${e}^{-i{H}^{{\prime} }t}$, where ${H}^{{\prime} }$ are the Hamiltonians (Ham.) described in Section ‘Results’.


Overlap on 1 qubit

Previously, we analysed circuits for cases where many of the Hamiltonian coefficients follow regular patterns, and saw that the cost of the simulation can be reduced through these techniques. Here we provide a more aggressive strategy wherein we combine multiple commuting terms and find particular combinations of angles for which the simulation circuits are efficient. The results are summarized in Table 2. We first consider the case of an overlap on 1 qubit. We can have the following sets of commuting Paulis.

$${G}_{1y}=\{{P}_{i}{P}_{j}{P}_{k}Y{\mathbb{I}}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Y{P}_{a}{P}_{b}{P}_{c}:i+j+k\equiv 1\,{{\mathrm{mod}}}\,\,2,a+b+c\equiv 1\,{{\mathrm{mod}}}\,\,2\}$$

(22)

$${G}_{1x}=\{{P}_{i}{P}_{j}{P}_{k}X{\mathbb{I}}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}X{P}_{a}{P}_{b}{P}_{c}:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2,a+b+c\equiv 0\,{{\mathrm{mod}}}\,\,2\}$$

(23)

Without loss of generality, we assume that the leftmost operator acts on qubit q₁, next one on q₂ and so on - rightmost one acts on qubit q₇. We denote a state vector as $\left\vert {Q}_{1}v{Q}_{2}\right\rangle$ where ${Q}_{1}=\left\vert {q}_{1}{q}_{2}{q}_{3}\right\rangle$, ${Q}_{2}=\left\vert {q}_{5}{q}_{6}{q}_{7}\right\rangle$ and v, q₁, …, q₇ ∈ {0, 1}. We can have the following Hamiltonian terms, expressed as sums of commuting Paulis from the above two sets.

$$\begin{array}{rcl}{H}_{1y}&=&{a}_{3}YXXY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ &&+{b}_{1}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YYXX+{b}_{2}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YXYX+{b}_{3}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YXXY+{b}_{7}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}YYYY\end{array}$$

(24)

$$\begin{array}{rcl}{H}_{1x}&=&{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{2}YXYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ &&+{b}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{4}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYYX+{b}_{5}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYXY+{b}_{6}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXYY\end{array}$$

(25)
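That the Paulis within G_1y and within G_1x mutually commute can be checked mechanically from their words. The sketch below uses the standard rule that two Pauli words commute if and only if they differ on an even number of positions where both are non-identity; the function names are illustrative.

```python
from itertools import product

def commute(p, q):
    """True iff the Pauli words p and q commute."""
    clashes = sum(1 for s, t in zip(p, q) if "I" not in (s, t) and s != t)
    return clashes % 2 == 0

def overlap_one_group(shared, parity):
    """Words P_i P_j P_k S III and III S P_a P_b P_c whose index sum equals parity mod 2."""
    P = "XY"  # P_0 = X, P_1 = Y
    triples = ["".join(P[b] for b in bits)
               for bits in product((0, 1), repeat=3) if sum(bits) % 2 == parity]
    return [t + shared + "III" for t in triples] + ["III" + shared + t for t in triples]

G1y = overlap_one_group("Y", 1)  # Eq. (22)
G1x = overlap_one_group("X", 0)  # Eq. (23)

all_commute = all(commute(p, q) for G in (G1y, G1x) for p in G for q in G)
print(len(G1y), len(G1x), all_commute)
```

The two sets cannot simply be merged: for instance, `commute("XXXXIII", "IIIYYXX")` is false, because the words clash only on the shared qubit q₄.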

Circuit for simulating ${e}^{-i{H}_{1y}t}$

Let W_1y be the unitary consisting of the following sequence of gates. The rightmost one is the first to be applied. With a slight abuse of notation we denote CNOT_(4, 1)CNOT_(4, 2)CNOT_(4, 3) by CNOT_(4, I) and CNOT_(4, 5)CNOT_(4, 6)CNOT_(4, 7) by CNOT_(4, II).

$${W}_{1y}=CNO{T}_{(4,I)}{H}_{(4)}{Z}_{(4)}CNO{T}_{(4,I)}CNO{T}_{(4,II)}{H}_{(4)}CNO{T}_{(4,I)}$$

In the following theorem we show that this is a diagonalizing circuit for the set of Paulis in G_1y.

Theorem 2.1

For each $i,j,k,a,b,c\in {{\mathbb{Z}}}_{2}$, such that ${P}_{i}{P}_{j}{P}_{k}Y{\mathbb{I}}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Y{P}_{a}{P}_{b}{P}_{c}\in {G}_{1y}$ we have the following.

$$\begin{array}{rcl}&&{\sqrt{-1}}^{i+j+k+1}{W}_{1y}\left({Z}_{(1)}^{i}{Z}_{(2)}^{j}{Z}_{(3)}^{k}{Z}_{(4)}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\right){W}_{1y}^{{\dagger} }={P}_{i}{P}_{j}{P}_{k}Y{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ \,{{\mbox{and}}}\,&&{\sqrt{-1}}^{a+b+c+1}{W}_{1y}\left({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}{Z}_{(4)}{Z}_{(5)}^{a}{Z}_{(6)}^{b}{Z}_{(7)}^{c}\right){W}_{1y}^{{\dagger} }={\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Y{P}_{a}{P}_{b}{P}_{c}\end{array}$$

We prove this theorem by showing that the operators on the LHS and RHS have equivalent actions on the eigenstates corresponding to an eigenbasis for the Paulis in G_1y. The proof of this theorem has been given in Theorem 1 of Supplementary Method 1. The eigenbasis has been shown in Lemma 1 of Supplementary Method 1.
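Independently of the proof in the Supplementary material, Theorem 2.1 can be sanity-checked numerically by building W_1y as a 128 × 128 matrix. The sketch below assumes that CNOT_(c, t) denotes control c and target t, that q₁ is the leftmost tensor factor, and that the gates of W_1y are multiplied in the order written; the helper names (`on_qubit`, `cnot_fan`, `diagonal_word`) are illustrative.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
PAULI = {
    "I": I2,
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HAD = (PAULI["X"] + PAULI["Z"]) / np.sqrt(2)
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)
N = 7

def pauli(word):
    return reduce(np.kron, [PAULI[c] for c in word])

def on_qubit(gate, q):
    ops = [I2] * N
    ops[q - 1] = gate
    return reduce(np.kron, ops)

def cnot_fan(control, targets):
    """Product of CNOTs sharing one control: flips all targets when the control is 1."""
    flip = reduce(np.matmul, [on_qubit(PAULI["X"], t) for t in targets])
    return on_qubit(P0, control) + on_qubit(P1, control) @ flip

C_I, C_II = cnot_fan(4, (1, 2, 3)), cnot_fan(4, (5, 6, 7))
H4, Z4 = on_qubit(HAD, 4), on_qubit(PAULI["Z"], 4)
W1y = C_I @ H4 @ Z4 @ C_I @ C_II @ H4 @ C_I

def diagonal_word(word):
    """Z where the word has Y (and always on the shared qubit q4), identity elsewhere."""
    return "".join("Z" if (c == "Y" or k == 3) else "I" for k, c in enumerate(word))

def sign(word):
    """(sqrt(-1))^(number of Y's): -1 for two Y's, +1 for four."""
    return (-1) ** (word.count("Y") // 2)

G1y = ["YXXYIII", "XYXYIII", "XXYYIII", "YYYYIII",
       "IIIYYXX", "IIIYXYX", "IIIYXXY", "IIIYYYY"]
diagonalized = all(
    np.allclose(W1y @ pauli(diagonal_word(p)) @ W1y.conj().T, sign(p) * pauli(p))
    for p in G1y
)
print(diagonalized)
```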

Thus we have the following.

$$\begin{array}{rcl}{e}^{-i{H}_{1y}t}&=&{e}^{-i(-{a}_{3}{W}_{1y}(Z{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }-{a}_{5}{W}_{1y}({\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }-{a}_{6}{W}_{1y}({\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }+{a}_{7}{W}_{1y}(ZZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} })t}\\ &&\cdot {e}^{-i(-{b}_{1}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}){W}_{1y}^{{\dagger} }-{b}_{2}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}){W}_{1y}^{{\dagger} }-{b}_{3}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Z){W}_{1y}^{{\dagger} }+{b}_{7}{W}_{1y}({\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZZZ){W}_{1y}^{{\dagger} })t}\\ &=&{W}_{1y}{e}^{i{a}_{3}Z{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{5}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{6}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{-i{a}_{7}ZZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{1}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{2}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{b}_{3}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{-i{b}_{7}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZZZt}{W}_{1y}^{{\dagger} }\end{array}$$

The state of the qubits q₁, …, q₇ after the application of W_1y is denoted by variables x₁, …, x₇ respectively. We have the following expression for the overall phase incurred between W_1y and ${W}_{1y}^{{\dagger} }$.

$$\begin{array}{rcl}\phi ={(-1)}^{{x}_{4}\oplus {x}_{1}}{a}_{3}t+{(-1)}^{{x}_{4}\oplus {x}_{2}}{a}_{5}t+{(-1)}^{{x}_{4}\oplus {x}_{3}}{a}_{6}t-{(-1)}^{{x}_{4}\oplus {x}_{1}\oplus {x}_{2}\oplus {x}_{3}}{a}_{7}t\\ +{(-1)}^{{x}_{4}\oplus {x}_{7}}{b}_{3}t+{(-1)}^{{x}_{4}\oplus {x}_{6}}{b}_{2}t+{(-1)}^{{x}_{4}\oplus {x}_{5}}{b}_{1}t-{(-1)}^{{x}_{4}\oplus {x}_{5}\oplus {x}_{6}\oplus {x}_{7}}{b}_{7}t\end{array}$$

It is easy to check that ${\phi }_{\overline{{x}_{4}}}=-{\phi }_{{x}_{4}}$. We consider the following three cases and it is sufficient to check the phase values when x₄ = 0.

Case I

Let a₆t = − θ₁, b₁t = − θ₂, a₃t = a₅t = a₇t = θ₁ and b₂t = b₃t = b₇t = θ₂. Following the conventions and explanations given for Case I above, we obtain the overall phase after the application of W_1y. We can write ϕ = f₁(θ₁) + f₂(θ₂), for two functions f₁ and f₂. The following can be verified.

1. If x₁ = x₂ = 0 and x₃ = 1 then ϕ = 4θ₁ + f₂(θ₂). Analogously, ϕ = f₁(θ₁) + 4θ₂ when x₇ = x₆ = 0, x₅ = 1.
2. If x₁ = x₂ = 1 and x₃ = 0 then ϕ = − 4θ₁ + f₂(θ₂) and if x₇ = x₆ = 1, x₅ = 0 then ϕ = f₁(θ₁) − 4θ₂.
3. For any other values of x₁, x₂, x₃, ϕ = f₂(θ₂) and analogously, for any other values of x₇, x₆, x₅, ϕ = f₁(θ₁).

A quantum circuit simulating ${e}^{-i{H}_{1y}t}$ is shown in Fig. 2a.

**Fig. 2: Quantum circuit for ${e}^{-i{H}_{1y}t}$ and ${e}^{-i{H}_{1x1}t}$.**

Case II

Now we consider the case when a₆t = a₃t = a₅t = a₇t = θ₁ and b₁t = b₂t = b₃t = b₇t = θ₂. Here also ϕ can be written as sum of two functions : ϕ = f₁(θ₁) + f₂(θ₂). We can make the following observations.

1. If exactly one of x₁, x₂, x₃ is 1 then ϕ = 2θ₁ + f₂(θ₂) and analogously, if exactly one of x₅, x₆, x₇ is 1 then ϕ = f₁(θ₁) + 2θ₂.
2. If exactly two of x₁, x₂, x₃ are 1 then ϕ = − 2θ₁ + f₂(θ₂). Similarly, if exactly two of x₅, x₆, x₇ are 1 then ϕ = f₁(θ₁) − 2θ₂.
3. If x₁ = x₂ = x₃ = 0 then ϕ = 2θ₁ + f₂(θ₂) and similarly, if x₅ = x₆ = x₇ = 0 then ϕ = f₁(θ₁) + 2θ₂.
4. If x₁ = x₂ = x₃ = 1 then ϕ = − 2θ₁ + f₂(θ₂) and analogously, if x₅ = x₆ = x₇ = 1 then ϕ = f₁(θ₁) − 2θ₂.

A circuit simulating ${e}^{-i{H}_{1y}t}$ in this case, has been shown in Fig. 2b.

Case III

Let a₃t = − h₁ + h₂ + h₃, a₅t = − h₁ − h₂ − h₃, a₆t = h₁ − h₂ + h₃, a₇t = − h₁ − h₂ + h₃ and b₃t = − g₁ + g₂ + g₃, b₂t = − g₁ − g₂ − g₃, b₁t = g₁ − g₂ + g₃, b₇t = − g₁ − g₂ + g₃ (Equation (18)). Let h = (h₁, h₂, h₃) and g = (g₁, g₂, g₃). We can write ϕ = f₁(h) + f₂(g). We can make the following observations.

1. If x₁ = x₂ = x₃ then ϕ = f₂(g) and analogously, if x₅ = x₆ = x₇ then ϕ = f₁(h).
2. Suppose x_i = x_j and x_k ≠ x_i, where i, j, k ∈ {1, 2, 3} are distinct. Then flipping the values of all three variables changes the sign of f₁(h). For example, if ${\phi }_{{x}_{1} = {x}_{2} = 0,{x}_{3} = 1}={f}_{1}({{{\bf{h}}}})+{f}_{2}({{{\bf{g}}}})$, then ${\phi }_{{x}_{1} = {x}_{2} = 1,{x}_{3} = 0}=-{f}_{1}({{{\bf{h}}}})+{f}_{2}({{{\bf{g}}}})$. A similar phenomenon occurs if i, j, k ∈ {7, 6, 5}, except that this time the sign of f₂(g) flips. So it is sufficient to consider the case when two variables are 1.
$$\begin{array}{rcl}{\phi }_{{x}_{1} = {x}_{2} = 1,{x}_{3} = 0}=4{h}_{1}+{f}_{2}({{{\bf{g}}}}),&&{\phi }_{{x}_{7} = {x}_{6} = 1,{x}_{5} = 0}={f}_{1}({{{\bf{h}}}})+4{g}_{1}\\ {\phi }_{{x}_{2} = {x}_{3} = 1,{x}_{1} = 0}=4{h}_{2}+{f}_{2}({{{\bf{g}}}}),&&{\phi }_{{x}_{6} = {x}_{5} = 1,{x}_{7} = 0}={f}_{1}({{{\bf{h}}}})+4{g}_{2}\\ {\phi }_{{x}_{3} = {x}_{1} = 1,{x}_{2} = 0}=-4{h}_{3}+{f}_{2}({{{\bf{g}}}}),&&{\phi }_{{x}_{5} = {x}_{7} = 1,{x}_{6} = 0}={f}_{1}({{{\bf{h}}}})-4{g}_{3}\end{array}$$

A circuit simulating ${e}^{-i{H}_{1y}t}$ in this case, has been shown in Fig. 2c.
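The three cases for ${e}^{-i{H}_{1y}t}$ can be tabulated by evaluating the phase expression at x₄ = 0. In the sketch below the two halves of ϕ share one function, since f₁ depends on (x₁, x₂, x₃) with coefficients (a₃, a₅, a₆, a₇)t in exactly the way f₂ depends on (x₇, x₆, x₅) with coefficients (b₃, b₂, b₁, b₇)t. The numerical values are arbitrary integers chosen for illustration and the function names are our own.

```python
from itertools import product

def half_phase(c, y):
    """(-1)^y1 c0 + (-1)^y2 c1 + (-1)^y3 c2 - (-1)^(y1+y2+y3) c3, evaluated at x4 = 0."""
    s = [(-1) ** b for b in y]
    return s[0] * c[0] + s[1] * c[1] + s[2] * c[2] - s[0] * s[1] * s[2] * c[3]

def table(c):
    """Phase for every assignment y = (x1, x2, x3) (or (x7, x6, x5) for f2)."""
    return {y: half_phase(c, y) for y in product((0, 1), repeat=3)}

theta1 = 1             # illustrative value of theta_1
h1, h2, h3 = 1, 2, 4   # illustrative values of h1, h2, h3

# Coefficient order is (a3, a5, a6, a7) t.
case1 = table((theta1, theta1, -theta1, theta1))
case2 = table((theta1,) * 4)
case3 = table((-h1 + h2 + h3, -h1 - h2 - h3, h1 - h2 + h3, -h1 - h2 + h3))

for name, result in (("I", case1), ("II", case2), ("III", case3)):
    print(name, result)
```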

Circuit for simulating ${e}^{-i{H}_{1x}t}$

An eigenbasis for the Paulis in G_1x is given in Lemma 2 of Supplementary Method 1. However, we were unable to find (by hand) a unitary, analogous to W_1y, that diagonalizes the set of commuting Paulis in G_1x, as we did in the previous subsection for G_1y. So we divide the commuting Paulis into two groups of 4-qubit Paulis, i.e. we consider the following two sets.

$$\begin{array}{ll}{G}_{1x1}\,=\,\{{P}_{i}{P}_{j}{P}_{k}X{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\\ {G}_{1x2}\,=\,\{{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}X{P}_{a}{P}_{b}{P}_{c}:a+b+c\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\end{array}$$

and the following two Hamiltonians

$$\begin{array}{ll}{H}_{1x1}\,=\,{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{2}YXYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}\\ {H}_{1x2}\,=\,{b}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{4}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYYX+{b}_{5}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XYXY+{b}_{6}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}XXYY\end{array}$$

We can use the diagonalizing circuit of⁵⁰ and have the following.

$$\begin{array}{ll}{e}^{-i{H}_{1x1}t}\,=\,{W}_{1x1}{e}^{-i{a}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{1}ZZ{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{2}Z{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{4}{\mathbb{I}}ZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{W}_{1x1}^{{\dagger} }\\ {e}^{-i{H}_{1x2}t}\,=\,{W}_{1x2}{e}^{-i{b}_{0}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{4}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZZ{\mathbb{I}}t}{e}^{i{b}_{5}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}Zt}{e}^{i{b}_{6}{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}ZZt}{W}_{1x2}^{{\dagger} }\end{array}$$

where W_1x1 = CNOT_(4, 1)CNOT_(4, 2)CNOT_(4, 3)H₍₄₎ and W_1x2 = CNOT_(4, 5)CNOT_(4, 6)CNOT_(4, 7)H₍₄₎, where the rightmost gate is the first one to be applied. We denote the state of the qubits q₁, …, q₄ after the application of W_1x1 by the variables x₁, …, x₄ respectively. Also, the variables ${x}_{4}^{{\prime} },\ldots ,{x}_{7}^{{\prime} }$ denote the state of the qubits q₄, …, q₇, respectively, after the application of W_1x2. We have the following expression for the overall phase incurred between W_1x1, ${W}_{1x1}^{{\dagger} }$ and between W_1x2, ${W}_{1x2}^{{\dagger} }$.

$$\begin{array}{ll}{\phi }_{1}\,=\,-{(-1)}^{{x}_{4}}{a}_{0}t+{(-1)}^{{x}_{4}\oplus {x}_{1}\oplus {x}_{2}}{a}_{1}t+{(-1)}^{{x}_{4}\oplus {x}_{1}\oplus {x}_{3}}{a}_{2}t+{(-1)}^{{x}_{4}\oplus {x}_{2}\oplus {x}_{3}}{a}_{4}t\\ {\phi }_{2}\,=\,-{(-1)}^{{x}_{4}^{{\prime} }}{b}_{0}t+{(-1)}^{{x}_{4}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{6}^{{\prime} }}{b}_{4}t+{(-1)}^{{x}_{4}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{7}^{{\prime} }}{b}_{5}t+{(-1)}^{{x}_{4}^{{\prime} }\oplus {x}_{6}^{{\prime} }\oplus {x}_{7}^{{\prime} }}{b}_{6}t\end{array}$$

We consider the following three cases, in each of which ${\phi }_{1,\overline{{x}_{4}}}=-{\phi }_{1,{x}_{4}}$ and ${\phi }_{2,\overline{{x}_{4}^{{\prime} }}}=-{\phi }_{2,{x}_{4}^{{\prime} }}$, so it is sufficient to state the phases for ${x}_{4}={x}_{4}^{{\prime} }=0$.

Case I

Let a₁t = − θ₁, b₆t = − θ₂, a₀t = a₂t = a₄t = θ₁ and b₀t = b₄t = b₅t = θ₂. It is easy to verify that a non-zero phase ϕ₁ = − 4θ₁ exists if and only if x₁ = x₂ ≠ x₃. Analogously, ϕ₂ = − 4θ₂ if ${x}_{7}^{{\prime} }={x}_{6}^{{\prime} }\ne {x}_{5}^{{\prime} }$, else it is 0.

Case II

Now we consider the case when a₀t = a₁t = a₂t = a₄t = θ₁, b₀t = b₄t = b₅t = b₆t = θ₂. If x₁ = x₂ = x₃ then ϕ₁ = 2θ₁, else it is − 2θ₁. Similarly for ϕ₂.

Case III

Next we consider the case where a₀t = − h₁ − h₂ + h₃, a₁t = h₁ − h₂ + h₃, a₂t = − h₁ − h₂ − h₃, a₄t = − h₁ + h₂ + h₃, and b₀t = − g₁ − g₂ + g₃, b₄t = − g₁ + g₂ + g₃, b₅t = − g₁ − g₂ − g₃, b₆t = g₁ − g₂ + g₃. Here, a non-zero phase exists only if exactly two of the three variables have the same value.

$$\begin{array}{rcl}{\phi }_{1}({x}_{1}={x}_{2}\,\ne\, {x}_{3})=4{h}_{1};&&{\phi }_{1}({x}_{2}={x}_{3}\,\ne\, {x}_{1})=4{h}_{2};\quad {\phi }_{1}({x}_{1}={x}_{3}\,\ne\, {x}_{2})=-4{h}_{3};\\ {\phi }_{2}({x}_{5}^{{\prime} }={x}_{6}^{{\prime} }\,\ne\, {x}_{7}^{{\prime} })=4{g}_{2};&&{\phi }_{2}({x}_{6}^{{\prime} }={x}_{7}^{{\prime} }\,\ne\, {x}_{5}^{{\prime} })=4{g}_{1};\quad {\phi }_{2}({x}_{5}^{{\prime} }={x}_{7}^{{\prime} }\,\ne\, {x}_{6}^{{\prime} })=-4{g}_{3};\end{array}$$

Circuits simulating ${e}^{-i{H}_{1x1}t}$ in Cases I, II and III are shown in Fig. 2d, e and f respectively. Circuits for ${e}^{-i{H}_{1x2}t}$ are similar. The circuit for ${e}^{-i{H}_{1x}t}$ in each case is obtained by concatenating the corresponding circuits.
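The phases ϕ₁ and ϕ₂ quoted in the three cases above can be tabulated in the same way. The sketch below evaluates both expressions with the shared qubit variable set to 0; ϕ₁ takes the coefficients (a₀, a₁, a₂, a₄)t with variables (x₁, x₂, x₃), and ϕ₂ takes (b₀, b₄, b₅, b₆)t with variables (x′₅, x′₆, x′₇). The numerical values are arbitrary integers chosen for illustration and the function names are our own.

```python
from itertools import product

def phi(c, y):
    """-c0 + (-1)^(y1+y2) c1 + (-1)^(y1+y3) c2 + (-1)^(y2+y3) c3 (shared qubit variable = 0)."""
    s = [(-1) ** b for b in y]
    return -c[0] + s[0] * s[1] * c[1] + s[0] * s[2] * c[2] + s[1] * s[2] * c[3]

def table(c):
    return {y: phi(c, y) for y in product((0, 1), repeat=3)}

theta1, theta2 = 1, 1    # illustrative angles
h1, h2, h3 = 1, 2, 4     # illustrative values of h
g1, g2, g3 = 3, 5, 11    # illustrative values of g

case1_phi1 = table((theta1, -theta1, theta1, theta1))
case1_phi2 = table((theta2, theta2, theta2, -theta2))
case2_phi1 = table((theta1,) * 4)
case3_phi1 = table((-h1 - h2 + h3, h1 - h2 + h3, -h1 - h2 - h3, -h1 + h2 + h3))
case3_phi2 = table((-g1 - g2 + g3, -g1 + g2 + g3, -g1 - g2 - g3, g1 - g2 + g3))

print(case1_phi1)
print(case3_phi1)
print(case3_phi2)
```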

Overlap on 2 qubits

In general, the more options that we have for grouping mutually commuting terms the more effective our compilation strategy will be. While the most natural case to examine is the case where all of the Hamiltonian terms act on disjoint sets of qubits, Hamiltonian terms can commute if they overlap on only two qubits as well. For example, we can have the following sets of commuting Pauli operations

$${G}_{21}=\{{P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}:i+j\equiv 1\,{{\mathrm{mod}}}\,\,2,k,l=i,j\,{{\mbox{or}}}\,\overline{i},\overline{j}\,\,{{\mbox{respectively}}}\,\}$$

(26)

$${G}_{20}=\{{P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}:i+j\equiv 0\,{{\mathrm{mod}}}\,\,2,k,l=i,j\,{{\mbox{or}}}\,\overline{i},\overline{j}\,\,{{\mbox{respectively}}}\,\}$$

(27)

Without loss of generality, we assume that the leftmost operator acts on qubit q₁, next one on q₂ and so on - rightmost one acts on qubit q₆. We denote a state vector as $\left\vert {Q}_{1}{Q}_{2}{Q}_{3}\right\rangle$ where ${Q}_{1}=\left\vert {q}_{1}{q}_{2}\right\rangle$, ${Q}_{2}=\left\vert {q}_{3}{q}_{4}\right\rangle$ and ${Q}_{3}=\left\vert {q}_{5}{q}_{6}\right\rangle$ are the first, second and third pairs of qubits respectively. We can have the following Hamiltonian terms, expressed as sums of commuting Paulis from the above two sets.

$$\begin{array}{ll}{H}_{21}\,=\,{a}_{2}YXYX{\mathbb{I}}{\mathbb{I}}+{a}_{3}YXXY{\mathbb{I}}{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}{\mathbb{I}}\\ \qquad\quad+\,{b}_{2}{\mathbb{I}}{\mathbb{I}}YXYX+{b}_{3}{\mathbb{I}}{\mathbb{I}}XYYX+{b}_{4}{\mathbb{I}}{\mathbb{I}}YXXY+{b}_{5}{\mathbb{I}}{\mathbb{I}}XYXY\end{array}$$

(28)

$$\begin{array}{ll}{H}_{20}\,=\,{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}{\mathbb{I}}\\ \qquad\quad+\,{b}_{0}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{6}{\mathbb{I}}{\mathbb{I}}YYXX+{b}_{1}{\mathbb{I}}{\mathbb{I}}XXYY+{b}_{7}{\mathbb{I}}{\mathbb{I}}YYYY\end{array}$$

(29)

Circuit for simulating ${e}^{-i{H}_{21}t}$

As before, our simulation strategy involves diagonalizing the Hamiltonian using a Clifford circuit and then implementing the resulting diagonal evolution. Let W₁ be the unitary consisting of the following sequence of gates. The rightmost one is the first to be applied.

$${W}_{1}=CNO{T}_{(3,1)}CNO{T}_{(3,4)}{H}_{(3)}{Z}_{(3)}CNO{T}_{(3,1)}CNO{T}_{(3,5)}{H}_{(3)}CNO{T}_{(3,1)}$$

The following theorem shows that this is a diagonalizing circuit for the set of Paulis in G₂₁.

Theorem 2.2

For each $i,j,k,l\in {{\mathbb{Z}}}_{2}$, such that ${P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}},{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}\in {G}_{21}$ we have the following.

$$\begin{array}{rcl}&&{\sqrt{-1}}^{i+j+k+l}{W}_{1}\left({Z}_{(1)}^{k}{Z}_{(2)}^{l}{Z}_{(3)}{Z}_{(4)}^{j}{\mathbb{I}}{\mathbb{I}}\right){W}_{1}^{{\dagger} }={P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}}\\ \,{{\mbox{and}}}\,&&{\sqrt{-1}}^{i+j+k+l}{W}_{1}\left({\mathbb{I}}{\mathbb{I}}{Z}_{(3)}{Z}_{(4)}^{j}{Z}_{(5)}^{k}{Z}_{(6)}^{l}\right){W}_{1}^{{\dagger} }={\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}\end{array}$$

The proof is similar to Theorem 2.1 and has been given in Supplementary Method 2. Theorem 2.2 then gives us the following.

$$\begin{array}{ll}{e}^{-i{H}_{21}t}\,=\,{e}^{-i(-{a}_{2}{W}_{1}(Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} }-{a}_{3}{W}_{1}(Z{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} }-{a}_{4}{W}_{1}({\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} }-{a}_{5}{W}_{1}({\mathbb{I}}ZZZ{\mathbb{I}}{\mathbb{I}}){W}_{1}^{{\dagger} })t}\\ \qquad\qquad\,\cdot {e}^{-i(-{b}_{2}{W}_{1}({\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}){W}_{1}^{{\dagger} }-{b}_{3}{W}_{1}({\mathbb{I}}{\mathbb{I}}ZZZ{\mathbb{I}}){W}_{1}^{{\dagger} }-{b}_{4}{W}_{1}({\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Z){W}_{1}^{{\dagger} }-{b}_{5}{W}_{1}({\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}Z){W}_{1}^{{\dagger} })t}\\ \qquad\quad=\,{W}_{1}{e}^{i{a}_{2}Z{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{3}Z{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{4}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{5}{\mathbb{I}}ZZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{2}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{b}_{3}{\mathbb{I}}{\mathbb{I}}ZZZ{\mathbb{I}}t}{e}^{i{b}_{4}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{i{b}_{5}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}Zt}{W}_{1}^{{\dagger} }\end{array}$$

We denote the state of the qubits q₁, …, q₆ after the application of W₁ by the variables x₁, …, x₆ respectively. We have the following expression for the overall phase incurred between W₁ and ${W}_{1}^{{\dagger} }$.

$$\begin{array}{ll}\phi \,=\,{(-1)}^{{x}_{3}\oplus {x}_{1}}{a}_{2}t+{(-1)}^{{x}_{3}\oplus {x}_{2}}{a}_{4}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{1}}{a}_{3}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{2}}{a}_{5}t\\ \qquad\,+\,{(-1)}^{{x}_{3}\oplus {x}_{5}}{b}_{2}t+{(-1)}^{{x}_{3}\oplus {x}_{6}}{b}_{4}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{5}}{b}_{3}t+{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{6}}{b}_{5}t\end{array}$$

It is easy to check that ${\phi }_{\overline{{x}_{3}}}=-{\phi }_{{x}_{3}}$. We consider the following cases and it is sufficient to check the phase values when x₃ = 0.

Case I (II)

We consider the case when a₂t = a₃t = a₄t = a₅t = θ₁, b₂t = b₃t = b₄t = b₅t = θ₂. There are no a₁, a₆, b₁, b₆ in the expression of the Hamiltonian. So, for consistency with the previous and following subsection, we can consider this as either Case I or II.

We can write ϕ = f₁(θ₁) + f₂(θ₂). We can verify that when x₁ ⊕ x₂ = x₄ = 0 then $\phi ={(-1)}^{{x}_{1}}4{\theta }_{1}+{f}_{2}({\theta }_{2})$ and analogously, when x₅ ⊕ x₆ = x₄ = 0 then $\phi ={f}_{1}({\theta }_{1})+{(-1)}^{{x}_{5}}4{\theta }_{2}$. For all other values of x₁, x₂, x₄, ϕ = f₂(θ₂) and for all other values of x₅, x₆, x₄, ϕ = f₁(θ₁). A quantum circuit for simulating ${e}^{-i{H}_{21}t}$ in this case is shown in Fig. 3a.

**Fig. 3: Quantum circuit for ${e}^{-i{H}_{21}t}$ and ${e}^{-i{H}_{201}t}$.**

Case III

Let a₂t = a₅t = − h₁ − h₂ − h₃ and a₃t = a₄t = − h₁ + h₂ + h₃, b₂t = b₅t = − g₁ − g₂ − g₃ and b₃t = b₄t = − g₁ + g₂ + g₃. We can write ϕ = f₁(h) + f₂(g), where h = (h₁, h₂, h₃) and g = (g₁, g₂, g₃). When q₁ ⊕ q₂ = q₄ = 0, then $\phi =-{(-1)}^{{q}_{1}}4{h}_{1}+{f}_{2}({{{\bf{g}}}})$ and when q₁ ⊕ q₂ = q₄ = 1, then $\phi =-{(-1)}^{{q}_{1}}4({h}_{1}+{h}_{3})+{f}_{2}({{{\bf{g}}}})$. For all other values of q₁, q₂, q₄, ϕ = f₂(g). Similarly, when q₅ ⊕ q₆ = q₄ = 0, then $\phi ={f}_{1}({{{\bf{h}}}})-{(-1)}^{{q}_{5}}4{g}_{1}$ and when q₅ ⊕ q₆ = q₄ = 1, then $\phi ={f}_{1}({{{\bf{h}}}})-{(-1)}^{{q}_{5}}4({g}_{1}+{g}_{3})$. For all other values of q₅, q₆, q₄, ϕ = f₁(h). A quantum circuit simulating ${e}^{-i{H}_{21}t}$ in this case is shown in Fig. 3b.

Circuit for simulating ${e}^{-i{H}_{20}t}$

An eigenbasis for the Paulis in G₂₀ has been shown in Lemma 4 of Supplementary Method 2. Since we have been unable to derive a diagonalizing circuit, we divide the commuting Paulis into two groups of 4-qubit Paulis as follows.

$$\begin{array}{ll}{G}_{201}\,=\,\{{P}_{k}{P}_{l}{P}_{i}{P}_{j}{\mathbb{I}}{\mathbb{I}}:i+j,k+l\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\\ {G}_{202}\,=\,\{{\mathbb{I}}{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}{P}_{l}:i+j,k+l\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\end{array}$$

We get the following two Hamiltonians.

$$\begin{array}{ll}{H}_{201}\,=\,{a}_{0}XXXX{\mathbb{I}}{\mathbb{I}}+{a}_{1}YYXX{\mathbb{I}}{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}{\mathbb{I}}\\ {H}_{202}\,=\,{b}_{0}{\mathbb{I}}{\mathbb{I}}XXXX+{b}_{6}{\mathbb{I}}{\mathbb{I}}YYXX+{b}_{1}{\mathbb{I}}{\mathbb{I}}XXYY+{b}_{7}{\mathbb{I}}{\mathbb{I}}YYYY\end{array}$$

Using the diagonalizing circuit of⁵⁰, we have the following.

$$\begin{array}{ll}{e}^{-i{H}_{201}t}\,=\,{W}_{01}{e}^{-i{a}_{0}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{1}ZZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{6}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{-i{a}_{7}ZZZZ{\mathbb{I}}{\mathbb{I}}t}{W}_{01}^{{\dagger} }\\ {e}^{-i{H}_{202}t}\,=\,{W}_{02}{e}^{-i{b}_{0}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{6}{\mathbb{I}}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{1}{\mathbb{I}}{\mathbb{I}}Z{\mathbb{I}}ZZt}{e}^{-i{b}_{7}{\mathbb{I}}{\mathbb{I}}ZZZZt}{W}_{02}^{{\dagger} }\end{array}$$

where W₀₁ = CNOT_(3, 1)CNOT_(3, 2)CNOT_(3, 4)H₍₃₎ and W₀₂ = CNOT_(3, 4)CNOT_(3, 5)CNOT_(3, 6)H₍₃₎, where the rightmost gate is the first one to be applied. We denote the state of the qubits q₁, …, q₄ and q₃, …, q₆ after the application of W₀₁ and W₀₂ by the variables x₁, …, x₄ and ${x}_{3}^{{\prime} },\ldots ,{x}_{6}^{{\prime} }$ respectively. We have the following expression for the overall phase incurred between W₀₁, ${W}_{01}^{{\dagger} }$ and between W₀₂, ${W}_{02}^{{\dagger} }$.

$$\begin{array}{ll}{\phi }_{1}\,=\,-{(-1)}^{{x}_{3}}{a}_{0}t+{(-1)}^{{x}_{3}\oplus {x}_{1}\oplus {x}_{2}}{a}_{1}t+{(-1)}^{{x}_{3}\oplus {x}_{4}}{a}_{6}t-{(-1)}^{{x}_{3}\oplus {x}_{4}\oplus {x}_{1}\oplus {x}_{2}}{a}_{7}t\\ {\phi }_{2}\,=\,-{(-1)}^{{x}_{3}^{{\prime} }}{b}_{0}t+{(-1)}^{{x}_{3}^{{\prime} }\oplus {x}_{4}^{{\prime} }}{b}_{6}t+{(-1)}^{{x}_{3}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{6}^{{\prime} }}{b}_{1}t-{(-1)}^{{x}_{3}^{{\prime} }\oplus {x}_{4}^{{\prime} }\oplus {x}_{5}^{{\prime} }\oplus {x}_{6}^{{\prime} }}{b}_{7}t\end{array}$$

In all the cases considered below it is easy to verify that ${\phi }_{1,\overline{{x}_{3}}}=-{\phi }_{1,{x}_{3}}$ and ${\phi }_{2,\overline{{x}_{3}^{{\prime} }}}=-{\phi }_{2,{x}_{3}^{{\prime} }}$. So it is enough to consider ${x}_{3}={x}_{3}^{{\prime} }=0$.

Case I

Assume a₁t = a₆t = − θ₁, b₁t = b₆t = − θ₂, a₀t = a₇t = θ₁ and b₀t = b₇t = θ₂. If x₁ ⊕ x₂ = x₄ = 0 then ϕ₁ = − 4θ₁, else it is 0. Similar conclusions follow for ϕ₂ if we replace x₁, x₂, x₃, x₄ by ${x}_{6}^{{\prime} },{x}_{5}^{{\prime} },{x}_{3}^{{\prime} },{x}_{4}^{{\prime} }$ respectively.

Case II

Let a₀t = a₁t = a₆t = a₇t = θ₁, b₀t = b₁t = b₆t = b₇t = θ₂. If x₁ ⊕ x₂ = x₄ = 1 then ϕ₁ = − 4θ₁, else it is 0. Similar conclusions follow for ϕ₂ if we replace x₁, x₂, x₃, x₄ by ${x}_{6}^{{\prime} },{x}_{5}^{{\prime} },{x}_{3}^{{\prime} },{x}_{4}^{{\prime} }$ respectively.

Case III

Let a₀t = a₇t = − h₁ − h₂ + h₃, a₁t = a₆t = h₁ − h₂ + h₃ and b₀t = b₇t = − g₁ − g₂ + g₃, b₁t = b₆t = g₁ − g₂ + g₃. If x₁ ⊕ x₂ = x₄ = 0 then ϕ₁ = 4h₁ and if x₁ ⊕ x₂ = x₄ = 1 then ϕ₁ = 2(h₂ − h₃). Similar conclusions follow for ϕ₂ if we replace x₁, x₂, x₄ by ${x}_{6}^{{\prime} },{x}_{5}^{{\prime} },{x}_{4}^{{\prime} }$ respectively.

Circuits simulating ${e}^{-i{H}_{201}t}$ in Case I, II and III are shown in Fig. 3c, d and e respectively. Circuits for ${e}^{-i{H}_{202}t}$ are similar. The circuit for ${e}^{-i{H}_{20}t}$ in each of these cases is obtained by concatenating the corresponding circuits.

Overlap on 3 qubits

Now we consider the case when there is overlap on 3 qubits. We can have the following sets of commuting Paulis.

$${G}_{3y}=\{Y{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}},{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}Y:i+j+k\equiv 1\,{{\mathrm{mod}}}\,\,2\}$$

(30)

$${G}_{3x}=\{X{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}},{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}X:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2\}$$

(31)

Without loss of generality, we assume that the leftmost operator acts on qubit q₁, the next one on q₂ and so on, with the rightmost one acting on qubit q₅. We denote a state vector as $\left\vert {Q}_{1}{q}_{2}{q}_{3}{q}_{4}{Q}_{2}\right\rangle$ where ${Q}_{1}=\left\vert {q}_{1}\right\rangle$, ${Q}_{2}=\left\vert {q}_{5}\right\rangle$ and q₁, …, q₅ ∈ {0, 1}. We can have the following Hamiltonian terms, expressed as sums of commuting Paulis from the above two sets.

$$\begin{array}{ll}{H}_{3y}\,=\,{a}_{1}YYXX{\mathbb{I}}+{a}_{2}YXYX{\mathbb{I}}+{a}_{3}YXXY{\mathbb{I}}+{a}_{7}YYYY{\mathbb{I}}\\ \qquad\quad\,+\,{b}_{3}{\mathbb{I}}YXXY+{b}_{5}{\mathbb{I}}XYXY+{b}_{6}{\mathbb{I}}XXYY+{b}_{7}{\mathbb{I}}YYYY\\ \end{array}$$

(32)

$$\begin{array}{ll}{H}_{3x}\,=\,{a}_{0}XXXX{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}\\ \qquad\qquad+\,{b}_{0}{\mathbb{I}}XXXX+{b}_{1}{\mathbb{I}}YYXX+{b}_{2}{\mathbb{I}}YXYX+{b}_{4}{\mathbb{I}}XYYX\end{array}$$

(33)

Circuit for simulating ${e}^{-i{H}_{3y}t}$

Let W_3y be the unitary consisting of the following sequence of gates. The rightmost one is the first to be applied. With a slight abuse of notation we denote $CNO{T}_{(c,{t}_{1})}CNO{T}_{(c,{t}_{2})}CNO{T}_{(c,{t}_{3})}\ldots$ by $CNO{T}_{(c;{t}_{1},{t}_{2},{t}_{3},\ldots )}$ (multi-target CNOT).

$${W}_{3y}=CNO{T}_{(2;1,3,4)}{H}_{(2)}{Z}_{(2)}CNO{T}_{(2;1,5)}{H}_{(2)}CNO{T}_{(2,1)}$$

Theorem 2.3

For each $i,j,k\in {{\mathbb{Z}}}_{2}$, such that $Y{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}},{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}Y\in {G}_{3y}$ we have the following.

$$\begin{array}{rcl}&&{\sqrt{-1}}^{i+j+k+1}{W}_{3y}\left({Z}_{(1)}{Z}_{(2)}{Z}_{(3)}^{j}{Z}_{(4)}^{k}{\mathbb{I}}\right){W}_{3y}^{{\dagger} }=Y{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}}\\ &\,{{\mbox{and}}}\,&{\sqrt{-1}}^{i+j+k+1}{W}_{3y}\left({\mathbb{I}}{Z}_{(2)}{Z}_{(3)}^{j}{Z}_{(4)}^{k}{Z}_{(5)}\right){W}_{3y}^{{\dagger} }={\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}Y\end{array}$$

The proof is similar to Theorem 2.1 and has been shown in Supplementary Method 3. Thus we have the following.

$$\begin{array}{ll}{e}^{-i{H}_{3y}t}\,=\,{e}^{-i(-{a}_{1}{W}_{3y}(ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}){W}_{3y}^{{\dagger} }-{a}_{2}{W}_{3y}(ZZZ{\mathbb{I}}{\mathbb{I}}){W}_{3y}^{{\dagger} }-{a}_{3}{W}_{3y}(ZZ{\mathbb{I}}Z{\mathbb{I}}){W}_{3y}^{{\dagger} }+{a}_{7}{W}_{3y}(ZZZZ{\mathbb{I}}){W}_{3y}^{{\dagger} })t}\\ \qquad\qquad\,\cdot {e}^{-i(-{b}_{3}{W}_{3y}({\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Z){W}_{3y}^{{\dagger} }-{b}_{5}{W}_{3y}({\mathbb{I}}ZZ{\mathbb{I}}Z){W}_{3y}^{{\dagger} }-{b}_{6}{W}_{3y}({\mathbb{I}}Z{\mathbb{I}}ZZ){W}_{3y}^{{\dagger} }+{b}_{7}{W}_{3y}({\mathbb{I}}ZZZZ){W}_{3y}^{{\dagger} })t}\\ \qquad\quad=\,{W}_{3y}{e}^{i{a}_{1}ZZ{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{2}ZZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{3}ZZ{\mathbb{I}}Z{\mathbb{I}}t}{e}^{-i{a}_{7}ZZZZ{\mathbb{I}}t}{e}^{i{b}_{3}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}Zt}{e}^{i{b}_{5}{\mathbb{I}}ZZ{\mathbb{I}}Zt}{e}^{i{b}_{6}{\mathbb{I}}Z{\mathbb{I}}ZZt}{e}^{-i{b}_{7}{\mathbb{I}}ZZZZt}{W}_{3y}^{{\dagger} }\end{array}$$

We denote the state of the qubits q₁, …, q₅ after the application of W_3y by the variables x₁, …, x₅ respectively. We have the following expression for the overall phase incurred between W_3y and ${W}_{3y}^{{\dagger} }$.

$$\begin{array}{lll}\phi \,=\,{(-1)}^{{x}_{2}\oplus {x}_{1}}{a}_{1}t+{(-1)}^{{x}_{2}\oplus {x}_{1}\oplus {x}_{3}}{a}_{2}t+{(-1)}^{{x}_{2}\oplus {x}_{1}\oplus {x}_{4}}{a}_{3}t-{(-1)}^{{x}_{2}\oplus {x}_{1}\oplus {x}_{3}\oplus {x}_{4}}{a}_{7}t\\ \qquad\quad+\,{(-1)}^{{x}_{2}\oplus {x}_{5}}{b}_{3}t+{(-1)}^{{x}_{2}\oplus {x}_{5}\oplus {x}_{3}}{b}_{5}t+{(-1)}^{{x}_{2}\oplus {x}_{5}\oplus {x}_{4}}{b}_{6}t-{(-1)}^{{x}_{2}\oplus {x}_{5}\oplus {x}_{3}\oplus {x}_{4}}{b}_{7}t\end{array}$$

It is easy to verify that ${\phi }_{\overline{{x}_{2}}}=-{\phi }_{{x}_{2}}$. We consider the following cases and it is sufficient to check the phase values when x₂ = 0.

Case I

We consider the case when a₁t = − θ₁, b₆t = − θ₂, a₂t = a₃t = a₇t = θ₁ and b₃t = b₅t = b₇t = θ₂. We can write ϕ = f₁(θ₁) + f₂(θ₂). It can be verified that ${\phi }_{\overline{{x}_{1}},{x}_{5}}=-{f}_{1}({\theta }_{1})+{f}_{2}({\theta }_{2})$ and ${\phi }_{{x}_{1},\overline{{x}_{5}}}={f}_{1}({\theta }_{1})-{f}_{2}({\theta }_{2})$. So we concentrate on x₁ = x₅ = 0. If x₃ = x₄ = 1 then ϕ = − 4θ₁ and if x₃ = 0, x₄ = 1 then ϕ = 4θ₂. A quantum circuit simulating ${e}^{-i{H}_{3y}t}$ is shown in Fig. 4a. If θ₁ = θ₂ then we can have a further reduction of controlled rotation gates, as shown in Fig. 4b.

**Fig. 4: Quantum circuit for ${e}^{-i{H}_{3y}t}$ and ${e}^{-i{H}_{3x1}t}$.**

Case II

Next we consider the case when a₁t = a₂t = a₃t = a₇t = θ₁, b₆t = b₃t = b₅t = b₇t = θ₂. In this case ϕ = 0 whenever x₁ ⊕ x₅ = 1. Else, as before ${\phi }_{\overline{{x}_{1}},{x}_{5}}=-{f}_{1}({\theta }_{1})+{f}_{2}({\theta }_{2})$ and ${\phi }_{{x}_{1},\overline{{x}_{5}}}={f}_{1}({\theta }_{1})-{f}_{2}({\theta }_{2})$. So it is enough to consider x₁ = x₅ = 0. When x₃ = x₄ = 1 then ϕ = − 2(θ₁ + θ₂), else ϕ = 2(θ₁ + θ₂). Thus we can have a quantum circuit simulating ${e}^{-i{H}_{3y}t}$, as shown in Fig. 4c.

Case III

Now we consider the case when a₁t = h₁ − h₂ + h₃, a₂t = − h₁ − h₂ − h₃, a₃t = − h₁ + h₂ + h₃, a₇t = − h₁ − h₂ + h₃ and b₃t = − g₁ + g₂ + g₃, b₅t = − g₁ − g₂ − g₃, b₆t = g₁ − g₂ + g₃, b₇t = − g₁ − g₂ + g₃. If we denote h = (h₁, h₂, h₃) and g = (g₁, g₂, g₃), then we can write ϕ = f(h) + f(g). Here too, ${\phi }_{\overline{{x}_{1}},{x}_{5}}=-f({{{\bf{h}}}})+f({{{\bf{g}}}})$ and ${\phi }_{{x}_{1},\overline{{x}_{5}}}=f({{{\bf{h}}}})-f({{{\bf{g}}}})$. So let us consider x₁ = x₅ = 0. Then we have the following phase values.

$$\begin{array}{r}{\phi }_{{x}_{3} = {x}_{4} = 0}=0,\quad {\phi }_{{x}_{3} = 0,{x}_{4} = 1}=-4({h}_{2}+{g}_{1}),\quad {\phi }_{{x}_{3} = 1,{x}_{4} = 0}=4({h}_{3}+{g}_{3}),\quad {\phi }_{{x}_{3} = {x}_{4} = 1}=4({h}_{1}+{g}_{2})\end{array}$$

A circuit simulating ${e}^{-i{H}_{3y}t}$ in this case is shown in Fig. 4d. If h₂ = g₁, h₃ = g₃, h₁ = g₂ then we can have a simpler circuit, as shown in Fig. 4e.
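The four Case III phase values for H_3y can be reproduced by substituting the parametrization into the phase expression with x₁ = x₂ = x₅ = 0. The sketch below is illustrative only: `phi_h3y` and `case_three_table` are hypothetical names, a[k] and b[k] stand for a_k t and b_k t, and the integer values chosen for h and g are arbitrary, picked so that each h_i and g_i can be read off the result.

```python
from itertools import product

def sign(*bits):
    """Return (-1)^(XOR of the given bits)."""
    return -1 if sum(bits) % 2 else 1

def phi_h3y(x, a, b):
    """Overall phase between W_3y and its inverse; x = (x1, ..., x5)."""
    x1, x2, x3, x4, x5 = x
    return (sign(x2, x1) * a[1] + sign(x2, x1, x3) * a[2]
            + sign(x2, x1, x4) * a[3] - sign(x2, x1, x3, x4) * a[7]
            + sign(x2, x5) * b[3] + sign(x2, x5, x3) * b[5]
            + sign(x2, x5, x4) * b[6] - sign(x2, x5, x3, x4) * b[7])

def case_three_table(h, g):
    """Phase for each (x3, x4) with x1 = x2 = x5 = 0 under the Case III angles."""
    h1, h2, h3 = h
    g1, g2, g3 = g
    a = {1: h1 - h2 + h3, 2: -h1 - h2 - h3, 3: -h1 + h2 + h3, 7: -h1 - h2 + h3}
    b = {3: -g1 + g2 + g3, 5: -g1 - g2 - g3, 6: g1 - g2 + g3, 7: -g1 - g2 + g3}
    return {(x3, x4): phi_h3y((0, 0, x3, x4, 0), a, b)
            for x3, x4 in product((0, 1), repeat=2)}

table = case_three_table(h=(1, 10, 100), g=(1000, 10000, 100000))
for bits, phase in sorted(table.items()):
    print(bits, phase)
```

With these inputs the entries for (x₃, x₄) = (0, 1), (1, 0) and (1, 1) equal −4(h₂ + g₁), 4(h₃ + g₃) and 4(h₁ + g₂), and the (0, 0) entry vanishes.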

Circuit for simulating ${e}^{-i{H}_{3x}t}$

The diagonalizing transformation for the Pauli operators in G_3x is shown in Lemma 6 of Supplementary Method 3. Since we have been unable to find a diagonalizing circuit, we divide the commuting Paulis into two groups of 4-qubit Paulis,

$$\begin{array}{ll}{G}_{3x1}\,=\,\{X{P}_{i}{P}_{j}{P}_{k}{\mathbb{I}}:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\\ {G}_{3x2}\,=\,\{{\mathbb{I}}{P}_{i}{P}_{j}{P}_{k}X:i+j+k\equiv 0\,{{\mathrm{mod}}}\,\,2.\}\end{array}$$

and have the following two Hamiltonians.

$$\begin{array}{ll}{H}_{3x1}\,=\,{a}_{0}XXXX{\mathbb{I}}+{a}_{4}XYYX{\mathbb{I}}+{a}_{5}XYXY{\mathbb{I}}+{a}_{6}XXYY{\mathbb{I}}\\ {H}_{3x2}\,=\,{b}_{0}{\mathbb{I}}XXXX+{b}_{1}{\mathbb{I}}YYXX+{b}_{2}{\mathbb{I}}YXYX+{b}_{4}{\mathbb{I}}XYYX\end{array}$$

Using the diagonalizing circuit of⁵⁰, we have the following.

$$\begin{array}{ll}{e}^{-i{H}_{3x1}t}\,=\,{W}_{3x1}{e}^{-i{a}_{0}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{a}_{6}{\mathbb{I}}ZZZ{\mathbb{I}}t}{e}^{i{a}_{5}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{a}_{4}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{W}_{3x1}^{{\dagger} }\\ {e}^{-i{H}_{3x2}t}\,=\,{W}_{3x2}{e}^{-i{b}_{0}{\mathbb{I}}Z{\mathbb{I}}{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{1}{\mathbb{I}}ZZ{\mathbb{I}}{\mathbb{I}}t}{e}^{i{b}_{2}{\mathbb{I}}Z{\mathbb{I}}Z{\mathbb{I}}t}{e}^{i{b}_{4}{\mathbb{I}}ZZZ{\mathbb{I}}t}{W}_{3x2}^{{\dagger} }\end{array}$$

where W_3x1 = CNOT_{(2; 1, 3, 4)}H₍₂₎ and W_3x2 = CNOT_{(2; 3, 4, 5)}H₍₂₎, where the rightmost gate is the first one to be applied. We denote the state of the qubits q₁, …, q₄ and q₂, …, q₅ after the application of W_3x1 and W_3x2 by the variables x₁, …, x₄ and ${x}_{2}^{{\prime} },\ldots ,{x}_{5}^{{\prime} }$ respectively. We have the following expression for the overall phase incurred between W_3x1, ${W}_{3x1}^{{\dagger} }$ and between W_3x2, ${W}_{3x2}^{{\dagger} }$.

$$\begin{array}{ll}{\phi }_{1}\,=\,-{(-1)}^{{x}_{2}}{a}_{0}t+{(-1)}^{{x}_{2}\oplus {x}_{3}\oplus {x}_{4}}{a}_{6}t+{(-1)}^{{x}_{2}\oplus {x}_{4}}{a}_{5}t+{(-1)}^{{x}_{2}\oplus {x}_{3}}{a}_{4}t\\ {\phi }_{2}\,=\,-{(-1)}^{{x}_{2}^{{\prime} }}{b}_{0}t+{(-1)}^{{x}_{2}^{{\prime} }\oplus {x}_{3}^{{\prime} }}{b}_{1}t+{(-1)}^{{x}_{2}^{{\prime} }\oplus {x}_{4}^{{\prime} }}{b}_{2}t+{(-1)}^{{x}_{2}^{{\prime} }\oplus {x}_{3}^{{\prime} }\oplus {x}_{4}^{{\prime} }}{b}_{4}t\end{array}$$

In all the cases considered below it is easy to verify that ${\phi }_{1,\overline{{x}_{2}}}=-{\phi }_{1,{x}_{2}}$ and ${\phi }_{2,\overline{{x}_{2}^{{\prime} }}}=-{\phi }_{2,{x}_{2}^{{\prime} }}$. So it is enough to consider ${x}_{2}={x}_{2}^{{\prime} }=0$.

Case I

We consider the case when a₆t = − θ₁, b₁t = − θ₂, a₀t = a₅t = a₄t = θ₁ and b₀t = b₄t = b₂t = θ₂. ϕ₁ = − 4θ₁ when x₃ = x₄ = 1, else it is 0. And ϕ₂ = − 4θ₂ when ${x}_{3}^{{\prime} }=0,{x}_{4}^{{\prime} }=1$, else it is 0.

Case II

Let a₀t = a₆t = a₅t = a₄t = θ₁, b₀t = b₄t = b₂t = b₁t = θ₂. ϕ₁ = 2θ₁ when x₃ = x₄ = 0, else ϕ₁ = − 2θ₁. Similarly for ϕ₂.

Case III

Assume a₀t = − h₁ − h₂ + h₃, a₆t = h₁ − h₂ + h₃, a₅t = − h₁ − h₂ − h₃, a₄t = − h₁ + h₂ + h₃, and b₀t = − g₁ − g₂ + g₃, b₄t = − g₁ + g₂ + g₃, b₂t = − g₁ − g₂ − g₃, b₁t = g₁ − g₂ + g₃. We have the following phases.

$$\begin{array}{rcl}{\phi }_{1}({x}_{3}=0,{x}_{4}=1)=4{h}_{2};&&{\phi }_{1}({x}_{3}=1,{x}_{4}=0)=-4{h}_{3};\quad {\phi }_{1}({x}_{3}={x}_{4}=1)=4{h}_{1};\\ {\phi }_{2}({x}_{3}^{{\prime} }=0,{x}_{4}^{{\prime} }=1)=4{g}_{1},&&{\phi }_{2}({x}_{3}^{{\prime} }=1,{x}_{4}^{{\prime} }=0)=-4{g}_{3};\quad {\phi }_{2}({x}_{3}^{{\prime} }={x}_{4}^{{\prime} }=1)=4{g}_{2};\end{array}$$

Circuits simulating ${e}^{-i{H}_{3x1}t}$ in Case I, II and III are shown in Fig. 4f, g and h respectively. Circuits for ${e}^{-i{H}_{3x2}t}$ are similar. The circuit for ${e}^{-i{H}_{3x}t}$ in each case is obtained by concatenating the corresponding circuits.
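The three cases for ϕ₁ of H_3x1 can be tabulated in the same way with x₂ = 0. This is a hedged sketch with hypothetical helper names (`phi1_h3x1`, `table_3x1`); the arguments stand for a₀t, a₆t, a₅t and a₄t, and integer angles are an assumption made to keep the arithmetic exact.

```python
from itertools import product

def sign(*bits):
    """Return (-1)^(XOR of the given bits)."""
    return -1 if sum(bits) % 2 else 1

def phi1_h3x1(x2, x3, x4, a0, a6, a5, a4):
    """Phase phi_1 between W_3x1 and its inverse."""
    return (-sign(x2) * a0 + sign(x2, x3, x4) * a6
            + sign(x2, x4) * a5 + sign(x2, x3) * a4)

def table_3x1(a0, a6, a5, a4):
    """phi_1 for every (x3, x4) with x2 = 0."""
    return {(x3, x4): phi1_h3x1(0, x3, x4, a0, a6, a5, a4)
            for x3, x4 in product((0, 1), repeat=2)}

theta = 3
h1, h2, h3 = 1, 10, 100
case_1 = table_3x1(a0=theta, a6=-theta, a5=theta, a4=theta)
case_2 = table_3x1(a0=theta, a6=theta, a5=theta, a4=theta)
case_3 = table_3x1(a0=-h1 - h2 + h3, a6=h1 - h2 + h3,
                   a5=-h1 - h2 - h3, a4=-h1 + h2 + h3)
print(case_1, case_2, case_3, sep="\n")
```

Each table has at most three distinct non-zero absolute values, which is the quantity that sets the number of controlled rotations in the corresponding circuit.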

Circuit for arbitrary exponentiated Hamiltonians

Our previous discussion focuses on the case of fermionic simulation within a Jordan–Wigner representation using Hamiltonian terms that are fermionically swapped to be adjacent to each other. While these simulation circuits are among the most important for applications in chemistry, they do not necessarily represent all cases of physical interest, let alone all of chemistry. Here we address this by discussing ways to synthesize circuits for arbitrary exponentiated Hamiltonians in ${{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}$, expressible as a sum of Pauli operators, with the aim of reducing the number of non-Clifford resources. For reasons discussed previously, it is enough to consider a Hamiltonian H expressed as a sum of commuting Pauli operators.

$$H=\mathop{\sum}\limits_{i}{\alpha }_{i}{P}_{i}\qquad {P}_{i}\in {{{{\mathcal{P}}}}}_{n}$$

In most cases one synthesizes a circuit for each ${e}^{-i{\alpha }_{i}{P}_{i}t}$ using a number of CNOT gates and one R_z gate. Thus the number of R_z gates required is equal to the number of summands. Here we describe a procedure to synthesize a circuit for e^−iHt as a whole, i.e. considering multiple summands or Pauli operators together.

We diagonalize H, for example, by using the algorithms in⁵¹. In the previous section we have constructed explicit eigenbases for the diagonalization of some specific Hamiltonians. Then we get the following.

$$H=W\left(\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}\right){W}^{{\dagger} }$$

Here ${Q}_{i}={\otimes }_{j = 1}^{n}{Q}_{ij}$, a tensor product of Z and ${\mathbb{I}}$ i.e. ${Q}_{ij}\in \{Z,{\mathbb{I}}\}$. W is a diagonalizing Clifford circuit. Thus we get the following.

$${e}^{-iHt}=W{e}^{-it{\sum }_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}}{W}^{{\dagger} }$$

(34)

Lemma 2.3

Let ${{{\mathcal{H}}}}={\sum }_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}$, such that ${Q}_{i}={\otimes }_{j = 1}^{n}{Q}_{ij}$, where ${Q}_{ij}\in \{Z,{\mathbb{I}}\}$. With each Q_i we associate an n-length vector y_i = (y_i1, …, y_in) ∈ {0, 1}ⁿ such that ${\left({{{{\bf{y}}}}}_{i}\right)}_{j}={y}_{ij}=1$ if Q_ij = Z, else y_ij = 0. Let x₁, …, x_n ∈ {0, 1} and $\left\vert 0\right\rangle ={\left[1,0\right]}^{T}$, $\left\vert 1\right\rangle ={\left[0,1\right]}^{T}$. The eigenvectors of ${{{\mathcal{H}}}}$ are of the form $\left\vert v\right\rangle { = \bigotimes }_{j = 1}^{n}\left\vert {x}_{j}\right\rangle$ and the corresponding eigenvalue is

$${\phi }_{v}=\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{(-1)}^{{\oplus }_{j = 1}^{n}{y}_{ij}{x}_{j}}$$

(35)

Proof

The summands in ${{{\mathcal{H}}}}$ are mutually commuting and so they have a common eigenbasis. Let us first consider Q_i. Since ${Q}_{ij}\left\vert x\right\rangle =\left\vert x\right\rangle$ if ${Q}_{ij}={\mathbb{I}}$ and ${Q}_{ij}\left\vert x\right\rangle ={(-1)}^{x}\left\vert x\right\rangle$ if Q_ij = Z, where x ∈ {0, 1}, so we have the following.

$${Q}_{i}\left\vert v\right\rangle =\left(\mathop{\bigotimes }\limits_{j=1}^{n}{Q}_{ij}\right)\left(\mathop{\bigotimes }\limits_{j=1}^{n}\left\vert {x}_{j}\right\rangle \right)=\mathop{\bigotimes }\limits_{j=1}^{n}{Q}_{ij}\left\vert {x}_{j}\right\rangle ={(-1)}^{{\oplus }_{j = 1}^{n}{y}_{ij}{x}_{j}}\left\vert v\right\rangle$$

This implies that

$${{{\mathcal{H}}}}\left\vert v\right\rangle =\left(\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{Q}_{i}\right)\left\vert v\right\rangle =\mathop{\sum}\limits_{i}{\alpha }_{i}^{{\prime} }{(-1)}^{{\oplus }_{j = 1}^{n}{y}_{ij}{x}_{j}}\left\vert v\right\rangle .$$

We can also interpret ϕ as the overall phase incurred between W and W^†. For given values of ${\alpha }_{i}^{{\prime} }$, ϕ is a function of the n Boolean variables x_j. We can evaluate a truth table and get all the 2ⁿ values of ϕ for different values of (x₁, …, x_n) ∈ {0, 1}ⁿ. For each distinct non-zero absolute phase value ∣θ∣ (ignoring sign), we can have a sub-circuit ${{{{\mathcal{C}}}}}_{| \theta | }$ that has only one controlled rotation cR_z(2θ) gate. The complete circuit can be obtained by combining these different sub-circuits (one for each ∣θ∣ ≠ 0), in between the diagonalizing Clifford circuits W, W^†. The ordering of the sub-circuits does not matter.
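The truth-table step can be sketched as follows. This is a minimal illustration under stated assumptions, not the synthesis procedure itself: `phase_table` evaluates Equation (35) for every bit string, `group_by_abs_phase` collects the assignments by ∣ϕ∣ and sign, both names are hypothetical, and the coefficients are taken to be integers (or exact fractions) so that equal magnitudes compare equal.

```python
from collections import defaultdict
from itertools import product

def phase_table(alphas, ys):
    """Evaluate phi_x = sum_i alpha_i * (-1)^(y_i . x mod 2) for all x in {0,1}^n.

    alphas: coefficients alpha'_i; ys: binary vectors y_i marking where Q_i has a Z.
    """
    n = len(ys[0])
    table = {}
    for x in product((0, 1), repeat=n):
        table[x] = sum(
            alpha * (-1) ** (sum(yj * xj for yj, xj in zip(y, x)) % 2)
            for alpha, y in zip(alphas, ys)
        )
    return table

def group_by_abs_phase(table):
    """Map each non-zero |phi| to the assignments giving +|phi| and -|phi|."""
    groups = defaultdict(lambda: {"plus": [], "minus": []})
    for x, phi in table.items():
        if phi != 0:
            groups[abs(phi)]["plus" if phi > 0 else "minus"].append(x)
    return dict(groups)

# Example: Z(1)Z(2) + Z(2)Z(3) on three qubits, unit coefficients.
table = phase_table([1, 1], [(1, 1, 0), (0, 1, 1)])
groups = group_by_abs_phase(table)
print(groups)
```

For this two-term example only one non-zero absolute value occurs, so a single sub-circuit would be needed, one fewer than the number of summands. The "plus" and "minus" lists play the role of the assignments on which the rotation is applied with angle of either sign.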

Now we discuss how to synthesize sub-circuit ${{{{\mathcal{C}}}}}_{| \theta | }$, for one such distinct absolute value of ϕ. Let ${{{{\mathcal{M}}}}}_{\theta }$ be the set of binary values for variables x₁, x₂, …, x_n, such that ϕ computes to θ in Equation (35).

$${{{{\mathcal{M}}}}}_{\theta }=\{({x}_{1},\ldots ,{x}_{n})\in {\{0,1\}}^{n}:{\phi }_{{x}_{1},\ldots ,{x}_{n}}=\theta \}$$

(36)

Analogously we can define ${{{{\mathcal{M}}}}}_{-\theta }$. We can also associate ${{{{\mathcal{M}}}}}_{\theta }$ and ${{{{\mathcal{M}}}}}_{-\theta }$ with S_θ and S_−θ, the sets of eigenvectors with eigenvalues θ and − θ respectively, as obtained from Lemma 2.3. We define the following operators, which act on the input vector space and the space of two ancillae, c and r, the latter being initialized to 0.

$$\begin{array}{ll}{V}_{\theta }\,=\,\mathop{\sum}\limits_{\left\vert v\right\rangle \in {S}_{\theta }}\left\vert v,c\oplus 1,0\right\rangle \left\langle v,c,0\right\vert +\mathop{\sum}\limits_{\left\vert w\right\rangle \notin {S}_{\theta }}\left\vert w,c,0\right\rangle \left\langle w,c,0\right\vert \\ {V}_{-\theta }\,=\,\mathop{\sum}\limits_{\left\vert v\right\rangle \in {S}_{-\theta }}\left\vert v,c\oplus 1,1\right\rangle \left\langle v,c,0\right\vert +\mathop{\sum}\limits_{\left\vert w\right\rangle \notin {S}_{-\theta }}\left\vert w,c,0\right\rangle \left\langle w,c,0\right\vert \end{array}$$

(37)

The circuit ${{{{\mathcal{C}}}}}_{| \theta | }={V}_{\theta }{V}_{-\theta }{\left(c{R}_{Z}(2\theta )\right)}_{cr}{V}_{-\theta }^{{\dagger} }{V}_{\theta }^{{\dagger} }$. If the input vector is in S_θ or S_−θ then both these operators flip a control ancilla qubit (c). Additionally, if the vector is in S_−θ then the second ancilla r is flipped. We apply a cR_z(2θ) gate on r, controlled on c. Thus if the input vector is in S_−θ then we actually apply cR_z( − 2θ). The ancillae c, r can be controlled by multi-controlled X gates, that can be further decomposed in terms of Toffoli and CNOT gates^69,70. For example, let ${{{{\mathcal{M}}}}}_{\theta }=\{(0,0,1,1),(0,1,1,1)\}$ and ${{{{\mathcal{M}}}}}_{-\theta }=\{(1,1,0,0)\}$. The two Boolean min-terms of ${{{{\mathcal{M}}}}}_{\theta }$ can be compressed to have a single term because when x₁ = 0, x₃ = x₄ = 1 then ϕ = θ, irrespective of the value of x₂. We call it the ‘don’t care condition’ for x₂. So, equivalently we can write ${{{{\mathcal{M}}}}}_{\theta }=\{(0,* ,1,1)\}$, where * denotes the don’t-care condition. In general, algorithms like Karnaugh map⁷¹, ESPRESSO⁷² can be used to get compact set of Boolean min-terms. A circuit ${{{{\mathcal{C}}}}}_{| \theta | }$ has been shown in Fig. 5.
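The don't-care compression in the example can be illustrated with a simple pairwise merge of min-terms. The sketch below is a simplified merging step in the spirit of the Quine–McCluskey method; it is not the Karnaugh map or ESPRESSO procedure cited above, it does not guarantee a minimal cover, and the names `merge_once` and `compress` are hypothetical.

```python
def merge_once(terms):
    """Merge every pair of terms that differ in exactly one fixed (non-'*') position."""
    terms = set(terms)
    merged, used = set(), set()
    for s in terms:
        for t in terms:
            diff = [k for k in range(len(s)) if s[k] != t[k]]
            if len(diff) == 1 and "*" not in (s[diff[0]], t[diff[0]]):
                k = diff[0]
                # The differing variable becomes a don't-care.
                merged.add(s[:k] + ("*",) + s[k + 1:])
                used.update((s, t))
    # Keep terms that could not be merged with any other term.
    return merged | (terms - used)

def compress(terms):
    """Repeat the merge step until the set of terms stops changing."""
    terms = set(terms)
    while True:
        nxt = merge_once(terms)
        if nxt == terms:
            return terms
        terms = nxt

m_theta = compress({(0, 0, 1, 1), (0, 1, 1, 1)})
m_minus_theta = compress({(1, 1, 0, 0)})
print(m_theta, m_minus_theta)
```

Each remaining term corresponds to one multi-controlled X gate on the ancilla, with controls only on the positions that are not marked `*`.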

**Fig. 5: Circuit ${{{{\mathcal{C}}}}}_{| \theta | }$.**

Hence, due to the invariance of the point spectrum of unitarily equivalent operators we have the following.

Lemma 2.4

Let H = ∑_iα_iP_i, where P_i are mutually commuting n-qubit Pauli operators. We can implement a circuit for e^−iHt with at most m (controlled)-rotations, where m is the number of distinct non-zero eigenvalues (ignoring sign) of H.

Illustration—Quantum Heisenberg and quantum Ising model

We consider the problem of designing quantum circuits for simulating the quantum Heisenberg and Ising model with Hamiltonians H_H and H_I respectively. The Heisenberg Hamiltonian is widely used to study magnetic systems, where the magnetic spins are treated quantum mechanically^{60,73,74,75,76}. Let G = (V, E) be the underlying graph with the vertex and edge set being V and E, respectively.

$${H}_{H}=\mathop{\sum}\limits_{(i,j)\in E}\left({J}_{x}{X}_{(i)}{X}_{(j)}+{J}_{y}{Y}_{(i)}{Y}_{(j)}+{J}_{z}{Z}_{(i)}{Z}_{(j)}\right)+\mathop{\sum}\limits_{i\in V}{d}_{h}{Z}_{(i)}$$

(38)

$${H}_{I}=\mathop{\sum}\limits_{(i,j)\in E}{J}_{z}{Z}_{(i)}{Z}_{(j)}+\mathop{\sum}\limits_{i\in V}{d}_{h}^{{\prime} }{Z}_{(i)}$$

(39)

In the above J_x, J_y, J_z are coupling parameters, denoting the exchange interaction between nearest neighbor spins along the X,Y,Z-direction respectively. ${d}_{h},{d}_{h}^{{\prime} }$ are the time amplitudes of the external magnetic field along the Z-direction. The terms can be partitioned into the following sets of commuting Paulis: {X_(i)X_(j): (i, j) ∈ E}, {Y_(i)Y_(j): (i, j) ∈ E}, {Z_(i)Z_(j): (i, j) ∈ E} and {Z_(i): i ∈ V}.

Let us first consider the set {Z_(i): i ∈ V}. Following the previous discussions and Lemma 2.3, the overall phase incurred or the eigenvalues are as follows.

$${\phi }^{{\prime} }={d}_{h}\mathop{\sum}\limits_{i\in V}{(-1)}^{{x}_{i}}\qquad {x}_{i}\in \{0,1\}$$

(40)

For x ∈ {0, 1}^∣V∣, one particular assignment of values to the Boolean variables, let T₀ = {i ∈ V: x_i = 0} and T₁ = {i ∈ V: x_i = 1}. So ∣T₀∣ + ∣T₁∣ = ∣V∣ and

$${\phi }_{{{{\bf{x}}}}}^{{\prime} }={d}_{h}\left(| {T}_{0}| -| {T}_{1}| \right)={d}_{h}\left(| V| -2| {T}_{1}| \right).$$

(41)

So the number of distinct non-zero absolute values of ${\phi }^{{\prime} }$ is ⌈∣V∣/2⌉. Implementing each ${e}^{-i{d}_{h}{Z}_{(i)}t}$ separately would require ∣V∣ rotation gates in total. Thus, using Lemma 2.4, we have about 50% reduction in the rotation gate cost.
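The count of distinct non-zero absolute phases for the field term can be confirmed by brute force for small ∣V∣. This is an illustrative sketch; the function name `distinct_abs_field_phases` is hypothetical and d_h is set to 1 by default, since a common factor does not change the count.

```python
from itertools import product
from math import ceil

def distinct_abs_field_phases(num_vertices, d_h=1):
    """Distinct non-zero |phi'| for phi' = d_h * sum_i (-1)^(x_i), over all x."""
    values = set()
    for x in product((0, 1), repeat=num_vertices):
        phi = d_h * sum((-1) ** xi for xi in x)
        if phi != 0:
            values.add(abs(phi))
    return values

for n in range(1, 7):
    print(n, sorted(distinct_abs_field_phases(n)), ceil(n / 2))
```

For every size tried, the number of values printed matches ⌈∣V∣/2⌉.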

Now, let us consider the other commuting sets. Since H^†XH = Z and (HSX)^†Y(HSX) = Z, each of the above sets can be diagonalized and we can focus on the problem of designing a quantum circuit for simulating the Hamiltonian H = J∑_(i, j)∈EZ_(i)Z_(j), where J is a constant. Our aim is to derive an upper bound on the number of controlled rotations required to simulate e^−iHt. Following the previous discussions, the overall phase incurred between the diagonalizing Cliffords W, W^† is as follows (Lemma 2.3).

$$\phi =J\mathop{\sum}\limits_{(i,j)\in E}{(-1)}^{{x}_{i}\oplus {x}_{j}}$$

(42)

where x_i, x_j ∈ {0, 1} are variables denoting the state of the qubits after application of W. The quantum circuit has ∣V∣ qubits, one for each vertex of G. Let x ∈ {0, 1}^∣V∣ denote one particular assignment of values to the variables x₁, …, x_∣V∣. Let S₀ = {(i, j) ∈ E: x_i = x_j = 1} and S₁ = {(i, j) ∈ E : exactly one of x_i, x_j is 1}. If ${S}^{{\prime} }=\{(i,j)\in E:{x}_{i}={x}_{j}=0\}$, then $| {S}^{{\prime} }| =| E| -| {S}_{0}| -| {S}_{1}|$. Let ϕ_x be the value of the phase for this particular assignment.

$${\phi }_{{{{\bf{x}}}}}=J\left(| {S}^{{\prime} }| +| {S}_{0}| -| {S}_{1}| \right)=J\left(| E| -| {S}_{0}| -| {S}_{1}| +| {S}_{0}| -| {S}_{1}| \right)=J\left(| E| -2| {S}_{1}| \right)$$

(43)

Let V_1x = {i ∈ V: x_i = 1}, V_0x = {i ∈ V: x_i = 0} and ${{{\mathcal{N}}}}(k)$ be the set of neighbouring vertices of k in G. Then

$$| {S}_{1}| =\mathop{\sum}\limits_{k\in {V}_{1{{{\bf{x}}}}}}\left\vert \left\{{{{\mathcal{N}}}}(k)\setminus {V}_{1{{{\bf{x}}}}}\right\}\right\vert$$

(44)

Now for any assignment ∣S₁∣ can take values in 0, …, ∣E∣. So the number of distinct non-zero values of ∣ϕ∣ is at most ⌈∣E∣/2⌉. And hence we need at most ⌈∣E∣/2⌉ controlled-R_z gates in the circuit simulating e^−iHt. Had we simulated each ${e}^{-iJ{Z}_{(i)}{Z}_{(j)}t}$ separately, we would have required ∣E∣ R_z gates. So we can achieve about 50% reduction in the rotation gate cost under the assumption that a controlled-R_z gate costs the same to implement as a single R_z gate.
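Equation (43) and the ⌈∣E∣/2⌉ bound can be checked by enumeration on a small graph. The sketch below assumes J = 1 and 0-based vertex labels; `phase_and_cut` and `distinct_abs_phases` are hypothetical names, and the example graph (a square with one diagonal) is an arbitrary choice.

```python
from itertools import product
from math import ceil

def phase_and_cut(edges, x):
    """Return (phi_x, |S_1|) for assignment x, with J = 1."""
    phi = sum((-1) ** (x[i] ^ x[j]) for i, j in edges)
    cut = sum(1 for i, j in edges if x[i] != x[j])  # edges with exactly one endpoint set to 1
    return phi, cut

def distinct_abs_phases(edges, num_vertices):
    """Distinct non-zero |phi_x| over all assignments, checking Eq. (43) on the way."""
    values = set()
    for x in product((0, 1), repeat=num_vertices):
        phi, cut = phase_and_cut(edges, x)
        assert phi == len(edges) - 2 * cut  # Eq. (43) with J = 1
        if phi != 0:
            values.add(abs(phi))
    return values

square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
values = distinct_abs_phases(square_with_diagonal, 4)
print(sorted(values), ceil(len(square_with_diagonal) / 2))
```

Here the five-edge graph gives three distinct absolute phases, which meets the bound with equality.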

G is a cycle

This is basically the translationally invariant 1-D spin chain. Let ${G}_{{V}_{1{{{\bf{x}}}}}}$ be the subgraph induced by V_1x, which is a union of paths. For each path p, let ${S}_{1p}={\bigcup }_{k\in p}\{{{{\mathcal{N}}}}(k)\setminus {V}_{1{{{\bf{x}}}}}\}\subseteq {S}_{1}$ be the set of vertices in this path. Each of the terminal vertices has one neighbour in V⧹V_1x. So ∣S_1p∣ = 2. Thus if ${{{\mathcal{P}}}}$ is the set of all such paths in ${G}_{{V}_{1{{{\bf{x}}}}}}$, then

$${\phi }_{{{{\bf{x}}}}}=J\left(| E| -2\mathop{\sum}\limits_{p\in {{{\mathcal{P}}}}}| {S}_{1p}| \right)=J\left(| E| -4| {{{\mathcal{P}}}}| \right)$$

(45)

Now $| {{{\mathcal{P}}}}|$ can vary from 1, …, ⌈∣E∣/4⌉ to give ⌈∣E∣/4⌉ distinct values of ∣ϕ∣. This implies a quantum circuit synthesizing e^−iHt will require at most ⌈∣E∣/4⌉ cR_z gates. This is about 75% reduction in the cost of rotation gates, compared to synthesizing each ${e}^{-iJ{Z}_{(i)}{Z}_{(j)}t}$.
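Equation (45) can be checked numerically by counting the maximal runs of 1s around the cycle. This is a sketch under stated assumptions: J = 1, vertices are labelled 0, …, n − 1, the all-ones assignment is treated as containing no path, and the function names are hypothetical. Only the even lengths 4, 6 and 8 are listed here.

```python
from itertools import product

def cycle_edges(n):
    """Edges of the n-vertex cycle with 0-based labels."""
    return [(i, (i + 1) % n) for i in range(n)]

def count_paths_of_ones(x):
    """Number of maximal runs of 1s around the cycle (0 if x is all 0s or all 1s)."""
    n = len(x)
    return sum(1 for i in range(n) if x[i] == 1 and x[(i + 1) % n] == 0)

def cycle_abs_phases(n):
    """Distinct non-zero |phi_x| on the n-cycle, checking Eq. (45) for every x."""
    values = set()
    for x in product((0, 1), repeat=n):
        phi = sum((-1) ** (x[i] ^ x[j]) for i, j in cycle_edges(n))
        assert phi == n - 4 * count_paths_of_ones(x)  # Eq. (45) with J = 1
        if phi != 0:
            values.add(abs(phi))
    return values

for n in (4, 6, 8):
    print(n, sorted(cycle_abs_phases(n)))
```

For these lengths the number of distinct absolute phases is 1, 2 and 2 respectively, which can be compared with ⌈∣E∣/4⌉.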

G is a complete graph

In this case for each k ∈ V_1x, we have ${{{\mathcal{N}}}}(k)\setminus {V}_{1{{{\bf{x}}}}}={V}_{0{{{\bf{x}}}}}$. So we have,

$${\phi }_{{{{\bf{x}}}}}=J\left(| E| -2| {V}_{1{{{\bf{x}}}}}| | {V}_{0{{{\bf{x}}}}}| \right)=J\left(| E| -2\parallel {{{\bf{x}}}}{\parallel }_{1}(| V| -\parallel {{{\bf{x}}}}{\parallel }_{1})\right)$$

(46)

So there can be ⌈∣V∣/2⌉ distinct values of ∣ϕ_x∣ as ∥x∥₁ varies from 1, …, ⌈∣V∣/2⌉. And hence we require at most ⌈∣V∣/2⌉ cR_z gates for simulating e^−iHt. If we simulate each ${e}^{-iJ{Z}_{(i)}{Z}_{(j)}t}$ then we require $| E| =\frac{| V| (| V| -1)}{2}$ R_z gates. This indicates about $100\left(1-\frac{2}{| V| -1}\right) \%$ reduction in the cost of rotation gates.
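Equation (46) can likewise be verified by enumeration on small complete graphs. The sketch assumes J = 1 and uses the hypothetical name `complete_graph_abs_phases`; it returns the set of distinct non-zero ∣ϕ_x∣ so that its size can be compared with the estimate in the text.

```python
from itertools import combinations, product

def complete_graph_abs_phases(n):
    """Distinct non-zero |phi_x| on the complete graph K_n, checking Eq. (46)."""
    edges = list(combinations(range(n), 2))
    values = set()
    for x in product((0, 1), repeat=n):
        phi = sum((-1) ** (x[i] ^ x[j]) for i, j in edges)
        k = sum(x)  # ||x||_1, the number of vertices assigned 1
        assert phi == len(edges) - 2 * k * (n - k)  # Eq. (46) with J = 1
        if phi != 0:
            values.add(abs(phi))
    return values

for n in (4, 5):
    print(n, sorted(complete_graph_abs_phases(n)))
```

For K₄ and K₅ two distinct absolute phases occur, against 6 and 10 summands respectively.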

In Fig. 6 we have shown quantum circuits simulating ${e}^{it\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}}$ for some simple graphs G = (V, E). The circuits have been designed to optimize the number of Toffoli gates, as well.

**Fig. 6: Quantum circuit simulating ${e}^{it\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}}$.**

Reducing the number of Toffoli gates

We discussed before that the T-count from the Toffolis may be a significant factor in the high-error regime, as the logarithmic cost of rotation synthesis may not dominate the additive constant that arises from the Toffoli gates needed. In order to reduce the number of Toffolis we can do the following. We design circuits reducing Toffolis for Hamiltonians over smaller graphs, such as in Fig. 6a–e. Then we decompose a Hamiltonian over a large graph into Hamiltonians over these smaller graphs. For example, consider a 1-D cycle on N points and H_z = θ∑_(i, j)∈EZ_(i)Z_(j). We break this cycle into smaller chains of length 3, i.e. H_z = θ(Z₍₁₎Z₍₂₎ + Z₍₂₎Z₍₃₎) + θ(Z₍₃₎Z₍₄₎ + Z₍₄₎Z₍₅₎) + … = ∑_iH_zi. We have a quantum circuit that synthesizes each ${e}^{i{H}_{zi}t}$ with only one cR_z gate (Fig. 6b). So to synthesize ${e}^{i{H}_{z}t}$ we require approximately N/2 cR_z gates. This is about twice the number of controlled rotations required had we synthesized without decomposing, but it does not require any extra Toffoli-pairs. We still get approximately 50% reduction, compared to synthesizing each summand, i.e. ${e}^{it{Z}_{(i)}{Z}_{(j)}}$.

Now consider a large N × N lattice which has N² vertices and 2N(N − 1) edges and the Hamiltonian H_z = θ∑_(i, j)∈EZ_(i)Z_(j). We can decompose this into (N−1)² smaller interior cycles of 4 points and a bigger outer cycle with 2N + 2(N − 2) = 4(N − 1) points. From Fig. 6a, we know that we can design a circuit simulating the exponentiated Hamiltonian corresponding to each interior cycle with 1 cR_z and 1 Toffoli pair. We can further decompose the outer cycle (as explained in the previous paragraph) and have a circuit with approximately 2(N − 1) cR_z gates. Thus we require ≈ (N − 1)² + 2(N − 1) = (N − 1)(N + 1) cR_z gates and (N − 1)² Toffoli-pairs. We have discussed before that for general graphs the number of cR_z gates required is ≈ ∣E∣/2 = N(N − 1), so we use ≈ (N − 1) more cR_z gates by decomposing, but the Toffoli cost reduces a lot. Had we synthesized each ${e}^{it{Z}_{(i)}{Z}_{(j)}}$, we would have used 2N(N − 1) R_z gates. Thus we manage to get a reduction of ≈ (N−1)² in the number of R_z/cR_z gates.
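The gate-count arithmetic for the N × N lattice can be tabulated directly. This sketch only reproduces the approximate counts quoted in the text; it does not construct any circuit, and the function name and dictionary keys are hypothetical.

```python
def lattice_rotation_counts(n):
    """Approximate rotation and Toffoli-pair counts for an n x n lattice, as estimated in the text."""
    edges = 2 * n * (n - 1)
    interior_cycles = (n - 1) ** 2           # 4-point cycles, one cR_z and one Toffoli pair each
    outer_points = 2 * n + 2 * (n - 2)       # points on the outer cycle
    decomposed = interior_cycles + outer_points // 2  # outer cycle split into chains of length 3
    return {
        "per_term_rz": edges,                # one R_z per Z(i)Z(j) summand
        "undecomposed_crz": edges // 2,      # general-graph estimate |E|/2
        "decomposed_crz": decomposed,
        "toffoli_pairs": interior_cycles,
    }

counts = lattice_rotation_counts(5)
print(counts)
```

For N = 5 this gives 24 controlled rotations after decomposition, compared with 40 rotations when each summand is synthesized separately.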

In Fig. 6e we gave a circuit for simulating ${e}^{it\theta {\sum }_{(i,j)\in E}{Z}_{(i)}{Z}_{(j)}}$, when the underlying graph is a 6-point cycle. We reduced the Toffoli-pairs by decomposing the graph into smaller cycles.

Application : Simulating with qDRIFT

In this section we consider one simulation algorithm, qDRIFT. We focus on qDRIFT rather than Trotter for our experiments because qDRIFT is easier to analyze numerically, since Trotter errors depend subtly on operator ordering. Specifically we consider a Hamiltonian $H=\mathop{\sum }\nolimits_{j = 1}^{L}{h}_{j}{P}_{j}$ and sample Pauli operators to apply in each short time step, as described earlier in the paper and in⁴³. We can assume each h_j > 0, since a sign change only affects the angles of the rotation gates. In qDRIFT, one Pauli term is sampled in each iteration, up to a total of N samples. The term P_j is sampled with probability h_j/∑_ih_i and is then simulated for a short time period. We consider another procedure where we re-write the Hamiltonian as $H=\mathop{\sum }\nolimits_{j = 1}^{{L}^{{\prime} }}{h}_{j}^{{\prime} }{H}_{j}$, where each H_j = ∑_iP_i is a sum of commuting Paulis. In each iteration one H_j is sampled with probability $\frac{{h}_{j}^{{\prime} }}{{\sum }_{i}{h}_{i}^{{\prime} }}$ and simulated for a short time period. Then we compare the growth of error and the number of R_z/cR_z and Toffoli gates used in these two procedures: (i) one Pauli sampled in each iteration, (ii) a group of commuting Paulis sampled in each iteration.
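The sampling step shared by both procedures can be sketched as below. This is a minimal illustration, not the code used for the experiments: the function name `qdrift_schedule` and the `seed` parameter are assumptions, the weights stand for h_j in procedure (i) or h′_j in procedure (ii), and the step length is taken to be t·(∑_j h_j)/N.

```python
import random

def qdrift_schedule(weights, t, num_samples, seed=0):
    """Sample term indices with probability proportional to their weights.

    Returns the list of sampled indices and the evolution time tau per sample.
    """
    rng = random.Random(seed)          # seeded generator, for reproducibility only
    total = sum(weights)
    tau = t * total / num_samples      # tau = t * (sum_j h_j) / N
    indices = rng.choices(range(len(weights)), weights=weights, k=num_samples)
    return indices, tau

indices, tau = qdrift_schedule([0.5, 1.5, 2.0], t=1.0, num_samples=8, seed=1)
print(indices, tau)
```

Each sampled index would then be turned into the circuit for the corresponding single Pauli or group of commuting Paulis, evolved for time tau.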

For our first set of experiments we examine the simulation of 4- and 6-qubit Heisenberg models. The coefficients J_x, J_y, J_z, d_h have been sampled from a zero-mean normal distribution with variance 1. In Fig. 7 we show that we achieve better scaling of the R_z/cR_z count when multiple commuting Pauli operators are sampled and evolved in each iteration. In fact, the error also scales well with the number of iterations: we can achieve the same error in fewer iterations or, equivalently, a much lower error in the same number of iterations when multiple operators are sampled. We calculate the error as:

$$\mathrm{Error}=\mathbb{E}_{\rho }\left(\parallel \mathcal{E}_{2}(\rho )-\mathcal{E}_{1}(\rho )\parallel_{l_{2}}\right)$$

where $\mathcal{E}_{1}(\rho )=e^{iHt}\rho e^{-iHt}$, $\tau =t\cdot \left(\sum_{j}h_{j}\right)/N$, $\parallel \cdot \parallel_{l_{2}}$ is the induced Euclidean norm on matrices and $\mathbb{E}_{\rho }$ is the Haar average over input states. We obtain $\mathcal{E}_{2}$ by averaging over M random qDRIFT protocols, where M varies from 100 to 3000 for our purposes. These values are chosen to ensure that the sampling error is small at the scale of the plots generated.

$$V_{k}=\prod_{j_{i_{k}}}e^{iH_{j_{i_{k}}}\tau }$$

$$\mathcal{E}_{2}(\rho )=\frac{1}{M}\sum_{k=1}^{M}V_{k}\rho V_{k}^{\dagger }$$

In our experiments ρ is randomly drawn rather than chosen to maximize the diamond distance. As a result, this does not give a tight upper bound on the error quantified by any induced channel norm. Further, all evolution is done using t = 1 and the groupings are hand-optimized using the counts given in Supplementary Method 5. The data, tabulated in Fig. 7, show that the number of iterations of the qDRIFT channel needed to bound the simulation error below a particular value is reduced by factors of 2.34 and 2.8 by grouping commuting terms for the randomly chosen 4- and 6-qubit Heisenberg Hamiltonians, respectively. The number of rotations is found to be reduced by a factor of roughly 2.34 for the 4-qubit ensemble but only 1.8 for the 6-qubit case. This suggests that, while the groupings we consider are highly successful at reducing the number of qDRIFT iterations needed, the number of gates per iteration increases from the 4-qubit to the 6-qubit examples. Further computer-aided optimization may therefore be needed in order to see the full benefit of such groupings as we increase the size of the models.
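The error estimate defined above can be sketched numerically for small systems. The following Python code is an illustrative sketch under stated assumptions rather than the code in the authors' repository: the function names are hypothetical, the input states are taken to be Haar-random pure states, each H_j is supplied as a dense Hermitian matrix, and matrix exponentials are computed by eigendecomposition.

```python
import numpy as np


def expm_i(herm: np.ndarray, s: float) -> np.ndarray:
    """exp(i * s * herm) for a Hermitian matrix, via its eigendecomposition."""
    w, u = np.linalg.eigh(herm)
    return (u * np.exp(1j * s * w)) @ u.conj().T


def haar_state(dim: int, rng) -> np.ndarray:
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


def qdrift_error(weights, terms, t, n_samples, n_protocols, n_states, rng) -> float:
    """Estimate E_rho || E_2(rho) - E_1(rho) || for H = sum_j weights[j] * terms[j].

    E_1 is the exact evolution and E_2 the average over n_protocols random
    qDRIFT products V_k, each built from n_samples factors exp(i * H_j * tau).
    """
    weights = np.asarray(weights, dtype=float)
    lam = weights.sum()
    tau = t * lam / n_samples
    dim = terms[0].shape[0]
    h_total = sum(w * m for w, m in zip(weights, terms))
    u_exact = expm_i(h_total, t)
    steps = [expm_i(m, tau) for m in terms]    # one factor per group H_j

    protocols = []
    for _ in range(n_protocols):
        v = np.eye(dim, dtype=complex)
        for j in rng.choice(len(terms), size=n_samples, p=weights / lam):
            v = steps[j] @ v
        protocols.append(v)

    errors = []
    for _ in range(n_states):
        psi = haar_state(dim, rng)
        rho = np.outer(psi, psi.conj())
        e1 = u_exact @ rho @ u_exact.conj().T
        e2 = sum(v @ rho @ v.conj().T for v in protocols) / n_protocols
        errors.append(np.linalg.norm(e2 - e1, ord=2))  # induced 2-norm
    return float(np.mean(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([[0, 1], [1, 0]], dtype=complex)
    z = np.array([[1, 0], [0, -1]], dtype=complex)
    print(qdrift_error([0.6, 0.4], [x, z], 1.0, 20, 100, 10, rng))
```

The returned value is itself a random estimate: it fluctuates with the number of protocols and the number of sampled states, which is why the number of protocols must be large enough for the sampling error to be small at the scale of interest.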

**Fig. 7: Simulation of 4 and 6-qubit Heisenberg Hamiltonian.**

Similar observations can be made for our second set of experiments, where we simulate the Hamiltonians of H₂ and LiH (with freezing in the STO-3G basis). The plots in Fig. 8 show that, in the case of H₂, the number of iterations of the qDRIFT channel needed to bound the simulation error below a particular value is reduced by a factor of 4 by grouping commuting terms. For LiH this factor is nearly 2.1. The number of rotations is found to be reduced by a factor of roughly 3.2 for H₂ and 2 for LiH.

**Fig. 8: Simulation of H₂ and LiH Hamiltonian.**

For all the experiments that we consider, the Toffoli-pair count is comparable with the R_z/cR_z count, so the Toffoli pairs do not contribute significantly to the overall T-count compared with the rotation gates. The number of gates depends on the diagonalizing circuits and on the grouping into commuting Paulis. In this paper we have shown the results for the eigenbasis or grouping that performed best among the options we considered. In Supplementary Method 5 we explicitly list the Hamiltonians and the groupings, and briefly describe how we obtained the rotation and Toffoli costs.

All plots, code, and data can be found online in our public repository https://github.com/SNIPRS/hamiltonian. All code was written in Python. Our results were obtained partly with computing resources in the Cedar cluster of Compute Canada. Specifically, our code was run on an Intel(R) Xeon(R) E5-2683 v4 CPU at 2.10 GHz, utilizing 48 cores and up to 12 GB of RAM, and running Gentoo Linux 2.6. For the Heisenberg Hamiltonians, our results were obtained using 12 cores of an Intel(R) Core(TM) i5-12600K CPU at 3.6 GHz, running Ubuntu 20.04.4 with up to 32 GB of RAM.

Discussion

In this paper, we have considered the problem of designing efficient quantum circuits for exponentials of Hamiltonians that can be expressed as a sum of Paulis. In contrast with most previous approaches, we synthesize a circuit for an exponentiated sum of commuting Paulis, rather than concatenating circuits for each exponentiated Pauli. The resulting circuits are observed, for some parameter combinations, to require far fewer non-Clifford operations than the standard circuits. We therefore propose an algorithm for greedily compiling a Trotter or qDRIFT simulation into a sequence of such simulations, and observe that when multiple rotations are grouped, a factor of roughly 1.8−3.2 fewer rotations is needed at fixed error to simulate 6- and 4-qubit Heisenberg models, LiH and H₂. Also, for simulation protocols like qDRIFT, it is possible to achieve better performance, in the sense that the error accumulated per iteration is smaller if we sample multiple commuting Paulis. The overall non-Clifford gate cost of the entire protocol is also lower.

There are a number of interesting avenues that are revealed by this work. The first is that a more complete set of rules for compiling Hamiltonian terms into sets that can be easily exponentiated could enable more efficient compilation of Hamiltonian simulations. These replacement rules, once identified, can be used inside a more systematic Hamiltonian compiler package that would allow more substantial optimizations of the Hamiltonian for the given simulation method. This raises a second issue: while in this work we focus on optimizing Trotter and related simulation methods, similar considerations could be applied to optimizing the prepare and select circuits used in LCU/qubitization simulation algorithms. Such procedures are harder to optimize because the simulation algorithm does not factorize as nicely into independent simulations; however, the importance of these simulation methods makes the development of compilation strategies essential.

Finally, an important avenue hinted at by this work is the possibility that approximate unitary synthesis methods can be combined with quantum simulation routines to further reduce the cost. If fermionic swaps are used, for example, simulation reduces to implementing a series of 4-local Hamiltonians, and optimal circuits can in principle be constructed for such Hamiltonians using existing approaches. The computational overheads required for optimal (approximate) synthesis of these unitaries make this a daunting task; however, if a sufficient lexicon of cheap unitaries is found for such simulations, then it will not only lead to lower costs for quantum simulation using Trotter/qDRIFT: it will also unify Hamiltonian compilation with circuit synthesis into a single conceptual framework.

Data availability

All plots and data can be found online in our public repository https://github.com/SNIPRS/hamiltonian.

Code availability

All code can be found online in our public repository https://github.com/SNIPRS/hamiltonian.

References

1. Feynman, R. P. et al. Simulating physics with computers. Int. J. Theor. Phys. 21, 467–488 (1982).
2. Lloyd, S. Universal quantum simulators. Science 273, 1073–1078 (1996).
3. Childs, A. M., Maslov, D., Nam, Y., Ross, N. J. & Su, Y. Toward the first quantum simulation with quantum speedup. Proc. Natl. Acad. Sci. 115, 9456–9461 (2018).
4. Suzuki, M. General theory of fractal path integrals with applications to many-body theories and statistical physics. J. Math. Phys. 32, 400–407 (1991).
5. Trotter, H. F. On the product of semi-groups of operators. Proc. Am. Math. Soc. 10, 545–551 (1959).
6. Berry, D. W. & Childs, A. M. Black-box Hamiltonian simulation and unitary implementation. Quantum Inf. Comput. 12, 29–62 (2012).
7. Childs, A. M. & Wiebe, N. Hamiltonian simulation using linear combinations of unitary operations. Quantum Inf. Comput. 12, 901–924 (2012).
8. Berry, D. W., Childs, A. M., Cleve, R., Kothari, R. & Somma, R. D. Simulating Hamiltonian dynamics with a truncated Taylor series. Phys. Rev. Lett. 114, 090502 (2015).
9. Low, G. H. & Chuang, I. L. Optimal Hamiltonian simulation by quantum signal processing. Phys. Rev. Lett. 118, 010501 (2017).
10. Babbush, R. et al. Low-depth quantum simulation of materials. Phys. Rev. X 8, 011044 (2018).
11. Cao, Y. et al. Quantum chemistry in the age of quantum computing. Chem. Rev. 119, 10856–10915 (2019).
12. Jordan, S. P., Lee, K. S. M. & Preskill, J. Quantum algorithms for quantum field theories. Science 336, 1130–1133 (2012).
13. Lanyon, B. P. et al. Towards quantum chemistry on a quantum computer. Nat. Chem. 2, 106–111 (2010).
14. McArdle, S., Endo, S., Aspuru-Guzik, A., Benjamin, S. C. & Yuan, X. Quantum computational chemistry. Rev. Mod. Phys. 92, 015003 (2020).
15. Poulin, D. et al. The Trotter step size required for accurate quantum simulation of quantum chemistry. Quantum Inf. Comput. 15, 361–384 (2015).
16. Wecker, D., Bauer, B., Clark, B. K., Hastings, M. B. & Troyer, M. Gate-count estimates for performing quantum chemistry on small quantum computers. Phys. Rev. A 90, 022305 (2014).
17. Keever, C. M. & Lubasch, M. Classically optimized Hamiltonian simulation. Preprint at https://arXiv.org/quant-ph/2205.11427 (2022).
18. Abrams, D. S. & Lloyd, S. Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors. Phys. Rev. Lett. 83, 5162 (1999).
19. Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 5, 1–7 (2014).
20. Wang, D., Higgott, O. & Brierley, S. A generalised variational quantum eigensolver. Preprint at https://arXiv.org/quant-ph/1802.00171 (2018).
21. O’Brien, T. E., Tarasinski, B. & Terhal, B. M. Quantum phase estimation for noisy, small-scale experiments. Preprint at https://arXiv.org/quant-ph/1809.09697 (2018).
22. Berry, D. W. High-order quantum algorithm for solving linear differential equations. J. Phys. A Math. Theor. 47, 105301 (2014).
23. Brandao, F. G. S. L. & Svore, K. M. Quantum speed-ups for solving semidefinite programs. In: Proc. 58th Ann. Symp. on Foundations of Computer Science 415–426 (IEEE, 2017).
24. Childs, A. M. et al. Exponential algorithmic speedup by a quantum walk. In: Proc. 35th Ann. Symp. on Theory of Computing 59–68 (ACM, 2003).
25. Farhi, E., Goldstone, J. & Gutmann, S. A quantum algorithm for the Hamiltonian NAND tree. Theory Comput. 4, 169–190 (2008).
26. Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
27. Mosca, M. & Mukhopadhyay, P. A polynomial time and space heuristic algorithm for T-count. Quantum Sci. Technol. 7, 015003 (2021).
28. Gheorghiu, V., Mosca, M. & Mukhopadhyay, P. A (quasi-)polynomial time heuristic algorithm for synthesizing T-depth optimal circuits. NPJ Quantum Inf. 8, 110 (2022).
29. Gheorghiu, V., Mosca, M. & Mukhopadhyay, P. T-count and T-depth of any multi-qubit unitary. NPJ Quantum Inf. 8, 141 (2022).
30. Amy, M., Maslov, D. & Mosca, M. Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning. IEEE Trans. Computer-Aided Design Integr. Circuits Syst. 33, 1476–1489 (2014).
31. Duncan, R., Kissinger, A., Perdrix, S. & Van De Wetering, J. Graph-theoretic simplification of quantum circuits with the ZX-calculus. Quantum 4, 279 (2020).
32. Häner, T. & Soeken, M. Lowering the T-depth of quantum circuits by reducing the multiplicative depth of logic networks. Preprint at https://arXiv.org/quant-ph/2006.03845 (2020).
33. Patel, K. N., Markov, I. L. & Hayes, J. P. Optimal synthesis of linear reversible circuits. Quantum Inf. Comput. 8, 282–294 (2008).
34. Amy, M., Azimzadeh, P. & Mosca, M. On the controlled-NOT complexity of controlled-NOT–phase circuits. Quantum Sci. Technol. 4, 015002 (2018).
35. Gheorghiu, V., Jiaxin, H., Li, S. M., Mosca, M. & Mukhopadhyay, P. Reducing the CNOT count for Clifford+T circuits on NISQ architectures. IEEE Trans. Computer-Aided Design Integr. Circuits Syst. (2022).
36. Fowler, A. G., Stephens, A. M. & Groszkowski, P. High-threshold universal quantum computation on the surface code. Phys. Rev. A 80, 052312 (2009).
37. Aliferis, P., Gottesman, D. & Preskill, J. Quantum accuracy threshold for concatenated distance-3 codes. Quantum Inf. Comput. 6, 97–165 (2006).
38. Fowler, A. G., Mariantoni, M., Martinis, J. M. & Cleland, A. N. Surface codes: Towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012).
39. Bravyi, S. & Gosset, D. Improved classical simulation of quantum circuits dominated by Clifford gates. Phys. Rev. Lett. 116, 250501 (2016).
40. Bravyi, S., Smith, G. & Smolin, J. A. Trading classical and quantum computational resources. Phys. Rev. X 6, 021043 (2016).
41. Paetznick, A. & Reichardt, B. W. Universal fault-tolerant quantum computation with only transversal gates and error correction. Phys. Rev. Lett. 111, 090505 (2013).
42. Kitaev, A. Y. Fault-tolerant quantum computation by anyons. Ann. Phys. 303, 2–30 (2003).
43. Campbell, E. Random compiler for fast Hamiltonian simulation. Phys. Rev. Lett. 123, 070503 (2019).
44. Huggins, W. J. et al. Efficient and noise resilient measurements for quantum chemistry on near-term quantum computers. NPJ Quantum Inf. 7, 1–9 (2021).
45. Jordan, P. & Wigner, E. P. in The Collected Works of Eugene Paul Wigner 109–129 (Springer, 1993).
46. Bravyi, S. B. & Kitaev, A. Y. Fermionic quantum computation. Ann. Phys. 298, 210–226 (2002).
47. Jones, C. Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87, 022328 (2013).
48. Gidney, C. Halving the cost of quantum addition. Quantum 2, 74 (2018).
49. He, Y., Luo, M. X., Zhang, E., Wang, H. K. & Wang, X. F. Decompositions of n-qubit Toffoli gates with linear circuit complexity. Int. J. Theor. Phys. 56, 2350–2361 (2017).
50. Reiher, M., Wiebe, N., Svore, K. M., Wecker, D. & Troyer, M. Elucidating reaction mechanisms on quantum computers. Proc. Natl. Acad. Sci. 114, 7555–7560 (2017).
51. van den Berg, E. & Temme, K. Circuit optimization of Hamiltonian simulation by simultaneous diagonalization of Pauli clusters. Quantum 4, 322 (2020).
52. Kawase, Y. & Fujii, K. Fast classical simulation of Hamiltonian dynamics by simultaneous diagonalization using Clifford transformation with parallel computation. Comput. Phys. Commun. (2023).
53. Whitfield, J. D., Biamonte, J. & Aspuru-Guzik, A. Simulation of electronic structure Hamiltonians using quantum computers. Mol. Phys. 109, 735–750 (2011).
54. Barkoutsos, P. K. L. et al. Quantum algorithms for electronic structure calculations: Particle-hole Hamiltonian and optimized wave-function expansions. Phys. Rev. A 98, 022322 (2018).
55. Kivlichan, I. D. et al. Quantum simulation of electronic structure with linear depth and connectivity. Phys. Rev. Lett. 120, 110501 (2018).
56. Ganzhorn, M. et al. Gate-efficient simulation of molecular eigenstates on a quantum computer. Phys. Rev. Appl. 11, 044092 (2019).
57. Gard, B. T. et al. Efficient symmetry-preserving state preparation circuits for the variational quantum eigensolver algorithm. NPJ Quantum Inf. 6, 1–9 (2020).
58. Lepage, H. V., Lasek, A. A., Arvidsson-Shukur, D. R. M. & Barnes, C. H. W. Entanglement generation via power-of-SWAP operations between dynamic electron-spin qubits. Phys. Rev. A 101, 022329 (2020).
59. Yordanov, Y. S., Arvidsson-Shukur, D. R. M. & Barnes, C. H. W. Efficient quantum circuits for quantum computational chemistry. Phys. Rev. A 102, 062612 (2020).
60. Gulania, S., Peng, B., Alexeev, Y. & Govind, N. Quantum time dynamics of 1D-Heisenberg models employing the Yang-Baxter equation for circuit compression. Preprint at https://arXiv.org/quant-ph/2112.01690 (2021).
61. Childs, A. M., Su, Y., Tran, M. C., Wiebe, N. & Zhu, S. Theory of Trotter error with commutator scaling. Phys. Rev. X 11, 011020 (2021).
62. Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2010).
63. Berry, D. W., Ahokas, G., Cleve, R. & Sanders, B. C. Efficient quantum algorithms for simulating sparse Hamiltonians. Commun. Math. Phys. 270, 359–371 (2007).
64. Szabo, A. & Ostlund, N. S. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory (Courier Corporation, 2012).
65. Helgaker, T., Jorgensen, P. & Olsen, J. Molecular Electronic-structure Theory (John Wiley & Sons, 2014).
66. Feller, D. The role of databases in support of computational chemistry calculations. J. Comput. Chem. 17, 1571–1586 (1996).
67. Schuchardt, K. L. et al. Basis set exchange: a community database for computational sciences. J. Chem. Inf. Model 47, 1045–1052 (2007).
68. Verstraete, F., Cirac, J. I. & Latorre, J. I. Quantum circuits for strongly correlated quantum systems. Phys. Rev. A 79, 032316 (2009).
69. Barenco, A. et al. Elementary gates for quantum computation. Phys. Rev. A 52, 3457 (1995).
70. da Silva, A. J. & Park, D. K. Linear-depth quantum circuits for multi-qubit controlled gates. Phys. Rev. A 106, 042602 (2022).
71. Karnaugh, M. The map method for synthesis of combinational logic circuits. Trans. Am. Inst. Elect. Eng. I: Commun. Electron. 72, 593–599 (1953).
72. Brayton, R. K., Hachtel, G. D., Hemachandra, L. A., Newton, A. R. & Sangiovanni-Vincentelli, A. L. M. A comparison of logic minimization strategies using espresso: An APL program package for partitioned logic minimization. In: Proc. Int. Symposium on Circuits Systems 42–48 (IEEE, 1982).
73. Fazekas, P. Lecture Notes on Electron Correlation and Magnetism Vol. 5 (World Scientific, 1999).
74. de PR Moreira, I. & Illas, F. A unified view of the theoretical description of magnetic coupling in molecular chemistry and solid state physics. Phys. Chem. Chem. Phys. 8, 1645–1659 (2006).
75. Skomski, R. Simple Models of Magnetism (Oxford University Press, 2008).
76. Pires, A. S. T. & Sergio, A. Theoretical Tools for Spin Models in Magnetic Systems (IOP Publishing Bristol, 2021).

Acknowledgements

P.M. wishes to thank NTT Research for their financial and technical support. This work was supported in part by Canada’s NSERC. Research at IQC is supported in part by the Government of Canada through Innovation, Science and Economic Development Canada. This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada Calcul Canada (www.computecanada.ca). N.W. acknowledges funding from the Google Quantum Research Program and the Natural Sciences and Engineering Research Council of Canada; his work on this project was supported by the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, Co-Design Center for Quantum Advantage under contract number DE-SC0012704.

Author information

Authors and Affiliations

Institute for Quantum Computing, University of Waterloo, Waterloo, ON, Canada
Priyanka Mukhopadhyay
Department of Combinatorics and Optimization, University of Waterloo, Waterloo, ON, Canada
Priyanka Mukhopadhyay
Department of Computer Science, University of Toronto, Toronto, ON, Canada
Nathan Wiebe
Pacific Northwest National Laboratory, Richland, WA, USA
Nathan Wiebe
Department of Mathematics, University of Toronto, Toronto, ON, Canada
Hong Tao Zhang

Contributions

The ideas were conceived by P.M. and N.W. The implementations were done by H.T.Z. All the authors contributed to the preparation of the manuscript.

Corresponding author

Correspondence to Priyanka Mukhopadhyay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Mukhopadhyay, P., Wiebe, N. & Zhang, H.T. Synthesizing efficient circuits for Hamiltonian simulation. npj Quantum Inf 9, 31 (2023). https://doi.org/10.1038/s41534-023-00697-6

Received: 08 September 2022
Accepted: 10 March 2023
Published: 03 April 2023
DOI: https://doi.org/10.1038/s41534-023-00697-6
