The electronic structure (ES) problem, namely, solving for the ground- or low-lying eigenstates of the Schrödinger equation for atoms, molecules, and materials, is an important problem in theoretical chemistry and physics. There are several approaches to solving this problem on a quantum computer, including projecting approximate solutions to eigenstates using phase estimation1,2,3, directly preparing eigenstates using the adiabatic algorithm4,5,6, or using quantum variational algorithms7,8 to optimize parameterized circuits corresponding to unitary Coupled Cluster (uCC)9,10,11 or approximate adiabatic state preparation12,13.

Time evolution, under the Hamiltonian or the uCC cluster operator, is a common component in these algorithms. For near-term quantum devices (especially those with limited connectivity), Trotter-Suzuki based methods for time evolution, which break the evolution of a complex sum of operators such as the Hamiltonian into a sequence of Trotter steps each evolving only a single term in the sum, are most compelling since they lack the complex controlled operations required by asymptotically more precise methods14,15,16. In order to perform a discrete simulation, the Hamiltonian or cluster operator is first represented in a single-particle basis of dimension N. However, in many bases, including the molecular orbital and active space bases common in ES, the Hamiltonian and cluster operator contain \({\mathcal{O}}({N}^{4})\) second-quantized terms. This leads to at least \({\mathcal{O}}({N}^{4})\) gate complexity for a single Trotter step17,18, a formidable barrier to practical progress. While complexity can be reduced using alternative bases19,20, such representations are not usually as compact as the molecular orbital one (i.e. a longer basis expansion is required to represent typical physical states of interest). Thus, reducing the cost of the Trotter step for general bases is an important goal, particularly within the context of near-term simulation paradigms.

In this article, we introduce a method to rewrite the Trotter step of a general quantum chemistry Hamiltonian evolution, as well as other exponentials of quartic fermionic operators such as the uCC operator, solely in terms of unitary single-particle rotations and Trotter steps of two-body operators. This allows for the efficient implementation of Trotter steps on a linearly connected quantum device within the Jordan-Wigner fermionic encoding20,21. It further allows for a systematic low-rank truncation (whose applicability derives from sparsity in the number of terms in the sum) which can be executed in a gate-count efficient manner by the above class of circuits. The procedure starts from a nested matrix factorization of the four-body two-electron interaction term, related to the one described in the Hamiltonian evolution context by Poulin et al.22. A key additional idea is that the nested matrix factorization exposes a low-rank structure when the interaction term is a physical operator. This was observed empirically in the classical electronic structure context for the Hamiltonian by Peng and Kowalski in ref. 23 (although that work obtained an incorrect empirical scaling) and studied more deeply by some of us in ref. 24, where we presented mathematical evidence for the correct scaling. Here, we demonstrate that the low-rank structure allows one to perform truncations that significantly reduce the gate complexity of the Trotter step for the Hamiltonian operator, as well as for the unitary cluster operator. In particular, we achieve a Hamiltonian Trotter step with an asymptotic gate complexity scaling as \({\mathcal{O}}({N}^{2})\) with system size, and \({\mathcal{O}}({N}^{3})\) for fixed systems and increasing basis size. These scalings require only linear nearest-neighbor connectivity.
We give numerical evidence that we can carry out a Hamiltonian Trotter step on a 50 qubit quantum chemical problem with as few as 4000 layers of two-qubit gates on a linear nearest-neighbor architecture, a viable target for implementation on near-term quantum devices. Compiled to Clifford gates and single-qubit rotations, this requires fewer than \({10}^{5}\) non-Clifford rotations, an improvement of orders of magnitude over past Trotter-based methods in a fault-tolerant cost model25.


Double-decomposition structure

We first define the Hamiltonian H and cluster operator τ. In second quantization H is

$$H=\mathop{\sum }\limits_{pq=1}^{N}{h}_{pq}{a}_{p}^{\dagger }{a}_{q}+\frac{1}{2}\mathop{\sum }\limits_{pqrs=1}^{N}{h}_{pqrs}{a}_{p}^{\dagger }{a}_{q}^{\dagger }{a}_{r}{a}_{s}\equiv h+V\ \ ,$$

where \({a}_{p}^{\dagger }\) and \({a}_{p}\) are fermionic creation and annihilation operators for spin orbital \({\phi }_{p}\), and the scalar coefficients \({h}_{pq}\) and \({h}_{pqrs}\) are the one- and two-electron integrals over the basis functions \({\phi }_{p}\) (here assumed real).

The uCC cluster operator is \(\tau =T-{T}^{\dagger }\), where T is the standard (non-unitary) coupled cluster (CC) operator. For uCCSD (uCC with single and double excitations applied to a single determinant reference),

$$\begin{array}{lll}\tau \,=\,\mathop{\sum }\limits_{i=1}^{{N}_{\text{o}}}\mathop{\sum }\limits_{a={N}_{\text{o}}+1}^{N}{t}_{ai}({a}_{a}^{\dagger }{a}_{i}-{a}_{i}^{\dagger }{a}_{a})\\ \qquad\,+\,\frac{1}{4}\mathop{\sum }\limits_{ij=1}^{{N}_{\text{o}}}\mathop{\sum }\limits_{ab={N}_{\text{o}}+1}^{N}{t}_{abij}({a}_{a}^{\dagger }{a}_{b}^{\dagger }{a}_{i}{a}_{j}-{a}_{i}^{\dagger }{a}_{j}^{\dagger }{a}_{a}{a}_{b})\\ \quad\equiv \,\mathop{\sum }\limits_{pq=1}^{N}{t}_{pq}^{\prime}{a}_{p}^{\dagger }{a}_{q}+\frac{1}{4}\mathop{\sum }\limits_{pqrs=1}^{N}{t}_{pqrs}^{\prime}{a}_{p}^{\dagger }{a}_{q}^{\dagger }{a}_{r}{a}_{s}\quad ,\end{array}$$

where ij, ab index the \({N}_{\text{o}}\) occupied and \({N}_{\text{v}}\) virtual spin orbitals respectively, and \({N}_{\text{o}}+{N}_{\text{v}}=N\). For the scaling arguments with system size, we assume \({N}_{\text{o}},{N}_{\text{v}}\propto N\), while increasing basis size corresponds to increasing \({N}_{\text{v}}\) only. Both H and τ contain \({\mathcal{O}}({N}^{4})\) second-quantized terms. Thus, for arbitrary Hamiltonian integrals or cluster amplitudes, regardless of the gate decomposition or fermion encoding used, implementing the time-evolution Trotter step requires at least \({\mathcal{O}}({N}^{4})\) gates.

The integrals and cluster amplitudes that one encounters in molecular ES applications, however, are not arbitrary, but contain considerable structure. We now show that this allows us to construct approximate operators \(H^{\prime}\) or \(\tau ^{\prime}\), accurate to within a desired tolerance ε, that can be implemented with greatly reduced gate counts. The physical basis for this result is the pairwise nature of the Hamiltonian interactions, arising from the \(1/{r}_{12}\) Coulomb kernel in real space. More precisely, we will rewrite the two-fermion parts of H and iτ associated with the integrals \({h}_{pqrs}\) and \({t}_{pqrs}^{\prime}\) in a double-factorized form

$$\mathop{\sum }\limits_{pq=1}^{N}{S}_{pq}{a}_{p}^{\dagger }{a}_{q}+\mathop{\sum }\limits_{\ell =1}^{L}\mathop{\sum }\limits_{ij=1}^{{\rho }_{\ell }}\frac{{\lambda }_{i}^{(\ell )}{\lambda }_{j}^{(\ell )}}{2}{n}_{i}^{(\ell )}{n}_{j}^{(\ell )}\equiv S+\mathop{\sum }\limits_{\ell =1}^{L}{V}^{(\ell )}\ ,$$

where, defining \({\psi }_{i}^{(\ell )}=\mathop{\sum }\nolimits_{p = 1}^{N}{U}_{pi}^{(\ell )}{\phi }_{p}\),

$${n}_{i}^{(\ell )}=\mathop{\sum }\limits_{ps=1}^{N}{U}_{pi}^{(\ell )}{a}_{p}^{\dagger }{a}_{s}{U}_{si}^{(\ell )}={a}_{{\psi }_{i}^{(\ell )}}^{\dagger }{a}_{{\psi }_{i}^{(\ell )}}^{}$$

are number operators in a rotated basis. Approximate \(H^{\prime}\) and \(\tau ^{\prime}\) with reduced complexity can then be obtained by truncating the summations over L and \({\rho }_{\ell }\). The dependence of the error ε on L and \({\rho }_{\ell }\) is discussed further below.
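As a small numerical check of the statement above, the following sketch builds the rotated number operators explicitly for a few qubits using dense Jordan-Wigner matrices and confirms that they behave like mutually commuting number operators. The function names and the random orthogonal matrix `U` are illustrative assumptions, not part of the original work, and the dense representation is only practical for very small N.

```python
import numpy as np

def jordan_wigner_annihilators(n):
    """Dense Jordan-Wigner matrices a_0 .. a_{n-1} on n qubits (illustrative helper)."""
    identity = np.eye(2)
    z = np.diag([1.0, -1.0])
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps |1> to |0>
    ops = []
    for p in range(n):
        factors = [z] * p + [lower] + [identity] * (n - p - 1)
        op = factors[0]
        for f in factors[1:]:
            op = np.kron(op, f)
        ops.append(op)
    return ops

def rotated_number_operators(U):
    """n_i = sum_{ps} U_pi a_p^dagger a_s U_si for a real orthogonal matrix U."""
    n = U.shape[0]
    a = jordan_wigner_annihilators(n)
    rotated = []
    for i in range(U.shape[1]):
        a_psi = sum(U[p, i] * a[p] for p in range(n))
        # All matrices are real here, so the dagger is a plain transpose.
        rotated.append(a_psi.T @ a_psi)
    return rotated

# Hypothetical example: a random orthogonal rotation on 3 spin orbitals.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
n_ops = rotated_number_operators(U)
```

Each `n_ops[i]` should be a projector, and any two of them should commute, which is what allows the pairwise terms to be evolved as diagonal interactions after the basis change.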

The doubly-decomposed form of V can be obtained using a nested matrix factorization, a type of tensor factorization introduced in ref. 23. We illustrate this for the Hamiltonian operator. First, the creation and annihilation operators are reordered,

$$V=\frac{1}{2}\mathop{\sum }\limits_{pqrs=1}^{N}{h}_{ps,qr}({a}_{p}^{\dagger }{a}_{s}{a}_{q}^{\dagger }{a}_{r}-{a}_{p}^{\dagger }{a}_{r}{\delta }_{qs})=V^{\prime} +S\ ,$$

and \(V^{\prime}\) is recast into a supermatrix indexed by orbitals (ps), (qr) involving electrons 1,2 respectively. Due to the eight-fold symmetry \({h}_{pqrs}={h}_{srqp}={h}_{pqsr}={h}_{qprs}={h}_{qpsr}={h}_{rsqp}={h}_{rspq}={h}_{srpq}\) this matrix is real symmetric, thus we can write a matrix decomposition in terms of a rank-three auxiliary tensor \({\mathcal{L}}\) such that

$$V^{\prime} =\frac{1}{2}\mathop{\sum }\limits_{\ell =1}^{L}{\left({{\mathcal{L}}}^{(\ell )}\right)}^{2}=\frac{1}{2}\mathop{\sum }\limits_{\ell =1}^{L}\mathop{\sum }\limits_{pqrs=1}^{N}{{\mathcal{L}}}_{ps}^{(\ell )}{{\mathcal{L}}}_{qr}^{(\ell )}{a}_{p}^{\dagger }{a}_{s}{a}_{q}^{\dagger }{a}_{r}.$$

A simple way to obtain \({\mathcal{L}}\) is to diagonalize the \(V^{\prime}\) supermatrix, although other techniques26,27,28,29,30,31,32, such as Cholesky decomposition (CD), are also commonly used; we use the Cholesky decomposition in our numerical simulations below. (Note that the positive nature of the Coulomb kernel means that \(V^{\prime}\) is positive also, as is used in the Cholesky decomposition). The next step is to decompose each auxiliary matrix \({{\mathcal{L}}}^{(\ell )}\). Again for the Hamiltonian, this is also real symmetric, thus we can similarly diagonalize it,

$$\mathop{\sum }\limits_{ps=1}^{N}{{\mathcal{L}}}_{ps}^{(\ell )}{a}_{p}^{\dagger }{a}_{s}=\mathop{\sum }\limits_{ps=1}^{N}\mathop{\sum }\limits_{i=1}^{{\rho }_{\ell }}{U}_{pi}^{(\ell )}{\lambda }_{i}^{(\ell )}{U}_{si}^{(\ell )}{a}_{p}^{\dagger }{a}_{s}\quad ,$$

where \({\lambda }^{(\ell )}\), \({U}^{(\ell )}\) are the eigenvalues and eigenvectors of \({{\mathcal{L}}}^{(\ell )}\). Combining the two eigenvalue decompositions yields the double-factorized result, Eq. (3).
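The two nested diagonalizations can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it uses a plain eigendecomposition of the supermatrix (the "simple way" mentioned above) rather than a Cholesky decomposition, it assumes the integrals are stored as an array `h[p, s, q, r]` holding \({h}_{ps,qr}\), and the test tensor is synthetic rather than a molecular Hamiltonian.

```python
import numpy as np

def double_factorize(h, tol=1e-10):
    """Return a list of (eigenvalues, eigenvectors) pairs, one per auxiliary matrix."""
    n = h.shape[0]
    # First factorization: diagonalize the (ps),(qr) supermatrix.
    w, v = np.linalg.eigh(h.reshape(n * n, n * n))
    factors = []
    for w_l, v_l in zip(w, v.T):
        if w_l <= tol:
            continue  # discard numerically zero (or negative) directions
        aux = np.sqrt(w_l) * v_l.reshape(n, n)
        aux = 0.5 * (aux + aux.T)  # remove round-off asymmetry
        # Second factorization: diagonalize each auxiliary matrix.
        lam, U = np.linalg.eigh(aux)
        factors.append((lam, U))
    return factors

def rebuild(factors, n):
    """Reassemble h_{ps,qr} = sum_l L^(l)_ps L^(l)_qr from the factors."""
    h = np.zeros((n, n, n, n))
    for lam, U in factors:
        aux = (U * lam) @ U.T
        h += np.einsum("ps,qr->psqr", aux, aux)
    return h

# Synthetic test tensor with the eight-fold symmetry, built from 3 symmetric matrices.
rng = np.random.default_rng(1)
n = 4
h = np.zeros((n, n, n, n))
for _ in range(3):
    A = rng.standard_normal((n, n))
    A = A + A.T
    h += np.einsum("ps,qr->psqr", A, A)

factors = double_factorize(h)
```

Here `len(factors)` plays the role of L and each `lam` holds the \({\lambda }_{i}^{(\ell )}\); for the synthetic tensor only three auxiliary matrices survive, although the supermatrix has dimension 16.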

In the cluster operator, the amplitudes t have four-fold mixed symmetry and antisymmetry, \({t}_{abij}={t}_{jiba}=-{t}_{baij}=-{t}_{abji}\). Because the cluster operator is therefore not simply symmetric, we cannot use a Cholesky decomposition and must modify the arguments above. However, as shown in the Supplementary Discussion, using a singular value decomposition of the cluster operator, we can write \(i\tau ={\sum }_{\ell \mu }{Y}_{\ell ,\mu }^{2}\), where the \({Y}_{\ell ,\mu }\) are normal and can be diagonalized, giving the same double-factorized form.

Accuracy of low-rank approximations

For an exact decomposition of H, \(L={N}^{2}\) and \({\rho }_{\ell }=N\). However, it is well established from empirical ES calculations that the ranks L and \({\rho }_{\ell }\) can be significantly reduced if we approximate H by truncating small terms. The low-rank truncations are performed as follows: in the case of L, we truncate the CD in the AO basis or localized MO basis based on the \({L}^{\infty }\) norm, i.e. use the smallest L such that \({\max }_{psqr}| {h}_{ps,qr}-\mathop{\sum }\nolimits_{\ell = 1}^{L}{{\mathcal{L}}}_{ps}^{(\ell )}{{\mathcal{L}}}_{qr}^{(\ell )}| \,<\,{\varepsilon }_{\text{CD}}\). The computational cost of the modified Cholesky decomposition scheme is known to scale asymptotically like \({\mathcal{O}}({N}^{3})\) within the AO basis for a fixed error threshold33. For \({\rho }_{\ell }\), we perform an eigenvalue truncation (ET) based on the \({L}^{1}\) norm, i.e. use the smallest \({\rho }_{\ell }\) such that \(\mathop{\sum }\nolimits_{j = {\rho }_{\ell }+1}^{N}| {\lambda }_{j}^{(\ell )}|\, <\,{\varepsilon }_{\text{ET}}\). Note that for H, εCD and εET have dimensions of energy and square root of energy, respectively. For simplicity, we have chosen εCD = εET ≡ ε in atomic units. For this truncation of H it has been shown that when increasing the molecular size or simulation basis, \(L \sim {\mathcal{O}}(N)\), while \(\langle {\rho }_{\ell }\rangle =\frac{1}{L}\mathop{\sum }\nolimits_{\ell = 1}^{L}{\rho }_{\ell } \sim {\mathcal{O}}(1)\) for increasing molecular size in the asymptotic limit24.
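The two truncation criteria can be illustrated with a short sketch. The pivoted Cholesky routine below is a generic textbook variant, assumed here for illustration and not taken from the packages used in this work; it stops once the largest residual diagonal falls below the threshold, which for a positive semidefinite residual also bounds the largest residual element. The eigenvalue truncation keeps the smallest set of eigenvalues whose discarded tail has absolute sum below the threshold. All function names and test data are hypothetical.

```python
import numpy as np

def pivoted_cholesky(M, eps):
    """Greedy Cholesky vectors of a PSD matrix M until max residual diagonal < eps."""
    residual = np.array(M, dtype=float)
    vectors = []
    for _ in range(residual.shape[0]):
        diag = np.diag(residual)
        k = int(np.argmax(diag))
        if diag[k] < eps:
            break
        v = residual[:, k] / np.sqrt(diag[k])
        vectors.append(v)
        residual = residual - np.outer(v, v)
    return np.array(vectors)

def eigenvalue_truncation(lam, eps):
    """Indices of the smallest set of eigenvalues whose discarded tail sums (in abs) to < eps."""
    lam = np.asarray(lam, dtype=float)
    order = np.argsort(-np.abs(lam))  # largest magnitude first
    mags = np.abs(lam[order])
    for rho in range(len(lam) + 1):
        if mags[rho:].sum() < eps:
            return order[:rho]
    return order

# Synthetic rank-3 PSD matrix standing in for the (ps),(qr) supermatrix.
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 3))
M = B @ B.T
vecs = pivoted_cholesky(M, eps=1e-8)
max_err = np.max(np.abs(M - vecs.T @ vecs))

# Synthetic eigenvalues standing in for the spectrum of one auxiliary matrix.
kept = eigenvalue_truncation([3.0, -0.004, 0.5, 0.003], eps=0.01)
```

In this toy case three Cholesky vectors reproduce `M` to within the threshold, and the eigenvalue truncation keeps only the two large eigenvalues, since the discarded tail sums to 0.007.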

For the uCC operator, the amplitudes t can be cast in a supermatrix \({t}_{ai,bj}\) that is symmetric, \({t}_{ai,bj}={t}_{bj,ai}\), but not positive. Therefore, we substitute the Cholesky decomposition with a singular value decomposition, obtaining \(L \sim {\mathcal{O}}(N)\) with increasing basis size but \(L \sim {\mathcal{O}}({N}^{2})\) with increasing molecular size (albeit with a small coefficient); the scaling properties of \({\rho }_{\ell ,\mu }\) have not previously been studied. Note that, for the uCC operator, the truncations εSVD, εET are dimensionless, unlike for H.

In Fig. 1 we show L and 〈ρ〉 for different truncation thresholds in: (set 1) a variety of molecules that can be represented with a modest number of qubits (CH4, H2O, CO2, NH3, H2CO, H2S, F2, BeH2, HCl) using STO-6G, cc-pVDZ, 6-31G*, cc-pVTZ bases; (set 2) alkane chains CnH2n+2, n ≤ 8, using the STO-6G basis; (set 3) Fe–S clusters ([2Fe–2S], [4Fe–4S], and the nitrogenase PN cluster) in active spaces with N = 40, 72, 146 respectively. Further details of the calculations are given in the Methods section.

Fig. 1: Truncation results for H and τ.

a Linear scaling of the number L of vectors with basis size N, in the low-rank approximation of H. b Sublinear scaling of the average eigenvalue number 〈ρ〉. c Error \(| {E}_{\,\text{c}}^{\prime}-{E}_{\text{c}}|\) in the ground-state correlation energy from the low-rank approximation of H, compared with chemical accuracy (horizontal black line). Data points in all of the main figures comprise small molecules with fixed size and increasingly large basis (set 1); insets show alkane chains with up to 8 C atoms (set 2) and iron–sulfur clusters of nitrogenase (set 3). Lines indicate N (left, middle) and the chemical accuracy (right). d–f Same as the upper panels, for the uCC operator τ, with 〈ρ〉 averaged over μ. The symbol ε indicates ε = εCD = εTH.

We first give some context to these sets for quantum simulation. The valence active space models considered in the Fe–S systems in set 3 are representative of a nearer-term quantum application where not all degrees of freedom are treated on a quantum computer. In this set, the underlying Gaussian basis dependence is largely removed by the reduction to an active space, as such calculations converge exponentially quickly with basis size34. All-electron simulations of molecules of the kind in sets 1 and 2 may be considered in the context of quantum resources available in the longer term. While we have considered only a representative set of systems, additional intuition for the Hamiltonian ranks in these classes of molecules can be obtained from quantum Monte Carlo calculations, which work well for sets 1 and 2, and where a similar decomposition has been applied24.

For the uCC operator, in order to treat sufficiently large systems to observe the scaling trends, we have used the (classically computable) traditional CC amplitudes, equal to the uCC amplitudes in the weak-coupling limit (they agree through third order in perturbation theory). The uCC and traditional CC amplitudes are thus similar for all molecules in sets 1 and 2 near their equilibrium geometries, and molecules in set 3 in the highest spin electronic state.

For Hamiltonian evolution, we clearly see the \(L \sim N\) scaling across different truncation thresholds, for both increasing system size and basis. For τ, \(L \sim N\) with increasing basis in a fixed molecule, while \(L \sim {N}^{2}\) with increasing size (e.g. in alkane chains). Interestingly, the value of L in the Hamiltonian decomposition is quite similar across different molecules for the same number of spin orbitals (qubits). In the subsequent ET for the Hamiltonian, 〈ρ〉 features sublinear scaling for set 2 (alkanes, n ≥ 5, represented here with 75–125 qubits), as well as for set 3 (Fe–S clusters). While we have shown in 1D systems that \(\rho \sim {\mathcal{O}}(1)\) rigorously in the asymptotic size regime, these systems are still too small to see this saturation, although the practical reduction in ρ from full rank in sets 2 and 3 is significant. For the uCC operator, we observe that 〈ρ〉 scales like \({\mathcal{O}}(N)\) for alkane chains and increasing molecular size, while it is approximately constant for increasing basis set size. The less favorable scaling of L, \(\langle {\rho }_{\ell ,\mu }\rangle\) with system size for the uCC operator, relative to H, stems from the antisymmetry properties of the amplitudes, which in the current factorization means that the \({Y}_{\ell ,\mu }\) do not show the same sparsity as \({{\mathcal{L}}}^{(\ell )}\).

First-order correction

The error arising from the truncations leading to \(H^{\prime}\) and \(\tau ^{\prime}\) can be understood in terms of two components: (i) the error in the operators, and (ii) the error in the states generated by time evolution with these operators. It is possible to substantially reduce both errors using quantities that can be computed classically. We illustrate this for the error in \(H^{\prime}\). First, the correlation energy, defined as \({E}_{\text{c}}=E-{E}_{\text{HF}}\) with E the total energy and \({E}_{\text{HF}}\) the Hartree–Fock energy, is usually a much smaller quantity than the total energy in chemical systems. It is expected to be subject to much smaller truncation errors, mainly due to cancellation of errors between exact and mean-field truncations. Thus, using the classically computed mean-field energy of \(H^{\prime}\), we can obtain the truncated correlation energy as \({E}_{\text{c}}^{\prime} =E^{\prime} -{E}_{\text{HF}}^{\prime}\). Second, we can estimate the remaining error in \({E}_{c}^{\prime}\) from first-order perturbation theory as \(\langle \psi | H-H^{\prime} | \psi \rangle\), where a classical approximation to ψ is used. If the classical ψ is accurate, the corrected \({E}_{c}^{\prime}\) is then accurate to \({\mathcal{O}}({\varepsilon }^{2})\). In Fig. 2 we plot \(| {E}_{\,\text{c}}^{\prime}-{E}_{\text{c}}|\) for H2O at the cc-pVDZ level. Adding the perturbative correction from the classical CC ground-state reduces the error by about an order of magnitude, such that even an aggressive truncation threshold of εCD = εTH ≡ ε = \({10}^{-2}\) a.u. yields the total correlation energy within the standard chemical accuracy of \(1.6\times {10}^{-3}\) a.u.
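The effect of the first-order correction can be seen in a toy model. The sketch below is only an illustration under simplifying assumptions: `H` and `W` are hypothetical 2×2 matrices, the truncated operator is modeled as `H + eps * W`, and ψ is taken to be the exact ground state of the truncated operator rather than a classical CC approximation as in the text.

```python
import numpy as np

def ground_state(H):
    """Lowest eigenvalue and eigenvector of a real symmetric matrix."""
    w, v = np.linalg.eigh(H)
    return w[0], v[:, 0]

def corrected_energy(H, H_trunc):
    """Truncated ground-state energy and its first-order corrected value."""
    e_trunc, psi = ground_state(H_trunc)
    # First-order perturbative correction <psi| H - H' |psi>.
    return e_trunc, e_trunc + psi @ (H - H_trunc) @ psi

# Hypothetical toy operators, not derived from any molecule.
H = np.array([[-1.0, 0.0], [0.0, 1.0]])
W = np.array([[1.0, 1.0], [1.0, 0.0]])
eps = 1e-2

e_exact, _ = ground_state(H)
e_trunc, e_corr = corrected_energy(H, H + eps * W)
raw_err = abs(e_trunc - e_exact)    # scales linearly with eps
corr_err = abs(e_corr - e_exact)    # scales quadratically with eps
```

In this toy case the uncorrected error is close to `eps`, while the corrected error is smaller by more than an order of magnitude, in line with the \({\mathcal{O}}({\varepsilon }^{2})\) behavior described above.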

Fig. 2: Error \(| {E}_{\,\text{c}}^{\prime}-{E}_{\text{c}}|\) for H2O (cc-pVDZ) as a function of ε = εTH = εCD, with and without perturbative correction (blue, orange points respectively) for CD and CD+ET truncation schemes (crosses, diamonds), measured using HF and CC wavefunctions.

The error increases with ε, and is visibly smaller with perturbative correction.

For the \({\tau }^{\prime}\) truncation, one could include a similar error correction for the correlation energy derived from approximate cluster amplitudes (although we do not do so here). Note that we have only considered taking a given amplitude \(\tau ^{\prime}\) and the error from implementing the corresponding uCC operator with truncation. However, there is the additional possibility of optimizing the amplitude \({\tau }^{\prime}\) within the truncated form in a variational uCC approach. In this case, the stationary condition can formally be obtained by differentiating through the ansatz. While we reserve a detailed error analysis in this setting for future work, so long as one is close to the variational minimum, it is clear that the resulting error in the variational energy remains quadratic with respect to small truncations of \({\tau }^{\prime}\).

Gate counts for quantum computers

The double-factorized decomposition Eq. (3) provides a simple circuit implementation of the Trotter step. For example, for the Hamiltonian Trotter step, we write

$${e}^{i{{\Delta }}tH}={e}^{i{{\Delta }}t(h+S)}{U}^{(1)}\mathop{\prod }\limits_{\ell =1}^{L}{e}^{i{{\Delta }}t{V}^{(\ell )}}{\tilde{U}}^{(\ell )}+{\mathcal{O}}{({{\Delta }}t)}^{2}\ \ ,$$

where \({\tilde{U}}^{(\ell )}={U}^{\dagger (\ell )}{U}^{(\ell +1)}\). Time evolution then corresponds to (single-particle) basis rotations with evolution under the single-particle operator h + S and pairwise operators \({V}^{(\ell )}\). Note that because h + S is a one-body operator, it can be implemented exactly (without Trotter approximation) using a single-particle basis change \({U}^{(0)}\) followed by a layer of N phase gates (the latter being a simultaneous application of one-qubit gates to all qubits). The single-particle basis changes \({U}^{(\ell )}\) can be implemented using \(\left(\begin{array}{c} {N} \\ {2}\end{array} \right) - \left(\begin{array}{c}{N - \rho_\ell}\\{2}\end{array}\right)\) Givens rotations35 (detailed in the Supplementary Discussion). These rotations can be implemented efficiently using two-qubit gates on a linearly connected architecture20,21.
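The \({\mathcal{O}}{({{\Delta }}t)}^{2}\) error of a first-order product formula like the one above can be checked numerically on a generic pair of non-commuting operators. The sketch below is an assumption-laden illustration: it uses two Pauli matrices in place of the actual terms of the Hamiltonian, and the helper names are hypothetical.

```python
import numpy as np

def expi(H, t):
    """exp(i t H) for a Hermitian matrix H, via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(1j * t * w)) @ v.conj().T

def trotter_error(A, B, dt):
    """Spectral-norm difference between exp(i dt (A+B)) and exp(i dt A) exp(i dt B)."""
    exact = expi(A + B, dt)
    split = expi(A, dt) @ expi(B, dt)
    return np.linalg.norm(exact - split, 2)

# Two non-commuting stand-in operators (Pauli X and Z).
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Doubling the time step should roughly quadruple the single-step error.
ratio = trotter_error(X, Z, 0.02) / trotter_error(X, Z, 0.01)
```

A ratio close to 4 reflects the quadratic dependence of the single-step error on the time step.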

Taking into account \({S}_{z}\) spin symmetry to implement basis rotations separately for spin-up and spin-down orbitals gives a count of \(2\left(\begin{array}{c}N/2\\ 2\end{array}\right) - 2\left(\begin{array}{c}(N-{\rho }_{\ell })/2\\ 2\end{array}\right)\) with a corresponding circuit depth (on a linear architecture) of \((N+{\rho }_{\ell })/2\). Using a fermionic swap network, a Trotter step corresponding to evolution under the pairwise operator \({V}^{(\ell )}\) can be implemented in \(\left(\begin{array}{c}{\rho }_{\ell }\\ 2\end{array}\right)\) linear nearest-neighbor two-qubit gates, with a two-qubit gate depth of exactly \({\rho }_{\ell }\).

Summing these terms, the gate counts are \(\left(\begin{array}{c}N\\ 2\end{array}\right)+{\sum }_{\ell \mu }\left[\left(\begin{array}{c}N\\ 2\end{array}\right)-\left(\begin{array}{c}N-{\rho }_{\ell ,\mu }\\ 2\end{array}\right)+\left(\begin{array}{c}{\rho }_{\ell ,\mu }\\ 2\end{array}\right)\right]\), where the μ subscript can be ignored when considering the Hamiltonian.
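The counting above is easy to evaluate for a given list of ranks. The following sketch simply transcribes the binomial expressions quoted in the text (without the spin-symmetry reduction), assuming that each term ℓ contributes one basis change and one pairwise evolution; the function names and the example ranks are hypothetical.

```python
from math import comb

def givens_count(n, rho):
    """Givens rotations for a basis change onto rho rotated orbitals out of n."""
    return comb(n, 2) - comb(n - rho, 2)

def pair_count(rho):
    """Two-qubit gates for evolving under a pairwise operator on rho orbitals."""
    return comb(rho, 2)

def trotter_step_count(n, rhos):
    """One-body basis change plus, for each term, a basis change and a pairwise evolution."""
    return comb(n, 2) + sum(givens_count(n, r) + pair_count(r) for r in rhos)

# Hypothetical example: N = 4 spin orbitals and two terms with ranks 4 and 2.
total = trotter_step_count(4, [4, 2])
```

For this toy input the one-body basis change costs 6 gates, the full-rank term 6 + 6, and the rank-2 term 5 + 1, showing how lower ranks reduce both the basis change and the pairwise evolution.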

To realize this algorithm on a near-term device, where the critical cost model is the number of two-qubit gates, one can implement the gates directly in hardware36, which requires \(\mathop{\sum }\nolimits_{\ell = 1}^{L}\left[\frac{N{\rho }_{\ell }}{4}+\frac{{\rho }_{\ell }^{2}}{4}-{\rho }_{\ell }\right]\) gates on a linear nearest-neighbor architecture, with circuit depth \(\mathop{\sum }\nolimits_{\ell = 1}^{L}\left[\frac{N}{2}+\frac{3{\rho }_{\ell }}{2}\right]\). Alternatively, if the gates are decomposed into a standard two-qubit gate set (e.g. CZ or CNOT), the gate count is three times the above count.

To realize this algorithm within an error-corrected code such as the surface code37, where the critical cost model is the number of T gates, one can decompose each Givens rotation gate into two arbitrary single-qubit rotations and each diagonal pair interaction into one arbitrary single-qubit rotation. Thus, the number of single-qubit rotations is \(\mathop{\sum }\nolimits_{\ell = 1}^{L}\left[\frac{N{\rho }_{\ell }}{2}-2{\rho }_{\ell }\right]\). Using standard synthesis techniques for single-qubit rotations, the number of T gates depends on the desired precision as \(1.15{{\mathrm{log}}\,}_{2}(1/{\varepsilon }_{\text{RS}})+9.2\) times this count38, where εRS is the tolerance of rotation synthesis. Note that while defining εRS is needed to obtain the final T gate count (see e.g. ref. 39), if we assume a given synthesis threshold, the relative cost of two algorithms in the error-corrected cost model is obtained simply by comparing the number of single-qubit rotations.
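As a quick arithmetic check of the synthesis overhead, the two expressions quoted above can be evaluated directly. This is only a transcription of the formulas in the text; the function names and the example inputs are hypothetical.

```python
import math

def t_gates_per_rotation(eps_rs):
    """T gates per single-qubit rotation: 1.15 * log2(1 / eps_RS) + 9.2."""
    return 1.15 * math.log2(1.0 / eps_rs) + 9.2

def rotation_count(n, rhos):
    """Single-qubit rotations: sum over terms of N * rho / 2 - 2 * rho."""
    return sum(n * rho / 2 - 2 * rho for rho in rhos)

# Synthesis tolerance of 1e-6 gives a little over 32 T gates per rotation.
per_rotation = t_gates_per_rotation(1e-6)

# Hypothetical example: N = 10 spin orbitals and a single term of rank 4.
example_rotations = rotation_count(10, [4])
```

Multiplying the rotation count by `per_rotation` then gives the T-gate estimate for a chosen synthesis tolerance.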

In Fig. 3 we show the total gate counts needed to carry out a Trotter step of \(H^{\prime}\) and \(\tau ^{\prime}\) with different truncation thresholds. Using the scaling estimates obtained above for L, ρ in the gate count expression, we expect the Hamiltonian Trotter step to have a gate count \({N}_{\text{gates}} \sim {\mathcal{O}}({N}^{2})\) for increasing molecular size, and \({\mathcal{O}}({N}^{3})\) for fixed molecular size and increasing basis size, and the uCC Trotter step to show \({N}_{\text{gates}} \sim {\mathcal{O}}({N}^{4})\) for increasing molecular size and \({\mathcal{O}}({N}^{3})\) with increasing basis size. These scalings are confirmed by the gate counts in Fig. 3. As seen, the crossover between \({N}^{3}\) and \({N}^{2}\) behavior of the Hamiltonian Trotter gate cost, for alkanes (set 2), occurs at larger N than one would expect from the 〈ρ〉 data alone from Fig. 1, due to tails in the distribution of ρ.

Fig. 3: Gate counts per Trotter step of the Hamiltonian and uCC operator, for εCD = εET = εSVD = \({10}^{-2}\), \({10}^{-3}\), and \({10}^{-4}\) (red, green, blue).

Gate counts per Trotter step of the (a) Hamiltonian and (b) uCC operator. Black lines indicate power-law fits, with optimal exponents 3.06(3) and 3.2(1), respectively. Both gate counts scale as the third power of the basis size N across a wide range of truncation thresholds, an improvement over the \({\mathcal{O}}({N}^{4})\) scaling assuming no truncation.

The threshold for classical-quantum crossover in recent quantum supremacy experiments (although dependent on the precise computational task) has been studied in detail at roughly 50 qubits, see for example ref. 40. For near-term devices, the number of layers of gates on a parallel architecture with restricted connectivity is often considered a good cost model. Using the circuit depth estimate \({\sum }_{\ell }(N+{\rho }_{\ell })\), we see that we can carry out a single Hamiltonian Trotter step on a system with 50 qubits with as few as 4000 layers of parallel gates on a linear architecture. Within cost models appropriate for error correction, the most relevant cost metric is the number of T gates25,41,42. For our algorithms, T gates enter through single-qubit rotations and thus, the number of non-Clifford single-qubit rotations is an important metric. Based on the gate count estimate for basis changes, the number of non-Clifford rotations required for our Trotter steps is roughly 100,000. For a fixed εRS = \({10}^{-6}\), the number of T gates obtained after rotation synthesis would then be approximately 30 times this number.


In summary, we have introduced a nested decomposition of the Hamiltonian and uCC operators, leading to substantially reduced gate complexity for the Trotter step both in realistic molecular simulations with under 100 qubits, and in the asymptotic regime. The discussed decomposition is by no means the only one possible and, for the uCC operator, it is non-optimal, as more efficient decompositions for antisymmetric quantities exist43. Future work to better understand the interplay between classical tensor decompositions and the components of quantum algorithms thus presents an exciting possibility for further improvements in practical quantum simulation algorithms.

Note: since the time this paper was first posted as a preprint, many other works have further applied the decomposition in this work, or very closely related decompositions. Some examples of such applications include the implementation of a cluster-Jastrow ansatz44; its use as a component in efficient Hamiltonian evolution in conjunction with other techniques and under different cost models45,46; and use of this form to reduce the cost of measurements47,48.


The Hartree–Fock calculations for the small molecules were performed using the chemistry package PySCF49, and calculations for the iron–sulfur clusters were performed using the density-matrix renormalization group (DMRG)50,51 implemented within PySCF as BLOCK.

Details of calculations

Here, we provide further details about the calculations yielding the data shown in the main text, focusing on each of the three studied sets (set 1, set 2, set 3).

Set 1—comprises 9 small molecules (namely CH4, H2O, CO2, NH3, H2CO, H2S, F2, BeH2, HCl), studied at experimental equilibrium geometries from ref. 52. Molecules in this set have been studied with restricted Hartree–Fock (RHF) and restricted classical coupled cluster with single and double excitations (RCCSD) on top of the RHF state. Matrix elements of the Hamiltonian and classical RCCSD amplitudes have been computed with the PySCF software49, using the STO-6G, 6-31G*, cc-pVDZ, cc-pVTZ bases.

Set 2—comprises alkane chains (namely ethane, propane, butane, pentane, hexane, heptane and octane, all described by the chemical formula CnH2n+2 with n = 2…8), studied at experimental equilibrium geometries from ref. 52. Molecules in this set have been studied with RHF, RCCSD methods. Matrix elements of the Hamiltonian and classical RCCSD amplitudes have been computed with the PySCF software49, using the STO-6G basis.

Set 3—comprises the Fe–S clusters [2Fe–2S] [2Fe(II)] and [4Fe–4S] [2Fe(III),2Fe(II)], and the PN-cluster [8Fe–7S] [8Fe(II)] of nitrogenase.

The active orbitals of [2Fe–2S] and [4Fe–4S] complexes were prepared by a split localization of the converged molecular orbitals at the level of BP86/TZP-DKH, while those of the [8Fe–7S] were prepared at the level of BP86/def2-SVP. The active space for each complex was composed of Fe 3d and S 3p of the core part and σ-bonds with ligands, which is the minimal chemically meaningful active space. The structure of the iron–sulfur core and the numbers of active orbitals and electrons for each complex are summarized in Fig. 4.

Fig. 4: Iron–sulfur clusters used in the present work, and their active spaces (specified by numbers of active electrons and orbitals).

a [2Fe-2S] (30e,20o), b [4Fe-4S] (54e,36o), c [8Fe-7S] (114e,73o).

Molecules in this set were treated with density-matrix renormalization group (DMRG)50,51, using the PySCF software. The DMRG calculations were performed for the S = 0 states, which are the experimentally identified ground states, with bond dimensions 8000, 4000, and 2000 for [2Fe–2S], [4Fe–4S], and [8Fe–7S]. Note that the active space employed in the present work for the PN-cluster is larger than the active space previously used to treat the FeMoco cluster of nitrogenase, having the same number of transition metal atoms25.

Broken-symmetry unrestricted Hartree–Fock (UHF) (MS = 0) calculations were carried out for [2Fe–2S] and [4Fe–4S]. For [8Fe–7S], due to convergence issues, high-spin UHF calculations were used instead.