Low rank representations for quantum simulation of electronic structure

The quantum simulation of quantum chemistry is a promising application of quantum computers. However, for N molecular orbitals, the $\mathcal{O}(N^4)$ gate complexity of performing Hamiltonian and unitary Coupled Cluster Trotter steps makes simulation based on such primitives challenging. We substantially reduce the gate complexity of such primitives through a two-step low-rank factorization of the Hamiltonian and cluster operator, accompanied by truncation of small terms. Using truncations that incur errors below chemical accuracy, we are able to perform Trotter steps of the arbitrary basis electronic structure Hamiltonian with $\mathcal{O}(N^3)$ gate complexity in small simulations, which reduces to $\mathcal{O}(N^2 \log N)$ gate complexity in the asymptotic regime, while our unitary Coupled Cluster Trotter step has $\mathcal{O}(N^3)$ gate complexity as a function of increasing basis size for a given molecule. In the case of the Hamiltonian Trotter step, these circuits have $\mathcal{O}(N^2)$ depth on a linearly connected array, an improvement over the $\mathcal{O}(N^3)$ scaling assuming no truncation. As a practical example, we show that a chemically accurate Hamiltonian Trotter step for a 50 qubit molecular simulation can be carried out in the molecular orbital basis with as few as 4,000 layers of parallel nearest-neighbor two-qubit gates, consisting of fewer than 100,000 non-Clifford rotations. We also apply our algorithm to iron-sulfur clusters relevant for elucidating the mode of action of metalloenzymes.

The quantum simulation of quantum chemistry is a promising application of quantum computers. However, for N molecular orbitals, the O(N 4 ) gate complexity of performing Hamiltonian and unitary Coupled Cluster Trotter steps makes simulation based on such primitives challenging. We substantially reduce the gate complexity of such primitives through a two-step low-rank factorization of the Hamiltonian and cluster operator, accompanied by truncation of small terms. Using truncations that incur errors below chemical accuracy, we are able to perform Trotter steps of the arbitrary basis electronic structure Hamiltonian with O(N 3 ) gate complexity in small simulations, which reduces to O(N 2 log N ) gate complexity in the asymptotic regime, while our unitary Coupled Cluster Trotter step has O(N 3 ) gate complexity as a function of increasing basis size for a given molecule. In the case of the Hamiltonian Trotter step, these circuits have O(N 2 ) depth on a linearly connected array, an improvement over the O(N 3 ) scaling assuming no truncation. As a practical example, we show that a chemically accurate Hamiltonian Trotter step for a 50 qubit molecular simulation can be carried out in the molecular orbital basis with as few as 4,000 layers of parallel nearest-neighbor two-qubit gates, consisting of fewer than 10 5 non-Clifford rotations. We also apply our algorithm to iron-sulfur clusters relevant for elucidating the mode of action of metalloenzymes.
The electronic structure (ES) problem, namely, solving for the ground-or low-lying eigenstates of the Schrödinger equation for atoms, molecules and materials, is an important problem in theoretical chemistry and physics. There are several approaches to solving this problem on a quantum computer, including projecting approximate solutions to eigenstates using phase estimation [1-3], directly preparing eigenstates using the adiabatic algorithm [4][5][6], or using quantum variational algorithms [7][8][9] to optimize parameterized circuits corresponding to unitary Coupled Cluster (uCC) [10][11][12] or approximate adiabatic state preparation [13,14].
Time-evolution, under the Hamiltonian or the uCC cluster operator, is a common component in these algorithms. For near-term quantum devices (especially with limited connectivity), Trotter-Suzuki based methods for time-evolution are most compelling since they lack the complex controlled operations required by asymptotically more precise methods [15][16][17]. In order to perform a discrete simulation, the Hamiltonian or cluster operator is first represented in a single-particle basis of dimension N . However, in many bases, including the molecular orbital and active spaces bases common in ES, the Hamiltonian and cluster operator contain O(N 4 ) secondquantized terms. This leads to at least O(N 4 ) gate complexity for a single Trotter step [18,19], a formidable barrier to practical progress. While complexity can be reduced using alternative bases [20,21], such representations are not usually as compact as the molecular orbital one. Thus, reducing the cost of the Trotter step for general bases is an important goal, particularly within the context of near-term simulation paradigms.
In this Letter, we introduce a general method to reduce the number of gates to implement a Trotter step of the Hamiltonian or uCC cluster operator, that is especially beneficial for orbital bases, where these operators contain O(N 4 ) terms. Borrowing from classical simulations, we employ a low-rank decomposition to reduce the Hamiltonian and cluster operator to pairwise form [22][23][24][25][26][27][28][29][30]. We choose a nested matrix factorization [31] that has an efficient circuit implementation on a quantum computer via the swap-network strategy [21,32], leading to a Hamiltonian Trotter step with an asymptotic gate complexity scaling as O(N 2 log N ) with system size, and O(N 3 ) for fixed systems and increasing basis size. These scalings require only linear nearest-neighbor connectivity. We give numerical evidence that we can carry out a Hamiltonian Trotter step on a 50 qubit quantum chemical problem with as few as 4,000 layers of two-qubit gates on a linear nearest-neighbor architecture, a viable target for implementation on near-term quantum devices. Compiled to Clifford gates and single-qubit rotations, this requires fewer than 10 5 non-Clifford rotations, an improvement over past Trotter based methods in a fault-tolerant cost model [33].
We first define the Hamiltonian H and cluster operator τ . In second quantization H is where a † p and a p are fermionic creation and annihilation operators for spin orbital φ p , and the scalar coefficients h pq and h pqrs are the one-and two-electron integrals over the basis functions φ p (here assumed real).
The uCC cluster operator τ = T − T † , where T is the standard (non-unitary) coupled cluster (CC) operator. For uCCSD (uCC with single and double excitations applied to a single determinant reference), where ij, ab index the N o occupied and N v virtual spin orbitals respectively, and The integrals and cluster amplitudes that one encounters in molecular ES applications, however, are not arbitrary, but contain considerable structure. We now show that this allows us to construct approximate operators H ′ or τ ′ , accurate to within a desired tolerance ε, that can be implemented with greatly reduced gate counts. The physical basis for this result is the pairwise-nature of the Hamiltonian interactions, arising from the 1/r 12 Coulomb kernel in real-space. More precisely, we will rewrite the two-fermion parts of H and iτ associated with the integrals h pqrs and t ′ pqrs as a double-factorized form are number operators in a rotated basis. Approximate H ′ and τ ′ with reduced complexity can then be obtained by truncating the summations over L, ρ ℓ . The dependence of the error ε on L and ρ ℓ is discussed further below. The doubly-decomposed form of V can be obtained using a nested matrix factorization, a type of tensor factorization introduced in [31]. We illustrate this for the Hamiltonian operator. First, the creation and annihilation operators are reordered, and V ′ is recast into a supermatrix indexed by orbitals (ps), (qr) involving electrons 1,2 respectively. Due to the eight-fold symmetry h pqrs = h srqp = h pqsr = h qprs = h qpsr = h rsqp = h rspq = h srpq this matrix is real symmetric, thus we can write a matrix decomposition in terms of a rank-three auxiliary tensor L such that A simple way to obtain L is to diagonalize the V ′ supermatrix, although other techniques [23][24][25][26][27][28][29], such as Cholesky decomposition (CD), are also commonly used; we use the Cholesky decomposition in our numerical simulations below. The next step is to decompose each auxiliary matrix L (ℓ) . For the Hamiltonian, this is also real symmetric, thus we can similarly diagonalize it, where λ (ℓ) , U (ℓ) are the eigenvalues and eigenvectors of L (ℓ) . Combining the two eigenvalue decompositions yields the double-factorized result, Eq. (3). In the cluster operator, amplitudes t have four-fold mixed symmetry and antisymmetry, t abij = t jiba = −t baij = −t abji . However, as shown in the Supplemental Information, we can write iτ = ℓµ Y 2 ℓ,µ , where Y ℓ,µ are normal and can be diagonalized giving the same double-factorized form.
The double-factorized decomposition Eq.
(3) provides a simple circuit implementation of the Trotter step. For example, for the Hamiltonian Trotter step, we write . Time evolution then corresponds to (single-particle) basis rotations with evolution under the single-particle operator h + S and pairwise operators V ℓ . Note that because h + S is a onebody operator, it can be exactly implemented (with Trotter approximation) using a single-particle basis change U (0) followed by a layer of N phase gates. The singleparticle basis changes U (ℓ) can be implemented using Givens rotations [34]. These rotations can be implemented efficiently using two-qubit gates on a linearly connected architecture [21,32]. Taking into account S z spin symmetry to implement basis rotations separately for spin-up and spin-down orbitals gives a count of 2 N/2 2 -2 (N −ρ ℓ )/2 2 with a corresponding circuit depth (on a linear architecture) of (N + ρ ℓ )/2. Using a fermionic swap network, a Trotter step corresponding to evolution under the pairwise operator V (ℓ) can be implemented in ρ ℓ 2 linear nearest-neighbor two-qubit gates, with a two-qubit gate depth of exactly ρ ℓ . Summing these terms, counts thus are N + ℓ for H, τ respectively. To realize this algorithm on a near-term device, where the critical cost model is the number of two-qubit gates, one can either implement the gates directly in hardware [35], which requires L ℓ=1 N ρ ℓ 4 + ρ 2 ℓ 4 − ρ ℓ gates on a linear nearest neighbor architecture, with circuit depth L ℓ=1 N 2 + 3ρ ℓ 2 . If decomposing into a standard two-qubit gate set (e.g. CZ or CNOT), the gate count would be three times the above count.
To realize this algorithm within an error-corrected code such as the surface code [36], where the critical cost model is the number of T gates, one can decompose each Givens rotation gate in two arbitrary single-qubit rotations and each diagonal pair interaction in one arbitrary single-qubit rotation. Thus, the number of single-qubit rotations is L ℓ=1 N ρ ℓ 2 − 2ρ ℓ . Using standard synthesis techniques the number of T gates is then 2.3 log(1/ε) times this count [37].
For an exact decomposition of H, L = N 2 and ρ ℓ = N . However, it is well established from empirical ES calculations that both L and ρ ℓ can be significantly truncated if we approximate H and τ by H ′ and τ ′ with error ε. In the case of L, we rely on the CD and perform the truncation based on the L ∞ norm, i.e. use the smallest L such that max psqr |h ps,qr − L ℓ=1 L  [31]. For the uCC operator, using antisymmetric amplitudes as in this work yields a different scaling of the Cholesky decomposition, where L ∼ O(N ) with increasing basis but L ∼ O(N 2 ) with increasing molecular size (albeit with a small coefficient); the scaling properties of ρ ℓ,µ have not previously been studied.
In Fig. 1 we show L and ρ ℓ for different truncation thresholds in: (set 1) a variety of molecules that can be represented with a modest number of qubits (CH 4 , H 2 O, CO 2 , NH 3 , H 2 CO, H 2 S, F 2 , BeH 2 , HCl) using STO-6G, cc-pVDZ, 6-31G*, cc-pVTZ bases; (set 2) alkane chains C n H 2n+2 , n ≤ 8, using the STO-6G basis; (set 3) Fe-S clusters ([2Fe-2S], [4Fe-4S], and the nitrogenase P N cluster) in active spaces with N = 40, 72, 146 respectively. Details of calculations are given in the Supplemental Information. For the uCC operator, we have used the (classically computable) traditional CC amplitudes, equal to the uCC amplitudes in the weak-coupling limit. For H, we clearly see the L ∝ N scaling across different truncation thresholds, for both increasing system size and basis. For τ , L ∝ N with increasing basis in a fixed molecule, while L ∝ N 2 with increasing size (e.g. in alkane chains). Interestingly, the value of L in the Hamiltonian decomposition is quite similar across different molecules for the same number of spin-orbitals (qubits). In the subsequent ET for the Hamiltonian, ρ ℓ features O(log N ) scaling for alkanes (n ≥ 5, represented here with 75-125 qubits). For the uCC operator, we observe that ρ ℓ scales like O(N ) for alkane chains and increasing molecular size, while it is approximately constant for increasing basis set size. The less favourable scaling of L, ρ ℓ,µ with system size for the uCC operator, relative to H, stems from the antisymmetry properties of the amplitudes, which in the current factorization means that Y ℓ,µ do not show the same sparsity as L (ℓ) .
The error arising from the truncations leading to H ′ and τ ′ can be understood in terms of two components: (i) the error in the operators, and (ii) the error in the states generated by time-evolution with these operators. It is possible to substantially reduce both errors using quantities that can be computed classically. We illustrate this for the error in H ′ . First, the correlation energy, defined as E c = E − E HF , with E the total energy and E HF the Hartree-Fock energy, is usually a much smaller quantity than the total energy in chemical systems, and is affected by much smaller truncation errors, mainly due to cancellation or errors between exact and mean-field truncations. Thus, using the classically computed mean-field energy of H ′ , we can obtain the truncated correlation energy as E ′ c = E ′ − E ′ HF . Second, we can estimate the remaining error in E ′ c from first-order perturbation theory as ψ|H − H ′ |ψ using a classical approximation to ψ; if the classical ψ is accurate, the corrected E ′ c is then accurate to O(ε 2 ). In Fig. 2 we plot |E ′ c − E c | for H 2 O at the cc-pVDZ level. Adding the perturbative correction from the classical CC ground-state reduces the error by about an order of magnitude, such that even an aggressive truncation threshold of ε = 10 −2 a.u. yields the total correlation energy within the standard chemical accuracy of 1.6×10 −3 a.u. For the τ ′ truncation, one could include a similar error correction for the correlation energy derived from approximate cluster amplitudes, although we do not do so here.
In Fig. 3 we show the total gate counts needed to carry out a Trotter step of H ′ and τ ′ with different truncation thresholds. Using the scaling estimates obtained above for L, ρ ℓ in the gate count expression, we expect the Hamiltonian Trotter step to have a gate count N gates ∼ O(N 2 log N ) for increasing molecular size, and O(N 3 ) for fixed molecular size and increasing basis size, and the uCC Trotter step to show N gates ∼ O(N 4 ) for increasing molecular size and O(N 3 ) with increasing basis size. This scalings are confirmed by the gate counts in Fig. 3. As seen, the crossover between N 3 and N 2 log N behavior of the Hamiltonian Trotter gate cost, for alkanes, occur at larger N than one would expect from the ρ ℓ data alone from Fig. 1, due to tails in the distribution of ρ ℓ .
The threshold for classical-quantum crossover in general purpose computation is usually considered to occur at 50 qubits. For near-term devices, the number of layers of gates on a parallel architecture with restricted connectivity is often considered a good cost model. Using the circuit depth estimate ℓ (N + ρ ℓ ), we see that we can carry out a single Hamiltonian Trotter step on a system with 50 qubits with as few as 4,000 layers of parallel gates on a linear architecture. Within cost models appropriate for error-correction, the most relevant cost metric is the number of T gates [33,38,39]. For our algorithms, T gates enter through single-qubit rotations and thus, the number of non-Clifford single-qubit rotations is an important metric. Based on the gate count estimate for basis changes, the number of non-Clifford rotations required for our Trotter steps is roughly 100,000 and the number of T gates per rotation is approximately 20-50 times this number.
In summary, we have introduced a nested decomposition of the Hamiltonian and uCC operators, leading to substantially reduced gate complexity for the Trotter step both in realistic molecular simulations with under 100 qubits, and in the asymptotic regime. The discussed decomposition is by no means the only one possible and, for the uCC operator, it is non-optimal, as more efficient decompositions for antisymmetric quantities exist [40]. Future work to better understand the interplay between classical tensor decompositions and the components of quantum algorithms thus presents an exciting possibility for further improvements in practical quantum simulation algorithms.

Low-rank factorization of uCC amplitudes
The uCC operator has the expression τ = t − t † , where and the tensor t abij has the symmetry properties t abij = −t baij = −t abji , readily implying t abij = t baji . The first step to represent τ as a linear combination of squares of normal operators, similarly to the first decomposition of the electron repulsion integral, is to represent t as This equation can be written exactly as having introduced a highly sparse tensor T , where ps a † p a s and similarly for V ℓ . As stated and illustrated in the main text, the number of retained eigenvalues scales like O(N 2 ) for increasing system size. This less favorable scaling for the uCC operator, relative to H, stems from the antisymmetry properties of the amplitudes. This can be understood observing that, at second-order perturbation theory in the electron-electron interaction, uCC amplitudes are t abij ∝ ji||ab ǫa−ǫj +ǫ b −ǫi , where ǫ p are the Hartree-Fock eigenvalues and ji||ab = h abij − h abji the antisymmetrized electron repulsion integral. Antisymmetrization of the electron repulsion integral prevents from casting t onto a supermatrix where indexed by orbitals pertaining to electrons 1 and 2 separately.
Note that, since U Equation (14) expresses τ as a sum of products, of the form X 2 − X † 2 , where X is not a normal operator. The identity of easy verification, leads to squares of normal operators. Applied to (14), (15) leads immediately to where, for each ℓ, the four Y ℓ,µ operators are normal, and proportional to ( Eq (16) expresses iτ as i times a linear combination of squares of normal operators, the form used in the main text. Eigenvalues such that |σ ℓ | < ε can be discarded in the spirit of the CD for the Hamiltonian. For the uCC operator, the subsequent ET procedure is carried out as follows. In the basis of Hartree-Fock orbitals, the operators Y ℓ,µ are described by the matrices where T ℓ,µ is proportional to U ℓ + V ℓ or U ℓ + V ℓ Introducing the SVD of T ℓ,µ = A diag(ρ ℓ,µ )B † , it easy to verify that the only non-zero eigenvalues of Y ℓ,µ have the form where a i , b i are the i-th columns of A, B respectively. The ET is performed truncating the singular values ρ i ℓ,µ of T ℓ,µ and retaining the largest ρ ℓ,µ of them.

Partial Basis Rotation
Suppose we have a single particle basis rotation given by U (ℓ) , such that The Thouless theorem [42] provides a means for implementing basis rotations on quantum computers. In essence, one applies a series of rotations that act on two rows (p, q) of U (ℓ) at a time. The series of rotations is determined by performing a QR decomposition of U (ℓ) using Givens rotations r pq (θ pq ). One only needs to perform rotations on the N 2 elements below the diagonal of U (ℓ) and, when done in the correct order, these can be performed in linear depth on a device with linear connectivity [21].
In this Appendix, we show that one can perform an approximation basis transformation using on the order of ρ ℓ N rotations, where ρ ℓ ≤ N and is the number of eigenvalues retained after making a low-rank approximation.
Consider the eigenvalue equation used in order to ob- where L (ℓ) is the ℓ-th Cholesky vector, reshaped into a square matrix of order N , and λ (ℓ) is the diagonal matrix of the corresponding eigenvalues. We choose U (ℓ) such that the numbers λ (ℓ) are in decreasing order of magnitude.
In the main text, we perform a low-rank approximation of L (ℓ) by considering only the ρ ℓ largest eigenvalues or, equivalently, the eigenvectors associated with the first ρ ℓ columns of U (ℓ) . This effectively reduces the sizes of the matrices involved, and the eigenvalue equation becomes whereŪ andλ are matrices comprising the first ρ ℓ columns of U and λ respectively. Now, we only need to perform a QR decomposition ofŪ (ℓ) , after which the top ρ ℓ × ρ ℓ block is diagonal and the lower (N − ρ ℓ ) rows are zero. SinceŪ (ℓ) is only an N × ρ ℓ matrix, fewer Givens rotations are needed than in the general case. Specifically, the QR decomposition at worst requires N ρ ℓ − ρ ℓ (ρ ℓ + 1)/2 rotations, as that is the number of terms below the diagonal, coinciding with the estimate N 2 − N −ρ ℓ 2 given in the main text. In the quantum algorithm proposed in the main text, one rotates from the basis of one Cholesky vector to another. This rotation, explicitly given by U (ℓ+1) U (ℓ) , can be reduced into a single unitary operation, and the cost of approximately implementing the basis rotation as described here is determined by ρ ℓ+1 .

Details of calculations
In this Appendix, we provide further details about the calculations yielding the data showed in the main text, focussing on each of the three studied sets (set 1, set 2, set 3).
Set 2 -comprises alkane chains (namely ethane, propane, butane, pentane, hexane, heptane and octane, all described by the chemical formula C n H 2n+2 with n = 2 . . . 8), studied at experimental equilibrium geometries from [43]. Molecules in this "set 2" have been studied with RHF, RCCSD methods. Matrix elements of the Hamiltonian and classical RCCSD amplitudes have been computed with the PySCF software [44], using the STO-6G basis.  The active orbitals of [2Fe-2S] and [4Fe-4S] complexes were prepared by a split localization of the converged molecular orbitals at the level of BP86/TZP-DKH, while those of the [8Fe-7S] were prepared at the level of BP86/def2-SVP. The active space for each complex was composed of Fe 3d and S 3p of the core part and σ-bonds with ligands, which is the minimal chemically meaningful active space. The structure of the iron-sulfur core and the numbers of active orbitals and electrons for each complex are summarized in Figure 4, and the matrix elements h pq , h pqrs are made available in a compressed archive form (FeS integrals.tar).
Molecules in this "set 3" were treated with densitymatrix renormalization group (DMRG) [45,46], using the PySCF software. The DMRG calculations were performed for the S = 0 states, which are the experimentally identified ground states, with bond dimensions 8000, 4000, and 2000 for [2Fe-2S], [4Fe-4S], and [8Fe-7S]. Note that the active space employed in the present work for the P-cluster is larger than the active space previously used to treat the FeMoco cluster of nitrogenase, having the same number of transition metal atoms [33].