Quantum Simulation of Chemistry with Sublinear Scaling in Basis Size

We present a quantum algorithm for simulating quantum chemistry with gate complexity $\tilde{O}(N^{1/3} \eta^{8/3})$ where $\eta$ is the number of electrons and $N$ is the number of plane wave orbitals. In comparison, the most efficient prior algorithms for simulating electronic structure using plane waves (which are at least as efficient as algorithms using any other basis) have complexity $\tilde{O}(N^{8/3} /\eta^{2/3})$. We achieve our scaling in first quantization by performing simulation in the rotating frame of the kinetic operator using interaction picture techniques. Our algorithm is far more efficient than all prior approaches when $N \gg \eta$, as is needed to suppress discretization error when representing molecules in the plane wave basis, or when simulating without the Born-Oppenheimer approximation.


I. INTRODUCTION
The quantum simulation of quantum chemistry is one of the most anticipated applications of both near-term and fault-tolerant quantum computing. The idea to use quantum processors for simulating quantum systems dates back to Feynman [1] and was later formalized by Lloyd [2], who together with Abrams, also developed the first algorithms for simulating fermions [3]. The idea to use such simulations to prepare ground states in quantum chemistry was proposed by Aspuru-Guzik et al. [4].
That original work simulated the quantum chemistry Hamiltonian in a Gaussian orbital basis. While Gaussian orbitals are compact for molecules, they lead to complex Hamiltonians. Initial approaches had gate complexity O(N 10 ) [5,6], and the current lowest scaling algorithm in that representation has gate complexity of roughly O(N 4 ) 1 [7], where N is number of Gaussian orbitals.
Recently, [8] showed that using a plane wave basis restores structure to the Hamiltonian which enables more efficient algorithms. Currently, the two best algorithms simulating the plane wave Hamiltonian are one with O(N ) spatial complexity and O(N 3 ) gate complexity (with small constant factors) [9] and one with O(N log N ) spatial complexity and O(N 2 ) gate complexity (with large constant factors) [10], where N is the number of plane waves. The scaling of references [9] and [10] assumes constant resolution and volume proportional to N . However, a more appropriate assumption when studying molecules is to take volume proportional to the number of electrons η, in which case the method of [9] yields gate complexity O(N 10/3 /η 1/3 + N 8/3 /η 2/3 ) and [10] yields gate complexity O(N 8/3 /η 2/3 ). The reason for taking volume proportional to η is that for condensed-phase systems (e.g., periodic materials) the electron density is independent of computational cell volume and for single molecules the wavefunction dies off exponentially in space outside of a volume scaling linearly in η.
While basis set discretization error is suppressed asymptotically as O(1/N ) regardless of whether N is the number of plane waves [11,12] or Gaussians [13,14], there is a significant constant factor difference. Plane waves are the standard for treating periodic systems but one needs roughly a hundred times more plane waves than Gaussians [8] to reach the accuracy needed to predict chemical reaction rates. Since requiring a hundred times more qubits is impractical in most contexts, this limits the applicability of these recent algorithms [8-10, 15, 16] for molecules. This work solves the plane wave resolution problem by introducing an algorithm with O(η log N ) spatial complexity and O(N 1/3 η 8/3 ) gate complexity. With this sublinear scaling in N , one can perform simulations with a huge number of plane waves at relatively low cost. Our approach is based on simulating a first-quantized momentum space representation of the potential from the rotating frame of the kinetic operator by using recently introduced interaction picture simulation techniques [10]. While the actual implementations have little in common, our algorithm is conceptually dual to the interaction picture work of [10] which simulates a second-quantized plane wave dual representation of the kinetic operator from the rotating frame of the potential. It is also possible to achieve sublinear scaling in basis size without the interaction picture technique; we briefly discuss how qubitization [17] could be used to ob- The reason for our greatly increased efficiency is that the complexity scales like the maximum possible energy representable in the basis. In second quantization, the basis includes states with up to η = N electrons, which results in very high energy, even though these states are not used in the simulation. In contrast, first quantization sets the number of electrons at η. There is still some polynomial scaling with N , which is a result of the increased resolution meaning that electrons can be closer together. If we were to consider constant resolution (as in [8][9][10]), our complexity would actually be polylogarithmic in N , representing an exponential speedup in basis size.

A. Encoding Quantum Simulations of Electronic Structure in Momentum Space First Quantization
We will represent our system of η particles in N orbitals using first quantization. Thus, we require η registers (one for each particle) of size log N (indexing which orbitals are occupied). Since electrons are antisymmetric our registers will encode the wavefunction as where G is a set of N spin-orbitals, so the summation goes over all subsets of the orbitals that contain η unique elements. The second line of the above equation is being used to convey that, due to antisymmetry, swapping any two electron registers induces a phase of −1. We will specialize to plane wave orbitals in three dimensions and ignore the spin for simplicity, so Using plane waves, where r j is the position of electron j in real space, Ω is the computational cell volume, and k p = 2πp/Ω 1/3 is the wavenumber of plane wave p. Unlike in second quantization where antisymmetry is enforced in the operators so that product states of qubits encode Slater determinants [18][19][20][21][22][23], the antisymmetrization indicated in the second line of Eq. (1) must be enforced explicitly in the wavefunction since the computational basis states in Eq. (2) are not antisymmetric. However, any initial state can be antisymmetrized with gate complexity O(η log η log N ) using the techniques recently introduced in [24]. Evolution under the Hamiltonian will maintain antisymmetry provided that it exists in the initial state (a consequence of fermionic Hamiltonians commuting with the electron permutation operator).
The use of first quantization dates back to the earliest work in quantum simulation [2,3,[25][26][27]. Though less common for fermionic systems, several papers have analyzed chemistry simulations using first quantization of real space grids [28][29][30]. Such grids are incompatible with a Galerkin formulation (the usual discretization strategy used in chemistry involving integrals over the basis) and require methods such as finite-difference discretization, which lack the variational bounds on basis error guaranteed by the Galerkin formulation. Real space grids also have different convergence properties; for example, [30] finds that in order to maintain constant precision in the representation of certain states, the inverse grid spacing must sometimes scale exponentially in particle number.
Two previous papers [31,32] have presented simulation algorithms within a Gaussian orbital basis at spatial complexity O(η log N ). These approaches do not use first quantization (they still enforce symmetry in the operators rather than in the wavefunction); instead, [31,32] simulate a block of fixed particle number in the secondquantized Hamiltonian known as the configuration interaction matrix. The more efficient of these two approaches has O(η 2 N 3 ) gate complexity [32], so our O(η 8/3 N 1/3 ) gate complexity is a substantial improvement.
By integrating the plane wave basis functions with the Laplacian and Coulomb operators in the usual Galerkin formulation [33] we obtain H = T + U + V such that where T is the kinetic operator, U is the external potential operator, and V is the two-body Coulomb operator.
, R ℓ are nuclear coordinates, ζ ℓ are nuclear charges, L is the number of nuclei, and we use |q p| j as shorthand notation for While this Hamiltonian corresponds to a cubic cell with periodic boundaries, our approach can be easily extended to different lattice geometries (including non-orthogonal unit cells) and systems of reduced periodicity [34]. Note that we have chosen the frequencies in G 0 to span twice the range as those in G in order to cover the maximum momenta that may be exchanged.

B. Simulating Chemistry in the Interaction Picture
Our scheme for simulation builds on the interaction picture approach introduced in [10]. This approach is useful for performing simulation of a Hamiltonian H = A + B where norms of A and B differ significantly so that A ≫ B . The idea is to perform the simulation in the interaction picture in the rotating frame of A so that the large norm of A does not enter the simulation complexity in the usual way.
The principle in [10] is similar to Hamiltonian simulation via a Taylor series [35], except that the expression used to approximate the evolution for time t is The operation given by this expression can be implemented by using a linear combination of unitaries (LCU) approach [36]. The operator B is expressed as a linear combination of unitaries and the time is discretized, so Eq. (7) is a linear combination of unitaries which can be implemented using a control register and oblivious amplitude amplification [37]. For a short time, the cutoff K can be chosen logarithmic in the inverse error. To implement evolution for long times, the time is broken up into a number of time segments of length τ , and this expression is used on each of those segments.
The overall complexity depends on the value of λ, which is the sum of the weights of the unitaries when expressing B as a sum of unitaries. To simulate within error ǫ the number of segments used is O(λt), and K = O(log(λt/ǫ)/ log log(λt/ǫ)). The complexity in terms of LCU applications of B and evolutions e −iAτ is therefore There is also a multiplicative factor of log(t A /ǫλ) for the gate complexity, which originates from the complexity of preparing the ancilla states used for the time. This result is given in Lemma 6 of [10]. To interpret the result as given in [10], note that the 'HAM-T' oracle mentioned in that work includes the evolution under A. That is why the complexity quoted there for the number of applications of e −iAτ does not include a logarithmic factor. In quantum chemistry one often decomposes the Hamiltonian into three components H = T + U + V , and it is natural to group U and V together, because they usually commute with each other but not with T . The work of [10] focused on the simulation of chemistry in second quantization where However, for first-quantized momentum space we will observe the reverse trend that U + V ≪ T when N ≫ η.
We therefore choose that A = T and B = U + V , and need to express the potential as a linear combination of unitaries in momentum space, where w s are positive scalars and H s are unitary operators. The convention in this paper is that the w s are real and non-negative, with any phases included in the H s . A subtlety here is that when expressing U and V as a sum of unitaries we need to account for cases where addition or subtraction with ν would give values outside G. To account for this, we can express U and V as p,q∈G and it is apparent that the parts in between the large parentheses above are unitary, so we take them to be the operators H s in Eq. (9). In Eq. (10) we use the convention that Booleans correspond to 0 for false and 1 for true. This modification ensures that there is no contribution to the sum from parts where the additions or subtractions would result in values outside G. For example, for U , if (p − ν) is not in G, then the value of (p − ν) / ∈ G is interpreted as 1. This means that we have Thus, we see that λ is asymptotically equal to η 2 times where in the last line we use Ω ∝ η, which is typical for molecules [8]. From this, we find that λ = O(η 5/3 N 1/3 ). The scaling obtained in Eq. (13) corresponds to the maximum possible value of U + V . Changing the orbital basis does not change the maximum possible energy. If one considers the dual basis to the plane waves, then the orbitals are spatially localized with a minimum separation proportional to (Ω/N ) 1/3 [8]. The minimum separation between electrons proportional to (Ω/N ) 1/3 gives a potential energy scaling as (N/Ω) 1/3 , as given in Eq. (13). In contrast, the kinetic energy T has worse scaling with N because the maximum speed scales as (N/Ω) 1/3 (the inverse of the separation), which is squared to give a kinetic energy proportional to (N/Ω) 2/3 [8].

C. Performing Simulation in the Kinetic Frame
To implement our algorithm we need to realize e −iT τ as well as realize (U + V )/λ via a linear combination of unitaries. Using Eq. (3) we express e −iT τ as Therefore, in order to apply this operator, we just need to increment through each of the η electron registers to calculate the sum of k p 2 , then apply a phase rotation according to that result. The complexity of calculating the square η times is O(η log 2 N ) (assuming we are using an elementary multiplication algorithm). The complexity of the controlled rotations is O(log(ηN )), though there will be an additional logarithmic factor if we consider complexity in terms of T gates for circuit synthesis.
To apply the U + V operator we will need a select operation and a prepare operation. We use one qubit which selects between performing U and V . For V (the two-electron potential) the select LCU oracle will be select |0 |i |j |ν |p 1 1 · · · |p i i · · · |p j j · · · |p η η → |0 |i |j |ν |p 1 1 · · · |p i + ν i · · · |p j − ν j · · · |p η η . (15) This operation has complexity O(η log N ), because we can iterate through each of the η electron registers checking if the register number is equal to i or j, and if it is then adding ν (for i) or subtracting ν (for j). For U (the nuclear term), we need to apply select |1 |ℓ |j |ν |p 1 1 · · · |p j j · · · |p η η → − e −ikν ·R ℓ |1 |ℓ |j |ν |p 1 1 · · · |p j − ν j · · · |p η η . (16) We again need to iterate through the registers, and subtract ν if the register number is equal to j, which gives complexity O(η log N ). The register |i is replaced with |ℓ , and we need to apply a phase factor e −ikν ·R ℓ . This phase factor can be obtained by first computing the dot product k ν ·R ℓ , which has complexity O(log N log(1/δ R )), where δ R is the relative precision with which the positions of the nuclei are specified. For L nuclei (note that L ≤ η), we will have an additional complexity of O(L log(1/δ R )) in order to access a classical database for the positions of the nuclei R ℓ . Then, applying the controlled rotation has complexity O(log N + log(1/δ R )).
In order to take account of the modification involving x that is referred to in Eq. (12), we would have an additional control qubit for x which would be prepared in an equal superposition. When doing the additions and subtractions, one would check if they give values outside G and perform a Z operation on that ancilla if any of the results were outside G.
Let δ be the allowable error in the prepare and select operations. The number of times these operations need to be performed is O(λt), so to obtain total error no greater than ǫ we can take log(1/δ) = O (log (λt/ǫ)). Since λ is polynomial in η and N , we have log(1/δ) = O (log (ηN t/ǫ)). The error in the implementation of U/λ due to the error in the positions of the nuclei is O(δ R N 1/3 Z/η). Since the total nuclear charge should be the same as the number of electrons (since the total charge is zero), we can cancel Z and η. Then we also obtain log(1/δ R ) = O (log (ηN t/ǫ)).
The prepare operation must act as prepare |0 This state preparation can be performed by initially rotating the first qubit to give the correct weighting between the U and V terms. We prepare the register |j in an equal superposition. If the first qubit is zero (for the V component) we also prepare the penultimate register in an equal superposition over |i . We do not need to explicitly eliminate the case i = j, because in that case the operation performed is the identity and therefore has no effect on the evolution. Preparing an equal superposition over η values has complexity O(log η).
In the case that the first qubit is one (for the U component), we need to prepare the penultimate register in a superposition over |ℓ with weightings √ ζ ℓ . The nuclear charges ζ ℓ will be given by a classical database with complexity O(L). To accomplish this one can use the QROM and subsampling strategies discussed in [7,9,38]. Again recall that L ≤ η. In fact, usually L ≪ η and the equality is only saturated for systems consistingly entirely of hydrogen atoms. For a material, in practice there will be a limited number of nuclear charges with nuclei in a regular array, so this complexity will instead be logarithmic in L. Similarly, for the selected operation, a regular array of nuclei will mean that the complexity of applying the phase factor e −ikν ·R ℓ is logarithmic in L.
The key difficulty in implementing prepare is realizing the superposition over ν with weightings 1/ k ν . That is, we aim to prepare a state proportional to We describe this procedure in the next section. If there is a regular array of nuclei then the overall complexity obtained is O(log(ηN t/ǫ) log N ), where we use the fact that L < η. If a full classical database for the nuclei is required, then the complexity will have an additional factor of O(L log(ηN t/ǫ)).
Between implementing e −iT τ , select, and prepare, the dominant cost is O(η log 2 N ) for implementing e −iT τ . There is also a cost of O(log(ηN t/ǫ) log N ) for computing k ν · R ℓ , but in practice it should be smaller. The factor of log(t A /ǫλ) from [10] will also be smaller. These are the costs of a single segment, and the number of segments is given by Eq. (8) as O(λt), with λ = O(η 5/3 N 1/3 ). Thus, the total complexity is O(η 8/3 N 1/3 t).

D. Preparing the Momentum State
In this section we develop a surprisingly efficient algorithm for preparing the state in Eq. (18). Since this step would otherwise be the bottleneck of our algorithm, our implementation is crucial for the overall scaling. The general approach is to use a series of larger and larger nested cubes, each of which is larger than the previous by a factor of 2. The index µ controls which cube we consider. For each µ we prepare a set of ν values in that cube. We initially prepare a superposition state which ensures that we obtain the correct weighting for each cube. This state may be prepared with complexity O(n), which is low cost because n is logarithmic in N .
The overall preparation will be efficient since the value of 1/ k ν does not vary by a large amount within each cube, so the amplitude for success is large. The variation of 1/ k ν between cubes is accounted for by weighting in the initial superposition over µ.
To simplify the description of the state preparation, we assume that the representation of the integers for ν uses sign bits. The sign bits will need to be taken account of in the addition circuits. It also needs to be taken account of in the preparation, because there are two distinct combinations that correspond to zero. If each ν x , ν y , and ν z is represented by n bits, then each will give numbers from −(2 n−1 − 1) to 2 n−1 − 1. That is, we have N 1/3 = 2 n−1 −1. Controlled by µ we perform Hadamards on µ of the qubits representing ν x , ν y , ν z to represent the values from −(2 µ−1 − 1) to 2 µ−1 − 1.
As mentioned above, due to the representation of the integers the number zero is represented twice, with a plus sign and a minus sign. To ensure that all numbers have the same weighting at this stage, we will flag a minus zero as a failure. The total number of combinations before flagging the failure is 2 3µ so the squared amplitude is the inverse of this. Therefore, the state at this stage is Next, we test whether all of ν x , ν y , ν z are smaller than (in absolute value) 2 µ−2 . If they are, then the point is inside the box for the next lower value of µ, and we flag failure on an ancilla qubit. Note that for µ = 2 this means that we test whether ν = 0, which we need to omit. This requires testing if all of three bits for ν x , ν y , ν z are zero. The three bits that are tested depend on µ, so the complexity is O(n) (due to the need to check all 3n qubits). The state excluding the failures can then be given as where B µ (for box µ) is the set of ν such that the absolute values of ν x , ν y , ν z are less than 2 µ−1 , but it is not the case that they are all less than 2 µ−2 . That is, Next we prepare an ancilla register in an equal superposition of |m for m = 0 to M − 1, where M is a power of two and is chosen to be large enough to provide a sufficiently accurate approximation of the overall state preparation. The preparation of the superposition for m can be obtained entirely using Hadamards. We test the inequality The left-hand side can be as large as 1 in this region, because we can have just one of ν x , ν y , ν z as large as 2 µ−2 , and the other two equal to zero. That is, we are at the center of a face of the inner cube. In order to avoid divisions which are costly to implement, the inequality testing will be performed as The resulting state will be (omitting the parts where the inequality is not satisfied) where Q = ⌈M (2 µ−2 / ν ) 2 ⌉ is the number of values of m satisfying the inequality. The amplitude for each ν will then be proportional to the square root of the number of values of m, and so is The two sides are approximately equal for large M , and in that limit we obtain amplitudes proportional to 1/ ν , as required. We have omitted the parts of the state flagged as failure, and the norm squared of the success state gives the probability for success. In the limit of large M , the norm squared is In the limit of large n, this expression can be approximated by an integral, which gives The integral is the box integral B 3 (−2) in [39]. In [39], B 3 (−2) = 3C 2 (−2, 1), and C 2 (−2, 1) is given by Eq. (40) of that work with a = 1. That gives the asymptotic value where Ti 2 is the Lewin inverse tangent integral and G is the Catalan constant. After a single step of amplitude amplification, the probability of failure is Numerically we find P 2 = 11/48, and it increases with n towards the analytically predicted value. Using a single step of amplitude amplification brings the probability of success close to 1 (see Figure 1). Note that the "failure" here does not mean that the entire algorithm fails. In the case of "failure" of the state preparation one can simply perform the identity instead of the controlled unitaries in the select operation. The result is that a small known amount of the identity is added to the Hamiltonian, which can just be subtracted from any eigenvalue estimates. The amplitude amplification triples the complexity for the state preparation. Next we consider the error in the state preparation due to the finite value of M . The relevant quantity is the sum of the errors in the squared amplitudes, as that gives the error in the weightings of the operations applied to the target state. That error is upper bounded by (31) This error corresponds to the error in implementation of (U + V )/λ. As discussed above, the error in this implementation, δ, can satisfy log(1/δ) = O(log(ηN t/ǫ)).  ηN t/ǫ)). The complexities of the steps of this procedure are: 1. The register |µ can be represented in unary, so the state preparation takes a number of gates (rotations and controlled rotations) equal to n−1 = O(log N ), because the dimension is logarithmic in N .
2. The superposition over ν x , ν y , ν z can be produced with 3n = O(log N ) controlled Hadamards. These Hadamards can be controlled by qubits of the unary register used for |µ .
3. Testing whether the negative zero has been obtained can be performed with a multiply-controlled Toffoli with n controls, which has complexity O(n) = O(log N ).
4. Testing whether the value of ν is inside the inner box can be performed by using a series of n multiply-controlled Toffolis with 4 controls (with a unary qubit for |µ ), and one qubit each from the registers for each of the components of ν. The complexity is therefore O(n) = O(log N ).
6. The inequality test involves multiplications, and therefore has complexity given by the product of the number of digits (better-scaling algorithms would only perform better for an unrealistically large number of digits). The complexity is therefore The inequality test is the most costly step due to the multiplications, and gives the overall cost of the state preparation algorithm. Nevertheless, it still has logarithmic cost, so the complexity of the prepare operation will be negligible compared to the costs of the other steps. This concludes our procedure for realizing the state in Eq. (18), which completes our presentation of the overall simulation procedure.

III. CONCLUSION
The low scaling dependence of our methods on N allows us to easily overcome the constant factor difference in resolution between plane waves and Gaussians. In fact, using these algorithms we expect that one can achieve precisions limited only by relativistic effects and the Born-Oppenheimer approximation. However, the latter limitation can also be alleviated by our approach since one can use enough plane waves to reasonably span the energy scales required for momentum transfer between nuclei and electrons, and thus support simulations with explicit quantum treatment of the nuclei. We also expect that our approach could be viable for the first generation of fault-tolerant quantum computers.
Let us consider the calculation of the FeMoco cofactor of the Nitrogenase enzyme discussed in [40][41][42] which involved 54 electrons and 108 Gaussian spin-orbitals. FeMoco is the active site of biological Nitrogen fixation and its electronic structure has remained elusive to classical methods. The work of [40] found that roughly 10 15 T gates would be required, which translates to needing roughly 10 8 physical qubits if implemented in the surface code with gates at 10 −3 error rate. The large qubit count here arises from needing to parallelize magic state distillation (the system register would need only about 10 5 physical qubits). In comparison, the O(N 3 ) scaling algorithm of [9] has been shown to require less than 10 9 T gates to solve a molecule with 100 plane wave spinorbitals (which is not enough resolution for FeMoco).
Supposing we use 10 6 plane wave spin-orbitals for these 54 electrons our algorithm would require roughly 10 3 logical qubits (which can be encoded in roughly 10 6 physical qubits under the architecture assumptions discussed in [9], which are more conservative than those in [40]). Under these assumptions the value of η 8/3 N 1/3 is only about 4×10 6 , though there will be significant logarithmic and constant factors in the gate complexity. In comparison N 8/3 /η 2/3 for the approach of [10] would be about 7 × 10 14 . While further work would be needed to determine the precise gate counts, it seems reasonable that gate counts would be low enough to perform magic state distillation in series with a single T factory. This back-ofthe-envelope estimate suggests that our approach could surpass the accuracy of the FeMoco simulation discussed in [40] while using fewer physical qubits. Using such a large basis may also alleviate the need for active space perturbation techniques such as those discussed in [43].
Although we have used the interaction picture to achieve our O(η 8/3 N 1/3 t) complexity, it is also possible to achieve sublinear complexity in N without the interaction picture. This is because the value of λ associated with the kinetic operator T is O(η 1/3 N 2/3 ) [8]. Thus, one could use qubitization and signal processing [17,44] where T is simulated using LCU methods. Then, the overall complexity of our approach would be O(η 8/3 N 1/3 t + η 4/3 N 2/3 t), and the constant factors in the scaling might be smaller.