Hardware-efficient fermionic simulation with a cavity–QED system

In digital quantum simulation of fermionic models with qubits, non-local maps for encoding are often encountered. Such maps require linear or logarithmic overhead in circuit depth which could render the simulation useless, for a given decoherence time. Here we show how one can use a cavity–QED system to perform digital quantum simulation of fermionic models. In particular, we show that highly nonlocal Jordan–Wigner or Bravyi–Kitaev transformations can be efficiently implemented through a hardware approach. The key idea is using ancilla cavity modes, which are dispersively coupled to a qubit string, to collectively manipulate and measure qubit states. Our scheme reduces the circuit depth in each Trotter step of the Jordan–Wigner encoding by a factor of N2, comparing to the scheme for a device with only local connectivity, where N is the number of orbitals for a generic two-body Hamiltonian. Additional analysis for the Fermi–Hubbard model on an N × N square lattice results in a similar reduction. We also discuss a detailed implementation of our scheme with superconducting qubits and cavities. Coupling ancilla modes to a string of qubits in a cavity QED system allows the efficient quantum simulation of fermionic problems. This constitutes a challenge for most existing boson-based platforms for quantum simulation, as simulating a fermionic system require non-local transformations that impose a computational overhead. A team lead by Mohammad Hafezi at the University of Maryland, Los Alamos National Laboratory and Dartmouth College has shown that the collective manipulation and read out of an ensemble of superconducting qubits granted by introducing a dispersive coupling to microwave cavity photons grants access to the generation of non-local operations directly. This digital circuit QED architecture reduces the number of operations necessary to simulate fermionic systems by an amount that scales quadratically in the size of the system, making it more hardware-efficient than existing alternatives.

In the literature, there are a number of methods for doing so and we will focus on the methods that require implementing a non-local map, e.g., Jordan-Wigner (JW) or Bravyi-Kitaev (BK) mappings [16].Our approach relies on the use of a cavity-QED system to achieve the non-local coupling directly.This is in contrast to other ideas for improving the non-locality of the fermion-spin mapping such as direct simplification of the quantum circuit [17] or using gate teleportation [18] to lower the cost of the Jordan-Wigner and Bravyi-Kitaev schemes.Another alternative to the approach taken here is to introduce additional qubits to achieve improved locality of the spin-representations of fermonic operators [5,19,20].Lastly, we mention a recently introduced technique for quantum simulation using plane waves rather than typical electronic structure basis sets composed of quasilocal Gaussian orbitals [21].The approach taken there has been show to achieve linear circuit depth for a certain class of electronic systems.We do not pursue subspace encodings and consider arbitary electronic systems with a focus on approaches that directly implement the non-local maps rather than circumventing them.
Here, we present a hardware-efficient scheme to perform * Electronic address: gzhu123@umd.edudigital fermionic simulations on a physical system made of spins.Our approach makes use of cavity-QED physics [22][23][24][25], where one or several ancilla cavity modes are used to encode, simulate the Hamiltonian and measure the desired observables.The selective non-local coupling of ancillae to a qubit string allows for implementation of JW and BK mappings in one shot and reduces the simulation time.More specifically, in exponentiating each term of the Hamiltonian, our scheme reduces the circuit depth of both JW and BK to O(1) operations.This improvement reduces the simulation time, and therefore, mitigates the decoherence effects.
We then present an experimental implementation of our scheme in a circuit-QED platform [26][27][28][29][30][31][32][33][34][35][36][37][38], where experimental progress on fermionic and quantum chemistry simulation has been recently achieved [4,7].In particular, we use dispersive coupling of microwave cavity photons to superconducting qubits [30,38] to generate non-local string operations non-perturbatively.This digital approach offers better scaling in the collective gate time than a previous analog scheme where multi-spin interactions are generated perturbatively [39], resulting in an exponential decrease with the number of Pauli operators to be implemented.Moreover, experimental advances have been achieved in probing inhomogeneity in resonate frequencies in the context of both superconducting qubit-array and resonator-lattice [40,41], and hence pave the way for the realization of collective many-body gates.Therefore, our scheme is preferable for implementing large strings, and it also remedies the disadvantage of circuit-QED architecture, i.e. low connectivity, compared to ion trap architectures [42].
Furthermore, we compare our scheme to conventional local schemes for various fermionic models, such as Fermi-Hubbard model and generic Coulomb Hamiltonian.In these comparisons, we introduce a parallelization scheme which further improves the simulation.Specifically, by parametrically coupling multiple cavity modes, we further decrease the circuit depth for each Trotter step by an additional factor of We consider a generic electronic model with hopping and 2-body Coulomb interaction.The form of the Hamiltonian is given by Here, κ i j is the hopping matrix and V i jkl represents the interaction matrix.The indices i, j, k and l can label orbitals either in real-space or the reciprocal-space and can also absorb spin indices.
In order to simulate fermions with qubits, the simplest scheme is the Jordan-Wigner transformation: The index j can be used to label sites in any dimension.For example, the string in 2D can be chosen as a 'self-avoiding snake' as illustrated by the red string in Fig. 1.In addition to the JW transformation, the Bravyi-Kitaev transformation [16] also requires strings of Pauli operators although the form is more complicated (see Appendix VI).The length of Pauli strings are on average logarithmically shorter than JW using the Bravyi-Kitaev transformation.In order to implement the time evolution with such string operators, we will consider using the cavity-assisted conditional string operation in the following sections.

Cavity-QED interaction and controlled-string operation
We consider the quantum non-demolition (QND) interaction [43] of a cavity-QED system in the dispersive regime: where χ is the dispersive interaction strength.We prepare the cavity photon state in the restricted subspace n a = 0, 1.For circuit-QED implementation, the cavity nonlinearity introduced by the qubits are large enough, such that the cavity itself can be operated as a qubit.To collectively FIG.1: Conditional string operation realized in a cavity-QED system.The Jordan-Wigner string (red) in the 2D qubit lattice can be chosen as a snake shape.manipulate a qubit string, we simply apply the dispersive interaction for a period of τ.The time evolution operator is expressed as Here, we used the property that photon and spin operators commute, and the Pauli-matrix property (σ z j ) 2 = 1.If we choose the operation time to be τ = π/(2χ), we end up with The additional phase factor (−i) N depends on the length of the string and can be cancelled by applying an additional phase gate on the ancilla cavity, and we call the resulting evolution operator C Z , i.e., a conditional-Z string operator, controlled by the cavity photon state: (1) If n a = 0, no operation is performed; (2) If n a = 1, a string operator Z = j σ z j is applied.Such a cavity-controlled string operation has also been proposed to manipulate and engineer the topological ground state of the toric-code model [23,44,45].

Exponentiation of the string operators, time evolution and phase estimation
In order to perform digital quantum simulation of a Fermionic Hamiltonian H, one needs to perform Trotter evolution with small time steps [2], i.e., e −iH∆t .After breaking the Hamiltonian down to sub-terms H = q h q , one exponentiates each of these sub-terms as e −ih q ∆t .The sub-term h q is composed of a qubit string operator.For example, a (2)  hopping terms in parallel with the coupling to 4 cavity ancillae.In order to switch the "head" and "tail" of each string to Pauli-X operator, we split the strings into X and Z parts.The C X can be implemented with C Z sandwiched by parallel Hadamards on the qubits.All the gates in the blue-dashed box are implemented in parallel by multi-mode QND interaction [Eq.( 13)].
hopping term in Eq. ( 1) is represented by qubit operators under JW encoding as h i j =κ i j (σ + i σ − j + H.c.) k∈string σ z k .This can be split into two pieces h (1)  i j = 1 2 κ i j σ x i σ x j k∈string σ z k and h (2)  i j = 1 2 κ i j σ y i σ y j k∈string σ z k , and will be exponentiated separately.The conventional approach realizes the exponentiation of these string terms by a CNOT ladder (a sequence of nearestneighbor CNOTs) illustrated in Fig. 2(a) (upper panel, see Appendix I for details).Here, we present a hardware-efficient quantum circuit which uses the cavity-controlled string operation [Eq.( 5)] as shown in Fig. 2(a) (lower panel).The essence is to collect the global parity information into the cavity ancilla with a single C Z gate and another C Z gate to erase the parity information after the rotation of the ancilla along x-axis by an angle 2∆t.Note that this circuit reduces the number of gates and circuit depth by a factor of N (N being the length of the string) due to its non-local and highly-parallel feature, and hence greatly reduces the operation time.
To derive the properties of the circuit, we start with the conditional string operation C Z , and the rotation of the ancilla (6) where X a is the Pauli-X operator of the ancilla photon state.The three successive gates C Z R x (2∆t)C Z can be expressed as where we have used the property Z 2 = 1 q .The final expression represents a conditional evolution with the non-local many-body Hamiltonian H string = Z = j∈string σ z j , controlled by the ancilla photon state | ± a .
In general, arbitrary many-body interactions along the string can be exponentiated, by choosing the proper singlequbit rotations in the beginning and end of the circuit [see Fig. 2(a)].In Fig. 2(b,c), we show explicitly the circuits to implement the exponentiation of the hopping sub-term h (1)  i j = 1 2 κ i j σ x i σ x j k∈string σ z k and the interaction sub-term h (1)  i jkl = 1 4 V i jkl σ x i σ x j σ x k σ x l m∈string σ z m coming from the Coulomb interaction term in Eq. ( 1), both under JW encoding.Here, we have used Hadamard gates to turn certain σ z operators into σ x with the identity H j σ z j H j = σ x j .On the other hand, a typical term in the Bravyi-Kitaev encoding may involve all types of Pauli operators, e.g., σ y 1 σ x 2 σ y 3 σ z 5 .This qubit string can be exponentiated with the circuit in Fig. 2(d), where the combined Hadamards and phase gates (S and S † ) realized with a single pulse turn the σ z operators into σ y .
If one starts the ancilla in the | + a (| − a ) state, one only gets forward (backward) evolution after n Trotter steps, e −in∆tH (e in∆tH ), as suggested by Eq. (7).However, if one starts with the ancilla in state where t = n∆t.This property can be applied to quantum phase estimation [46,47] for extracting energy spectrum and state preparation (see Appendix VIII for details).Note, after the state preparation, one can extract fermionic correlation function such as k |ψ with conditional string operations.For example, the circuit shown in Fig. 2(e) implements the xx-part of the correlator, i.e. ψ|σ x i σ x j k σ z k |ψ , where setting φ = 0 (φ = π/2) in the phase gate gives the real (imaginary) part.The measurement of dynamical correlator is discussed in Appendix VII.

Parallelizations with multiple ancillary cavity-modes
Another advantage of the cavity-QED approach is that one can further parallelize the exponentiation of all the mutually commuting sub-terms h i j using multiple cavity ancillae.This can be realized with multiple cavities or different modes in the same cavity as discussed further in the next section.Parallelization is trivial if the string operators to be exponentiated do not overlap with each other.It is also possible to exponentiate multiple overlapping strings in parallel, namely ν e iκ∆tS ν , where ν labels different strings.A concrete example is exponentiating hopping terms between two neighboring rows in parallel which appears in the Hubbard model [illustrated in Fig. 2(f)].The detailed derivation can be found in METH-ODS.

B. Implementation with circuit-QED architecture
In this section, we focus on the experimental implementation of the QND interactions of Eq. (3).We also discuss implementation of parallelization with multiple ancilla modes in the same cavity either by higher level contribution or alternatively by periodical modulation of the flux couplers.

Realization with circuit QED
We consider a collection of multi-level superconducting qudits inductively coupled to a single or multiple transmissionline cavities or 3D cavities as shown in Fig. 3(a).The simplest case with one cavity mode can be described by a generalized Tavis-Cummings model [48]: Here, a is the annihlation operator for the cavity mode with frequency ω, | l j represents the l th level of the j th qudit with corresponding energy l , and g ll =g l|φ|l ≡ gφ ll is proportional to the inductive coupling strength g and the phase matrix element (φ being the superconducting phase operator).
The strength g can be made uniform even in the presence of non-uniform mode function with the flux-tunable inductive coupler [49], as shown in Fig. 3(a).
In the dispersive regime, namely (N represents the total number of coupled qudits and ∆ ll the detuning), one can adiabatically eliminate the direct inductive coupling V between qudits and the cavity.The effective Hamiltonian after a Schrieffer-Wolff transformation [48,50,51] up to second-order is given by Apart from H 0 , the terms appearing in second-order perturbation have three types: (1) The energy shift of level l is given by: , summed over the contributions χ ll from virtual transitions to all other levels l , where the first term is AC Stark and the second term is Bloch-Siegert shift, in the absence of rotating-wave-approximation; (2) the Lamb shift κ l = l l g 2 ll ∆ ll which only renormalizes the qudit energy level: l → l + κ l ; (3) the flip-flop interactions between any two qudits mediated by virtual photons with we need to cancel out to avoid the induced cross-talk errors in our many-body gates.One can choose specific superconducting circuits, such as fluxonium [38,48,52,53] focused here (alternatively flux qubit [54] or protected 0-π qubit [55,56]).
In particular, we consider the situation that phase matrix elements obtain selection-rule property [38,53,57] at large ratio of Josephson and charging energy E J /E C (e.g.E J =20 GHz, with fixed E C =0.5 GHz from now on): φ 01 =φ 12 =φ 03 =0 as shown in Fig. 3(c).In the case of fluxonium, this is due to the feature that the ground and excited states are persistentcurrent states with different winding numbers m, which can be seen from their wavefunctions being trapped in different wells of the Josephson potential −E J cos φ and have negligible overlap [Fig.3(b)].Therefore, the contribution from χ 01 (as well as any other inter-well virtual transition) is nearly zero (< 10 −5 at E J =20 GHz).A QND interaction H QND = j χa † aσ z j arises in second-order perturbation with strength χ = l (χ 0l − χ 1l )/2, while the nonzero contributions are from intra-well virtual transitions to higher levels, such as χ 02 and χ 13 , which has recently been experimentally observed (see Ref. [38]).On the other hand, the single-excitation flip-flop term j | 0 1 | j disappears (µ 01 =0) due to the forbidden interwell transitions (g 01 =g 12 =g 03 =0, etc.), and the lowest-level contribution is from j | 0 2 | j .During the simulation process, we only occupy levels 0 and 1 which act as the qubit degree of freedom, therefore the flip-flop process does not play any role and hence will not introduce the unwanted cross-talk error in the many-body CZ gate.When we need to implement singlequbit Hadamard (H) and phase (S) gates to get Pauli-X and Y [Fig.2(a)], we can go to the small-E J /E C regime (e.g.E J =4 GHz) by quasi-adiabatically tuning the flux into the junction loop.In this regime, 0-1 transition can be implemented indirectly via a Raman process (0→2→1) utilizing the low-lying Λ-structure [57], as shown in Fig. 3(b, c).A direct transition is also possible since the 0-1 matrix element is sizable and can be accessed by the classical drive.Alternatively one can stay constantly at an intermediate parameter regime (such as E J =10 GHz) so that selection rules hold while the suppressed but still non-vanishing 0-1 transition is enabled by enhancing the power of the classical drive.
Note that due to the condition of dispersive regime [Eq.( 9)], the QND interaction strength χ has to decrease when the number of coupled qubits N increases due to resonance enhancement.According to the constraint g/∆ 1/ √ N (∆ ≡ Min|∆ i j |), one can fix g and increase the detuning magnitude |∆| and get the asymptotic scaling χ = g • (g/∆) g 2 /

√
N. This scaling is exponentially better than a previous scheme where multi-spin interactions are generated perturbatively [39] with exponential decreasing interaction strength with the length of the string, i.e., O(g N /|∆| N−1 ).
For small N [i.e.O(10)], it is possible to remedy the insignificant decay of maximum interaction strength due to resonance enhancement by varying the parameters (external flux or E J ) of individual fluxoniums such that frequency of different qudits ( l, j ) are detuned.The QND interaction strength χ will not decrease significantly because it contains contributions from multiple levels χ 0l and χ 1l .One can then avoid the asymptotic 1/

√
N scaling by modular construction of multiple cavities with N ∼ O( 10) qubits together connected with quantum teleportation as discussed in Appendix IX.Alternatively, instead of obtaining the QND interaction perturbatively as the above scheme, it is in principle possible to directly engineer the QND (cross-Kerr) interaction such as utilizing nonlinear coupling with Josephson junctions [30].
Although we focus on fluxonium qubits here, one can generate QND interaction in more general cases for other qubits such as transmons.In those cases, one can detune the qubit frequency to avoid unwanted flip-flop interactions [for N ∼ O(10)], or using a balance cavity mode as discussed further in Appendix III.

Coupling to multiple ancillary modes with parametric coupler
In order to gain further parallelizability and shorten the time complexity, one can couple the qubits to multiple ancillary cavity modes as mentioned in the previous section, which certainly poses additional experimental challenges.One first needs to selectively address the qubits on different strings with a certain cavity mode which is usually distributed extensively and touches all the qubits.Second, one needs to couple the qubits dispersively to cavity modes with different frequencies.These two challenges can be solved by one trick, i.e., parametrically modulating the coupling of the qubits to the transmission-line cavity.One option is to periodically modulate the flux in the inductive coupler shown above in Fig. 3(b) (see e.g.Refs.[58,59]) with multiple tones, i.e. g j Φ(t) = ν gν, j cos( f ν t), where j labels the qubit and f ν represents the modulating frequencies, with f 0 = 0 (static coupling).The scheme is illustrated in Fig. 3(d).
The multi-tone modulation technique is mature in microwave-engineering and turns out to be a valuable computational resource.The weight g ν, j and driving tones f ν are controllable.We choose f ν such that the qubit frequency is up-converted to a frequency close to but still off-resonant with the sideband ancillary tones ( f ν ).In this case, they are dispersively coupled by the QND interaction H QND = ν j χν, j a † ν a ν σ z j with strength χν, j = ( χν, j 02 − χν, j 13 )/2, where χν, j ll = g2 ν, j /( l − l − ω ν + f ν ).Note that f ν can decrease the detuning to make the interaction sizable.We choose gν, j such that each qubit is only coupled to the tones of the selected strings, as illustrated in Fig. 3(d) with multiple colors.As we see, the inductive couplings of qubits 4 and 5 are constant such that the qubits are only dispersively coupled to the fundamental mode a 0 , while the couplings of qubits 1 and 8 are modulated by three tones and hence connect the qubits to four cavity modes etc..It is clear that the number of cav- ity modes one can up-convert (or down-convert) to is limited since the up-converted detuning has to be made different to avoid cross-talking between different ancillae modes, but one should be able to couple 10-20 modes.To couple more ancillae, the solution is again teleportation-based modular architecture discussed in Appendix IX.As we will discuss in the following section, for a Fermi-Hubbard model on a N × N square lattice in real space, the number of modes one needs to couple to is N. Therefore, for a 100-qubit system which can be realized in the near future for a short-circuit algorithm still requiring no quantum error correction, it is possible to realize our parallelization scheme.

C. Time complexity
In the previous sections, we focused on how to exponentiate a single term h p in the system Hamiltonian H= p h p .In the following, we compare the time complexity (circuit depth) of our cavity-QED approach with the conventional approach of a single Trotter step e −iH∆t .

Fermi-Hubbard model
As the first example, we consider the spinful 2D Fermi-Hubbard model in real-space and on an N × N square lattice.We use qubits on two sub-lattices to encode fermions with different spin s =↓ (purple) or s =↑ (yellow) as shown in Fig. 4. The spinful Fermi-Hubbard model is a restricted form of Eq. ( 1) given by where j → (n x , n y ) is a two-component label for the 2D sublattice.The first and second terms represent hoppings and on-site Hubbard interaction respectively.The types of terms and their corresponding time complexity is listed below (for more details see Appendix V).
(1 and 2) On-site Hubbard interaction and Horizontal hopping: translates to ZZ interaction and 2-local flip-flop interaction without string in the qubit representation, both of which have O(1) circuit-depth.(3) Vertical hopping (even and odd): typically contains a "snake-shape" JW string (Fig. 4) and hence dominates the time complexity.
With one transmission-line cavity coupled to each pair of rows, one can parallelize the vertical hopping terms (see Appendix V for details).For the vertical hopping between the same pair of rows, one can exponentiate these terms in series, resulting in the Trotter step circuit depth (time complexity) O(N).With the multi-mode scheme shown in Fig. 2(f) and Fig. 3(d), one can exponentiate these terms and reduce the depth to O(1).In contrast, the conventional approach needs O(N 2 ) due to the linear overhead of implementing the CNOT ladder in Table I.

The generic Coulomb Hamiltonian
For the generic Coulomb Hamiltonian described in Eq. ( 1), which is the relevant model for quantum chemistry or strongly-correlated electronic materials simulated in reciprocal space, the indices i, j, k and l are typically not neighbors.The type of terms that dominate the computational resource is the 4-local interaction term V i jkl c † i c † j c k c l , which requires a sequence of O(N 4 ) unitary transformations for a system with N orbitals (i, j, k, l = 1, 2, • • • N) in a single Trotter evolution step due to all possible choices of the four fermion indices.Taking into account the JW string, which has length of O(N), the Trotter step circuit depth of the conventional approach becomes O(N 5 ) [60].
For our cavity-QED approach, we list the circuit depth for the two approaches.(1) Series: O(N 4 ), due to the reduction of the linear overhead of the Jordan-Wigner string.(2) Parallel: O(N 3 ), assuming N ancilla cavity modes.The remaining O(N 3 ) terms cannot be exponentiated in parallel because they do not commute with each other (e.g. when the first index i coincide, but the remaining 3 indices j, k, and l are all different).However, note that for an actual quantum chemistry Hamiltonian, although the total number of terms scales as O(N 4 ), a large number of integrals vanish between distant orbitals or due to symmetry.The number of non-commuting terms also scales as O(N 3 ) though similarly sparse.This can be seen from the example molecules discussed in Table I (operator information collected from Ref. [6,7]), which has typically only O(N) to less than O(N 2 ) non-commuting terms (equivalent to the minimum number of commuting groups listed in the table).Therefore, there is a huge potential for parallelization in practice.

Summary of the comparison between cavity-QED and conventional approaches
Here, we summarize and compare the various properties of the cavity-QED scheme versus the conventional scheme, as shown in Table II.
In order to compare both schemes, we first compare their gate time.With the state-of-the-art technology, the secondorder QND interaction strength between qubits and cavity with the form χ j a † aσ z j , can typically reach about 50 -100 MHz [30], corresponding to gate time of 20 -40 ns.On the other hand, the conventional approach needs nearest-neighbor CNOT gates between qubits, coming from the second-order ZZ interaction, 4g 2 η i, j σ z i σ z j , (e.g.due to the third-level contribution in the context of transmon qubits [61], where η is the nonlinearity of the transmon).The typical strength of the ZZ interaction is around 50 MHz [32], corresponding to a gate time of 40 ns.Since both types of interactions are of perturbative nature (up to second order), the gate time in both cases are of the same order of magnitude.The relevant parameters are summarized in Table II.We also include the asymptotic prefactor √ N (reduces to log N with the Bravyi-Kitaev encoding) of the cavity-QED gate time due to the dispersive regime condition [Eq.( 9)], which can be remedied by the modular architecture connecting multiple cavities (Appendix IX).The average number of strings (cavity ancilla modes) a single qubit touches simultaneously is of O(10), so one does not need to worry about cross-talk between the ancillae due to frequency crowding in these cases either.
We emphasize that having a scheme with a shorter operation time in each Trotter step enables more evolution steps within the coherence time of the system, and hence increases the precision of the algorithms, such as phase estimation.Besides the cavity-QED scheme presented in this paper, there are some other schemes which can reduce the overhead due to the non-local string operator, such as Ref. [18] and [17].We compare our scheme with theirs in Appendix X.
Another significant advantage of our scheme over the conventional scheme is the gate fidelity, in particular, the fidelity due to the control pulses.In the conventional scheme, in order to implement N CNOTs in the CNOT ladder, one has to send N control pulses.Assuming the fidelity is F for each pulse, the overall fidelity due to imperfect pulse becomes F N as shown in Table II.On the other hand, in the case of our many-body gate, one can actually just use a single control pulse with error F to detune the cavity frequency.In this case, the overall fidelity due to imperfect pulse is just F , which does not have an exponential decay.Therefore, our collective many-body gate has a significant advantage in terms of quantum control and pulse fidelity.

D. Numerical simulation in the presence of decoherence
In this section, we numerically simulate and compare different approaches with two simple but representative experiments: ( [30] and [32].For the pulse fidelity of gate control, we assume a single pulse has a fidelity F for the qubit control and F for the cavity control.Note that the scaling for Bravyi-Kitaev encoding listed in this table assumes a non-local cavity ancilla which can selective address an arbitrary cluster of connected or disconnected qubits, and in the BK case the number of qubits in the cluter is O(log N).This is different from the case of a device with only local connectivity where the scaling is essentially the same as the Jordan-Wigner encoding.
lattice (simulated by 8 qubits).(2) A quantum chemistry problem, i.e., the outer shell electrons of a BeH 2 molecule (simulated by 6 qubits), which has been simulated with superconducting qubits in a recent experiment [7].
The simulation takes into account decoherence of qubits and cavity, represented by the jump operators l j = l j Γ j , where Γ j is the corresponding decay rate and l j the normalized operator.The types of jump operators of our numerical simulation is listed in the caption of Fig. 5, along with the realistic estimation of experimental parameters chosen according to Ref. [31].
In particular, we simulate the Kitaev phase estimation protocol (see Appendix VIII) for both systems and for the Fermi-Hubbard model also the measurement of spectral function A(ω)=−2Im[G(ω)], where G(ω) is extracted from the Fourier transform of the dynamical correlators including ψ | c i (t)c † j (0) | ψ (see Appendix VII).Since both measurement protocols involve time evolution U(t), the dissipation of the system will affect the measurement result, as shown in Fig. 5.We compare four different situations: the ideal situ-ation without dissipation, the conventional approach, and the cavity-QED approach in series and in parallel respectively.Since each approach needs different operation time per Trotter step, the effects of dissipation are different.
For the Fermi-Hubbard model, we use JW encoding in all cases and three transmission line cavities are needed to couple each pair of rows (four rows in total) in parallel.For the BeH 2 molecule, we use the modified BK encoding discussed in Ref. [7].With this encoding, there are a total of 164 terms, which can be divided into 8 groups, where all the terms in the same group commute with each other, as shown in Table I.In this case, one can reduce the circuit depth to 8 by exponentiating all the terms in the same group in parallel with multiple ancilla modes in the same central cavity.This would require about 20 tones in the flux modulation using the trick in Fig. 3(d).On the other hand, the series cavity-QED approach will exponentiate all the terms sequentially with a single cavity ancilla.
Regarding to the phase estimation protocol in Fig. 5(a) and (c), the cavity ancilla expectation Z a (t) (Pauli-Z) oscillates in FIG.5: Numerical simulation of the measurement protocols for different approaches taking into account dissipation effects (summed over 50 quantum trajectories in each curve), with the following jump operators for qubits and cavity and corresponding decay rate (from Ref. [31]): σ − j (10 kHz) , σ + j (0.05 kHz), σ z j (50 kHz), a (5 kHz) and a † ( ∼ 0 kHz) .(a) Phase estimation of the 2D Fermi-Hubbard model on a 2 × 2 lattice (simulated by 8 qubits), with the parameter: κ = 0.1, U = 1, and 4 electrons in total (half-filling).The upper panel shows the time-domain signal of the ancilla expectation value, while the lower panel is the Fourier transform of the upper panel in order to extract the ground-state energy.The actual ground-state energy E g of this model is shown by vertical dashed lines.Note that all the curves in the lower panel correspond to Fourier transform of the signal in the period 0 ≤ t ≤ 100, while the purple curve corresponds to the ideal case with no dissipation and being transformed over a much longer period 0 ≤ t ≤ 1000 such that the resolution is improved by about 10 times.(b) The spectral function (extracted from the dynamical correlation function) of the Fermi-Hubbard model.The separation between the hole and particle resonance peaks signals the Mott gap.(c) Phase estimation of the BeH 2 molecule (simulated by 6 qubits).Due to the signal decay of the cavity-QED (series) and local approach, we only perform Fourier transform in the period 0 ≤ t ≤ 10.In the numerical simulation, we first subtract all the diagonal terms in the Hamiltonian and then shift it back to recover the eigenenergy, mimicking the actual experimental process in Ref. [32].One can see all but the conventional local approach can locate the ground-state energy E g (dashed line), while the cavity-QED (parallel) approach has almost a resolution as good as the ideal case with no dissipation, despite the shrink of the peak.time in the ideal case, i.e.Z a (t) = cos(E g t), where E g is the ground-state energy of the prepared eigenstate.Nevertheless, in the presence of decoherence, the signal decays significantly in time, while the peaks in frequency-space signal Z a (ω) also shrinks due to dissipation.For the Fermi-Hubbard model in (a), we prepare the ground state in the beginning, and one can see that E g (shown by the dashed line) can be clearly resolved in the biggest peak in Z a (ω) in the blue and purple curves (ideal dissipationless case).The purple curve has a Fourier transform over the period 0 ≤ t ≤ 1000, namely 10 times long as the others, and hence has much better resolution.With dissipation, the signal dies out in a short time.While this peak still has the correct position for the cavity-QED parallel approach (red dashed), it shifts slightly for the series approach (green dashed) and becomes obscured in the conventional approach local (light blue dashed).For the phase estimation in BeH 2 molecule in (c), we see that the parallel cavity-QED approach (red dashed) approximates the dissipationless signal (blue) with almost the same resolution of the ground-state energy while the height of the peak is reduced.The series cavity-QED approach (green dashed) has significant broadening in the resolution, while the conventional local approach has all the peaks being smeared out and is hence hard to tell the actual energy.

For the spectral function measurement in panel (b) for
Fermi-Hubbard model, we prepare the initial state as the ground state.The two biggest peaks correspond to the hole (left) and particle (right) resonance respectively, and the distance is approximately U, namely the Mott gap.We can see that the dissipation effect leads to the shrinking and asymmetry of the two peaks.The shrinking is proportional to the operation time of different approaches.The asymmetry is due to the fact that the qubit has much larger loss rate than absorption rate as listed in the figure caption.Due to our encoding of 0 (1) electron as spin up (down) of the qubit, the qubit loss induces loss of holes but not particles.Therefore, the hole peak (left) shrinks more than the particle peak.In practice, one could choose two different ways of encoding and average the signal to get rid of this asymmetry.

III. CONCLUSION AND DISCUSSION
In this article, we have shown that, in the context of cavity/circuit-QED architecture, the use of the common cavity modes greatly simplifies the non-local string-like encoding needed for fermionic simulation, such as Jordan-Wigner and Bravyi-Kitaev transforms.In particular, we are able to get rid of a polynomial overhead, i.e., N 2 of the Trotter-step circuit depth in the conventional local approach, which reduces the time complexity of the simulation for a given precision and in turn reduces the decoherence effects.The non-local quantum control and parallelization of multiple ancilla-controlled processes developed in this paper may have profound applications in many others areas, such as quantum information processing, lattice gauge theory simulation and measurement of entanglement spectrum in quantum many-body systems [62].

A. Derivation of parallelizations with multiple ancillae
Here, we show the detailed derivation of multi-ancilae parallelization mentioned above.We use conditional string-Z operations with multiple cavity ancilla modes, namely where each ancilla mode a ν is dedicated to a particular string ν.This collective gate can be realized by dispersively coupling qubits simultaneously to multiple modes resulting in the QND interaction As explained below, by proper conditional rotations, we can achieve a generic conditional string-S in different Pauli-bases, i.e.C S ν , where the Z ν string in Eq. ( 12) is replaced by S ν .We consider the case where all the strings commute with each other, i.e. [S ν , S ν ] = 0. Thus the conditional-string also commutes, i.e. [C S ν , C S ν ] = 0. Therefore, following the derivation in Eq. ( 7), we can reach the identity where R ν x and X a,ν is the x-axis rotation and Pauli-X operator of the ancilla mode ν.If all the ancillae are initiated at | + ν , the exponentiation of multiple strings is achieved in parallel, i.e. ν e iκ∆t S ν .Now we consider how to convert the conditional-Z into conditional-S.We illustrate the idea with example shown in Fig. 2(f)].This involves turning the head and tail of each string into Pauli-X operators.To achieve this, we split the C S ν operator into two parts applied sequentially (order is arbitrary): the main C Z 1 FIG.6: CNOT ladder and its variants (4-qubit version as an illustration).
exponentiating a z-string can be described by the following unitary operator: where we have abbreviated the CNOT between qubit i and j as C i j .The unitary can be simplified by repetitively using Eq.(A1) and the identity CNOT 2 = 1 ⊗ 1 as follows: where Z = j σ z j is a string operator.From the above derivation, we see that the essence of the CNOT ladder is the growing of the Pauli-Z operator mediated by the nearest-neighbor CNOT gates, such that the rotation of a single qubit along z direction effectively does the exponentiation of the Z-string operator.The local property in the circuit shown in Fig. 6(a) makes it more appreciable in terms of experimental realization if only local interaction is allowed.However, in the absence of the nearest-neighbor restriction, other variants of this CNOT ladder exists, such as the two circuits shown in Fig. 6(b, c).The circuit in panel (b) contains a long-range CNOT between qubits 1 and 4, which directly adds qubit 1 onto the string started from qubit 4. For the circuit in panel (c), we use all long-range CNOTs between the rotated qubit 4 and other qubits, and directly mediate the Pauli-Z operator from qubit 4.This turns out to be another extreme, which is completely non-local.In the following, we will show a slight modification of this circuit can be transformed to our cavity-QED circuit.
We consider a slight modification of the circuit in Fig. 6(c) shown in Fig. 7(a) (generalized to the case of N + 1 qubits), where now the last qubit is turned from a data qubit into ancilla and initialized into state | 0 a so that it does not encode fermions itself but only collects the parity information of the and the two in the middle which transform the z-axis rotation R z (2∆t) into the x-axis rotation R x (2∆t).Also we use the property that the control and target of the CZ gate is interchangeable, i.e.CZ=ZC.These transformations lead to the circuit in panel (c), where the controls are all moved to the ancilla.Finally, note that, the sequential application of CZ gates can be merged into a single many-body C Z gate, where Z = N j=1 σ z j is the string operator, as shown in panel (d), namely our cavity-QED circuit.This is possible if the ancilla is a cavity mode which can interact non-locally with all the data qubits simultaneously and is hence hardware-efficient.In this case, the circuit depth is reduced by a factor of N due to the fact that the process of collecting the parity information into the ancilla can be done in parallel instead of the conventional approach which is in series.

Appendix B: Multi-ancillae parallelization for a generic Hamiltonian
In the main text we have shown how to parallelize terms with multiple ancillae in the Fermi-Hubbard Hamiltonian, with the circuit shown in Fig. 2(f).Note that the "collect" and "erase" stages of that circuit make use of the property that the nonlocal operators of the Fermi-Hubbard model consist of long strings of σ z -operators sandwiched between two other Pauli operators, i.e. σ x or σ y .In this section we show how to deal with less structured Hamiltonians such as those arising from quantum chemistry in Ref. [6,7].To demonstrate the issue and our solution more clearly, we will first work out a small example explicitly and present the general treatment later.Consider three mutually commuting terms selected from the list of Pauli operators making up the 6-qubit BeH 2 Hamiltonian of Ref. [7]: where we have omitted '⊗' for brevity.Here we will describe how to efficiently perform the collect/erase stage of the algorithm described in the main text, which is equivalent to the target circuit shown in Fig. 8(a).Since we can only perform controlled Z-strings using our cavity-assisted scheme [Eq.( 12) and ( 13)], these operators present a challenge.We first decompose each of the terms above into σ x -, σ y -and σ z -only strings as: . This sign can be introduced with a diagonal operator D (yellow box) acting on the ancillary cavity modes and an additional dummy qubit initialized in state | 1 d .The detailed implementation will be discussed in the end of this section.
To be concrete, we describe how to implement the groups of x-and y-strings.Given S x i and S y i we define corresponding z-strings by replacing all the non-identity Pauli operators with σ z 's and call these Z x i and Z y i respectively.In the example considered here these are We then sandwich the Z  8(d,e).
In the following, we describe the general strategy for parallelization of arbitrary terms.Consider a group of N mutually commuting terms S ν such that [S ν , S µ ] = 0 for ν, µ = 1, . . ., N. In reorganizing the order of conditional strings in Fig. 8(b) all the x-strings move past the y-and z-strings to its left.Similarly all the z-strings move past the y-strings to its right.A minus sign is picked up every time when the pair is anti-commuting.We define It is straightforward to show that the overall sign picked up by the rearrangement of the operators described in Fig. 8(b) can be compensated by the following diagonal operator The quantity inside the parenthesis is calculated as part of the classical preprocessing.Whenever it is 1, the corresponding operator is simply identity and can be discarded.If, on the other hand, the quantity inside the parenthesis is −1, we have a nontrivial contribution and the product needs to run over those pairs only: Since there are N(N − 1)/2 pairs of operators, the product above can consist of at most that many terms.Moreover, since a global sign is of no significance, in cases when there are more than N(N − 1)/4 terms we can choose to implement −D instead.With this, the worst case scenario is that the diagonal operator will consist of less than or equal to N(N − 1)/4 terms of the form (−1) n µ n ν .Next we show explicitly how to implement the sign-fixing circuit D for the parallelization example with terms given by Eq. (B1) and is illustrated in Fig. 8(f).We introduce an additional "dummy" qubit fixed in | 1 d , and apply a control-Z gate on it when a minus sign is needed for certain ancilla configuration, due to the fact that Z| 1 d = −| 1 d .For example, when applying a double-ancillae control-Z gate on the "dummy" qubit in the case of two ancillae in total, we get For the case in Eq. (B1) and Fig. 8(f), one needs to implement an ancillae-conditioned minus sign (−1) n 1 n 2 (−1) n 2 n 3 , which can be realized by two double-ancilla control-Z gates (conditioned by n 1 n 2 and n 2 n 3 respectively) as shown in the yellow box in panel (f).Note that in general all the multiancillae control-Z gate (more generally controlled-rotation) can be implemented in parallel utilizing the multi-mode QND interaction H QND ν j χν, j a † ν a ν σ z j and the induced "number splitting" [63], i.e., through the different dispersive shifts of the qubit frequency depending on the number configuration in multiple cavities, i.e. ∆ j = ν 2n ν χ ν, j .In the simplest two-ancillae example, we pick different dispersive shifts χ1, j and χ2, j due to cavity mode-1 and -2.Therefore, the total frequency shift of qubit j is (with four different outcomes), where n 1 , n 2 = 0, 1 are the photon numbers in the two cavities.The single-qubit rotations are generated by microwave pulse with the driving frequency + ∆ j n 1 ,n 2 , where is the bare qubit frequency.Therefore, the driving is only resonant when the cavity-ancillae are in state | n 1 n 2 a .Therefore, conditional-rotations in panel (f) can be achieved.We emphasize that, in general, at most N(N −1)/4 conditional-rotations in the yellow box will be performed in parallel using microwave drive with multiple tones, which can be achieved with standard microwave engineering technology, e.g., for N ∼ O(50).
Appendix C: Implementation of QND interaction with the usual Jaynes-Cummings interactions In this section, we consider the alternative experimental realization for ordinary qubits (including transmon qubits) without the rich selection-rule structure of fluxonium.In this case, a simple Jaynes-Cummings interaction can essentially capture the qubit-cavity coupling.
We consider the situation that all qubits are coupled to two transmission-line cavities.The qubit frequency ( ) is placed between two different dominant cavity frequencies ω and ω (assuming other cavity modes are far detuned away from the qubit frequency) as illustrated by Fig. 9.The system can be described by a two-mode Tavis-Cummings model Here, a and b represent the two photonic modes, σ j 's represents qubits operators, and g represents the strength of the Jaynes-Cummings (JC) interaction.
In the dispersive regime, namely (N represents the total number of qubits), one can adiabatically eliminate the direct JC interaction between qubits and the cavities, and the effective Hamiltonian in second-order perturbation theory is given by Apart from H 0 , the terms appearing in second-order perturbation have two types: (1) The QND interaction between the cavity photons and the qubits [Eq.( 3)]; (2) The non-local flipflop interactions between qubits mediated by virtual photons.
For the purpose of our protocol, we want to get rid of the later type.This can be simply achieved by setting ∆ a = −∆ b , i.e. placing the qubit frequency right in the middle of two cavity frequencies [ = 1 2 (ω + ω )], as illustrated by Fig. 9.We call the first transmission line "ancilla cavity", and we occupy this cavity (mode a) with photons.We call the second transmission line "balance cavity".We will not occupy this cavity (mode b) with photons, and one can effectively set b † b = 0.
For the above discussion, the cancellation of non-local flipflop terms relies on uniform qubit-cavity coupling g.However, usually g can have spatial dependence due to nonuniform shape of the mode function.Therefore, one should also be able to vary the qubit-resonator coupling strength on different sites to compensate such inhomogeneity.This can be achieved with the flux-tunable inductive couplers between qubits and transmission-line cavities as illustrated in Fig. 9. Now, the only remaining term apart from H 0 is the QND interaction between the ancilla photon and the qubits namely H QND = χa † a j σ z j , where the interaction strength is χ = g 2 /∆ a .
Note that an alternative way of suppressing the non-local filp-flop interactions without using a balance cavity is by detuning the frequencies of the qubits which are coupled to the same cavity mode.As long as the frequency difference ∆ is much larger than the QND interaction strength χ, namely ∆ /χ 1, the flip-flop interaction is effectively suppressed due to rotating-wave approximation.This alternative scheme works well for N ∼ O(10) since the QND interaction is still sizeable.

Appendix D: Fluxonium circuit
The fluxonium circuit can be described the following Hamiltonian [48,52,53]: where φ describes the phase difference across the small junction, the conjugate operator N = −id/dφ represents charge imbalance across the junction, in units of Cooper pair charge (2e), and Φ ext represents the flux threading the main loop of the circuit.The relevant energy scales are charging energy E C and Josephson energy E J of the small junction, and the effective inductive energy E L of the "superinductor" formed by a Josephson junction array.One can think of the above Hamiltonian as describing a fictitious particle (with coordinate φ) residing in a potential V(φ) = −E J cos φ + 1 2 E L (φ + 2πΦ ext /Φ 0 ) 2 , which is composed by a periodic cosine potential and an parabolic envelope (position tunable with the external flux Φ ext ).The charging term is equivalent to the kinetic energy of the particle with momentum being N = −id/dφ.One can define an orthogonal Wannier basis | m s according the first two terms, i.e. particle in a cosine periodical potential.Here, m labels the wells in the cosine potential and also represents the winding number of the persistent current corresponding to the particular Wannier state, while s is the band index.In this basis, one can write the effective Hamiltonian for each band s (neglect inter-band coupling) as follows: where s,n is defined by the Fourier expansion of the band structure s (p) = n s,n cos(2πnp) (p is the quasimomentum).We see that once the leading-order inter-well tunneling amplitude s,1 is suppressed (in the large E J /E C regime), the eigenstate of H s are approximately Wannier states | m s in different wells (m) and bands (s).Therefore, m effectively becomes a good quantum number, and hence one obtains a selection rule s m | φ | m s ∝ δ mm [53], i.e. interwell transitions are forbidden due to the exponential suppression of wavefunction overlap.The phase matrix elements l | φ | l are illustrated for large and small E J in Fig. 10.At large E J [panel (a)] one can see clear selection rule, i.e. interwell transitions are forbidden (φ 01 = φ 12 = 0), while at small E J [panel (b)], no clear selection rule exists and the matrix elements have a richer structure.
An alternative scheme of cancelling the flip-flop interaction between fluxonium qubits without using selection rules in the specific regime is to spatially vary the parameters (such as Φ ext or E J ) such that the 0-1 transition energy is detuned between different qubits.Due to the rich structure of the matrix elements, the dispersive shifts χ l = l χ ll and QND interaction strength χ = 1 2 (χ 1 − χ 0 ) receive contributions from multiple levels l and hence usually does not decreases when the 0-1 detuning ∆ 01 increases as in the case of usual Jaynes-Cummings model (for two-level systems).Ultimately frequency crowding limits the number of qubits allowed in this scheme.In this section we discuss the on-chip circuit-QED design for the implementation of the Fermi-Hubbard model.For the purpose of parallelization, we couple each neighboring pair of rows with a cavity through a tunable flux couplers.Note that each qubit is coupled to two cavities (red and green as shown in Fig. 11).In the following, we discuss the three types of terms mentioned in the main text with more details.
(1) On-site Hubbard interaction: This interaction translates to ZZ interaction and phase shift in the qubit representation, i.e.While the phase shift can be easily implemented by shifting the qubit frequency, the ZZ interaction can be induced perturbatively (second-order) by the flux-tunable inductive couplers between the qubits of two spin species (purple and yellow) [illustrated by the red dashed line in Fig. 11].
(2) Horizontal hopping: as illustrated in Fig. 4 in the main text, horizontal hopping (for both spin species) does not contain a JW string, and hence can be directly implemented by the inductive coupler between neighboring qubits [illustrated by the blue double arrows in Fig. 11].
(3) Vertical hopping (even and odd): The vertical hopping between neighboring rows typically contains a JW string and can be exponentiated with the circuit in Fig. 2(b) in the main text.The hopping terms are divided into two generally non-commuting groups for both spin species, with the lower row having even (odd) row index, with the corresponding JW string on the left (right) side due to the "snake"-shape of the JW string in our convention, as illustrated in Fig. 4 in the main text.To exponentiate these two groups of terms in turn, we couple the qubits on the corresponding two neighboring rows through inductive couplers with a green (red) transmissionline cavity for the even (odd) vertical hoppings.We also exponentiate the two spin species in turn.The inductive couplers in this design can be used to couple two qubits, or a qubit with the resonator.
During the protocol of exponentiating the above types of terms respectively, one can turn the couplers on and off and detune the qubits properly, such that different processes do not interfere with each other.In addition to the Jordan-Wigner encoding introduced earlier, there are a number of other fermionic encoding schemes.The fermionic antisymmetric space is spanned by basis states , where | Ω is the vacuum.The action on state K of the lowering operators is given by c j | K = (−1) Γ jK | K δ k j ,1 with K such that k i = k i for i j and k j = 0. Thus, changing the occupancy requires knowledge of the prefix sum Γ jK = j−1 i=1 k i and the occupancy k j .Depending on the encoding scheme, either the computation of the occupancy or of the prefix sum may cause the qubit operator locality to grow with system size.The simplest case is the JW transform where only the occupancy is stored in the qubits.
The optimal compromise between the two schemes results in Fenwick tree where a mixture of occupancy and partial sums are stored, known as the Bravyi-Kitaev (BK) transform.Despite additional complications in prescribing operators, the total cost (qubit string length) to encode a fermionic mode is log N for N fermionic modes.Here, we show a concrete example of exponentiating the qubit string in BK encoding in a system with six fermionic modes.In particular, we consider the encoding of a hopping term between the first and last modes, i.e.
).We present the corresponding circuit for the second sub-term in Fig. 2(d

Appendix G: Measurement of the dynamical correlation function
A useful measure which captures the response of the system is the dynamical correlator: where G p and G h represents particle and hole correlators respectively.In order to make the measurement easier, we consider measuring correlation functions of the related Hermitian operators, namely the Majorana operators defined as: One can express the complex fermion correlator as linear combinations of Majorana correlators as Note that since we have fermion number conservations in the Fermi-Hubbard model, the other two types of Majorana correlators pp and pq does not appear in the above interaction, although they should appear if there are number-nonconserving terms in the Hamiltonian.Therefore we only need to measure the two types of Majorana correlators, i.e. qq and qp .Using qp as an example, we can re-express it in the Schrödinger picture using evolution operator U(t), i.e.
One can easily see that this can be easily measured by taking an overlap between q i U(t)p j | ψ and U(t)| ψ .Therefore, one can again use a Ramsey interference protocol to extract the correlator from the ancilla (cavity), which is shown in Fig. 12.The only needed ingredients are unitary evolution where A(ω) is the spectral function and encodes important information of the system (such as the spectral gap).Note that and η are infinitesimal real number to shift the poles and integration contour.Re[ Z a (ω) ] are the quantities simulated in Fig. 5 in the main text.
When using the property of Eq. (H1) to perform PEA, one needs to exponentiate each sub-term of the Hamiltonian with a single cavity ancilla sequentially, so the parallelization scheme in the previous section cannot be used anymore.In the case of a 2D spinful Fermi-Hubbard model in real space (N × N lattice), the circuit-depth (time-complexity) of a single Trotter cycle in the phase estimation becomes O(N 2 ) if Jordan-Wigner encoding is used.In the case of the generic Hubbard model with N orbitals, it remains O(N 4 ), the same as the circuit in series as we discussed above.In order to use the parallelization scheme, we can use an alternative approach, namely coupling the multiple ancilla cavities (a j ) to a central "clock" cavity (a c ), which shifts the frequencies of the other cavities conditionally through the dispersive coupling (χa † j a j a † c a c ) such that the rotations applied on all the ancilla cavities in parallel is conditioned by the state of the clock cavity.In this scheme, the time complexity of PEA remains the same as ordinary time evolution.
In contrast, note that in the conventional approach, the phase estimation assumes the coupling to a central ancilla, which is very inefficient to implement with an experimental system involving only local interactions.In that case, one has to apply an extensive amount of SWAP gates to transport the ancilla around.Therefore, having an cavity ancilla coupled non-locally to the whole system has a significant advantage for phase estimation.Now we consider the 'digital' version of quantum phase estimation [47], the essence of which can be summarized as Here, the first register is called the "time register", which is initialized in superposition of different "time", i.e.In the context of our cavity-QED scheme, there is only one ancilla qubit in the time register, which is the cavity photon state.Therefore, we use the iterative phase estimation algorithm (IPEA) [64] to repetitively use the single ancilla qubit to improve the precision of energy measurement and state preparation, the quantum circuit of which is shown in Fig. 13.In the PEA language, one can represent the phase factor generated by the unitary evolution U(t 0 ) = exp(−iHt 0 ) as exp(−iEt 0 ) = exp(−2πφ) where φ ∈ [0, 1) is the phase to be estimated.The phase φ can be represented with a fractional binary expansion φ = 0. j 1 j 2 ... j L = j 1 2 1 + As the first step in IPEA, a conditional U(2 L−1 t 0 ) is performed (abbreviated as U 2 L−1 ), resulting in the relative phase factor exp[2πi( j 1 ... j L−1 .j L )] = exp(−2πi j L /2).Therefore, the leastimportant final digit j L = 0 or 1 can be measured through the readout of the cavity ancilla.The measurement projects the many-body wavefunction in the state register and this output wavefunction in the first step automatically serves as the input of the next iterative step.Now to obtain more significant digits j k , we need to perform a conditional U(2 k−1 t 0 ) (abbreviated as U 2 k−1 ), as shown in Fig. 13.In this case, we get the phase factor exp[−2πi( We see that the more significant digits on the left decimal point does not contribute to the phase factor in this case.However, we need to get rid of the less significant digits on the right of j k to make the measurement precise by the additional phase gate S k applied on the ancilla before the measurement.The form of S k is given by and is based on the previous measurement results of the less significant digits j k+1 , j k+2  iterative performing such measurement procedure, one gets all the digits and hence the estimated phase φ = 0. j 1 j 2 ... j L .Meanwhile, the many-body state in the state register is projected to the eigenstate corresponding to the eigenenergy 2πφ.Therefore, IPEA can be used to prepare the eigenstate or more specifically the ground state of the system with order of time O(1/ ), where is the precision of the eigenenergy.For the ground state, the preparation time is of the order O(1/∆ M ), where ∆ M is the many-body gap.This is in contrast to an alternative way of state preparation, i.e. adiabatic state preparation, which requires time of the order O(1/∆ 2 M ).
Appendix I: Scalable modular architecture and stitching of the strings Due to the constraint of the qubit number in one cavity limited by experimental feasibility and also the asymptotic 1/ √ N (or 1/ log(N) for BK encoding) reduction of the allowed perturbative interaction strength, it is important to consider scalable architecture that couples multiple modules together, where each module consists of a cavity with multiple modes and a number of qubits.We show such a scalable scheme in Fig. 14, which manages to stitch the qubit strings in multiple modules together.For simplicity we only show a single cavity mode in each module, while it can be extended to the situations with multiple modes in each module.Our modular scheme uses a teleportation scheme which can be considered as a variant of the teleportation circuit in Ref. [18].
In our scheme, the cavity modes collects the parity information of the qubit strings within each module and is teleported to a single target cavity with the assistance of additional ancilla qubit pairs (red lines) which are prepared in Bell state | Φ + = 1 √ 2 (| 00 + | 11 ).The teleportation is completed by the Bell-state measurement (BSM) projection and conditional correction again to the target state | Φ + .The target cavity is rotated by R x (2∆t) in order to exponentiate the string operator.After that, a mirror teleportation circuit is implemented to erase the parity information.Note that, since all the Bellstate preparation and measurement, and the conditional string operations in each module are performed in parallel, the circuit depth of the modular architecture to exponentiate a single string is O(1).Therefore modular approach does not increase the time complexity and can remedy the slowing down of the cavity-QED scheme due to the asymptotic 1/

√
N reduction in

FIG. 2 :
FIG. 2: (a) Arbitrary string operator exponentiated with conventional approach using a CNOT ladder to collect the parity information.The whole process can be performed collectively using cavity-QED approach with conditional string operation to realize the exponentiation of the string operator, which reduces the number of gates and the circuit depth by a factor of 1/N.(b) Exponentiation of a hopping sub-term with the action of pairs of Hadamard gates on sites i and j.(c) Exponentiation of an interaction sub-term with the action of pairs of Hadamards on site i, j, k and l.(d) Exponentiation of a hopping sub-term in the Bravyi-Kitaev encoding.(e) Measurement of the static correlator ψ|σ x i σ x j k σ z k |ψ with a Hadamard-test circuit.The expectation value of the correlator can be extracted from the cavity ancilla readout.(f) Exponentiation of 4hopping terms in parallel with the coupling to 4 cavity ancillae.In order to switch the "head" and "tail" of each string to Pauli-X operator, we split the strings into X and Z parts.The C X can be implemented with C Z sandwiched by parallel Hadamards on the qubits.All the gates in the blue-dashed box are implemented in parallel by multi-mode QND interaction [Eq.(13)].

FIG. 3 :
FIG. 3: (a) Schematics of a circuit-QED realization: superconducting qubits coupled to a transmission-line cavity with flux-tunable inductive couplers.In particular, we consider using fluxonium circuit as our qubit, and operate it in the vicinity of half flux quantum into the main loop (the right loop between inductor and junction).(b) The wavefunction is illustrated for E C = 0.5 GHz, E L = 0.75 GHz, Φ ext = 0.4Φ 0 and tunable E J .For E J = 20 GHz (top), the states are trapped deep in the wells corresponding to persistent-current states flowing in opposite directions (with winding numbers m = 0 and m = 1 respectively).The inter-well transitions are forbidden (dashed arrow), and only intra-well transitions (such as 0-2 and 1-3) are allowed (solid arrows).For E J = 4 GHz (bottom), the well is shallow and all transitions are allowed.(c) Magnitudes of phase matrix elements |φ ll | as a function of E J (tunable by external flux through the junction loop on the left).At large E J , |φ 01 |, |φ 03 | and |φ 12 | (dashed lines) are exponentially suppressed.The parameters are based on Ref. [38].(d) For further parallelization of multiple terms with overlapping strings, qubits are coupled to multiple ancillary cavity modes through periodically modulating the couplers with multiple tones.The qudit transition frequencies 2 − 0 and 3 − 1 are up-converted close to multiple cavity frequencies ω ν to induce multiple QND interactions in parallel.

FIG. 4 :
FIG. 4: Types of terms and Jordan-Wigner strings in a 2D spinful Fermi-Hubbard model on an N × N lattice.One can consider it as a checkerboard lattice with two sub-lattices (purple and yellow) representing two spin species (↓ and ↑) respectively.The 'even' and 'odd' vertical hoppings differs by the location of the strings, which are on the left and right sides respectively.

FIG. 7 :
FIG. 7: Circuit transformation from the CNOT ladder to the cavity-QED scheme.

FIG. 8 :
FIG. 8: General strategy for parallelization multiple strings with multiple ancillae.(a) The target circuit: N arbitrary strings controlled by N ancillae.(b) Split each string into x-, y-and z-strings.(c) Reorder the strings such that the same type (x, y, or z) are grouped together and performed in parallel.To fix the problem of ancillae-dependent negative sign due to anti-commutation relation between the strings, we introduce a circuit D (yellow box) acting on the ancillae and a "dummy qubit" to fix the negative sign.(d) Implementing the x-string by sandwiching the corresponding z-string with Hadmards.(e) Implementing the y-string by sandwiching the corresponding z-string with phase gates.(f) The complete circuit for the example given by Eq. (B1).The circuit D (yellow box) involves parallelized multi-ancillae control-Z gates acting on the dummy qubit initialized in state | 1 d .
strings with Hadamard (combined Hadamard and phase gates) to obtain the S x and S y strings, see Fig.

Frequency
FIG.9:An on-chip circuit-QED realization with ordinary qubits and a pair of transmission-line cavities.The frequency of the qubits reside in between the frequency of the fundamental modes of the transmission-line cavities.Superconducting qubits are coupled to two transmission-line cavities with flux-tunable inductive couplers.

FIG. 10 :
FIG.10: Phase matrix elements of the fluxonium in the large (20 GHz) and small (4 GHz) E J regime, with the other parameters fixed: E C = 0.5GHz and E L = 0.75GHz.
Appendix F: Generalization to Bravyi-Kitaev transformation and Fenwick tree encoding ) in the main text.

FIG. 11 :
FIG.11: Experimental implementation with multiple transmissionline cavities that can parallelize the exponentiation of sub-terms in each pair of neighboring rows.The flux couplers can be turned on and off such that they can be used to couple qubits to exponentiate two-body terms or be used to couple qubit and cavity to exponentiate terms with strings.

FIG. 13 :
FIG.13: Iterative phase estimation (IPEA) algorithms to measure the spectrum and prepare eigenstates or more specifically the ground state of the Hamiltonian.

1 √ N 2 N − 1 t=0
| t , where N is the number of qubits in the time register.The second register is called the "state register" containing the many-body wavefunction | ψ .Now one applies a controlled unitary CU for time t determined by the time register state | t .The relative phase factor e −iE k t is hence imprinted into the entangled state of | t and the energy eigenstate | e k .Here, C k = e k | ψ is the amplitude of the many-body wavefunction being in eigenstate | e k .One can further perform a quantum Fourier transform (QFT) such that the time register now stores the estimated energy E k of the corresponding many-body energy eigenstate | e k .By doing projective measurement on the N qubits in the time register, one can get the estimated energy E k and meanwhile project the many-body state into the energy eigenstate | e k with probability |C k | 2 , which can also be used as a state preparation protocol.

FIG. 14 : 1 √ 2 (
FIG.14: Scalable architecture with coupled cavity modules that stitch qubit strings in each module together.Four modules are illustrated here, and each module contains multiple qubits and one cavity mode which can be easily generalized to the multi-mode case.The red wires represent the additional ancilla qubits enabling teleportation in the spirit of Ref.[18].The ancilla are initially prepared in Bell state | Φ + = 1 √ 2 (| 00 + | 11 ) and are measured (and corrected) jointly with the neighboring cavity in the Bell-state basis to teleport the parity information the cavities have collected to the target cavity a 4 .The target cavity is rotated in order to exponentiate the qubit strings.After that, a mirror teleportation circuit is applied to erase the parity information.The circuit depth is of O(1).

TABLE I :
[6,7]2D spinful Fermi-Hubbard model on a 2 × 2 Summary of various properties of six different molecules (operator information based on Ref.[6,7]).The first row lists the number of Pauli terms in the Hamiltonian which can be grouped into sets of mutually commuting groups.The minimum number of such groups and the average number of terms per group appear in rows two and three, which dictate the minimum Trotter-step circuit depth and number of cavity ancillary modes needed for parallelization.Row four contains the average number Pauli operators in each term which determines the cavity load, i.e., the number of qubits interacting with a single cavity mode simultaneously.Finally the last row lists the average number of terms within each mutually commuting group that each qubit participates in, which determines the qubit load, i.e. the number of cavity modes interacting with each qubit simultaneously.

TABLE II :
Comparison of the conventional (local) and cavity-QED approaches with Jordan-Wigner and Bravyi-Kitaev encodings.The interaction strength and gate time are listed.Gate times and interaction strengths are approximate, and are based on Ref.
• • • j L .With this phase gate, one ends up with the phase factor exp[−2πi0.j k] = exp[−2πi j k /2].The measurement of the ancilla hence gives the digit j k .By ...