Eigenstate preparation for Hamiltonian systems is one promising application of noisy intermediate-scale quantum (NISQ) computers to achieve practical quantum advantage1,2,3,4,5,6,7. One of the representative hybrid quantum-classical algorithms for this task is the variational quantum eigensolver (VQE). It attempts to find the ground state of a given Hamiltonian H within a variational manifold of states that are generated by parametrized quantum circuits U(θ) acting on a reference state \(\left\vert \Psi_{0}\right\rangle\). The parameters θ are obtained by classically minimizing the energy cost function \(E(\boldsymbol{\theta})=\left\langle \Psi_{0}\right\vert U^{\dagger}(\boldsymbol{\theta})HU(\boldsymbol{\theta})\left\vert \Psi_{0}\right\rangle\), which is measured on quantum hardware2,3,4,8. The quality of a VQE calculation is tied to the ability of the variational ansatz to represent the ground state with high fidelity. In quantum computational chemistry, the unitary coupled cluster ansatz truncated at single and double excitations (UCCSD) has been extensively studied, owing to the success of the classical coupled cluster algorithm9,10,11. It was found that the application of the UCCSD ansatz is limited by the rapid circuit growth with system size and by the deteriorating accuracy in the presence of static electron correlations8,12,13. Therefore, alternative variants have been developed, including hardware-efficient ansätze, that improve the trainability and expressivity of the wave function ansatz3,13,14,15,16,17,18,19,20.

Indeed, it was found that compact and numerically exact variational ground state ansätze can be adaptively constructed for specific problems using approaches like the adaptive derivative-assembled pseudo-Trotter (ADAPT) ansatz13,16. The adaptive ansatz is typically obtained by successively appending parametrized unitaries to a variational circuit, with generators chosen from a predefined operator pool. In practice, the ADAPT-VQE algorithm works well with an operator pool composed of the fermionic excitation operators of the UCCSD ansatz. The extended qubit-ADAPT VQE approach16 utilizes an operator pool composed of the Pauli strings in the qubit representation of these fermionic excitation operators. It has been shown to generate significantly more compact ansätze than the original ADAPT-VQE method, at the price of introducing more variational parameters. As the circuit complexity (i.e., the number of two-qubit operations in the circuit) is a determining factor for practical calculations on NISQ devices, qubit-ADAPT is preferable and is chosen for the comparative study in this work. Regarding the scalability of the qubit-ADAPT method towards larger system sizes, we note that ref. 18 reports a favorable linear system-size scaling for the adaptive ansatz complexity of the nonintegrable mixed-field Ising model using the adaptive variational quantum imaginary time evolution method (AVQITE). AVQITE is known to generate variational circuits of complexity comparable to qubit-ADAPT VQE. As a first step to investigate the scalability in fermionic models, we here study qubit-ADAPT VQE for fermionic models with two and three spinful orbitals.

An alternative approach to constructing efficient wavefunction ansätze for problems in condensed matter physics is to exploit the sparsity of the Hamiltonian. Interacting electron systems are often simulated with reduced degrees of freedom, represented, for example, by a single-band Hubbard model. This simplified model features a sparse Hamiltonian including nearest-neighbor hopping and onsite Coulomb interactions only. Motivated by the simplicity of the Trotterized circuits for dynamics simulations due to Hamiltonian sparsity, the Hamiltonian variational ansatz (HVA) has been proposed by promoting the time in Trotter circuits to independent variational parameters21. The HVA ansatz has attracted much attention and turns out to be very successful in reaching a compact state representation for sparse Hamiltonian systems including local spin models21,22,23. Here, we propose to combine the flexibility of an adaptive approach with the efficiency of the HVA by designing a “Hamiltonian commutator” (HC) operator pool that contains pairwise commutators of operators that appear in the Hamiltonian.

To obtain a realistic description of correlated quantum materials, which typically contain partially filled d-orbitals (as in transition metal compounds) or f-orbitals (as in rare-earth and actinide systems), it is important to go beyond the single-orbital description of a simple Hubbard model24. Intriguing physics arises from the local Hund’s coupling of electrons in different atomic orbitals. Examples are bad metallic behavior with suppressed quasiparticle coherence and orbital-selective Mott transitions or superconducting pairing, which naturally require a multi-orbital description25,26,27,28. A multi-orbital model including additional inter-orbital hoppings and Hund’s couplings will necessarily make the Hamiltonian less sparse and consequently the HVA ansatz more complicated. Nevertheless, the complexity of material simulations can be greatly reduced by quantum embedding methods, which map the infinite system to coupled subsystems, typically a noninteracting effective medium and some many-body interacting impurity models24,29,30,31,32,33,34,35,36,37. These quantum embedding approaches have proven to be very effective for simulating correlated electron systems, including energies, electronic structure, magnetism, superconductivity, and spectral properties of multiple competing phases. The computational load in these approaches is shifted from the solution of a full lattice system to that of an interacting multi-orbital impurity model. Classical algorithms for solving the impurity problem, however, do not scale well with the number of orbitals, and the problem may be more tractable on quantum computers35,38.

In this paper, we compare the VQE circuit complexity for ground state preparation of multi-orbital many-body impurity models with a fixed HVA versus a qubit-ADAPT ansatz with different operator pools. An HC operator pool compatible with HVA is proposed to allow a fair comparison between qubit-ADAPT and fixed-ansatz HVA calculations. For comparison, we also include results from UCCSD and qubit-ADAPT calculations with a simplified UCCSD pool. To connect with quantum embedding methods for realistic materials simulations, we use the Gutzwiller embedding approach33,39,40,41,42,43,44 to generate the impurity models that we employ for our benchmark35,45. The quantum calculation we perform is general and could also be applied to other embedding methods. Numerical results from a noiseless state vector simulator and from a quantum assembly language (QASM)-based simulator with quantum sampling noise are presented. Important techniques for efficient circuit simulations of qubit-ADAPT VQE are discussed, including ways to simplify generators and reduce the operator pool size. We further investigate the impact of realistic gate noise by performing qubit-ADAPT VQE simulations with a realistic noise model including amplitude-damping and dephasing channels. Finally, we measure the energy cost function of the converged VQE ansatz for the \(e_g\) model composed of eight spin-orbitals on the IBM quantum processing unit (QPU) ibmq_casablanca and on Quantinuum hardware.

Results and discussion

Quantum embedding model

Here we focus on a specific quantum embedding method: the well-established Gutzwiller variational embedding approach for correlated material simulations33,39,40,41,42,43,44, which is known to be equivalent to rotationally invariant slave-boson theory at the saddle point approximation46,47. Recently, our group has developed a hybrid Gutzwiller quantum-classical embedding approach (GQCE)35. GQCE maps the ground state solution of a correlated electron lattice system to a coupled eigenvalue problem of a noninteracting quasiparticle Hamiltonian and one or multiple finite-size interacting embedding Hamiltonians44. Within GQCE one employs a quantum computer to find the ground state energy and the single-particle density matrix of the interacting embedding Hamiltonian, for example, using VQE.

The embedding Hamiltonian describes an impurity model consisting of a physical many-body \({N}_{{{{{{{{\mathcal{S}}}}}}}}}\)-orbital subsystem (\({\hat{{{{{{{{\mathcal{H}}}}}}}}}}_{{{{{{{{\mathcal{S}}}}}}}}}\)) coupled with a \({N}_{{{{{{{{\mathcal{B}}}}}}}}}\)-orbital quadratic bath (\({\hat{{{{{{{{\mathcal{H}}}}}}}}}}_{{{{{{{{\mathcal{B}}}}}}}}}\)):



$$\hat{\mathcal{H}}_{\mathcal{S}}=\sum_{\alpha\beta}\sum_{\sigma}\epsilon_{\alpha\beta}\,\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\beta\sigma}+\frac{1}{2}\sum_{\alpha\beta\gamma\delta}\sum_{\sigma\sigma'}V_{\alpha\beta\gamma\delta}\,\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\gamma\sigma'}^{\dagger}\hat{c}_{\delta\sigma'}\hat{c}_{\beta\sigma},$$
$$\hat{\mathcal{H}}_{\mathcal{B}}=-\sum_{ab}\sum_{\sigma}\lambda_{ab}\,\hat{f}_{a\sigma}^{\dagger}\hat{f}_{b\sigma},$$
$$\hat{\mathcal{H}}_{\mathcal{SB}}=\sum_{a\alpha}\sum_{\sigma}\left(\mathcal{D}_{a\alpha}\,\hat{c}_{\alpha\sigma}^{\dagger}\hat{f}_{a\sigma}+\mathrm{h.c.}\right).$$

Here α, β, γ, δ are composite indices for sites and spatial orbitals in the physical subsystem. Likewise, the bath sites and orbitals are labeled by a, b, and σ is the spin index. The fermionic ladder operators \({\hat{c}}^{{{{\dagger}}} }\) and \({\hat{f}}^{{{{\dagger}}} }\) are used to distinguish the physical and bath orbital sites.

The one-body component and two-body Coulomb interaction in the physical subsystem are specified by matrix ϵ and tensor V. The quadratic bath and its coupling to the subsystem are defined by matrix λ and \({{{{{{{\mathcal{D}}}}}}}}\), respectively. Compared with typical quantum chemistry calculations, the embedding Hamiltonian is much sparser since the two-body interaction only exists between electrons in the physical subsystem.
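To make the structure of the embedding Hamiltonian concrete, the following minimal sketch builds the smallest case, a (1, 1) impurity model with one physical and one bath orbital, as a dense matrix. It assumes a Jordan-Wigner mapping (not the parity encoding used later in the text), and the function names and parameter values are hypothetical choices for illustration only.

```python
import numpy as np

def annihilators(n_modes):
    """Jordan-Wigner matrices a_0 ... a_{n-1} acting on n_modes qubits."""
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])  # |0><1| removes a particle
    z = np.diag([1.0, -1.0])
    ops = []
    for k in range(n_modes):
        op = np.eye(1)
        for factor in [z] * k + [lower] + [np.eye(2)] * (n_modes - k - 1):
            op = np.kron(op, factor)
        ops.append(op)
    return ops

def embedding_hamiltonian(eps, U, lam, D):
    """(1, 1) impurity model; mode order: c_up, c_dn, f_up, f_dn."""
    c_up, c_dn, f_up, f_dn = annihilators(4)

    def number(a):
        return a.T @ a  # a^dagger a (all matrices here are real)

    h_s = eps * (number(c_up) + number(c_dn)) + U * number(c_up) @ number(c_dn)
    h_b = -lam * (number(f_up) + number(f_dn))
    h_sb = sum(D * (c.T @ f + f.T @ c) for c, f in [(c_up, f_up), (c_dn, f_dn)])
    return h_s + h_b + h_sb

def ground_energy(H, n_electrons):
    """Lowest eigenvalue of H restricted to a fixed total particle number."""
    # In this encoding the particle number of basis state i is its bit count.
    counts = np.array([bin(i).count("1") for i in range(H.shape[0])])
    keep = np.flatnonzero(counts == n_electrons)
    return np.linalg.eigvalsh(H[np.ix_(keep, keep)])[0]

# Illustrative parameters (assumptions, not values from the text).
H = embedding_hamiltonian(eps=-0.3, U=7.0, lam=0.2, D=0.5)
e_half_filling = ground_energy(H, n_electrons=2)
```

For a single physical orbital the two-body term reduces to the on-site repulsion between opposite spins, which is why only `U` appears. Half filling of the four spin-orbitals corresponds to `n_electrons=2`.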

For clarity, we refer to the embedding Hamiltonian defined above as the (\(N_{\mathcal{S}},N_{\mathcal{B}}\)) impurity model, where \(N_{\mathcal{S}}\) and \(N_{\mathcal{B}}\) are the numbers of spatial orbitals in the physical subsystem and in the bath, respectively. Within GQCE, the ground state solution of the embedding Hamiltonian at half-electron filling is needed, which is achieved by a chemical potential absorbed in the one-body Hamiltonian coefficient matrices ϵ and λ in Eq. (1).

In the numerical simulations presented here, we choose a Gutzwiller embedding Hamiltonian for the degenerate \(\mathcal{M}\)-band Hubbard model. The noninteracting density of states of the lattice model adopts a semi-circular form \(\rho(\omega)=\frac{2\mathcal{M}}{\pi D}\sqrt{1-(\omega/D)^{2}}\) as shown in Fig. 1a, which corresponds to the Bethe lattice in infinite dimensions. In the following, we set the half bandwidth D = 1 as the energy unit. In physical systems, D is of the order of a few eV. The Coulomb matrix V takes the Kanamori form specified by the Hubbard U and Hund’s J parameters: \(V_{\alpha\alpha\alpha\alpha}=U\), \(V_{\alpha\alpha\beta\beta}=U-2J\), and \(V_{\alpha\beta\alpha\beta}=V_{\alpha\beta\beta\alpha}=J\) for \(\alpha\neq\beta\). Here we have assumed spin and orbital rotational invariance (within the \(e_g\) or \(t_{2g}\) manifold) for simplicity and to limit the interaction parameter space.
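As an illustration of the Kanamori form, the sketch below fills the Coulomb tensor for a given number of orbitals. The function name `kanamori_tensor` and the array layout `V[a, b, c, d]` are assumptions made for this example; the numerical values U = 7 and J/U = 0.3 are the ones quoted later in the text.

```python
import numpy as np

def kanamori_tensor(n_orb, U, J):
    """Coulomb tensor with V_aaaa = U, V_aabb = U - 2J, V_abab = V_abba = J."""
    V = np.zeros((n_orb,) * 4)
    for a in range(n_orb):
        V[a, a, a, a] = U
        for b in range(n_orb):
            if a != b:
                V[a, a, b, b] = U - 2 * J
                V[a, b, a, b] = J
                V[a, b, b, a] = J
    return V

U = 7.0
J = 0.3 * U
V_eg = kanamori_tensor(2, U, J)    # two-orbital (e_g) case
V_t2g = kanamori_tensor(3, U, J)   # three-orbital (t_2g) case

# Grouping the index pairs (ab) and (cd) gives a symmetric supermatrix.
V_super = V_eg.reshape(4, 4)
```

With these parameters the inter-orbital entry is U − 2J = 2.8 and the remaining nonzero off-diagonal entries equal J = 2.1.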

Fig. 1: Model setup.

a The noninteracting density of states (DOS) of the degenerate multi-band Hubbard-Hund lattice model on the Bethe lattice has a semicircular shape. b (\(\mathcal{M},\mathcal{M}\)) impurity model with \(\mathcal{M}\)-fold degenerate correlated orbitals coupled to \(\mathcal{M}\) bath orbitals. The interactions among the physical orbitals are specified by the Coulomb matrix V. Due to symmetry, each physical orbital (positioned at zero energy level) is coupled with a single bath orbital at energy level λ with a coupling parameter \(\mathcal{D}\). The models with \(\mathcal{M}=2\) and 3 correspond to \(e_g\) and \(t_{2g}\) orbitals in cubic crystal symmetry, respectively.

The embedding Hamiltonian, as illustrated in Fig. 1b, is represented with \(2\mathcal{M}\) spatial orbitals: \(\mathcal{M}\) degenerate physical orbitals plus \(\mathcal{M}\) degenerate bath orbitals. The symmetry of the model reduces the matrices ϵ, λ and \(\mathcal{D}\) to single parameters multiplying the identity.

In the following, we set the electron filling for the lattice model to \(\mathcal{M}+1\), which is one unit larger than half-filling, and fix the Hubbard interaction to U = 7 and the ratio of Hund’s to Hubbard interaction to J/U = 0.3. These parameters put the model deep in the correlation-induced bad metallic state, with physical properties distinct from doped Mott insulators25. It represents a wide class of strongly correlated materials, such as iron pnictides and chalcogenides, where Hund’s coupling significantly reduces the low-energy quasiparticle coherence scale26,48,49. Hund’s metal physics is far beyond a static mean-field description and requires treating the localized and itinerant characters of electrons on equal footing, which can be realized in the quantum embedding approach adopted here.

In the calculations below, we consider \(\mathcal{M}=2\) and \(\mathcal{M}=3\), which correspond to \(e_g\) and \(t_{2g}\) orbitals in cubic crystal symmetry, respectively. The associated \((N_{\mathcal{S}},N_{\mathcal{B}})=(2,2)\) and (3, 3) impurity models have in total 8 and 12 spin-orbitals, respectively. The two models host nontrivial many-body ground states and represent important checkpoints along the path to achieve a practical quantum advantage in correlated materials simulations through a hybrid quantum-classical embedding framework. In the quantum simulations reported below, parity encoding, which exploits the conservation of the total number of electrons and of the spin z-component, is used to transform the fermionic Hamiltonian to the qubit representation.

Variational quantum eigensolvers

GQCE leverages quantum computing technologies to solve for the ground state of the embedding Hamiltonian, specifically the energy and one-particle density matrix. Note that the ground state is always prepared at half-filling for the embedding system, which is determined by the Gutzwiller embedding algorithm and is independent of the actual electron filling of the physical lattice model33,44. For this purpose, we benchmark multiple versions of VQE with fixed or adaptively generated ansätze to prepare the ground state of the above embedding Hamiltonian. We consider VQE calculations with a fixed UCCSD ansatz and the associated qubit-ADAPT VQE using a simplified UCCSD operator pool. The calculations are naturally performed in the molecular orbital (MO) basis representation, where the reference Hartree–Fock (HF) state becomes a simple tensor product state and fermionic excitation operators can be naturally defined. However, using an MO representation comes at the cost of reducing the sparsity of the embedding Hamiltonian compared to the atomic orbital (AO) basis representation. To take advantage of the Hamiltonian sparsity in the AO representation, we consider a generalized form of the HVA and the associated qubit-ADAPT VQE with a modified HC operator pool.

VQE algorithm

For an Nq-qubit system with Hamiltonian \(\hat{{{{{{{{\mathcal{H}}}}}}}}}\), VQE amounts to minimizing the cost function \(E({{{{{{{\boldsymbol{\theta }}}}}}}})=\langle {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]| \,\hat{{{{{{{{\mathcal{H}}}}}}}}}\,| {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\rangle\) with respect to the variational parameters θ, as schematically illustrated in Fig. 2. Here, \(\left\vert {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle =\hat{U}({{{{{{{\boldsymbol{\theta }}}}}}}})\left\vert {{{\Psi }}}_{0}\right\rangle\) is obtained by application of a parametrized quantum circuit \(\hat{U}({{{{{{{\boldsymbol{\theta }}}}}}}})\) onto a reference state \(\left\vert {{{\Psi }}}_{0}\right\rangle\). The cost function is evaluated on a quantum computer and the optimization is performed classically using E(θ) as input. The accuracy of VQE is therefore tied to the variational ansatz \(\left\vert {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle\) and to the performance of the classical optimization, e.g., how often the cost function is called during the optimization and how well the approach converges to the global (as opposed to a local) minimum of E(θ).

Fig. 2: Schematic illustration of the variational quantum eigensolver algorithm.

Given an initial guess for the parameter vector θ, the many-body state is prepared using the parametrized circuit \(\hat{U}(\boldsymbol{\theta})\) on the quantum computer. A set of measurements is performed in the computational basis to estimate the cost function E(θ), possibly including classical postprocessing for error mitigation. This value is subsequently passed to a classical optimizer. The parameters θ are then updated by the optimizer, which triggers a new iteration of state preparation and energy measurement. The cycle continues until E(θ) converges.
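The loop described above can be sketched for a one-qubit toy problem. Everything in this example is an assumption made for illustration: the Hamiltonian Z + X, the single-parameter ansatz exp(−iθY)|0⟩, exact state-vector evaluation in place of hardware measurements, and plain gradient descent with a parameter-shift gradient in place of the optimizers used in the paper.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H = Z + X                                  # toy Hamiltonian (assumption)
psi0 = np.array([1, 0], dtype=complex)     # reference state |0>

def energy(theta):
    """Cost function E(theta) for the ansatz exp(-i theta Y)|0>."""
    psi = np.cos(theta) * psi0 - 1j * np.sin(theta) * (Y @ psi0)
    return np.vdot(psi, H @ psi).real

def vqe(theta=0.1, lr=0.1, max_iter=500, tol=1e-10):
    """Iterate: evaluate energy, update theta, stop when E stops changing."""
    e_old = energy(theta)
    for _ in range(max_iter):
        # Parameter-shift rule for a Pauli generator in exp(-i theta P):
        # dE/dtheta = E(theta + pi/4) - E(theta - pi/4).
        grad = energy(theta + np.pi / 4) - energy(theta - np.pi / 4)
        theta -= lr * grad
        e_new = energy(theta)
        if abs(e_new - e_old) < tol:
            break
        e_old = e_new
    return theta, e_new

theta_opt, e_opt = vqe()
e_exact = np.linalg.eigvalsh(H)[0]   # exact ground energy for comparison
```

For this toy problem the ansatz can represent the exact ground state, so the converged energy agrees with direct diagonalization. On hardware, each call to `energy` would be replaced by repeated circuit executions and averaging.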

UCCSD ansatz

The UCCSD ansatz takes the following form:

$$\left\vert {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle ={e}^{\hat{T}[{{{{{{{\boldsymbol{\theta }}}}}}}}]-{\hat{T}}^{{{{\dagger}}} }[{{{{{{{\boldsymbol{\theta }}}}}}}}]}\left\vert {{{\Psi }}}_{0}\right\rangle ={e}^{-i{\sum }_{j}{\theta }_{j}{f}_{j}(\{\hat{\sigma }\})}\left\vert {{{\Psi }}}_{0}\right\rangle .$$

The operator \(\hat{T}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\) consists of single and double excitation operators with respect to the HF reference state \(\left\vert {{{\Psi }}}_{0}\right\rangle\):

$$\hat{T}[{{{{{{{\boldsymbol{\theta }}}}}}}}]=\mathop{\sum}\limits_{p\bar{p}}{\theta }_{p}^{\bar{p}}{\hat{c}}_{\bar{p}}^{{{{\dagger}}} }{\hat{c}}_{p}+\mathop{\sum}\limits_{p < q,\bar{p} < \bar{q}}{\theta }_{pq}^{\bar{p}\bar{q}}{\hat{c}}_{\bar{p}}^{{{{\dagger}}} }{\hat{c}}_{\bar{q}}^{{{{\dagger}}} }{\hat{c}}_{q}{\hat{c}}_{p}.$$

Here p, q and \(\bar{p},\bar{q}\) refer to the occupied and unoccupied MOs, respectively, with spin included implicitly. \(f_{j}(\{\hat{\sigma}\})=\sum_{k}w_{jk}\hat{P}_{k}\) is a weighted sum of Pauli strings (\(\hat{P}_{k}\in\{I,X,Y,Z\}^{\otimes N_{q}}\)) for the qubit representation of the fermionic excitation operator associated with parameter θj. Here θj runs over the set of parameters \(\theta_{p}^{\bar{p}}\) and \(\theta_{pq}^{\bar{p}\bar{q}}\). For the impurity model without spin–orbit interaction, only excitation operators that separately conserve the number of electrons in the spin-up and spin-down sectors need to be considered. In practical implementations, a single-step Trotter approximation is often adopted to construct the UCCSD circuit:

$$\left\vert {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle \approx \mathop{\prod}\limits_{jk}{e}^{-i{\theta }_{j}{w}_{jk}{\hat{P}}_{k}}\left\vert {{{\Psi }}}_{0}\right\rangle .$$

Furthermore, the final circuit state generally depends on the order of the unitary gates. In the calculations reported here, we apply gates with single-excitation operators first following the implementation in Qiskit50.
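The order dependence of the Trotterized circuit can be checked numerically on a two-qubit toy example. The two Pauli strings and the angles below are arbitrary assumptions chosen so that the generators anticommute; they are not taken from the UCCSD circuits of the paper.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    """Dense matrix of a Pauli string such as 'XY' (leftmost = qubit 0)."""
    mat = np.eye(1, dtype=complex)
    for ch in label:
        mat = np.kron(mat, PAULI[ch])
    return mat

def rot(theta, P):
    """exp(-i theta P) for a Pauli string P, using P @ P = identity."""
    return np.cos(theta) * np.eye(P.shape[0]) - 1j * np.sin(theta) * P

A, B = pauli("XY"), pauli("YI")   # anticommuting generators (toy choice)
a, b = 0.5, 0.5
psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0

psi_ab = rot(a, A) @ rot(b, B) @ psi0   # B applied first, then A
psi_ba = rot(b, B) @ rot(a, A) @ psi0   # A applied first, then B

# Untrotterized reference: exp(-i (a A + b B)) via eigendecomposition.
w, v = np.linalg.eigh(a * A + b * B)
psi_exact = (v * np.exp(-1j * w)) @ v.conj().T @ psi0

order_gap = np.linalg.norm(psi_ab - psi_ba)
trotter_gap = np.linalg.norm(psi_ab - psi_exact)
```

Both gaps are nonzero here: the two orderings give different states, and neither equals the exponential of the summed generator. For commuting generators both gaps would vanish.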

Qubit-ADAPT VQE with simplified UCCSD pool

VQE-UCCSD is a useful reference point for quantum chemistry calculations. However, the fixed UCCSD ansatz has limited accuracy and often involves deep quantum circuits for implementations. Various approaches have been proposed to construct a more compact variational ansatz with systematically improvable accuracy. In this work, we will focus on the qubit-ADAPT VQE method16, where the ansatz takes a similar pseudo-Trotter form:

$$\left\vert {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle =\mathop{\prod }\limits_{j=1}^{{N}_{{{{{{{{\boldsymbol{\theta }}}}}}}}}}{e}^{-i{\theta }_{j}{\hat{P}}_{j}}\left\vert {{{\Psi }}}_{0}\right\rangle .$$

With qubit-ADAPT, the ansatz is recursively expanded by adding one unitary at a time, followed by reoptimization of parameters. The additional unitary is constructed with a generator selected from a predefined Pauli string pool which gives maximal energy gradient amplitude \(| g{| }_{\max }\) at the preceding ansatz state. The ansatz expansion process iterates until convergence, which is set by \(| g{| }_{\max } < 1{0}^{-4}\) here. Note that we have set the half bandwidth of the original noninteracting lattice model to D = 1, such that \(| g{| }_{\max } \sim 0.1\) meV in physical systems with D ~ 1 eV.
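A minimal state-vector sketch of this adaptive loop is given below for a two-qubit toy Hamiltonian. The Hamiltonian, the pool, and the helper names are assumptions for illustration. The reoptimization step uses coordinate-wise closed-form updates (possible because the energy is sinusoidal in each Pauli rotation angle) as a simple stand-in, not necessarily the optimizer used in the paper.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    mat = np.eye(1, dtype=complex)
    for ch in label:
        mat = np.kron(mat, PAULI[ch])
    return mat

def prepare(thetas, gens, psi0):
    """Apply exp(-i theta_j P_j) in the order the generators were appended."""
    psi = psi0.copy()
    for th, P in zip(thetas, gens):
        psi = np.cos(th) * psi - 1j * np.sin(th) * (P @ psi)
    return psi

def energy(thetas, gens, psi0, H):
    psi = prepare(thetas, gens, psi0)
    return np.vdot(psi, H @ psi).real

def reoptimize(thetas, gens, psi0, H, sweeps=20):
    """Stand-in optimizer: exact minimization over one angle at a time."""
    thetas = list(thetas)
    for _ in range(sweeps):
        for j in range(len(thetas)):
            def e(t):
                return energy(thetas[:j] + [t] + thetas[j + 1:], gens, psi0, H)
            # E(t) = A + B cos(2t) + C sin(2t) for a Pauli-string generator.
            B = 0.5 * (e(0.0) - e(np.pi / 2))
            C = 0.5 * (e(np.pi / 4) - e(-np.pi / 4))
            thetas[j] = 0.5 * np.arctan2(-C, -B)
    return thetas

def adapt_vqe(H, pool_labels, psi0, tol=1e-4, max_ops=8):
    pool = {lab: pauli(lab) for lab in pool_labels}
    thetas, gens, chosen = [], [], []
    for _ in range(max_ops):
        psi = prepare(thetas, gens, psi0)
        # Gradient criterion 2 Im<psi| P H |psi>; only its magnitude is used.
        grads = {lab: 2 * np.vdot(psi, P @ (H @ psi)).imag
                 for lab, P in pool.items()}
        best = max(grads, key=lambda lab: abs(grads[lab]))
        if abs(grads[best]) < tol:
            break
        gens.append(pool[best])
        chosen.append(best)
        thetas = reoptimize(thetas + [0.0], gens, psi0, H)
    return energy(thetas, gens, psi0, H), chosen

# Toy problem (assumption): real two-qubit Hamiltonian and an odd-Y pool.
H = pauli("ZZ") + 0.7 * (pauli("XI") + pauli("IX"))
psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0
e_adapt, chosen = adapt_vqe(H, ["YI", "IY", "YX", "XY", "YZ", "ZY"], psi0)
e_exact = np.linalg.eigvalsh(H)[0]
```

Because each new angle starts at zero and every coordinate update is an exact line minimization, the energy never increases from one step to the next. The cap `max_ops` is a safeguard for this sketch and has no counterpart in the text.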

The computational complexity of qubit-ADAPT VQE calculations is tied to the size of the operator pool, which consists of a set of Pauli strings. Naturally, one can construct an operator pool using all the Pauli strings in the qubit representation of fermionic single and double excitation operators. However, the dimension of this UCCSD-compatible pool is usually large and scales as \(\mathcal{O}(N_{q}^{4})\). Here we propose a much-simplified operator pool, which consists of Pauli strings from single-excitation and paired double-excitation operators only. A pair excitation moves a pair of electrons with opposite spins, which initially occupy the same spatial MO, together to another initially unoccupied spatial MO. To further reduce the circuit depth, only one Pauli string is chosen from each qubit representation of the fermionic excitation operator. The qubit representation is a weighted sum of equal-length Pauli strings, and the specific choice among them does not appear to matter in the practical calculations reported here. This simplified pool, containing operators arising from the UCC ansatz restricted to single and paired double excitation operators (sUCCSpD)51,52, greatly reduces the number of Pauli strings compared to the UCCSD pool. The dimension of this sUCCSpD pool scales as \(\mathcal{O}(N_{q}^{2})\). For the (2, 2) \(e_g\) impurity model, the pool size reduces from 152 for UCCSD to 56 for sUCCSpD, and for the (3, 3) \(t_{2g}\) impurity model it reduces from 828 to 192. The code to perform the above qubit-ADAPT VQE calculations at the state vector level, with examples, is available on figshare53.

Hamiltonian variational ansatz

The Hamiltonian sparsity in the AO basis naturally motivates the application of the Hamiltonian variational ansatz21, which generally takes the form of multi-layer Trotterized annealing-like circuits. While different ways of designing specific HVA forms have been developed, we propose the following ansatz with L layers for the impurity model:

$$\left\vert {{\Psi }}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle =\mathop{\prod }\limits_{l=1}^{L}\mathop{\prod }\limits_{j=1}^{{N}_{{{{{{{{\rm{G}}}}}}}}}}{e}^{-i{\theta }_{lj}{\hat{h}}_{j}}\left\vert {{{\Psi }}}_{0}\right\rangle .$$

Here \(\hat{{{{{{{{\mathcal{H}}}}}}}}}=\mathop{\sum }\nolimits_{j = 1}^{{N}_{{{{{{{{\rm{G}}}}}}}}}}{\hat{h}}_{j}\), with \({\hat{h}}_{j}\) being a subgroup of Hamiltonian terms which share the same coefficient and mutually commute. Such ansatz construction aims to differentiate the physical and bath orbitals while retaining the degeneracy information among the orbitals in a systematic way. For each layer of unitaries, we first apply the multi-qubit rotations that are generated by the interacting part of the Hamiltonian, since these act as entangling gates. For the (\({{{{{{{\mathcal{M}}}}}}}},{{{{{{{\mathcal{M}}}}}}}}\)) impurity model, two reference states have been tried: \(\left\vert {{{\Psi }}}_{0}^{{{{{{{{\rm{(I)}}}}}}}}}\right\rangle\) is a simple tensor product state with \({{{{{{{\mathcal{M}}}}}}}}\) physical orbitals fully occupied and the bath orbitals empty; \(\left\vert {{{\Psi }}}_{0}^{{{{{{{{\rm{(II)}}}}}}}}}\right\rangle\) is the ground state of the noninteracting part of \(\hat{{{{{{{{\mathcal{H}}}}}}}}}\), which is equivalent to the one-electron core Hamiltonian in quantum chemistry. We did not find any significant difference between the two choices of reference state in practical simulations of the impurity models. Therefore, only HVA calculations with the reference state \(\left\vert {{{\Psi }}}_{0}^{{{{{{{{\rm{(I)}}}}}}}}}\right\rangle\) are reported here. We adopt the gradient-based Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm as the classical optimizer. Proper parameter initialization for HVA optimization is crucial, as barren plateaus and local energy minima are generally present in the variational energy landscape. In practice, we find that a uniform initialization of the parameters, such as setting all to π/7, overall works well for simulations reported here.
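The layered structure of this ansatz can be sketched as follows for a two-qubit toy Hamiltonian split into two groups. The grouping, the reference state, and the helper names are assumptions for illustration; only the uniform initialization at π/7 is taken from the text.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    mat = np.eye(1, dtype=complex)
    for ch in label:
        mat = np.kron(mat, PAULI[ch])
    return mat

def group_unitary(theta, h):
    """exp(-i theta h) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T

def hva_state(thetas, groups, psi0):
    """Apply L layers; thetas has shape (L, N_G), groups = [h_1, ..., h_NG]."""
    psi = psi0.astype(complex)
    for layer in thetas:
        for theta, h in zip(layer, groups):
            psi = group_unitary(theta, h) @ psi
    return psi

# Toy grouping (assumption): a two-qubit coupling term is applied first in
# each layer, followed by the single-qubit terms.
groups = [pauli("ZZ"), 0.7 * (pauli("XI") + pauli("IX"))]
H = sum(groups)
psi0 = np.zeros(4)
psi0[0] = 1.0

n_layers = 3
thetas = np.full((n_layers, len(groups)), np.pi / 7)  # uniform initialization
psi = hva_state(thetas, groups, psi0)
e_init = np.vdot(psi, H @ psi).real   # starting point for a classical optimizer
```

The array `thetas` would then be handed to a classical optimizer such as BFGS, with `e_init` as the first cost function value.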

Inspired by the idea of adaptive ansatz generation13, we also tried constructing and optimizing an L-layer HVA ansatz by adaptively adding layers from 1 to L. Specifically, the calculation starts with optimizing a single-layer ansatz, followed by appending another layer to the ansatz while keeping the first layer at previously obtained optimal angles. The two-layer ansatz is then optimized with the parameters for the new layer initialized randomly or uniformly. The procedure continues with the optimization of l-layer ansatz leveraging the (l−1)-layer solution until the ansatz reaches L layers.

Let the number of cost function evaluations for optimizing an l-layer ansatz be \(N_{l}^{(2)}\). The total number of function evaluations amounts to \(N^{(2)}=\sum_{l=1}^{L}N_{l}^{(2)}\). In practice, we find that the direct optimization of the L-layer ansatz using a uniform initialization takes \(N^{(1)}\) function evaluations with \(N^{(1)}\sim N_{L}^{(2)}<N^{(2)}\), and reaches the same accuracy. Starting with L layers is therefore more efficient than growing the ansatz layer by layer.

Intuitively, this can be related to the fact that successive HVA optimization introduces discontinuities in the variational path toward the ground state whenever a new layer of unitaries is added. Since the energy gradient associated with new variational parameters that are initialized to zero (for continuity) vanishes (see the “Methods” section), they have to be initialized away from zero. In other words, the (l−1)-layer HVA solution is not a good starting point for the optimization of the l-layer ansatz. The open-source code to perform the above HVA calculations at the state vector level, with examples, is available on figshare54.

Hamiltonian commutator pool

It has been demonstrated that the qubit-ADAPT VQE in the MO basis outperforms VQE-UCCSD calculations regarding circuit complexity and numerical accuracy13,16. Motivated by this observation, we compare the corresponding qubit-ADAPT VQE with a Hamiltonian-compatible pool in the AO basis against HVA calculations. Following HVA, we choose the simple tensor product state \(\left\vert \Psi_{0}^{\mathrm{(I)}}\right\rangle\) as the reference state. In the qubit-ADAPT step, the energy gradient criterion \(g_{\theta}=2\,\mathrm{Im}[\langle \Psi[\boldsymbol{\theta}]|\,\hat{P}\hat{\mathcal{H}}\,|\Psi[\boldsymbol{\theta}]\rangle]\) for appending a new unitary generated by \(\hat{P}\) vanishes by symmetry if the number of Pauli-Y operators in the Pauli string \(\hat{P}\) is even13,55. This can be shown by the following argument. Because the impurity model in this study respects time-reversal symmetry and spin-flip (Z2) symmetry, both the Hamiltonian \(\hat{\mathcal{H}}\) and the wavefunction are real (\(\hat{\mathcal{H}}=\hat{\mathcal{H}}^{*}\), \(\Psi[\boldsymbol{\theta}]=\Psi[\boldsymbol{\theta}]^{*}\)). The Pauli string \(\hat{P}\) is also real (\(\hat{P}=\hat{P}^{*}\)) if it has an even number of Pauli-Y operators. Consequently, the expectation value \(\langle \Psi[\boldsymbol{\theta}]|\,\hat{P}\hat{\mathcal{H}}\,|\Psi[\boldsymbol{\theta}]\rangle\) is real and \(g_{\theta}\) vanishes if the associated generator \(\hat{P}\) has an even number of Pauli-Y operators.
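The parity argument can be verified numerically. The sketch below draws a random real symmetric Hamiltonian and a random real state on three qubits (both are arbitrary stand-ins, not the impurity model) and evaluates the gradient criterion for every three-qubit Pauli string.

```python
import numpy as np
from itertools import product

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    mat = np.eye(1, dtype=complex)
    for ch in label:
        mat = np.kron(mat, PAULI[ch])
    return mat

rng = np.random.default_rng(0)
n_qubits = 3
dim = 2 ** n_qubits

M = rng.standard_normal((dim, dim))
H = M + M.T                              # random real symmetric Hamiltonian
psi = rng.standard_normal(dim)
psi /= np.linalg.norm(psi)               # random real normalized state

grads = {}
for letters in product("IXYZ", repeat=n_qubits):
    label = "".join(letters)
    grads[label] = 2 * np.vdot(psi, pauli(label) @ (H @ psi)).imag

max_even = max(abs(g) for lab, g in grads.items() if lab.count("Y") % 2 == 0)
max_odd = max(abs(g) for lab, g in grads.items() if lab.count("Y") % 2 == 1)
```

Here `max_even` is zero up to rounding, while `max_odd` is generically finite, so only Pauli strings with an odd number of Y operators are useful pool elements for a real Hamiltonian and a real state.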

By construction, the sUCCSpD pool consists of Pauli strings of an odd number of Y’s. However, the Hamiltonian of the impurity models studied here is all real. Consequently, all the Pauli strings in the qubit representation of the Hamiltonian contain an even number of Y’s, which excludes the option of directly constructing the operator pool from the Hamiltonian operators. Nevertheless, the practical usefulness of HVA implies that the Hamiltonian-like pool can be constructed by commuting the Hamiltonian terms, which we call the Hamiltonian commutator (HC) pool \({{{{{{{{\mathscr{P}}}}}}}}}_{{{{{{{{\rm{HC}}}}}}}}}\). Mathematically \({{{{{{{{\mathscr{P}}}}}}}}}_{{{{{{{{\rm{HC}}}}}}}}}\) is constructed in the following manner:

$$\mathscr{P}_{\mathrm{HC}}=\left\{\frac{1}{2i}[\hat{P},\hat{P}']\;\middle|\;\hat{P},\hat{P}'\in\mathscr{P}_{\mathrm{H}},\ \text{and}\ N_{Y}([\hat{P},\hat{P}'])\bmod 2=1\right\}.$$

Here \(\mathscr{P}_{\mathrm{H}}\) is the set of Pauli strings \(\{\hat{P}_{h}\}\) present in the qubit representation of the Hamiltonian \(\hat{\mathcal{H}}=\sum_{h}w_{h}\hat{P}_{h}\), and \(N_{Y}(\hat{P})\) counts the number of Y operators in the Pauli string \(\hat{P}\). Therefore, the size of \(\mathscr{P}_{\mathrm{HC}}\) can scale as \(N_{\mathrm{H}}^{2}\), where \(N_{\mathrm{H}}\) is the total number of Hamiltonian terms. Clearly, the pool \(\mathscr{P}_{\mathrm{HC}}\) should only be applied to sparse Hamiltonian systems. The dimension of the HC pool is 56 for the \(e_g\) impurity model, and 192 for the \(t_{2g}\) model.
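The pool construction can be sketched symbolically on Pauli-string labels. The toy list of Hamiltonian strings below is an assumption for illustration, and overall signs of the commutators are dropped since they can be absorbed into the variational angle.

```python
from itertools import combinations

def anticommute(p, q):
    """Two Pauli strings anticommute iff they carry different non-identity
    letters on an odd number of qubits."""
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 1

def product_label(p, q):
    """Pauli string proportional to the product p * q (phase dropped)."""
    out = []
    for a, b in zip(p, q):
        if a == "I":
            out.append(b)
        elif b == "I":
            out.append(a)
        elif a == b:
            out.append("I")
        else:
            out.append(({"X", "Y", "Z"} - {a, b}).pop())
    return "".join(out)

def hc_pool(h_strings):
    """Commutators of Hamiltonian Pauli strings with an odd number of Y's."""
    pool = set()
    for p, q in combinations(h_strings, 2):
        if anticommute(p, q):              # otherwise the commutator vanishes
            r = product_label(p, q)        # [p, q] = 2 p q, proportional to r
            if r.count("Y") % 2 == 1:
                pool.add(r)
    return sorted(pool)

# Toy real Hamiltonian on two qubits (assumption): every string has an even
# number of Y operators.
h_strings = ["ZI", "IZ", "ZZ", "XX", "YY"]
pool = hc_pool(h_strings)
```

For this toy input the pool contains the two strings XY and YX. Distinct pairs can produce the same commutator, which is why the result is collected in a set and the pool can be much smaller than the number of pairs.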

Quantum circuit implementation

Performing a calculation on a quantum computer always needs to deal with the presence of noise. Even for ideal fault-tolerant quantum computers, quantum sampling (or shot) noise is present due to a finite number of measurements that are used to estimate expectation values. The current noisy quantum devices exhibit additional noise originating from qubit relaxation and dephasing as well as hardware imperfections when implementing unitary gate operations. In this subsection, we describe several techniques adopted in our simulations to most efficiently use the available quantum resources and stabilize the calculations against sampling noise. We discuss how to mitigate gate noise in the final subsection.

Measurement circuit reduction

The quantum circuit implementation for VQE and its adaptive version amounts to the direct measurement of the Hamiltonian as a weighted sum of Pauli string expectation values, \(\langle \hat{{{{{{{{\mathcal{H}}}}}}}}}\rangle ={\sum }_{h}{w}_{h}\langle {\hat{P}}_{h}\rangle\), with respect to parametrized circuits U[θ]. Here, \(\hat{{{{{{{{\mathcal{H}}}}}}}}}={\sum }_{h}{w}_{h}{\hat{P}}_{h}\) is the Hamiltonian in qubit representation. Because the number of shots (or repeated measurements) scales with the desired precision ϵ as \({N}_{{{{{{{{\rm{sh}}}}}}}}}\propto \frac{1}{{\epsilon }^{2}}\) due to the central limit theorem, Nsh is often huge in practical calculations. Therefore, it is desirable to group the Pauli strings into mutually commuting sets such that the number of distinct measurement circuits is reduced to a minimum. Indeed, many techniques to achieve such measurement reduction have been developed56,57,58,59,60,61. In this work, we adopt the measurement reduction strategy based on the Hamiltonian integral factorization61, which shows a favorable linear system-size scaling of the number of distinct measurement circuits and embraces a diagonal representation for the operators to be measured.
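The shot-count scaling can be illustrated with a small Monte Carlo experiment for a single Pauli observable with outcomes ±1. The expectation value, shot count, and number of repetitions below are arbitrary assumptions for this sketch.

```python
import numpy as np

def shots_for_precision(expval, eps):
    """Shots needed for standard error eps when sampling a +/-1 observable."""
    # Variance of a single outcome is 1 - <P>^2, so eps^2 = (1 - <P>^2) / N.
    return (1.0 - expval ** 2) / eps ** 2

rng = np.random.default_rng(1)
expval, n_shots, n_repeats = 0.3, 400, 4000
p_plus = (1 + expval) / 2                  # probability of outcome +1

# Each repetition estimates <P> from n_shots simulated measurements.
counts = rng.binomial(n_shots, p_plus, size=n_repeats)
estimates = 2 * counts / n_shots - 1

observed_error = estimates.std()
predicted_error = np.sqrt((1 - expval ** 2) / n_shots)
```

Halving the target precision quadruples the required shots, which is why reducing the number of distinct measurement circuits matters in practice.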

Specifically, we transform the physical subsystem Hamiltonian as follows:

$$\hat{\mathcal{H}}_{\mathcal{S}}=\sum_{\alpha\beta\sigma}\widetilde{\epsilon}_{\alpha\beta}\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\beta\sigma}+\frac{1}{2}\sum_{\alpha\beta\gamma\delta}\sum_{\sigma\sigma'}V_{\alpha\beta\gamma\delta}\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\beta\sigma}\hat{c}_{\gamma\sigma'}^{\dagger}\hat{c}_{\delta\sigma'},$$

with \({\widetilde{\epsilon }}_{\alpha \beta }={\epsilon }_{\alpha \beta }-\frac{1}{2}{\sum }_{\gamma }{V}_{\alpha \gamma \gamma \beta }\). A typical way to simplify the measurement of the two-body terms \(\hat{\mathcal{H}}_{\mathcal{S}}^{(2)}\) in Eq. (11) is to perform a nested matrix factorization of the Coulomb tensor V. Namely, we first rewrite \(\hat{\mathcal{H}}_{\mathcal{S}}^{(2)}\) in the following factorized form by diagonalizing the real symmetric positive semidefinite supermatrix \({V}_{(\alpha \beta ),(\gamma \delta )}\):

$$\hat{\mathcal{H}}_{\mathcal{S}}^{(2)}=\frac{1}{2}\sum_{l=1}^{L}\left(\sum_{\alpha\beta}\sum_{\sigma}\mathcal{L}_{\alpha\beta}^{(l)}\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\beta\sigma}\right)^{2}.$$

Here l runs through the L positive eigenvalues of the supermatrix V, and the lth component of the auxiliary tensor \(\mathcal{L}\) is obtained by multiplying the lth eigenvector by the square root of the lth positive eigenvalue. Each tensor component, \({\mathcal{L}}^{(l)}\), which is a real symmetric matrix, is subsequently diagonalized to reach the following decomposition:

$$\sum_{\alpha\beta\sigma}\mathcal{L}_{\alpha\beta}^{(l)}\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\beta\sigma}=\sum_{m=1}^{M_l}\lambda_m^{(l)}\sum_{\alpha\beta\sigma}U_{\alpha m}^{(l)}U_{\beta m}^{(l)}\hat{c}_{\alpha\sigma}^{\dagger}\hat{c}_{\beta\sigma}=\sum_{m=1}^{M_l}\sum_{\sigma}\lambda_m^{(l)}\hat{n}_{m\sigma}^{(l)}.$$

Here, we have defined \({\hat{n}}_{m\sigma }^{(l)}\equiv {\sum}_{\alpha \beta }{U}_{\alpha m}^{(l)}{U}_{\beta m}^{(l)}{\hat{c}}_{\alpha \sigma }^{{{{\dagger}}} }{\hat{c}}_{\beta \sigma }\). The index m runs through the \({M}_{l}\) nonzero eigenvalues \({\lambda }_{m}^{(l)}\) and the associated eigenvectors \({U}_{m}^{(l)}\), which determine the single-particle basis transformation for the lth component. The whole embedding Hamiltonian of Eq. (1) can then be cast into the following doubly-factorized form, with a unitary transformation similar to Eq. (13) for the one-body part:

$$\hat{\mathcal{H}}=\sum_{m=1}^{M_0}\sum_{\sigma}\epsilon_m^{(0)}\hat{n}_{m\sigma}^{(0)}+\frac{1}{2}\sum_{l=1}^{L}\left(\sum_{m=1}^{M_l}\sum_{\sigma}\lambda_m^{(l)}\hat{n}_{m\sigma}^{(l)}\right)^{2},$$

which is composed of L + 1 groups characterized by unique single-particle basis transformations \(\{{U}^{(l)}\}\), including one from the single-electron component. This form allows efficient measurement of the Hamiltonian expectation value using \(L+1\propto {\mathcal{O}}(N)\) distinct circuits for a generic quantum chemistry problem with a single-particle basis dimension given by N.
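The nested factorization described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names `double_factorize` and `rebuild_coulomb_tensor` are hypothetical, the tensor is assumed to be real, symmetric under exchange of its first two indices, and positive semidefinite as a supermatrix, and all eigenvalues of each component are kept rather than only the nonzero ones.

```python
import numpy as np

def double_factorize(V, tol=1e-10):
    """Factorize V[a, b, c, d] into a list of (lambda^(l), U^(l)) pairs."""
    n = V.shape[0]
    # First factorization: diagonalize the supermatrix V_(ab),(cd).
    evals, evecs = np.linalg.eigh(V.reshape(n * n, n * n))
    factors = []
    for w, vec in zip(evals, evecs.T):
        if w <= tol:  # keep only the positive eigenvalues
            continue
        L_l = np.sqrt(w) * vec.reshape(n, n)
        L_l = 0.5 * (L_l + L_l.T)  # remove round-off asymmetry
        # Second factorization: diagonalize the symmetric matrix L^(l).
        lam, U = np.linalg.eigh(L_l)
        factors.append((lam, U))
    return factors

def rebuild_coulomb_tensor(factors):
    """Reassemble V from the factors as a consistency check."""
    n = factors[0][1].shape[0]
    V = np.zeros((n, n, n, n))
    for lam, U in factors:
        L_l = (U * lam) @ U.T  # U diag(lam) U^T
        V += np.einsum("ab,cd->abcd", L_l, L_l)
    return V
```

Each returned matrix U plays the role of one single-particle basis transformation, so the number of returned factors plus one (for the one-body part) gives the number of measurement groups in this sketch.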

The expectation value of \(\hat{\mathcal{H}}\) is obtained by measuring each group l independently in the variational state \(\left\vert {{\Psi }}[{\boldsymbol{\theta }}]\right\rangle\). The variational state is transformed to the representation used in the lth group by applying a series of Givens rotations, \(\{{e}^{{\theta }_{\mu \nu }({\hat{c}}^{{{{\dagger}}} }_{\mu \sigma }{\hat{c}}_{\nu \sigma }-h.c.)}\}\), with the set of \(\{{\theta }_{\mu \nu }\}\) determined by the single-particle transformation matrix \({U}^{(l)}\). Here μ and ν are generic indices for physical and bath orbital sites. Therefore, the number of distinct measurement circuits is Nc = L + 1. As an example, we have Nc = 4 for the eg model. We refer to the “Methods” section for further details.

In practice, it is advantageous to isolate the one-body and two-body terms that contain only density operators before the double factorization procedure, because they are already in a diagonal representation. For the eg model, we carry out the double factorization explicitly in the “Methods” section and ultimately find Nc = 3. This can be compared with the Hamiltonian measurement procedure based on mutually qubit-wise commuting groups, in which operators that commute on every qubit site are placed in the same group. This commuting-Pauli approach generally needs \({N}_{c}\propto {\mathcal{O}}({N}^{4})\) distinct circuits for the Hamiltonian measurement. For the eg model, it requires Nc = 5.
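For comparison, qubit-wise commuting grouping can be illustrated with a simple greedy first-fit pass over Pauli strings. This is only a sketch under stated assumptions: Pauli strings are written as text over the alphabet I, X, Y, Z, the helper names are hypothetical, and the greedy heuristic does not guarantee the minimal number of groups.

```python
def qubitwise_commute(p, q):
    """True if, on every qubit, the two Paulis are equal or one is the identity."""
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))

def group_qwc(paulis):
    """Greedy first-fit grouping into mutually qubit-wise commuting sets."""
    groups = []
    for p in paulis:
        for group in groups:
            if all(qubitwise_commute(p, q) for q in group):
                group.append(p)
                break
        else:
            # No existing group accepts p, so open a new one.
            groups.append([p])
    return groups
```

Each resulting group can be measured with a single circuit, so `len(group_qwc(paulis))` is the circuit count produced by this heuristic.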

Noise-resilient optimization

Although gradient-based classical optimizers such as BFGS are effective, they require very accurate cost function evaluations. Because of the inherent noise in quantum computing, optimization algorithms that are robust to cost function noise are highly desirable. In the noisy quantum simulations reported here, we adopt two optimization techniques that are more tolerant to noise than BFGS: sequential minimal optimization (SMO)62 and Adadelta63. Because of their similar performance in the noisy simulations, we only discuss SMO in the main text and leave the discussion of Adadelta to the “Methods” section.

SMO is the first technique we use for our noisy quantum simulations. Tailored to the qubit-ADAPT ansatz of Eq. (8), where each variational parameter is associated with a single Pauli string generator, the optimization consists of Nsw sweeps of sequential single-parameter minimization of the cost function. At a given optimization step, in which the parameter θj is varied while all others are kept fixed, the cost function takes the simple form \(a\cos (2{\theta }_{j}-b)+c\), with the optimal \({\theta }_{j}^{* }=b/2\) if a < 0 and (b + π)/2 otherwise. To determine the parameters a, b, and c, one requires the function values for at least three mesh points in the range [−π/2, π/2). In practice, we use eight uniformly spaced mesh points to better mitigate the effect of noise in the cost function, and determine the values of a, b, and c by least-squares fitting. In SMO calculations, we use the number of sweeps as the parameter to control the convergence, which we set to Nsw = 40. Alternative control parameters, such as energy and gradient, usually need to be evaluated at higher precision, which can be challenging and introduces additional quantum computation overhead.
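A single SMO update can be sketched as a linear least-squares fit, using the identity a cos(2θ − b) + c = A cos 2θ + B sin 2θ + c with A = a cos b and B = a sin b. The function name `smo_single_parameter_step` and the callable `cost` are assumptions made for illustration; the fit returns a ≥ 0, so the minimizer is always (b + π)/2, wrapped back into [−π/2, π/2).

```python
import numpy as np

def smo_single_parameter_step(cost, n_mesh=8):
    """Fit a*cos(2*theta - b) + c on a uniform mesh and return (theta_opt, fitted_min)."""
    thetas = -np.pi / 2 + np.pi * np.arange(n_mesh) / n_mesh
    values = np.array([cost(t) for t in thetas])
    # Linear model in (A, B, c): A*cos(2t) + B*sin(2t) + c.
    design = np.column_stack(
        [np.cos(2 * thetas), np.sin(2 * thetas), np.ones(n_mesh)]
    )
    (A, B, c), *_ = np.linalg.lstsq(design, values, rcond=None)
    a = np.hypot(A, B)        # amplitude, non-negative by construction
    b = np.arctan2(B, A)      # phase
    theta_opt = (b + np.pi) / 2
    # Wrap into [-pi/2, pi/2); the cost has period pi.
    theta_opt = (theta_opt + np.pi / 2) % np.pi - np.pi / 2
    return theta_opt, c - a
```

A full sweep would call this routine once per variational parameter, with `cost` evaluating the energy as a function of that parameter while the others are held fixed.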

In this work, we perform noisy simulations with classical optimizations that include sampling noise due to a finite number of measurements or shots (Nsh), as well as simulations with both sampling and gate noise. The purpose is to investigate the performance of the qubit-ADAPT algorithm in the presence of sampling and gate noise and to separate the effect of sampling noise, which is controlled by the single parameter Nsh, from the effect of gate noise. The code with the circuit implementation of qubit-ADAPT VQE, with examples on the QASM simulator and quantum hardware, is available at figshare64.

Statevector simulations

In this section, we present numerical simulation results using a statevector simulator, which is equivalent to a fault-tolerant quantum computer with an infinite number of measurements (Nsh = ∞). Figure 3 shows the ground state energy calculations of the (2, 2) eg and (3, 3) t2g impurity models using VQE-HVA as well as qubit-ADAPT VQE with sUCCSpD and HC pools. The reference UCCSD energy is 0.029 higher than the exact ground state energy EGS for the eg model and 0.128 higher for the t2g model. This implies that both models are in the strong electron correlation region. For calculations of the eg model, the energy converges below \(10^{-5}\) with Nθ = 20 variational parameters for VQE-HVA, Nθ = 59 for ADAPT-sUCCSpD, and Nθ = 31 for ADAPT-HC. Although the qubit-ADAPT VQE calculation on a statevector simulator is in principle deterministic, the operator selection from a predefined operator pool can introduce some randomness due to the numerical accuracy and near degeneracy of scores (i.e., the associated gradient components) for some operators. As a result, the converged Nθ can change by about one between runs.

Fig. 3: Energy convergence of variational quantum eigensolver (VQE) calculations with four types of ansätze.
figure 3

Panels a, b show the energy difference between the variational and the exact ground state energy EGS as a function of the number of variational parameters Nθ. Panels c, d show the energy difference versus the number of CNOT gates Ncx. Panels a, c are for the degenerate \(({N}_{{\mathcal{S}}}=2,{N}_{{\mathcal{B}}}=2)\) eg impurity model and panels b, d correspond to the (3, 3) t2g impurity model. VQE calculations are reported with fixed Hamiltonian variational ansatz (HVA, orange dashed line) and unitary coupled cluster ansatz with single and double excitations (UCCSD, black cross) as well as with adaptive ansätze constructed from a simplified unitary coupled cluster pool with single and paired double excitation operators (sUCCSpD, black line) and a Hamiltonian commutator pool (HC, sky blue line). Here Ncx is estimated by counting 2(l − 1) CNOT gates for each multi-qubit rotation gate with a Pauli string generator P of length l, which assumes full qubit connectivity. The Hamiltonian parameters are ϵ = −9.8(−12.7), λ = 0.3(0.1), \({\mathcal{D}}=-0.3(-0.3)\) with the same Hubbard U = 7 for the eg (t2g) model, corresponding to the correlated bad metallic regime. The energy unit is the half-band width D of the noninteracting DOS for the multi-band lattice model (see Fig. 1).

As a simple estimation of the circuit complexity for NISQ devices, we provide the number of CNOT gates Ncx assuming full qubit connectivity, which can be realized in trapped ion systems. The converged circuit has Ncx = 288 for VQE-HVA, Ncx = 292 for ADAPT-sUCCSpD, and Ncx = 150 for ADAPT-HC. As a reference, the UCCSD ansatz has Nθ = 26 and Ncx = 1096. The HVA calculation converges with the smallest number of variational parameters, but the number of CNOT gates (Ncx) is in between that of ADAPT-HC and ADAPT-sUCCSpD because each variational parameter in HVA is associated with a generator composed of a weighted sum of Pauli strings. The ADAPT-HC calculation starts from a reference state \(\left\vert {{{\Psi }}}_{0}^{{\rm{(I)}}}\right\rangle\), a simple tensor product state on an AO basis, with energy higher than the HF reference state used by ADAPT-sUCCSpD, yet ADAPT-HC converges faster to the ground state. In fact, the initial state fidelity, defined as \(f\equiv |\langle {\Psi }_{0}|{\Psi }_{{\rm GS}}\rangle {|}^{2}\), is 0.19 for ADAPT-HC, compared with 0.76 for ADAPT-sUCCSpD. Therefore, the final ansatz complexity does not show a simple positive correlation with the initial state fidelity, which implies that both the Hamiltonian structure and operator pool are determining factors.
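The CNOT estimate used here, 2(l − 1) CNOT gates per Pauli-string rotation under full connectivity, is easy to reproduce. The sketch below assumes that the length l of a Pauli string means its number of non-identity factors and that generators are given as text strings; the function name `cnot_count` is hypothetical.

```python
def cnot_count(generators):
    """Estimate the CNOT count of a product of Pauli-string rotations.

    Each rotation exp(-i * theta * P) contributes 2 * (l - 1) CNOTs,
    where l is the number of non-identity factors in P (assumption).
    """
    total = 0
    for pauli in generators:
        weight = sum(ch != "I" for ch in pauli)
        total += 2 * max(weight - 1, 0)  # single-qubit rotations need no CNOT
    return total
```

For an ansatz built from weight-4 generators only, for example, the estimate grows by 6 CNOTs per added parameter.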

Compared with ADAPT-sUCCSpD, the advantage of ADAPT-HC becomes more prominent when applied to the t2g model. To reach energy convergence below \(10^{-5}\), ADAPT-HC needs Nθ = 270 parameters and Ncx = 2052 CNOTs, while ADAPT-sUCCSpD requires as many as Nθ = 1020 parameters and Ncx = 8066 CNOTs. For reference, the UCCSD ansatz has Nθ = 117 parameters and Ncx = 9200 CNOTs. The HVA calculation is carried out with up to L = 10 layers, which amounts to Nθ = 70 and Ncx = 2420, and the energy converges close to \(10^{-6}\).

We emphasize that strong electron correlation effects are present in our chosen model that lies deep in the bad metallic state48,49. This state cannot be accurately captured within a mean-field description and hence requires the application of an appreciable number of unitary gates to the reference state. Generally, the circuit depth of a variational ansatz is tied to both the complexity of the problem (i.e. the complexity of the ground state wavefunction) and the desired state fidelity. As shown in Fig. 4, when we require a state fidelity close to 99.9% or an energy error close to 0.001, which is typically necessary for practical calculations, one observes a sharp rise of Nθ when the system is tuned from the weak correlation (U < 1) to the strong correlation (U > 2) regime by increasing Hubbard U.

Fig. 4: Error and state fidelity analysis of qubit adaptive derivative-assembled pseudo-trotter (ADAPT) ansatz.
figure 4

a Log-scale contour plot of the variational energy error E − EGS of the qubit-ADAPT ansatz as a function of Nθ and Hubbard U for the (2, 2) eg impurity model. Here, EGS is the exact ground state energy and E is the converged variational energy. b State infidelity \(1-f=1-|\langle {\Psi }[{\boldsymbol{\theta }}]|{\Psi }_{{\rm GS}}\rangle {|}^{2}\) versus Nθ and U. Here, \(\left\vert {{\Psi }}[{\boldsymbol{\theta }}]\right\rangle\) is the converged ansatz state and \(\left\vert {{{\Psi }}}_{{\rm{GS}}}\right\rangle\) is the exact ground state. The color bar indicates a log scale from \(10^{-5}\) to 1. At a fixed energy accuracy, we find that Nθ generally increases with U and then saturates. The same holds for infidelity. We also observe a sharp rise of Nθ at smaller U ≈ 1–3 when demanding an energy accuracy or infidelity below \(10^{-3}\). This signifies the onset of correlation effects in the many-body ground state. The results are obtained from qubit-ADAPT calculations using the Hamiltonian commutator pool of the eg model, where U varies from 0.5 to 8 with 0.5 as the step size. The other model parameters can be retrieved at figshare53.

Simulations with shot noise

The ADAPT VQE calculations are often reported at the statevector level, and a systematic study including the effect of noise is not yet available13,16,65,66,67. Here we present qubit-ADAPT VQE calculations of the (2, 2) eg model including shot noise.

Figure 5 shows the representative convergence behavior of the qubit-ADAPT energy with an increasing number of variational parameters Nθ calculated using different numbers of shots per observable measurement: Fig. 5a is for Nsh = \(2^{12}\), and Fig. 5b is for Nsh = \(2^{16}\). We use SMO for the classical optimization. The adaptive ansatz energy E overall decreases as the circuit grows and more variational parameters are used. The energy uncertainty is tied to the number of shots Nsh. The energy spread reduces roughly by a factor of 4 when Nsh increases from \(2^{12}\) to \(2^{16}\), as expected from the central limit theorem for a 16-fold increase in Nsh.
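The factor of 4 follows from the 1/√Nsh scaling of the standard error. The sketch below checks this for a single Pauli observable with ±1 outcomes; the expectation value 0.3, the number of repetitions, and the function names are arbitrary choices for illustration, not values from the simulations reported here.

```python
import numpy as np

def shot_noise_std(expval, n_shots):
    """Standard error of a +/-1 valued observable estimated from n_shots samples."""
    return np.sqrt((1.0 - expval**2) / n_shots)

def sample_estimates(expval, n_shots, n_repeats, rng):
    """Draw n_repeats independent finite-shot estimates of the expectation value."""
    p_plus = 0.5 * (1.0 + expval)  # probability of measuring +1
    counts = rng.binomial(n_shots, p_plus, size=n_repeats)
    return 2.0 * counts / n_shots - 1.0

rng = np.random.default_rng(0)
spread_12 = sample_estimates(0.3, 2**12, 4000, rng).std()
spread_16 = sample_estimates(0.3, 2**16, 4000, rng).std()
# The empirical ratio should be close to sqrt(2**16 / 2**12) = 4.
ratio = spread_12 / spread_16
```

The analytic ratio `shot_noise_std(0.3, 2**12) / shot_noise_std(0.3, 2**16)` equals 4, while the sampled `ratio` fluctuates around that value.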

Fig. 5: Energy convergence of qubit adaptive derivative-assembled pseudo-trotter (ADAPT) simulations for eg model with shot noise.
figure 5

The difference between the exact ground state energy EGS and that of qubit-ADAPT simulations with sampling noise was obtained with a number of shots Nsh = \(2^{12}\) in panel (a) and Nsh = \(2^{16}\) in panel (b). Panel c shows the energy differences evaluated using statevector for the adaptive ansätze obtained in simulations including shot noise, with Nsh = \(2^{10}\) (black line), \(2^{12}\) (orange line), \(2^{14}\) (sky blue line), and \(2^{16}\) (bluish green line). The statevector simulation results (Nsh = ∞, yellow line) of the qubit-ADAPT algorithm are also shown as a dashed line for reference. Hamiltonian parameters are identical to those used in Fig. 3.

The energy points shown include not only the final SMO-optimized energies of the qubit-ADAPT ansatz with Nθ parameters but also the intermediate energies after each of the Nsw = 40 sweeps during SMO optimizations, to provide more detailed convergence information. The values of Nsh reported above refer to the measurements used in the SMO optimizations. At the operator screening step of the qubit-ADAPT calculation, where the ansatz is expanded by appending an additional optimal unitary, we fix Nsh = \(2^{16}\) shots for energy evaluations in all cases and determine the energy gradient by the parameter-shift rule68.
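For a parameter whose cost function has the sinusoidal form a cos(2θ − b) + c discussed in the SMO subsection, the parameter-shift rule reduces to a difference of two energy evaluations shifted by ±π/4. The sketch below assumes that form (a Pauli-string generator with rotation exp(−iθP)); other gate conventions change the shift and prefactor, and the function name is hypothetical.

```python
import math

def parameter_shift_gradient(cost, theta):
    """Exact derivative of a*cos(2*theta - b) + c from two shifted evaluations.

    Assumes the cost has period pi in theta, as for exp(-i*theta*P) with a
    Pauli-string generator P.
    """
    return cost(theta + math.pi / 4) - cost(theta - math.pi / 4)
```

Because the rule uses only energy evaluations, the gradient inherits the same shot noise as the energies that enter it.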

To further assess the quality of the qubit-ADAPT ansatz obtained in these QASM simulations, we plot in Fig. 5c the ansatz energies evaluated using a statevector simulator at the end of each noisy SMO optimization. The four solid curves are calculated using the variational parameters that are obtained by QASM optimizations with different numbers of shots Nsh as indicated, and noiseless optimization results are shown for comparison as the dashed line. While there is no clear order of the energies during the early stages of the simulation, the final convergence is consistently improved with more shots. Specifically, the error converges close to and below \(10^{-3}\) for Nsh = \(2^{14}\) and \(2^{16}\), and the fidelity f improves beyond 99.9%. The associated single-particle density matrix elements also converge to an accuracy better than \(10^{-2}\).

Similar QASM simulations of qubit-ADAPT VQE have been performed using the Adadelta optimizer, as specified in the “Methods” section. Generally, we find the numerical results and the dependence on the number of shots to be comparable to SMO. Compared with SMO, Adadelta can potentially take advantage of multiple QPUs by evaluating the gradient vector in parallel.

Discussion of optimal pool size

One important factor determining the computational load of qubit-ADAPT VQE calculations is the size of the operator pool Np. One simple strategy to reduce Np is to strip off Pauli Z’s in the pool of operators because they contribute negligibly to the ground state energy as pointed out in refs. 16,66. This reduces Np of the Hamiltonian commutator (HC) pool from 56 to 16 for the eg model, and from 192 to 60 for the t2g model, due to a large degeneracy. Furthermore, some qualitative guidance has been laid out in the literature to construct a minimal complete pool (MCP) of size 2(Nq−1)16,69, where Nq is the number of qubits. Indeed, we find that an MCP can be constructed using a subset of operators in the HC pool.
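The pool reduction obtained by stripping Pauli Z operators can be illustrated with a few lines of code: each Z is replaced by the identity and the duplicates that result are removed. This is a sketch on toy three-qubit strings, not the HC pools of the eg or t2g models, and the function name `strip_pauli_z` is hypothetical.

```python
def strip_pauli_z(pool):
    """Replace every Z by I and drop duplicates, preserving first-seen order."""
    seen = set()
    reduced = []
    for pauli in pool:
        stripped = pauli.replace("Z", "I")
        if set(stripped) == {"I"}:
            continue  # a pure identity string generates no rotation
        if stripped not in seen:
            seen.add(stripped)
            reduced.append(stripped)
    return reduced
```

Strings that differ only in the placement of Z operators collapse onto the same stripped string, which is the degeneracy responsible for the reduction in pool size.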

We find, however, that reducing the pool size can make the optimization of the qubit-ADAPT ansatz more challenging, especially in the presence of noise. Figure 6 compares qubit-ADAPT calculations using three different pools of dimensions 56, 16, and 10, which were introduced above. Figure 6a shows the qubit-ADAPT energies with increasing Nθ from statevector simulations of the eg model using the three pools. All the simulations converge with 31 parameters, and the final CNOT gate numbers, Ncx = 150, 98, and 62, decrease for the smaller pools. The details of the convergence rate of the three runs differ significantly. When the pool dimension decreases, the region of Nθ with minimal energy change expands, as seen by the almost flat segments of the curves in Fig. 6a. The minimal energy gain implies that small noise in the cost function evaluation could deteriorate the parameter optimization.

Fig. 6: Pool size dependence of the energy convergence behavior for qubit adaptive derivative-assembled pseudo-trotter (ADAPT) calculations of the eg model.
figure 6

The difference between the exact ground state energy EGS and qubit-ADAPT results as a function of Nθ from (a) statevector simulations and (b) quantum assembly language (QASM)-based simulations with Nsh = \(2^{16}\) shots using three different operator pools of size 56 (black line), 16 (orange line) and 10 (sky blue line), derived from the Hamiltonian commutator pool. The respective energy differences evaluated using statevector for the adaptive ansätze obtained in the noisy simulations of panel (b) are shown in panel (c).

Indeed, as shown in Fig. 6b, the qubit-ADAPT energy from noisy simulation converges more slowly as the pool size decreases. The flat segments in the energy curves become more evident owing to the stochastic energy errors. We further analyze the quality of the qubit-ADAPT ansatz by evaluating the energy at optimal angles obtained in noisy simulations, as plotted in Fig. 6c. The energy difference is 0.001, 0.027, 0.135 at Nθ = 31 where the statevector simulation converges, and 0.0006, 0.001, 0.005 at the end of Nθ = 40 for calculations with pools of size 56, 16, and 10, respectively.

Our analysis clearly shows the strikingly distinct convergence behaviors of qubit-ADAPT calculations using different complete operator pools in the presence of sampling noise. This indicates that the optimal pool in practical calculations can be a trade-off between choosing a small pool size and guaranteeing sufficient connectivity of the operators in the pool.

Simulations with noise models

Besides the inherent sampling noise in quantum computing, NISQ hardware is subject to various other error effects. These include coherent errors due to imperfect gate operations as well as stochastic errors due to qubit decoherence, dephasing, and relaxation. Here, we perform a preliminary investigation of the impact of hardware imperfections on qubit-ADAPT VQE calculations by adopting a realistic decoherence noise model proposed by Kandala et al. in ref. 3. The model includes an amplitude-damping channel (\(\rho \to \mathop{\sum }\nolimits_{i = 1}^{2}{E}_{i}^{a}\rho {E}_{i}^{a{{{\dagger}}} }\)) and a dephasing channel (\(\rho \to \mathop{\sum }\nolimits_{i = 1}^{2}{E}_{i}^{d}\rho {E}_{i}^{d{{{\dagger}}} }\)). These act on the qubit density matrix following each single-qubit or two-qubit gate operation. The Kraus operators are given as:

$$\begin{array}{rcl}{E}_{1}^{{\rm a}}&=&\left(\begin{array}{cc}1&0\\ 0&\sqrt{1-{p}^{{\rm a}}}\end{array}\right),\quad {E}_{2}^{{\rm a}}=\left(\begin{array}{cc}0&\sqrt{{p}^{{\rm a}}}\\ 0&0\end{array}\right),\\ {E}_{1}^{{\rm d}}&=&\left(\begin{array}{cc}1&0\\ 0&\sqrt{1-{p}^{{\rm d}}}\end{array}\right),\quad {E}_{2}^{{\rm d}}=\left(\begin{array}{cc}0&0\\ 0&\sqrt{{p}^{{\rm d}}}\end{array}\right).\end{array}$$

The error rates \({p}^{{\rm {a}}}=1-{e}^{-\tau /{T}_{1}}\) and \({p}^{{\rm {d}}}=1-{e}^{-2\tau /{T}_{\phi }}\) are determined by the gate time τ, the qubit relaxation time T1 and the dephasing time \({T}_{\phi }=2{T}_{1}{T}_{2}/(2{T}_{1}-{T}_{2})\), where T2 is the qubit coherence time. For the sake of simplicity of the analysis, we choose a uniform single-qubit gate error rate \({p}_{1}^{{\rm {a}}}={p}_{1}^{{\rm {d}}}\equiv {p}_{1}=1{0}^{-4}\), which is close to the value found in current hardware. We also assume a uniform two-qubit error rate \({p}_{2}^{{\rm {a}}}={p}_{2}^{{\rm {d}}}={p}_{2}\) that we vary between \(10^{-4}\) and \(10^{-2}\), in order to study the impact of two-qubit noise on the VQE optimization.
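The two channels can be checked numerically with a few lines of NumPy. The sketch below builds the Kraus operators given above, applies them to a single-qubit density matrix, and computes the error rates from τ, T1, and T2 using the pure-dephasing relation 1/Tϕ = 1/T2 − 1/(2T1). The function names are hypothetical, and a simulator would apply these channels after every gate rather than once.

```python
import numpy as np

def amplitude_damping_kraus(p):
    """Kraus operators of the amplitude-damping channel with error rate p."""
    return [np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]]),
            np.array([[0.0, np.sqrt(p)], [0.0, 0.0]])]

def dephasing_kraus(p):
    """Kraus operators of the dephasing channel with error rate p."""
    return [np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]]),
            np.array([[0.0, 0.0], [0.0, np.sqrt(p)]])]

def apply_channel(rho, kraus_ops):
    """Return sum_i E_i rho E_i^dagger."""
    return sum(E @ rho @ E.conj().T for E in kraus_ops)

def error_rates(tau, t1, t2):
    """Amplitude-damping and dephasing rates for gate time tau (requires t2 < 2*t1)."""
    t_phi = 2.0 * t1 * t2 / (2.0 * t1 - t2)
    return 1.0 - np.exp(-tau / t1), 1.0 - np.exp(-2.0 * tau / t_phi)
```

Amplitude damping moves population from |1⟩ to |0⟩, while dephasing leaves populations unchanged and shrinks the off-diagonal elements by a factor of sqrt(1 − p).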

Figure 7a shows a typical qubit-ADAPT energy curve E − EGS during optimization as a function of the number of variational parameters Nθ obtained in noisy simulations with p2 = \(10^{-2}\), \(10^{-3}\), and \(10^{-4}\). Here, EGS is the exact ground state energy. The results with only single-qubit noise are also shown for reference. Figure 7b contains the associated exact energies for the ansatz states, which we obtain by evaluating the VQE ansatz on a statevector simulator.

Fig. 7: Noisy qubit adaptive derivative-assembled pseudo-trotter (ADAPT) simulations and analysis of eg model.
figure 7

a Difference between the exact ground state energy EGS and qubit-ADAPT noisy simulation results with a uniform two-qubit gate error rate p2 = \(10^{-2}\) (black line), \(10^{-3}\) (orange line), \(10^{-4}\) (sky blue line) and 0 (bluish green line). We use a uniform single-qubit error rate p1 = \(10^{-4}\) and Nsh = \(2^{16}\) shots per measurement circuit. b Energy differences evaluated using statevector for the adaptive ansätze obtained in the noisy simulations. The noisy simulations are performed with the Hamiltonian commutator pool of size 56.

For p2 = \(10^{-2}\), which represents the current hardware noise level, the noisy energy increases with Nθ, indicating that the error rate is too large to get a reliable energy estimate. Nevertheless, as shown in the corresponding statevector analysis in Fig. 7b, one still observes a sizable energy reduction in the early stage of the optimization. The evaluated ansatz state fidelity is found to improve from 0.19 in the initial state to about 0.70 with 4 < Nθ < 9. When further increasing Nθ, however, the statevector ansatz energy shows an upward trend due to noise accumulation, signifying a failure of the noisy optimization. For a smaller error rate, p2 = \(10^{-3}\), which was demonstrated recently with the IBM Falcon device70, the noisy energy initially decreases and reaches a minimum near Nθ = 7. This is again followed by an upturn as the number of variational parameters Nθ grows. On the other hand, the corresponding statevector analysis shows a clear continuous energy improvement up to Nθ = 25, followed by saturation with small fluctuations. We find the ansatz state fidelity saturates near 0.97. Similar observations apply to the noisy simulations with other two-qubit error rates. The statevector analysis shows that the energy converges at an error ≈ \(3\times 10^{-3}\) with a fidelity ≈ 0.997 for p2 = \(10^{-4}\). When including only single-qubit errors, we find an error ≈ \(1\times 10^{-3}\) with a fidelity ≈ 0.9992.

The observed improvement of the ansatz (revealed using statevector analysis), even though the noisy energy expectation value increases, is intriguing. This effect is most clearly seen in the results for p2 = \(10^{-3}\) in the range 7 ≤ Nθ < 25. It demonstrates the robustness of VQE to certain types of noise effects and can be rationalized as follows. Assuming for simplicity a global depolarizing error channel, we can relate the expectation value of an observable \(\bar{\langle O\rangle }\) with respect to a noisy density matrix to the noiseless result 〈O〉 as \(\bar{\langle O\rangle }=(1-p)\langle O\rangle +\frac{p}{{2}^{n}}{\rm {Tr}}[O]\)71,72. Since any observable can be shifted to be traceless (Tr[O] = 0), 〈O〉 is equivalent to \(\bar{\langle O\rangle }\) up to a constant scaling factor. The noise thus only rescales the energy landscape of the variational ansatz and maintains the optimal parameters. The fact that we find the ansatz energy to saturate in the statevector analysis with finite p2 is caused by our choice of noise model, which includes noise effects beyond a global depolarizing channel. This observation of state improvement during optimization masked by noisy energy expectation values suggests that with reasonably small error rates, expensive error mitigation techniques may be restricted to the final converged state at the end of VQE calculations to ensure accurate observable measurements.
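The rescaling argument for a global depolarizing channel can be verified on a toy single-qubit example. The observable, ansatz, and error rate p = 0.3 below are arbitrary illustrative choices and are unrelated to the impurity models or the noise model used in the simulations.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.8 * Z + 0.6 * X  # traceless toy observable

def ansatz_state(theta):
    """exp(-i*theta*Y)|0> = cos(theta)|0> + sin(theta)|1>."""
    return np.array([np.cos(theta), np.sin(theta)], dtype=complex)

def noisy_energy(theta, p):
    """Energy after a global depolarizing channel with error rate p."""
    psi = ansatz_state(theta)
    rho = (1 - p) * np.outer(psi, psi.conj()) + p * np.eye(2) / 2
    return np.real(np.trace(rho @ H))

thetas = np.linspace(-np.pi / 2, np.pi / 2, 401)
ideal = np.array([noisy_energy(t, 0.0) for t in thetas])
noisy = np.array([noisy_energy(t, 0.3) for t in thetas])
# noisy equals (1 - p) * ideal, so both curves share the same minimizer.
same_minimizer = np.argmin(ideal) == np.argmin(noisy)
```

The noisy landscape is flatter by the factor 1 − p but has its minimum at the same angle, which is the sense in which the optimal parameters are preserved.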

Estimating ground state energy on NISQ devices

As a further step to benchmark the realistic noise effect on qubit-ADAPT VQE calculations of the multi-orbital quantum impurity models, we measure the Hamiltonian expectation value of the eg model with a converged qubit-ADAPT ansatz on the IBM quantum device ibmq_casablanca. The ansatz with optimal parameters is obtained with the HC pool using statevector simulations. The converged qubit-ADAPT ansatz used for the ground state energy estimate has 32 parameters, and the associated 32 generators for multi-qubit unitary gates are listed in the “Methods” section.

To reduce the noise in the cost function measurement, it is essential to utilize a range of error mitigation techniques. We employ the standard readout error mitigation using the full confusion matrix approach, as implemented in Qiskit50. The adopted measurement circuits based on Hamiltonian integral factorization also allow convenient symmetry detection and filtering with respect to how well the ansatz preserves the total electron number Ne = 4 and total spin z-projection Sz = 0. The gate error is mitigated using zero noise extrapolation (ZNE) with Richardson second-order polynomial inference73,74. The noise scale factor increases from 1 to 2 and 3 for each measurement circuit by local random unitary folding following the implementation in Mitiq75,76. Because of the random gate folding and the stochastic SWAP mapping during transpilation to native gates50, we perform ten runs for each measurement circuit at each noise level to smooth out the nondeterministic effects with averaging. For each run, we apply Nsh = \(2^{14}\) shots for the measurements.
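Symmetry filtering amounts to post-selecting measured bitstrings with the correct particle number and spin projection. The sketch below assumes that each bit is the occupation of one spin-orbital and that the first half of the bitstring holds the spin-up orbitals and the second half the spin-down orbitals; this ordering and the function name `symmetry_filter` are illustrative assumptions and need not match the qubit layout used on the device.

```python
def symmetry_filter(counts, n_electrons=4, twice_sz=0):
    """Keep only bitstrings with the requested electron number and 2*Sz.

    counts maps a bitstring such as "11001100" to its number of occurrences.
    Assumed layout: first half spin-up occupations, second half spin-down.
    """
    kept = {}
    for bits, n in counts.items():
        half = len(bits) // 2
        n_up = bits[:half].count("1")
        n_dn = bits[half:].count("1")
        if n_up + n_dn == n_electrons and n_up - n_dn == twice_sz:
            kept[bits] = n
    return kept
```

The fraction of discarded shots also provides a simple diagnostic of how strongly a given run violates the conserved quantum numbers.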

Figure 8a shows the Richardson extrapolation for the ground state energy with measured points at noise scale factors λ = 1, 2, 3, taking all 10 runs for each λ into account. The estimated energy has an absolute error Δ(E) = 0.6 ± 1.4 compared with the exact result indicated by the horizontal dashed line. This corresponds to a relative error of 3%. The standard deviation is obtained by fitting the sample points with a second-order polynomial using the SciPy function curve_fit which takes both the mean values and standard deviations into account77. In the postprocessing for the mean value of the energy cost function from statistical samplings, we first apply readout calibration, followed by symmetry filtering which discards the configurations with total electron number Ne ≠ 4 or total spin Sz ≠ 0. We observe that the ten runs can be divided into two groups based on the average Ne and Sz evaluated before symmetry filtering, as shown in Fig. 8c and d. A subgroup of five runs denoted by square symbols has much less bias away from the correct conserved quantum numbers Ne = 4 and Sz = 0 than the other five runs shown as circles. A more accurate ground state energy can be obtained when restricting to this optimal subgroup, as shown in Fig. 8b. The estimated energy error reduces significantly to Δ(E) = 0.1 ± 0.2, with a relative error of 0.7%.
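The extrapolation step can be sketched with an unweighted quadratic fit through the three noise scale factors, evaluated at zero noise. This is a simplification: the analysis described above uses SciPy's curve_fit with standard deviations, whereas the sketch uses numpy.polyfit without weights, and the energies in the usage example are synthetic numbers, not hardware data.

```python
import numpy as np

def richardson_quadratic(scale_factors, energies):
    """Fit a second-order polynomial in the noise scale and evaluate it at zero."""
    coeffs = np.polyfit(scale_factors, energies, deg=2)
    return np.polyval(coeffs, 0.0)

# Synthetic example: E(s) = -1.0 + 0.2*s + 0.05*s**2 sampled at s = 1, 2, 3.
scales = np.array([1.0, 2.0, 3.0])
energies = -1.0 + 0.2 * scales + 0.05 * scales**2
zero_noise = richardson_quadratic(scales, energies)
```

With exactly three points at scale factors 1, 2, and 3, the fit is an interpolation and the result equals 3E(1) − 3E(2) + E(3).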

Fig. 8: Estimating ground state energy of the eg model on IBM device ibmq_casablanca.
figure 8

Richardson energy extrapolation is applied by a quadratic curve fitting for three data points of increasing noise scale with averages over 10 runs in (a) and an optimal subset of 5 runs in (b). Distinct but equivalent hardware native circuits are associated with each run owing to the nondeterministic nature of local random unitary folding and transpilation. The average number of electrons Ne and total spin z-component Sz for each of the 10 runs in terms of their deviations from ideal values are plotted in (c) and (d), respectively. The optimal subset of five runs is identified by smaller symmetry violations |Ne − 4| < 0.2 and |Sz| < 0.1. The inset in (a) shows the qubit layout of ibmq_casablanca. The dark numbered circles represent the qubits adopted in the calculation with that particular order. Inset in (b): the energy error Δ(E) = E − EGS in log scale. The error bar denotes the standard deviation of the sample mean.

In the above calculations on the QPU, the circuits are transpiled into the basis gates of the ibmq_casablanca device using the qubit layout and coupling map illustrated in the inset of Fig. 8a. Because qubit connectivity is limited to nearest neighbors, each of the three transpiled measurement circuits for the eg model contains about 350 CNOT gates, which amounts to a more than two-fold increase compared to about 150 CNOTs without qubit swapping. Therefore, we also benchmark the calculations on other types of QPUs with full qubit connectivity such as trapped-ion devices. As an initial reference, we perform an energy estimation with the same ansatz on Quantinuum’s trapped-ion Honeywell System Model H1-2. The transpiled circuits have about 150 two-qubit ZZMax gates as expected. Due to limited access to the device, we apply only Nsh = 450 shots per circuit for the measurements without utilizing any error mitigation. The energy thus obtained is −17.6 ± 2, which should be compared with the data points in Fig. 8a at a scale factor of 1, and is found to be located near the lower end of that range. Here the error bar is estimated using multiple runs of simulations with the associated System Model H1-2 emulator (H1-2E) including a realistic noise model.


In an effort towards performing hybrid quantum-classical simulations of realistic correlated materials using a quantum embedding approach29,30,31,32,33,34,35,36,37, we assess the gate depth and accuracy of variational ground state preparation with fixed and adaptive ansätze for two representative interacting multi-orbital impurity models, the eg and t2g models. To take advantage of the sparsity of the Hamiltonian in the atomic orbital representation in real space, we consider the HVA ansatz and an adaptive variant in the qubit-encoded atomic orbital basis. An HC pool composed of pairwise commutators of the Hamiltonian terms is developed to allow a fair comparison between the qubit-ADAPT and HVA ansätze. For reference, the standard UCCSD and related qubit-ADAPT calculations using UCCSD-compatible pools are also presented. The qubit-ADAPT calculation with an HC pool generally produces the most compact circuit representation, with a minimal number of CNOTs in the final converged circuit. The fixed HVA ansatz follows very closely and has the additional advantage of requiring the fewest variational parameters Nθ.

To address the effect of quantum shot noise, we report QASM simulations of qubit-ADAPT VQE in the presence of shot noise for different numbers of shots (Nsh), which control the stochastic error. For our benchmark, we adopt state-of-the-art techniques such as low-rank tensor factorization to reduce the number of distinct measurement circuits, and noise-resilient optimizers including sequential minimal optimization and Adadelta. We find that a modest number of shots, Nsh = \(2^{14}\) per measurement circuit, can lead to a variational representation of the ground state with fidelity f > 99.9%.
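To give a feeling for how the number of shots controls the stochastic error, the sketch below evaluates the standard error of a single Pauli expectation value estimated from Nsh independent ±1 outcomes, \(\sqrt{(1-\langle P\rangle ^{2})/{N}_{{\rm{sh}}}}\), and compares it with a sampled estimate. The function names and the value ⟨P⟩ = 0.6 are illustrative assumptions; the full energy error in the simulations additionally depends on how the Hamiltonian terms are grouped and weighted.

```python
import numpy as np

def pauli_standard_error(expval, n_shots):
    """Standard error of the sample mean of +/-1 outcomes with mean `expval`."""
    return np.sqrt((1.0 - expval**2) / n_shots)

def sample_pauli_mean(expval, n_shots, rng):
    """Draw n_shots outcomes in {+1, -1} and return their average."""
    p_plus = 0.5 * (1.0 + expval)              # probability of outcome +1
    outcomes = rng.choice([1.0, -1.0], size=n_shots, p=[p_plus, 1.0 - p_plus])
    return outcomes.mean()

rng = np.random.default_rng(0)
expval = 0.6                                   # hypothetical exact expectation value
for exponent in (10, 14, 16):
    n_shots = 2**exponent
    estimate = sample_pauli_mean(expval, n_shots, rng)
    print(exponent, estimate, pauli_standard_error(expval, n_shots))
```

Each factor of four in the number of shots halves the statistical error, which is why moving from \(2^{14}\) to \(2^{16}\) shots reduces the error by a factor of two.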

We further discuss ways to simplify the pool operators and reduce the pool size, using the eg model as an example. We point out that a minimal complete pool, as defined in refs. 16,69, can be constructed using a subset of the HC pool. While a simplified pool can reduce the quantum computational resources needed in the adaptive operator screening procedure, it can make classical optimization more complicated, especially in the presence of noise. This suggests that the dimension and the connectivity of the operators jointly determine a practically optimal pool.

To assess the effects of realistic noise on VQE calculations of multi-orbital impurity models, we perform qubit-ADAPT VQE calculations with a realistic decoherence noise model that includes amplitude and dephasing error channels. We find that the impact of two-qubit errors dominates over that of single-qubit errors, in part because two-qubit errors are larger in NISQ hardware. We report that practically useful results can be obtained for p2 = \(10^{-3}\), which is close to current hardware levels. Importantly, we observe that the classical optimization continues to improve the ansatz even in a regime where the noisy energy expectation value starts to rise. We reveal this behavior by evaluating the ansatz state on statevector simulators. Such persistent improvement of the ansatz state, masked by noise, shows that VQE is robust to certain noise effects and implies that costly error mitigation methods can potentially be reserved for the evaluation of expectation values in the final converged state.

Finally, we measure the energy for a converged qubit-ADAPT ansatz of the eg model on the ibmq_casablanca QPU and Quantinuum’s H1-2 device. Using the results from IBM hardware, we obtain an error of 0.1 (0.7%) for the total energy by adopting error mitigation techniques such as zero-noise extrapolation, combined with a careful post-selection based on symmetry and the conservation of quantum numbers.

Moving forward, the full qubit-ADAPT VQE calculations of quantum impurity models will be extended from noisy QASM simulations to simulations that include device-specific noise effects beyond our decoherence model, and finally to experiments on real hardware. Our study shows that an array of error mitigation techniques, including readout calibration, zero-noise extrapolation73,74, and potentially probabilistic error cancellation74,78,79, Clifford data regression80,81, and probabilistic machine-learning-based techniques82, needs to be adopted to reach sufficiently accurate results. This is especially important when using VQE as an impurity solver in a quantum embedding approach, as sufficiently accurate impurity model results are needed to enable the convergence of the classical self-consistency loop. Our results constitute an important step forward in demonstrating high-fidelity ground state preparation of impurity models on quantum devices. This is essential for realizing correlated material simulations through hybrid quantum-classical embedding approaches, where the ground state preparation of a generic f-electron impurity model consisting of 28 spin-orbitals is on the verge of achieving practical quantum advantage35.


Energy gradient of HVA

Here we show that the outermost lth layer gradient component vanishes (\({\left.\frac{\partial {{{{{{{\mathcal{E}}}}}}}}({{{{{{{\boldsymbol{\theta }}}}}}}})}{\partial {\theta }_{lj}}\right\vert }_{{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{l} = 0}=0\)) for an l-layer HVA ansatz \(\left\vert {{{\Psi }}}_{l}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle ={{{\Pi }}}_{j = 1}^{{N}_{{{{{{{{\rm{G}}}}}}}}}}{e}^{-i{\theta }_{lj}{\hat{h}}_{j}}\left\vert {{{\Psi }}}_{l-1}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\right\rangle\):

$${\left.\frac{\partial {{{{{{{\mathcal{E}}}}}}}}({{{{{{{\boldsymbol{\theta }}}}}}}})}{\partial {\theta }_{lj}}\right\vert }_{{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{l} = 0} ={\left.\frac{\partial \langle {{{\Psi }}}_{l}[{{{{{{{\boldsymbol{\theta }}}}}}}}]| \hat{{{{{{{{\mathcal{H}}}}}}}}}| {{{\Psi }}}_{l}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\rangle }{\partial {\theta }_{lj}}\right\vert }_{{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{l} = 0}\\ =-i\langle {{{\Psi }}}_{l-1}[{{{{{{{\boldsymbol{\theta }}}}}}}}]| \,\hat{{{{{{{{\mathcal{H}}}}}}}}}{\hat{h}}_{j}\,| {{{\Psi }}}_{l-1}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\rangle +c.c.$$

Because the system Hamiltonian \(\hat{{{{{{{{\mathcal{H}}}}}}}}}\) under study is real due to time-reversal symmetry, HVA is also real by construction. Therefore, \(\langle {{{\Psi }}}_{l-1}[{{{{{{{\boldsymbol{\theta }}}}}}}}]| \,\hat{{{{{{{{\mathcal{H}}}}}}}}}{\hat{h}}_{j}\,| {{{\Psi }}}_{l-1}[{{{{{{{\boldsymbol{\theta }}}}}}}}]\rangle\) is real, so the first term on the right-hand side is purely imaginary and cancels against its complex conjugate, and \({\left.\frac{\partial {{{{{{{\mathcal{E}}}}}}}}({{{{{{{\boldsymbol{\theta }}}}}}}})}{\partial {\theta }_{lj}}\right\vert }_{{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{l} = 0}\) vanishes. Note that exactly the same reason motivates the development of the HC pool for qubit-ADAPT calculations.
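The vanishing gradient can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: a random real symmetric matrix stands in for \(\hat{{\mathcal{H}}}\), another for the generator \({\hat{h}}_{j}\), and a random real normalized vector for \(\left\vert {\Psi }_{l-1}\right\rangle\); none of these are the impurity-model operators. It compares the analytic expression \(-i\langle \hat{{\mathcal{H}}}{\hat{h}}_{j}\rangle +c.c.\) with a central finite difference of the energy at θ = 0.

```python
import numpy as np

def expm_minus_i(theta, h):
    """exp(-i * theta * h) for a real symmetric h, via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.T

def energy(theta, hamiltonian, h, psi):
    """<psi| exp(+i theta h) H exp(-i theta h) |psi>."""
    phi = expm_minus_i(theta, h) @ psi
    return float(np.real(np.vdot(phi, hamiltonian @ phi)))

rng = np.random.default_rng(7)
dim = 8
a = rng.normal(size=(dim, dim))
b = rng.normal(size=(dim, dim))
hamiltonian = a + a.T                  # random real symmetric stand-in for H
h = b + b.T                            # random real symmetric stand-in for h_j
psi = rng.normal(size=dim)
psi /= np.linalg.norm(psi)             # real, normalized reference state

# Analytic gradient at theta = 0: -i <psi| H h |psi> + c.c.
analytic = 2.0 * np.real(-1j * np.vdot(psi, hamiltonian @ h @ psi))

# Central finite difference of the energy around theta = 0.
d = 1e-5
numeric = (energy(d, hamiltonian, h, psi) - energy(-d, hamiltonian, h, psi)) / (2 * d)
print(analytic, numeric)
```

Both numbers are zero up to rounding, because the expectation value of the product of two real matrices in a real state is real.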

Hamiltonian factorization of the impurity model

Here we explain explicitly how the Hamiltonian factorization is obtained using the eg model as an example, whose Hamiltonian takes the following specific form:

$$\hat{{{{{{{{\mathcal{H}}}}}}}}}=D\mathop{\sum }\limits_{i=1}^{2}\mathop{\sum}\limits_{\sigma }\left({\hat{c}}_{i\sigma }^{{{{\dagger}}} }{\hat{\,f}}_{i\sigma }+h.c.\right)$$
$$+J/2{\left({\hat{c}}_{1\uparrow }^{{{{\dagger}}} }{\hat{c}}_{2\uparrow }+{\hat{c}}_{1\downarrow }^{{{{\dagger}}} }{\hat{c}}_{2\downarrow }+h.c.\right)}^{2}$$
$$+U\mathop{\sum }\limits_{i=1}^{2}{\hat{n}}_{i\uparrow }{\hat{n}}_{i\downarrow }+(U-2J)\mathop{\sum}\limits_{\sigma {\sigma }^{{\prime} }}{\hat{n}}_{1\sigma }{\hat{n}}_{2{\sigma }^{{\prime} }}$$
$$+\widetilde{\epsilon }\mathop{\sum }\limits_{i=1}^{2}\mathop{\sum}\limits_{\sigma }{\hat{n}}_{i\sigma }+\lambda \mathop{\sum }\limits_{i=1}^{2}\mathop{\sum}\limits_{\sigma }{\hat{n}}_{i\sigma }^{f}.$$

Here \({\hat{n}}_{i\sigma }={\hat{c}}_{i\sigma }^{{{{\dagger}}} }{\hat{c}}_{i\sigma }\) and \({\hat{n}}_{i\sigma }^{f}={\hat{f}}_{i\sigma }^{{{{\dagger}}} }{\hat{\,f}}_{i\sigma }\) are the electron occupation number operators for the physical and bath orbitals, respectively. The factorization procedure is only needed for the single-particle hybridization term (17) and the pair hopping and spin-flip terms (18), as the rest are already in the diagonal representation.

The hybridization term (17) can be written in a diagonal form through single-particle rotations on the physical and bath orbitals as follows:

$$\mathop{\sum }\limits_{i=1}^{2}\left({\hat{c}}_{i\sigma }^{{{{\dagger}}} }{\hat{\,f}}_{i\sigma }+h.c.\right)=-{\hat{n}}_{1\sigma }^{(0)}-{\hat{n}}_{2\sigma }^{(0)}+{\hat{n}}_{3\sigma }^{(0)}+{\hat{n}}_{4\sigma }^{(0)},$$

where \({\hat{n}}_{m\sigma }^{(0)}={\hat{c}}_{m\sigma }^{{{{\dagger}}} (0)}{\hat{c}}_{m\sigma }^{(0)}\) and the rotated fermionic operators \({\hat{c}}_{m\sigma }^{(0)}\) are given by,

$$\begin{array}{rcl}{\hat{c}}_{1\sigma }^{(0)}&=&\frac{1}{\sqrt{2}}({\hat{c}}_{1\sigma }-{\hat{f}}_{1\sigma }),\quad {\hat{c}}_{2\sigma }^{(0)}=\frac{1}{\sqrt{2}}({\hat{c}}_{2\sigma }-{\hat{f}}_{2\sigma }),\\ {\hat{c}}_{3\sigma }^{(0)}&=&\frac{1}{\sqrt{2}}({\hat{c}}_{1\sigma }+{\hat{f}}_{1\sigma }),\quad {\hat{c}}_{4\sigma }^{(0)}=\frac{1}{\sqrt{2}}({\hat{c}}_{2\sigma }+{\hat{f}}_{2\sigma }).\end{array}$$

This can be derived conveniently in the matrix formulation:

$$\begin{array}{rcl}&&\mathop{\sum }\limits_{i=1}^{2}\left({\hat{c}}_{i\sigma }^{{\dagger} }\,{\hat{f}}_{i\sigma }+h.c.\right)\\ &&=\left(\begin{array}{llll}{\hat{c}}_{1\sigma }^{{\dagger} }&{\hat{c}}_{2\sigma }^{{\dagger} }&{\hat{f}}_{1\sigma }^{{\dagger} }&{\hat{f}}_{2\sigma }^{{\dagger} }\end{array}\right)\left(\begin{array}{llll}0&0&1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0\end{array}\right)\left(\begin{array}{l}{\hat{c}}_{1\sigma }\\ {\hat{c}}_{2\sigma }\\ {\hat{f}}_{1\sigma }\\ {\hat{f}}_{2\sigma }\end{array}\right)\\ &&=\left(\begin{array}{llll}{\hat{c}}_{1\sigma }^{{\dagger} }&{\hat{c}}_{2\sigma }^{{\dagger} }&{\hat{f}}_{1\sigma }^{{\dagger} }&{\hat{f}}_{2\sigma }^{{\dagger} }\end{array}\right)\left(\begin{array}{llll}\frac{1}{\sqrt{2}}&0&\frac{1}{\sqrt{2}}&0\\ 0&\frac{1}{\sqrt{2}}&0&\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}&0&\frac{1}{\sqrt{2}}&0\\ 0&-\frac{1}{\sqrt{2}}&0&\frac{1}{\sqrt{2}}\end{array}\right)\\ &&\times \left(\begin{array}{llll}-1&0&0&0\\ 0&-1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{array}\right)\left(\begin{array}{llll}\frac{1}{\sqrt{2}}&0&-\frac{1}{\sqrt{2}}&0\\ 0&\frac{1}{\sqrt{2}}&0&-\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&0&\frac{1}{\sqrt{2}}&0\\ 0&\frac{1}{\sqrt{2}}&0&\frac{1}{\sqrt{2}}\end{array}\right)\left(\begin{array}{l}{\hat{c}}_{1\sigma }\\ {\hat{c}}_{2\sigma }\\ {\hat{f}}_{1\sigma }\\ {\hat{f}}_{2\sigma }\end{array}\right)\\ &&=\left(\begin{array}{llll}{\hat{c}}_{1\sigma }^{{\dagger} (0)}&{\hat{c}}_{2\sigma }^{{\dagger} (0)}&{\hat{c}}_{3\sigma }^{{\dagger} (0)}&{\hat{c}}_{4\sigma }^{{\dagger} (0)}\end{array}\right)\left(\begin{array}{llll}-1&0&0&0\\ 0&-1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{array}\right)\left(\begin{array}{l}{\hat{c}}_{1\sigma }^{(0)}\\ {\hat{c}}_{2\sigma }^{(0)}\\ {\hat{c}}_{3\sigma }^{(0)}\\ {\hat{c}}_{4\sigma }^{(0)}\end{array}\right).\end{array}$$
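The spectral decomposition behind the matrix formulation above can be checked numerically. The following is a minimal sketch using NumPy's symmetric eigensolver on the 4 × 4 hybridization matrix in the basis \(({\hat{c}}_{1\sigma },{\hat{c}}_{2\sigma },{\hat{f}}_{1\sigma },{\hat{f}}_{2\sigma })\); the variable names are illustrative. Since each eigenvalue is doubly degenerate, the solver may return any orthonormal basis within each eigenspace, so the check compares the physical and bath amplitudes of each eigenvector rather than relying on a specific form.

```python
import numpy as np

# Hybridization matrix in the basis (c_1, c_2, f_1, f_2) for one spin species.
hyb = np.array([[0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)

# eigh returns ascending eigenvalues and orthonormal eigenvectors as columns.
evals, evecs = np.linalg.eigh(hyb)
rebuilt = evecs @ np.diag(evals) @ evecs.T     # should reproduce hyb

# Compare the physical (first two) and bath (last two) amplitudes.
for lam, vec in zip(evals, evecs.T):
    relation = "equal" if np.allclose(vec[:2], vec[2:]) else "opposite"
    print(f"eigenvalue {lam:+.0f}: physical and bath amplitudes are {relation}")
```

The eigenvalues are −1, −1, +1, +1, and every eigenvector carries equal weight on a physical orbital and on its bath partner, which is why a single layer of two-orbital rotations suffices to measure the hybridization term.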

The pair hopping and spin-flip terms of the second line of Eq. (18) can be rewritten as:

$$J/2{\left(\left(\begin{array}{llll}{\hat{c}}_{1\uparrow }^{{{{\dagger}}} }&{\hat{c}}_{1\downarrow }^{{{{\dagger}}} }&{\hat{c}}_{2\uparrow }^{{{{\dagger}}} }&{\hat{c}}_{2\downarrow }^{{{{\dagger}}} }\end{array}\right){{{{{{{{\mathcal{L}}}}}}}}}^{(1)}\left(\begin{array}{l}{\hat{c}}_{1\uparrow }\\ {\hat{c}}_{1\downarrow }\\ {\hat{c}}_{2\uparrow }\\ {\hat{c}}_{2\downarrow }\end{array}\right)\right)}^{2}.$$


$${{{{{{{{\mathcal{L}}}}}}}}}^{(1)}=\left(\begin{array}{llll}0&0&1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0\end{array}\right).$$

The above expression is obtained by diagonalizing the Coulomb supermatrix \({V}_{(\alpha \beta ),(\gamma \delta )}\) with density-density elements set to zero, \({V}_{(\alpha \alpha ),(\gamma \gamma )}\equiv 0\), which gives a single eigenvector associated with a nonzero eigenvalue. Following a derivation similar to that in Eq. (23), the pair hopping and spin-flip terms have the following diagonal representation:

$$J/2{\left(-{\hat{n}}_{1\uparrow }^{(1)}-{\hat{n}}_{1\downarrow }^{(1)}+{\hat{n}}_{2\uparrow }^{(1)}+{\hat{n}}_{2\downarrow }^{(1)}\right)}^{2},$$

with \({\hat{n}}_{m\sigma }^{(1)}={\hat{c}}_{m\sigma }^{{{{\dagger}}} (1)}{\hat{c}}_{m\sigma }^{(1)}\) and

$${\hat{c}}_{1\sigma }^{(1)}=\frac{1}{\sqrt{2}}({\hat{c}}_{1\sigma }+{\hat{c}}_{2\sigma }),{\hat{c}}_{2\sigma }^{(1)}=\frac{1}{\sqrt{2}}({\hat{c}}_{1\sigma }-{\hat{c}}_{2\sigma }).$$

Finally, we can represent the embedding Hamiltonian for the eg model in the following doubly-factorized form:

$$\hat{{{{{{{{\mathcal{H}}}}}}}}}= \, D\mathop{\sum}\limits_{\sigma }\left(-{\hat{n}}_{1\sigma }^{(0)}-{\hat{n}}_{2\sigma }^{(0)}+{\hat{n}}_{3\sigma }^{(0)}+{\hat{n}}_{4\sigma }^{(0)}\right)\\ + J/2{\left(-{\hat{n}}_{1\uparrow }^{(1)}-{\hat{n}}_{1\downarrow }^{(1)}+{\hat{n}}_{2\uparrow }^{(1)}+{\hat{n}}_{2\downarrow }^{(1)}\right)}^{2}\\ +U\mathop{\sum }\limits_{i=1}^{2}{\hat{n}}_{i\uparrow }{\hat{n}}_{i\downarrow }+(U-2J)\mathop{\sum}\limits_{\sigma {\sigma }^{{\prime} }}{\hat{n}}_{1\sigma }{\hat{n}}_{2{\sigma }^{{\prime} }}\\ +\widetilde{\epsilon }\mathop{\sum }\limits_{i=1}^{2}\mathop{\sum}\limits_{\sigma }{\hat{n}}_{i\sigma }+\lambda \mathop{\sum }\limits_{i=1}^{2}\mathop{\sum}\limits_{\sigma }{\hat{n}}_{i\sigma }^{f}.$$

With the Hamiltonian integral factorization, we find that three distinct measurement circuits are needed for the Hamiltonian expectation value: (i) the diagonal terms in the original atomic orbital basis, (ii) the hybridization terms in the basis of \({\hat{c}}_{m\sigma }^{(0)}\) [Eq. (22)], and (iii) the pair hopping and spin-flip terms in the basis of \({\hat{c}}_{m\sigma }^{(1)}\) [Eq. (27)].
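The squared form of the pair hopping and spin-flip terms can be verified at the many-body level. The sketch below is an independent check, not the implementation used in this work: it assumes a Jordan-Wigner matrix representation of the four physical spin-orbitals, ordered as (1↑, 1↓, 2↑, 2↓), and all function and variable names are hypothetical. It confirms that the square of the inter-orbital hopping operator equals the square of the signed sum of occupation numbers in the rotated basis \({\hat{c}}_{m\sigma }^{(1)}\).

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])     # |0><1|, empties an occupied mode

def jordan_wigner_annihilators(n_modes):
    """Annihilation operators as dense matrices with Z strings on earlier modes."""
    ops = []
    for j in range(n_modes):
        factors = [Z] * j + [LOWER] + [I2] * (n_modes - j - 1)
        ops.append(reduce(np.kron, factors))
    return ops

def number(op):
    """Occupation number operator op^dagger op (matrices are real)."""
    return op.T @ op

def hop(a, b):
    """Hermitian hopping a^dagger b + b^dagger a."""
    return a.T @ b + b.T @ a

c1u, c1d, c2u, c2d = jordan_wigner_annihilators(4)

# Original form: (sum_sigma c_{1 sigma}^dagger c_{2 sigma} + h.c.)^2
hopping = hop(c1u, c2u) + hop(c1d, c2d)
original = hopping @ hopping

# Rotated orbitals: symmetric and antisymmetric combinations of orbitals 1 and 2.
d1u, d1d = (c1u + c2u) / np.sqrt(2), (c1d + c2d) / np.sqrt(2)
d2u, d2d = (c1u - c2u) / np.sqrt(2), (c1d - c2d) / np.sqrt(2)
diagonal = -number(d1u) - number(d1d) + number(d2u) + number(d2d)
factorized = diagonal @ diagonal

squares_match = np.allclose(original, factorized)
print(squares_match)
```

Because the rotated number operators commute with each other, the factorized term can be evaluated from a single measurement circuit after the orbital rotation.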

Quantum simulation with Adadelta optimizer

In the main text, we reported the qubit-ADAPT VQE calculation with shots using the SMO optimizer. Here we additionally perform the calculations using the Adadelta optimization method, which is potentially tolerant to cost function errors63. Below we describe the implementation of the algorithm followed by the results.

The algorithm minimizes the cost function along the steepest descent direction in parameter space, with a parameter update at step t given by \({\boldsymbol{\theta }}_{t}={\boldsymbol{\theta }}_{t-1}-{\bf{w}}_{t}\odot {\bf{g}}_{t}\). The gradient vector is determined from the derivative of the energy function along every parameter direction, \({\bf{g}}_{t}={\nabla }_{{\boldsymbol{\theta }}}E({\boldsymbol{\theta }}_{t})\), where \(E({\boldsymbol{\theta }})=\langle {\Psi }[{\boldsymbol{\theta }}]| \,\hat{{\mathcal{H}}}\,| {\Psi }[{\boldsymbol{\theta }}]\rangle\) is the estimated energy. The set of parameter-dependent adaptive learning rates is determined as \({\bf{w}}_{t}=\frac{\sqrt{{\Delta }{\boldsymbol{\theta }}_{t-1}+\epsilon }}{\sqrt{{\bf{s}}_{t}+\epsilon }}\), where the leaky average of the square of rescaled gradients at the previous step is obtained as \({\Delta }{\boldsymbol{\theta }}_{t-1}=\beta {\Delta }{\boldsymbol{\theta }}_{t-2}+(1-\beta ){({\bf{w}}_{t-1}\odot {\bf{g}}_{t-1})}^{2}\), and that of the gradients is evaluated as \({\bf{s}}_{t}=\beta {\bf{s}}_{t-1}+(1-\beta ){\bf{g}}_{t}^{2}\). The operator \(\odot\) denotes the element-wise product. The Adadelta algorithm involves a hyperparameter ϵ to regularize the ratio in determining \({\bf{w}}_{t}\), which is set to \(1{0}^{-8}\), and a mixing parameter set to β = 0.9. The leaky averages are all initialized to zero. We fix the number of steps in the Adadelta optimization to Ns = 250 in our simulations. Considering that the evaluation of one gradient component associated with a variational parameter involves cost function measurements at two distinct parameter points following the parameter-shift rule, the quantum computational resource for the Adadelta optimization is comparable to SMO with Nsw = 60.
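The update rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: `toy_energy` is a hypothetical noiseless stand-in for the measured energy, the gradient is evaluated at the current parameters before each update, and the shift of π/2 in the parameter-shift rule is the one appropriate for this toy function, in which each parameter enters through a single unit-frequency sinusoid.

```python
import numpy as np

def toy_energy(theta):
    """Hypothetical stand-in for the measured energy E(theta)."""
    return -np.cos(theta[0]) * np.cos(theta[1]) - 0.5 * np.cos(theta[1])

def parameter_shift_gradient(energy, theta, shift=np.pi / 2):
    """Two energy evaluations per parameter, at theta_k +/- shift."""
    grad = np.zeros_like(theta)
    for k in range(len(theta)):
        e_k = np.zeros_like(theta)
        e_k[k] = shift
        grad[k] = 0.5 * (energy(theta + e_k) - energy(theta - e_k))
    return grad

def adadelta(energy, theta0, n_steps=250, beta=0.9, eps=1e-8):
    """Adadelta with leaky averages initialized to zero."""
    theta = np.array(theta0, dtype=float)
    s = np.zeros_like(theta)         # leaky average of squared gradients
    delta = np.zeros_like(theta)     # leaky average of squared rescaled gradients
    for _ in range(n_steps):
        g = parameter_shift_gradient(energy, theta)
        s = beta * s + (1.0 - beta) * g**2
        w = np.sqrt(delta + eps) / np.sqrt(s + eps)   # per-parameter learning rate
        step = w * g                                  # element-wise product
        theta = theta - step
        delta = beta * delta + (1.0 - beta) * step**2
    return theta

theta_start = np.array([1.0, 0.8])
theta_final = adadelta(toy_energy, theta_start)
print(toy_energy(theta_start), toy_energy(theta_final))
```

With ϵ = \(1{0}^{-8}\) the first steps are very small and grow only gradually, so the energy decreases slowly but steadily over the 250 steps in this noiseless toy problem.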

Figure 9 shows the representative convergence behavior of the qubit-ADAPT energy with an increasing number of variational parameters Nθ, calculated using a number of shots Nsh = \(2^{16}\) per observable. The adaptive ansatz energy E decreases as the circuit depth increases with more variational parameters. The energy points shown include not only the final Adadelta-optimized energies of the qubit-ADAPT ansatz with Nθ parameters but also intermediate energies for the 250 Adadelta steps, to provide a detailed view of the convergence. For the operator screening step of the qubit-ADAPT calculation, we fix Nsh = \(2^{16}\) for energy evaluations in all cases and determine the energy gradient by the parameter-shift rule68. The final energy error from the calculations with Adadelta is E − EGS = 4.4 × \(10^{-3}\). This is comparable with the result from the SMO optimizer.

Fig. 9: Energy convergence of qubit adaptive derivative-assembled pseudo-trotter (ADAPT) noisy simulations of eg model with Adadelta optimizer.

The difference between the exact ground state energy EGS and the qubit-ADAPT noisy simulation results, obtained with a number of shots Nsh = \(2^{16}\), is shown in panel (a). The energy differences evaluated using statevector simulations for the adaptive ansätze obtained in the noisy simulations with Nsh = \(2^{16}\) (orange line) are shown in panel (b). The statevector simulation results (Nsh = ∞) of the qubit-ADAPT method are also shown as a blue dashed line for reference.

The ground state ansatz of the (2, 2) eg model used on ibmq_casablanca

The qubit-ADAPT ansatz takes the pseudo-Trotter form. The converged ansatz for the eg model which we used for the calculations on IBM quantum hardware ibmq_casablanca is composed of 32 generators for the multi-qubit unitary gates, which are listed here with parity encoding (in the order that they appear in the ansatz):