Variational Quantum Eigensolver with Reduced Circuit Complexity

The variational quantum eigensolver (VQE) is one of the most promising algorithms to find eigenvalues and eigenvectors of a given Hamiltonian on noisy intermediate-scale quantum (NISQ) devices. A particular application is to obtain ground or excited states of molecules. The practical realization is currently limited by the complexity of quantum circuits. Here we present a novel approach to reduce quantum circuit complexity in VQE for electronic structure calculations. Our algorithm, called ClusterVQE, splits the initial qubit space into subspaces (qubit clusters) which are further distributed on individual (shallower) quantum circuits. The clusters are obtained based on quantum mutual information reflecting maximal entanglement between qubits, whereas entanglement between different clusters is taken into account via a new"dressed"Hamiltonian. ClusterVQE therefore allows exact simulation of the problem by using fewer qubits and shallower circuit depth compared to standard VQE at the cost of additional classical resources. In addition, a new gradient measurement method without using an ancillary qubit is also developed in this work. Proof-of-principle demonstrations are presented for several molecular systems based on quantum simulators as well as an IBM quantum device with accompanying error mitigation. The efficiency of the new algorithm is comparable to or even improved over qubit-ADAPT-VQE and iterative Qubit Coupled Cluster (iQCC), state-of-the-art circuit-efficient VQE methods to obtain variational ground state energies of molecules on NISQ hardware. Above all, the new ClusterVQE algorithm simultaneously reduces the number of qubits and circuit depth, making it a potential leader for quantum chemistry simulations on NISQ devices.


I. INTRODUCTION
One of the major goals of computational chemistry is the development of methods and algorithms for the calculation of the molecular electronic ground and excited state energies and corresponding wave functions from first-principles. Such eigenvalues and eigenvectors can be obtained from the solution of the time-independent electronic Schrödinger equation. However, since the invention of classical digital computers in the early 1940s, the exact numerical solution of this central quantum mechanical equation remains infeasible for systems having more than 12 electrons distributed on 184 spin-orbitals [1]. The reason is that the solution space in this approximation (i.e., the Fock space) grows factorially with the system size (e.g., the number of electrons and basis functions). One of the most promising and immediate applications of quantum computers is solving classically intractable quantum chemistry problems [2,3].
The quantum phase estimation (QPE) algorithm [4] represents the natural translation of the full configuration interaction (FCI) procedure to quantum computers [5]. However, QPE requires millions of qubits and quantum gates even for relatively small systems, making the * zhy@lanl.gov algorithm unsuitable for practical applications on nearterm noisy intermediate-scale quantum (NISQ) devices [6]. Full quantum eigensolver (FQE), in which the complexity of basic gate operators is polylogarithmic in the number of spin-orbitals, represents another fully quantum algorithm to simulate molecular systems [7]. However, FQE might not be suitable for advantageous quantum chemistry demonstrations due to the limitations of NISQ devices. Indeed, leading digital quantum computers based on superconducting qubit technology have limited coherence times (∼ 100 µs) and gate error rates (∼ 2 ×10 −2 for a two-qubit gate), limiting the possible number of operations that can be executed to evolve the quantum state. Under these conditions, leveraging classical resources as much as possible through a hybrid quantumclassical approach seems to be the most promising route toward achieving quantum chemical advantage on NISQ devices [8][9][10].
The variational quantum eigensolver (VQE) is one of the leading hybrid quantum-classical algorithms developed specifically to simulate molecular systems in their ground [11][12][13][14] and ultimately excited states [15][16][17][18][19][20]. In a typical VQE setup, a variational parameterized ansatz is used to represent the trial wave function prepared with a quantum circuit and the expectation value of the qubit Hamiltonian is measured. Then, the parameters of the ansatz are iteratively optimized on a classical computer using the Rayleigh-Ritz variational principle. Although VQE simulations of small molecules have been performed on various quantum architectures such as photons [11], superconducting qubits [21,22] and trapped ions [23], major efforts are needed to scale up this approach to larger molecular systems of chemical interest. Here, albeit not as large as QPE circuits, VQE would also suffer from the large size of the quantum circuits, which coupled to a classical optimization requiring many variational parameters, can render calculations intractable. Consequently, the construction of an efficacious ansatz and improved classical optimizer is an active area of research [9,10].
There are two main approaches for ansatz design for electronic structure calculations. Hardware-efficient ansatzes [21,22] are constructed from repeated, dense blocks of a limited set of parameterized gates that are easy to implement on a quantum device. While such an ansatz requires fewer resources on the quantum processor, because of its totally chemically agnostic nature it might face difficulties in the parameter optimization due to the barren-plateau problem [24][25][26][27]. Consequently, simulation of large molecular systems with hardwareefficient ansatz might become practically challenging and even impossible. Chemically-inspired ansatzes, such as the Unitary Coupled-Cluster Singles and Doubles (UCCSD) [28][29][30][31][32], are designed by using the domain knowledge from classical quantum chemistry. The standard version of UCCSD, however, has unfavorable scaling in the number of gates required for larger molecular systems [33]. Consequently, various approaches are being developed to improve on the UCCSD method to obtain shorter circuits or obtain higher than UCCSD accuracy at comparable circuit depth [14,[34][35][36][37][38]. Among them, "iterative VQE" algorithms [39][40][41][42][43][44], which instead of using a fixed ansatz, construct problem-tailored ansatzes on-the-fly, have gained attention because of their controllable circuit-size.
Two notable iterative VQE algorithms to reduce quantum circuit depth are (fermionic and qubit) ADAPT-VQE [39,40] and iterative Qubit Coupled Cluster (iQCC) [41][42][43]. Fermionic-ADAPT-VQE employs a predetermined pool of spin-adapted fermionic (generalized) singles and doubles excitation operators from which the ansatz is dynamically constructed [39]. The more measurement-and circuit-efficient (e.g. less CNOT gates in particular) qubit-ADAPT-VQE uses selected Pauli words (obtained from mapping fermionic operators into Pauli operators) to form the "qubit" pool, but the algorithm comes with a larger number of ADAPT iterations because of the larger number of variational parameters (more gradient calculations required) [40]. IQCC uses arbitrarily shallow quantum circuit depth at the expense of an iterative canonical Hamiltonian dressing technique employing the qubit coupled cluster (QCC) ansatz. The exponential growth of the "dressed" Hamiltonian with the size of the QCC transformation in an anti-commuting set could be mitigated using the involuntary linear combinations of Paulis technique [42]. But the algorithm is exponential with respect to the dressing steps.
Although circuit depth is one of the most critical metrics for NISQ devices. The number of physical qubits on the actual quantum chip (circuit width) is the necessary condition that limits the problem that can be solved. Several techniques based on symmetries presented in the Hamiltonian have been used to reduce the number of qubits [45][46][47][48]. But these techniques can only reduce a few qubits and have certain symmetry requirements. In this work, we present the first-ever algorithm which reduces both quantum circuit depth and width in VQE at the cost of additional classical resources. Our novel algorithm, called ClusterVQE, uses the mutual information (measure of correlation between spin-orbitals [49,50]) to group the qubits into different clusters, which can be distributed to different quantum circuits. The correlation between different clusters is minimized by the mutual information optimization method and is taken into account by building a "dressed" Hamiltonian. Even shallower circuit depths can be achieved compared to the state-ofthe-art qubit-ADAPT-VQE. Consequently, the new algorithm is more robust to noise making it particularly attractive for NISQ devices. Because the correlations between different clusters are minimized, the dressed Hamiltonian protocol only introduces minor additional classical resources requirements compared to the iQCC method. Most importantly, the new ClusterVQE algorithm removes the entanglement between different clusters via the dressed Hamiltonian. Consequently, Clus-terVQE can solve the original problem in a much smaller qubit space (clusters), reducing the number of qubits significantly for simulating large molecules on NISQ devices.

A. Mutual Information based Clustering of Qubits
The first essential step of ClusterVQE is to split the qubits into different clusters and minimize the correlations between different clusters. With this aim, an entanglement map based on the mutual information (MI) that reflects the electronic correlations among all pairs of qubits is calculated [38]. Previous results obtained for the orbital ordering problem in the Density Matrix Renormalization Group (DMRG) method in classical quantum chemistry calculations [49] showed that the MI is a reliable parameter to quantify the correlation between two quantum particles. Our previous work also demonstrated that an approximate MI is sufficient for the qubit rearrangement [38] for the reduction of circuit depth. The MI has also been used to construct a compact entangler pool for ADAPT-VQE [51]. According to quantum information theory, the MI between qubits i and j is defined as follows [49]: where S i and S ij are the single-and two-qubit Von Neumann entropies, respectively, and δ ij is the Kronecker delta. S i and S ij are obtained from the corresponding one-and two-body density matrix [38]. Once the MI is obtained, it can be used to cluster the qubits into different groups by minimizing the MI between different clusters which is defined as the summation over MIs between inter-cluster qubits. The qubit clustering can be achieved by using a classical graph partitioning method, such as the graph partition function of the Metis library [52] (Appendix. I A). Alternatively, this can be mapped into a quadratic unconstrained binary optimization (QUBO) problem and solved on a quantum annealer by either the MI-based clustering method (details are given Appendix I B) or the quantum community detection method (QCD) [53] (details are given in Appendix I C). Problems with up to 64 variables or nodes can be solved directly on the D-Wave 2000Q quantum annealer. Larger sizes require a quantum-classical approach using qbsolv [54] + D-Wave.

B. The ClusterVQE algorithm
Within the VQE algorithm, a parameterized ansatz |Φ =Û (θ) |Φ r is suggested and encoded on the quantum circuit, where |Φ r is the reference state. In this work, the mean-field Hartree-Fock (HF) state is used as a reference. The unitary operatorÛ (θ) introduces the correlations between qubits. Since current quantum gates can only operate on a few qubits at a time,Û (θ) is decomposed as a tensor product of many unitary operators, denoted as entanglers, with each entangler acting on a few qubits, i.e.,Û (θ) = D jÛ (j) whereÛ j = e iθjÂj are the entanglers and D is the number of entanglers in the ansatz. In the widely used UCCSD ansatz, the {Â j } can be readily obtained by mapping the single/double Fermionic excitation operators into qubit-space as shown in Appendix II. With the ansatz, the ground-state (GS) energy can be obtained by imposing the variational principle, where the expectation value is measured on a quantum circuit. However, the contribution of each entangler to the GS energy is different. Thus, the UCCSD ansatz can be made even more compact by the adaptive construction [23,39] in the qubit-ADAPT-VQE, at the cost of more VQE calculations. This algorithm ends up with a compact sequence of entanglersÛ (θ) = D j=1Û j (θ j ) with D < D, whereÛ j (θ j ) = e iθjPj and the corresponding {P j } is chosen from the operator pool {Â j } based on their contributions to the energy weighted by the gradients [39].
Alternatively, according to the qubit clustering introduced in Sec. II A, the entanglers used in the ansatz can be grouped into different sets, where M is the number of clusters. In this way, the ordering of entanglers inÛ (θ) is changed and the corresponding parameters can be different as well. The entangler set {Û ci } only acts on the corresponding cluster, and {Û cij } represents the entanglers that couple different clusters. Consequently, the energy minimization in Eq. 2 can be rewritten as cijĤÛcij is the dressed Hamiltonian. Because theĤ d can be represented in terms of Pauli words and each Pauli word can be trivially rewritten as the tensor product of different clusters, i.e.,Ĥ d = k α kPk = k α k iP k (c i ) and the initial state can be decomposed as i Φ r (c i ), the energy in Eq. 4 can be further rewritten as  in Figure 1. In practice, like the qubit-ADAPT-VQE algorithm, clusterVQE adaptively grows the ansatz in Eq. 4. The ansatz used in the j th iteration is denoted asÛ j . However, in contrast to the growing ansatz in qubit-ADAPT-VQE [39] and growing Hamiltonian in iQCC [41], the ansatz in ClusterVQE is decomposed onto smaller quantum circuits, and only the entanglers between different clusters are used to dress the Hamiltonian. Consequently, the iterative construction breaks the original VQE algorithm with deeper and wider quantum circuits into a series of quantum circuits with a shallower depth and a smaller number of qubits, as illustrated by Figure 1.
The entanglerÛ k chosen in the ansatzÛ j for the j th iteration is determined by its contribution to the energy weighted by the gradient of energy with respect to the parameter, i.e., theÛ k terms with largest gradients ∂E ∂θ k are selected [39], where the gradient is defined as WhereĤ d,j−1 is the dressed Hamiltonian of the j − 1 iteration. It should be noted thatĤ d,j may be the same as theĤ d,j−1 if the selected entangler of the j th iteration is not used to dress the Hamiltonian. Since the measurement of each gradient term is independent, all the gradients can be measured in parallel with several uncoupled quantum computers [39]. The flowchart of the ClusterVQE scheme is presented in Figure 2. In the first step, the qubit Hamiltonian and operator pool are prepared: a) Provided classical meanfield calculation of the molecule, map the second quantized Hamiltonian into its qubit counterpart by using either Jordan-Wigner (JW) [55], Bravyi-Kitaev (BK) [56], or parity [45,57] encoding. b) Generate the UCCSD excitation operators and map them into Pauli words.
Here only Pauli words with the different flip indices (Appendix. II) are chosen and added into the excitation pool {Â i }. In addition, clustering of qubits is performed if MI is provided. Otherwise, the qubits in the first iteration are simply grouped according to their spin index. If the largest gradient in the second step is smaller than the convergence criteria , the energy minimization criteria are satisfied and the algorithm exits from the iterative loop. The fifth step is only triggered if clustering on-thefly is desired.
Since some of the entanglers inÛ j are used to dress the Hamiltonian, the circuit depth is smaller than that of qubit-ADAPT-VQE. Besides, because the clustering minimizes the entanglement between different clusters, the number of dressing entanglers is much smaller than that of iQCC. Consequently, ClusterVQE can balance the computational costs between the Quantum Processing Unit (QPU) and CPU. The circuit depth/width is further reduced compared to qubit-ADAPT-VQE, and exponential scaling in growing of the Hamiltonian in iQCC is avoided. As a result, ClusterVQE is more robust to noise, as shown and discussed later, making the ClusterVQE algorithm more suitable for NISQ devices than qubit-ADAPT-VQE. Most importantly, compared to previous approaches, ClusterVQE can solve larger problems on smaller quantum circuits.

C. Analytical gradients for VQE optimization
In order to improve the algorithm performance in the presence of noise, an analytical gradient is essential for classical optimization. It should be noted that the gradient for VQE optimization is different from the gradients in Eq. 6 which is used to chose entanglers for each iteration. Ref. [58] suggests calculation of the analytical gradients by performing a measurement on an ancillary qubit [32,59]. A similar approach has been developed in this work for the analytical gradient measurements but without introducing an ancillary qubit. The gradient of energy with respect to the variational parameter reads as follows.
whereP eff i is the dressed Pauli word as shown in Appendix III. Therefore, the energy gradient can be measured in the same way as the energy measurement by re-placingĤ withĤP eff j and no ancillary qubit is required. On a noiseless simulator, VQE calculations with numerical gradients or analytical gradients give the same results. However, analytical gradients are much more stable on noisy simulators and quantum computers.
It should be noted that even though the ClusterVQE method can reduce the number of qubits and ansatz complexity by distributing the measurements on smaller quantum circuits, the number of measurements increases with the number of entanglers used in the dressed Hamiltonian. To further reduce the number of measurements, future work may consider recently developed measurement reduction techniques [60][61][62][63][64][65][66][67].

III. RESULTS AND DISCUSSION
To demonstrate the validity of the ClusterVQE algorithm, we have simulated several molecules on both quantum simulator and IBM quantum devices. In order to perform the simulations, an in-house modified Qiskit code [68] has been developed. The Jordan-Wigner operator transformation [55] and the Limited-memory BFGS Bound (L-BFGS-B) optimizer [69] were used in this work. The obtained results are compared with that of exact diagonalization. In order to make the comparison over different methods (VQE, qubit-ADAPT-VQE, iQCC) meaningful, the same ansatz, qubit-UCCSD (details in Appendix II), is used for all simulations below. The convergence criteria of = 10 −4 and 5 × 10 −3 are used for LiH and N 2 , respectively, if not specified elsewhere. For the N 2 molecules, the maximum number of iterations is 25, unless convergence occurs earlier.

A. LiH molecule
We start with simulating the ground state of the LiH molecule. LiH requires 10 qubits with the minimal STO-3G basis set and frozen core orbitals. Figures 3(a) and (c) compare the energy convergence of the qubit-ADAPT-VQE, ClusterVQE, and iQCC approaches for different Li-H bond lengths (1.547Åand 2.4Å). Because iQCC only implements one entangler on the quantum circuit, the number of parameters to be optimized is also smaller compared to that of qubit-ADAPT-VQE and Cluster-VQE. Consequently, a smaller number of optimization cycles is required per iteration. For the iQCC method, once the entanglers in the previous iteration are used to dress the Hamiltonian, these parameters are fixed. In contrast, the parameterized ansatz in the qubit-ADAPT- VQE and ClusterVQE approaches are to be re-optimized at each iteration, making them more flexible to achieve better convergence. As a result, the convergence of iQCC is slightly worse than that of qubit-ADAPT-VQE for a given number of iterations, as shown in Figures 3(a) and (c). For ClusterVQE, because only a few operators are used to dress the Hamiltonian, the performance of Clus-terVQE is almost the same as that of qubit-ADAPT-VQE. Because the Hamiltonian of the iQCC and Cluster-VQE methods is dressed by the entanglers, the number of Pauli words in the dressed Hamiltonian is increased. However, compared to iQCC, only two operators are used to dress the Hamiltonian in ClusterVQE as shown, the number of Pauli words of the dressed Hamiltonian does not increase significantly. In contrast, the number of Pauli words in iQCC increases exponentially with the number of iterations before saturation, see Figures 3(b) and (d). We also compared performance of the qubit-ADAPT-VQE, iQCC, and ClusterVQE approaches on the ground-state potential energy surface (PES) of LiH as shown in Figure 4. Overall, ClusterVQE can achieve the same energy error as that of qubit-ADAPT-VQE but with smaller quantum circuits. And ClusterVQE uses less Pauli words compared to iQCC, as shown in  We also compared the performance of the different clustering methods used (Appendix. I). For most geometries of the LiH molecule, these three methods produce the same clustering. However, at the bond length of 2.4Å, the Metis graph partition method results in less optimal clustering. Consequently, more entanglers are needed to dress the Hamiltonian and ClusterVQE with the Metis method ends up with 808 Pauli words. This comparison suggests that the QCD and MI-based clustering methods provide better and more stable performance in clustering and help improve the ClusterVQE performance.

B. N2 molecule
We next demonstrated the validity of ClusterVQE in the strongly correlated regime. The N 2 molecule is a prototype system for testing multi-reference methods in quantum chemistry due to its strong electronic correlations particularly when the bond is stretched. Here N 2 at equilibrium (1.09Å) and stretched configurations (1.6Å) are used as examples. The 14-qubit N 2 /STO-3G system is simulated after freezing the lowest two spinorbitals. As shown in Figures 5(a) and (c), the energy convergence of ClusterVQE is almost the same as that of qubit-ADAPT-VQE and is better than that of iQCC. Moreover, the size of the dressed Hamiltonian in Clus-terVQE is much smaller than that of iQCC. It should be noted that the stretched N 2 molecule has stronger correlations, however, accompanied with a better separation between clusters. Consequently, less dressing is required for the stretched N 2 , and ClusterVQE only slightly increases the size of the dressed Hamiltonian. In contrast, the size of the Hamiltonian exceeds 15,000 Pauli words in iQCC after 25 iterations. These results suggest that, even for strongly correlated systems, our ClusterVQE reduces circuit complexity without a significant increase in Hamiltonian size.

C. Noisy simulation and clustering on-the-fly
We next examine the dissociation curve of the LiH molecule in the presence of noise to demonstrate Cluster-VQE's resilience to this common feature of all NISQ architectures. Figure 6 shows the dissociation curve of the LiH molecule calculated with different algorithms. The energy values are averaged over 5 runs. As shown by the curves, in the presence of noise, the dissociation curves obtained from the VQE, qubit-ADAPT-VQE, iQCC, and ClusterVQE methods are different from the exact solution. Since the VQE algorithm requires more parameters and the deepest circuit, the noise influence is the largest across the set, showing the strongest deviation from the reference. The qubit-ADAPT-VQE method, in contrast, is able to grow a compact ansatz with fewer parameters and lower circuit depth. The influence of noise is suppressed. On the other hand, the iQCC algorithm uses much shallower circuit depth. As a result, the iQCC is most immune to the noise, and the corresponding dissociation curve is the closest to the reference. Next is ClusterVQE which partially dresses the Hamiltonian and grows the ansatz. The circuit depth of ClusterVQE lies between that of iQCC and qubit-ADAPT-VQE. Consequently, the energy error in the presence of noise is also in between that of the iQCC and qubit-ADAPT-VQE methods.
We also compare the performance of the four algorithms with different error rates. Figures S1(a) and (b) compare the energy errors with respect to different singleand two-qubit error rates, respectively. The single-qubit gates usually have much lower error rates on real quantum devices. For example, the single-and two-qubit gate error rates of the IBM-Yorktown [70] device are around ∼ 10 −3 and ∼ 10 −2 , respectively. Consequently, the error of single-qubit gates have a much smaller influence on the results. In contrast, the results are significantly affected by the error of two-qubit gates. In general, the error of ClusterVQE lies between that of the qubit-ADAPT-VQE and iQCC methods on the noisy simulator, as expected.
In the previous examples, the MI is calculated based on the exact wavefunction (WF) for proof-of-principle. But, the exact WF is not known before the VQE calculation. However, an approximate calculation of MI provides a sufficient estimate of the correlation between different qubits, as shown in the previous DMRG-type selection of active space [71] and our recent development of the PermVQE algorithm [38]. The approximate MI can be estimated on classical computers by using either MP2 or DMRG (with small bond-dimension) methods. Here, we also demonstrate that ClusterVQE even works with MI calculated on-the-fly. At each iteration, the MI is up- dated and used for re-clustering the qubits. As shown by the red dots in Figures 3 (a) and (c), the performance of ClusterVQE on-the-fly is almost the same as that using the exact MI. The clustering is suboptimal only for the first three iterations and the dressed Hamiltonian has a slightly larger size as a result. The compositions of each cluster are changing before a stable clustering is found. Nevertheless, these results illustrate that Cluster-VQE also works when clustering on-the-fly.

D. Simulation on a quantum computer
Finally, we implement ClusterVQE on an actual quantum backend. The IBMQ-Rome device with 5 qubits was chosen to simulate the LiH molecule described by 10 qubits (lowest energy spatial orbital is assumed to be frozen). In order to run on IBMQ-Rome, we set the size of each cluster to be 5. Figure 7 shows the energies as a function of ClusterVQE iterations without error mitigation, and the inset plot shows the VQE optimization for the first, third, and fifth ClusterVQE iterations, respec- tively. Due to the large CNOT error rate, the energies go up with the first a few ClusterVQE iterations without error mitigation. Namely, the number of entanglers encoded on the quantum circuits increases with the number of iterations and introduces more errors. However, the energy error does not increase monotonically with Clus-terVQE iterations, suggesting that the ClusterVQE can compete with noise even without error mitigation.
The ClusterVQE performance on real quantum devices can be further improved with recently developed error mitigation techniques [72]. For proof-of-principle, we first perform ClusterVQE on the ideal simulator to generate the circuits of the final iteration and run the final circuits on both the noisy simulator and real quantum device (IBMQ-Rome) for error mitigation. We perform the mitigation using Clifford data regression (CDR) [73], [74]. For the noisy simulation, we use a noise model obtained by gate set tomography of the IBMQ Ourense device [75]. The results are listed in Table. I, which shows that error mitigation reduces the error by several orders of magnitude. Such simulations demonstrate that the Cluster-VQE method, with error mitigation, can readily achieve chemical accuracy on real quantum devices.

IV. CONCLUSION
To summarize, we developed a novel ClusterVQE algorithm to reduce quantum circuit complexity, including both the number of qubits and circuit depth. Our algorithm groups qubits into clusters based on maximal correlation between them and further takes into account the residual entanglement between clusters in a new "dressed" Hamiltonian. Hence, ClusterVQE naturally implements the benefits of the previously developed PermVQE algorithm, which significantly reduces the number of controlled-NOT gates to connect strongly correlated qubits [38]. The entanglers that couple different clusters are further used to dress the Hamiltonian and remove the entanglement between different clusters. Consequently, the Hamiltonian can be decomposed as a tensor product of each cluster. Each cluster can then be measured separately with a smaller number of qubits and shallower circuit depth. Therefore, ClusterVQE is a particularly promising algorithmic development for quantum chemistry applications, where it is able to reduce the quantum hardware requirements significantly. Moreover, our algorithm has the flexibility of reducing qubits to any number at the expense of additional computational costs on classical computers. Since fewer entanglers are encoded on quantum circuits, ClusterVQE is more robust to noise than qubit-ADAPT-VQE, being one of the most circuit-efficient VQE algorithms, which is verified by our numerical experiments on both the noisy simulator and real quantum platforms. Since the correlation between different clusters is minimized, only a few operators are used to dress the Hamiltonian. Consequently, the Hamiltonian size does not increase significantly compared to that of the iQCC method.
ClusterVQE also uses a newly proposed gradient measurement method for the VQE without using an ancillary qubit. The analytical gradient can significantly improve the performance of ClusterVQE in noisy environments. The simulation of the LiH molecule on the noisy quantum simulator found that the energy error of the ClusterVQE algorithm is smaller than that of the qubit-ADAPT-VQE method due to the shallower circuit depth. A groundbreaking example of the exact error-corrected LiH molecule simulation described by 10 qubits on the 5-qubits IBMQ-Rome quantum computer with Cluster-VQE is demonstrated. Obtained results further demonstrate the validity and advantage of the ClusterVQE algorithm. Hence, we believe the algorithm developed in this work can pave the way towards the adaptation of molecular quantum chemistry simulation on NISQ devices. It should be mentioned that since the size of the qubit Hamiltonian increases with the number of operators used to dress the Hamiltonian, it increases both computational costs on classical computers and measurements on quantum computers. The recently developed measurement reduction techniques [60][61][62][63][64][65][66][67] are likely a path-forward to further reduce the simulation cost on quantum devices.
Finally, it should be noted that the computational scaling of ClusterVQE is dependent on the number of dressing operators. Even though the scaling of computing the dressed Hamiltonian is exponential against the number of dressing operators [76], the scaling of ClusterVQE against iterations is not exponential in general because not every entangler is used to dress the Hamiltonian. For the best cases, only a limited number of entanglers is used to dress the Hamiltonian (LiH, for example). Such a situation can be intuitively expected in molecular multimers, where the intermolecular coupling is usually smaller than the intramolecular correlation. For the worse cases, usually in strongly correlated systems in which qubit clustering is not optimal, the scaling of ClusterVQE may be exponential if a larger number of entanglers are used to dress the Hamiltonian. However, even in strongly correlated molecules, the scaling of different geometries can be different. For example, the ClusterVQE simulation of 14-qubits N 2 at equilibrium ends up with a large number of Pauli words ( Figure 5(b)). However, the simulation of N 2 with stretched bond has a much better scaling (Figure 5(d)) even though the stretched configuration has a stronger correlation. That being said, the bottleneck of our approach relies on whether the inter-cluster MI can be efficiently minimized. Hence, ClusterVQE does not introduce exponential scaling for general cases.

A. Metis Graph Partitioning on CPU
The MI matrix can be viewed as a graph. This allows for the use of classical graph partitioning methods from the Metis library [52]. The part graph function in the Metis library was used and the partition weights were also optimized to minimize the cut.

B. MI-based clustering on Quantum Annealer QPU
The MI between qubits can be interpreted as a degree of correlation. Note that when two random variables are fully correlated, their joint entropy is minimal, and the MI is maximal. Moreover, MI can be mapped to what others have coined as the "generalized correlation coefficient", where all higher order correlations are taking into account [77][78][79]. This generalized correlation coefficient is computed as: where d is the dimensionality of the problem, and I ij are is the Mutual Information matrix elements. Note that, when I ij goes to infinity, r M I ij approaches 1; whereas if I ij tends to zero, r M I ij tends to 0. Taking into account Equation S2, we see that, by maximizing MI for a constrained set of qubits, we will be maximizing the correlation among that same set. Moreover, maximizing the MI restricted to a certain number of qubits, is somehow related to getting the "most central hub," according to node centrality ordering. Albeit for some exceptions, we have verified the latter using eigenvector centrality metric.
Maximizing the MI restricted to a certain number of qubits can be formulated as a quadratic unconstrained binary optimization (QUBO) problem. This formulation can, in turn, benefit from available alternative computational paradigms tailored to solve these kinds of problems, e.g. Quantum, and Digital annealers. The cost function to be maximized can be written as: where the vector x is a bit-string containing the information of the selected qubits. This is, if the i-th qubit is selected x i = 1, and 0 otherwise. Matrix I contains all the MI elements I ij as defined by Eq. 1, and R is the matrix representation of an operator that "restricts" the total number of selected qubits [80]. A penalty constant λ is used to tune the strength of the constraint. Since the annealing methods and devices commonly minimize cost function, the problem is rather solved by minimizing −C leading to a QUBO matrix equal to Q = −I + λR Matrix R, is defined as follows.
Here, f is the target fraction of qubits to be selected; S is the total size of the qubits pool; and α is a constant that depends on the optimizer (1 or 2 depending if only the upper triangle vs the full QUBO matrix is used to compute the energy). For all the calculations where this clustering method was used we have set f = 0.5, and λ = 2.0. An explanation of why the R matrix restricts the set is described next. Consider the case where a general vector x of length S = N is such that k x k = K. Now let us compute the energy term of that vector given by matrix R. This is E R (x, such that ||x|| 1 = K) = x t Rx. Then, the off diagonal elements of R will contribute with K(K− 1). If we want f = M/N , then, the diagonal elements will contribute with K − αM K. By summing the two contributions, we have that E R (K) = K 2 − αM K = K(K − αM ). This means that the function will have two minima, the trivial one at K=0, and another at K = αM (αM bits equal to 1).

C. Quantum Community Detection on Quantum
Annealer QPU Classical community detection as graph clustering maximizes modularity to split the MI matrix into clusters that minimize correlation or connectivity between clusters [81]. In quantum community detection (QCD), the problem is represented as a QUBO that can be run on a D-Wave quantum annealer as an NP-complete combinatorial optimization problem.
The symmetric MI matrix is viewed as a graph. The adjacency matrix A is calculated as follows.
The modularity matrix B is calculated from the adjacency matrix A and degree of each node (number of connected nodes) d i .
While QCD can cluster a graph into 2 or more clusters [53,82,83], in this work we are interested in splitting only into 2 clusters. The modularity matrix in QUBO form is run on the D-Wave 2000Q quantum annealer: The resulting x is a string of zeros and ones that maps each of the MI nodes (representing qubits) to one of two clusters that can be used in the ClusterVQE algorithm. The error in energy (∆E) as a function of the error rates of single (a) and two-qubit (b) gates, respectively. The two and single-qubit gate error in a and b are fixed at 0.01 and 0.001, respectively. The single-qubit gate on real quantum devices is usually one order of magnitude smaller than that of two-qubit gates. Hence, single qubit error has a smaller impact on the results. single and double excitation (and de-excitation) operators: |Φ(θ) = eT (θ)−T † (θ) |Φ r , where |Φ r is the reference state being the HF state in this work.T (θ) = T 1 (θ 1 ) +T 2 (θ 2 ) is the excitation operator withT 1 (θ 1 ) andT 2 (θ 2 ) being the parameterized single and double excitation operators, respectively, For simplicity, indices i, j, · · · are used to represent the occupied molecular orbitals (MOs), a, b, · · · represent the unoccupied MOs. Because the NISQ devices only have gates acting on a few qubits at a time, the UCCSD operator must be broken up into a sequence of single operators (either single or double excitation) via the Trotter expansion of an exponent operator [84], i.e., e A+B = lim n→∞ e A/n e B/n n . Even though the truncated Trotter expansion is an approximation to the UCCSD ansatz, the variational procedure is able to absorb the error according to recent report [85]. Consequently, a finite order of Trotter expansion is able to reproduce the FCI energies due to the fact that the many-body interaction can be decomposed as the tensor product of one-and twobody interactions [39]. In this way, the exact FCI state can be written as the tensor product of single and double excitation operators, |Φ(θ) = D k=1 et k (θ) |Φ r , wherê t k (θ) represents either single or double excitation operators and their anti-Hermitian forms (t † k = −t k ). Consequently, the GS energy is readily obtained by minimizing the parameterized ansatz, i.e., In the UCCSD ansatz, the SD sequences are directly generated from the Trotter expansion of theT (θ) −T † (θ) operator. In the recently developed ADAPT-VQE algorithm [39], the sequence of SD operators in the ansatz are chosen adaptively by measuring their contribution to the GS energy weighted by the energy gradients with respect to the parameters. With the adaptive method, a more compact sequence of the unitary operators is generated and a smaller number of parameters is optimized at the cost of more VQE simulations. Because the UCCSD ansatz is not hardware-efficient, a qubit version of the UCCSD ansatz was developed recently in which the ansatz is broken up into a series of Pauli words and a small portion of the Pauli words is chosen to adaptively grow the ansatz. Eq. 6 indicates the prod-uctP kj = [P k ,P j ], that only contains theẐ operators, has non-vanishing derivatives because Φ r |σ i |Φ r = 0 if σ i =X,Ŷ . Therefore, the Pauli wordP j that containŝ Z has no contribution to the derivative. And two Pauli words with the the same flip indices (indices ofX/Ŷ ) have the same contribution. Hence, after the fermionqubit mapping of the UCCSD operators, we only include those Pauli words with unique flip indices. Thus, only one of these operators is included in the unitary operators {Û j }. In this way, the number of operators and circuit depth can be reduced.

III. ANALYTICAL GRADIENT IN VQE
The energy gradient is used in the parameter optimization on the classical computers. The gradient can be obtained by the finite-difference method or analytical gradients. According to our numerical experiment on either a noisy simulator or real quantum devices, the analytical gradient can significantly improve the performance of VQE in the presence of noise. In this work, the analytical gradient is obtained by the measurements on quantum circuits. The gradient of a parameter is, whereP eff i is the dressed operator. Consequently, the energy gradient can be obtained by In this way, calculation of the dressed operatorP eff i on classical computers is required, while no ancillary qubit on the quantum circuit is needed.