Deterministic improvements of quantum measurements with grouping of compatible operators, non-local transformations, and covariance estimates

Obtaining the expectation value of an observable on a quantum computer is a crucial step in the variational quantum algorithms. For complicated observables such as molecular electronic Hamiltonians, one of the strategies is to present the observable as a linear combination of measurable fragments. The main problem of this approach is a large number of measurements required for accurate estimation of the observable’s expectation value. We consider three previously studied directions that minimize the number of measurements: (1) grouping commuting operators using the greedy approach, (2) involving non-local unitary transformations for measuring, and (3) taking advantage of compatibility of some Pauli products with several measurable groups. The last direction gives rise to a general framework that not only provides improvements over previous methods but also connects measurement grouping approaches with recent advances in techniques of shadow tomography. Following this direction, we develop two measurement schemes that achieve a severalfold reduction in the number of measurements for a set of model molecules compared to previous state-of-the-art methods.


I. INTRODUCTION
Variational Quantum Algorithms (VQA) constitute one of the most promising class of applications for quantum computers in the noisy intermediate scale quantum era. 1,2In VQAs, classically intractable optimization problems are represented as lowest eigenstates of N q -qubit operators Ĥ = N P n=1 c n Pn , Pn = ⊗ Nq k=1 σk (1)   where c n 's are coefficients and Pn 's are tensor products of Pauli operators or identities, σk ∈ {x k , ŷk , ẑk , 1k }.
VQAs then solve these problems by minimizing E(θ) = ψ (θ)| Ĥ |ψ (θ) , where the quantum computer prepares the trial wavefunction |ψ (θ) and is given a task to measure E(θ), while a classical optimizer determines the optimal θ.However, it was found that estimating E(θ) accurately for chemical systems requires large numbers of measurements that diminish VQA's advantage over classical alternatives. 3easuring E(θ) is indeed not a straightforward task since only z-Pauli operators can be measured on current digital quantum computers.A common approach to measuring the expectation value of the Hamiltonian is to present Ĥ as a sum of measurable fragments Ĥ = α Âα .The condition for selecting Âα 's is that they can be easily rotated into polynomial functions of z-Pauli operators Unfortunately, in general, the wavefunction |ψ (θ) is not an eigenstate of Âα , and thus each fragment requires a set of measurements to obtain an estimator Āα for ψ (θ)| Âα |ψ (θ) .The efficiency of the Hamiltonian measurement scheme is determined by the total number of measurements, M , needed to reach accuracy for E(θ).For a simple estimator of E(θ) as the sum of Āα estimators, the error scales as = α Var ψ Âα /m α , where Var ψ Âα = ψ| Â2 α |ψ − ψ| Âα |ψ 2 is the variance of each fragment, and m α 's are the numbers of measurements allocated for each fragment, with the condition α m α = M .The optimal distribution of measurements is m α ∼ Var ψ Âα , which gives the total estimator error as = α Var ψ Âα / √ M .
This consideration shows superiority of estimators operating with a set of measurable fragments that have the lowest sum over variance square roots.For practical use of this consideration, there are two difficulties in explicit optimization of the estimator error: 1) there is an overwhelming number of choices for measurable operator fragments and 2) variance estimates require knowledge of the wavefunction.The second problem can be addressed by introducing a classically efficient proxy for the quantum wavefunction (e.g. from Hartree-Fock or configuration interaction singles and doubles (CISD) methods in quantum chemistry problems) or by utilizing the measurement results from VQAs to gain empirical estimates.

arXiv:2201.01471v3 [quant-ph] 18 Apr 2022
Yet, the search space in the first problem is so vast that it has only been addressed heuristically in previous studies.The Hamiltonian partitioning has been done in qubit space [4][5][6][7][8][9][10][11] and in the original fermionic space with subsequent transfer of all operators into the qubit space. 12,13n initial heuristic idea was to reduce the number of measurable fragments without accounting for variances.It was shown for several partitioning that the number of fragments is not a good proxy for the total number of measurements, and the fragments' variances cannot be ignored. 13,14The key element determining a particular set of measurable fragments is a class of unitary transformations Ûα 's in Eq. (2).Compared to singlequbit transformations, multi-qubit transformations are more flexible and therefore have a greater potential to minimize the total number of measurements by selecting fragments with lower variances.Yet, they also have a downside of an extra circuit overhead needed to perform the rotation before the measurement.Once the set of unitary transformations has been selected, empirically, it was found more beneficial for the estimator variance to use greedy algorithms for the Hamiltonian partitioning.In these algorithms one finds Âα fragments sequentially by minimizing the norm of the difference between partial sum of Âα 's and Ĥ. 13,14 This can be rationalized considering that greedy algorithms produce first fragments with larger variances and later fragments with smaller variances.Such a distribution of variances makes sum of square roots somewhat smaller compare to the case where variances are distributed relatively equally over all fragments.
Fragmentation techniques in the qubit space are based on grouping mutually commuting Pauli products in each fragment Âα [Eq.(2)].Two types of commutativity between Pauli products are used: qubit-wise and full commutativity.The full commutativity (FC) is the regular commutativity of two operators, 7 whereas the qubitwise commutativity (QWC) for two Pauli products is a condition when corresponding single-qubit operators commute. 5Using either commutativity to find Âα 's, one can efficiently identify unitary operators Ûα 's from the Clifford group that bring the fragments to the form of Eq. (2) for measurement.Only one-qubit Clifford gates are sufficient for Ûα 's of the qubit-wise commuting fragments, 5 while Ûα 's for fully commuting fragments require also two-qubit Clifford gates. 7nitial QWC-and FC-based schemes had Âα 's consisting of disjoint (non-overlapping) sets of Pauli products.Generally, each Pauli product can belong to multiple Âα 's as long as it commutes with all terms in these fragments.This follows from non-transitivity of both FC and QWC as binary relations: if P1 commutes with P2 , and P2 commutes with P3 , this does not lead to commutativity of P1 and P3 .For the measurement problem, P1 and P3 form separate measurable groups while P2 can be measured within both of these groups.Here, P2 constitutes an overlapping element for the P1 and P3 groups (see Fig. 1 where P1 , P2 , and P3 are ẑ1 , ẑ1 ẑ2 , and x1 x2 respectively).Recent developments based on shadow tomography [15][16][17][18] and grouping 19,20 techniques exploiting overlapping fragments found considerable reduction in the number of needed measurements over non-overlapping grouping schemes.However, all nonoverlapping schemes used in those comparisons did not use the greedy approach.Since within qubit-based partitioning schemes there are multiple estimator improvement techniques, it is interesting to assess them all systematically.In this work, we assess improvements in the total number of measurements from introducing a series of ideas: 1) grouping commuting operators using the greedy approach, 2) involving non-local (entangling) unitaries for measuring groups of fully commuting Pauli products, and 3) taking advantage of compatibility of some Pauli products with several measurable groups (i.e.overlapping grouping).It is shown that these ideas, used separately or combined, can give rise to schemes superior to prior art within grouping and shadow tomography techniques.One of the most striking findings is that using only greedy non-overlapping grouping within the QWC approach can already surpass the performance of recent shadow tomography techniques that employed overlapping local frames.We do not consider fermionic-algebra-based techniques here because they do not allow overlapping grouping while all other improvements were already discussed for them. 13Other measurement techniques that do not involve grouping of Hamiltonian terms are also outside of the scope of the current work.
where P α 's are disjoint sets of Pauli products measured as parts of corresponding Âα 's, and c k 's are coefficients of Pk 's in the Hamiltonian.The commutativity between Pauli products within P α implies a common eigen-basis B α , where one can measure all the members of P α .Initial proposals to find these fragments aim to minimize the total number of fragments using graph coloring algorithms, such as the largest first (LF) algorithm. 5,7But later the sorted insertion (SI) algorithm employing the greedy approach was found to produce lower variances for the energy estimator. 14et H denote the estimator for ψ| Ĥ |ψ ; it is a sum of estimators for its parts Each Āα comes from m α repeated measurements of Âα , where A α,i is the i-th measurement result of Âα .The variance of H is where Var Āα 's are variances of estimators characterizing differences between Āα 's and the true expectation values ψ| Âα |ψ 's.Note that covariances between different fragments Cov( Āα , Āβ ) are zero because measurements of different fragments are done independently.Var Āα 's can be evaluated using quantum operator variances Var ψ Âα 's, Var Āα = Var ψ Âα /m α , which leads to the Hamiltonian estimator variance as Using the constraint M = α m α one can minimize Var H with respect to m α 's 14,26 In practice, quantum variances Var ψ Âα are not known a priori.They can be evaluated using covariances between Pauli products, where Pj , Pk ∈ P α .The covariances for different Pauli products are generally non-zero because all of these Pauli products are measured together within the same fragment.The covariances can be approximated for molecular Hamiltonians using approximate wavefunctions obtained on a classical computer.Configuration interaction singles and doubles (CISD) is one example for obtaining approximation for |ψ that will be used in the current work.Alternatively, the measurements results obtained from measurement basis B α can help estimate the covariances between Pauli products of P α during VQA's cycles.

B. Optimization by Coefficient Splitting
Many Pauli products in the Hamiltonian can be measured in multiple fragments because of their compatibility with other members of those fragments.The coefficient splitting approach, briefly mentioned in Ref. 14, takes advantage of this opportunity by splitting coefficients of Pauli products that are compatible with multiple fragments where I k is a set of group indices α corresponding to fragments Âα whose members are compatible with Pk (see Fig. 1 for an example).Note that the equations for the estimator variance and the optimal measurement distribution remain the same [Eqs.( 8) and ( 9)].However, freedom in the coefficient splitting approach [Eq.( 13)] can be used to minimize the Hamiltonian estimator variance [( 9)].
A straightforward approach to minimization of Var H with respect to c k 's grows with the size of the system.As a computationally more efficient alternative, we propose an iterative heuristic that quickly converges to a zero gradient solution.
Iterative coefficient splitting (ICS): Given a particular choice of c (α) k 's and its optimal m α 's, the procedure consists of iteratively applying two steps: (1) optimization of c k 's.The second step is straightforward using Eq. ( 8).For the first step, notice that when m α 's is fixed, the derivatives of Var H with respect to c To account for the constraints in Eq. ( 13), for each splitting of c k , we fix one of the {c This allows us to find an optimal c (α) k 's by solving the linear system of equations obtained by equating the gradients to zero.
If the number of c k 's overcomes computationally affordable limits, one can always limit the minimization to a selected subset of c (α) k 's.The criteria for the suitable subset could be the Pk variances, which correlate with magnitudes of their covariances and therefore the importance of their coefficients for Var H .

C. Optimization by Measurement Allocation
Another approach to reducing the Hamiltonian estimator variance is to measure each Pauli product as a member of as many compatible measurable fragments as possible.This idea was used in classical shadow tomography methods based on local transformations for measurement of Pauli products. 15,16,19First, for a particular Pauli product Pk , one finds a set of measurement bases B α 's where Pk can be measured (see Fig. 1 for an example, by a measurement group this method considers a set of compatible Pauli products).Then, all measurement results for Pk obtained in B α 's are used to estimate Pk : where P To proceed further, it is important to distinguish covariances between Pauli products measured within the same fragment and in different fragments.The former correspond to α = β and u = v in Eq. ( 17) and generally are non-zero, while the latter (α = β or u = v) are zero Note that the key element in deriving this Hamiltonian estimator variance is the consideration that if a Pauli product is measured as a part of a certain group, all members of this group contribute to the average and to the variance.Thus, the variance of each group gives rise to covariances between its members.Since the covariances in different groups are different in magnitude, placing a particular Pauli product in all compatible groups can be sub-optimal for the total variance of the Hamiltonian estimator.
Dependencies of M j and M k on m α 's in Var H [Eq. ( 18)] make finding the optimal measurement allocation in the analytic form infeasible. To minimize Var H with respect to m α 's in Eq. ( 18) one can numerically optimize m α 's as positive variables with restriction α m α = M .We will refer to this strategy as the measurement allocation approach.It turns out that such minimization is a version of the coefficient splitting approach with c k 's for m α 's in Âα and using Eq. ( 7), we obtain Var H as which agrees with Eq. (18).Therefore, the measurement allocation approach can be seen as a particular choice of coefficient splitting for reducing Var H .The main advantage of the measurement allocation approach is a much lower number of optimization variables (m α ) compared to that of the coefficient splitting scheme (c (α) k ).One can formulate approximation for gradients of Var H with respect to continuous proxy of m α 's (see Appendix A), which leads to a gradient descent scheme that we will refer to as gradient-based measurement allocation (GMA).Yet, a computationally more efficient, non-gradient iterative scheme was found and detailed below.
Iterative measurement allocation (IMA): Given an initial guess for m (0) α 's and resulting M (0) k 's, the corresponding coefficient splitting partitioning of the Hamiltonian is where Recall that the optimal measurement allocation for any coefficient splitting is given by Eq. ( 8).Thus, we use this optimal allocation to update m α 's as which leads to the update in measurable groups Since there is no guarantee that each iteration will necessarily lower Var H in Eq. ( 18), we repeat these steps multiple times and choose m α 's that result in the lowest estimator variance.Empirically, the procedure finds the best measurement allocation in first few cycles.

III. RESULTS AND DISCUSSIONS
We assess the performance of the proposed approaches (IMA, GMA, and ICS) in comparison to prior works (LF, SI, and classical-shadow-based algorithms) in estimating energy expectation values for ground eigen-states of several molecular electronic Hamiltonians. 27To compare different schemes, we normalize the total measurement budget α m α = 1 and require m α 's to be positive real numbers.The estimator variance with m α 's, Var H , can be scaled by 1/M to approximate the estimator variance Var H ≈ Var H /M for m α = M m α measurements for each group.The overlapping groups (P α 's) of the proposed methods are obtained from an extension of the SI technique (see Appendix B).The initial measurement allocations or coefficient splittings are derived from measurement allocations of the SI technique using exact or CISD wavefunctions.
To illustrate the relative performance of our methods, Table I presents the Hamiltonian estimator variances based on covariances calculated with the exact wavefunction.Lower variances in SI compared to those in LF are consistent with earlier findings 14 .All proposed methods result in lower variances than those in SI.As the most flexible approach, the coefficient splitting method ICS achieves the lowest variances.GMA has a slight edge over IMA in estimator variances, but due to the computational cost of GMA, we will only consider IMA from here on.
Table II shows the number of optimization variables in the measurement allocation and coefficient splitting techniques.For the measurement allocation approaches (IMA and GMA) the number of variables is equal to the number of measurable groups.For the qubit-wise (full) commutativity, the number of such groups scales as ∼ N P /3 (∼ N 3 q ) since on average each group contains three (N q ) Pauli products.For relatively small molecules in our set (i.e.only few atoms), N P scales as N 4 q .In the coefficient splitting approach, the number of variables is a product of N P and an average number of measurable groups that are compatible with an average Pauli product.For our model systems, it was found empirically that the latter number grows as ∼ N 3 q for the qubitwise commutativity, whereas for the full commutativity the number is within a range of [0.4,2.3] and thus can be considered relatively constant.These considerations clarify why the measurement allocation techniques can be employed for both commutativities, but the coefficient splitting without extra constraints can be afforded only for the FC grouping.To compare the proposed methods to the classcial shadow tomography techniques (Derand 16 and OGM 19 ), we consider QWC grouping methods that do not require non-local (entangling) transformations (Table III).Unlike the original OGM treatment, we avoid deleting measurement bases to compare all methods on an equal footing.Comparison between the non-overlapping techniques (LF and SI) and classical shadow techniques reveals that only employing the greedy approach to QWC grouping in SI is already enough to surpass the classical shadow tomography techniques.In accord with results of Table I, both IMA and ICS outperform SI even when approximate covariances are used.
Lastly, we present improvements on the variances that can be achieved by adopting non-local unitary transformations and FC grouping (Table IV).For all considered systems, IMA and ICS are superior to SI.This shows that approximate covariances based on the CISD wavefunction can perform similarly to the exact ones.

IV. CONCLUSIONS
We assessed multiple ideas for reduction of the number of measurements required to accurately obtain the expectation value of any operator that can be written as a sum of Pauli products.Since these ideas can be used separately or combined, our main goal was to understand the impact on the number of measurements and incurred computational cost of each idea.Exploring the idea of Pauli products' compatibility led to the realization that the coefficient splitting framework is the most general implementation of this idea for the grouping methods.
We found that although classical shadow techniques have shown performance superior to that of the non-overlapping measurement scheme based on graphcoloring algorithms, by employing a greedy heuristic the non-overlapping scheme can already outperform the classical shadow techniques.Due to the dependence of the total number of measurements on the sum of square roots of variances for measurable fragments, the greedy approach to grouping performs the best by creating distributions of fragments that are highly non-uniform in variance.The SI technique based on greedy grouping and using only qubit-wise commutativity surpasses shadow tomography based techniques (Derand and OGM) by a factor of 2-3 for larger systems.Thus, for future developments, the shadow tomography approaches need to be compared with greedy grouping based algorithms rather than with grouping approaches that try to minimize the overall number of measurable groups (e.g.LF).
Unlike previous classical shadow techniques that focus on qubit-wise commuting groups, we also considered measuring techniques involving non-local (entangling) transformations that allow one to measure groups of fully commuting Pauli products.An efficient implementation of these non-local transformations using Clifford gates was proposed by Gottesman 28 and would introduce only O(N 2 q / log N q ) CNOT gates.The schemes based on fully commuting groups outperform their qubitwise commuting counterparts up to a factor of seven in variances of the expectation value estimators.
Taking advantage of compatibility of some Pauli products with members of multiple measurable groups (i.e.overlapping groups idea) can be generally presented as augmenting the measurable groups with all Pauli prod-ucts compatible with initial members of these groups.Then the coefficients of Pauli products entering multiple groups are optimized to lower the estimator variance, with the constraint that the sum over coefficients in different groups for each Pauli product is equal to the coefficient of the Pauli product in the Hamiltonian.This coefficient splitting approach incorporates as a special case a heuristic technique of optimizing measurement allocation for overlapping measurable groups.
Even though the coefficient splitting variance minimization provides the lowest variances among all studied approaches, it requires optimizing a large number of variables: ∼ N 4 q (∼ N 7 q ) for full (qubit-wise) commutativity.Due to certain restrictions, the measurement allocation approach is much more economical in the number of optimization variables: ∼ N 3 q (∼ N 4 q ) for full (qubit-wise) commutativity.Another contributor of the computational cost of these techniques is calculation of the variance gradients.To reduce the computational cost of this part we proposed iterative schemes, the ICS method converges to true extrema, while the IMA scheme deviates from extrema.IMA and ICS provide up to forty and eighty percent reduction in the number of measurements required compared to corresponding best non-overlapping techniques.
Both IMA and ICS use approximate covariances between Pauli products to lower the estimator variance.Use of CISD wavefunction for calculating these covariances shown improvements comparable to those obtained using the exact covariances.Additionally, in IMA and ICS, one can improve covariances obtained from approximate wavefunctions using accumulated measurement results.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon request.

CODE AVAILABILITY
Some part of the code that supports the findings of this study is available in the OpenFermion 29 and PySCF 30 libraries.The rest of the code is available from the corresponding author upon request.
for non-overlapping Pauli groups All measurable fragments Âα 's are linear combinations of mutually commuting or qubit-wise commuting Pauli products Âα = k c k Pk , Pk ∈ P α , (α) k 's is to use analytical gradients ∂Var H /∂c (α) k .The gradients are non-linear functions of c (α) k and computing them becomes computationally expensive as the number of c (α) (α) k 's with fixed m α 's and (2) optimization of m α 's with fixed c (α) is the i-th measurement result of Pk measured in basis B α , and M k = α∈I k m α is the total number of times Pk is measured.Pk 's are used in the Hamiltonian estimator as H = k c k Pk .The variance of H is Var H = jk c j c k Cov Pj , Pk

c 2 × 2
i c j M i M j α∈Ii∩Ij m α Cov ψ Pi , Pj (A1) j c i c j Cov ψ Pi , Pj .(A2)Notethat M i 's are dependent on m α 's.We minimize Var H using m α 's as variables under the constraints α m α = 1 and m α > 0. The variance derivatives with respect to m α 's are 2ci c j Cov ψ Pi , Mj if α ∈ I i − m β Mi(Mj ) 2 if α ∈ I j 1st + 2nd if α ∈ I i ∩ I j    × 2c i c j Cov ψ Pi , Pj .(A3)To avoid constrained optimization with m α 's, we use auxiliary variables p α 's that express m α 's asm α = e pα β e p β (A4)to introduce the α m α = 1 and m α > 0 conditions.This is known as the softmax function often used in machine learning techniques.Derivatives ∂Var H /∂p β

TABLE I .
Variances of the Hamiltonian estimators in different methods calculated with the exact wavefunction.Covariances calculated with the exact wavefunction were used for finding optimal parameters in all methods.

TABLE IV .
Variances of Hamiltonian estimators with fully commuting fragments: largest first (LF), sorted insertion (SI), iterative measurement allocation (IMA), and iterative coefficient splitting (ICS).All algorithms utilize CISD wavefunctions for evaluating covariances, but the final variances are computed using exact wavefunctions.search was enabled in part by support provided by Compute Ontario and Compute Canada.The variance of the estimator for Ĥ as a function of m α 's and M k 's is