Variational Quantum State Eigensolver

Extracting eigenvalues and eigenvectors of exponentially large matrices will be an important application of near-term quantum computers. The Variational Quantum Eigensolver (VQE) treats the case when the matrix is a Hamiltonian. Here, we address the case when the matrix is a density matrix $\rho$. We introduce the Variational Quantum State Eigensolver (VQSE), which is analogous to VQE in that it variationally learns the largest eigenvalues of $\rho$ as well as a gate sequence $V$ that prepares the corresponding eigenvectors. VQSE exploits the connection between diagonalization and majorization to define a cost function $C=\Tr(\tilde{\rho} H)$ where $H$ is a non-degenerate Hamiltonian. Due to Schur-concavity, $C$ is minimized when $\tilde{\rho} = V\rho V^\dagger$ is diagonal in the eigenbasis of $H$. VQSE only requires a single copy of $\rho$ (only $n$ qubits) per iteration of the VQSE algorithm, making it amenable for near-term implementation. We heuristically demonstrate two applications of VQSE: (1) Principal component analysis, and (2) Error mitigation.


I. INTRODUCTION
Near-term quantum computers hold great promise but also pose great challenges. Low qubit counts place constraints on problem sizes that can be implemented. Decoherence and gate infidelity place constraints on the circuit depth that can be implemented. These constraints are captured in the (now widely used) term Noisy Intermediate-Scale Quantum (NISQ) [1].
To address the circuit depth constraint, Variational Quantum Algorithms (VQAs) have been proposed for many applications . VQAs employ a quantum-classical optimization loop to train the parameters θ of a quantum circuit V (θ). Leveraging classical optimizers allows the quantum circuit depth to remain shallow. This makes VQAs powerful tools for error mitigation on NISQ devices.
A particularly important application of NISQ computers will be extracting the spectra, eigenvalues and eigenvectors, of very large matrices. Indeed the most famous VQA, known as the Variational Quantum Eigensolver (VQE), aims to variationally determining the energies and state-preparation circuits for the ground state and low-lying excited states of a given Hamiltonian, i.e., a Hermitian matrix. VQE promises to revolutionize the field of quantum chemistry [28,29], and perhaps even nuclear [30] and condensed matter [31,32] physics.
If one instead considers a positive-semidefinite matrix, then extracting the spectrum has direct application as a machinelearning primitive known as Principal Component Analysis (PCA). Along these lines, Lloyd et al. [33] introduced a quantum algorithm called quantum PCA (qPCA) to deterministically extract the spectrum of an n-qubit density matrix ρ. qPCA employs quantum phase estimation and density matrix exponentiation as subroutines and hence requires a large number of quantum gates and copies of ρ. In an effort to reduce circuit depth in the NISQ era, LaRose et al. [6] developed a VQA for this application called Variational Quantum State Diagonalization (VQSD). VQSD requires two copies of ρ, hence 2n qubits, and trains the parameters θ of a gate sequence V (θ) so thatρ = V (θ)ρV † (θ) is approximately diagonal. A different variational approach, called Quantum Singular Value Decomposition (QSVD), was introduced by Bravo-Prieto et al. [27]. QSVD takes a purification |ψ of ρ as its input and hence requires however many qubits it takes to purify ρ (possibly 2n qubits).
In this work, we introduce a variational algorithm for PCA that only requires a single copy of ρ and hence only n qubits per iteration of the algorithm. Our approach, called the Variational Quantum State Eigensolver (VQSE), exploits the mathematical connection between diagonalization and majorization. Namely, it is well known that the eigenvalues of a density matrix ρ majorize the diagonal elements in any basis. Hence, by choosing a cost function C that is a Schur concave function of the diagonal elements of ρ, one can ensure that the cost function is minimized when ρ is diagonalized. Specifically, we write the cost as C = Tr(ρH), where H is some Hamiltonian with a non-degenerate spectrum, which ensures the Schur concavity property. Note that evaluating C simply involves measuring the expectation value of H onρ, and hence one can see why only n qubits are required.
To learn the optimal θ parameters, we introduce a new training approach, not previously used in other VQAs. Specifically, we employ a time-dependent Hamiltonian H that we adapt based on information gained from measurements performed throughout the optimization. The aim of this adaptive approach is: (1) to mitigate barren plateaus in training landscapes, and (2) to get out of local minima. With our numerics, we find that using an adaptive Hamiltonian is better than simply fixing the Hamiltonian throughout the optimization. Here, we further provide a rigorous analysis of the measurement shot requirements of VQSE where we show that the relative error induces from statistical sampling error is, with high probability, smaller than δ, if one measures the system response with a number of shots that scales only as Ω(log(1/δ)/λ 2 m ), with λm being the smallest eigenvalue one wishes to estimate.
Finally, we illustrate two important applications of VQSE with our numerical implementations. First, we use VQSE for error mitigation of the W -state preparation circuit. Namely, by projecting the state onto the eigenvector with the largest eigenvalue, we re-purify the state, mitigating the effects of incoherent errors. Second, we use VQSE to perform entanglement spectroscopy (which is essentially PCA on the reduced state of a bipartition) on the ground state of an XY -model spin chain. This allows us to identify quantum critical points in this model.

A. Theoretical Basis of VQSE
Consider an n-qubit quantum state ρ with (unknown) spectral decomposition ρ = k λ k |λ k λ k |, such that the eigenvalues are ordered in decreasing order (i.e., λ k λ k+1 for k = 1, . . . , rank(ρ), while λ k = 0 for k rank(ρ)). The goal of VQSE is to estimate the m-largest eigenvalues of ρ, where m 2 n , and furthermore to return a gate sequence V (θ) that approximately prepares their associated eigenvectors from standard basis elements.
At first sight, this looks like a matrix diagonalization problem. Indeed, this is the perspective taken in the literature, e.g., by the VQSD algorithm [6] which employs a cost function that quantifies how farρ = V (θ)ρV † (θ) is from a diagonal matrix. However, our VQSE algorithm takes a conceptually different approach, focusing on majorization instead of diagonalization.
We write the VQSE cost function as an energy, or the expectation value of a Hamiltonian: Here, H is a simple n-qubit Hamiltonian that is diagonal in the standard basis and whose eigenenergies and associated eigenstates are known and respectively given by {E k } and {|e k } (where e k = e 1 k · . . . · e n k for k = 1, . . . , 2 n are bitstrings of length n). Moreover, we henceforth assume that the eigenenergies are non-negative and ordered in increasing order, i.e., E k E k+1 . We have where we defined the vectors E = (E1, E2, . . .) and p = (p1, p2, . . .). Similarly, let us define the vector of eigenvalues of ρ as λ = (λ1, λ2, . . .). Then, since the eigenvalues of a positive semidefinite matrix majorize its diagonal elements λ p, and since the dot product with an increasingly ordered vector is a Schur concave function [34,35], we have where we have used the fact that ρ andρ have the same eigenvalues. Hence, one can see that C(θ) is minimized when V (θ) maps the eigenbasis of ρ to the eigenbasis of H, with appropriate ordering. Since the latter is chosen to be the standard basis, this corresponds to diagonalizing ρ. Thus, even though it may not be obvious at first sight, minimizing C(θ) corresponds to diagonalizing ρ. . It then outputs estimates of the m-largest eigenvalues of ρ, and their associated eigenvectors. The first step of the algorithm is a hybrid quantum-classical optimization loop to train the parameters θ, and minimize the cost function defined in (4) as the expectation value of a Hamiltonian H(t) over the stateρ = V (θ)ρV † (θ). To facilitate this optimization, we adaptively update H(t) using information obtained via measurements onρ. When this optimization terminates, at which point we say θ = θopt, one reads off the eigenvalues. Namely, by preparing V (θopt)ρV † (θopt) and measuring in the standard basis, one obtains bitstrings z whose associated frequencies are estimates of the eigenvalues of ρ. Finally, one prepares the estimated eigenvectors by preparing the states |z and acting on them with V † (θopt).
B. The VQSE algorithm Figure 1 shows a schematic diagram of the Variational Quantum State Eigensolver (VQSE) algorithm. The three inputs to VQSE are: (1) a n-qubit quantum state ρ, (2) an integer m, and (3) a parameterized gate sequence or ansatz V (θ). The outputs of VQSE are: . While in principle m can be as large as 2 n , we assume that one is interested in a number m of eigenvalues and eigenvectors that grows at worse as O(poly(n)).
After taking in the inputs, VQSE enters a hybrid quantumclassical optimization loop to train the parameters θ in the ansatz V (θ). This loop employs a quantum computer to evaluate the VQSE cost function, denoted Here, H(t) is a Hamiltonian that could, in general, depend on the time t, where t ∈ [0, 1] is a parameter that indicates the optimization loop run-time such that the loop starts at t = 0 and ends at t = 1. For all t, we assume that H(t) can be efficiently measured on a quantum computer and that it is diagonal in the standard basis, with its lowest m eigenenergies being non-degenerate and non-negative. We further elaborate on how to choose H(t) in Section II C. Note that the quantum circuit to evaluate the cost C(t, θ), as depicted in Fig. 1, simply involves applying V (θ) to the state ρ and then measuring the Hamiltonian H(t).
The quantum computer then feeds the value of the cost (or the gradient of the cost for gradient-based optimization) to a classical computer, which adjusts the parameters θ for the next round of the loop. The ultimate goal is to find the global minimum of the cost landscape at t = 1, i.e., to solve the problem: θopt ≡ arg min In reality, one will need to impose some termination condition on the optimization loop and hence the final parameters obtained (which we still denote as θopt) will only approximately satisfy Eq. (5). Nevertheless, we provide a verification procedure below in Section II D that allows one to quantify the quality of the solution even when (5) is not exactly satisfied. As shown in Fig. 1, the next step of VQSE is the eigenvalue readout. From the parameters θopt one can estimate the eigenvalues of ρ by acting with the gate sequence V (θopt) and then measuring in the standard basis {|z k }. Let Pr(z k ) be the probability of the z k outcome. Then by taking the m largest of these probabilities we define L ≡ { λi} m i=1 as the ordered set of estimates of the m-largest eigenvalues of ρ, and we define Z as the set of bitstrings {zi} m i=1 associated with the elements of L: λi = Pr(zi) = zi| ρ|zi , such that λi λi+1 . (6) Note that λi in (6) correspond to diagonal elements of ρ in the standard basis, and not to its eigenvalues.
In practice, when estimating the eigenvalues one measures ρ in the standard basis a finite number of times Nruns. Hence, if a bitstring zi ∈ Z has frequency fi for Nruns total runs, then we can estimate λi as One can think of this as a Bernouilli trial. Let Λi be a random variable that takes value 1 if we get outcome zi (with probability λi), and takes value 0 otherwise (with probability 1 − λi). After repeating the experiment Nruns times we are interested in bounding the probability that the relative error εi ≡ | λ est i − λi|/ λi is larger than a certain value c 0. From Hoeffding's inequality, we find For fixed Nruns, Eq. (8) shows that the smaller the inferred eigenvalue λi, the larger the probability of having a given relative error. Equation (8) also implies that increasing Nruns reduces the probability of large relative errors. Hence, we can always choose Nruns such that the probability of error is smaller than a given δ for all m eigenvalues via where λm is the smallest eigenvalue of interest. Analogously, from (9) we have that all eigenvalues larger than log(1/δ) 2c 2 Nruns have a probability of error smaller than δ. The last step of VQSE is to prepare the inferred eigenvectors of ρ. Given a bitstring zi ∈ Z, one can prepare the associated inferred eigenvector by taking the state |0 = |0 ⊗n , acting on it with the gate X z i 1 ⊗ X z i 2 ⊗ . . . ⊗ X z i n , and then applying the gate sequence V (θopt) † : Note that while the inferred eigenvalues can be stored classically, the eigenvectors are prepared on a quantum computer, and hence one needs to perform measurements to extract information about these eigenvectors.

C. Cost functions
Consider the Hamiltonian H(t) that defines the VQSE cost function in (4). Recall that we choose H(t) so that: (1) it is diagonal in the standard basis, (2) its lowest m eigenvalues are non-negative and non-degenerate, and (3) it can be efficiently measured on a quantum computer. Let us now discuss possible choices for H(t).
Fixed Hamiltonians. When the Hamiltonian is fixed (i.e., time-independent), we write H(t) ≡ H, and C(t, θ) ≡ C(θ). In this case, a simple, intuitive cost function is given by with qi > 0 (such that qi > qi+1), and where the |ei are orthogonal states in the standard basis. The spectrum of HG is composed of m non-degenerate eigenenergies, and a (2 n − m)-fold degenerate eigenenergy. On the one hand, this large degeneracy makes it easier to find a global minimum as the solution space is large. That is, denoting as Vopt an optimal unitary that minimizes (11), then there is a large set of such optimal unitaries Sopt = {Vopt}, which are not related by global phases. This is due to the fact that one is only interested in the m rows and the m columns of V (θ) that diagonalize ρ in the subspace spanned by {|ei } m i=1 . Specifically, any optimal unitary must satisfy zi|Vopt|λi = λi|Vopt|zi = δz i e i for i = 1, . . . , m (and with zi ∈ Z), while the (2 n − m) × (2 n − m) unitary principal submatrix of Vopt with matrix elements zi|Vopt|z i , where zi, z i ∈ Z, remains completely arbitrary.
On the other hand, it has been shown that when employing hardware-efficient ansatzes [36] for V (θ), global cost functions like CG(θ) are untrainable for large problem sizes as they exhibit exponentially vanishing gradients (i.e., barren plateaus [37]) even when the ansatz is short depth [38]. Such barren plateaus can be avoided by employing a different type of cost function known as a local cost [38,39], where C is defined such that one compares states or operators with respect to each individual qubit rather than comparing them in a global sense.
One can construct a local cost where the Hamiltonian is a weighted sum of local z-Pauli operators: where rj ∈ R and Zj is the z-Pauli operator acting on qubit j. Care must be taken when choosing the coefficients {rj} n j=1 to ensure that the lowest m-eigenenergies of HL are nondegenerate. For instance, when targeting the largest eigenvalue of ρ (m = 1), the simple choice rj = 1, ∀j achieves this goal. On the other hand, if one is interested in m = n + 1 eigenvalues, then one can choose rj = r1 + (j − 1)δ with r1 δ, which will ensure that the m-lowest energy levels, {E1, E1 + r1, E1 + r1 + δ, ..., E1 + r1 + (m − 1)δ}, are nondegenerate. Henceforth, we will assume that one has chosen {rj} n j=1 such that the m-lowest energy levels are nondegenerate.
While fixed local cost functions do not exhibit barren plateaus for shallow depth, they still have several trainability issues. First, having less degeneracy in HL leads to a more difficult optimization problem. Since degeneracy allows for additional freedom in the solution space, non-degeneracy constrains the possible solutions. Therefore, there is a tradeoff between engineering non-degeneracy (which allows one to distinguish more eigenvalues of ρ) versus keeping degeneracy (which allows for more solutions). Second, we expect both CL and CG to have a high density of local minima, especially for large m. This is because there will be partial solutions to the problem where one correctly assigns some eigenvalues of ρ to the right energy levels of the Hamiltonian, while incorrectly assigning other eigenvalues. This local minima issue is what motivates the following adaptive approach.
Adaptive Hamiltonian. Let us now we introduce an approach to adaptively update the VQSE Hamiltonian (and hence the cost function) based on information obtained via measurements during the optimization loop. This method allows us to mitigate the issues discussed in the previous section that arise for cost functions with fixed local or global Hamiltonians. Namely, the adaptive cost function solves the following three problems: (1) barren plateaus for shallow depth [38], (2) high density of local minima, (3) smaller solution space arising from non-degenerancies.
Consider a time-dependent Hamiltonian of the form where f (t) is a real-valued function such that f (0) = 0, f (1) = 1, and HL is a local Hamiltonian as in (12). We recall here that t ∈ [0, 1] is a parameter that indicates the optimization loop run time. Moreover, we define the time-dependent global Hamiltonian where the coefficients qi are real and positive, and chosen in the same way as in (11). In addition, the states |zi(t) are adaptively chosen throughout the optimization loop by preparing ρ, measuring in the standard basis to obtain the sets L and Z, and updating HG(t) so that zi(t) ∈ Z.
As schematically shown in Fig. 2(a), in order to mitigate the barren plateau phenomena it is important to choose a function f (t) which is not rapidly growing with t. Hence, for small t, H(t) ∼ HL and the cost function will be trainable as it will not present a barren plateau. Then, as t increases, one can deal with the issue of local minima by updating HG(t). As depicted in the insets of Fig. 2(a), adaptively changing HG(t) transforms local minima in the cost landscape into global minima. Then, by the end of the algorithm we have We remark that Ref. [40] proposed a method called adiabatically assisted VQE (AAVQE), which dynamically updates the VQE cost function by driving between a simple Hamiltonian to the non-trivial problem Hamiltonan. Note that the goals of AAVQE and our adaptive training method are diffferent. Furthermore, in our method one adaptively updates the cost function based on information obtained through measurements, while AAVQE does not use information gained during the optimization.
Operational meaning of the cost function. Here we discuss the operational meaning of the VQSE cost function, showing that small cost values imply small eigenvalue and eigenvector errors. Let {| λi } m i=1 be the set of the inferred eigenvector associated with every λi in L, and let |δi = ρ| λi − λi| λi . We then define eigenvalue and eigen-= = Figure 3.
Ansatz diagram. (a) Layered hardwareefficient ansatz for V (θ). A single layer of the ansatz is composed of two-qubit gates Bµ(θµ) acting on neighboring qubits. Shown is the case of two layers. (b) While there are many choices for each block Bµ(θµ), in our numerics we employed two different parameterizations. Top: Each gate is composed of a controlled-Z gate preceded and followed by single-qubit rotations about the y-axis Ry(θ) = e iθσy . Bottom: Each gate is composed of a CNOT gate preceded and followed by a single-qubit rotation G(θ1, θ2, θ3) = e iθ 3 σz /2 e iθ 2 σy /2 e iθ 1 σz /2 . The number of parameters in θ increases linearly with the number of layers and the number of qubits n.
vector errors as follows: Here δi | δi quantifies the component of ρ| λi that is orthogonal to | λi , which follows from the following identity: Then by using the Cauchy-Schwarz inequality, majorization conditions, and Schur convexity, we establish the following upper bound on eigenvalue and eigenvector errors (see Section IV A for more details): where (E1, . . . , Em) are the m-smallest eigeneneries of H, and where for simplicity we have omitted the t dependence. Thus Eq. (16) provides an operational meaning to our cost function, as small values of the cost function lead to small eigenvalue and eigenvector errors.

D. Verification of solution quality
Let us show how to verify the results obtained from the VQSE algorithm. We remark that this verification step is optional, particularly because it requires 2n qubits, whereas the rest of VQSE only requires n qubits.
In Section IV B of Methods, we prove the following useful bound on eigenvalue and eigenvector error: where one can take m as any integer between m and 2 n . One can efficiently estimate the right-hand-side of (17) as follows.
Given two copies of ρ, Tr[ρ 2 ] can be estimated by a depthtwo quantum circuit with classical post-processing that scales linearly with n [41]. Moreover, since Tr[ρ 2 ] is independent of V (θ), one only needs to compute it once (outside of the optimization loop). Estimating the λi for i = 1, ..., m essentially comes for free as part of the eigenvalue readout step of VQSE, where we note that taking m > m simply involves keeping track of the frequencies of more bitstrings (more than the m-largest) during this readout step. Finally, we remark that while Eq. (16) can also be used for verification, in Section IV B we show that (17) provides a tighter bound, particularly as one increases m.

E. Ansatz
While there are many possible choices for the ansatz V (θ), we are here restricted to state-agnostic ansatzes which do not require any a prior information about ρ. One such ansatz is the Layered Hardware Efficient Ansatz [36] shown in Fig. 3(a). Here, V (θ) consists of a fixed number L of layers of twoqubit gates Bµ(θµ) acting on alternating pairs of neighboring qubits. Figure 3(b) illustrates possible choices for Bµ(θµ). Note that with this structure, the number of parameters in θ grows linearly with the n and L.
Let us remark that the Layered Hardware Efficient Ansatz can lead to trainability issues as the system size increases [37,38]. Hence, different strategies have been proposed to mitigate such difficulties, such as learning to initialize parameters [42], layer-by-layer training [43], and correlating the parameters [44]. In addition, these methods can be combined with more sophisticated ansatzes, such as a variable-structure ansatz [6,41] where the structure of the ansatz is not fixed, and where the gate placement becomes an optimizable hyperparameter. This variable-structure approach has already been shown to improve performance in the context of extracting the eigensystem of a quantum state [6].
Finally, since VQSE optimization corresponds to an energy minimization problem, a natural ansatz that can also be used to mitigate trainability issues is the Quantum Alternating Operator Ansatz (QAOA) [3,45]. Specifically, one could employ H(t) as the problem Hamiltonian in the QAOA and use a standard mixing Hamiltonian. While we do not employ this ansatz in our heuristics, it is nevertheless of interest for future work.

F. Optimization
Regarding the optimization of the parameters θ, while gradient-free methods are an option [46,47], there has been recent evidence that gradient-based methods can perform better [48][49][50][51]. Moreover, as shown in [52,53], for cost functions like (1), gradients can be analytically determined (see Section IV C in the Methods for an explicit derivation of the gradient formula). Therefore, in our heuristics, we employ gradient-based optimization.

G. Numerical Implementations
Here we present the numerical results obtained from implementing VQSE. We first employ VQSE to estimate the spectrum of quantum states of different dimensions and compare the performance of cost functions based on the global, local,   (c) Figure 5. Runs-per-success versus inverse absolute error 1/ε λ . We implemented VQSE for the states of: (a) n = 6, (b) n = 8, and (c) n = 10 qubits corresponding to Fig. 4. The insets depict the same data in the small 1/ε λ regime. Runsper-success is defined as the total number of runs divided by the number of runs with a relative error smaller than a target ε λ . For all three cases, we can see that as 1/ε λ increases, the adaptive Hamiltonian has the lowest number of runs-per-success, and hence the best performance. In all cases the x axis is plotted on a log scale. and adaptive Hamiltonians discussed in Section II C. Then we use VQSE for error mitigation of the W -state preparation circuit. Finally, we implement VQSE for entanglement spectroscopy on the ground-state of an XY -spin chain, which allows us to detect the presence of quantum critical points. VQSE for quantum principal component analysis. Figure 4 presents the results of implementing VQSE to estimate the six largest eigenvalues (m = 6) of quantum states with n = 6, 8, and 10 qubits. In all cases we have rank(ρ) = 16, as the states were prepared by randomly entangling the system qubits with four ancillary qubits, which were later traced out. Moreover, we chose ρ to be real and not sparse in the standard basis.
In our heuristics we used the Layered Hardware Efficient Ansatz of Fig. 3(b, top), and we employed the fixed-local, fixed-global, and adaptive cost functions of Section II C. The termination condition was stated in terms of the maximum number of iterations in the optimization loop. Hardware noise and finite sampling were not included in these heuristics. (The next subsection shows heuristics with noise.) For the fixed local cost function we chose the {rj} n j=1 in (12) so that the first six energy eigenvalues of HL were non-degenerate. Moreover, we defined the fixed global Hamiltonian such that the first six energy levels (i.e., associated eigenvectors and spectral gaps) coincided with those of HL. Finally, the adaptive Hamiltonian was constructed according to the procedure described above, and more specifically, in Algorithm 1 in the Methods section.
Since for these examples we can calculate the exact eigenvalues λi, we compute and plot the following quantities which we use as figures of merit for the performance of the VQSE algorithm: Here ε λ and εr respectively quantify absolute error and relative error in estimating the exact eigenvalues. We remark that these two quantities provide different information: The absolute error is biased towards the error in estimating the large eigenvalues of ρ, while on the other hand, the relative error is more sensitive to errors in estimating the small eigenvalues of ρ. Figure 4 plots the relative and absolute errors versus number of iterations (with the total number of iterations fixed). While we performed many runs, these plots show only the run that achieved the lowest absolute error. For all system sizes considered (n = 6, 8 and 10), VQSE achieves smaller relative and absolute errors when employing the adaptive Hamiltonian approach than using a fixed Hamiltonian. For n = 6, the errors obtained by adaptively updating H(t) are two orders of magnitude smaller than those obtained with fixed Hamiltonians, while for n = 8 they are one order of magnitude smaller. As shown in Fig. 4(c) for n = 10, the adaptive Hamiltonian approach achieves error of the order: ∼ 10 −5 for the relative error, and ∼ 10 −7 for the absolute error, and again outperforms the fixed local Hamiltonian approaches. We here finally remark that that we can use Eq. (9) to determine the number of shots needed to guarantee that with a probability larger that 99% the relative error induced by finite sampling is smaller that 0.001. Namely, we find that one needs a number of shots larger than 50.2K, which is well within the order of magnitude of shots regularly used.
It is natural to ask whether the runs shown in Fig. 4 are representative of the algorithm performance. To provide an analysis of the average VQSE performance, we plot in Fig. 5 the runs-per-success versus 1/ε λ for each of the aforementioned examples. Here, runs-per-success is defined as the total number of runs divided by the number of runs with an absolute error smaller than a target ε λ .
From all three panels in Fig. 5 we see that for large 1/ε λ , the adaptive Hamiltonian always has the best performance as it requires less runs-per-success to achieve smaller errors. Finally, it is interesting to note from the insets of Fig. 5 that there is a regime where the run time has a linear dependence on log(1/ε λ ) when employing an adaptive approach. This suggests that VQSE may perform quite efficiently for large ε λ . However, the linear dependence breaks down for small ε λ , where the number of runs-per-success seems to grow exponential with 1/ε λ . Despite, such growth, for up to 8 qubits we only need 100 repetitions to achieve an error of order 10 −6 . We leave for future work a more detailed study of the dependence of runs-per-success for small error. Finally, the results presented in Figs. 4 and 5 suggest that for a sufficiently large value of number of iterations, the adaptive Hamiltonian approach outperforms the fixed Hamiltonian approaches as it requires the least number of iterations to converge to very small values of ε λ .
Error mitigation. Here we discuss an important application of the VQSE algorithm for error mitigation. Quantum state preparation circuits (gate sequences U which prepare a target state |ψ ) are used as subroutines in many quantum Cost function value and fidelity versus number of iterations. We implement VQSE for error mitigation of the three qubit W -state preparation circuit. The input state ρ corresponds to the mixed state obtained by running the W -state preparation circuit on a noisy simulator. The dashed line corresponds to the fidelity F (ρ, |ψ ) between ρ and the the exact W state |ψ . For each iteration step, we compute the fidelity F (σ, |ψ ), where the mixed state σ is obtained by running the VQSE eigenvector preparation circuit on the noisy simulator. Curves depict the average of 10 instances of the algorithm. As the number of iterations increases the cost function value decreases, which implies that we are able to train V (θ) in the presence of noise. After a few iterations of the VQSE optimization loop, we find F (σ, |ψ ) > F (ρ, |ψ ).
algorithms. However, since current quantum computers are noisy, all state preparation circuits produce mixed states ρ. If there is little enough incoherent noise, we can expect that the largest eigenvalue of ρ is associated with |ψ . Here we show that VQSE can be implemented to re-purify ρ and estimate |ψ . Naturally, when running the VQSE eigenvector preparation circuit, noise will also produce a mixed state σ. However, if the depth of V (θ) is shorter than the depth of U , one can obtain a higher fidelity between σ and |ψ in comparison to the fidelity between ρ and |ψ . In this case one can mitigate errors by replacing the state preparation circuit by the VQSE eigenvector preparation circuit.
Let us now consider the three qubit W -state preparation circuit from [54, Section 2.2] (see also [55]). By employing a noisy quantum computer simulator with the noise profile of IBM's Melbourne processor [56], we find that the fidelity between ρ and the exact W state |ψ is F (ρ, |ψ ) ≈ 0.785. We then train 10 instances of VQSE with two layers of the ansatz in Fig. 3(b, bottom) and with a termination condition of 50 iterations. Moreover, we employ the adaptive Hamiltonian, where we update H(t) every 10 iterations according to Algorithm 1. Figure 6 shows the average cost function value and average fidelity between |ψ and the state σ obtained by running the VQSE eigenvector preparation circuit. As the number of iterations increases, the cost value tends to decrease, showing that we are able to train in the presence of noise. Moreover, we also see that F (σ, |ψ ) increases and saturates at a value larger than F (ρ, |ψ ), namely at 0.853, hence showing that we are in fact mitigating the effect of noise. This can be explained by the fact that we reduced the circuit depth, as our ansatz contains two CNOTs, while the textbook circuit contains three CNOTs.
Entanglement Spectroscopy. We now discuss the possibility of employing VQSE to compute the entanglement spectrum of a state ρ which is obtained as the reduced state of a bipartite quantum system |ψAB , i.e., ρ = TrB|ψAB ψAB|. Let d denote the dimension of ρ. The entanglement spectrum [57] refers to the collection {λ k } d k=1 of eigenvalues of ρ, and as discussed in [58], entanglement spectroscopy is a useful tool to analyze states |ψAB prepared by simulating many-body systems on a quantum computer. Specifically, the entanglement spectrum is useful to study the bipartite entanglement, as it contains more universal signatures than the von Neumann entropy alone [57], and it can detect the presence of quantum critical points [59,60].
Let us now consider an N = 8 spin-1/2 cyclic chain interacting trough uniform XY first-neighbor Heisenberg coupling in the presence of a non-transverse magnetic field. The Hamiltonian of the system is where j labels the site in the chain, S µ j the spin operator (with µ = x, y, z), Jµ the coupling strength, and hµ the magnetic fields. Here, Jµ > 0 leads to ferromagnetic (FM) coupling, while Jµ < 0 to antiferromagnetic (AFM) coupling. As shown in [60,61], for specific values of the fields hµ (known as factorizing fields) the Hamiltonian in (19) presents quantum critical points known as "factorization" points. At the non-transverse factorizing field, the ground-state of H becomes a separable non-degenerate state such that one of its eigenvalues is exactly equal to one, while the rest are exactly zero.
In Fig. 7(a) and (c), we show results of implementing VQSE with an adaptive Hamiltonian to compute the three largest eigenvalues of the state ρ defined as the reduced state of 4 neighboring spins obtained from the ground state of (19). For simplicity we have parametrized the fields as (hz, hx) = h(cos(γ), sin(γ)) with γ fixed. Specifically, in Fig. 7(a) and (c) we plot the estimated eigenvalues versus the field magnitude h for a system with FM and AFM couplings, respectively. Moreover, dashed lines indicate the exact eigenvalues. For each field value, we run 8 instances of VQSE, and even for such a small number of runs, the estimated eigenvalues give good approximations as we get relative errors which in general are of the order of ∼ 10 −2 .
In Fig. 7(b) and (d), we show the same data as in (a) and (c) but the y axis is plotted on a logarithmic scale, and where instead of plotting the largest eigenvalue λ1, we plot 1 − λ1. For the FM (AFM) case, there is a factorization points at h/Jx ≈ 0.76 (h/Jx ≈ 1.43). As depicted in these panels, around critical points we correctly find λ1 ≈ 1, and λ2, λ3 ≈ 0. These results show that VQSE can detect quantum critical factorization points.

III. DISCUSSION
In the NISQ era, every qubit and every gate counts. Wasteful usage of qubits or gates will ultimately limit the problem size that an algorithm can solve. In this work, we presented an algorithm for extracting the eigensystem of a quantum state ρ that is as frugal as we could imagine, with respect to qubit count. We introduced the Variational Quantum State Eigensolver (VQSE), which estimates the m-largest eigenvalues and associated eigenvectors of ρ, using only a single copy of ρ, and hence only n qubits per iteration of the VQSE. VQSE exploits the mathematical connection between diagonalization and majorization to define an efficiently computable cost function as the expectation value of a Hamiltonian. We derived an operational meaning of this cost function as a bound on eigensystem error. Furthermore, we introduced a training method that involved adaptively updating the VQSE cost function based on the information gained from measurements performed throughout the optimization. This was aimed at addressing both barren plateaus and local minima in the cost landscape.
We have numerically implemented VQSE for several applications. We showed that VQSE can be employed for PCA by implementing the VQSE algorithm on states of n = 6, 8, and 10 qubits to estimate the six largest eigenvalues. Our numerical results (Figs. 4 and 5) indicate that our adaptive cost function approach leads to smaller errors than the ones obtained by training a fixed cost function. We also showed ( Fig. 7) that one can detect quantum critical points by performing entanglement spectroscopy with the eigenvalues obtained via VQSE. Finally, we employed VQSE to mitigate errors that occur during the W -state preparation circuit. This involved running VQSE on a noisy simulator to re-purify the state, i.e., find the circuit that prepares the eigenvector with the largest eigenvalue. We found (Fig. 6) that the re-purified state obtained by VQSE improved the fidelity with the target W state, and hence reduced the effects of noise.
Comparison to literature. Since VQSE only requires n qubits, it is as qubit frugal as it can possible be when compared to other algorithms for the same task, such as quantum Principal Component Analysis (qPCA) [33], Variational Quantum State Diagonalization (VQSD) [6], and Quantum State Singular Value Decomposition (QSVD) [27]. The quantum phase estimation and density matrix exponentiation primitives in qPCA make it difficult to implement in the near term [62], and this is supported an by attempted implementation in [6] that resulted in poor performance. On the other hand, VQSD and QSVD are variational algorithms and hence have the possibility of lower-depth requirements. But they still need to employ a larger number of qubits than VQSE. Specifically, VQSD needs to perform the so-called Diagonalized Inner Product Test [6] that requires two copies of ρ, i.e., requires twice as many qubits as VQSE. In addition, it is also worth noting that VQSD is vulnerable to noise, since any asymmetry between the noise acting of each copy of ρ will affect the result of the algorithm. Finally, in QSVD, one needs to either compute or have access to a purification |ψ of ρ. Hence QSVD requires a number of qubits between n and 2n. Moreover, we expect that noise will be a bigger issue for QSVD than for VQSE, since in practice the assumption that one has a pure state in QSVD can often be violated due to incoherent noise during state preparation.
Quantum-inspired classical algorithms [63] for PCA can perform well in practice, provided that the matrix has a very large dimension, low rank, and low condition number [64]. We note that VQSE does not have such limitations, except the fact that VQSE yields results with high accuracy for low-rank states. Here is is also paramount to recall that recent results have show that an exponential advantage is still possible for PCA [65,66], even for near-term algorithms, as the quantuminspired classical algorithms are artificially given too much power via access to quantum state amplitudes. Thus, in view of these recent result, VQSE can be useful in the quest for achieving a quantum speedup with quantum PCA, particularly for analysis of quantum data. For the case of classical data, the success of VQSE for PCA relies on the efficiency of preparing a quantum state corresponding to the covariance matrix of the classical data [67]. In addition, VQSE has applications not only for PCA but also for other tasks such as entanglement spectroscopy and error mitigation on NISQ devices. For error mitigation, we leave for future work combining our approach with other error mitigation techniques such as Virtual Distillation [68,69], which also seeks to re-purify noisy quantum states.
Future Directions. Due to the rapid rise of VQE [2], much research has gone into how to prepare ground and excited states on NISQ devices. However, more research is needed on how to characterize these states, once prepared. This is where VQSE comes in, as VQSE can extract the entanglement spectra of these states and hence characterize important properties like topological order [57]. Hence it is worth exploring in the future the idea of pairing up the VQE and VQSE algorithms, where VQSE is implemented immediately after VQE.
Furthermore, VQSE has immediate application for estimating the fidelity of two quantum states with reduced resource requirements. This is because an algorithm was previously in-troduced [8] to estimate fidelity by using state diagonlization as a subroutine, and hence VQSE can provide a more efficient version of this subroutine.
A crucial technical idea in this work was our adaptivelyupdated cost function, which improved optimization performance. It is worth investigating whether this adaptive method can improve the performance of other variational quantum algorithms .
Another direction to explore is whether VQSE exhibits noise resilience [17]. We suspect this to be true given the similar structure of VQSE and the variational quantum compiling algorithms investigated in Ref. [17].
This is important as we are proposing that VQSE will be a useful tool for error mitigation. Namely, we envision that VQSE could be used as a subroutine to improve the accuracy of several quantum algorithms. For example, one could use VQSE to re-purify the noisy quantum state obtained as the outcome of the VQE algorithm. Alternatively, one could periodically perform VQSE whilst running a dynamical quantum simulation on a NISQ device, which would re-purify the state as it is evolving in time. This could allow one to simulate long-time dynamics, i.e., times significantly beyond the coherence time of a NISQ device.

A. Operational meaning of the cost function
In this section, we provide a derivation for Eq. (16). First, we rewrite the eigenvalue error in Eq. (15) as follows: where λ m ≡ (λ1, . . . , λm) and λ m ≡ ( λ1, . . . , λm). Since the eigenvalues of a positive semidefinite operator majorize its diagonal elements, we have that λ m λ m . Moreover, from the Schur convexity property of the dot product with an ordered vector, it follows that λ m · λ m λ m · λ m , which further implies the following inequality: Similarly, from Eq. (15) we get where we again used the fact that the eigenvalues of a positive semidefinite operator majorize its diagonal elements, and hence λ m · λ m m i=1 λi|ρ 2 | λi . We recall from Eq. (3) that the VQSE cost function can be expressed as C = d i=1 Eipi, where we omit the θ, and t dependence of C. Therefore, the following chain of inequalities hold: where d = 2 n . The first inequality follows the fact that Ei Em+1, ∀i m + 1 and i>m pi = 1 − m i=1 pi. The second inequality follows from the Cauchy-Schwarz inequality for the dot product of two vectors |u · v| |u||v|. By combining Eq. (23) with the fact that m i=1 λ 2 i m i=1 p 2 i (since λi ∈ L are the largest diagonal elements of ρ), we find that Using the fact that λ m · λ m λ · λ = Tr[ρ 2 ], we obtain the following equality from (24) Combining this with (21) and (22) leads to (16).

B. Verification of solution quality
Here we provide a proof of Eq. (17), and we show that this bound is tighter than the bound in (16). From the definition of the eigenvalue and eigenvector error in (15), it is straightforward to see that ε λ d i=1 (λi − λi) 2 , and εv d i=1 δi | δi , where d = 2 n . By following a procedure similar to the one employed in deriving (22), we find where we recall that λ and λ denote d-dimensional vectors of ordered exact and estimated eigenvalues of ρ, respectively.
Let λ = ( λ1, . . . , λ m , , with m > m, be a vector majorized by λ, i.e., λ λ. Since the dot product with an ordered vector is a Schur convex function, we have λ · λ λ · λ λ · λ, which further implies the following inequality: This inequality can be combined with (25) and (26) to obtain the bound in (17). We now show that (17) is tighter than (16). Specifically, we prove that the negative term in the right-hand side of (17) is larger than the one in (16). Consider the following chain of inequalities: where we used m > m, and where the last inequality follows from (24).
Here θ ηµ µ are continous parameters, and we can always write without loss of generality U k (θ) = R k (θ)T k , where R k (θ) = e iθσ k /2 is a single qubit rotation and T k is an unparametrized gate.
D. Algorithm for the adaptive cost function Algorithm 1 shows a simple adaptive strategy that illustrates how one can update H(t). Specifically, we consider the case when f (t) is a stepwise function. In addition, we define the VQSE optimization loop termination condition in terms of the maximum number of iterations allowed Nmax. We also define an updating parameter s (with Nmax/s being an integer) such that we update HG(t) every s steps. Finally, here we use the term optimizer, denoted as opt, as a function that takes as inputs a set of parameters θ and a cost function C(t, θ) (or the gradient of the cost for gradient-based optimization) and returns an updated set of parameters that attempts to solve the minimization problem of (5).