Introduction

Near-term quantum computers hold great promise but also pose great challenges. Low qubit counts place constraints on problem sizes that can be implemented. Decoherence and gate infidelity place constraints on the circuit depth that can be implemented. These constraints are captured in the (now widely used) term noisy intermediate-scale quantum (NISQ)1.

To address the circuit depth constraint, variational quantum algorithms (VQAs) have been proposed for many applications2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27. VQAs employ a quantum-classical optimization loop to train the parameters θ of a quantum circuit V(θ). Leveraging classical optimizers allows the quantum circuit depth to remain shallow, which limits the accumulation of noise and makes VQAs well suited to NISQ devices.

A particularly important application of NISQ computers will be extracting the spectra, i.e., the eigenvalues and eigenvectors, of very large matrices. Indeed, the most famous VQA, known as the variational quantum eigensolver (VQE), aims to variationally determine the energies and state-preparation circuits for the ground state and low-lying excited states of a given Hamiltonian, i.e., a Hermitian matrix. VQE promises to revolutionize the field of quantum chemistry28,29, and perhaps even nuclear30 and condensed matter31,32 physics.

If one instead considers a positive-semidefinite matrix, then extracting the spectrum has direct application as a machine-learning primitive known as principal component analysis (PCA). Along these lines, Lloyd et al.33 introduced a quantum algorithm called quantum PCA (qPCA) to deterministically extract the spectrum of an n-qubit density matrix ρ. qPCA employs quantum phase estimation and density matrix exponentiation as subroutines and hence requires a large number of quantum gates and copies of ρ. In an effort to reduce circuit depth in the NISQ era, LaRose et al.6 developed a VQA for this application called variational quantum state diagonalization (VQSD). VQSD requires two copies of ρ, hence 2n qubits, and trains the parameters θ of a gate sequence V(θ) so that \(\tilde{\rho }=V({{{\boldsymbol{\theta }}}})\rho {V}^{{\dagger} }({{{\boldsymbol{\theta }}}})\) is approximately diagonal. A different variational approach, called quantum singular value decomposition (QSVD), was introduced by Bravo-Prieto et al.27. QSVD takes a purification \(\left|\psi \right\rangle\) of ρ as its input and hence requires however many qubits it takes to purify ρ (possibly 2n qubits).

In this work, we introduce a variational algorithm for PCA that only requires a single copy of ρ and hence only n qubits per iteration of the algorithm. Our approach, called the variational quantum state eigensolver (VQSE), exploits the mathematical connection between diagonalization and majorization. Namely, it is well known that the eigenvalues of a density matrix ρ majorize the diagonal elements in any basis. Hence, by choosing a cost function C that is a Schur concave function of the diagonal elements of ρ, one can ensure that the cost function is minimized when ρ is diagonalized. Specifically, we write the cost as \(C={{{\rm{Tr}}}}(\tilde{\rho }H)\), where H is some Hamiltonian with a non-degenerate spectrum, which ensures the Schur concavity property. Note that evaluating C simply involves measuring the expectation value of H on \(\tilde{\rho }\), and hence one can see why only n qubits are required.

To learn the optimal θ parameters, we introduce a new training approach, not previously used in other VQAs. Specifically, we employ a time-dependent Hamiltonian H that we adapt based on information gained from measurements performed throughout the optimization. The aim of this adaptive approach is: (1) to mitigate barren plateaus in training landscapes, and (2) to get out of local minima. With our numerics, we find that using an adaptive Hamiltonian is better than simply fixing the Hamiltonian throughout the optimization. We further provide a rigorous analysis of the measurement shot requirements of VQSE, where we show that the probability that the relative error induced by statistical sampling exceeds a given threshold is smaller than δ, provided one measures with a number of shots that scales only as \({{\Omega }}(\log (1/\delta )/{\lambda }_{m}^{2})\), with λm being the smallest eigenvalue one wishes to estimate.

Finally, we illustrate two important applications of VQSE with our numerical implementations. First, we use VQSE for error mitigation of the W-state preparation circuit. Namely, by projecting the state onto the eigenvector with the largest eigenvalue, we re-purify the state, mitigating the effects of incoherent errors. Second, we use VQSE to perform entanglement spectroscopy (which is essentially PCA on the reduced state of a bipartition) on the ground state of an XY-model spin chain. This allows us to identify quantum critical points in this model.

Results

Theoretical basis of VQSE

Consider an n-qubit quantum state ρ with (unknown) spectral decomposition \(\rho ={\sum }_{k}{\lambda }_{k}\left|{\lambda }_{k}\right\rangle \,\left\langle {\lambda }_{k}\right|\), such that the eigenvalues are ordered in decreasing order (i.e., λk ⩾ λk+1 for k = 1, …, rank(ρ), while λk = 0 for k > rank(ρ)). The goal of VQSE is to estimate the m-largest eigenvalues of ρ, where m ≪ 2n, and furthermore to return a gate sequence V(θ) that approximately prepares their associated eigenvectors from standard basis elements.

At first sight, this looks like a matrix diagonalization problem. Indeed, this is the perspective taken in the literature, e.g., by the VQSD algorithm6 which employs a cost function that quantifies how far \(\tilde{\rho }=V({{{\boldsymbol{\theta }}}})\rho {V}^{{\dagger} }({{{\boldsymbol{\theta }}}})\) is from a diagonal matrix. However, our VQSE algorithm takes a conceptually different approach, focusing on majorization instead of diagonalization.

We write the VQSE cost function as an energy, or the expectation value of a Hamiltonian:

$$C({{{\boldsymbol{\theta }}}})\equiv \left\langle H\right\rangle ={{{\rm{Tr}}}}\left[HV({{{\boldsymbol{\theta }}}})\rho {V}^{{\dagger} }({{{\boldsymbol{\theta }}}})\right]\,.$$
(1)

Here, H is a simple n-qubit Hamiltonian that is diagonal in the standard basis and whose eigenenergies and associated eigenstates are known and respectively given by {Ek} and \(\{\left|{{{{\boldsymbol{e}}}}}_{k}\right\rangle \}\) (where \({{{{\boldsymbol{e}}}}}_{k}={e}_{k}^{1}\cdot \ldots \cdot {e}_{k}^{n}\) for k = 1, …, 2n are bitstrings of length n). Moreover, we henceforth assume that the eigenenergies are non-negative and ordered in increasing order, i.e., Ek ⩽ Ek+1. We have

$$C({{{\boldsymbol{\theta }}}})=\mathop{\sum }\limits_{k=1}^{{2}^{n}}{E}_{k}{p}_{k}={{{\boldsymbol{E}}}}\cdot {{{\boldsymbol{p}}}}\,,\quad {p}_{k}=\langle {{{{\boldsymbol{e}}}}}_{k}| \tilde{\rho }| {{{{\boldsymbol{e}}}}}_{k}\rangle \,,$$
(2)

where we defined the vectors E = (E1, E2, …) and p = (p1, p2, …). Similarly, let us define the vector of eigenvalues of ρ as λ = (λ1, λ2, …). Then, since the eigenvalues of a positive semidefinite matrix majorize its diagonal elements, λ ≻ p, and since the dot product with an increasingly ordered vector is a Schur concave function34,35, we have

$$C({{{\boldsymbol{\theta }}}})={{{\boldsymbol{E}}}}\cdot {{{\boldsymbol{p}}}}\,\geqslant\, {{{\boldsymbol{E}}}}\cdot {{{\boldsymbol{\lambda }}}}=\mathop{\sum}\limits_{k}{E}_{k}{\lambda }_{k}\,,$$
(3)

where we have used the fact that ρ and \(\tilde{\rho }\) have the same eigenvalues. Hence, one can see that C(θ) is minimized when V(θ) maps the eigenbasis of ρ to the eigenbasis of H, with appropriate ordering. Since the latter is chosen to be the standard basis, this corresponds to diagonalizing ρ. Thus, even though it may not be obvious at first sight, minimizing C(θ) corresponds to diagonalizing ρ.
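
For concreteness, the following minimal numpy sketch (an illustration of ours, not part of the algorithm itself) verifies the inequality in Eq. (3) for a random state and a random unitary; the increasingly ordered energy vector E is an arbitrary non-degenerate choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 2**3

# Random density matrix rho = A A^dagger / Tr[A A^dagger]
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# Random "ansatz" unitary V via QR decomposition
V, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

E = np.arange(d, dtype=float)                  # increasing, non-degenerate
p = np.real(np.diag(V @ rho @ V.conj().T))     # diagonal of rho-tilde
lam = np.sort(np.linalg.eigvalsh(rho))[::-1]   # eigenvalues, decreasing

assert E @ p >= E @ lam - 1e-12                # Eq. (3): C = E.p >= E.lambda
```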

The VQSE algorithm

Figure 1 shows a schematic diagram of the VQSE algorithm. The three inputs to VQSE are: (1) an n-qubit quantum state ρ, (2) an integer m, and (3) a parameterized gate sequence or ansatz V(θ). The outputs of VQSE are: (1) estimates \({\{{\tilde{\lambda }}_{i}\}}_{i = 1}^{m}\) of the m-largest eigenvalues \({\{{\lambda }_{i}\}}_{i = 1}^{m}\) of ρ, and (2) a gate sequence V(θopt) that prepares approximate versions \({\{|{\tilde{\lambda }}_{i}\rangle \}}_{i = 1}^{m}\) of the associated m eigenvectors \({\{\left|{\lambda }_{i}\right\rangle \}}_{i = 1}^{m}\). While in principle m can be as large as 2n, we assume that one is interested in a number m of eigenvalues and eigenvectors that grows at worst as \({{{\mathcal{O}}}}(\,{{\mbox{poly}}}\,(n))\).

Fig. 1: Schematic diagram of VQSE.
figure 1

VQSE takes as inputs an n-qubit state ρ, an integer m, and a parametrized unitary V(θ). It then outputs estimates of the m-largest eigenvalues of ρ, and their associated eigenvectors. The first step of the algorithm is a hybrid quantum-classical optimization loop to train the parameters θ, and minimize the cost function defined in (4) as the expectation value of a Hamiltonian H(t) over the state \(\tilde{\rho }=V({{{\boldsymbol{\theta }}}})\rho {V}^{{\dagger} }({{{\boldsymbol{\theta }}}})\). To facilitate this optimization, we adaptively update H(t) using information obtained via measurements on \(\tilde{\rho }\). When this optimization terminates, at which point we say θ = θopt, one reads off the eigenvalues. Namely, by preparing V(θopt)ρV(θopt) and measuring in the standard basis, one obtains bitstrings z whose associated frequencies are estimates of the eigenvalues of ρ. Finally, one prepares the estimated eigenvectors by preparing the states \(\left|{{{\boldsymbol{z}}}}\right\rangle\) and acting on them with V(θopt).

After taking in the inputs, VQSE enters a hybrid quantum-classical optimization loop to train the parameters θ in the ansatz V(θ). This loop employs a quantum computer to evaluate the VQSE cost function, denoted

$$C(t,{{{\boldsymbol{\theta }}}})\equiv \left\langle H(t)\right\rangle ={{{\rm{Tr}}}}\left[H(t)\widetilde{\rho }\right]\,,\quad \widetilde{\rho }=V({{{\boldsymbol{\theta }}}})\rho {V}^{{\dagger} }({{{\boldsymbol{\theta }}}})\,.$$
(4)

Here, H(t) is a Hamiltonian that could, in general, depend on the time t, where t ∈ [0, 1] is a parameter that indicates the optimization loop run-time such that the loop starts at t = 0 and ends at t = 1. For all t, we assume that H(t) can be efficiently measured on a quantum computer and that it is diagonal in the standard basis, with its lowest m eigenenergies being non-degenerate and non-negative. We further elaborate on how to choose H(t) in section ‘Cost functions’. Note that the quantum circuit to evaluate the cost C(t, θ), as depicted in Fig. 1, simply involves applying V(θ) to the state ρ and then measuring the Hamiltonian H(t).

The quantum computer then feeds the value of the cost (or the gradient of the cost for gradient-based optimization) to a classical computer, which adjusts the parameters θ for the next round of the loop. The ultimate goal is to find the global minimum of the cost landscape at t = 1, i.e., to solve the problem:

$${{{{\boldsymbol{\theta }}}}}_{{{\mbox{opt}}}}\equiv \arg \mathop{\min }\limits_{{{{\boldsymbol{\theta }}}}}C(1,{{{\boldsymbol{\theta }}}})\,.$$
(5)

In reality, one will need to impose some termination condition on the optimization loop and hence the final parameters obtained (which we still denote as θopt) will only approximately satisfy Eq. (5). Nevertheless, we provide a verification procedure below in section ‘Verification of solution quality’ that allows one to quantify the quality of the solution even when (5) is not exactly satisfied.

As shown in Fig. 1, the next step of VQSE is the eigenvalue readout. From the parameters θopt one can estimate the eigenvalues of ρ by acting with the gate sequence V(θopt) and then measuring in the standard basis \(\{\left|{{{{\boldsymbol{z}}}}}_{k}\right\rangle \}\). Let Pr(zk) be the probability of the zk outcome. Then by taking the m largest of these probabilities we define \({{{\mathcal{L}}}}\equiv {\{{\widetilde{\lambda }}_{i}\}}_{i = 1}^{m}\) as the ordered set of estimates of the m-largest eigenvalues of ρ, and we define \({{{\mathcal{Z}}}}\) as the set of bitstrings \({\{{{{{\boldsymbol{z}}}}}_{i}\}}_{i = 1}^{m}\) associated with the elements of \({{{\mathcal{L}}}}\):

$${\widetilde{\lambda }}_{i}=\Pr ({{{{\boldsymbol{z}}}}}_{i})=\langle {{{{\boldsymbol{z}}}}}_{i}| \widetilde{\rho }| {{{{\boldsymbol{z}}}}}_{i}\rangle \,,\quad \,{{\mbox{such that}}}\,\quad {\widetilde{\lambda }}_{i}\,\geqslant\, {\widetilde{\lambda }}_{i+1}\,.$$
(6)

Note that \({\widetilde{\lambda }}_{i}\) in (6) correspond to diagonal elements of \(\widetilde{\rho }\) in the standard basis, and not to its eigenvalues.

In practice, when estimating the eigenvalues one measures \(\tilde{\rho }\) in the standard basis a finite number of times Nruns. Hence, if a bitstring \({{{{\boldsymbol{z}}}}}_{i}\in {{{\mathcal{Z}}}}\) has frequency fi for Nruns total runs, then we can estimate \({\widetilde{\lambda }}_{i}\) as

$${\widetilde{\lambda }}_{i}^{\,{{\mathrm{est}}}}=\frac{{f}_{i}}{{N}_{{{\mathrm{runs}}}}}\,.$$
(7)
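
A minimal sketch of this readout step, assuming numpy; here the vector `probs` is a hypothetical stand-in for the exact standard-basis outcome distribution of \(\tilde{\rho }\), which on hardware would come from the device itself:

```python
import numpy as np

def eigenvalue_estimates(probs, m, n_runs, seed=1):
    """Sample N_runs standard-basis outcomes (probs must sum to 1) and
    return the m largest frequencies, i.e., the estimates of Eq. (7),
    together with the bitstring indices forming the set Z."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(rng.choice(len(probs), size=n_runs, p=probs),
                         minlength=len(probs))
    order = np.argsort(counts)[::-1][:m]    # m most frequent bitstrings
    return counts[order] / n_runs, order    # lambda_i^est = f_i / N_runs
```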

One can think of this as a Bernoulli trial. Let Λi be a random variable that takes value 1 if we get outcome zi (with probability \({\widetilde{\lambda }}_{i}\)), and takes value 0 otherwise (with probability \(1-{\widetilde{\lambda }}_{i}\)). After repeating the experiment Nruns times we are interested in bounding the probability that the relative error \({\varepsilon }_{i}\equiv | {\widetilde{\lambda }}_{i}^{\,{{\mathrm{est}}}\,}-{\widetilde{\lambda }}_{i}| /{\widetilde{\lambda }}_{i}\) is larger than a certain value c > 0. From Hoeffding’s inequality, we find

$$\Pr ({\varepsilon }_{i}\,\geqslant\, c)\,\leqslant\, {{\rm{e}}}^{-2{N}_{{{\mathrm{runs}}}}{c}^{2}{\widetilde{\lambda }}_{i}^{2}}\,,\quad \forall c \,>\, 0\,.$$
(8)

For fixed Nruns, Eq. (8) shows that the smaller the inferred eigenvalue \({\widetilde{\lambda }}_{i}\), the larger the probability of having a given relative error. Equation (8) also implies that increasing Nruns reduces the probability of large relative errors. Hence, we can always choose Nruns such that the probability of error is smaller than a given δ for all m eigenvalues via

$$\forall i\in [1,m],\,\,\,\Pr ({\varepsilon }_{i}\,\geqslant\, c)\,\leqslant\, \delta \,\to \,{N}_{{{\mathrm{runs}}}}\,\geqslant\, \frac{\log (1/\delta )}{2{c}^{2}{\lambda }_{m}^{2}}\,,$$
(9)

where λm is the smallest eigenvalue of interest. Analogously, from (9) we have that all eigenvalues larger than \(\sqrt{\frac{\log (1/\delta )}{2{c}^{2}{N}_{{{\mathrm{runs}}}}}}\) have a probability of error smaller than δ.
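
The required shot count is a direct transcription of the bound in Eq. (9); a numpy sketch with illustrative numbers:

```python
import numpy as np

def shots_required(lam_m, c, delta):
    """Smallest N_runs satisfying Eq. (9): target relative error c,
    failure probability delta, smallest eigenvalue of interest lam_m."""
    return int(np.ceil(np.log(1.0 / delta) / (2.0 * c**2 * lam_m**2)))

print(shots_required(lam_m=0.05, c=0.1, delta=0.01))  # 92104
```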

The last step of VQSE is to prepare the inferred eigenvectors of ρ. Given a bitstring \({{{{\boldsymbol{z}}}}}_{i}\in {{{\mathcal{Z}}}}\), one can prepare the associated inferred eigenvector by taking the state \(\left|{{{\boldsymbol{0}}}}\right\rangle ={\left|0\right\rangle }^{\otimes n}\), acting on it with the gate \({X}^{{z}_{1}^{i}}\otimes {X}^{{z}_{2}^{i}}\otimes \ldots \otimes {X}^{{z}_{n}^{i}}\), and then applying the gate sequence \(V{({{{{\boldsymbol{\theta }}}}}_{{{\mathrm{opt}}}})}^{{\dagger} }\):

$$|{\widetilde{\lambda }}_{i}\rangle ={V}^{{\dagger} }({{{{\boldsymbol{\theta }}}}}_{{{\mathrm{opt}}}})\left|{{{{\boldsymbol{z}}}}}_{i}\right\rangle \,,\quad \left|{{{{\boldsymbol{z}}}}}_{i}\right\rangle ={X}^{{z}_{1}^{i}}\otimes \ldots \otimes {X}^{{z}_{n}^{i}}\left|{{{\mathbf{0}}}}\right\rangle \,.$$
(10)

Note that while the inferred eigenvalues can be stored classically, the eigenvectors are prepared on a quantum computer, and hence one needs to perform measurements to extract information about these eigenvectors.
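
A state-vector sketch of Eq. (10), assuming numpy and that V(θopt) is available as a 2n × 2n matrix; this is a matrix-level illustration, not a gate-level circuit:

```python
import numpy as np

def inferred_eigenvector(V_opt, z_bits):
    """Eq. (10): flip qubits of |0...0> according to the bitstring z_i,
    then apply V(theta_opt)^dagger. z_bits is a list of 0/1, with z_1
    taken as the most significant bit."""
    idx = int("".join(map(str, z_bits)), 2)   # X^{z_1} x ... x X^{z_n}|0> = |z>
    ket = np.zeros(V_opt.shape[0], dtype=complex)
    ket[idx] = 1.0
    return V_opt.conj().T @ ket               # |lambda_i~>
```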

Cost functions

Consider the Hamiltonian H(t) that defines the VQSE cost function in (4). Recall that we choose H(t) so that: (1) it is diagonal in the standard basis, (2) its lowest m eigenvalues are non-negative and non-degenerate, and (3) it can be efficiently measured on a quantum computer. Let us now discuss possible choices for H(t).

Fixed Hamiltonians

When the Hamiltonian is fixed (i.e., time-independent), we write H(t) ≡ H, and C(t, θ) ≡ C(θ). In this case, a simple, intuitive cost function is given by

$${C}_{G}({{{\boldsymbol{\theta }}}})={{{\rm{Tr}}}}[{H}_{G}\widetilde{\rho }],\quad {H}_{G}={\mathbb{1}}-\mathop{\sum }\limits_{i=1}^{m}{q}_{i}\left|{{{{\boldsymbol{e}}}}}_{i}\right\rangle \,\left\langle {{{{\boldsymbol{e}}}}}_{i}\right|\,,$$
(11)

with qi > 0 (such that qi > qi+1), and where the \(\left|{{{{\boldsymbol{e}}}}}_{i}\right\rangle\) are orthogonal states in the standard basis. The spectrum of HG is composed of m non-degenerate eigenenergies, and a (2n − m)-fold degenerate eigenenergy.
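
As an illustration, assuming numpy and taking the \(\left|{{{{\boldsymbol{e}}}}}_{i}\right\rangle\) to be the first m standard-basis states (our own choice; any m distinct bitstrings work), the diagonal of HG can be built as:

```python
import numpy as np

def H_global_diag(n, q):
    """Diagonal of H_G in Eq. (11), with |e_i> the first m basis states."""
    diag = np.ones(2**n)          # the identity term
    diag[:len(q)] -= q            # subtract q_i on the chosen |e_i><e_i|
    return diag

q = np.array([0.5, 0.4, 0.3])    # q_1 > q_2 > ... > q_m > 0
print(np.sort(H_global_diag(3, q)))  # m non-degenerate levels, then 1's
```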

On the one hand, this large degeneracy makes it easier to find a global minimum, as the solution space is large. That is, denoting as Vopt an optimal unitary that minimizes (11), there is a large set of such optimal unitaries \({{{{\mathcal{S}}}}}_{{{\mbox{opt}}}}=\{{V}_{{{\mbox{opt}}}}\}\), which are not simply related by global phases. This is due to the fact that one is only interested in the m rows and the m columns of V(θ) that diagonalize \(\widetilde{\rho }\) in the subspace spanned by \({\{\left|{{{{\boldsymbol{e}}}}}_{i}\right\rangle \}}_{i = 1}^{m}\). Specifically, any optimal unitary must satisfy \(\langle {{{{\boldsymbol{z}}}}}_{i}| {V}_{{{\mbox{opt}}}}| {\lambda }_{i}\rangle =\langle {\lambda }_{i}| {V}_{{{\mbox{opt}}}}^{{\dagger} }| {{{{\boldsymbol{z}}}}}_{i}\rangle ={\delta }_{{{{{\boldsymbol{z}}}}}_{i}{{{{\boldsymbol{e}}}}}_{i}}\) for i = 1, …, m (and with \({{{{\boldsymbol{z}}}}}_{i}\in {{{\mathcal{Z}}}}\)), while the (2n − m) × (2n − m) unitary principal submatrix of Vopt with matrix elements \(\langle {{{{\boldsymbol{z}}}}}_{i}| {V}_{{{\mbox{opt}}}}| {{{{\boldsymbol{z}}}}}_{i^{\prime} }\rangle\), where \({{{{\boldsymbol{z}}}}}_{i},{{{{\boldsymbol{z}}}}}_{i^{\prime} }\,\notin\, {{{\mathcal{Z}}}}\), remains completely arbitrary.

On the other hand, it has been shown that when employing hardware-efficient ansatzes36 for V(θ), global cost functions like CG(θ) are untrainable for large problem sizes as they exhibit exponentially vanishing gradients (i.e., barren plateaus37) even when the ansatz is short depth38. Such barren plateaus can be avoided by employing a different type of cost function known as a local cost38,39, where C is defined such that one compares states or operators with respect to each individual qubit rather than comparing them in a global sense.

One can construct a local cost where the Hamiltonian is a weighted sum of local z-Pauli operators:

$${C}_{L}\equiv \langle {H}_{L}\rangle \,,\quad {H}_{L}={\mathbb{1}}-\mathop{\sum }\limits_{j=1}^{n}{r}_{j}{Z}_{j}\,,$$
(12)

where \({r}_{j}\in {\mathbb{R}}\) and Zj is the z-Pauli operator acting on qubit j. Care must be taken when choosing the coefficients \({\{{r}_{j}\}}_{j = 1}^{n}\) to ensure that the lowest m eigenenergies of HL are non-degenerate. For instance, when targeting the largest eigenvalue of ρ (m = 1), the simple choice rj = 1, ∀j, achieves this goal. On the other hand, if one is interested in m = n + 1 eigenvalues, then one can choose rj = r1 + (j − 1)δ with r1 ≫ δ, which will ensure that the m-lowest energy levels, {E1, E1 + r1, E1 + r1 + δ, …, E1 + r1 + (m − 2)δ}, are non-degenerate. Henceforth, we will assume that one has chosen \({\{{r}_{j}\}}_{j = 1}^{n}\) such that the m-lowest energy levels are non-degenerate.
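
As a concrete check of this coefficient choice, the following numpy sketch (with illustrative values r1 = 1 and δ = 0.1) builds the diagonal of HL and verifies that the m = n + 1 lowest levels are non-degenerate:

```python
import numpy as np

def H_local_diag(n, r):
    """Diagonal of H_L = 1 - sum_j r_j Z_j of Eq. (12) in the standard
    basis, where Z_j has eigenvalue (-1)^{z_j} on bitstring z."""
    diag = np.ones(2**n)
    for j in range(n):
        z_j = (np.arange(2**n) >> (n - 1 - j)) & 1   # bit j of each state
        diag -= r[j] * (1 - 2 * z_j)                  # Z_j eigenvalue
    return diag

n = 4
r = 1.0 + 0.1 * np.arange(n)          # r_j = r_1 + (j-1)delta, r_1 >> delta
E = np.sort(H_local_diag(n, r))
assert len(np.unique(np.round(E[:n + 1], 12))) == n + 1  # lowest m distinct
```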

While fixed local cost functions do not exhibit barren plateaus for shallow depth, they still have several trainability issues. First, having less degeneracy in HL leads to a more difficult optimization problem. Since degeneracy allows for additional freedom in the solution space, non-degeneracy constrains the possible solutions. Therefore, there is a tradeoff between engineering non-degeneracy (which allows one to distinguish more eigenvalues of ρ) versus keeping degeneracy (which allows for more solutions). Second, we expect both CL and CG to have a high density of local minima, especially for large m. This is because there will be partial solutions to the problem where one correctly assigns some eigenvalues of ρ to the right energy levels of the Hamiltonian, while incorrectly assigning other eigenvalues. This local minima issue is what motivates the following adaptive approach.

Adaptive Hamiltonian

Let us now introduce an approach to adaptively update the VQSE Hamiltonian (and hence the cost function) based on information obtained via measurements during the optimization loop. This method allows us to mitigate the issues discussed in the previous section that arise for cost functions with fixed local or global Hamiltonians. Namely, the adaptive cost function addresses the following three problems: (1) barren plateaus at shallow depth38, (2) a high density of local minima, and (3) the smaller solution space arising from non-degeneracies.

Consider a time-dependent Hamiltonian of the form

$$H(t)\equiv (1-f(t)){H}_{L}+f(t){H}_{G}(t)\,,$$
(13)

where f(t) is a real-valued function such that f(0) = 0, f(1) = 1, and HL is a local Hamiltonian as in (12). We recall here that t ∈ [0, 1] is a parameter that indicates the optimization loop run time. Moreover, we define the time-dependent global Hamiltonian

$${H}_{G}(t)\equiv {\mathbb{1}}-\mathop{\sum }\limits_{i=1}^{m}{q}_{i}\left|{{{{\boldsymbol{z}}}}}_{i}(t)\right\rangle \left\langle {{{{\boldsymbol{z}}}}}_{i}(t)\right|\,,$$
(14)

where the coefficients qi are real and positive, and chosen in the same way as in (11). In addition, the states \(\left|{{{{\boldsymbol{z}}}}}_{i}(t)\right\rangle\) are adaptively chosen throughout the optimization loop by preparing \(\widetilde{\rho }\), measuring in the standard basis to obtain the sets \({{{\mathcal{L}}}}\) and \({{{\mathcal{Z}}}}\), and updating HG(t) so that \({{{{\boldsymbol{z}}}}}_{i}(t)\in {{{\mathcal{Z}}}}\).
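
A minimal sketch of this update, assuming numpy, that the Hamiltonians are stored as diagonals (as in the sketches above), and that `z_top` holds the m measured bitstring indices in \({{{\mathcal{Z}}}}\); the schedule f(t) = t2 is an illustrative slowly growing choice:

```python
import numpy as np

def f(t):
    return t**2          # illustrative schedule with f(0) = 0, f(1) = 1

def H_adaptive_diag(t, HL_diag, z_top, q):
    """Diagonal of H(t) in Eqs. (13)-(14). z_top: the m distinct bitstring
    indices currently in Z; q: the m positive, decreasing weights q_i."""
    HG_diag = np.ones(len(HL_diag))
    HG_diag[np.asarray(z_top)] -= q                 # Eq. (14)
    return (1.0 - f(t)) * HL_diag + f(t) * HG_diag  # Eq. (13)
```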

As schematically shown in Fig. 2a, in order to mitigate the barren plateau phenomenon it is important to choose a function f(t) that grows slowly with t. Hence, for small t, H(t) ~ HL and the cost function will be trainable as it will not present a barren plateau. Then, as t increases, one can deal with the issue of local minima by updating HG(t). As depicted in the insets of Fig. 2a, adaptively changing HG(t) transforms local minima in the cost landscape into global minima. By the end of the algorithm, we have H(1) = HG(1), and as shown in panel (b) of Fig. 2, the spectrum of H becomes highly degenerate and the dimension of the solution space increases. In section ‘Algorithm for the adaptive cost function’ of the “Methods”, we present an algorithm to illustrate how one can update H(t).

Fig. 2: Adaptive cost function.
figure 2

a Schematic representation of the function f(t) and the cost landscape of C(t, θ) versus t. We choose f(t) as a slowly growing function of t. When the optimization starts at t = 0, the cost function does not exhibit a barren plateau as the Hamiltonian is local, H(0) = HL. As t increases, H(t) becomes a linear combination of HL and a global Hamiltonian HG(t) which is adaptively updated using information gained from measurements on V(θ)ρV†(θ). As shown in the insets, this procedure allows local minima to become global minima. Finally, when the algorithm ends at t = 1, the Hamiltonian is global, H(1) = HG(1). b Schematic representation of the eigenenergies of H(t) versus t. For small t the Hamiltonian is local and hence its spectrum contains non-degeneracies that reduce the space of solutions. At t = 1, H(t) becomes a global Hamiltonian and the spectrum has m non-degenerate levels and a (2n − m)-fold degenerate level.

We remark that ref. 40 proposed a method called adiabatically assisted VQE (AAVQE), which dynamically updates the VQE cost function by interpolating from a simple Hamiltonian to the non-trivial problem Hamiltonian. Note that the goals of AAVQE and our adaptive training method are different. Furthermore, in our method one adaptively updates the cost function based on information obtained through measurements, while AAVQE does not use information gained during the optimization.

Operational meaning of the cost function

Here we discuss the operational meaning of the VQSE cost function, showing that small cost values imply small eigenvalue and eigenvector errors. Let \({\{|{\widetilde{\lambda }}_{i}\rangle \}}_{i = 1}^{m}\) be the set of inferred eigenvectors associated with the \({\widetilde{\lambda }}_{i}\) in \({{{\mathcal{L}}}}\), and let \(\left|{\delta }_{i}\right\rangle =\rho |{\widetilde{\lambda }}_{i}\rangle -{\widetilde{\lambda }}_{i}|{\widetilde{\lambda }}_{i}\rangle\). We then define eigenvalue and eigenvector errors as follows:

$${\varepsilon }_{\lambda }\equiv \mathop{\sum }\limits_{i=1}^{m}{({\lambda }_{i}-{\widetilde{\lambda }}_{i})}^{2},\quad {\varepsilon }_{v}\equiv \mathop{\sum }\limits_{i=1}^{m}\langle {\delta }_{i}| {\delta }_{i}\rangle .$$
(15)

Here ⟨δi∣δi⟩ quantifies the component of \(\rho |{\widetilde{\lambda }}_{i}\rangle\) that is orthogonal to \(|{\widetilde{\lambda }}_{i}\rangle\), which follows from the identity \(\left|{\delta }_{i}\right\rangle =({\mathbb{1}}-|{\widetilde{\lambda }}_{i}\rangle \langle {\widetilde{\lambda }}_{i}|)\rho |{\widetilde{\lambda }}_{i}\rangle\).

Then by using the Cauchy–Schwarz inequality, majorization conditions, and Schur convexity, we establish the following upper bound on the eigenvalue and eigenvector errors (see section ‘Operational meaning of the cost function’ in the “Methods” for more details):

$${\varepsilon }_{\lambda },{\varepsilon }_{v}\,\leqslant\, {{{\rm{Tr}}}}[{\rho }^{2}]-\frac{{({E}_{m+1}-C({{{\boldsymbol{\theta }}}}))}^{2}}{\mathop{\sum }\nolimits_{i=1}^{m}{({E}_{m+1}-{E}_{i})}^{2}}\,,$$
(16)

where E1, …, Em+1 are the (m + 1)-smallest eigenenergies of H, and where for simplicity we have omitted the t dependence. Thus Eq. (16) provides an operational meaning to our cost function, as small values of the cost function lead to small eigenvalue and eigenvector errors.
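
Evaluating this bound requires only the measured cost, the purity, and the known low-lying spectrum of H; a one-line transcription in numpy (a sketch of ours):

```python
import numpy as np

def cost_error_bound(purity, E, C):
    """Right-hand side of Eq. (16). purity: Tr[rho^2]; E: the (m+1)
    smallest eigenenergies (E_1, ..., E_{m+1}) of H; C: the cost value."""
    m = len(E) - 1
    return purity - (E[m] - C)**2 / np.sum((E[m] - E[:m])**2)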

Verification of solution quality

Let us show how to verify the results obtained from the VQSE algorithm. We remark that this verification step is optional, particularly because it requires 2n qubits, whereas the rest of VQSE only requires n qubits.

In section ‘Verification of solution quality’ of “Methods”, we prove the following useful bound on eigenvalue and eigenvector error:

$${\varepsilon }_{\lambda },{\varepsilon }_{v}\,\leqslant\, {{{\rm{Tr}}}}[{\rho }^{2}]-\left(\mathop{\sum }\limits_{i=1}^{\widehat{m}}{\widetilde{\lambda }}_{i}^{2}+\frac{{(1-\mathop{\sum }\nolimits_{i = 1}^{\widehat{m}}{\widetilde{\lambda }}_{i})}^{2}}{{2}^{n}-\widehat{m}}\right)\,,$$
(17)

where one can take \(\widehat{m}\) as any integer between m and 2n. One can efficiently estimate the right-hand side of (17) as follows. Given two copies of ρ, \({{{\rm{Tr}}}}[{\rho }^{2}]\) can be estimated by a depth-two quantum circuit with classical post-processing that scales linearly with n41. Moreover, since \({{{\rm{Tr}}}}[{\rho }^{2}]\) is independent of V(θ), one only needs to compute it once (outside of the optimization loop). Estimating the \({\widetilde{\lambda }}_{i}\) for \(i=1,...,\widehat{m}\) essentially comes for free as part of the eigenvalue readout step of VQSE, where we note that taking \(\widehat{m} \,>\, m\) simply involves keeping track of the frequencies of more bitstrings (more than the m-largest) during this readout step. Finally, we remark that while Eq. (16) can also be used for verification, in section ‘Verification of solution quality’ we show that (17) provides a tighter bound, particularly as one increases \(\widehat{m}\).
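
A sketch of this verification estimate, assuming numpy, with `purity` standing for the separately measured \({{{\rm{Tr}}}}[{\rho }^{2}]\) and `lam_est` for the \(\widehat{m}\) largest readout estimates:

```python
import numpy as np

def verification_bound(purity, lam_est, n):
    """Right-hand side of Eq. (17) for an n-qubit state, given the
    m-hat largest estimated eigenvalues from the readout step."""
    lam_est = np.asarray(lam_est)
    m_hat = len(lam_est)
    tail = (1.0 - np.sum(lam_est))**2 / (2**n - m_hat)
    return purity - (np.sum(lam_est**2) + tail)
```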

Ansatz

While there are many possible choices for the ansatz V(θ), here we are restricted to state-agnostic ansatzes, which do not require any a priori information about ρ. One such ansatz is the Layered Hardware Efficient Ansatz36 shown in Fig. 3a. Here, V(θ) consists of a fixed number L of layers of two-qubit gates Bμ(θμ) acting on alternating pairs of neighboring qubits. Figure 3b illustrates possible choices for Bμ(θμ). Note that with this structure, the number of parameters in θ grows linearly with n and L.

Fig. 3: Ansatz diagram.
figure 3

a Layered hardware-efficient ansatz for V(θ). A single layer of the ansatz is composed of two-qubit gates Bμ(θμ) acting on neighboring qubits. Shown is the case of two layers. b While there are many choices for each block Bμ(θμ), in our numerics we employed two different parameterizations. Top: Each gate is composed of a controlled-Z gate preceded and followed by single-qubit rotations about the y-axis \({R}_{y}(\theta )={\rm{e}}^{{\rm{i}}\theta {\sigma }_{y}}\). Bottom: Each gate is composed of a CNOT gate preceded and followed by a single-qubit rotation \(G({\theta }_{1},{\theta }_{2},{\theta }_{3})={\rm{e}}^{{\rm{i}}{\theta }_{3}{\sigma }_{z}/2}{\rm{e}}^{{\rm{i}}{\theta }_{2}{\sigma }_{y}/2}{\rm{e}}^{{\rm{i}}{\theta }_{1}{\sigma }_{z}/2}\). The number of parameters in θ increases linearly with the number of layers and the number of qubits n.
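
For concreteness, a state-vector sketch of this ansatz, assuming numpy and the Ry/CZ block of Fig. 3(b, top); we use the common convention Ry(θ) = e−iθσy/2 (which differs from the caption's convention only by a rescaling of the angle), and the brick pattern and parameter layout are illustrative choices of ours:

```python
import numpy as np

def Ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

CZ = np.diag([1.0, 1.0, 1.0, -1.0]).astype(complex)

def block(a):
    """B(theta) of Fig. 3b (top): Ry pair, CZ, then another Ry pair."""
    return np.kron(Ry(a[2]), Ry(a[3])) @ CZ @ np.kron(Ry(a[0]), Ry(a[1]))

def embed(B, q, n):
    """Act with two-qubit gate B on qubits (q, q+1) of an n-qubit register."""
    return np.kron(np.kron(np.eye(2**q), B), np.eye(2**(n - q - 2)))

def ansatz(n, theta, layers):
    """V(theta): alternating bricks of blocks, 4 angles per block."""
    V, k = np.eye(2**n, dtype=complex), 0
    for l in range(layers):
        for q in range(l % 2, n - 1, 2):     # even/odd pairs alternate
            V = embed(block(theta[k:k + 4]), q, n) @ V
            k += 4
    return V

theta = np.random.default_rng(2).uniform(0, 2 * np.pi, size=20)
V = ansatz(4, theta, layers=3)               # 5 blocks -> 20 angles
```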

Let us remark that the Layered Hardware Efficient Ansatz can lead to trainability issues as the system size increases37,38. Hence, different strategies have been proposed to mitigate such difficulties, such as learning to initialize parameters42, layer-by-layer training43, and correlating the parameters44. In addition, these methods can be combined with more sophisticated ansatzes, such as a variable-structure ansatz6,41 where the structure of the ansatz is not fixed, and where the gate placement becomes an optimizable hyper-parameter. This variable-structure approach has already been shown to improve performance in the context of extracting the eigensystem of a quantum state6.

Finally, since VQSE optimization corresponds to an energy minimization problem, a natural ansatz that can also be used to mitigate trainability issues is the quantum alternating operator ansatz (QAOA)3,45. Specifically, one could employ H(t) as the problem Hamiltonian in the QAOA and use a standard mixing Hamiltonian. While we do not employ this ansatz in our heuristics, it is nevertheless of interest for future work.

Optimization

Regarding the optimization of the parameters θ, while gradient-free methods are an option46,47, there has been recent evidence that gradient-based methods can perform better48,49,50,51. Moreover, as shown in refs. 52,53, for cost functions like (1), gradients can be analytically determined (see section ‘Gradient of the cost function’ in the “Methods” for an explicit derivation of the gradient formula). Therefore, in our heuristics, we employ gradient-based optimization.

Numerical implementations

Here we present the numerical results obtained from implementing VQSE. We first employ VQSE to estimate the spectrum of quantum states of different dimensions and compare the performance of cost functions based on the global, local, and adaptive Hamiltonians discussed in section ‘Cost functions’. Then we use VQSE for error mitigation of the W-state preparation circuit. Finally, we implement VQSE for entanglement spectroscopy on the ground-state of an XY-spin chain, which allows us to detect the presence of quantum critical points.

VQSE for quantum principal component analysis

Figure 4 presents the results of implementing VQSE to estimate the six largest eigenvalues (m = 6) of quantum states with n = 6, 8, and 10 qubits. In all cases, we have rank(ρ) = 16, as the states were prepared by randomly entangling the system qubits with four ancillary qubits, which were later traced out. Moreover, we chose ρ to be real and not sparse in the standard basis.

Fig. 4: Relative and absolute error versus the number of iterations.
figure 4

We implemented VQSE for states of: a n = 6, b n = 8, and c n = 10 qubits. In all cases, the ansatz for V(θ) was given by three layers of the Layered Hardware Efficient Ansatz of Fig. 3(b, top). Each curve represents the absolute or relative error (denoted Abs error or Rel error, respectively) of (18) obtained by training V(θ) when employing an adaptive, fixed-local, or fixed-global Hamiltonian. The number of iterations was 330 for (a) and 360 for (b) and (c). For the adaptive runs, we employed Algorithm 1, with the Hamiltonian being updated every 30 iterations. In each case, the adaptive approach performs the best as it achieves the smallest errors.

In our heuristics, we used the layered hardware efficient ansatz of Fig. 3(b, top), and we employed the fixed-local, fixed-global, and adaptive cost functions of section ‘Cost functions’. The termination condition was stated in terms of the maximum number of iterations in the optimization loop. Hardware noise and finite sampling were not included in these heuristics. (The next subsection shows heuristics with noise.) For the fixed local cost function, we chose the \({\{{r}_{j}\}}_{j = 1}^{n}\) in (12) so that the first six energy eigenvalues of HL were non-degenerate. Moreover, we defined the fixed global Hamiltonian such that the first six energy levels (i.e., associated eigenvectors and spectral gaps) coincided with those of HL. Finally, the adaptive Hamiltonian was constructed according to the procedure described above, and more specifically, in Algorithm 1 in the “Methods” section.

Since for these examples, we can calculate the exact eigenvalues λi, we compute and plot the following quantities which we use as figures of merit for the performance of the VQSE algorithm:

$${\varepsilon }_{\lambda }\equiv \mathop{\sum }\limits_{i=1}^{6}{({\lambda }_{i}-{\widetilde{\lambda }}_{i})}^{2}\,,\quad {\varepsilon }_{r}\equiv \mathop{\sum }\limits_{i=1}^{6}{({\lambda }_{i}-{\widetilde{\lambda }}_{i})}^{2}/{\lambda }_{i}^{2}\,.$$
(18)

Here ελ and εr respectively quantify absolute error and relative error in estimating the exact eigenvalues. We remark that these two quantities provide different information: The absolute error is biased towards the error in estimating the large eigenvalues of ρ, while on the other hand, the relative error is more sensitive to errors in estimating the small eigenvalues of ρ.

Figure 4 plots the relative and absolute errors versus the number of iterations (with the total number of iterations fixed). While we performed many runs, these plots show only the run that achieved the lowest absolute error. For all system sizes considered (n = 6, 8, and 10), VQSE achieves smaller relative and absolute errors when employing the adaptive Hamiltonian approach than when using a fixed Hamiltonian. For n = 6, the errors obtained by adaptively updating H(t) are two orders of magnitude smaller than those obtained with fixed Hamiltonians, while for n = 8 they are one order of magnitude smaller. As shown in Fig. 4c for n = 10, the adaptive Hamiltonian approach achieves errors of order ~10−5 (relative) and ~10−7 (absolute), again outperforming the fixed Hamiltonian approaches. Finally, we remark that we can use Eq. (9) to determine the number of shots needed to guarantee that, with probability larger than 99%, the relative error induced by finite sampling is smaller than 0.001. Namely, we find that one needs more than 50.2K shots, which is well within the order of magnitude of shots regularly used.

It is natural to ask whether the runs shown in Fig. 4 are representative of the algorithm performance. To provide an analysis of the average VQSE performance, we plot in Fig. 5 the runs-per-success versus 1/ελ for each of the aforementioned examples. Here, runs-per-success is defined as the total number of runs divided by the number of runs with an absolute error smaller than a target ελ.

Fig. 5: Runs-per-success versus inverse absolute error 1/ελ.
figure 5

We implemented VQSE for the states of: a n = 6, b n = 8, and c n = 10 qubits corresponding to Fig. 4. The insets depict the same data in the small 1/ελ regime. Runs-per-success is defined as the total number of runs divided by the number of runs with an absolute error smaller than a target ελ. For all three cases, we can see that as 1/ελ increases, the adaptive Hamiltonian has the lowest number of runs-per-success, and hence the best performance. In all cases, the x-axis is plotted on a log scale.

From all three panels in Fig. 5 we see that for large 1/ελ, the adaptive Hamiltonian always has the best performance as it requires fewer runs-per-success to achieve smaller errors. It is also interesting to note from the insets of Fig. 5 that there is a regime where the run time depends linearly on \(\log (1/{\varepsilon }_{\lambda })\) when employing an adaptive approach. This suggests that VQSE may perform quite efficiently for large ελ. However, the linear dependence breaks down for small ελ, where the number of runs-per-success seems to grow exponentially with 1/ελ. Despite such growth, for up to 8 qubits we only need 100 repetitions to achieve an error of order 10−6. We leave for future work a more detailed study of the dependence of runs-per-success on small errors. Finally, the results presented in Figs. 4 and 5 suggest that, for a sufficiently large number of iterations, the adaptive Hamiltonian approach outperforms the fixed Hamiltonian approaches as it requires the fewest iterations to converge to very small values of ελ.

Error mitigation

Here we discuss an important application of the VQSE algorithm: error mitigation. Quantum state preparation circuits (gate sequences U which prepare a target state \(\left|\psi \right\rangle\)) are used as subroutines in many quantum algorithms. However, since current quantum computers are noisy, all state preparation circuits produce mixed states ρ. If the incoherent noise is weak enough, we can expect that the eigenvector associated with the largest eigenvalue of ρ is close to \(\left|\psi \right\rangle\). Here we show that VQSE can be implemented to re-purify ρ and estimate \(\left|\psi \right\rangle\). Naturally, when running the VQSE eigenvector preparation circuit, noise will also produce a mixed state σ. However, if the depth of V(θ) is shorter than the depth of U, one can obtain a higher fidelity between σ and \(\left|\psi \right\rangle\) than between ρ and \(\left|\psi \right\rangle\). In this case one can mitigate errors by replacing the state preparation circuit with the VQSE eigenvector preparation circuit.

Let us now consider the three-qubit W-state preparation circuit from ref. 54, section 2.2 (see also ref. 55). By employing a noisy quantum computer simulator with the noise profile of IBM’s Melbourne processor56, we find that the fidelity between ρ and the exact W state \(\left|\psi \right\rangle\) is \(F(\rho ,\left|\psi \right\rangle )\approx 0.785\). We then train 10 instances of VQSE with two layers of the ansatz in Fig. 3(b, bottom) and with a termination condition of 50 iterations. Moreover, we employ the adaptive Hamiltonian, where we update H(t) every 10 iterations according to Algorithm 1. Figure 6 shows the average cost function value and the average fidelity between \(\left|\psi \right\rangle\) and the state σ obtained by running the VQSE eigenvector preparation circuit. As the number of iterations increases, the cost value tends to decrease, showing that we are able to train in the presence of noise. Moreover, we also see that \(F(\sigma ,\left|\psi \right\rangle )\) increases and saturates at a value larger than \(F(\rho ,\left|\psi \right\rangle )\), namely at 0.853, hence showing that we are in fact mitigating the effect of noise. This can be explained by the fact that we reduced the circuit depth, as our ansatz contains two CNOTs, while the textbook circuit contains three CNOTs.

Fig. 6: Cost function value and fidelity versus number of iterations.
figure 6

We implement VQSE for error mitigation of the three-qubit W-state preparation circuit. The input state ρ corresponds to the mixed state obtained by running the W-state preparation circuit on a noisy simulator. The dashed line corresponds to the fidelity \(F(\rho ,\left|\psi \right\rangle )\) between ρ and the exact W state \(\left|\psi \right\rangle\). For each iteration step, we compute the fidelity \(F(\sigma ,\left|\psi \right\rangle )\), where the mixed state σ is obtained by running the VQSE eigenvector preparation circuit on the noisy simulator. Curves depict the average of 10 instances of the algorithm. As the number of iterations increases, the cost function value decreases, which implies that we are able to train V(θ) in the presence of noise. After a few iterations of the VQSE optimization loop, we find \(F(\sigma ,\left|\psi \right\rangle ) \,>\, F(\rho ,\left|\psi \right\rangle )\).

Entanglement spectroscopy

We now discuss the possibility of employing VQSE to compute the entanglement spectrum of a state ρ which is obtained as the reduced state of a bipartite quantum system \(\left|{\psi }_{AB}\right\rangle\), i.e., \(\rho ={{{{\rm{Tr}}}}}_{B}\left|{\psi }_{AB}\right\rangle \,\left\langle {\psi }_{AB}\right|\). Let d denote the dimension of ρ. The entanglement spectrum57 refers to the collection \({\{{\lambda }_{k}\}}_{k = 1}^{d}\) of eigenvalues of ρ, and as discussed in ref. 58, entanglement spectroscopy is a useful tool to analyze states \(\left|{\psi }_{AB}\right\rangle\) prepared by simulating many-body systems on a quantum computer. Specifically, the entanglement spectrum is useful to study the bipartite entanglement, as it contains more universal signatures than the von Neumann entropy alone57, and it can detect the presence of quantum critical points59,60.

Let us now consider an N = 8 spin-1/2 cyclic chain interacting through uniform nearest-neighbor XY Heisenberg couplings in the presence of a non-transverse magnetic field. The Hamiltonian of the system is

$$H=-\mathop{\sum}\limits_{j}({h}_{x}{S}_{j}^{x}+{h}_{z}{S}_{j}^{z}+{J}_{x}{S}_{j}^{x}{S}_{j+1}^{x}+{J}_{y}{S}_{j}^{y}{S}_{j+1}^{y})\,,$$
(19)

where j labels the site in the chain, \({S}_{j}^{\mu }\) the spin operator (with μ = x, y, z), Jμ the coupling strength, and hμ the magnetic fields. Here, Jμ > 0 leads to ferromagnetic (FM) coupling, while Jμ < 0 leads to antiferromagnetic (AFM) coupling. As shown in refs. 60,61, for specific values of the fields hμ (known as factorizing fields) the Hamiltonian in (19) presents quantum critical points known as “factorization” points. At the non-transverse factorizing field, the ground state of H becomes a separable non-degenerate state, such that one eigenvalue of the reduced state ρ is exactly equal to one, while the rest are exactly zero.
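
For reference, at this system size the exact entanglement spectrum can be computed classically; below is a numpy sketch with illustrative (not the paper's) field and coupling values:

```python
import numpy as np

# Spin-1/2 operators S^mu = sigma^mu / 2
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2

def op_at(op, site, N):
    """Embed a single-site operator at `site` in an N-spin chain."""
    out = np.array([[1.0 + 0j]])
    for s in range(N):
        out = np.kron(out, op if s == site else np.eye(2))
    return out

def xy_hamiltonian(N, hx, hz, Jx, Jy):
    """The cyclic XY Hamiltonian of Eq. (19)."""
    H = np.zeros((2**N, 2**N), dtype=complex)
    for j in range(N):
        k = (j + 1) % N                      # cyclic boundary conditions
        H -= hx * op_at(sx, j, N) + hz * op_at(sz, j, N)
        H -= Jx * op_at(sx, j, N) @ op_at(sx, k, N)
        H -= Jy * op_at(sy, j, N) @ op_at(sy, k, N)
    return H

N = 8
H = xy_hamiltonian(N, hx=0.5, hz=0.5, Jx=1.0, Jy=0.5)   # illustrative values
gs = np.linalg.eigh(H)[1][:, 0]                          # ground state
psi = gs.reshape(2**4, 2**4)              # split: 4 neighboring spins | rest
rho = psi @ psi.conj().T                  # Tr_B |psi><psi|
print(np.sort(np.linalg.eigvalsh(rho))[::-1][:3])        # exact lambda_1..3
```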

In Fig. 7a, c, we show results of implementing VQSE with an adaptive Hamiltonian to compute the three largest eigenvalues of the state ρ defined as the reduced state of 4 neighboring spins obtained from the ground state of (19). For simplicity, we have parametrized the fields as \(({h}_{z},{h}_{x})=h(\cos (\gamma ),\sin (\gamma ))\) with γ fixed. Specifically, in Fig. 7a, c we plot the estimated eigenvalues versus the field magnitude h for a system with FM and AFM couplings, respectively. Dashed lines indicate the exact eigenvalues. For each field value, we run 8 instances of VQSE, and even for such a small number of runs the estimated eigenvalues give good approximations, with relative errors generally of order ~10−2.

Fig. 7: Exact and estimated eigenvalues versus field value, for the VQSE entanglement spectroscopy implementations.
figure 7

The input state ρ is given as the reduced state of 4 neighboring qubits from the ground state of (19). Top and bottom rows correspond to ferromagnetic and antiferromagnetic couplings, respectively. Dashed curves represent the exact three largest eigenvalues of ρ, while plot markers indicate the VQSE estimated eigenvalues. In a and c we see that VQSE can accurately estimate the eigenvalues. In b and d we plot 1 − λ1, and the y axis is on a log scale. Here we see that the quantum critical factorization points are detected at h/Jx ≈ 0.76 and h/Jx ≈ 1.43 in (b) and (d), respectively, since at those points we have \({\widetilde{\lambda }}_{1}\approx 1\), and \({\widetilde{\lambda }}_{2},{\widetilde{\lambda }}_{3}\approx 0\).

In Fig. 7b, d, we show the same data as in (a) and (c), but with the y axis on a logarithmic scale and with 1 − λ1 plotted instead of the largest eigenvalue λ1. For the FM (AFM) case, there is a factorization point at h/Jx ≈ 0.76 (h/Jx ≈ 1.43). As depicted in these panels, around the critical points we correctly find \({\widetilde{\lambda }}_{1}\approx 1\) and \({\widetilde{\lambda }}_{2},{\widetilde{\lambda }}_{3}\approx 0\). These results show that VQSE can detect quantum critical factorization points.

Discussion

In the NISQ era, every qubit and every gate counts. Wasteful usage of qubits or gates will ultimately limit the problem size that an algorithm can solve. In this work, we presented an algorithm for extracting the eigensystem of a quantum state ρ that is as frugal as we could imagine, with respect to qubit count.

We introduced VQSE, which estimates the m-largest eigenvalues and associated eigenvectors of ρ using only a single copy of ρ, and hence only n qubits per iteration of the algorithm. VQSE exploits the mathematical connection between diagonalization and majorization to define an efficiently computable cost function as the expectation value of a Hamiltonian. We derived an operational meaning of this cost function as a bound on the eigensystem error. Furthermore, we introduced a training method that adaptively updates the VQSE cost function based on the information gained from measurements performed throughout the optimization. This was aimed at addressing both barren plateaus and local minima in the cost landscape.

We have numerically implemented VQSE for several applications. We showed that VQSE can be employed for PCA by implementing the VQSE algorithm on states of n = 6, 8, and 10 qubits to estimate the six largest eigenvalues. Our numerical results (Figs. 4 and 5) indicate that our adaptive cost function approach leads to smaller errors than the ones obtained by training a fixed cost function. We also showed (Fig. 7) that one can detect quantum critical points by performing entanglement spectroscopy with the eigenvalues obtained via VQSE. Finally, we employed VQSE to mitigate errors that occur during the W-state preparation circuit. This involved running VQSE on a noisy simulator to re-purify the state, i.e., find the circuit that prepares the eigenvector with the largest eigenvalue. We found (Fig. 6) that the re-purified state obtained by VQSE improved the fidelity with the target W state, and hence reduced the effects of noise.

Comparison to literature

Since VQSE only requires n qubits, it is as qubit frugal as it can possibly be when compared to other algorithms for the same task, such as quantum PCA (qPCA)33, VQSD6, and quantum state singular value decomposition (QSVD)27. The quantum phase estimation and density matrix exponentiation primitives in qPCA make it difficult to implement in the near term62, and this is supported by an attempted implementation in ref. 6 that resulted in poor performance. On the other hand, VQSD and QSVD are variational algorithms and hence have the possibility of lower-depth requirements. But they still need to employ a larger number of qubits than VQSE. Specifically, VQSD needs to perform the so-called Diagonalized Inner Product Test6, which requires two copies of ρ, i.e., twice as many qubits as VQSE. In addition, it is also worth noting that VQSD is vulnerable to noise, since any asymmetry between the noise acting on each copy of ρ will affect the result of the algorithm. Finally, in QSVD, one needs to either compute or have access to a purification \(\left|\psi \right\rangle\) of ρ. Hence QSVD requires a number of qubits between n and 2n. Moreover, we expect that noise will be a bigger issue for QSVD than for VQSE, since in practice the assumption that one has a pure state in QSVD can often be violated due to incoherent noise during state preparation.

Quantum-inspired classical algorithms63 for PCA can perform well in practice, provided that the matrix has a very large dimension, low rank, and low condition number64. We note that VQSE does not have such limitations, except that VQSE yields results with high accuracy for low-rank states. It is also paramount to recall that recent results have shown that an exponential advantage is still possible for PCA65,66, even for near-term algorithms, as the quantum-inspired classical algorithms are artificially given too much power via access to quantum state amplitudes. Thus, in view of these recent results, VQSE can be useful in the quest for achieving a quantum speedup with quantum PCA, particularly for the analysis of quantum data. For the case of classical data, the success of VQSE for PCA relies on the efficiency of preparing a quantum state corresponding to the covariance matrix of the classical data67. In addition, VQSE has applications not only for PCA but also for other tasks such as entanglement spectroscopy and error mitigation on NISQ devices. For error mitigation, we leave for future work combining our approach with other error mitigation techniques such as Virtual Distillation68,69, which also seeks to re-purify noisy quantum states.

Future directions

Due to the rapid rise of VQE2, much research has gone into how to prepare ground and excited states on NISQ devices. However, more research is needed on how to characterize these states, once prepared. This is where VQSE comes in, as VQSE can extract the entanglement spectra of these states and hence characterize important properties like topological order57. Hence it is worth exploring in the future the idea of pairing up the VQE and VQSE algorithms, where VQSE is implemented immediately after VQE.

Furthermore, VQSE has immediate application for estimating the fidelity of two quantum states with reduced resource requirements. This is because an algorithm was previously introduced8 to estimate fidelity by using state diagonalization as a subroutine, and hence VQSE can provide a more efficient version of this subroutine.

A crucial technical idea in this work was our adaptively-updated cost function, which improved optimization performance. It is worth investigating whether this adaptive method can improve the performance of other VQAs2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27.

Another direction to explore is whether VQSE exhibits noise resilience17. We suspect this to be true given the similar structure of VQSE and the variational quantum compiling algorithms investigated in ref. 17.

This is important as we are proposing that VQSE will be a useful tool for error mitigation. Namely, we envision that VQSE could be used as a subroutine to improve the accuracy of several quantum algorithms. For example, one could use VQSE to re-purify the noisy quantum state obtained as the outcome of the VQE algorithm. Alternatively, one could periodically perform VQSE whilst running a dynamical quantum simulation on a NISQ device, which would re-purify the state as it is evolving in time. This could allow one to simulate long-time dynamics, i.e., times significantly beyond the coherence time of a NISQ device.

Methods

Operational meaning of the cost function

In this section, we provide a derivation for Eq. (16). First, we rewrite the eigenvalue error in Eq. (15) as follows:

$$\begin{array}{r}{\varepsilon }_{\lambda }={{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\boldsymbol{\lambda }}}}}^{m}+{{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}-2{{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\,,\end{array}$$
(20)

where λm ≡ (λ1, …, λm) and \({{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\equiv ({\widetilde{\lambda }}_{1},\ldots ,{\widetilde{\lambda }}_{m})\). Since the eigenvalues of a positive semidefinite operator majorize its diagonal elements, we have that \({{{{\boldsymbol{\lambda }}}}}^{m}\succ {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\). Moreover, from the Schur convexity property of the dot product with an ordered vector, it follows that \({{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\,\geqslant\, {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\), which further implies the following inequality:

$${\varepsilon }_{\lambda }\,\leqslant\, {{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\boldsymbol{\lambda }}}}}^{m}-{{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\,.$$
(21)

Similarly, from Eq. (15) we get

$$\begin{array}{r}{\varepsilon }_{v}\,\leqslant\, {{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\boldsymbol{\lambda }}}}}^{m}-{{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\,,\end{array}$$
(22)

where we again used the fact that the eigenvalues of a positive semidefinite operator majorize its diagonal elements, and hence \({{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\boldsymbol{\lambda }}}}}^{m}\,\geqslant\, \mathop{\sum }\nolimits_{i=1}^{m}\langle {\widetilde{\lambda }}_{i}| {\rho }^{2}| {\widetilde{\lambda }}_{i}\rangle\).

We recall from Eq. (3) that the VQSE cost function can be expressed as \(C=\mathop{\sum }\nolimits_{i = 1}^{d}{E}_{i}{p}_{i}\), where we omit the θ and t dependence of C. Therefore, the following chain of inequalities holds:

$$\begin{array}{ll}C\,\geqslant\, \mathop{\sum }\limits_{i=1}^{m}{E}_{i}{p}_{i}+{E}_{m+1}\mathop{\sum}\limits_{i > m}{p}_{i}\\ \quad=\,{E}_{m+1}-\left(\mathop{\sum }\limits_{i=1}^{m}{p}_{i}({E}_{m+1}-{E}_{i})\right)\\ \quad\geqslant\, {E}_{m+1}-\sqrt{\left(\mathop{\sum }\limits_{i=1}^{m}{p}_{i}^{2}\right)\left(\mathop{\sum }\limits_{i=1}^{m}{({E}_{m+1}-{E}_{i})}^{2}\right)},\end{array}$$
(23)

where d = 2n. The first inequality follows from the fact that Ei ⩾ Em+1, ∀i ⩾ m + 1, and \({\sum }_{i \,{ > }\,m}{p}_{i}=1-\mathop{\sum }\nolimits_{i = 1}^{m}{p}_{i}\). The second inequality follows from the Cauchy–Schwarz inequality for the dot product of two vectors, u · v ⩽ ∥u∥ ∥v∥. By combining Eq. (23) with the fact that \(\mathop{\sum }\nolimits_{i = 1}^{m}{\widetilde{\lambda }}_{i}^{2}\geqslant \mathop{\sum }\nolimits_{i = 1}^{m}{p}_{i}^{2}\) (since \({\widetilde{\lambda }}_{i}\in {{{\mathcal{L}}}}\) are the largest diagonal elements of \(\widetilde{\rho }\)), we find that

$$\sqrt{\mathop{\sum }\limits_{i=1}^{m}{\tilde{\lambda }}_{i}^{2}}\,\geqslant\, \frac{{E}_{m+1}-C}{\sqrt{\mathop{\sum }\nolimits_{i = 1}^{m}{({E}_{m+1}-{E}_{i})}^{2}}}\,.$$
(24)

Using the fact that \({{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\boldsymbol{\lambda }}}}}^{m}\,\leqslant\, {{{\boldsymbol{\lambda }}}}\cdot {{{\boldsymbol{\lambda }}}}={{{\rm{Tr}}}}[{\rho }^{2}]\), we obtain the following inequality from (24):

$${{{{\boldsymbol{\lambda }}}}}^{m}\cdot {{{{\boldsymbol{\lambda }}}}}^{m}-{{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\cdot {{{{\widetilde{\boldsymbol{\lambda }}}}}}^{m}\,\leqslant\, {{{\rm{Tr}}}}[{\rho }^{2}]-\frac{{({E}_{m+1}-C)}^{2}}{\mathop{\sum }\nolimits_{i = 1}^{m}{({E}_{m+1}-{E}_{i})}^{2}}\,.$$

Combining this with (21) and (22) leads to (16).

Verification of solution quality

Here we provide a proof of Eq. (17), and we show that this bound is tighter than the bound in (16). From the definition of the eigenvalue and eigenvector error in (15), it is straightforward to see that \({\varepsilon }_{\lambda }\,\leqslant\, \mathop{\sum }\nolimits_{i = 1}^{d}{({\lambda }_{i}-{\widetilde{\lambda }}_{i})}^{2}\), and \({\varepsilon }_{v}\,\leqslant\, \mathop{\sum }\nolimits_{i = 1}^{d}\langle {\delta }_{i}| {\delta }_{i}\rangle\), where d = 2n. By following a procedure similar to the one employed in deriving (22), we find

$${\varepsilon }_{\lambda }\,\leqslant\, {{{\boldsymbol{\lambda }}}}\cdot {{{\boldsymbol{\lambda }}}}-{{{\widetilde{\boldsymbol{\lambda }}}}}\cdot {{{\widetilde{\boldsymbol{\lambda }}}}}\,,$$
(25)

where we recall that λ and \({{{\widetilde{\boldsymbol{\lambda }}}}}\) denote d-dimensional vectors of ordered exact and estimated eigenvalues of ρ, respectively. Moreover, from \(\left|{\delta }_{i}\right\rangle =({\mathbb{1}}-|{\widetilde{\lambda }}_{i}\rangle \langle {\widetilde{\lambda }}_{i}|)\rho |{\widetilde{\lambda }}_{i}\rangle\), it is straightforward to get

$${\varepsilon }_{v}\,\leqslant\, \mathop{\sum }\limits_{i=1}^{d}\langle {\widetilde{\lambda }}_{i}| {\rho }^{2}| {\widetilde{\lambda }}_{i}\rangle -{{{\widetilde{\boldsymbol{\lambda }}}}}\cdot {{{\widetilde{\boldsymbol{\lambda }}}}}={{{\boldsymbol{\lambda }}}}\cdot {{{\boldsymbol{\lambda }}}}-{{{\widetilde{\boldsymbol{\lambda }}}}}\cdot {{{\widetilde{\boldsymbol{\lambda }}}}}\,,$$
(26)

where we used the fact that \(\mathop{\sum }\nolimits_{i = 1}^{d}\langle {\widetilde{\lambda }}_{i}| {\rho }^{2}| {\widetilde{\lambda }}_{i}\rangle ={{{\rm{Tr}}}}[{\rho }^{2}]\), which follows from the invariance of trace under a basis transformation.
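This basis-independence step is easy to verify directly; below is a minimal numerical check, with a random ρ and a random unitary standing in for the trained V(θ) (illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real
V, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

# Columns of V-dagger play the role of the inferred eigenvectors |lambda_tilde_i>.
basis = V.conj().T
total = sum((basis[:, i].conj() @ rho @ rho @ basis[:, i]).real for i in range(d))
assert np.isclose(total, np.trace(rho @ rho).real)   # trace is basis independent
```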

Let \({{{\widehat{\boldsymbol{\lambda }}}}}=({\widetilde{\lambda }}_{1},\ldots ,{\widetilde{\lambda }}_{\widehat{m}},\frac{1-\mathop{\sum }\nolimits_{i = 1}^{\widehat{m}}{\widetilde{\lambda }}_{i}}{{2}^{n}-\widehat{m}},\ldots ,\frac{1-\mathop{\sum }\nolimits_{i = 1}^{\widehat{m}}{\widetilde{\lambda }}_{i}}{{2}^{n}-\widehat{m}})\), with \(\widehat{m} \,>\, m\), be a vector majorized by \({{{\widetilde{\boldsymbol{\lambda }}}}}\), i.e., \({{{\widetilde{\boldsymbol{\lambda }}}}}\succ {{{\widehat{\boldsymbol{\lambda }}}}}\). Since the dot product with an ordered vector is a Schur convex function, we have \({{{\widehat{\boldsymbol{\lambda }}}}}\cdot {{{\widehat{\boldsymbol{\lambda }}}}}\,\leqslant\, {{{\widehat{\boldsymbol{\lambda }}}}}\cdot {{{\widetilde{\boldsymbol{\lambda }}}}}\,\leqslant\, {{{\widetilde{\boldsymbol{\lambda }}}}}\cdot {{{\widetilde{\boldsymbol{\lambda }}}}}\), which further implies the following inequality:

$${{{\boldsymbol{\lambda }}}}\cdot {{{\boldsymbol{\lambda }}}}-{{{\widetilde{\boldsymbol{\lambda }}}}}\cdot {{{\widetilde{\boldsymbol{\lambda }}}}}\,\leqslant\, {{{\boldsymbol{\lambda }}}}\cdot {{{\boldsymbol{\lambda }}}}-\left(\mathop{\sum }\limits_{i=1}^{\widehat{m}}{\widetilde{\lambda }}_{i}^{2}+\frac{{(1-\mathop{\sum }\nolimits_{i = 1}^{\widehat{m}}{\widetilde{\lambda }}_{i})}^{2}}{{2}^{n}-\widehat{m}}\right).$$
(27)

This inequality can be combined with (25) and (26) to obtain the bound in (17).

We now show that (17) is tighter than (16). Specifically, we prove that the negative term on the right-hand side of (17) is larger than the one in (16). Consider the following chain of inequalities:

$$\left(\mathop{\sum }\limits_{i=1}^{\widehat{m}}{\widetilde{\lambda }}_{i}^{2}+\frac{{(1-\mathop{\sum }\nolimits_{i = 1}^{\widehat{m}}{\widetilde{\lambda }}_{i})}^{2}}{{2}^{n}-\widehat{m}}\right)\geqslant \mathop{\sum }\limits_{i=1}^{m}{\widetilde{\lambda }}_{i}^{2}\geqslant \frac{{({E}_{m+1}-C)}^{2}}{\mathop{\sum }\nolimits_{i=1}^{m}{({E}_{m+1}-{E}_{i})}^{2}}\,,$$

where the first inequality uses \(\widehat{m} \,>\, m\), and the last inequality follows from squaring (24); note that near the minimum of the cost one has \(C\,\leqslant\, {E}_{m+1}\), so the right-hand side of (24) is non-negative and squaring preserves the inequality.
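The ordering just established can also be checked numerically. In the sketch below (the same illustrative random setup as before), the final comparison uses \(\max ({E}_{m+1}-C,0)\), since (24) yields a nontrivial squared bound only when \(C\,\leqslant\, {E}_{m+1}\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, m_hat = 3, 2, 4
d = 2 ** n

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real
V, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

p = np.sort(np.diag(V @ rho @ V.conj().T).real)[::-1]   # lambda-tilde, descending
E = np.sort(rng.normal(size=d))                         # E_1 < ... < E_d
C = np.dot(E, p)   # small energies paired with large populations, as after training

# Negative term of (17) versus the negative term of (16).
term_17 = np.sum(p[:m_hat] ** 2) + (1 - np.sum(p[:m_hat])) ** 2 / (d - m_hat)
term_16 = max(E[m] - C, 0.0) ** 2 / np.sum((E[m] - E[:m]) ** 2)
assert term_17 >= np.sum(p[:m] ** 2) >= term_16 - 1e-12
```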

Gradient of the cost function

Here we show that the partial derivative of (4) with respect to an angle θν is given by

$$\begin{array}{l}\displaystyle{\frac{\partial C(t,{{{\boldsymbol{\theta }}}})}{\partial {\theta }_{\nu }}}=\frac{1}{2}\left({{{\rm{Tr}}}}\left[H(t)V({{{{\boldsymbol{\theta }}}}}_{+})\rho {V}^{{\dagger} }({{{{\boldsymbol{\theta }}}}}_{+})\right]\right.\\ \left.\qquad\quad-{{{\rm{Tr}}}}\left[H(t)V({{{{\boldsymbol{\theta }}}}}_{-})\rho {V}^{{\dagger} }({{{{\boldsymbol{\theta }}}}}_{-})\right]\right)\,.\end{array}$$
(28)

Writing θ = (θ1, …, θν, …), the shifted parameter vectors are simply θ± = (θ1, …, θν ± π/2, …); hence the gradient is efficiently accessible by shifting a single parameter in θ and measuring the expectation value 〈H(t)〉.
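For illustration, here is a minimal single-qubit NumPy sketch of this parameter-shift rule; the rotation axis, input state, and observable are arbitrary choices, not part of the algorithm. The shifted-cost difference matches a numerical derivative:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def V(theta):
    # Single-qubit rotation exp(i*theta*Y/2), following the paper's convention.
    return np.cos(theta / 2) * I2 + 1j * np.sin(theta / 2) * Y

rho = np.array([[0.8, 0.1], [0.1, 0.2]], dtype=complex)   # a valid one-qubit state
H = Z                                                     # toy Hamiltonian

def cost(theta):
    Vt = V(theta)
    return np.trace(H @ Vt @ rho @ Vt.conj().T).real

theta = 0.7
shift = 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))   # Eq. (28)
fd = (cost(theta + 1e-6) - cost(theta - 1e-6)) / 2e-6               # central difference
assert abs(shift - fd) < 1e-6
```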

Let us consider the layered hardware-efficient ansatz of Fig. 3a. Here V(θ) consists of a fixed number L of layers of two-qubit gates Bμ(θμ) acting on alternating pairs of neighboring qubits. Moreover, Bμ(θμ) can always be expressed as a product of ημ gates from a given alphabet \({{{\mathcal{A}}}}=\{{U}_{k}({\theta }_{k})\}\) as

$${B}_{\mu }({{{{\boldsymbol{\theta }}}}}_{\mu })={U}_{{\eta }_{\mu }}({\theta }_{\mu }^{{\eta }_{\mu }})\ldots {U}_{\nu }({\theta }_{\mu }^{\nu })\ldots {U}_{1}({\theta }_{\mu }^{1})\,.$$
(29)

Here the \({\theta }_{\mu }^{\nu }\) are continuous parameters, and we can always write, without loss of generality, Uk(θ) = Rk(θ)Tk, where \({R}_{k}(\theta )={\rm{e}}^{{\rm{i}}\theta {\sigma }_{k}/2}\) is a single-qubit rotation and Tk is an unparametrized gate.

We can then compute \({\partial }_{\nu }{B}_{\mu }({{{{\boldsymbol{\theta }}}}}_{\mu })\equiv \partial {B}_{\mu }({{{{\boldsymbol{\theta }}}}}_{\mu })/\partial {\theta }_{\mu }^{\nu }\) as

$${\partial }_{\nu }{B}_{\mu }({{{{\boldsymbol{\theta }}}}}_{\mu })=\frac{{\rm{i}}}{2}{U}_{{\eta }_{\mu }}({\theta }_{\mu }^{{\eta }_{\mu }})\ldots {\sigma }_{\nu }{U}_{\nu }({\theta }_{\mu }^{\nu })\ldots {U}_{1}({\theta }_{\mu }^{{\eta }_{1}})\,.$$
(30)

Then, without loss of generality let us write V(θ) = VL(θL)Bμ(θμ)VR(θR), where VL(θL), and VR(θR) contain all gates in V(θ) except for Bμ(θμ). By noting that ∂νV(θ) = VL(θL)∂νBμ(θμ)VR(θR), we have

$$\begin{array}{ll}{\partial }_{\nu }C={{{\rm{Tr}}}}\left[H{V}_{L}{\partial }_{\nu }{B}_{\mu }{V}_{R}\rho {V}_{R}^{{\dagger} }{B}_{\mu }^{{\dagger} }{V}_{L}^{{\dagger} }\right]\\ \qquad+\,{{{\rm{Tr}}}}\left[H{V}_{L}{B}_{\mu }{V}_{R}\rho {V}_{R}^{{\dagger} }{\partial }_{\nu }{B}_{\mu }^{{\dagger} }{V}_{L}^{{\dagger} }\right]\,,\end{array}$$

where we omitted the parameter dependence for simplicity. Then, from Eq. (30) and using the following identity (which is valid for any matrix A)

$${\rm{i}}[{\sigma }_{\nu },A]={R}_{\nu }\left(\frac{\pi }{2}\right)A{R}_{\nu }^{{\dagger} }\left(\frac{\pi }{2}\right)-{R}_{\nu }\left(-\frac{\pi }{2}\right)A{R}_{\nu }^{{\dagger} }\left(-\frac{\pi }{2}\right)\,,$$
(31)

where \({R}_{k}(\theta )={\rm{e}}^{{\rm{i}}\theta {\sigma }_{k}/2}\), we obtain

$$\begin{array}{l}\displaystyle{\frac{\partial C(t,{{{\boldsymbol{\theta }}}})}{\partial {\theta }_{\nu }}}=\frac{1}{2}\left({{{\rm{Tr}}}}\left[H(t)V({{{{\boldsymbol{\theta }}}}}_{+})\rho {V}^{{\dagger} }({{{{\boldsymbol{\theta }}}}}_{+})\right]\right.\\ \left.\quad\qquad\qquad\;-\,{{{\rm{Tr}}}}\left[H(t)V({{{{\boldsymbol{\theta }}}}}_{-})\rho {V}^{{\dagger} }({{{{\boldsymbol{\theta }}}}}_{-})\right]\right)\,.\end{array}$$
(32)
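Identity (31) is easy to verify numerically; below is a minimal check with σν = Z and a random 2 × 2 matrix A (illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def R(theta):
    # R(theta) = exp(i*theta*Z/2), matching the convention below Eq. (29).
    return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * Z

A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

lhs = 1j * (Z @ A - A @ Z)                                 # i[sigma, A]
rhs = (R(np.pi / 2) @ A @ R(np.pi / 2).conj().T
       - R(-np.pi / 2) @ A @ R(-np.pi / 2).conj().T)
assert np.allclose(lhs, rhs)
```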

Algorithm for the adaptive cost function

Algorithm 1 shows a simple adaptive strategy that illustrates how one can update H(t); specifically, we consider the case when f(t) is a stepwise function. We define the VQSE optimization-loop termination condition in terms of the maximum number of allowed iterations Nmax, and we define an updating parameter s (with Nmax/s an integer) such that HG(t) is updated every s steps. Finally, we use the term optimizer, denoted opt, for a function that takes as input a set of parameters θ and a cost function C(t, θ) (or the gradient of the cost, for gradient-based optimization) and returns an updated set of parameters that attempts to solve the minimization problem of Eq. (5).

Algorithm 1

Adaptive cost function with stepwise schedule f(t)

Input: state ρ; trainable unitary V(θ); integer m; timestep \(\delta {{{\rm{t}}}}=1/{{{{\rm{N}}}}}_{\max }\); adapting stepsize ts = 1/s; local time-independent Hamiltonian HL; a set of constant parameters \({\left\{{q}_{i}\right\}}_{i = 1}^{m}\); classical optimizer opt.

Output: parameters θopt which minimize the cost function, i.e., \({{{{\boldsymbol{\theta }}}}}_{{{{\rm{opt}}}}}=\arg {\min }_{\theta }\,C({{{\boldsymbol{\theta }}}}).\)

Init: randomly choose a set of initial parameters θ; H(t) ← HL; t ← δt

1: while t < 1 do
2:   if t is divisible by ts then
3:     measure \(V({{{\boldsymbol{\theta }}}})\rho {V}^{{\dagger} }({{{\boldsymbol{\theta }}}})\) in the standard basis and define the sets \({{{\mathcal{L}}}}\) and \({{{\mathcal{Z}}}}\)
4:     \({H}_{G}(t)\leftarrow {\mathbb{1}}-\mathop{\sum }\nolimits_{i = 1}^{m}{q}_{i}\left|{{{{\boldsymbol{z}}}}}_{i}\right\rangle \left\langle {{{{\boldsymbol{z}}}}}_{i}\right|\) with \({{{{\boldsymbol{z}}}}}_{i}\in {{{\mathcal{Z}}}}\)
5:     H(t) ← (1 − t)HL + tHG(t)
6:   run opt with C and θ as input, and \({{{{\boldsymbol{\theta }}}}}_{\min }\) as output
7:   \({{{\boldsymbol{\theta }}}}\leftarrow {{{{\boldsymbol{\theta }}}}}_{\min }\)
8:   t ← t + δt
9:   if t = 1 then
10:    θopt ← θ
     Return: θopt
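For concreteness, below is a schematic NumPy implementation of Algorithm 1. It is a sketch, not a prescription: exact density-matrix simulation stands in for quantum measurements, a naive random-search routine stands in for opt, and the ansatz, HL, and the weights {qi} are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2, 2
d = 2 ** n
N_max, s = 100, 5               # max iterations; H_G is adapted s times in total
dt = 1.0 / N_max                # timestep delta-t

# Random 2-qubit density matrix (illustrative input state).
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# Illustrative trainable unitary: a product of Givens rotations (real orthogonal).
n_params = d * (d - 1) // 2

def ansatz(theta):
    V = np.eye(d, dtype=complex)
    k = 0
    for i in range(d):
        for j in range(i + 1, d):
            G = np.eye(d, dtype=complex)
            c, sn = np.cos(theta[k]), np.sin(theta[k])
            G[i, i], G[j, j], G[i, j], G[j, i] = c, c, -sn, sn
            V = G @ V
            k += 1
    return V

def cost(theta, H):
    V = ansatz(theta)
    return np.trace(H @ V @ rho @ V.conj().T).real

H_L = np.diag(np.arange(d, dtype=float))   # stand-in local Hamiltonian
q = 1.0 + 0.1 * np.arange(m, 0, -1)        # non-degenerate weights q_i
theta = rng.uniform(0, 2 * np.pi, n_params)
H = H_L.copy()

t, step = dt, 0
while t < 1:
    step += 1
    if step % (N_max // s) == 1:            # discrete stand-in for "t divisible by t_s"
        pops = np.diag(ansatz(theta) @ rho @ ansatz(theta).conj().T).real
        Z_set = np.argsort(pops)[::-1][:m]  # bitstrings with the largest populations
        H_G = np.eye(d)
        for qi, zi in zip(q, Z_set):
            H_G[zi, zi] -= qi               # H_G = 1 - sum_i q_i |z_i><z_i|
        H = (1 - t) * H_L + t * H_G         # H(t) interpolation
    # Toy stand-in for `opt`: accept a random perturbation if it lowers the cost.
    trial = theta + rng.normal(scale=0.05, size=n_params)
    if cost(trial, H) < cost(theta, H):
        theta = trial
    t += dt

theta_opt = theta
print("final cost:", cost(theta_opt, H))
```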