Quantum simulation (QS) was the earliest proposed example of a quantum algorithm that could outcompete the best classical algorithm1. Accelerated QS would impact fields, including chemistry, materials science, and nuclear and high-energy physics. Current approaches include quantum emulation (or analogue QS)2,3,4,5,6, Suzuki–Trotter-based methods7,8,9,10, and Taylor expansion-based QSs using linear combinations of unitaries11,12,13. Quantum emulation and Suzuki–Trotter-based QSs have seen proof-of-principle demonstrations2,3,4,5,6,14, while Taylor expansion-based QSs have the best asymptotic scaling and will likely have application for fault-tolerant quantum computers (QCs) of the future.

In the current noisy intermediate-scale quantum (NISQ) era, variational quantum simulation (VQS) methods are expected to be important. Variational algorithms have been introduced for finding ground and excited states15,16,17,18, and for other applications19,20,21,22. In addition, some variational algorithms simulate system dynamics23,24,25,26. Of the variational dynamical simulation methods, some are based on knowledge of low-lying excited states26, and some are iterative in time23,24,25. Both approaches have the potential to outperform Suzuki–Trotter-based methods in the NISQ era.

Simulating the dynamics of a quantum system for time T typically requires Ω(T) gates so that a generic Hamiltonian evolution cannot be achieved in sublinear time. This result is known as the “no fast-forwarding theorem”, and holds both for a typical unknown Hamiltonian27 and for the query model setting28. However, there are particular Hamiltonians that can be fast-forwarded, which means that the quantum circuit depth does not need to grow significantly with simulation time. Hamiltonians that allow fast-forwarding are precisely those that lead to violations of time–energy uncertainty relations and equivalently allow for precise energy measurements27. For example, commuting local Hamiltonians27, quadratic fermionic Hamiltonians27, and continuous-time quantum walks on particular graphs29 can all be fast-forwarded. In addition, ref. 30 exploited the exact solvability of the transverse Ising model to formulate a quantum circuit for its exact diagonalization, allowing for fast-forwarding. This circuit was used to simulate the Ising model on Cloud QCs31. A subspace-search variational eigensolver was employed in ref. 26; to fast-forward low-lying states in a quantum system. In ref. 32, a Hamiltonian whose diagonalization is constructed out of IQP circuits is shown to give a quantum advantage for the task of energy sampling. More generally, it remains an open problem to determine the precise form for Hamiltonians that do and do not allow fast-forwarding.

Previous results analyze fast-forwarding of Hamiltonians mostly in a computational complexity setting27,28,33,34, in which the asymptotic scaling of the runtime of quantum circuits implementing a large-scale simulation is important. However, near-term devices are constrained to simulating intermediate-scale systems using finite depth circuits. The behavior of an algorithm to simulate large systems and long times may not be indicative of its behavior in smaller scale regimes. Therefore, as discussed further in Supplementary Note 5, whether or not asymptotic fast-forwarding is possible for a particular Hamiltonian has limited impact on the simulations that may be performed using near-term QCs.

The advantage of fast-forwarding, if possible, for near-term QCs is that the simulation time T can be much longer than the coherence time τ of the QC performing the simulation. This is because T is just a parameter that is set “by hand” in a fixed-depth quantum circuit26,30. Therefore we ask the following core question: Can we fast forward the evolution of a Hamiltonian beyond the coherence time of a near-term device using a variational algorithm?

In this paper, we introduce a variational, hybrid quantum-classical algorithm that we call variational fast forwarding (VFF). We envision it to be most useful for implementing QSs on near-term, NISQ computers. However, it could also have uses in fault-tolerant QS. It is distinct from SVQS26 in that our method searches for an approximate diagonalization of an entire QS unitary, rather than for a finite set of low-lying states. Most importantly, we analyze the simulation errors produced by VFF and guarantee a desired accuracy for the simulation once a termination condition is achieved. This is possible due to the operational meaning of our cost function. In contrast, low-energy subspace approaches as in SVQS may not be able to guarantee a desired simulation error, since the cost function (i.e., the energy) does not carry an obvious operational meaning.

The basic idea of VFF is depicted in Fig. 1. The “Results” section presents our main results including an overview of the algorithm, the cost function, error analysis, and implementations of VFF on a simulator and on Rigetti’s QC. The “Discussion” section discusses these results, and the “Methods” section elaborates on our ansatz and training methods.

Fig. 1: The concept of VFF.
figure 1

a A Trotterization-based quantum simulation with N = 5 timesteps. This simulation runs past the coherence limit of the quantum architecture. b A VFF-based quantum simulation. An approximate diagonalization of a short-time simulation is found variationally. Using the eigenvector W and diagonal D unitaries that were learned, an arbitrary length simulation is implemented by modifying the parameters in D. As long as VFF results in few enough gates that the circuit does not exceed the coherence time, longer simulations can be performed than the standard method in a.


The VFF algorithm


Given a Hamiltonian H on a d = 2n dimensional Hilbert space (i.e., on n qubits) evolved for a short time Δt with the simulation unitary eiHΔt, the goal is to find an approximation that allows the simulation at later times T to be fast-forwarded beyond the coherence time. Figure 2 schematically shows the VFF algorithm, which consists of the following steps:

  1. 1.

    Implement a unitary circuit Ut) to approximate eiHΔt, the simulation at a small timestep.

  2. 2.

    Compile Ut) to a diagonal factorization V = WDW ≈ eiHΔt with circuit depth L.

  3. 3.

    Approximately fast-forward the QS at large time T = NΔt using the same circuit of depth L: eiHT ≈ WDNW.

Fig. 2: The VFF algorithm.
figure 2

a An input Hamiltonian is transformed into b a gate sequence associated with a single-timestep Trotterized unitary, Ut). c The unitary is then variationally diagonalized by fitting a parameterized factorization, V(α, Δt) = W(θ)D(γ, Δt)W(θ). This variational subroutine employs gradient descent to minimize a cost function CLHST, whose gradient is efficiently estimated with a short-depth quantum circuit called the Local Hilbert–Schmidt Test (LHST). The variational loop is exited when a termination condition given by Eq. (18) is reached, which guarantees that a user-defined bound on the average fidelity \(\overline{F}(T)\) is achieved. d After the termination condition is reached, the optimal parameters (θopt, γopt) are used to implement a fast-forwarded simulation, with the fast-forwarding error growing sublinearly in the simulation time (see Eq. (14)). The fast-forwarding is performed by modifying the parameters of the diagonal unitary, D(γopt, Δt) → D(Nγopt, Δt), producing a quantum simulation unitary, W(θopt)D(Nγopt, Δt)W(θopt).

Typically Ut) will be a single-timestep Trotterized unitary approximating eiHΔt. We variationally search for an approximate diagonalization of Ut) by compiling it to a unitary with a structure of the form

$$V({\boldsymbol{\alpha }},\Delta t):=W({\boldsymbol{\theta }})D({\boldsymbol{\gamma }},\Delta t)W{({\boldsymbol{\theta }})}^{\dagger },$$

with α = (θ, γ) being a vector of parameters. Here, D(γ, Δt) is a parameterized unitary composed of commuting unitaries that encode the eigenvalues of Ut), while W(θ) is a parameterized unitary matrix consisting of corresponding eigenvectors. In the “Methods” section, we describe layered structures that provide ansätze for the circuits W(θ) and D(γ, Δt), and we detail our gradient-descent optimization methods for training θ and γ.

To approximately diagonalize Ut), the parameters α = (θ, γ) are variationally optimized to minimize a cost function CLHST(Ut), V) that can be evaluated using a short-depth quantum circuit called the Local Hilbert–Schmidt Test (LHST)35 shown in Fig. 2c. The compilation procedure we employ to approximate Ut) by V(α, Δt) makes use of the quantum-assisted quantum compiling (QAQC) algorithm35, that was later shown to be robust to quantum hardware noise36. The “Cost function and cost evaluation” section below elaborates on our cost function.

If we can find such an approximate diagonalization for Ut) then, for any total simulation time, T = NΔt, we have:

$${e}^{-iHT}={({e}^{-iH\Delta t})}^{N},$$
$$\approx {(U(\Delta t))}^{N},$$
$$\approx W({\boldsymbol{\theta }})D{({\boldsymbol{\gamma }},\Delta t)}^{N}W{({\boldsymbol{\theta }})}^{\dagger },$$
$$=W({\boldsymbol{\theta }})D(N{\boldsymbol{\gamma }},\Delta t)W{({\boldsymbol{\theta }})}^{\dagger }.$$

Hence, a QS for any total time, T, may be performed with a fixed-quantum circuit structure as depicted in Fig. 2d. In “Simulation error analysis”, we perform an error analysis to investigate how the approximate equalities in Eqs. (3) and (4) affect the overall simulation error.

Cost function and cost evaluation

For the variational compiling step of VFF (shown in Fig. 2c), we employ the cost function CLHST(U, V) introduced in ref. 35. This is defined as

$${C}_{{\rm{LHST}}}(U,V)=1-\frac{1}{n}\mathop{\sum }\limits_{j = 1}^{n}{F}_{e}^{(j)},$$

where the \({F}_{e}^{(j)}\) are entanglement fidelities and hence satisfy \(0\le {F}_{e}^{(j)}\le 1\). Specifically, \({F}_{e}^{(j)}\) is the entanglement fidelity for the quantum channel obtained from feeding into the unitary UV, the maximally mixed state on \(\overline{j}\) and then tracing over \(\overline{j}\) at the output of UV, where \(\overline{j}\) contains all qubits except for the j-qubit. We elaborate on the form of CLHST(U, V) in Supplementary Note 1.

This function has several important properties.

  1. 1.

    It is faithful, vanishing if and only if V = U (up to a global phase).

  2. 2.

    Nonzero values are operationally meaningful. Namely, CLHST(U, V) upper bounds the average-case compilation error as follows:

    $${C}_{{\rm{LHST}}}(U,V)\ge \frac{d+1}{nd}(1-\overline{F}(U,V)),$$

    where \(\overline{F}(U,V)\) is the average fidelity of states acted upon by V versus those acted upon by U, with the average being over all Haar-measure pure states.

  3. 3.

    The cost function appears to be trainable, in the sense that it does not have an obvious barren plateau issue (i.e., exponentially vanishing gradient, see refs. 35,37).

  4. 4.

    Estimating the cost function is DQC-1 hard and hence it cannot be efficiently estimated with a classical algorithm35.

  5. 5.

    There exists a short-depth quantum circuit for efficiently estimating the cost and its gradient.

Regarding the last point, each \({F}_{e}^{(j)}\) term in Eq. (6) is estimated with a different quantum circuit, and then one classically sums them up to compute CLHST(U, V). An example of such a circuit is depicted in Fig. 2c. It involves 2n qubits, with the top (bottom) n qubits denoted A (B). The probability of the 00 measurement outcome on qubits AjBj in this circuit is precisely the entanglement fidelity \({F}_{e}^{(j)}\). Therefore 2n single-qubit measurements are required to compute CLHST(U, V), a favorable scaling compared to, for example, the O(n4) measurements that are naively required for VQE15,38,39,40. We also remark that CLHST(U, V) was recently shown to have noise resilience properties, in that noise acting during the circuit in Fig. 2c tends not to affect the global optimum of this function36.

For simplicity, we will often write our cost function as

$${C}_{{\rm{LHST}}}^{{\rm{VFF}}}(T):={C}_{{\rm{LHST}}}({U}^{\frac{T}{\Delta t}},{V}^{\frac{T}{\Delta t}}),$$

with U = Ut) and V = V(α, Δt), and note that \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) is the quantity that we directly minimize in the optimization loop of VFF.

Simulation error analysis

Linear scaling in N

In practice, each of the steps in the VFF algorithm above will generate errors. This includes the algorithmic error from the approximate implementation, Ut), of the infinitesimal time evolution operator eiHΔt and error from the approximate compilation and diagonalization of Ut) into V(α, Δt). These two error sources bound the overall error via the triangle inequality:

$${\epsilon }_{p}^{{\rm{FF}}}(\Delta t)\le {\epsilon }_{p}^{{\rm{TS}}}(\Delta t)+{\epsilon }_{p}^{{\rm{ML}}}(\Delta t).$$

Here, \({\epsilon }_{p}^{{\rm{FF}}}(\Delta t)\) is the overall simulation error for time Δt, \({\epsilon }_{p}^{{\rm{TS}}}(\Delta t)\) is the Trotterization error (note that this error may always be reduced using higher-order Trotterizations at the cost of more gates), and \({\epsilon }_{p}^{{\rm{ML}}}(\Delta t)\) is the “machine learning” error associated with the variational compilation step. These quantities are defined as

$${\epsilon }_{p}^{{\rm{FF}}}(\Delta t)=| | {e}^{-iH\Delta t}-V({\boldsymbol{\alpha }},\Delta t)| {| }_{p},$$
$${\epsilon }_{p}^{{\rm{TS}}}(\Delta t)=| | {e}^{-iH\Delta t}-U(\Delta t)| {| }_{p},$$
$${\epsilon }_{p}^{{\rm{MS}}}(\Delta t)=| | U(\Delta t)-V({\boldsymbol{\alpha }},\Delta t)| {| }_{p},$$

where \(| | M| {| }_{p}={({\sum }_{j}{m}_{j}^{p})}^{1/p}\) is the Schatten p-norm, with {mj} the singular values of M.

Ultimately, we are interested in fast-forwarding and hence we want to bound \({\epsilon }_{p}^{{\rm{FF}}}(T)\) with T = NΔt. For this purpose, we prove a lemma in Supplementary Note 2 stating that

$$| | {U}_{1}^{N}-{U}_{2}^{N}| {| }_{p}\le N| | {U}_{1}-{U}_{2}| {| }_{p},$$

for any two unitaries U1 and U2. Combining this lemma with the triangle inequality in Eq. (9) gives

$${\epsilon }_{p}^{{\rm{FF}}}(T)\le N({\epsilon }_{p}^{{\rm{TS}}}(\Delta t)+{\epsilon }_{p}^{{\rm{ML}}}(\Delta t)).$$

Equation (14) implies that the overall simulation error scales at worst linearly with the number of timesteps, N.

We remark that, for the special case of p = 2, Eq. (13) can be reformulated in terms of our cost function as:

$${C}_{{\rm{LHST}}}^{{\rm{VFF}}}(T)\mathop{<}\limits_{\approx} n\ {N}^{2}\ {C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t),$$

with \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}\) given by Eq. (8). The approximation in Eq. (15) holds when the cost function \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) is small, which is the case after a successful optimization procedure. See Supplementary Note 2 for the non-approximate version of Eq. (15). Thus, we find that the VFF cost function scales at worst quadratically in N under fast-forwarding.

Certifiable error and a termination condition

Equation (14) holds for all Schatten norms, but of particular interest for our purposes is the Hilbert–Schmidt norm, p = 2, from which we can derive certifiable error bounds on the average-case error. In addition, the operator norm, p = , quantifies the worst-case error and is often used in the QS literature41,42. For our numerical implementations (“Implementations” section), we will consider both worst-case and average-case error. On the other hand, for our analytical results presented here, we will focus on average-case error since it is naturally suited to providing a termination condition for the optimization in VFF.

As an operationally meaningful measure of average-case error, we consider the average gate fidelity between the target unitary eiHT and the approximate simulation V(α, T), arising from the VFF algorithm:

$$\overline{F}(T)={\int}_{\psi }| \left\langle \psi \right|V{({\boldsymbol{\alpha }},T)}^{\dagger }{e}^{-iHT}\left|\psi \right\rangle {| }^{2}d\psi ,$$

where the integral is over all states \(\left|\psi \right\rangle\) chosen according to the Haar measure.

In Supplementary Note 2, we show that one can lower bound \(\overline{F}(T)\) based on the value of the VFF cost function,

$$\overline{F}(T)\mathop{>}\limits_{\approx} 1-\frac{d}{d+1}{N}^{2}{\left({\epsilon }_{\infty }^{{\rm{TS}}}(\Delta t)+\sqrt{n{C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)}\right)}^{2}.$$

This inequality holds to a good approximation in the limit that \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) is small, as is the case after a successful optimization procedure. See Supplementary Note 2 for the exact lower bound on \(\overline{F}(T)\), from which Eq. (17) is derived.

In addition, Eq. (17) provides a termination condition for the variational portion of VFF. If one has a desired threshold for \(\overline{F}(T)\), then this threshold can be guaranteed provided that \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) is below a certain value. Once \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) dips below this value, then the variational portion of VFF can be terminated. Specifically, the termination condition is \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\le {C}_{\text{Threshold}}\), where

$${C}_{\text{Threshold}}\approx \frac{1}{n}{\left(\frac{1}{N}\sqrt{\frac{d+1}{d}(1-\overline{F}(T))}-{\epsilon }_{\infty }^{{\rm{TS}}}(\Delta t)\right)}^{2},$$

with the approximation holding when \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) is small. Again, for the exact expression for CThreshold, see Supplementary Note 2.


Here, we present results simulated classically and on quantum hardware. We refer the reader to the “Methods” section for details about our ansatz and optimization methods.

For our results below, it will be convenient, for a given simulation error tolerance δ, to define fast-forwarding as the case whenever \({R}_{\delta }^{{\rm{FF}}}>1\), where

$${R}_{\delta }^{{\rm{FF}}}={T}_{\delta }^{{\rm{FF}}}/{T}_{\delta }^{{\rm{Trot}}}$$

is the ratio of the simulation time achievable with VFF, \({T}_{\delta }^{{\rm{FF}}}\), to the simulation time achievable with standard Trotterization, \({T}_{\delta }^{{\rm{Trot}}}\). We note that \({T}_{\delta }^{{\rm{Trot}}}\) is a good empirical measure of the coherence time, since it can account for both decoherence and gate infidelity, and hence the condition \({R}_{\delta }^{{\rm{FF}}}>1\) intuitively captures the idea of simulation beyond the coherence time.

Comparing VFF to Trotterization and compiled Trotterizations

Here, we compare the performance of VFF to that of two other simulation strategies. One of these strategies is the standard Trotterization approach depicted in Fig. 1a. Another strategy is to first optimally compile the Trotterization step to a short-depth gate sequence, and use this optimal circuit (in place of the Trotterization step) for the approach in Fig. 1a. We refer to this as the QAQC strategy, since one can use the QAQC algorithm35 to obtain the optimal compilation of the Trotterization step. With the QAQC strategy, when finding the optimal short-depth compilation, we make no assumptions about the structure of the compiled circuit, which is in contrast with Trotterization, where the structure of the circuit is dictated by interactions in the Hamiltonian.

For concreteness, we consider the task of simulating the XY model, defined by the Hamiltonian

$${H}_{{\rm{XY}}}=-\mathop{\sum }\limits_{i}{X}^{i}{X}^{i+1}+{Y}^{i}{Y}^{i+1},$$

where Xi and Yi are Pauli operators on qubit i. We consider a five-qubit system with open boundary conditions. From the analytical diagonalization of the XY model43, it follows that the ansatz for the diagonal matrix D can be truncated at the first nontrivial term, as described in the “Methods” section (see Eq. (26)).

Figure 3 summarizes our results. Figure 3a shows how the \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}\) cost function is iteratively minimized during the optimization procedure. We selected four approximate diagonalizations corresponding to different cost values denoted by VFFn, n = 1, …, 4, depicted by colored circles in Fig. 3a. Note that the colors match those used in other panels. Figure 3b compares these diagonalizations with different QS methods: Trotterization (dashed line), QAQC-compiled (dashed-dotted line), and VFF at different stages of optimization. We compare simulated time evolution governed by the XY model and observe how the fidelity decays with evolution time. The fidelity is given by \({\rm{Tr}}(\left|\psi (T)\right\rangle \left\langle \psi (T)\right|\rho (T))\), where \(\left|\psi (T)\right\rangle\) is the exact evolved state and ρ(T) is the state obtained with a noisy simulator. The initial state \(\left|\psi (0)\right\rangle\) was chosen randomly such that it has nonvanishing overlap with every eigenstate of the Hamiltonian. The circuits were simulated using a noisy trapped-ion simulator with error model from ref. 44. We took error rates, as specified in their Fig. 14 in ref. 44 and reduced them by a factor of five for clearer demonstration of VFF’s capabilities.

Fig. 3: Comparing different quantum simulation strategies.
figure 3

a The VFF cost function as it is iteratively minimized. Various quality diagonalizations are indicated in the colored circles. b Simulation fidelity as a function of time simulated. c Summed eigenenergy error and fast-forwarding as a function of the VFF cost. d Eigenenergy errors for the set of diagonalizations.

Results shown in Fig. 3b indicate that QAQC performed better than Trotterization, which is expected as QAQC optimizes in a circuit space larger than that given by Trotterization. Both approaches give better results than VFF1 (red curve). This confirms the intuition that at early stages of the VFF optimization (large values of \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}\)), the error of the diagonalization is too big to outperform other methods. As the cost function is decreased, the length of time one can simulate using VFF increases. Indeed, as one can see from the green, blue, and purple curves (which are associated with cost values 10−2), VFF dramatically outperforms both Trotterization and QAQC.

One can see another important feature in Fig. 3b. For short simulations, Trotterization and QAQC are always more accurate than VFF, no matter how accurate the diagonalization is. That is because for small T, there are just a few timesteps taken by Trotterization and QAQC implementations, and the resulting circuits are shorter than VFF circuits implementing W, D, and W. This disadvantage rapidly diminishes since VFF circuits do not grow with T and the only error that impacts the fidelity comes from imperfect diagonalization. On the other hand, Trotter and QAQC circuits grow linearly with T and as a result, fidelity is dominated by noise (and not imperfections in the decomposition).

Figure 3c shows how the fast-forwarding factor \({R}_{\delta }^{{\rm{FF}}}\) and the error in the eigenvalue approximation (pertinent if VFF is used for eigenvalue estimation as discussed in the “Estimating energy eigenvalue“ section) depend on the cost function \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}\). The data suggest power-law dependence in both cases. Bringing the cost function down to 10−3 allows us to reduce the eigenvalue error <0.1 and reach a fast-forwarding factor of ~30. Note that for VFF to be more efficient than Trotterization (\({R}_{\delta }^{{\rm{FF}}}\,>\,1\)), one has to lower the cost function below ~0.04. For this case, δ is defined as \(1-{\rm{Tr}}(\left|\psi (T)\right\rangle \left\langle \psi (T)\right|\rho (T))\) and we considered δ = 0.2. Figure 3d presents a more detailed analysis of the eigenvalue error, showing how the error of individual eigenvalues is reduced as the cost function is minimized.

Using VFF to fast-forward models across a range of parameters

Here, we show how to use VFF to efficiently find diagonalizations for new models that are nearby in parameter space, from previously diagonalized models.

Hubbard model

We applied VFF to Trotterized QS unitaries, \(U(\Delta t)\approx {e}^{-i{H}_{{\rm{Hub}}}\Delta t}\), of the Fermi–Hubbard model

$${H}_{{\rm{Hub}}}=-\tau \mathop{\sum }\limits_{i,j,\sigma }\left({c}_{i,\sigma }^{\dagger }{c}_{j,\sigma }+{c}_{j,\sigma }^{\dagger }{c}_{i,\sigma }\right)+u\mathop{\sum }\limits_{i = 1}^{N}{n}_{i,\uparrow }{n}_{i,\downarrow }.$$

Here, \({c}_{i,\sigma }^{\dagger }\) and ci,σ are electron creation and annihilation operators (resp.) for spin σ {, } at site i and \({n}_{i,\sigma }={c}_{i,\sigma }^{\dagger }{c}_{i,\sigma }\) is the electron number operator. The parameters τ and u are the hopping strength and on-site interaction (resp.). We studied a two-site, two-qubit Fermi–Hubbard model45, which, after translation via the Jordan-Wigner transform, takes the form

$${H}_{{\rm{Hub}},2}=-\tau (X\otimes I+I\otimes X)+uZ\otimes Z.$$

We took τ = 1 for our initial diagonalization, then perturbatively increased u from 0 to 0.1 in increments of 0.01. For Ut), we used a first-order Trotterization of \(\exp (-i{H}_{{\rm{Hub}},2}\Delta t)\). We set a threshold for optimization of 10−6. We used a three-layer ansatz for W and a two-layer ansatz for D, which we describe in the “Ansatz” section.

In representative results shown in Fig. 4a, b, we see that, after an initial optimization taking a number of iteration steps, VFF reached the optimization threshold. Then, as we perturbed away from u = 0, VFF rapidly found new parameters that diagonalized \(\exp (-i{H}_{{\rm{Hub}},2}\Delta t)\) to below the cost threshold. For all approximate diagonalizations, for an error tolerance of δ = 10−2, the simulation error remained below this tolerance for T = 30Δt. The diagonalization used nine single-qubit gates and seven two-qubit gates. The Trotterization used two single-qubit gates and one two-qubit gate. Thus, the fast-forwarded simulations used nine single-qubit layers and seven two-qubit gates, but the equivalent Trotterized simulations used 60 single-qubit gates and 30 two-qubit gates. Thus, VFF gave significant depth compression versus the Trotterized simulations, particularly with respect to entangling gates.

Fig. 4: Finding successive diagonalizations across a range of parameters.
figure 4

a, b VFF of a two-site, two-qubit Hubbard quantum simulation unitary (four-qubit circuit). a Optimization error. Here, cost estimates were made with nsamp = 106 and Δt = 0.1. We plot \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) versus optimization step for a sequence of parameters (see text). In this plot, red crosses depict the initial costs for each parameter before optimization. Each optimization was terminated after reaching CThreshold = 10−6. After taking some time to diagonalize the initial unitary with u = 0, subsequent optimizations took just a few iterations. b Simulation error. Here, we plot \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(T)\) versus N for all u. For this level of optimization, and taking \({T}_{\delta }^{{\rm{Trot}}}\) to be one Trotter step, fast-forwardings of ~30 timesteps were achieved. c, d VFF of a three-qubit Heisenberg quantum simulation unitary (six-qubit circuit). c Optimization error. Estimates were made with nsamp = 106 and Δt = 0.1. We plot \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\) versus optimization step for a sequence of parameters (see text). In this plot, red crosses depict the initial costs for each parameter before optimization. Each optimization was terminated after reaching CThreshold = 10−6. d \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(T)\) versus N plotted for all values of Jz, Jx, and Jy. Here, fast-forwardings of ~70–100 timesteps were achieved.

Heisenberg model

Next, we applied VFF to the Heisenberg model,

$${H}_{{\rm{Heis}}}=\mathop{\sum }\limits_{i}{J}_{z}{Z}^{i}{Z}^{i+1}+{J}_{x}{X}^{i}{X}^{i+1}+{J}_{y}{Y}^{i}{Y}^{i+1}+h{Z}^{i},$$

where Xj, Yj, and Zj are Pauli spin matrices acting on qubit j, and h, Jx, Jy, and Jz are parameters.

Here, we took h = 1.0 and investigated the model acting on three qubits (whose Hamiltonian we denote HHeis,3). We used a first-order Trotterization of \(\exp (-i{H}_{{\rm{Heis,3}}}\Delta t)\). We set an optimization threshold of 10−6 and used a ten-layer ansatz for W and a two-layer ansatz for D. From Jz = 1.0 (a noninteracting Hamiltonian), we increased Jz to 5.0 in increments of 1.0. For these parameter values, HHeis is an antiferromagnetic classical Ising model.

Next, we kept h = 1.0 and Jz = 5.0 fixed, and increased Jx = Jy from 0.0 to 8.0 in increments of 2.0. When Jx = Jy, these are often called XXZ Heisenberg models.

Finally, we kept h = 1.0, Jz = 5.0, Jx = 8.0 and varied Jy from 0.0 to 10.0 in increments of 1.0 (XYZ Heisenberg models).

As may be seen in the representative results plotted in Fig. 4c, d, VFF rapidly found new diagonalizations \(WD{W}^{\dagger }\approx \exp (-i{H}_{{\rm{Heis,3}}}\Delta t)\) for all models considered. We performed additional searches for diagonalizations of ferromagnetic models (Jz, Jx, and Jy < 0) with similar results. For all approximate diagonalizations, the simulation error remained below an error tolerance of δ = 10−2, up to T ≈ 100Δt. For this simulation time, each diagonalization used 40 two-qubit gates and 71 single-qubit gates (111 total), whereas each Trotterization used 1200 two-qubit gates and 2500 single-qubit gates (3700 total).

VFF implemented on quantum hardware

We implemented VFF on 1 + 1 qubits (i.e., diagonalizing a random single-qubit unitary) on the Rigetti Aspen-4 QC (Fig. 5). Here, we considered the first-order Trotterization of the Hamiltonian H = αxσx + αyσy + αzσz, where α was a randomly chosen unit vector, at the time Δt = 0.5. We used W = Rz(θz)Rx(θx) and D = Rz(γz). The VFF cost function, as evaluated on the QC with nsamp = 104, was optimized to \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}(\Delta t)\approx 1{0}^{-1}\).

Fig. 5: VFF on quantum hardware.
figure 5

a Training results for single-qubit VFF implemented on the Rigetti Aspen-4 quantum computer. Here, the quantum circuit acted on two qubits, one with a random single-qubit unitary, U, and the second with the diagonal ansatz, V = WDW. Optimization was performed using gradient descent of the VFF cost function. Results from four optimizations are shown. The plot shows the cost function evaluated on the QC (solid line) and the true (noiseless) cost function evaluated classically (dashed line), using the parameters found on the Rigetti QC via VFF. The table in a provides the optimal noisy cost values from the Rigetti QC and the equivalent true cost value for the given set of optimized parameters. b Process tomography for single-qubit VFF implemented on the Rigetti Aspen-4 quantum computer. Real (left) and imaginary (right) parts of the exact, classically computed process matrix of a first-order Trotterized quantum simulation (Exact Trotter) compared with a quantum simulation using an optimal diagonalization from the VFF shown in a (VFF on QC) and the first-order Trotterization (Trotter on QC), both computed on the Rigetti QC. The number of timesteps for the simulation are shown to the left. Note that for the VFF simulation, we used the optimization angles corresponding to the best cost from the noisy cost function, i.e., what was actually measured on the QC. To quantify the accuracy of the fast-forwarded simulation, we include a table in b containing the entanglement fidelity56 between the exact unitary and either the noisy process implemented by VFF or Trotterization, respectively, on the Rigetti QC.

With this system, we investigated how well VFF performed by classically computing the true, noiseless, cost for the parameters found on the Rigetti QC. This true cost converged to two orders of magnitude below the QC-evaluated cost, demonstrating significant robustness of VFF to the noise on the Rigetti QC.

We next simulated single-qubit evolution on the QC by (1) iterating the original Trotterization, Ut)N, and (2) using the VFF diagonalization (5). We then used process tomography to compare the resultant noisy process resulting from the Trotterization, and the process resulting from VFF to the exact process Ut)N calculated classically.

In this single-qubit case, the Trotterized simulation unitary could have been compiled to a circuit with many fewer gates; however, this would not be true for higher-dimensional unitaries and for this reason we did not compile the iterated gate sequence here.

In Fig. 5b, we show that VFF performed much better than the iterated Trotterization, giving a high-fidelity simulation. In these results, the entanglement fidelity between the process implemented using VFF and the exact process remained high until at least NVFF = 150 and never reached a value <0.7. On the other hand, the fidelity of the iterated Trotterization approach was already 0.586 by N = 25.

These results demonstrate that VFF on current QCs can allow for simulations beyond the coherence time. For example, taking an entanglement infidelity of 0.3 as our error tolerance δ, it follows from the table in Fig. 5b that we obtained a fast-forwarding beyond the coherence time of at least \({R}_{\delta }^{{\rm{FF}}}=6\).

Estimating energy eigenvalues

We primarily foresee VFF being used to study the long-time evolution of the observables of a system. But one may also use VFF to reduce the gate complexity of eigenvalue estimation algorithms, such as quantum phase estimation46 or time series analyses47,48. Such algorithms require simulating a Hamiltonian up to time \(T={\mathcal{O}}\left(\frac{1}{\sigma }\right)\) to obtain eigenvalue estimates of accuracy σ. Due to the constant depth circuits produced by VFF, we can therefore reduce the number of gates required for these algorithms by a factor of \({\mathcal{O}}\left(\frac{1}{\sigma }\right)\), increasing the viability of eigenvalue estimation on noisy QCs.

The eigenvalues of the Hamiltonian simulated via VFF are not directly accessible from the diagonal unitary D, since they are encoded in a set of Pauli operators. However, these can be extracted using the time series analysis in ref. 48. This method does not require large ancillary systems nor large numbers of controlled-unitary operations, and thus is a promising avenue for eigenvalue estimation in the NISQ era.

To demonstrate the practical utility of VFF for eigenvalue estimation, we numerically computed the spectrum of a two-site Hubbard model in Fig. 6a–c and in Fig. 6d, we show eigenenergy estimates for a five-qubit XY model. Specifically, we used a one-clean-qubit (DQC-1) quantum circuit to discretely sample the function

$$g(t)={\rm{Tr}}\left(\left|\psi \right\rangle \left\langle \psi \right|\ {e}^{-iDt}\right)=\mathop{\sum }\limits_{j}{e}^{-i{\lambda }_{j}t},$$

where \(\left|\psi \right\rangle :=\frac{1}{\sqrt{2}}{(\left|0\right\rangle +\left|1\right\rangle)}^{\otimes n}\), and then used classical time series analysis to estimate the eigenvalues. This is achieved by computing each spectral estimate S(λ) with respect to a discrete number of values for time variable, \({t}_{j}=\{0,\ldots ,{t}_{\max }\}\) in increments of 0.2Δt. A higher number of discrete points results in a better resolution of S(λ). The signal processing uncertainty principle constrains the spectral widths (variance in the estimate of λ) to obey \({\sigma }_{{\lambda }_{j}}{t}_{\max }\ge c\), where c is a constant of order 1. In Fig. 6a–c, we show three examples with successively better optimization and, hence, longer integration times, \({t}_{\max }\). We plot eigenenergy estimates of diagonalized two-site Hubbard models with parameters τ = 1 and u {0.0, 0.2, …, 1.0} ranging from weakly to strongly coupled models.

Fig. 6: Estimating energy eigenvalues using VFF.
figure 6

ac S(λ) estimates from a VFF diagonalization of a two-site Hubbard model with u/τ = {0.0, …, 1.0}. Only positive eigenenergies are shown. For each of a, b, and c, we chose \({t}_{\max }\) to be \({T}_{1{0}^{-2}}^{{\rm{FF}}}\), the simulation time achievable with a cost <10−2, with \({t}_{\max }={T}_{1{0}^{-2}}^{{\rm{FF}}}=500\Delta t,1000\Delta t,5000\Delta t\) in a, b, and c, respectively. The resolution of S(λ) scales inversely with \({T}_{1{0}^{-2}}^{{\rm{FF}}}\) causing the width of the spectral peaks to get successively narrower as \({t}_{\max }={T}_{1{0}^{-2}}^{{\rm{FF}}}\) is increased. d S(λ) estimate from a VFF diagonalization of a five-qubit Heisenberg XY model. The different spectral heights are due to degenerate eigenvalues (e.g., the multiplicity of the peak at an energy of 2 is twice that at 1.4).

The time series analysis extracts an estimate for the spectrum of energies corresponding to V, the approximate unitary given by VFF of the target unitary U, up to a global phase. The Hoffman–Wielandt theorem49 gives a bound on the total variation distance between spectra of U and V, in terms of the two-norm that in turn is directly related to the VFF cost function. For the concrete bounds we refer to Supplementary Note 3. This illustrates that the estimated spectral differences resulting from classical time series analysis give better approximations to the energy differences of the target Hamiltonian, when decreasing our VFF cost function. In Fig. 3c, d, we provide additional numerical analysis supporting this feature.


We presented a new variational method for QS called VFF. Our results showed that, once a diagonalization is in hand, one could form an approximate fast-forwarding of the simulation that allowed for QSs beyond the coherence time. We compared VFF simulation fidelities for a range of optimization errors with Trotterized and compiled-Trotterized simulations and showed that, as long as the VFF optimization error was sufficiently small, VFF could indeed fast-forward QSs. For the particular models, ansätze, and thresholds that we studied, we were able to fast-forward simulations by factors of ~30 (Hubbard) and ~80 (Heisenberg) simulation timesteps. In addition, a fast-forwarding of a factor of at least six, relative to a Trotterization approach, was found experimentally on Rigetti’s quantum hardware. We also explored the use of VFF for simplifying eigenenergy estimates and showed that the variance of eigenenergy estimates is reduced commensurately with the cost function. Essentially, the more accurate the diagonalization step of VFF is (i.e., the lower the cost function value), the longer is the achievable fast-forwarding simulation time and the better the eigenenergy estimate.

A crucial feature of VFF is the operational meaning of its cost function as a bound on average-case simulation error. Hence, any reduction in the cost results in a tighter bound on the simulation error. We used this feature to define a termination condition for the variational portion of VFF, such that once the cost is below a particular value, one can guarantee that the simulation error will be below a desired threshold. This is arguably the most important feature that distinguishes VFF from prior work on SVQS26, whose cost function does not have an obvious meaning in terms of simulation error. In addition, since VFF is not targeting a low-energy subspace, it is capable of simulating systems at moderate to high-temperature or more dramatic dynamics, such as quenches. The trade-off is that the diagonalization step of VFF can be more difficult than that of SVQS, since one is diagonalizing over the entire space rather than a subspace. This tradeoff will be important to study in future work.

In the NISQ era, the minimum value of the VFF cost function that can be achieved will be limited by quantum hardware noise. On the one hand, this will result in loose bounds on the simulation error obtained from Eq. (17). On the other hand, we have seen from our implementation of VFF on Rigetti’s quantum hardware that the true (noiseless) cost is often orders of magnitude lower than the noisy cost, implying that we learned the correct optimal parameters despite the noise. This noise resilience is analogous to analytical and numerical results recently reported in ref. 36. Hence, an important direction of future research would be to tighten our bound Eq. (17) for specific noise models, which would allow for tight simulation error bounds in the presence of noise.

Finally, a possible limitation of the scalability of VFF is the no fast-forwarding theorem, which is stated in a variety of forms27,28,33, but basically says that there exist some families of Hamiltonians for which asymptotically the number of gates needed for QS must grow roughly in proportion to the simulation time. Thus, VFF may not work for large-scale and/or long-time simulations of these Hamiltonians, perhaps because the circuit depth needed to achieve an accurate diagonalization will be long or perhaps because the cost landscape will be difficult to optimize. Nonetheless, there are many physically interesting Hamiltonians that are fast-forwardable or close to (i.e., perturbations of) models that are known to be fast-forwardable. Moreover, fast-forwardable Hamiltonians can generate highly nonclassical behavior32. Hence, future work needs to explore the class of Hamiltonians that are approximately fast-forwardable.



As with many variational quantum-classical algorithms, it is natural to employ a layered gate structure for W(θ) and D(γ, Δt), with the number of layers being a refinement parameter.

Ansatz for D

Let us first consider an ansatz for D. The problem of constructing quantum circuits for diagonal unitaries, D, is equivalent to finding a Walsh series approximation50

$$D={e}^{iG}=\mathop{\prod }\limits_{j = 0}^{{2}^{q}-1}{e}^{i{\gamma }_{j}{\otimes }_{k = 1}^{n}{({Z}_{k})}^{{j}_{k}}},$$

where q = n, G, and Zk are diagonal operators with the Pauli operator Zk acting on the kth qubit, and jk is the kth bit in a bitstring j. Efficient quantum circuits for minimum depth approximations of D may be obtained by resampling the function on the diagonal of G at sequencies lower than a fixed threshold, with q = k and k ≤ n. The resampled diagonal takes the same form as Eq. (25), but with q = k. The error after resampling is \({\epsilon }_{k}\le {\sup }_{x}| G^{\prime} (x)| /{2}^{k}\), where we have introduced a coordinate along the diagonal, x. While we do not know G, we can assume a particular ansatz for terms to include in the expansion.

In all of our implementations, we use a re-ordering of terms in Eq. (25). Namely, we take

$$D=\mathop{\prod }\limits_{m = 0}^{n}\mathop{\prod }\limits_{j\in {S}_{m}}{e}^{i{\gamma }_{j}{\otimes }_{k = 1}^{n}{({Z}_{k})}^{{j}_{k}}},$$

where Sm is a set of all indices j such that \(\mathop{\sum }\nolimits_{k = 1}^{n}{j}_{k}=m\). Note that the l-local terms, \({\otimes }_{k = 1}^{n}{({Z}_{k})}^{{j}_{k}}\), \(\mathop{\sum }\nolimits_{k = 1}^{n}{j}_{k}=l\), in Eq. (26) are organized in increasing order. We truncate the above product to a small number (up to l = 2) of initial l-local terms. The accuracy of the approximation is controlled by truncating the expansion in Eq. (26). The above expansion may be more suited than Eq. (25) for quantum many-body Hamiltonians. For instance, it is known that the quantum Ising model in a transverse field can be diagonalized exactly by keeping only one-local terms.

Ansatz for W

Let us now consider an ansatz for W(θ). With the Baker–Campbell–Hausdorff formula, we may generate any eigenvector unitary, W(θ), by appropriately interleaving non-commuting unitaries7,35. In general, this requires order d2 parameterized operations. Here, we briefly discuss two approaches to make its generation tractable.

The first approach is to use a fixed, layered ansatz for W(θ). By alternating sets of single- and two-qubit unitaries, we construct a polynomial number of non-commuting layers capable of generating a rich set of parameterized unitaries. Translational invariance of the system Hamiltonian may be incorporated into the ansatz for W(θ). In this case, all gates in a given layer may be chosen to be the same. As a result, the number of variational parameters is reduced by a factor of n.

Another approach is to employ a randomized ansatz, in which parameterized gates are randomly placed. This approach may be more suitable for irregular Hamiltonians H, where the optimal form of W(θ) is not easily deducible from H. The randomized approach may potentially find a shorter W(θ) that contains fewer gates, which is beneficial for near-term applications. Reference 19 discusses further details of both methods.

Growing the Ansatz and parameter initialization

We use the method of growing the ansatz in order to mitigate the problem of getting trapped in local minima during the optimization19,51. This technique can be used with both ansätze mentioned above. The optimization is initiated with a shallow circuit containing only a few variational parameters. After a local minimum is found, we add a resolution of the identity to the ansatz for W(θ). This takes a form of a layer of unitaries (for a layered ansatz) or a smaller block of parameterized gates (for a randomized ansatz) that evaluates to the identity. Adding such structures to W(θ) does not change the value of the cost function, but it increases the number of variational parameters. In the enlarged space, local minima encountered in previous steps may be turned into saddle points and the cost function may be further minimized toward the global minimum. The technique of systematically growing the ansatz to improve the quality of the result and mitigate the problem of local minima is described in detail in ref. 19.

In order to approach the issue of initializing the parameters θ and γ, we often use a perturbative method16,52, in which we pretrain these parameters for a slightly different Hamiltonian. Namely, we begin a VFF search for a unitary diagonalization with a known short-depth, readily diagonalizable, unitary. We then modify the Hamiltonian by adding successively perturbed terms in an attempt to guide the previously learned diagonalization from known initial parameters toward an unknown diagonalization of interest.

Ansatz for implementations

General ansatz considerations were discussed above, and now we discuss the specific ansätze used in our implementations. For our implementations, W consists of successive layers, each formed of three sublayers: (i) an initial sub-layer of single-qubit gates, (ii) a second sub-layer of entangling two-qubit gates acting on neighboring even–odd qubit pairs, and (iii) a third sub-layer of two-qubit gates acting on odd–even qubit pairs. The two-qubit gates are typically CNOTs, but equivalently we have used ZZ(θ) = CNOT(IRz(θ))CNOT or XX(θ) gates. The layers are appended successively always with a final layer of single-qubit gates.

In addition, our implementations use a set of layers consisting of various commuting operators for D. For the first layer, we use a set of single-qubit Z-rotations, Rz(γ), acting on all qubits. The second layer is a set of two-qubit ZZ(γ) gates acting on all pairs of qubits. The third layer would be a set of three-qubit gates ZZZ(γ) acting on all triplets of qubits. However, for the threshold used, we did not need a third layer for the results in the “Implementations” section.

Optimization via gradient descent

Gradient-based approaches can improve convergence of variational quantum-classical algorithms53, and the optimizer performance can be further enhanced by judiciously adapting the shot noise for each partial derivative54. Furthermore, the same quantum circuit used for cost estimation can be used for gradient estimation55. Therefore, we recommend a gradient-based approach for VFF, in what follows.

With the ansatz in Eq. (1), we denote the VFF cost function as \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}:={C}_{{\rm{LHST}}}(U,WD{W}^{\dagger })\). The partial derivative of this cost function with respect to θk, a parameter of the eigenvector operator W(θ), is

$$\begin{array}{ll}\frac{\partial {C}_{{\rm{LHST}}}^{{\rm{VFF}}}}{\partial {\theta }_{k}}=\frac{1}{2}\left(\right.{C}_{{\rm{LHST}}}(U,{W}_{+}^{k}D{W}^{\dagger })\\ -\,{C}_{{\rm{LHST}}}(U,{W}_{-}^{k}D{W}^{\dagger })\\ +\,{C}_{{\rm{LHST}}}(U,WD{({W}_{+}^{k})}^{\dagger })\\ -\,{C}_{{\rm{LHST}}}(U,WD{({W}_{-}^{k})}^{\dagger })\left)\right..\end{array}$$

The operator \({W}_{+}^{k}\) (\({W}_{-}^{k}\)) is generated from the original eigenvector operator W(θ) by the addition of an extra \(\frac{\pi }{2}\) (\(-\frac{\pi }{2}\)) rotation about a given parameter’s rotation axis:

$${W}_{\pm }^{k}:=W\left({{\boldsymbol{\theta }}}_{\pm }^{k}\right)\,\,\,\text{with}\,\,\,{({\theta }_{\pm }^{k})}_{i}:={\theta }_{i}\pm \frac{\pi }{2}{\delta }_{i,k}.$$

Similarly, the partial derivative with respect to γ, a parameter of the diagonal operator D(γ), is

$$\begin{array}{ll}\frac{\partial {C}_{{\rm{LHST}}}^{{\rm{VFF}}}}{\partial {\gamma }_{\ell }}=\frac{1}{2}\left({C}_{{\rm{LHST}}}\left(U,W{D}_{+}^{\ell }{W}^{\dagger }\right)\right.\\-\,{C}_{{\rm{LHST}}}\left(U,W{D}_{-}^{\ell }{W}^{\dagger }\right)\left)\right.\end{array}$$


$${D}_{\pm }^{\ell }:=D\left({{\boldsymbol{\gamma }}}_{\pm }^{\ell }\right)\,\,\,\text{with}\,\,\,{({\gamma }_{\pm }^{\ell })}_{i}:={\gamma }_{i}\pm \frac{\pi }{2}{\delta }_{i,\ell }.$$

Equation (29) is derived in ref. 35 and we derive Eq. (27) in Supplementary Note 4.

Using Eqs. (27) and (29), we can evaluate the gradient of \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}\) directly and use a simple gradient-descent iteration

$${\theta }_{k}^{(t+1)}={\theta }_{k}^{(t)}-\eta \frac{\partial {C}_{{\rm{LHST}}}^{{\rm{VFF}}}}{\partial {\theta }_{k}},$$
$${\gamma }_{\ell }^{(t+1)}={\gamma }_{\ell }^{(t)}-\eta \frac{\partial {C}_{{\rm{LHST}}}^{{\rm{VFF}}}}{\partial {\gamma }_{\ell }},$$

to minimize \({C}_{{\rm{LHST}}}^{{\rm{VFF}}}\).