An adaptive variational algorithm for exact molecular simulations on a quantum computer

Quantum simulation of chemical systems is one of the most promising near-term applications of quantum computers. The variational quantum eigensolver, a leading algorithm for molecular simulations on quantum hardware, has a serious limitation in that it typically relies on a pre-selected wavefunction ansatz that results in approximate wavefunctions and energies. Here we present an arbitrarily accurate variational algorithm that, instead of fixing an ansatz upfront, grows it systematically one operator at a time in a way dictated by the molecule being simulated. This generates an ansatz with a small number of parameters, leading to shallow-depth circuits. We present numerical simulations, including for a prototypical strongly correlated molecule, which show that our algorithm performs much better than a unitary coupled cluster approach, in terms of both circuit depth and chemical accuracy. Our results highlight the potential of our adaptive algorithm for exact simulations with present-day and near-term quantum hardware.


I. SUPPLEMENTARY DISCUSSION
A. Derivation of ADAPT-VQE as a sparse-update FCI ansatz

In addition to the intuitive definition of the algorithm presented above, one can also view it as a version of full CI (FCI) VQE (using the ansatz definition in Eq. (7) of the main text), but with a sparsity constraint on the gradient-based update step, such that only single-parameter updates are allowed per iteration. We will adopt a slightly more generic notation at this point to simplify the discussion: the goal is to find parameters θ_m^k such that the energy functional is minimized to a value that is as close as possible to the FCI energy, i.e., such that the gradient vanishes:

\frac{\partial}{\partial\theta_m^k} \langle\psi^{\mathrm{HF}}| e^{-\theta_1^1 \hat{A}_1} e^{-\theta_2^1 \hat{A}_2} \cdots \hat{H} \cdots e^{\theta_2^1 \hat{A}_2} e^{\theta_1^1 \hat{A}_1} |\psi^{\mathrm{HF}}\rangle = 0.   (2)

B. First iteration
If we initialize all of the parameters, θ_m^k, to zero, then the expectation value of the Hamiltonian in the 0th-iteration state, |ψ^{(0)}⟩, simply returns the uncorrelated HF energy:

E^{(0)} = \langle\psi^{(0)}|\hat{H}|\psi^{(0)}\rangle = \langle\psi^{\mathrm{HF}}|\hat{H}|\psi^{\mathrm{HF}}\rangle = E_{\mathrm{HF}}.

A gradient-descent step from the HF state could then provide an improved set of parameters, lowering the expected energy. The gradient at the 0th iteration takes a simple form because all of the θ_m^k vanish, so every exponential reduces to the identity:

\frac{\partial E}{\partial\theta_m^k}\bigg|_{\vec{\theta}=0} = \langle\psi^{\mathrm{HF}}|[\hat{H},\hat{A}_m]|\psi^{\mathrm{HF}}\rangle.

Using the gradient to update all of the parameters in the wavefunction would immediately lead to an exponentially complicated wavefunction which could no longer be simulated either classically or quantum mechanically, because each of the terms that were initially set to zero would become non-zero, meaning that each term would now have to appear in the circuit. The reason is that all k replicas of a given operator have identical gradients:

\frac{\partial E}{\partial\theta_m^k}\bigg|_{\vec{\theta}=0} = \frac{\partial E}{\partial\theta_m^{k'}}\bigg|_{\vec{\theta}=0} \quad \text{for all } k, k'.

This would mean that the ansatz would have an exponentially large number of terms after the first iteration, and the circuit depth would also be exponentially large. Instead, we choose to impose a sparsity constraint on the parameter update, such that only a single parameter may be updated at a time. As a consequence, after updating the single chosen parameter, the ansatz has increased in complexity by only a single non-zero parameter, and the resulting circuit has grown in depth only enough to accommodate the single extra one- or two-body operator:

|\psi^{(1)}\rangle = e^{\theta_p^1 \hat{A}_p}|\psi^{\mathrm{HF}}\rangle,

where \hat{A}_p denotes the single updated operator. To recover the greatest amount of electron correlation with the most compact wavefunction, we choose to update only the operator with the largest gradient. As a result of this update, the ansatz has grown from HF to a simple UCC-type ansatz with only a single excitation operator.
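This selection step can be illustrated with a small classical sketch. The matrices H, pool, and psi_hf below are random dense placeholders (illustrative assumptions, not molecular operators): at zero amplitude, each operator's gradient reduces to a commutator expectation value, and the largest one is selected.

```python
# Toy sketch of the first operator-selection step (illustrative
# placeholders only).  At theta = 0 the gradient with respect to each
# pool operator A_m reduces to the commutator expectation
# <HF|[H, A_m]|HF>, so no exponentials need to be applied.
import numpy as np

rng = np.random.default_rng(0)
n = 6                                        # toy Hilbert-space dimension
H = rng.normal(size=(n, n))
H = H + H.T                                  # toy Hermitian "Hamiltonian"
psi_hf = np.zeros(n)
psi_hf[0] = 1.0                              # toy "HF" reference state

def anti_hermitian(rng, n):
    M = rng.normal(size=(n, n))
    return M - M.T                           # real antisymmetric = anti-Hermitian

pool = [anti_hermitian(rng, n) for _ in range(4)]   # toy operator pool

# g_m = <HF|[H, A_m]|HF>: the 0th-iteration gradient for each operator
grads = [psi_hf @ (H @ A - A @ H) @ psi_hf for A in pool]
best = int(np.argmax(np.abs(grads)))
print("gradient magnitudes:", np.round(np.abs(grads), 4))
print("selected operator index:", best)
```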
To determine the step size for this iteration (essentially a line search), we perform a variational minimization via VQE:

E^{(1)} = \min_{\theta_p^1} \langle\psi^{\mathrm{HF}}| e^{-\theta_p^1 \hat{A}_p} \hat{H}\, e^{\theta_p^1 \hat{A}_p} |\psi^{\mathrm{HF}}\rangle.

The result of this relatively simple VQE experiment then provides the first-iteration energy and wavefunction.
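A minimal sketch of this one-parameter line search, again with placeholder matrices (H, A, and psi_hf are illustrative assumptions) and with scipy.optimize.minimize_scalar standing in for the classical optimizer of the VQE loop:

```python
# Toy sketch of the first-iteration line search: variationally minimize
# E(theta) = <HF| e^{-theta A} H e^{theta A} |HF> over the single
# amplitude of the operator chosen by the gradient criterion.
# H, A, and psi_hf are illustrative placeholders, not molecular operators.
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n = 6
H = rng.normal(size=(n, n))
H = H + H.T                                  # toy Hermitian "Hamiltonian"
M = rng.normal(size=(n, n))
A = M - M.T                                  # selected anti-Hermitian operator
psi_hf = np.zeros(n)
psi_hf[0] = 1.0

def energy(theta):
    psi = expm(theta * A) @ psi_hf           # |psi(theta)> = e^{theta A}|HF>
    return psi @ H @ psi                     # real <psi|H|psi>

res = minimize_scalar(energy)                # classical "line search"
print(f"E_HF = {energy(0.0):.6f} -> E(1) = {res.fun:.6f} at theta = {res.x:.4f}")
```

The variational principle guarantees that the optimized energy can only improve on the HF starting point.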

C. Second iteration
In the second iteration, the gradients of all of the one- and two-body operators must be computed once again, which can be done in a NISQ-friendly parallel execution on uncoupled quantum computers:

\frac{\partial E}{\partial\theta_m}\bigg|_{\theta_m = 0} = \langle\psi^{(1)}|[\hat{H},\hat{A}_m]|\psi^{(1)}\rangle.

However, because θ_p^1 has already converged in the previous VQE step, its gradient is zero. Just as in the previous iteration, a sparse update is performed to identify and insert only the operator with the largest-magnitude gradient. This grows the ansatz by one parameter, θ_q^1. The second-iteration energy is then obtained by a new VQE simulation for a UCC-type ansatz containing two excitation operators:

E^{(2)} = \min_{\theta_p^1,\theta_q^1} \langle\psi^{\mathrm{HF}}| e^{-\theta_p^1 \hat{A}_p} e^{-\theta_q^1 \hat{A}_q} \hat{H}\, e^{\theta_q^1 \hat{A}_q} e^{\theta_p^1 \hat{A}_p} |\psi^{\mathrm{HF}}\rangle.

We emphasize that it is not only the new parameter which is optimized; all previously added parameters are updated as well. Future iterations follow similarly.
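The iterations above can be put together in a single classical toy sketch (random dense placeholders stand in for molecular operators, and scipy.optimize.minimize plays the role of the VQE parameter optimization). Note that all amplitudes, not just the newest one, are re-optimized in every cycle:

```python
# Toy sketch of the full adaptive loop (illustrative placeholders):
# measure the pool gradients at the current state, append the operator
# with the largest-magnitude gradient, then re-optimize ALL amplitudes.
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 8
H = rng.normal(size=(n, n))
H = H + H.T                                  # toy Hermitian "Hamiltonian"
psi_hf = np.zeros(n)
psi_hf[0] = 1.0                              # toy "HF" reference state
pool = [M - M.T for M in (rng.normal(size=(n, n)) for _ in range(6))]

def prepare(thetas, ops):
    psi = psi_hf
    for t, A in zip(thetas, ops):            # later operators act to the left
        psi = expm(t * A) @ psi
    return psi

def energy(thetas, ops):
    psi = prepare(thetas, ops)
    return psi @ H @ psi

ansatz, thetas = [], []
for it in range(4):
    psi = prepare(thetas, ansatz)
    # gradient of appending operator A_m at zero amplitude: <psi|[H, A_m]|psi>
    grads = [psi @ (H @ A - A @ H) @ psi for A in pool]
    ansatz.append(pool[int(np.argmax(np.abs(grads)))])
    thetas = list(minimize(energy, thetas + [0.0], args=(ansatz,)).x)
    print(f"iteration {it + 1}: E = {energy(thetas, ansatz):.6f}")
```

Because each re-optimization starts from the previous optimum with the new amplitude at zero, the energy decreases monotonically from the HF value.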

D. Analytic Gradients
This section presents a derivation of the analytic gradient method used to improve both the computational efficiency and the numerical precision of the classical simulation of the parameter re-optimization that occurs in step 6 of the ADAPT-VQE algorithm, as described in the main text. It should be noted, however, that the gradient algorithm we present here is in fact completely general and could be used for an arbitrary Trotterized ansatz, not just ADAPT.
Consider a general Trotterized ansatz, |φ⟩, obtained by applying N excitation operators to a given reference state |ψ_0⟩:

|\varphi\rangle = \prod_{i=1}^{N} e^{\theta_i T_i} |\psi_0\rangle,   (10)

where the T_i = t_i − t_i^† are a set of anti-Hermitian operators, the θ_i are free parameters, and the product is defined such that terms with smaller values of the index i appear on the right, i.e., \prod_{i=1}^{3}\hat{o}_i = \hat{o}_3\hat{o}_2\hat{o}_1. Assuming |ψ_0⟩ is normalized, the functional to be minimized is then

E(\vec{\theta}) = \langle\varphi|\hat{H}|\varphi\rangle.   (11)

The state |φ⟩ can be obtained without direct matrix exponentiation [1,2], and in this work we use the SciPy expm_multiply function [3] to compute the action of a matrix exponential on a vector, which is relatively efficient since the operator in each exponent in Supp. Eq. 10 is extremely sparse.
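For example, the state preparation in Supp. Eq. 10 can be carried out with expm_multiply acting on sparse operators, along these lines (the sparse T_i here are random placeholders, not molecular excitation operators):

```python
# Sketch of building |phi> = e^{theta_N T_N} ... e^{theta_1 T_1}|psi_0>
# without forming any matrix exponential explicitly; the sparse T_i are
# random placeholders standing in for the excitation operators.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

rng = np.random.default_rng(3)
n = 64

def sparse_anti_hermitian(rng, n):
    M = sp.random(n, n, density=0.05, random_state=rng, format="csr")
    return (M - M.T).tocsr()                 # sparse real antisymmetric T_i

Ts = [sparse_anti_hermitian(rng, n) for _ in range(3)]
thetas = [0.3, -0.1, 0.2]

psi0 = np.zeros(n)
psi0[0] = 1.0

phi = psi0
for t, T in zip(thetas, Ts):                 # smaller index i acts first
    phi = expm_multiply(t * T, phi)          # action of e^{theta_i T_i}

# each factor is unitary, so the norm of |phi> is preserved
print("norm of |phi>:", np.linalg.norm(phi))
```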
In order to perform a gradient-based minimization, the derivative of the energy with respect to each parameter in the ansatz is needed. The analytical derivatives [4] then follow directly as

\frac{\partial E}{\partial\theta_i} = \langle\varphi|\hat{H}\left(\prod_{j=i+1}^{N} e^{\theta_j T_j}\right) T_i \left(\prod_{j=1}^{i} e^{\theta_j T_j}\right)|\psi_0\rangle + \langle\psi_0|\left(\prod_{j=i}^{1} e^{-\theta_j T_j}\right) T_i^{\dagger} \left(\prod_{j=N}^{i+1} e^{-\theta_j T_j}\right)\hat{H}|\varphi\rangle,   (12)

where in the second term we use the convention that products with indices starting at higher values correspond to a reverse-order product in which higher-index terms are on the right, i.e., \prod_{j=3}^{1}\hat{o}_j = \hat{o}_1\hat{o}_2\hat{o}_3. Recognizing that the second term is just the adjoint of the first, Supp. Eq. 12 can be simplified to

\frac{\partial E}{\partial\theta_i} = 2\,\mathrm{Re}\left[\langle\varphi|\hat{H}\left(\prod_{j=i+1}^{N} e^{\theta_j T_j}\right) T_i \left(\prod_{j=1}^{i} e^{\theta_j T_j}\right)|\psi_0\rangle\right].   (13)

In order to obtain a more compact expression for the ith parameter derivative, we partially expand the wavefunction expression in Supp. Eq. 10, grouping the unitaries up to index i to define the notation for a state partially rotated with unitaries from 1 to i:

|\psi_{1,i}\rangle = \prod_{j=1}^{i} e^{\theta_j T_j}|\psi_0\rangle.   (14)

Using this notation and defining |\sigma_i\rangle = \prod_{j=N}^{i+1} e^{-\theta_j T_j}\hat{H}|\varphi\rangle, so that |\sigma_N\rangle = \hat{H}|\varphi\rangle, Supp. Eq. 13 becomes

\frac{\partial E}{\partial\theta_i} = 2\,\mathrm{Re}\,\langle\sigma_i|T_i|\psi_{1,i}\rangle.   (15)

While this analytical gradient formula evaluated N times (once for each parameter) is already an improvement over its numerical counterpart in terms of precision, it can be defined recursively to provide a significant improvement in computational efficiency as well. The gradient algorithm computes each parameter's gradient using precomputed quantities from the previous parameter's gradient evaluation, significantly reducing CPU cost.
We start the algorithm, the base case, at the left-most operator, i = N, where we need to calculate

\frac{\partial E}{\partial\theta_N} = 2\,\mathrm{Re}\,\langle\sigma_N|T_N|\psi_{1,N}\rangle = 2\,\mathrm{Re}\,\langle\varphi|\hat{H} T_N|\varphi\rangle,

since |\psi_{1,N}\rangle = |\varphi\rangle. At the second step, we need to compute

\frac{\partial E}{\partial\theta_{N-1}} = 2\,\mathrm{Re}\,\langle\sigma_{N-1}|T_{N-1}|\psi_{1,N-1}\rangle,

but this is facilitated by the fact that the new states |σ_{N−1}⟩ and |ψ_{1,N−1}⟩ needed at this stage can be obtained simply by applying a single matrix exponential to the states from the first step:

|\sigma_{N-1}\rangle = e^{-\theta_N T_N}|\sigma_N\rangle, \qquad |\psi_{1,N-1}\rangle = e^{-\theta_N T_N}|\psi_{1,N}\rangle.

This continues to be the case at all later steps, where the states needed at each step can be similarly obtained from those at the previous step:

|\sigma_{i-1}\rangle = e^{-\theta_i T_i}|\sigma_i\rangle, \qquad |\psi_{1,i-1}\rangle = e^{-\theta_i T_i}|\psi_{1,i}\rangle.

The benefit of this is that instead of applying all of the Trotter operators in the ansatz for each single parameter, each parameter gradient requires only two expm_multiply calls and one sparse matrix-vector multiplication. Stacking |σ_i⟩ and |ψ_{1,i}⟩ into a two-column matrix and computing the exponential action on that matrix offers a further numerical speedup in practice. As a consequence, obtaining the gradient with respect to all parameters is only roughly twice the cost of a single energy evaluation. Assuming the computational cost of other operations to be negligible, this recursive algorithm reduces the cost of computing a gradient from N^2 to 3N exponential applications. In the case of water in the STO-3G basis (140 operators), this represents a nearly 47-fold improvement. For comparison, Table I lists the times required for a few VQE iterations of H2O with the STO-3G basis set using both numerical and analytical gradients.
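The recursion can be summarized in a short classical sketch (dense placeholder operators and scipy.linalg.expm instead of expm_multiply, purely for brevity), together with a finite-difference check of one gradient component:

```python
# Toy sketch of the recursive gradient sweep (illustrative dense
# placeholders): start from sigma_N = H|phi> and psi_{1,N} = |phi>, then
# peel one exponential off both states per parameter, so each gradient
# component costs only two exponential applications.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n = 8
H = rng.normal(size=(n, n))
H = H + H.T                                  # toy Hermitian "Hamiltonian"
psi0 = np.zeros(n)
psi0[0] = 1.0
Ts = [M - M.T for M in (rng.normal(size=(n, n)) for _ in range(4))]
thetas = np.array([0.2, -0.3, 0.1, 0.4])
N = len(Ts)

def prepare(th):
    psi = psi0                               # |phi> = e^{th_N T_N}...e^{th_1 T_1}|psi_0>
    for t, T in zip(th, Ts):
        psi = expm(t * T) @ psi
    return psi

def energy(th):
    psi = prepare(th)
    return psi @ H @ psi

phi = prepare(thetas)
grad = np.zeros(N)
sigma, psi1 = H @ phi, phi                   # base case: sigma_N and psi_{1,N}
for i in range(N - 1, -1, -1):               # i = N, ..., 1 in the paper's indexing
    grad[i] = 2.0 * sigma @ (Ts[i] @ psi1)   # 2 Re <sigma_i|T_i|psi_{1,i}> (real case)
    U = expm(-thetas[i] * Ts[i])             # peel off e^{theta_i T_i}
    sigma, psi1 = U @ sigma, U @ psi1

# central finite-difference check of one component
eps = 1e-6
dp, dm = thetas.copy(), thetas.copy()
dp[2] += eps
dm[2] -= eps
fd = (energy(dp) - energy(dm)) / (2 * eps)
print("analytic:", round(grad[2], 6), "finite-difference:", round(fd, 6))
```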

E. Energy discontinuities
As an adaptive algorithm, ADAPT-VQE self-consistently determines a quasi-optimal ansatz to describe the ground state of a particular Hamiltonian. However, as the nuclei move, a new Hamiltonian is obtained, so any two distinct points on a potential energy surface will generally be treated with different ansätze. Not only will the two points generally have different numbers of operators, but the order and composition of the ansatz will also generally differ. Consequently, the common strategy in classical electronic structure theory of relying on the cancellation of systematic errors is not expected to work, and one must therefore converge the ansatz growth tightly enough for the desired application.
As discussed in the text, two notable examples of this discontinuous behavior occur for H6. In order to better illustrate how these discontinuities arise, we plot the norm of the gradient vector as a function of the iteration (i.e., the number of parameters). A separate convergence plot is shown for each of the molecules LiH, BeH2, and H6 in Fig. 1.
While LiH and BeH2 both exhibit rapid convergence, the strongly correlated H6 behaves distinctly differently. For short bond lengths (lighter-colored lines), convergence behavior similar to that of the previous molecules is observed. In contrast, for longer bond lengths, a local minimum in the gradient norm begins to appear, requiring an increased number of parameters for tight convergence. This does not necessarily mean that the ansatz grown by ADAPT-VQE is not compact. H6 is intrinsically more complicated, especially starting from a HF reference, and so it is expected to require more operators (and parameters). However, the consequence of this local minimum is that the algorithm may sometimes signal convergence prematurely, thus missing the extra operators that are needed to make the ansatz sufficiently accurate. In Fig. 1(c), the difference between R = 2.4 and 2.5 Å involves a significant reduction in the number of parameters for ADAPT(2), because the 2.5 Å curve has a local minimum which crosses the convergence threshold before actual convergence is achieved. The local minimum at 2.4 Å is less deep and does not cross the threshold; as such, the algorithm must climb out of the local minimum to find the "true" convergence point, where the gradient finally drops below the threshold with a larger number of parameters. The consequence is the significant drop in the number of parameters (and accuracy) in going from 2.4 to 2.5 Å. Future work will focus on analyzing difficult cases such as these to determine better convergence criteria that are less susceptible to premature convergence at local minima. In Fig. 1, each line corresponds to a distinct bond distance, ranging from light to dark as the distance increases. For both LiH (R_eq = 2.39 Å) and BeH2 (R_eq = 1.342 Å), the bond distance ranges from 0.5 R_eq to 2.5 R_eq. For H6, the bond distance ranges from 0.5 Å to 2.5 Å.