Optimizing a polynomial function on a quantum processor

The gradient descent method is central to numerical optimization and is the key ingredient in many machine learning algorithms. It promises to find a local minimum of a function by iteratively moving along the direction of the steepest descent. Since for high-dimensional problems the required computational resources can be prohibitive, it is desirable to investigate quantum versions of the gradient descent, such as the recently proposed (Rebentrost et al.1). Here, we develop this protocol and implement it on a quantum processor with limited resources. A prototypical experiment is shown with a four-qubit nuclear magnetic resonance quantum processor, which demonstrates the iterative optimization process. Experimentally, the final point converged to the local minimum with a fidelity >94%, quantified via full-state tomography. Moreover, our method can be employed to a multidimensional scaling problem, showing the potential to outperform its classical counterparts. Considering the ongoing efforts in quantum information and data science, our work may provide a faster approach to solving high-dimensional optimization problems and a subroutine for future practical quantum computers.


INTRODUCTION
A basic situation in optimization is the minimization or maximization of a polynomial subject to some constraints. As polynomials are in general non-convex and optimization is NP-hard, these problems cannot be solved accurately with efficient resource consumption 2 . As a special case for approximation algorithms, homogeneous polynomial optimization has wide applications, for examples, signal processing, magnetic resonance imaging 3 , data training 4 , approximation theory 5 , and material science 6 . These scientific and technological problems are especially demanding in the present-day era of big data 7 . The gradient algorithm, serving as one of the most fundamental solutions to non-convex optimization problems, lies at the heart of many machine learning methods, such as regression, support vector machines, and deep neural networks [8][9][10][11][12] . However, when dealing with large data sets, the gradient algorithm consumes tremendous resources and often pushes the current computational resources to their limits.
Quantum computing promises ultrafast computational capabilities by information processing via the laws of quantum mechanics [13][14][15] . With the intrinsic advantages in executing certain matrix multiplication operations, quantum algorithms are proposed to enhance data analysis techniques under some circumstances. For example, phase estimation, quantum principal component analysis, and the solver for linear system of equations can provide quantum advantages if the state preparation and readout procedure can be efficiently realized 13,16 . As for the optimization with gradients, which is the central issue in this article, several works focusing on developing quantum versions 1,17-21 have been done.
Optimization, i.e., maximization or minimization, of a cost function can be attempted by the prototypical gradient algorithm iteratively. Let the cost function be a map f : R N ! R. Set an initial guess x ð0Þ 2 R N , then move it along the direction of the gradient x ðtþ1Þ ¼ x ðtÞ ± η ∇f ðx ðtÞ Þ; (1) where x ¼ ðx 1 ; ; x N Þ T 2 R N and η is the learning rate. As the first experimental endeavor in this field and for the wide academic and industrial applications, in this work, an order-2p homogeneous polynomial optimization with the spherical constraints P i jjx 2 i jj ¼ 1 is investigated, whose cost function is expressed as The coefficients a i1 i2p 2 R can be reshaped to a N p × N p matrix A. And f(x) can be rewritten as 1 2 x T ::: x T Ax ::: x. Simultaneously, A is the linear summation of tensor product of N × N unitary matrix A α i , which represents as P K α¼1 A α 1 ::: A α p . K is the number of decomposition terms required to specify A and p is half the order of the cost function. Therefore, in light of the previous work, the gradient at x can be mapped into a matrix summation 1 With the amplitude encoding method, which encodes x as (1) is interpreted as an 1 evolution of x j i with an operation of D, where D is a parameter-dependent gradient operator, which in general is non-unitary. Note that two methods to decompose A are shown in Supplementary Note I(D) (theory and experiment) and the subscripts (t) will be omitted in the remainder.
In this work, we develop the gradient algorithm and propose an experimental protocol to perform the gradient descent iterations, with a prototypical experiment to demonstrate the process to optimize polynomials on a quantum simulator. The gradient algorithm in ref. 1 involves phase estimation, which requires substantial circuit depth for currently available circuits, giving logarithmic error and polynomial gate scaling. Hence, the algorithm is difficult to implement with current techniques on a quantum platform. Instead of phase estimation, our method uses the linear combination of unitaries to realize the gradient descent iterations. It provides a gate-based circuit only comprising of standard quantum gates, which is experimental friendly and implementable in current quantum techniques. Our protocol needs two copies of a quantum state x j i to produce the next quantum state at each iterative step, instead of multiple copies which is linearly depending on the order of objective function in the previous algorithm 1 . The product of decomposition terms K and the order p is an important indicator to determine the efficiency of our protocol. The protocol can be especially beneficial in cases when there is an explicit decomposition of A with comparably small Kp and A α i is Pauli product matrix, calculating the gradient with the OðKp log ðNÞÞ depth circuits. Moreover, the experiment benefits from the protocol as only two copies are required for optimization of each iteration. Therefore, given the unrivaled degree of nuclear magnetic resonance (NMR) quantum control techniques 22,23 , this homogeneous polynomial optimization is conducted with a four-qubit system encoded in a molecule of crotonic acid and a quantum state in the vicinity of the local minimum is iteratively obtained with high fidelity. Finally, multidimensional scaling (MDS) problems are introduced as a potential application of this protocol.

Experimental protocol
For the convenience to implement the evolution of the gradient operation, D (shown in Eq. (4)) is rewritten as where Fig. 1, to implement the quantum gradient algorithm, two specified circuits are involved. Parameters such as c m (m = 0...Kp − 1) can be obtained by the parameter circuits in Fig.  1a. This circuit evolves the system The iteration circuit, which is shown in Fig. 1b, is to generate the iterative state with D. The ancillary system s is the first ancillary system which is for the linear combination of two terms in Eq. (4), and d is second one which is for the implementing D with linear combination of operators. In our protocol, the first thing is to involve the minus signs in c m in the unitary operator A m to make c m positive. After dealing with the minus signs in Eq. (4), the iterative equation is rewritten as where x 0 j i is the state for the next step. The entire iteration circuit to x 0 j i can be implemented via following four steps: Step 1: for the work register, the amplitude encoding state x j i should be efficiently prepared. In general, for the first iteration, some easy access states can be our initial states, such as tensor product states. For the following iterations, the output of the last iteration can generate what we want. Hence in this situation, time complexity can be ignored. In the case of preparing a particular log ðNÞ-qubit input x j i ¼ P k c k k j i, we employ the amplitude encoding method in ref. 24,25 . It shows that if c k and P k = ∑ k |c k | 2 can be efficiently calculated by a classical algorithm, constructing this particular state takes O½poly log ðNÞ ð steps. Alternatively, we can resort to quantum random (RAM) access memory [26][27][28] or Hamiltonian simulation method 29 . Quantum RAM (qRAM) is an efficient method to do state preparation, whose complexity is Oðlog ðNÞÞ after the quantum memory cell was established.
As for the entire system, with the ancillary register s, d being in a specific superposition state by V 0 and controlled-V, it is driven into Fig. 1 Circuits for implementing the quantum gradient algorithm.
x j i denotes the input state of the work system, and ancillary systems are T 1 + 1 qubits in the 0 j i 0 j i T1 state, where T 1 ¼ log 2 ðKpÞ. The squares represent unitary operations and the circles represent the state of the controlling qubit. a Parameter circuit. b α j can be obtained with 〈σ x 〉 on register s, when register d is on Iteration circuit includes three steps: initialization, D application, and combination.
Remarkably, c i should be rescaled with c i = ffiffiffiffiffiffiffiffiffiffiffi β À 1 p to the unitary condition. While all other elements {V 0,1 , V 0,2 , ⋯, V Kp−1,Kp−1 } are arbitrary as long as V is unitary.
Step 2: to apply the gradient operation, D, on the system, the methods of linear combination of the unitary operations are employed [30][31][32][33][34][35][36] . A 0 , A 1 ...A Kp , tensor decompositions of A, are applied to the work system conditionally on the register d which is on 0 j i, 1 j i... Kp À 1 j i , correspondingly. In this way, the work system would feel an effective operation as P Kp i¼0 A i when registers s, d are delicately decoupled. However, A 0 would be applied to the work system in both 0 j i s 0 j i d and 1 j i s 0 j i d subspaces. Thus, an additional A y 0 is required for the compensation and the final state is Step 3: combination is implemented to combine the information in different subspaces of the ancillary system and generate the formalized D on the work space. Controlled-W and W 0 , which are the inverse operations of V and V 0 , are applied in this step, which produces The last two terms are orthogonal to the first term, and can be regarded as rubbish terms. When the ancillary system is in state (7)) is obtained from the work system.
Finally, the result state x 0 j i could either be the answer state, which satisfies the previous setup convergence condition or be the next input for both parameter and iteration circuits. More details on the protocol can be found in Supplementary Note I(A) and (B) (theory and experiment).
Algorithm complexity is concerned with the computing resources required to process a quantum information task. Particularly, the gate complexity quantifies the amount of the basic quantum operations taken to run as a function of size of input. Similarly, the memory complexity quantifies the amount of space or memory taken.
Kp is an important indicator to characterize the complexity of the protocol, where K is the number of decomposition terms required to specify A and p is half the order of the cost function. It determines both the size of the ancillary system and the depth of both parameter and iteration circuits.
For the size of the circuits which does not include state preparation, OðT 1 þ log ðNÞÞ qubits are required for both parameter and iteration circuits, where T 1 is the integer that satisfies 2 T 1 ¼ Kp. And two copies of the iterative state are needed for each iteration. If the number of iterations is r, the total memory consumption is Oð2 r log ðNÞ þ 2T 1 Þ.
For the depth of the circuits which already has the encoded states, OðKpÞ conditional-A m s are required. In addition, the gates complexity is provided under the assumption of Pauli product form of A m . As a log ðKpÞ-qubit-controlled gate can be implemented with Oðlog ðKpÞ 2 log ðNÞÞ basic quantum gates, which is included in Supplementary Note I(C) (theory and experiment) 37 , OðKp log ðKpÞ 2 log ðNÞ rÞ basic quantum gates are required for both circuits for r iterations.
As for the the state preparation step, the amplitude encoding method would consume Oðlog ðNÞÞ more qubit with Oðpoly log ðNÞÞ steps. If the qRAM is adopted in the state preparation, the spatial cost is not just Oðlog ðNÞÞ qubits, one also needs OðNÞ qutrits to establish quantum memory cell.
The protocol relies on the tensor decomposition of A, which is in general hard, especially as Kp grows. This protocol is theoretically efficient when there is an explicit decomposition of A with a limited Kp. However, there are some benefits when adopting this experimental protocol. The experiments are comparably easier since only two copies are required for each iteration optimization.
Success probability, for the parameter circuit, the probability of obtaining the required b α j is related to the size of the second ancillary resister, T 1 , which is proportional to 1/Kp. For the iteration circuit, the ancillary register finally stay on 0 j i s 0 j i T1 d and the output is determined to be the iterative state with the probability P s ¼k x ðtÞ ± D x ðtÞ k 2 =ð P KpÀ1 m¼0 c m þ 1Þ 2 .

Apparatus
All experiments were carried out on a Bruker DRX 400MHz NMR spectrometer at room temperature. As it is shown in Fig. 2a, a four-qubit system is required, represented by the liquid 13 Clabeled crotonic acid sample dissolved in d-chloroform. Four carbon-13 nuclei spins ( 13 C) are denoted as four qubits, C 1 as the register s, C 2,3 as the register d, and C 4 being the work system. The free evolution of this four-qubit system is dominated by the internal Hamiltonian, where ν j and J jk are the resonance frequency of the jth spin, and the J-coupling strength between spins j and k, respectively. Values of all parameters can be found in the the experimental Hamiltonian of Supplementary Note II(A) (theory and experiment). In order to master the evolution of the system, the transverse radio frequency (r.f.) pulses are introduced as the control field, ðcosðω r:f: t þ ϕÞσ i x þ sinðω r:f: t þ ϕÞσ i y Þ: By tuning the parameters in r.f. field (Eq. (14)), such as intensity ω 1 , phase ϕ, and frequency ω r.f. and duration, the four-qubit universal quantum gates are theoretically achievable with the combination of internal system (Eq. (13)) 38,39 .

Fig. 2
Molecule and quantum circuit. a Molecule structure of four-qubit sample : crotonic acid. b Quantum circuit for an iteration to realize gradient descent algorithm. x j i denotes the initial state of work system, and ancillary system are T 1 + 1 qubits in the 0 j i 0 j i T1 state, where T 1 = 2.

Experimental implementation
A bivariate quartic polynomial (Eq. (15)), serving as the cost function, is shown to be minimized by our experimental protocol iteratively. The problem is depicted as with |x| = 1, where x ¼ ðx 1 ; x 2 Þ T is a 2-d real vector. Though the number of independent variable is 1 for the normalization constrain, as the growth of the size of problem, a surge of information processing would be included. A, the coefficient matrix, has another representation by tensor products A = −σ I ⊗ σ x + σ x ⊗ σ z , where σ i (i = I, x, y, z) denotes the Pauli matrices.
With the amplitude encoding method(x ! x j i), the experimental demonstration for updating x 0 j i consists of both acquiring parameters and proceeding iterations. The iteration circuit, from the 0 j i s 00 j i d x j i to the output 0 j i s 00 j i d x 0 j i, is implemented with three steps, to ψ 1 j i, ψ 2 j i, ψ 3 j i and measurement, sequentially. As for acquiring parameters, for the hermitian of A α j , c m can be obtained as a measurement of A α j on x j i x h j, instead of the parameter circuits. For accuracy and limitation of the molecule sample, this conversion is adopted and we concentrate on the iteration part, where T 1 = 2, Kp = 4.
In the experiment, two sets of experiment s 1 and s 2 are conducted, with different initial guess, x s1 0 (−0.38, 0.92) and x s2 0 (0.86, 0.50). The realization of the iteration circuit is depicted as follows: Initialization-at room temperature, the four-qubit quantum system is in the thermal equilibrium state. This thermal equilibrium system can be driven to a pseudo-pure state (PPS) with spatial average method 40 . Then, 0 j i s 00 j i d x j i was prepared from this PPS, with simply a single-qubit rotation on C 4 , where x is either initial guess x s1 0 , x s2 0 or the output of last iteration. In this step, preparation of the PPS and individual control operations are the mature technology in NMR quantum control and can be found in Supplementary Note II(B) (theory and experiment).
Iteration circuit-the circuit consists of three steps and thus we pack our control pulses into three groups. They are shown in Fig.  2b: (1) a combination of single-qubit rotation V 0 and control-V gate realizes the transformation to ψ 1 j i. (2) Conditional operations of decompositions of A implement the ψ 2 j i. (3) W 0 and control-W achieve the disentanglement to ψ 3 j i. Remarkably, parameters c m in local operations, such as V and W are obtained by measuring the iterative state x j i. Gradient ascent pulse engineering was employed to generate three packages of optimized pulses to implement the operations listed above, with the simulated fidelity all over 99.9% and the time-length being 20, 30, and 20 ms, respectively 41 . Hence, in experiment, we got ρ 1 , ρ 2 , and ρ 3 correspondingly.
Measurement-since only the state in subspace of 0 j i s 00 j i d is necessary for obtaining the output, x 0 j i, a full tomography in such subspace was employed. All readout pulses are 0.9 ms with 99.8% simulated fidelity. For the sake of experimental errors, mixed states were led in our results, however, the two-dimension vector x 0 j i should be a pure real state. Hence, a purification step was added to search a closest pure state after this measurement and it is realized with the method of maximum likelihood 42 . As the consequence of the output x 0 j i, ϕ s1 i , and ϕ s2 i (i = 1...4) were found to be the closest to our experimental density matrices for two different cases s 1 and s 2 .
According to preset threshold, the output x 0 j i can be labeled as the updated input x j i to run the next iterative circuit, or be the final result and the iteration thus terminates.
The results are shown in Fig. 3. For the cases s 1 and s 2 , we could see the trend of convergence at x opt (0.50, 0.86) after four times iterations with the initial points, x s1 0 and x s2 0 . ϕ s1 i and ϕ s2 i (i = 1...4) are outputs of iteration circuit at ith iteration, which are plotted in the sub-figures (a) and (b). For comparison, theoretical simulation is provided, whose inputs were chosen as the output of the last experimental iteration.
In addition, by substituting x 1 = cosðθÞ and x 2 ¼ sinðθÞ, the cost function is rewritten as f ðxÞ ¼ À2sin 3 θ cos θ. Thus the problem is reduced to a one-dimensional unconstrained optimization problem, where the extreme points lie at θ = 0, π/3. Among them, θ = 0 is unstable, while π/3 is the stable local minimum. To show this results explicitly, both iteration outputs and the value of the cost function are shown in Fig. 3c. In this situation, the initial guesses are cosðθÞ ¼ À0:38 (s 1 and colored red) and cosðθÞ ¼ À0:86 (s 2 and colored blue), respectively. As with the growth of the number of the iteration, the value of the cost function gets lower and lower, until slipping into the neighbor of the local minimum.
As another aspect to show this convergence, in Fig. 3d, relations between the number of iterations and overlaps were given. The value of vertical axis was defined as the overlaps between the optimal state and the output state after each iteration: jhϕ opt jϕ j i ij (i = 0, 1...4. j = s 1 , s 2 ). The horizontal axis is the number of iterations. It shows that the overlaps converges to 1 weather the initial guess is chosen as x s1 0 or x s2 0 . For more information of the different seeds and investigation of unstable point, numerical simulations were carried out and some results are shown in Supplementary Note III (theory and experiment).
Furthermore, to check the performance of the circuits experimentally implemented, a four-qubit tomography was implemented at two points, after PPS preparation and after the iteration circuit. Thus four-qubit states ρ pps and ρ 3 were obtained. For the PPS, we got a fidelity~99.01%, and for those four-qubit states ρ 3 , they have an average of 94% fidelity. Detailed information is shown in the experimental part, Supplementary Note II(C) and (D) (theory and experiment).

Application
For the further applications, MDS is a technique, providing a visual representation of the pattern of proximities in a dataset. It is a common method of statistical analysis in sociology, quantitative psychology, marketing, and so on. We apply our method to quantize an algorithm for fitting the simplest of MDS models in major applications in the method.
Given a matrix A = δ ij , which is nonnegative symmetric with zero diagonal. A set of number δ ij is the data collected in a classical MDS problem, and δ ij is the dissimilarity between objects i and j. Representing n objects as n points via ignoring the objects size, the dissimilarity of objects i and j is approximately equal to the distance between points i and j. The goal is to find n points in m dimensions, denoted by x 1 , x 2 , ⋅, x n to form a configuration with coordinates in an n × m matrix X.
When m = 3, it is reduced to a molecular conformation problem 43 , which plays an important role in chemical and biological fields. Let d ij (X) denotes the Euclidean distances between the points x i and x j . It follows that We minimize the loss function, defined as where W = w ij is a symmetric weight matrix that can be used to code various Supplementary Information. The purpose of this algorithm is to find the most suitable information visualization configuration. Now, we map it to a quantum version. First, the loss function is rewritten as 44 K. Li et al. where and Thus, we only need to minimize f 0 ðXÞ ¼ À2gðXÞ þ h 2 ðXÞ. g(X) and h 2 (X) can be further expressed as a trace of some matrixes muiltiproduction. We have gðXÞ ¼ TrðX T BðXÞXÞ with B(X) = 1/ 2∑ i ∑ j w ij A ij k ij (X), where Similarly, hðXÞ 2 ¼ TrðX T CðXÞXÞ with C(X) = 1/2∑ i ∑ j w ij A ij . Then, we have where D(X) = C(X) − 2B(X). It should be noticed that here X is a n × m matrix. In order to represent X as quantum states, we map it to a sum of m column vectors X v of X. Now, we can apply our quantum gradient algorithm to minimize the objective function In this special case, the function order p = 1 and D(x) is a symmetric matrix which is likely to be decomposed efficiently. It potentially yields an exponential speedup over the classical algorithm in MDS problems. This protocol also provides potential applications in quantum control technology. For example, the cost function could be reduced to a quadratic optimization problem in the form of f ðxÞ ¼ x h jA x j i. If the coefficient matrix A is restricted to a density matrix, the objective function represents the overlap between A and x j i x h j. Thus, we can product a state x j i closely enough to a density matrix A by finding the maximum of f(x). It can be used as a quantum method to prepare the specific state. (i = 1...4). Green triangles are theoretical simulation results, while red squares are experimental measured outputs. They both begin with a same initial point. In addition, the moving directions are also labeled by the dashed arrows colored green or red. c The 1-d depiction. Beginning with two initial points, for s 1 (colored red) and s 2 (colored blue), the iteration outputs become lower and lower, until slipping into the neighbor of the local minimum. The dashed arrows show the moving direction for each iteration and in the zoom-in figure, it shows they gradually converging to the optimal minimum point were x 1 (or cos θ) = 0.5. d The relations between the number of iterations and the overlaps between the iterative states and the target.

DISCUSSION
In this article, an experiment-friendly protocol is proposed to implement the gradient algorithm. The protocol provides a quantum circuit only comprised of standard quantum gates, hence it can in principle be realized in current technologies. The experimental implementation required only two copies of quantum states for the parameter and iteration circuits. Moreover, if there is an explicit decomposition of A in terms of Pauli product matrix, OðKpÞ log ðNÞ depth circuits (OðKp log ðKpÞ 2 log ðNÞÞ basic quantum gates) are enough to calculate the gradient within with OðT 1 þ log ðNÞÞ qubits.
With a four-qubit NMR quantum system, we demonstrated an optimization of a homogeneous polynomial optimization and iteratively obtained the vicinity of the local minimum. The result is iteratively implemented with the iteration circuit in Fig. 2, while the parameters c m are measured with iterative states instead of the parameter circuit. With the initial guess either x s1 0 or x s2 0 , the demonstration shows the feasibility in near-future quantum devices for this shallow circuit. For the advanced control techniques of spin systems 22,45 , they are applied as the first trail to demonstrate the effectiveness of the more and more protocols. In addition, MDS problems are introduced as a potential application of this experimental protocol.
Polynomials, subject to some constraints, are basic models in the area of optimization. Furthermore, the gradient algorithm is considered as one of the most fundamental solutions to those non-convex optimization problems. Our protocol, which gives another implementation of the gradient algorithm using quantum mechanics, is applicable to homogeneous polynomials optimization with spherical constrains. When there is a simple and explicit decomposition of coefficients matrix, the protocol could provide an speedup with poly-logarithmic operations of the size of problem to calculate the gradient, which has potential to be used in near-future quantum machine learning. Our approach could be exceptionally useful for high-dimensional optimization problems, and the gate-based circuit makes it readily transferable to other systems, such as superconducting circuits and trapped ion quantum system, being an subroutine for future practical largescale quantum computers.

DATA AVAILABILITY
All data for the figure and table are available on request. All other data about experiments are available upon reasonable request.