Enhancing quantum control by bootstrapping a quantum processor of 12 qubits

Accurate and efficient control of quantum systems is one of the central challenges for quantum information processing. Current state-of-the-art experiments rarely go beyond 10 qubits and in most cases demonstrate only limited control. Here we demonstrate control of a 12-qubit system, and show that the system can be employed as a quantum processor to optimize its own control sequence by using measurement-based feedback control (MQFC). The final product is a control sequence for a complex 12-qubit task: preparation of a 12-coherent state. The control sequence is about 10% more accurate than the one generated by the standard (classical) technique, showing that MQFC can correct for unknown imperfections. Apart from demonstrating a high level of control over a relatively large system, our results show that even at the 12-qubit level, a quantum processor can be a useful lab instrument. As an extension of our work, we propose a method for combining the MQFC technique with a twirling protocol, to optimize the control sequence that produces a desired Clifford gate. Realizing high accuracy control of quantum systems represents a crucial ingredient in building large-scaled quantum computers. An international team of researchers led by Raymond Laflamme at Canada’s Institute for Quantum Computing has succeeded in manipulating a 12-qubit nuclear magnetic resonance quantum processor with unprecedented precision. The researchers build a closed-loop pulse auto-tunning setup which employs the controlled system itself to optimize its own pulses. This gives to the benefits of more efficient pulse optimization and more robustness to system uncertainties. Because that the experiment achieves high level of individual controls over all of the qubits, it is at the cutting edge of experimental quantum computing. The experimental techniques are ready to be transferred to other quantum technologies, such as nitrogen-vacancy centers, trapped ions or superconducting circuits.


I. INTRODUCTION
Quantum computers promise to outperform their classical counterparts in many applications [1][2][3][4][5][6].A primary obstacle in building large-scale quantum computers is the inadequacy of classical computers for the task of optimizing the experimental control field [7].Standard classical optimization algorithms are impractical in the long run since they have a running time that grows exponentially with the number of quantum bits (qubits) [8].In theory, a complex quantum circuit can be decomposed into elementary gates that work on a restricted number of qubits (usually one or two) and should be readily implemented in experiment [9].In reality however, the control fields are never localized and the qubits interact and evolve even in the absence of the control fields.Consequently, the implementation of each elementary gate may require a control sequence that takes into account a subsystem involving many more than one or two qubits.Moreover, the number of elementary gates required for a quantum algorithm grows polynomially with the system size and the errors accumulate with each successive gate.Therefore, an effective and efficient way to optimize the control field and minimize errors is a key ingredient for scaling up quantum information processing devices [10].
Here we consider the task of optimizing a control field that will drive the quantum system from a fixed input state ρ i to a desired target state ρ f .This problem is important in quantum information processing, as numerous tasks, such as al-gorithmic cooling in ensemble quantum computing [11,12], magic state preparation in fault-tolerant quantum computing [13] and encoding in quantum key distribution [14], all rely on steering states regardless of the propagator.The gradient ascent pulse engineering (GRAPE) algorithm [15] is the current state-of-the-art algorithm to (classically) optimize the control field in quantum state engineering problems.It is widely used in NMR [1], electron spin resonance [17], nitrogen-vacancy centers in diamond [18,19], superconducting circuits [20,21], and ion traps [22,23].The GRAPE method exploits the gradient of a fidelity function to update the control field iteratively.GRAPE has two major drawbacks that are indeed common to all classical optimization algorithms: its running time is exponential in the size of the n-qubit system, and its accuracy depends on the precision of experimentally obtained parameters describing the quantum system (e.g., the system Hamiltonian).Basically, it is a gradient-based iterative algorithm.At each iteration k, the algorithm computes the evolution of the system under the previous pulse, and produces a final state ρ and a fitness function f = tr(ρρ f ).It then computes the current gradient g for the use of updating the pulse.Classically, the computation involves the matrix exponential and multiplication in the 2 n -dimensional Hilbert space and hence takes an exponential (in the number of qubits n) amount of time.For instance, a cluster of 128 AMD Opteron 850 CPU (2.4 GHz) can only handle a problem size of about ten qubits using GRAPE [8].
Recently, Li et al. [24] and later Rebentrost et al. [25] showed that a quantum processor can be used to calculate f and g efficiently.A technique called measurement-based quantum feedback control (MQFC) enables direct measurement of f and g (see Fig. 1), allowing the quantum processor arXiv:1701.01198v2[quant-ph] 26 Oct 2017 to optimize its own pulses.MQFC addresses both the issues of scalability and control inaccuracies due to imperfect system characterization [26,27].Moreover, this technique is transferrable to any implementation in which control fields steer the system evolution and measurement in a standard basis is possible.In this work, we implement MQFC on a 12-qubit NMR quantum processor, and in particular demonstrate for the first time that MQFC enhances the control precision by about 10% due to its self-feedback property.Furthermore, by creating the 12-coherent state we demonstrate the capability of our quantum processor to function as a universal 12-qubit quantum processor with high-fidelity individual controls.This is also one of the largest quantum processors with individualcontrol to date.

II. RESULTS
In this paper, we refer to unnormalized deviation density matrices (without the identity term) as 'states', which is a standard convention in ensemble quantum computing.To distinguish from the Hamiltonian, we use capital X, Y, and Z to denote states and σ x , σ y and σ z to denote Hamiltonians, while they both refer to the same set of Pauli matrices.
Quantum processor.-In our NMR quantum processor, the liquid-state sample is per- 13 C labeled (1S,4S,5S)-7,7dichloro-6-oxo-2-thiabicyclo[3.2.0]heptane-4-carboxylic acid dissolved in d6-acetone, which forms a 12-qubit register.The 12 qubits are denoted by nuclear spins C 1 to C 7 ( 13 C-labeled) as qubits 1 to 7, and H 1 to H 5 as qubits 8 to 12 in the molecule shown by Fig. 2a.When placed in a static z-magnetic field, it has a system Hamiltonian where ν i 0 is the Larmor frequency of the ith qubit, J ij is the coupling between qubits i and j, and σ i z is the Pauli-z operator of the ith qubit.The values of these parameters can be found in Appendix C [28].
The control Hamiltonian is due to the transverse control field applied in the x-y plane, which is often digitized into M slices with slice length ∆t.In each slice, there are four constant control parameters, leading to a control Hamiltonian in the form of where, for example, B C x [m] means the x-component of the mth slice of control field in the 13 C channel.
The dynamics of the NMR system is governed by H s and H c simultaneously, with the propagator where The essence of NMR quantum information processing is to optimize a control field, i.e. find a sequence of B x,y [m], such that one can precisely realize a quantum gate or drive the system to a target state according to Eq. ( 3).Fundamentals of the GRAPE algorithm.-To implement a particular target gate or state we need to find an optimal B x,y [m].One of the most prominent optimization algorithms to date is the GRAPE algorithm [15] which was developed for the design of optimal control pulses in NMR spectroscopy.Here, we explain the basic principle of GRAPE by considering the problem of state engineering in the absence of relaxation.
Suppose the initial state of the spin system is ρ i , and the target output state is ρ f .After applying a M -slice trial control pulse, the system will evolve to The fitness function defined as f = tr(ρ f ρ) serves as a metric for the control fidelity, with the form Obviously, f is a function of 2M variables, and to find its optimium we calculate the gradient function to the first order where σ k x,y , U m 1 (ρ i ) is the commutator between σ k x,y and U m 1 (ρ i ).We may increase the fitness function f by using the gradient iteration rule where is a suitably chosen step size.
The GRAPE algorithm proceeds as follows on a classical computer: 1. start from an initial guess control B x,y [m]; 2. calculate ρ according to Eq. ( 5); 3. evaluate fitness function f = tr(ρ f ρ); 4. if f does not reach our preset value, evaluate gradient function g according to Eq. (7); 5. update control variables according to Eq. ( 8), then go to step 2.
MQFC optimization.-The GRAPE algorithm requires the calculation of U M 1 , i.e., the dynamics of the system.This step is inefficient on a classical computer when the size of the system is large.In contrast, the scheme of MQFC optimization provides an alternative way which enables direct measurement of f and g in the experimental manner, or explicitly, via the quantum evolution and measurement of the quantum processor.
Without loss of generality, let us discuss the scenario of ensemble quantum computing.e.g., NMR quantum computing, where the state is usually written as a traceless deviation density matrix and a single-shot measurement is sufficient to get the expected value of an observable.For other systems that use the computational basis or projective measurement, the following procedure needs to be slightly modified and more repetitions may be required to get the estimate of f and g.
Measuring f is straightforward.For an n-qubit system, the total number of elements in the Pauli basis is 4 n − 1 (without the identity term).If the target state ρ f has some decomposition, say, ρ f = G γ=1 x γ P γ with respect to the Pauli basis, then the fitness function is Here, 1 ≤ G ≤ 4 n denotes the number of nonzero components, P γ is the γ-th element of the Pauli basis, and x γ is its corresponding coefficient.Therefore, G experiments are required to estimate f .In the γ-th experiment, we just need to apply the control field to the initial state ρ i and measure the expectation value P γ of ρ.For a generic ρ f that contains all G = 4 n − 1 Pauli terms, measuring f in experiment is equivalent to carrying out full state tomography, and is thus inefficient.However, many tasks require the creation of a simple target state where G is quite small.For instance, if we aim to prepare the 12-coherent state ρ f = Z ⊗12 , one measurement is sufficient to obtain f .
Measuring g requires us to realize the commutator [σ k x,y , •] inside Eq. (7).In fact [24], in which R k x,y and R k x,y mean a π/2 rotation and −π/2 about x or y axis on the k-th qubit, repsectively.By substituting Eq. (10) into Eq.( 7), we get The terms on the right-hand side are very similar to the measurement of f in Eq. ( 6), and the only difference is the local ±π/2 pulse inserted between slices m and m + 1. Explicitly, the m-th component of g x,y is a weighted sum of 4nG measurement quantities, where 4 comes from the ±π/2 pulses about the x and y axes, n from the sum over all the qubits, and G from the measurement of f .In each experiment, compared to the way of measuring f , we just need to insert a local π/2 pulse after the m-th slice evolution.Provided that all the qubits are well individually addressed, high fidelities are attainable in implementing these local π/2 rotations.In summary, we need 4nGM experiments in total to perform the gradient measurement, which is linear in the number of qubits.
Experimental MQFC optimization.-Now we turn to the experiment where the MQFC optimization is used to create the 12-coherent state in the 12-qubit quantum processor.First, let us clarify that all other pulses except the MQFC pulse throughout our experiments are local rotations, which are generated from a subsystem-based gradient ascent pulse engineering (SSGRAPE) approach [1].It is a technical improvement of the original GRAPE for our particular implementation, but does not address its poor scalability issue (see Appendix D [28]).What makes the MQFC scheme remarkable is that, it does not involve the computationally expensive classical simulation of the 2 12 -dimensional quantum dynamics in the course of optimization.
For our optimization task, GRAPE is a powerful tool, but handling 12 qubits is near the limit of capability for a typical laptop computer.In contrast, MQFC is capable of overcoming this difficulty in certain cases.Taking our experiment as an example, MQFC is able to solve the problem of finding a control field that evolves single-coherence ZI ⊗11 into 12coherence Z ⊗12 in a time that scales linearly with the number of qubits.The entire experimental procedure is depicted in Fig. 2c, with a step-by-step description in Appendix E [28].
First, we prepare 7-coherence Z ⊗7 I ⊗5 on the seven 13 C spins, using the sequence in Fig. 2c before the MQFC optimization box.This procedure, benchmarked in our previous work [3], is mainly done with the aid of SSGRAPE.Subsequently, we create Z ⊗12 via MQFC on the quantum processor, which is the main focus of this work.We attempt to optimize a control field, namely a shaped radio frequency (r.f.) pulse, to evolve the system from the input ρ i = Z ⊗7 I ⊗5 to the output ρ f = Z ⊗12 .Our control field, as shown in the MQFC optimization box, is comprised of three sub-pulses to realize local rotations, and two free evolutions to let 13 C qubits interact with 1 H qubits for the purpose of generating higher coherence.The whole control field is digitized into M = 278 slices with ∆t = 20 µs width, while 110 slices are for three sub-pulses and 168 slices remain zero to realize the two 1.68 ms free evolutions (Appendix E [28]).The total dynamics of the pulse is given by U M 1 in Eq. (3).The fitness function is defined as f = tr(ρ f ρ), a metric for the control fidelity, where ρ = U M 1 (ρ i ) is the experimental state and ρ f = Z ⊗12 is the target.In our experiment, only one measurement of the expectation value of Z ⊗12 suffices to attain f after each iteration.If f does not hit our preset value with the current control field, we navigate the control field along its gradient g.In fact, to measure g x [m] (the same for g y [m]) which is the gradient of slice m, we just need three steps: insert a local ±π/2 pulse on every qubit about x-axis between slice m and m + 1; apply this new control field to the initial state ρ i and measure f (see Fig. 2b); compute g x [m] by directly combining these ±π/2-inserted results via Eq.(11).As long as accurate local ±π/2 pulses are available for each qubit, g can be measured on a quantum processor.In experiment, we have designed a 1 ms π/2 pulse on every 13 C nucleus with the simulated fidelity over 99.7% (Appendix D [28]).Having the gradient, we can update the control field and continue the MQFC procedure until a desired f is attained.
Direct observation of 12-coherence.-After the prepara-tion of the 12-coherent state, the next step is to observe it.In NMR spectroscopy, multiple coherence is hard to be observed directly in a one-dimensional spectrum, i.e., by flipping the target spin to the x-y plane while others remain in Z.If all coupling between the target spin and other spins can be resolved, such observation is feasible.For example, in a twoqubit system, we can flip spin one to X to observe ZZ.In fact, XZ can be written as The first term X ⊗ |0 0| leads to a positive peak at ν 1 − J 12 /2 in the spectrum, as the J-coupling term shifts the frequency of qubit 1 by −J 12 /2.Analogously, the second term X ⊗ |1 1| leads to a negative (due to the minus sign before the term) peak at ν 1 + J 12 /2.Generally, these two peaks can be resolved in the spectrum as long as J is large enough to separate them in frequencies.However, to observe multiple coherence, this requirement is of great challenge, since all Jcouplings between the target spin and other spins should be sufficiently large to prevent the annihilations of positive and negative peaks.As a result, two-dimensional spectra and special techniques are usually employed to observe multiple coherence in conventional NMR spectroscopy.
For the purpose of NMR quantum computing, it is certainly better if one can read out multiple coherence directly in a onedimensional spectrum, as one-dimensional spectrum reflects the state information more intuitively and reduces experimental running time remarkably compared to the two-dimensional spectroscopy.In our 12-qubit processor, although there are a few couplings as small as 0.01 Hz (Appendix C [28]), a direct observation of 12-coherence Z ⊗12 is still available on C 7 .Figure 3a exhibits a strong agreement between experimental observation 12-coherence with merely 32 scans and the simulation, after rescaling the experimental result by 1.21 times to compensate for decoherence.To our best knowledge, our experiment is the first direct observation of multiple coherence beyond ten spins, and provides a valid evidence that our 12qubit processor possesses excellent individual controllability and the potential to be a universal 12-qubit quantum processor.
Readout sequence.-Although the direct observation of 12coherence with 32 scans in Fig. 3a demonstrates our control precision, it is not suitable for the many experimental runs during the optimization since 32 scans leads to a great time cost.One solution is to decouple the five 1 H spins to boost the signal-to-noise ratio (SNR) by 2 5 = 32 times, which exactly compensates for the required scan number.We have designed a readout pulse sequence to realize it as shown in Fig. 4.
The local pulses in the readout sequence are computed by SSGRAPE, and the sequence is implemented before every measurement.The phase correction is a z-rotation to neutralize the unwanted chemical shift rotation during the free evolution.If the state is Z ⊗12 , the five 1 H spins will be evolved to the identity state after the readout sequence, and the decoupling of 1 H leads to the C 7 spectrum as shown in Fig. 3b, which is measured with a single scan.We then use spectrum fitting to obtain the signal's amplitude and phase, and thus the value of Z ⊗12 .This readout sequence induces errors in terms of decoherence and pulse imperfections.For the former one, through our simulation we find that it leads to about 30% signal loss, which is reasonable since multi-coherence is exceptionally vulnerable to decoherence.Therefore, this factor is taken into account for all the measurement results, that is, the measured values are rescaled by about 1.3.With respect to the pulse imperfection, it consists of two parts: the imperfection of the sequence itself, i.e., some approximations when we design this simple readout sequence, and the infidelities in implementing the pulses.In total, 3.5% error arises in simulation.We use this value as the uncertainty of the experimental value of Z ⊗12 , namely, the error bars in Fig. 3c.
Experimental results.-Figure 3b shows the spectrum of ρ after the readout stage for each odd iteration.The peak intensities correspond to the value of f = tr(Z ⊗12 ρ), which clearly shows that MQFC increases f during the optimization.This demonstrates that MQFC is a practical technique for designing control fields in large quantum systems.
Our experiment also exhibits MQFC's ability of correcting unknown experimental errors.To demonstrate this improvement, we implement another group of 12-coherence-creating experiments, where all experimental settings are the same except that the pulse is generated from the classical SSGRAPE method other than the MQFC approach.We then compare these two groups of experiments.Figure 3c illustrates the result of SSGRAPE and MQFC pulses both in simulation and experiment.Focusing on the final result at iteration 9 in Fig. 3d, in experiment SSGRAPE finally creates a 12coherence with f = 0.703 ± 0.034, whereas MQFC pulse creates f = 0.795 ± 0.027.This experimental improvement (nearly 10%) disagrees with simulation, as in simulation MQFC (0.830) is even worse than SSGRAPE (0.931).
Considering that MQFC is a feedback-control process, some incomplete knowledge of the experimental quantum process, such as the nonlinearity of the pulse generator or imprecision of the molecular Hamiltonian, may be inherently corrected during the optimization.Indeed, the experiment clearly suggests that MQFC is advantageous in terms of correcting errors from unknown sources.Furthermore, we simulate the decoherence effect during the procedure, and find that the upper bound of tr(Z ⊗12 ρ) in the presence of dephasing noise is about 0.824 (see Methods).Note that our MQFC result finally reaches 0.795, which is very close to this bound, demonstrating that our control of this 12-qubit processor is close to the theoretical prediction after accounting for decoherence.

III. DISCUSSION
Scalability.-One major concern about control methods is their scalability with the number of qubits n.Our MQFC protocol involves a single experiment to measure f and 4nM experiments to measure g for each iteration, where n is the number of qubits.Assuming each experiment takes τ exp time, the MQFC in total consumes T it = (4nM + 1)τ exp for each iteration.For comparison, one has to deal with massive 2 n × 2 n matrix multiplications and exponentials using GRAPE on a classical computer.The speed-up comes from the fact that MQFC utilizes the evolution of the quantum system instead of computing the system's dynamics when evaluating f and g.
For other potential problems when scaling up the GRAPE technique, MQFC confronts similar difficulties, such as how to effectively represent a generic target state, how to choose a good initial guess, how to determine the pulse parameters before optimization, and how many iterations are needed to reach a satisfactory fidelity.Unfortunately, experimental observation of running time versus number of qubits is not likely in NMR, since changing the number of qubits would usually require a different sample with different characteristics.So we cannot experimentally compare the scaling of MQFC versus GRAPE, instead we must be satisfied with the fact that MQFC performs well at the 12-qubit level and should theoretically scale better than GRAPE under standard assumptions.See Appendix A in [28] for details.
One may also ask if there could be other classical algorithms that scale as well (or better than) MQFC.This question remains open, but it seems very unlikely -the gradient calculation is based on the dynamics as shown in Eq. ( 3), i.e., the expected classical algorithm needs to simulate the dynamics of an NMR system in an efficient way.Even when boiling down to our particular state engineering task, as far as we can tell, there is no employed numerical method [30][31][32] to simplify such an optimization, despite extensive work on the subject since the early days of experimental quantum computing.Moreover, MQFC can correct unknown errors to some extent, while open-loop algorithms should require knowledge about the noise spectrum in advance, which is usually impractical for large quantum systems.In this sense, another potential application of MQFC is to demonstrate the quantum computing supremacy [33], where initial endeavors have been made in other systems, for example in a recent five-photon boson sampling experiment [34].
Optimizing Clifford gates.-While our experiment focuses on state engineering, MQFC can also be used for other quantum optimization tasks.As an example, we consider optimizing the pulse sequence for a generic Clifford gate.It is possible to use twirling to estimate the average gate fidelity of a Clifford gate efficiently [3].The twirling protocol is based on finding the fidelity between experimental states following the pulse sequence and the corresponding desired states following the ideal gate.In principle this should be done for a complete set of initial states, but a randomized protocol can be used to approximate the gate fidelity with a constant number of experiments.The MQFC protocol can be modified to extract the desired fidelities and optimize the pulse sequence accordingly (details in Appendix B in [28]).Note that right after our work, a five-qubit implementation of a different quantum algorithm for gate optimization was reported [35].
Comparison with previous work.-MQFC was originally introduced in Ref. [24] where it was implemented on a 7-qubit NMR processor.There are two significant improvements in our work.First, our work clearly demonstrates the superiority of MQFC in correcting unknown errors with around 10% fidelity boost compared to the best classical optimization re-sult, while in the 7-qubit experiment no improvement was observed.The reason could be that the characterization of a 7-qubit system is much more accurate than a 12-qubit one, indicating that MQFC should be more powerful when dealing with large systems as the knowledge of larger systems are more likely to be incomplete.Second, our 12-qubit experiment lies at the cutting edge of present experimental quantum computing, and the capability of individual controls at this qubit number is state-of-the-art.As a comparison, in a recent work [36], the 10-qubit entanglement in a superconducting circuit is created with fidelity 0.668 using global control.Moreover, we demonstrated that at the 12-qubit level, the algorithm is already fast enough to justify its use as a tool in the lab.
In summary, we have created a 12-coherence state on an NMR quantum processor using MQFC.Our experimental procedure and result, in particular the direct observation of 12-coherence with one qubit as the probe, signify the capability of our quantum processor to serve as a universal 12-qubit quantum processor with high-fidelity individual controls on each qubit.In terms of control field optimization, our experiment demonstrates two superiorities in efficiency and experimental performance of MQFC beyond its classical counterpart.MQFC requires a running time that scales linearly with the number of qubits, and yields about 10% improvement compared to the best result via classical optimization.This optimization approach could be exceptionally useful in a large system with incomplete characterization, and is readily transferrable to other systems such as superconducing circuits or nitrogen-vacancy centers in diamond.We expect that, as experiments involving more than 10 qubits become more common, quantum feedback methods such as MQFC will become standard tools in quantum computing labs.

IV. METHODS
To numerically simulate the decoherence effect in our 12qubit system, we first make the following assumptions: the environment is Markovian; only the T * 2 dephasing mechanism is taken into account since T 1 effect is negligible in our circuit; the dephasing noise is independent between all qubits; the dissipator and the total Hamiltonian commute in each pulse slice as ∆t = 20 µs is small.With these assumptions, we solve the master equation in two steps for each ∆t: evolve the system by the propagator in Eq. ( 3), and subsequently apply the dephasing noise for ∆t which is an exponential decay of offdiagonal elements in the density matrix.The typical length of simulating our 12-qubit experiment in the presence of dephasing noise is in the magnitude of days on a desktop computer.The simulation shows that at most F dec = 0.824 of Z ⊗12 can be achieved with the 5.56 ms MQFC pulse applied on Z ⊗7 I ⊗5 , which is reasonable as high-order coherence is very vulnerable to the dephasing noise.Alternatively speaking, the upper bound of the MQFC experimental result is 0.824, since the optimization procedure does not include the function of robustness against dephasing noise yet.

Data availability
The data sets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.Starting from an initial guess, a shaped pulse is created from the pulse generator and then applied to the sample.The fidelity function f of the control pulse and its gradient g are directly measured on the quantum processor, where g is used for updating the control field till that sufficiently high fidelity f has been achieved.

MQFC op�miza�on
Readout   .Readout sequence to boost the SNR of the C7 spectrum.It transforms the 1 H spins from Z to identity and thus enables the decoupling of 1 H channel.The phase correction compensates for the chemical shift evolutions, after which all relevant spins are along the y-axis.In principle, this technique improves the SNR by a factor of 32, and makes the measurement of f or g practical using one scan.
Appendix A: Choice of the initial guess Making a good initial guess is critical for an optimization task on a large system.For small-sized (e.g., 2-5 spins) systems, even starting from an arbitrary initial guess, it is highly possible to reach a good control solution.However, for larger quantum systems (e.g., 10 spins or more), it is not practical to start from a completely random guess.
The difficulty in choosing the initial guess is primarily due to the fact that there is no theory to tell what is the minimum time length required for a given control task.In general, for larger systems with more complex target states, we need much longer pulses for state engineering tasks.However, noticing that the slice length ∆t should be small to ensure the accuracy of gradient estimation, longer pulse leads to more time steps.A rough estimation is, for our 12-qubit system, typically the total time needed for a control pulse to prepare Z 12 from a single Z is about 100 ms.That is to say, for ∆t = 20 µs (what we used in the experiment), it means 5,000 slices.Obviously, optimizing so many control variables in a 2 12 -dimensional space is a huge task, in particular if one starts from a totally random guess.
A possible solution is to use the so-called sequence compiler technique [1] to generate a suitable initial guess.This is an efficient algorithm, and is also what we used in the current work.The basic idea is as follows.A quantum circuit is usually constructed based on the coupling network of the controlled system.In liquid NMR, the circuit is composed of local rotations and J-coupling evolutions, where local rotations are generally much faster than J-coupling evolutions.Therefore, the dominating part of the circuit is free evolution, accompanied by some slices that correspond to local rotations.Only the local rotations are to be optimized since free evolution indicates zero control parameters.This will greatly reduce the size of the parameter space, which, for example, eventually leaves us only 110 slices to be optimized for a Z 7 to Z 12 preparation.Furthermore, local operations can be optimized using compiled selective pulses as the initial guess [2].Various types of errors arise when a local rotation is realized by a selective without correction.The pulse compilation process is efficient and can eliminate the zero-th and first-order control errors, thus substantially increases the goodness of the initial guess.

Appendix B: Optimizing Clifford gates
While our experiment focuses on state engineering, MQFC can also be used for other quantum optimization tasks.As an example, we consider optimizing the pulse sequence for a generic Clifford gate.It is possible to use twirling to estimate the average gate fidelity of a Clifford gate efficiently [3].The twirling protocol is based on finding the fidelity between experimental states following the pulse sequence and the corresponding desired states following the ideal gate.In principle this should be done for a complete set of initial states, but a randomized protocol can be used to approximate the gate fidelity with a constant number of experiments.The MQFC protocol can be modified to extract the desired fidelities and optimize the pulse sequence accordingly.
For a faulty Clifford channel Λ, its average fidelity takes the following form: where Pr(0) is the probability of no error.In fact, Pr(0) is the linear combination of state fidelities i .In other words, the average fidelity of a Clifford gate is a summation of state fidelities.Although the expression of Pr(0) seems to contain 4 n −1 elements, we can approximately estimate its value in polynomial time by a uniform sampling approach based on the Hoeffding's inequality [3].In particular, we have demonstrated that one just needs to conduct a constant number of experiments (∼ 10 3 ) to achieve a 99% confidence level regardless of the system's size.Each experiment in the twirling protocol is a Pauli-to-Pauli state evolution, exactly the same as what we have done in this work.In other words, given a satisfactory confidence level, we just need to repeat MQFC for a constant number of input states, and linearly combine all fidelities and gradients of state-tostate optimizations to fulfill the task of optimizing a Clifford gate.Therefore, the complexity of optimizing a Clifford gate using MQFC is the same as that of optimizing a state-to-state evolution up to a constant, meaning that optimizing a Clifford gate via MQFC is feasible.
The system Hamiltonian for the sample is given as Eq. ( 1) in the main text.At room temperature, the thermal equilibrium state of this system is highly mixed, with the form where ≈ 10 −5 describes the polarization, I is a 2 12 × 2 12 identity matrix, and γ C and γ H are the gyromagnetic ratios of the 13 C and 1 H nuclei, respectively.In particular, γ H ≈ 4γ C , so the signal of 1 H is roughly four times as that of 13 C.As the large identity part in Eq. (C1) does not contribute to the NMR spectrum under unitary evolutions, we just omit it in general   and use the remaining term, the so-called deviation density matrix, to represent the state: where γ H ≈ 4γ C is used and the polarization is dropped.In Fig. S2, the 13 C and 1 H thermal equilibrium spectra are shown, while the spectra of C 2 , C 6 , H 3 , and H 5 are magnified for better visualization.All 12 spins can be individually addressed by their distinct chemical shifts as shown in Fig. S1.
For each spin, the spectrum in principle contains 2 11 peaks due to its couplings with the other 11 spins.However, many interactions, especially those between distant spins, are too small to be resolved spectrally, so the number of observable peaks is much less than 2 11 .
It would be more convenient to work in the rotating frame rather than in the lab frame.For the internal Hamiltonian, we set the two transmission frequencies as o 1 = 20, 696 Hz and o 2 = 2, 894 Hz for the channel 13 C and 1 H, respectively.The transmission frequencies are chosen as the central frequencies of the spectra.In this double-rotating frame, the internal Hamiltonian then becomes The external control field is applied in the transverse x-y plane, oscillating at the transmission frequency of 13 C and 1 H channel, respectively.These transmission frequencies are in the radio-frequency (r.f.) regime.There are four control parameters, namely B C x , B C y , B H x , and B H y , in a time-independent r.f.pulse, where, without loss of generality, B C x means the control field amplitude along x-axis for the 13 C channel.In the double-rotating frame, the Hamiltonian of the external control field is (C4) Together with the internal Hamiltonian H rot s in Eq. (C3), the dynamics of the total system is dominated by the joint action of the internal and external Hamiltonians, with the propagator All elementary gates such as single-qubit rotations and twoqubit controlled-NOT (CNOT) gates required in universal .Thermal spectra of a, 13 C and b, 1 H in the 12-qubit quantum processor.In particular, the spectra of C2, C6, H3, and H5 are magnified for better visualization.The y-axis represents the signal strength (a.u.).Different spins are individually addressed according to their distinct resonance frequencies, and the spectrum of each spin is split into up to 2 11 peaks due to its couplings with the other 11 spins (though many splittings are too small to be resolved).
quantum information processing can be realized by deliberately designing the external Hamiltonian [5].
For instance, if we want to realize a π/2 rotation about the x-axis for C 1 , denoted as R 1 x (π/2), we can use a M -slice shaped pulse with slice width ∆t.In each slice, the four parameters of the control field are constants, labeled as B C x [m], and B H y [m], respectively.The propagator of such a shaped pulse is a concatenation of the propagator in Eq. (C5) with U m = e −i(Hs+Hc[m])∆t .The next step is to find the shaped pulse, i.e., a sequence of B C,H x,y [m], such that the propagator in Eq. (C6) realizes the target operation R 1 x (π/2) with high fidelity.In state-of-the-art NMR techniques, this optimization procedure is often realized via the GRAPE algorithm, as a shaped pulse found by GRAPE can have the properties of short duration and robustness to uncertainties in the Hamiltonian, e.g., the inhomogeneity of the static or control field.

Appendix D: Subsystem-based GRAPE
In small-scale systems with around seven spins, GRAPE is quite powerful, as it generates high-fidelity shaped pulses readily with modern computing power.However, in the 12qubit system, GRAPE is significantly more challenging, as it requires much higher dimensional matrix multiplications and exponentiating.Therefore, we modified the original GRAPE and applied this algorithm based on subsystems, which we call subsystem-GRAPE (SSGRAPE).
We would like to stress at first that SSGRAPE is still classical and thus cannot address the scalability issues of GRAPE [1].Even though, SSGRAPE is an important modification to the original GRAPE algorithm, which can improve the timescale of calculating GRAPE pulses dramatically by defining subsystems based on the Hamiltonian of the molecule.For example, in our 12-qubit system, by artificially disconnecting C 2 and C 7 , we divided the entire system into two subsystems with each consisting of six spins.From Fig. S1 and the relevant parameters, it can be seen that the two subsystems are isolated to a good approximation.We define the subsystem with C 2 as S A , and the other as S B .Both internal and external Hamiltonians in S A and S B can be determined by tracing out the other subsystem.For a target operator, say x (π/2), it can be decomposed into two operators where U A tar and U B tar are now both 2 6 × 2 6 unitary operators, and U tar = U A tar ⊗ U B tar .Therefore, the 12-qubit GRAPE optimization problem can be treated as two 6-qubit problems, and SSGRAPE attempts to optimise a shaped pulse which can realize U A tar and U B tar simultaneously.In brief, the SSGRAPE technique greatly reduces the computation time of the pulse finding on our 12-qubit system, but it is worth emphasizing that it does not fundamentally solve the scalability issue.
Another requirement of adopting SSGRAPE is that the target unitary operator can be effectively decomposed using subsystems and does not involve interactions between subsystems.In our 12-qubit experiment, this condition holds for every operator.We list all the SSGRAPE-optimized shaped pulses that are needed in the experiment, as shown in Table I.We also simulated the fidelity of each pulse in the full 12 Table I.Shaped pulse optimized by SSGRAPE during the 12coherence creation.The pulses are listed in the order of their appearances in Fig. 2(c).Although the pulses are found with the subsystem method, the fidelities reported here are calculated on the full 12-qubit system.
qubit system.That is, each pulse was found using SSGRAPE in the two 6-qubit subsystems, but then simulated on the full system.All local pulses are over 99.7% fidelity in simulation, which demonstrates that SSGRAPE is a valid pulse searching method for our 12-qubit system.
Appendix E: Experimental implementation of creating a 7-coherence In this section, we present a step-by-step description of our experiment of creating the 7-coherence Z ⊗7 I ⊗5 , and show the relevant NMR spectra at each step.

From the initial state to ρa
The whole circuit is depicted in Fig. 2(c) in the main text, with four intermediate states labeled by ρ a , ρ b , ρ c , and ρ d .The initial state is the thermal equilibrium state: ρ dev = 7 i=1 Z i + 4 12 j=8 Z j , as was described in Eq. (C2).It first undergoes an 8 ms SWAP gate, which swaps the equilibrium polarizations between C 7 and H 1 .In doing so, the signal of C 7 is boosted by approximately four times.After this, a multiqubit rotation about the y-axis on all spins except C 7 is applied to rotate these spins to the transverse plane, followed by a zdirection gradient field.A z-gradient pulse is used to destroy non-zero coherences, i.e., removes all the Pauli terms that contain X and Y terms in our case.As all the other spins except C 7 are flipped to the x-y plane, the resulting state after the gradient pulse is (E1) Here, we have ignored the factor of four before Z 7 for convenience, as this state will be used as the reference for later calibrations.When we observed C 7 by rotating it to X, the two spectra of ρ a are shown in Figs.S3(a) and S3(b).The left one is the spectrum of C 7 in the 12-qubit regime, and the right one is obtained by decoupling the 1 H channel via the Waltz-16 sequence [6].This decoupling can be considered as a partial trace process, which can remarkably improve the spectrum resolution, but it requires that the state of the five 1 H nuclei is equal to the identity.

From ρa to ρ b
The next step is to create a 5-coherence on the nearest neighbours of C 7 , including C 2 , C 4 , C 5 , and C 6 .Let us start from a simple example to describe how to increase the coherence order.For two qubits, if we start from XI, choose t = 1/(2J), and let the system evolve under the J-coupling of σ z σ z term, the coherence order of the system can be increased by one according to XI The main idea of creating 5-coherence is to make use of the partial refocusing scheme [3], that only the desired J-coupling evolutions are left to undergo t = 1/(2J) evolutions, while all the unwanted couplings are refocused.Refocusing of an unwanted σ z σ z coupling term can be realised by inserting a π pulse on one spin in the centre of the evolution but no pulse on the other.Although the desired couplings J 27 , J 47 , J 57 , and J 67 are different, a simultaneous J-coupling evolution is possible through careful design of the π-pulses' positions; see Fig. 2(c).
After the partial refocusing sequence and a subsequent R 7 y (−π/2) pulse that rotates C 7 back to Z, the ideal state at point b is (E3) In experiment, we observed C 7 for this 5-coherence state, and the two spectra without and with decoupling the 1 H channel are shown in Figs.S3(c) and S3(d), respectively.The signal attenuation in experiment is about 20.7% due to decoherence, so the simulated spectra were rescaled by 1.26 times (compared to the simulated spectra of ρ a ) to fit the experimental data.

From ρ b to ρc
In this step, we create the 7-coherence involving all the 13 C nuclei.Coherence is transferred to the remaining C 1 and C 3 spins from their joint neighbour C 2 .Similar to the above procedure, this step also involves a partial refocusing sequence, which realises the t = 1/(2J) evolutions for J 12 and J 23 simultaneously.After a local pulse R 2 y (−π/2) on C 2 , the state at point c is where I ⊗5 indicates that all five 1 H's are still in the identity state.The experimental spectra of C 7 without and with 1 H decoupling are plotted in Figs.S3(e) and S3(f), respectively.The simulated spectra were rescaled by 1.42 (compared to the simulated spectra of ρ a ) to make an optimal fit with the experimental result.Despite the non-negligible decoherence effect during the experimental creation of 7-coherence, we emphasize that this signal attenuation will not impact the characterization of the MQFC procedure, which is the main focus of this work.As shown in Figs.S3(e) and S3(f), the creation of 7-coherence is remarkably precise up to a rescaling factor.This 7coherence state can be used as a reference to calibrate the 12coherence created via MQFC.Hence, the scaling factor of the 7-coherence state is irrelevant.We also direct readers to Ref. [3] for a detailed calibration of our 7-coherence result and the relevant spectrum of another spin C 2 .The spectra in the left column are averaged over 30 scans to gain a good signal-to-noise ratio, while the ones in the right column are averaged over only 10 scans.In each spectrum, the experiment is in strong agreement with simulation, indicating that our control on this 12-qubit system is precise.The central focus of this work is to design a shaped pulse based on MQFC to create 12-coherence from 7-coherence.More precisely, in the circuit of Fig. 2(c), we want to optimize the part between ρ c and ρ d .Unlike the preceding section where all pulses were calculated by SSGRAPE, MQFC optimization is quantum, as the fitness function f and gradient g were directly measured on the 12-qubit quantum computer.The only role that a classical computer played during MQFC was to update the control field in terms of the measured gradient g, which is merely simple algebra without additional time cost.That is, no inefficient calculations on classical computers were involved during MQFC.
After the creation of 7-coherence ρ c = Z ⊗7 I ⊗5 , we applied a z-direction gradient field as depicted in Fig. 2(c).The purpose is to remove unwanted terms produced due to exper-imental imperfections, since Z ⊗7 I ⊗5 itself is invariant under this gradient field.This technique is conventional in NMR quantum computing to 'clean up' the experimentally prepared input state, and has no influence on the subsequent MQFC procedure.
The structure of the shaped pulse used in MQFC was predesigned according to the molecular information in Fig. S1.It consists of five parts: three sub-pulses and two free evolutions in between sub-pulses as shown in the MQFC optimization box in Fig. 2(c).The general idea of this structure design is to let 13 C's interact with five 1 H's simultaneously and hence increase the coherence order by five.Note that all large C-H couplings in Fig. S1 are roughly J ave = 148.8Hz on average.Therefore, we set the time for the two free evolutions as 1/4J ave ≈ 1.68 ms, and expect that it enables sufficient C-H interaction time to produce higher coherence on the five 1 H's.The functions of the three sub-pulses are: the first one is to rotate C 2 , C 3 , C 4 , and C 7 , which are directly connected to 1 H, to the x-y plane; the second sub-pulse is to refocus unwanted couplings during the C-H interaction; the last one is to rotate the relevant spins back to Z.The MQFC pulse is set to be 5.56 ms with ∆t = 20 µs.The total number of slices is thus 278, where 168 of them remain zero as they are meant for free evolutions.The remaining 110 slices are divided into three parts: 30 for the first sub-pulse, 30 for the second sub-pulse, and 40 for the third sub-pulse.Hence, to measure the gradient g in each iteration, we only need to take these 110 slices into account, which greatly reduces the experimental running time.
Since MQFC is a gradient-based optimization procedure, the measurements of f and g are critical.As explained before, measuring f is actually equivalent to measuring the expectation value Z ⊗12 in the experimental state after applying the trial shaped pulse.This measurement requires only one experiment.Analogously, measuring g also involves the readout of the expectation value Z ⊗12 , with a π/2 local pulse inserted in the trial shaped pulse (see Fig. 2(b) and Eq. ( 11)).This measurement requires 4nM experiments, where n = 7 because we only need to apply local π/2 pulses on the seven 13 C's, and M = 110 is the number of slices as described in the preceding paragraph.In total, for each iteration, the experimental time of MQFC is where τ exp is dominated by the delay time between two experiments to reestablish thermal equilibrium.Typically, τ exp ≈ 5T 1 , implying a 30 s delay between experiments.However, the observation of Z ⊗12 in our 12-qubit system requires about 30 experimental scans to yield a good spectrum with acceptable signal-to-noise ratio (SNR), such as the one in Fig. S4(a).Estimated by Eq. (F1), this requirement leads to a time cost of over one month per iteration, which is impractical as in this experiment we used nine iterations to achieve a high-fidelity MQFC pulse.Therefore, we need to improve the SNR of the spectrum in order to reduce the number of scans and thus reduce the experimental time.A traditional way is to decouple 1 H spins, which should enhance the SNR by 2 5 = 32 times, because the NMR signal per peak attenuates exponentially with the number of interacting spins.However, when the five 1 H's are in Z ⊗5 , the decoupling, which in fact traces out 1 H, would lead to no signal on 13 C as shown in Fig. S4(b).In other words, it is necessary to evolve the state of 1 H to I ⊗5 before decoupling.In the experiment, we used a readout pulse to realize this transformation.
Appendix G: Readout sequence for the measurement of f and g As mentioned above, the direct observation of 12coherence Z ⊗12 requires about 30 scans to yield a good SNR in the spectrum.Compared to the 1 H decoupled spectrum which merely requires one scan, the experimental time of the undecoupled case is 30 times longer and thus impractical for measuring f and g.This section is to describe our readout technique, which enables the decoupling of 1 H's so that each experiment can be done with only one scan.Ideally, the two plots should be the same, as MQFC is measuring the gradient information on a quantum computer, which should not be different from the classical SSGRAPE calculations.Experimentally, however, they are seen to differ.This reflects the unknowns in the experimental system (uncertainties in the Hamiltonian, control fields, etc.), and demonstrates that MQFC is able to correct for these unknowns.
A readout sequence (see Fig. 4), computed by classical SS-GRAPE, is run just after the MQFC procedure.The phase correction is a z-rotation to compensate for the unwanted chemical shift evolutions during 1/2J 78 time.If the state is Z ⊗12 , the five 1 H's will evolve to the identity state after the readout sequence, and the decoupling of 1 H will lead to the C 7 spectrum in Fig. S4(d), which can be measured with a single scan.We used Lorentzian fitting to obtain the signal's amplitude and phase, and thus the value of Z ⊗12 .This readout sequence would inevitably induce errors due to the decoherence and pulse imperfections.For the former error source, through our simulation we found that the readout caused about 30% signal loss, which is reasonable since multi-coherence is exceptionally vulnerable to decoherence.Therefore, this factor was taken into account for all the measurement results, that is, the measured values are rescaled by 1.3.As to the pulse imperfection, it consists of two parts: the imperfection of the sequence itself, i.e. some approximations about J-couplings when designing this simple readout sequence, and the infidelity of the SSGRAPE pulse.In total, 3.5% error arises in simulation, but how the error affects the 12-qubit quantum states is difficult to quantify.We used this value as the uncertainty of the experimental value of Z ⊗12 , namely, the error bars in Fig. 3c in the main text.Fortunately, MQFC outperforms SSGRAPE, even with error bars accounted for, demonstrating that MQFC has the feedbackcontrol property that is able to correct unknown experimental errors.
In addition, we plotted the variations of B x between two iterations for both MQFC and SSGRAPE pulses in Fig. S5.∆B x is proportional to the measured g x , and note the ∆t = 20 µs factor in the form of g x in Eq. ( 11), computed by ∆B x = g x , where is a fixed step size.In experiment, we chose = 1.6e 7 according to the knowledge gained in SSGRAPE calculation.Note that can also be efficiently altered using a quadratic fit process, which is a potential improvement of the current experiment by speeding up the convergence of the optimization procedure.The difference of ∆B x in Fig. S5(a) and S5(b) reflects that unknowns in the experimental system (uncertainties in the Hamiltonian, control fields, etc.) are automatically accounted for by MQFC, confirming its feedback control property.
It is worth stressing that the readout technique used in our experiment is merely to reduce the time cost in measuring f and g.For other systems in which the signal is not exponentially decreased with the growing number of qubits, this readout stage is not necessary.Even in NMR, if we can shorten the reset time between two experiments, a greater number of scans can be done.Preliminary progress has been made towards this goal in our recent work [7].

Figure 1 .
Figure1.MQFC process for optimizing a control field.Starting from an initial guess, a shaped pulse is created from the pulse generator and then applied to the sample.The fidelity function f of the control pulse and its gradient g are directly measured on the quantum processor, where g is used for updating the control field till that sufficiently high fidelity f has been achieved.

Figure 2 .Figure 3 .
Figure 2. MQFC scheme in creating 12-coherence.a Molecular structure of the 12-qubit quantum processor.b Schematic of measuring the m-th step gradient gx,y[m].A π/2 rotation about x(y)-axis for qubit i is inserted between the m-th and (m + 1)-th slices.c Quantum circuit that evolves the system from the thermal equilibrium to 12-coherence, where MQFC is applied on 7-coherence Z ⊗7 I ⊗5 .

Figure 4
Figure 4. Readout sequence to boost the SNR of the C7 spectrum.It transforms the 1 H spins from Z to identity and thus enables the decoupling of 1 H channel.The phase correction compensates for the chemical shift evolutions, after which all relevant spins are along the y-axis.In principle, this technique improves the SNR by a factor of 32, and makes the measurement of f or g practical using one scan.
chosen from the Pauli group and ρ (k) f is the relevant output by applying the ideal Clifford gate on ρ (k)
Figure S2.Thermal spectra of a,13 C and b, 1 H in the 12-qubit quantum processor.In particular, the spectra of C2, C6, H3, and H5 are magnified for better visualization.The y-axis represents the signal strength (a.u.).Different spins are individually addressed according to their distinct resonance frequencies, and the spectrum of each spin is split into up to 2 11 peaks due to its couplings with the other 11 spins (though many splittings are too small to be resolved).

Figure S3 .
Figure S3.Spectra for the observation of ρa (a, b), ρ b (c, d), and ρc (e, f) on C7, respectively.The left column is without 1 H decoupling, and the right column is with 1 H decoupled.The spectra in the left column are averaged over 30 scans to gain a good signal-to-noise ratio, while the ones in the right column are averaged over only 10 scans.In each spectrum, the experiment is in strong agreement with simulation, indicating that our control on this 12-qubit system is precise.

FrequencyFigure S4 .
Figure S4.Spectra for the observation of 12-coherence ρ d (a, b) and after the readout stage (c, d) on C7, respectively.The left column is without 1 H decoupling, and the right column is with 1 H decoupled.The 1 H undecoupled spectra are averaged over 30 scans.As shown in a, the direct observation of experimental 12-coherence ρ d = Z ⊗12 matches remarkably well with the simulation, which demonstrates that we have successfully created 12-coherence using the MQFC pulse.Not surprisingly, the decoupling of 1 H leads to no signal in b, since the five 1 H's are no longer identity but Z ⊗5 .c and d are the spectra after the readout stage in Fig.2(c), whose purpose is to measure f and g in one scan per experiment.

Figure S5 .
Figure S5.Variations of Bx between iterations k + 1 and k, indicated by the x-axis, for a, MQFC pulse and b, SSGRAPE pulse.The y-axis represents the 110 slices in optimization (see Section F), and ∆Bx is plotted in colourscale.Ideally, the two plots should be the same, as MQFC is measuring the gradient information on a quantum computer, which should not be different from the classical SSGRAPE calculations.Experimentally, however, they are seen to differ.This reflects the unknowns in the experimental system (uncertainties in the Hamiltonian, control fields, etc.), and demonstrates that MQFC is able to correct for these unknowns.