Quantum computers promise to outperform their classical counterparts in many applications.1,2,3,4,5,6 A primary obstacle in building large-scale quantum computers is the inadequacy of classical computers for the task of optimizing the experimental control field.7 Standard classical optimization algorithms are impractical in the long run since they have a running time that grows exponentially with the number of quantum bits (qubits).8 In theory, a complex quantum circuit can be decomposed into elementary gates that work on a restricted number of qubits (usually one or two) and should be readily implemented in experiment.9 In reality however, the control fields are never localized and the qubits interact and evolve even in the absence of the control fields. Consequently, the implementation of each elementary gate may require a control sequence that takes into account a subsystem involving many more than one or two qubits. Moreover, the number of elementary gates required for a quantum algorithm grows polynomially with the system size and the errors accumulate with each successive gate. Therefore, an effective and efficient way to optimize the control field and minimize errors is a key ingredient for scaling up quantum information processing devices.10

Here we consider the task of optimizing a control field that will drive the quantum system from a fixed input state ρ i to a desired target state ρ f . This problem is important in quantum information processing, as numerous tasks, such as algorithmic cooling in ensemble quantum computing,11,12 magic state preparation in fault-tolerant quantum computing13 and encoding in quantum key distribution,14 all rely on steering states regardless of the propagator. The gradient ascent pulse engineering (GRAPE) algorithm15 is the current state-of-the-art algorithm to (classically) optimize the control field in quantum state engineering problems. It is widely used in NMR,16 electron spin resonance,17 nitrogen-vacancy centers in diamond,18,19 superconducting circuits,20,21 and ion traps.22,23 The GRAPE method exploits the gradient of a fidelity function to update the control field iteratively.

GRAPE has two major drawbacks that are indeed common to all classical optimization algorithms: its running time is exponential in the size of the n-qubit system, and its accuracy depends on the precision of experimentally obtained parameters describing the quantum system (e.g., the system Hamiltonian). Basically, it is a gradient-based iterative algorithm. At each iteration k, the algorithm computes the evolution of the system under the previous pulse, and produces a final state \(\tilde \rho\) and a fitness function \(f = {\rm{tr}}(\tilde \rho \rho _f)\). It then computes the current gradient g for the use of updating the pulse. Classically, the computation involves the matrix exponential and multiplication in the 2n-dimensional Hilbert space and hence takes an exponential (in the number of qubits n) amount of time. For instance, a cluster of 128 AMD Opteron 850 CPU (2.4 GHz) can only handle a problem size of about ten qubits using GRAPE.8

Recently, Li et al.24 and later Rebentrost et al.25 showed that a quantum processor can be used to calculate f and g efficiently. A technique called measurement-based quantum feedback control (MQFC) enables direct measurement of f and g (see Fig. 1), allowing the quantum processor to optimize its own pulses. MQFC addresses both the issues of scalability and control inaccuracies due to imperfect system characterization.26,27 Moreover, this technique is transferrable to any implementation in which control fields steer the system evolution and measurement in a standard basis is possible. In this work, we implement MQFC on a 12-qubit NMR quantum processor, and in particular demonstrate for the first time that MQFC enhances the control precision by about 10% due to its self-feedback property. Furthermore, by creating the 12-coherent state we demonstrate the capability of our quantum processor to function as a universal 12-qubit quantum processor with high-fidelity individual controls. This is also one of the largest quantum processors with individual-control to date.

Fig. 1
figure 1

MQFC process for optimizing a control field. Starting from an initial guess, a shaped pulse is created from the pulse generator and then applied to the sample. The fidelity function f of the control pulse and its gradient g are directly measured on the quantum processor, where g is used for updating the control field till that sufficiently high fidelity f has been achieved


In this paper, we refer to unnormalized deviation density matrices (without the identity term) as ‘states’, which is a standard convention in ensemble quantum computing. To distinguish from the Hamiltonian, we use capital X, Y, and Z to denote states and σ x , σ y and σ z to denote Hamiltonians, while they both refer to the same set of Pauli matrices.

Quantum processor

In our NMR quantum processor, the liquid-state sample is per-13C labeled (1S,4S,5S)-7,7-dichloro-6-oxo-2-thiabicyclo[3.2.0]heptane-4-carboxylic acid dissolved in d6-acetone, which forms a 12-qubit register. The 12 qubits are denoted by nuclear spins C1 to C7 (13C-labeled) as qubits 1 to 7, and H1 to H5 as qubits 8 to 12 in the molecule shown by Fig. 2a. When placed in a static z-magnetic field, it has a system Hamiltonian

$${\cal H}_s = - \pi \mathop {\sum}\limits_{i = 1}^{12} \nu _0^i\sigma _z^i + \frac{\pi }{2}\mathop {\sum}\limits_{i = 1 < j}^{12} J_{ij}\sigma _z^i\sigma _z^j,$$

where \(\nu _0^i\) is the Larmor frequency of the ith qubit, J ij is the coupling between qubits i and j, and \(\sigma _z^i\) is the Pauli-z operator of the ith qubit. The values of these parameters can be found in Appendix C (See Supplementary information).

Fig. 2
figure 2

MQFC scheme in creating 12-coherence. a Molecular structure of the 12-qubit quantum processor. b Schematic of measuring the m-th step gradient g x,y [m]. A π/2 rotation about x(y)-axis for qubit i is inserted between the m-th and (m + 1)-th slices. c Quantum circuit that evolves the system from the thermal equilibrium to 12-coherence, where MQFC is applied on 7-coherence Z7I5

The control Hamiltonian is due to the transverse control field applied in the xy plane, which is often digitized into M slices with slice length Δt. In each slice, there are four constant control parameters, leading to a control Hamiltonian in the form of

$$\begin{array}{ccccc}\\ {\cal H}_c[m] = & {\rm{B}}_x^{\rm{C}}[m]\mathop {\sum}\limits_{i = 1}^7 \sigma _x^i + {\rm{B}}_y^{\rm{C}}[m]\mathop {\sum}\limits_{i = 1}^7 \sigma _y^i\\ \\ & + {\rm{B}}_x^{\rm{H}}[m]\mathop {\sum}\limits_{j = 8}^{12} \sigma _x^j + {\rm{B}}_y^{\rm{H}}[m]\mathop {\sum}\limits_{j = 8}^{12} \sigma _y^j,\\ \end{array}$$

where, for example, \({\rm{B}}_x^{\rm{C}}[m]\) means the x-component of the mth slice of control field in the 13C channel.

The dynamics of the NMR system is governed by \({\cal H}_s\) and \({\cal H}_c\) simultaneously, with the propagator

$$U_1^M = U_MU_{M - 1} \cdots U_1,$$


$$U_m = e^{ - i({\cal H}_s + {\cal H}_c[m])\Delta t}.$$

The essence of NMR quantum information processing is to optimize a control field, i.e., find a sequence of B x,y [m], such that one can precisely realize a quantum gate or drive the system to a target state according to Eq. (3).

Fundamentals of the GRAPE algorithm

To implement a particular target gate or state we need to find an optimal B x,y [m]. One of the most prominent optimization algorithms to date is the GRAPE algorithm15 which was developed for the design of optimal control pulses in NMR spectroscopy. Here, we explain the basic principle of GRAPE by considering the problem of state engineering in the absence of relaxation.

Suppose the initial state of the spin system is ρ i , and the target output state is ρ f . After applying a M-slice trial control pulse, the system will evolve to

$$\tilde \rho = U_1^M\left( {\rho _i} \right) = U_1^M\rho _iU_1^{M\dagger }.$$

The fitness function defined as \(f = {\rm{tr}}(\rho _f\tilde \rho )\) serves as a metric for the control fidelity, with the form

$$f = {\rm{tr}}(\rho _f\tilde \rho ) = {\rm{tr}}\left( {U_1^M\left( {\rho _i} \right) \cdot \rho _f} \right).$$

Obviously, f is a function of 2M variables, and to find its optimum we calculate the gradient function to the first order

$$\begin{array}{l}g_{x,y}[m] = \frac{{\partial f}}{{\partial {\rm{B}}_{x,y}[m]}}\\ \approx \mathop {\sum}\limits_{k = 1}^n {\rm{tr}}\left( { - i\Delta t \cdot U_{m + 1}^M\left[ {\sigma _{x,y}^k,U_1^m\left( {\rho _i} \right)} \right]U_{m + 1}^{M\dagger } \cdot \rho _f} \right),\end{array}$$

where \(\left[ {\sigma _{x,y}^k,U_1^m\left( {\rho _i} \right)} \right]\) is the commutator between \(\sigma _{x,y}^k\) and \(U_1^m\left( {\rho _i} \right)\). We may increase the fitness function f by using the gradient iteration rule

$${\rm{B}}_{x,y}[m] \leftarrow {\rm{B}}_{x,y}[m] + \epsilon \cdot g_{x,y}[m],$$

where ϵ is a suitably chosen step size.

The GRAPE algorithm proceeds as follows on a classical computer:

  1. 1.

    start from an initial guess control B x,y [m];

  2. 2.

    calculate \(\tilde \rho\) according to Eq. (5);

  3. 3.

    evaluate fitness function \(f = {\rm{tr}}(\rho _f\tilde \rho )\);

  4. 4.

    if f does not reach our preset value, evaluate gradient function g according to Eq. (7);

  5. 5.

    update control variables according to Eq. (8), then go to step 2.

MQFC optimization

The GRAPE algorithm requires the calculation of \(U_1^M\), i.e., the dynamics of the system. This step is inefficient on a classical computer when the size of the system is large. In contrast, the scheme of MQFC optimization provides an alternative way which enables direct measurement of f and g in the experimental manner, or explicitly, via the quantum evolution and measurement of the quantum processor.

Without loss of generality, let us discuss the scenario of ensemble quantum computing. e.g., NMR quantum computing, where the state is usually written as a traceless deviation density matrix and a single-shot measurement is sufficient to get the expected value of an observable. For other systems that use the computational basis or projective measurement, the following procedure needs to be slightly modified and more repetitions may be required to get the estimate of f and g.

Measuring f is straightforward. For an n-qubit system, the total number of elements in the Pauli basis is 4n−1 (without the identity term). If the target state ρ f has some decomposition, say, \(\rho _f = \mathop {\sum}\nolimits_{\gamma = 1}^{\cal G} x_\gamma P_\gamma\) with respect to the Pauli basis, then the fitness function is

$$f = {\rm{tr}}\left( {\tilde \rho \rho _f} \right) = \mathop {\sum}\limits_{\gamma = 1}^{\cal G} x_r{\rm{tr}}\left( {\tilde \rho P_r} \right).$$

Here, \(1 \le {\cal G} \le 4^n\) denotes the number of nonzero components, P γ is the γ-th element of the Pauli basis, and x γ is its corresponding coefficient.

Therefore, \({\cal G}\) experiments are required to estimate f. In the γ-th experiment, we just need to apply the control field to the initial state ρ i and measure the expectation value 〈P γ 〉 of \(\tilde \rho\). For a generic ρ f that contains all \({\cal G} = 4^n - 1\) Pauli terms, measuring f in experiment is equivalent to carrying out full state tomography, and is thus inefficient. However, many tasks require the creation of a simple target state where \({\cal G}\) is quite small. For instance, if we aim to prepare the 12-coherent state ρ f  = Z12, one measurement is sufficient to obtain f.

Measuring g requires us to realize the commutator \([\sigma _{x,y}^k, \cdot ]\) inside Eq. (7). In fact,24

$$\left[ {\sigma _{x,y}^k,\rho } \right] = i\left( {{\cal R}_{x,y}^k\left( \rho \right) - \overline {\cal R} _{x,y}^k\left( \rho \right)} \right),$$

in which \({\cal R}_{x,y}^k\) and \(\overline {\cal R} _{x,y}^k\) mean a π/2 rotation and −π/2 about x or y axis on the k-th qubit, repsectively. By substituting Eq. (10) into Eq. (7), we get

$$\begin{array}{ccccc}\\ g_{x,y}[m] = & \Delta t\mathop {\sum}\limits_{k = 1}^n {\rm{tr}}\left\{ {\left( {U_{m + 1}^M{\cal R}_{x,y}^kU_1^m} \right)(\rho _i) \cdot \rho _f} \right\}\\ \\ & - \Delta t\mathop {\sum}\limits_{k = 1}^n {\rm{tr}}\left\{ {\left( {U_{m + 1}^M\overline {\cal R} _{x,y}^kU_1^m} \right)(\rho _i) \cdot \rho _f} \right\}.\\ \end{array}$$

The terms on the right-hand side are very similar to the measurement of f in Eq. (6), and the only difference is the local ±π/2 pulse inserted between slices m and m + 1. Explicitly, the m-th component of g x,y is a weighted sum of \(4n{\cal G}\) measurement quantities, where 4 comes from the ±π/2 pulses about the x and y axes, n from the sum over all the qubits, and \({\cal G}\) from the measurement of f. In each experiment, compared to the way of measuring f, we just need to insert a local π/2 pulse after the m-th slice evolution. Provided that all the qubits are well individually addressed, high fidelities are attainable in implementing these local π/2 rotations.

In summary, we need \(4n{\cal G}M\) experiments in total to perform the gradient measurement, which is linear in the number of qubits.

Experimental MQFC optimization

Now we turn to the experiment where the MQFC optimization is used to create the 12-coherent state in the 12-qubit quantum processor. First, let us clarify that all other pulses except the MQFC pulse throughout our experiments are local rotations, which are generated from a subsystem-based gradient ascent pulse engineering (SSGRAPE) approach.16 It is a technical improvement of the original GRAPE for our particular implementation, but does not address its poor scalability issue (see Appendix D, See Supplementary information). What makes the MQFC scheme remarkable is that, it does not involve the computationally expensive classical simulation of the 212-dimensional quantum dynamics in the course of optimization.

For our optimization task, GRAPE is a powerful tool, but handling 12 qubits is near the limit of capability for a typical laptop computer. In contrast, MQFC is capable of overcoming this difficulty in certain cases. Taking our experiment as an example, MQFC is able to solve the problem of finding a control field that evolves single-coherence ZI11 into 12-coherence Z12 in a time that scales linearly with the number of qubits. The entire experimental procedure is depicted in Fig. 2c, with a step-by-step description in Appendix E (See Supplementary information).

First, we prepare 7-coherence Z7I5 on the seven 13C spins, using the sequence in Fig. 2c before the MQFC optimization box. This procedure, benchmarked in our previous work,28 is mainly done with the aid of SSGRAPE. Subsequently, we create Z12 via MQFC on the quantum processor, which is the main focus of this work. We attempt to optimize a control field, namely a shaped radio frequency (r.f.) pulse, to evolve the system from the input ρ i  = Z7I5 to the output ρ f  = Z12. Our control field, as shown in the MQFC optimization box, is comprised of three sub-pulses to realize local rotations, and two free evolutions to let 13C qubits interact with 1H qubits for the purpose of generating higher coherence. The whole control field is digitized into M = 278 slices with Δt = 20 μs width, while 110 slices are for three sub-pulses and 168 slices remain zero to realize the two 1.68 ms free evolutions (Appendix E, See Supplementary information). The total dynamics of the pulse is given by \(U_1^M\) in Eq. (3).

The fitness function is defined as \(f = {\rm{tr}}(\rho _f\tilde \rho )\), a metric for the control fidelity, where \(\tilde \rho = U_1^M(\rho _i)\) is the experimental state and ρ f  = Z12 is the target. In our experiment, only one measurement of the expectation value of 〈Z12〉 suffices to attain f after each iteration. If f does not hit our preset value with the current control field, we navigate the control field along its gradient g. In fact, to measure g x [m] (the same for g y [m]) which is the gradient of slice m, we just need three steps: insert a local ±π/2 pulse on every qubit about x-axis between slice m and m + 1; apply this new control field to the initial state ρ i and measure f (see Fig. 2b); compute g x [m] by directly combining these ±π/2-inserted results via Eq. (11). As long as accurate local ±π/2 pulses are available for each qubit, g can be measured on a quantum processor. In experiment, we have designed a 1 ms π/2 pulse on every 13C nucleus with the simulated fidelity over 99.7% (Appendix D, See Supplementary information). Having the gradient, we can update the control field and continue the MQFC procedure until a desired f is attained.

Direct observation of 12-coherence

After the preparation of the 12-coherent state, the next step is to observe it. In NMR spectroscopy, multiple coherence is hard to be observed directly in a one-dimensional spectrum, i.e., by flipping the target spin to the xy plane while others remain in Z. If all coupling between the target spin and other spins can be resolved, such observation is feasible. For example, in a two-qubit system, we can flip spin one to X to observe ZZ. In fact, XZ can be written as

$${\rm{XZ}} = {\rm{X}} \otimes \left| 0 \right\rangle \,\left\langle 0 \right| - {\rm{X}} \otimes \left| 1 \right\rangle \,\left\langle 1 \right|{\kern 1pt} .$$

The first term X|0〉〈 0| leads to a positive peak at ν 1J 12/2 in the spectrum, as the J-coupling term shifts the frequency of qubit 1 by −J 12/2. Analogously, the second term X|1〉〈 1| leads to a negative (due to the minus sign before the term) peak at ν 1 + J 12/2. Generally, these two peaks can be resolved in the spectrum as long as J is large enough to separate them in frequencies. However, to observe multiple coherence, this requirement is of great challenge, since all J-couplings between the target spin and other spins should be sufficiently large to prevent the annihilations of positive and negative peaks. As a result, two-dimensional spectra and special techniques are usually employed to observe multiple coherence in conventional NMR spectroscopy.

For the purpose of NMR quantum computing, it is certainly better if one can read out multiple coherence directly in a one-dimensional spectrum, as one-dimensional spectrum reflects the state information more intuitively and reduces experimental running time remarkably compared to the two-dimensional spectroscopy. In our 12-qubit processor, although there are a few couplings as small as 0.01 Hz (Appendix C, See Supplementary information), a direct observation of 12-coherence Z12 is still available on C7. Figure 3a exhibits a strong agreement between experimental observation 12-coherence with merely 32 scans and the simulation, after rescaling the experimental result by 1.21 times to compensate for decoherence. To the best of our knowledge, our experiment is the first direct observation of multiple coherence beyond ten spins, and provides a valid evidence that our 12-qubit processor possesses excellent individual controllability and the potential to be a universal 12-qubit quantum processor.

Fig. 3
figure 3

Experimentally created 12-coherence using MQFC. a Direct observation of the created 12-coherence in one-dimensional NMR spectrum (red), where C7 is the probe qubit. Simulated spectrum (blue) is also plotted. The experimental result is rescaled by 1.21 times to compensate for the decoherence effect for better visualization. b Spectra of 12-coherence after each odd iteration during the MQFC optimization. Unlike the direct observation, a readout technique is applied to gain a higher resolution. A color scale indicates peak intensities. The height of the peaks is proportional to the value of created 12-coherence. c Comparison between GRAPE (blue) and MQFC (red) optimizations, both in simulation (solid; without decoherence accounted) and experiment (dashed). F dec is the numerical simulation of decoherence during the 12-coherence creation. Compared to the GRAPE algorithm, MQFC optimization is worse in simulation, but better in experiment. The error bars are plotted by the infidelity of the readout pulse. d Results at iteration 9. The experimental 12-coherence reaches 0.795 using MQFC which approaches the F dec = 0.824 bound, while GRAPE only leads to 0.703 (i.e., 0.121 lower than F dec) in experiment

Readout sequence

Although the direct observation of 12-coherence with 32 scans in Fig. 3a demonstrates our control precision, it is not suitable for the many experimental runs during the optimization since 32 scans leads to a great time cost. One solution is to decouple the five 1H spins to boost the signal-to-noise ratio (SNR) by 25 = 32 times, which exactly compensates for the required scan number. We have designed a readout pulse sequence to realize it as shown in Fig. 4.

Fig. 4
figure 4

Readout sequence to boost the SNR of the C7 spectrum. It transforms the 1H spins from Z to identity and thus enables the decoupling of 1H channel. The phase correction compensates for the chemical shift evolutions, after which all relevant spins are along the y-axis. In principle, this technique improves the SNR by a factor of 32, and makes the measurement of f or g practical using one scan

The local pulses in the readout sequence are computed by SSGRAPE, and the sequence is implemented before every measurement. The phase correction is a z-rotation to neutralize the unwanted chemical shift rotation during the free evolution. If the state is Z12, the five 1H spins will be evolved to the identity state after the readout sequence, and the decoupling of 1H leads to the C7 spectrum as shown in Fig. 3b, which is measured with a single scan. We then use spectrum fitting to obtain the signal’s amplitude and phase, and thus the value of 〈Z12〉.

This readout sequence induces errors in terms of decoherence and pulse imperfections. For the former one, through our simulation we find that it leads to about 30% signal loss, which is reasonable since multi-coherence is exceptionally vulnerable to decoherence. Therefore, this factor is taken into account for all the measurement results, that is, the measured values are rescaled by about 1.3. With respect to the pulse imperfection, it consists of two parts: the imperfection of the sequence itself, i.e., some approximations when we design this simple readout sequence, and the infidelities in implementing the pulses. In total, 3.5% error arises in simulation. We use this value as the uncertainty of the experimental value of 〈Z12〉, namely, the error bars in Fig. 3c.

Experimental results

Figure 3b shows the spectrum of \(\tilde \rho\) after the readout stage for each odd iteration. The peak intensities correspond to the value of \(f = {\rm{tr}}({\rm{Z}}^{ \otimes 12}\tilde \rho )\), which clearly shows that MQFC increases f during the optimization. This demonstrates that MQFC is a practical technique for designing control fields in large quantum systems.

Our experiment also exhibits MQFC’s ability of correcting unknown experimental errors. To demonstrate this improvement, we implement another group of 12-coherence-creating experiments, where all experimental settings are the same except that the pulse is generated from the classical SSGRAPE method other than the MQFC approach. We then compare these two groups of experiments. Figure 3c illustrates the result of SSGRAPE and MQFC pulses both in simulation and experiment. Focusing on the final result at iteration 9 in Fig. 3d, in experiment SSGRAPE finally creates a 12-coherence with f = 0.703 ± 0.034, whereas MQFC pulse creates f = 0.795 ± 0.027. This experimental improvement (nearly 10%) disagrees with simulation, as in simulation MQFC (0.830) is even worse than SSGRAPE (0.931).

Considering that MQFC is a feedback-control process, some incomplete knowledge of the experimental quantum process, such as the nonlinearity of the pulse generator or imprecision of the molecular Hamiltonian, may be inherently corrected during the optimization. Indeed, the experiment clearly suggests that MQFC is advantageous in terms of correcting errors from unknown sources. Furthermore, we simulate the decoherence effect during the procedure, and find that the upper bound of \({\rm{tr}}({\rm{Z}}^{ \otimes 12}\tilde \rho )\) in the presence of dephasing noise is about 0.824 (see Methods). Note that our MQFC result finally reaches 0.795, which is very close to this bound, demonstrating that our control of this 12-qubit processor is close to the theoretical prediction after accounting for decoherence.



One major concern about control methods is their scalability with the number of qubits n. Our MQFC protocol involves a single experiment to measure f and 4nM experiments to measure g for each iteration, where n is the number of qubits. Assuming each experiment takes τ exp time, the MQFC in total consumes T it  = (4nM + 1)τ exp for each iteration. For comparison, one has to deal with massive 2n × 2n matrix multiplications and exponentials using GRAPE on a classical computer. The speed-up comes from the fact that MQFC utilizes the evolution of the quantum system instead of computing the system’s dynamics when evaluating f and g.

For other potential problems when scaling up the GRAPE technique, MQFC confronts similar difficulties, such as how to effectively represent a generic target state, how to choose a good initial guess, how to determine the pulse parameters before optimization, and how many iterations are needed to reach a satisfactory fidelity. Unfortunately, experimental observation of running time vs. number of qubits is not likely in NMR, since changing the number of qubits would usually require a different sample with different characteristics. So we cannot experimentally compare the scaling of MQFC vs. GRAPE, instead we must be satisfied with the fact that MQFC performs well at the 12-qubit level and should theoretically scale better than GRAPE under standard assumptions. See Appendix A (See Supplementary information) in for details.

One may also ask if there could be other classical algorithms that scale as well (or better than) MQFC. This question remains open, but it seems very unlikely—the gradient calculation is based on the dynamics as shown in Eq. (3), i.e., the expected classical algorithm needs to simulate the dynamics of an NMR system in an efficient way. Even when boiling down to our particular state engineering task, as far as we can tell, there is no employed numerical method29,30,31 to simplify such an optimization, despite extensive work on the subject since the early days of experimental quantum computing. Moreover, MQFC can correct unknown errors to some extent, while open-loop algorithms should require knowledge about the noise spectrum in advance, which is usually impractical for large quantum systems. In this sense, another potential application of MQFC is to demonstrate the quantum computing supremacy,32 where initial endeavors have been made in other systems, for example in a recent five-photon boson sampling experiment.33

Optimizing clifford gates

While our experiment focuses on state engineering, MQFC can also be used for other quantum optimization tasks. As an example, we consider optimizing the pulse sequence for a generic Clifford gate. It is possible to use twirling to estimate the average gate fidelity of a Clifford gate efficiently.28 The twirling protocol is based on finding the fidelity between experimental states following the pulse sequence and the corresponding desired states following the ideal gate. In principle this should be done for a complete set of initial states, but a randomized protocol can be used to approximate the gate fidelity with a constant number of experiments. The MQFC protocol can be modified to extract the desired fidelities and optimize the pulse sequence accordingly (details in Appendix B, See Supplementary information). Note that right after our work, a five-qubit implementation of a different quantum algorithm for gate optimization was reported.34

Comparison with previous work

MQFC was originally introduced in Ref. 24 where it was implemented on a 7-qubit NMR processor. There are two significant improvements in our work. First, our work clearly demonstrates the superiority of MQFC in correcting unknown errors with around 10% fidelity boost compared to the best classical optimization result, while in the 7-qubit experiment no improvement was observed. The reason could be that the characterization of a 7-qubit system is much more accurate than a 12-qubit one, indicating that MQFC should be more powerful when dealing with large systems as the knowledge of larger systems are more likely to be incomplete. Second, our 12-qubit experiment lies at the cutting edge of present experimental quantum computing, and the capability of individual controls at this qubit number is state-of-the-art. As a comparison, in a recent work,35 the 10-qubit entanglement in a superconducting circuit is created with fidelity 0.668 using global control. Moreover, we demonstrated that at the 12-qubit level, the algorithm is already fast enough to justify its use as a tool in the lab.

In summary, we have created a 12-coherence state on an NMR quantum processor using MQFC. Our experimental procedure and result, in particular the direct observation of 12-coherence with one qubit as the probe, signify the capability of our quantum processor to serve as a universal 12-qubit quantum processor with high-fidelity individual controls on each qubit. In terms of control field optimization, our experiment demonstrates two superiorities in efficiency and experimental performance of MQFC beyond its classical counterpart. MQFC requires a running time that scales linearly with the number of qubits, and yields about 10% improvement compared to the best result via classical optimization. This optimization approach could be exceptionally useful in a large system with incomplete characterization, and is readily transferrable to other systems such as superconducing circuits or nitrogen-vacancy centers in diamond. We expect that, as experiments involving more than 10 qubits become more common, quantum feedback methods such as MQFC will become standard tools in quantum computing labs.


To numerically simulate the decoherence effect in our 12-qubit system, we first make the following assumptions: the environment is Markovian; only the \(T_2^*\) dephasing mechanism is taken into account since T 1 effect is negligible in our circuit; the dephasing noise is independent between all qubits; the dissipator and the total Hamiltonian commute in each pulse slice as Δt = 20 μs is small. With these assumptions, we solve the master equation in two steps for each Δt: evolve the system by the propagator in Eq. (3), and subsequently apply the dephasing noise for Δt which is an exponential decay of off-diagonal elements in the density matrix. The typical length of simulating our 12-qubit experiment in the presence of dephasing noise is in the magnitude of days on a desktop computer. The simulation shows that at most F dec = 0.824 of Z12 can be achieved with the 5.56 ms MQFC pulse applied on Z7I5, which is reasonable as high-order coherence is very vulnerable to the dephasing noise. Alternatively speaking, the upper bound of the MQFC experimental result is 0.824, since the optimization procedure does not include the function of robustness against dephasing noise yet.

Data availability

The data sets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.