Probe optimization for quantum metrology via closed-loop learning control

Experimentally achieving the precision that standard quantum metrology schemes promise is always challenging. Recently, additional controls were applied to design feasible quantum metrology schemes. However, these approaches generally do not consider ease of implementation, raising technological barriers that impede their realization. In this paper, we circumvent this problem by applying closed-loop learning control to propose a practical controlled sequential scheme for quantum metrology. Purity loss of the probe state, which relates to quantum Fisher information, is measured efficiently as the fitness to guide the learning loop. We confirm its feasibility and certain advantages over standard quantum metrology schemes by numerical analysis and proof-of-principle experiments in a nuclear magnetic resonance (NMR) system.


INTRODUCTION
Much of quantitative science deals with measuring a certain parameter, say φ, of a physical process precisely. Typically, this involves subjecting suitably engineered probe states to the physical process, and using measurement readout to recover an estimate of φ. The central limit theorem states that repeated applications of this procedure can improve our estimate, such that the resulting standard error ∆φ scales as 1/√N in the number of particles N. Remarkably, quantum technologies allow us to surpass this standard limit. By using suitably entangled probes, we can reach the Heisenberg limit, suppressing ∆φ such that it scales as 1/N [1-3]. This quadratic scaling advantage can drastically reduce the resources required for precision measurement, and continues to catalyze rapid developments in the field of quantum metrology [4][5][6][7][8][9][10][11][12][13][14][15].
While quantum metrology is well understood at the theoretical level, its physical application to large-scale quantum systems faces significant challenges [16][17][18][19][20][21][22][23]. Consider the iconic task of estimating the phase φ of some unitary process U_φ = e^{−iHφ}. In this setting, theory tells us how to determine the optimal quantum probe ρ for any possible Hamiltonian H. However, when H acts on a many-body system, this optimal probe is typically a complex, entangled many-body state. Engineering this probe is often non-trivial, especially when access to the physical operations used to synthesize such probes is limited [16]. Meanwhile, many realistic means of initializing such probes involve applying a sequence of controls whose operational effects are not fully characterized [21][24][25][26]. These issues are further exacerbated by the exponentially growing size of the Hilbert space, making direct implementation of complex metrological schemes extremely challenging.
Here, we propose a closed-loop learning protocol that circumvents these issues. The resulting protocol has the following desirable features: (a) It does not require us to analytically solve for the optimal probe, nor to possess prior knowledge of how this probe can be synthesized from available physical controls. These aspects are optimized through the learning process. (b) It does not require us to know the precise effects of these physical controls, nor to implement tomography on the resulting quantum probes. (c) It does not require any computation involving matrix representations of H, avoiding the curse of dimensionality. These features combined allow a versatile procedure for finding improved metrology protocols, ideal for complex many-body settings. We demonstrate a proof-of-principle experiment using nuclear magnetic resonance (NMR), illustrating the viability of this approach with present-day quantum technology.

Framework of closed-loop learning assisted quantum metrology
Here, we consider estimating the phase φ of a general N-body unitary process U_φ = e^{−iHφ}. Each metrology protocol begins with a probe initialized in some easily prepared state ρ_0 on N probes, typically a product state where each probe is initialized in some default state |0⟩. The goal then is to implement a control: a sequence of physical operations that we apply to the N-probe system, transforming ρ_0 to some candidate entangled state ρ_C. By acting U_φ on the probes, we end up with ρ_φ, which encodes information regarding φ (see Fig. 1a). The efficacy of each candidate probe ρ_C is typically quantified by the quantum Fisher information F_Q [2]. The rationale is that after repeating this process through ν independent runs, the standard error to which we can estimate φ is bounded below by 1/√(νF_Q). This lower bound is tight, and can always be saturated using an ideal measurement scheme.
The particular benefit of quantum metrology is that use of suitably entangled ρ_C enables one to reduce the uncertainty of φ much more quickly than conventional strategies. One iconic case, for example, is when U_φ corresponds to applying an identical unitary process e^{−iHφ} to each individual probe. In such scenarios, use of non-entangled ρ_C results in F_Q that scales linearly with N, such that sensing φ to some desired precision ∆ requires N > O(1/∆^2) probes. In contrast, if ρ_C is appropriately entangled, F_Q scales as N^2, enabling the Heisenberg limit scaling of N > O(1/∆). The goal of quantum metrology can thus be split into two distinct tasks.
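As a small numerical illustration of this bound, the sketch below evaluates 1/√(νF_Q) for F_Q = N and F_Q = N^2, the linear and quadratic scalings discussed in this section. The function name, the number of runs and the probe numbers are hypothetical values chosen only for illustration.

```python
import math

def cramer_rao_bound(nu, f_q):
    """Lower bound 1/sqrt(nu * F_Q) on the standard error of phi after nu runs."""
    return 1.0 / math.sqrt(nu * f_q)

nu = 100  # hypothetical number of independent runs
for n_probes in (1, 4, 16):  # hypothetical probe numbers
    separable = cramer_rao_bound(nu, n_probes)       # F_Q = N (non-entangled probes)
    entangled = cramer_rao_bound(nu, n_probes ** 2)  # F_Q = N^2 (suitably entangled probes)
    print(n_probes, separable, entangled)
```

For a fixed ν, the entangled bound is smaller than the separable one by a factor √N, which is the quadratic resource advantage referred to in the text.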
1. Determine the control sequence C that synthesizes some near optimal state ρ C whose corresponding quantum Fisher information F Q (ρ C ) is made as large as possible.
2. Use the control sequence to synthesize ρ C , which can then be injected as input to U φ for purposes of estimating φ.
Here, our primary focus is the first task, with the understanding that our resulting control sequences can be used to synthesize the appropriate states to perform metrology. This is highly non-trivial for general H. Notably, the dimensions of H grow exponentially with N, making analytical methods for finding the optimal ρ_C computationally intractable. Meanwhile, C is described by an ordered list of readily accessible elementary operations (e.g., pulse sequences). Inferring how these can be chained together to generate a given ρ_C is generally highly non-trivial, especially when the exact physical effect of each elementary operation on the probe state is not known. Typical means of optimizing F_Q(ρ_C) are further hampered by the difficulty of evaluating the efficacy of a candidate control sequence C. Given ρ_φ, evaluation of the corresponding efficacy F_Q(ρ_C) involves an optimization over all possible measurement bases, a task whose complexity also scales exponentially with system size.
In our protocol, we first tackle the difficulty in evaluating efficacy by using relations between quantum Fisher information and purity loss. Let ρ_avg = ∫ P_x ρ_{φ+x} dx, where P_x is some probability distribution with mean 0 and standard deviation ∆x, and set ∆γ(∆x) = Tr(ρ_C^2) − Tr(ρ_avg^2). Then recent results [27] established that in the limit where ∆x ≪ 1, the quantum Fisher information satisfies
F_Q ≥ F^L_Q ≡ 2∆γ(∆x)/(∆x)^2.   (1)
Physically, ρ_avg represents the resulting ensemble state when the aforementioned metrology procedure is applied to a unitary U_φ such that φ undergoes stochastic fluctuations of magnitude ∆x. Thus ∆γ(∆x) captures the purity loss of the resultant state induced by these fluctuations. Eq. 1 then states that the efficacy of a metrology protocol is bounded below by the rate at which its output state loses purity when subject to stochastic noise in the parameter we are trying to sense.
Therefore, we can effectively use F^L_Q as a proxy for the efficacy of a probe.
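The relation between purity loss and quantum Fisher information can be checked numerically for a single qubit. The sketch below assumes the proxy is defined as F^L_Q = 2∆γ/(∆x)^2 (equivalently ∆γ = F^L_Q (∆x)^2/2, as quoted in the supplementary notes), a pure probe cos(θ/2)|0⟩ + sin(θ/2)|1⟩, the encoding e^{−iI_zφ}, and Gaussian fluctuations averaged by simple midpoint quadrature. All function names and the quadrature settings are illustrative choices, not part of the protocol.

```python
import math

def bloch_after_phase(r, x):
    """Rotate the Bloch vector r = (rx, ry, rz) about z by angle x (action of e^{-i I_z x})."""
    rx, ry, rz = r
    return (rx * math.cos(x) - ry * math.sin(x),
            rx * math.sin(x) + ry * math.cos(x),
            rz)

def purity(r):
    """Purity Tr(rho^2) of a qubit with Bloch vector r."""
    return 0.5 * (1.0 + sum(c * c for c in r))

def averaged_bloch(r, dx, steps=4001):
    """Bloch vector of rho_avg: Gaussian average (mean 0, std dx) over the phase shift."""
    lo, hi = -8.0 * dx, 8.0 * dx
    h = (hi - lo) / steps
    acc = [0.0, 0.0, 0.0]
    norm = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h              # midpoint of the k-th quadrature cell
        w = math.exp(-0.5 * (x / dx) ** 2)  # unnormalised Gaussian weight
        rotated = bloch_after_phase(r, x)
        for i in range(3):
            acc[i] += w * rotated[i]
        norm += w
    return tuple(a / norm for a in acc)

def purity_loss_proxy(theta, dx):
    """F^L_Q = 2 * (Tr(rho_C^2) - Tr(rho_avg^2)) / dx^2 for the pure probe at polar angle theta."""
    r = (math.sin(theta), 0.0, math.cos(theta))
    dgamma = purity(r) - purity(averaged_bloch(r, dx))
    return 2.0 * dgamma / dx ** 2

def qfi_pure_qubit(theta):
    """F_Q = 4 Var(I_z) = sin^2(theta) for a pure qubit and generator I_z."""
    return math.sin(theta) ** 2

for dx2 in (1.0, 0.001):
    print(dx2, purity_loss_proxy(math.pi / 2, math.sqrt(dx2)), qfi_pure_qubit(math.pi / 2))
```

In this single-qubit setting the proxy equals sin^2(θ)(1 − e^{−(∆x)^2})/(∆x)^2, so it never exceeds F_Q and approaches it as ∆x shrinks.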
The advantage is that purity loss is far more amenable to direct measurement than quantum Fisher information [28]. To evaluate the efficacy of a candidate C, we apply the control sequence in parallel to obtain two copies of ρ_C. The rate of purity loss of the resulting outputs when subject to stochastic noise on φ can then be experimentally measured by application of suitable controlled-SWAP gates, which coherently swap output pairs controlled on an ancillary quantum mechanical degree of freedom (see Fig. 1c). We refer to this quantum algorithm as the quantum efficacy estimator, which can now be coupled with a suitable closed-loop learning algorithm for automated discovery of increasingly effective control sequences for sensing φ (see Fig. 1b).
In practice, we can use many different learning algorithms, ranging from simple direct search algorithms [29] to more complex evolutionary algorithms [30]. Here, we found the Nelder-Mead algorithm [31] to be particularly effective for our experiments. The entire learning process can then be summarised as follows: we begin by initializing a population of n+1 control sequences at random, and make use of the quantum efficacy estimator to sort them in order of decreasing efficacy, giving the population C^(g) = {C^(g)_1, ..., C^(g)_{n+1}}, with g = 0 indicating the 0th iteration. Meanwhile n is generally chosen to scale linearly with the number of actions (controls) we can apply in each particular time-step (e.g., if we have access to local rotations along the x and y axes on N qubits, then n scales linearly with N). Once the initial population is set, the Nelder-Mead algorithm stipulates a systematic method to generate a new candidate control through geometric considerations. This candidate replaces the worst performer C^(g)_{n+1} to form the population in the next iteration, which is then again sorted by decreasing efficacy to obtain C^(g+1). The exact mechanics of this algorithm involve mapping each control sequence into a vertex in some suitable convex space, the details of which are found in Methods. At each iteration g, we track the control sequence with maximum purity loss in C^(g). The procedure is then continued until some designated stopping condition is met, such as when this maximum purity loss becomes sufficiently stable over multiple rounds, or when a set number of iterations is reached. Once the stopping condition is hit, the control sequence with maximum efficacy is delivered as the recommended control sequence.
This optimization process scales as a polynomial with respect to the number of free parameters that specifies a candidate control sequence. This latter condition is typically true for realistic settings, where (1) we are typically limited to one and two-body interactions, such that the number

Example of sensing with spin chains
We illustrate these advantages numerically for a scenario with spin chains featuring unavoidable spin-spin interactions. Consider the case where we have access to N-qubit spin chains, and wish to use them to estimate the strength of some external magnetic field along the z axis. This problem aligns with estimating the phase φ of an N-qubit unitary U_φ = exp(−iφ Σ_{i=1}^N I^i_z), where I^i_z is the z component of the angular momentum operator of the i-th spin. If non-entangled qubits are used, the achieved Fisher information will scale as N. In contrast, the use of appropriately entangled probes can lead to a quantum Fisher information F_Q that scales as N^2, enabling us to achieve the Heisenberg limit.
While theory would enable us to work out the optimal probe, the catch here is that our control on the spin system is limited. Firstly, the chain evolves naturally according to the nearest-neighbor Ising coupling H_S = 2πJ Σ_{i=2}^N I^{i−1}_z I^i_z (J is the coupling strength). Secondly, our control of the system is limited to a sequence of local I_x and I_y interactions, whose strength we can adjust M times.
What is then the optimal way to adjust our control fields? The question is non-trivial.
The goal is to find some near optimal sequence, such that from some easily prepared initial probe |Ψ_i⟩ the resulting probe state |Ψ_f⟩ = U_C|Ψ_i⟩ is near optimal for estimating φ.
For sufficiently low N (up to 7), it is feasible to simulate our algorithm classically. In Fig. 2a,b, we plot the maximum F^L_Q and F_Q with respect to the particle number for two possible choices of ∆x. Observe that the purity loss becomes a better proxy for quantum Fisher information when ∆x is reduced, in agreement with Eq. 1. Indeed, at (∆x)^2 = 0.001, the relationship is almost exact. As such, our learning protocols produce results within 1% of the Heisenberg limit when using (∆x)^2 = 0.001.
To further verify the effectiveness of our algorithm, we compare its results with the theoretical optimum. In this specific case, theory indicates that the N-party entangled NOON states |Ψ_t⟩ = (|0⟩^{⊗N} + e^{iθ}|1⟩^{⊗N})/√2 are the optimal probes, saturating the Heisenberg limit [35]. Executing our closed-loop learning algorithm, we note that the learned optimal probe state |Ψ_f⟩ closely approximates these NOON states. In Fig. 2c, we list the fidelity ⟨Ψ_t|Ψ_f⟩ between |Ψ_f⟩ and the theoretical optimal |Ψ_t⟩. For small qubit numbers, the agreement is complete (fidelity = 1).
While limitations in computational resources (in evaluating F Q for example) do slowly degrade the fidelity as we increase particle number, there is still a match of over 0.95 when N = 7.
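The scaling claims of this example can be reproduced with a few lines of state-vector arithmetic. The sketch below uses the standard pure-state identity F_Q = 4 Var(H) and assumes the generator H = Σ_i I^i_z, which is diagonal in the computational basis. The helper names, and the choice of |+⟩^{⊗N} as the non-entangled reference probe, are illustrative assumptions.

```python
import cmath
import math

def qfi_collective_z(amplitudes, n_qubits):
    """F_Q = 4 Var(H) for a pure state and H = sum_i I_z^i (diagonal in the computational basis)."""
    mean = 0.0
    mean_sq = 0.0
    for index, amp in enumerate(amplitudes):
        ones = bin(index).count("1")
        h = 0.5 * (n_qubits - 2 * ones)  # eigenvalue of H on this basis state
        prob = abs(amp) ** 2
        mean += prob * h
        mean_sq += prob * h * h
    return 4.0 * (mean_sq - mean ** 2)

def noon_state(n_qubits, theta=0.0):
    """(|0...0> + e^{i theta}|1...1>)/sqrt(2) as a list of 2^N amplitudes."""
    dim = 2 ** n_qubits
    state = [0j] * dim
    state[0] += 1 / math.sqrt(2)
    state[dim - 1] += cmath.exp(1j * theta) / math.sqrt(2)
    return state

def product_plus_state(n_qubits):
    """Non-entangled reference probe |+>^{(x)N}."""
    dim = 2 ** n_qubits
    return [complex(1 / math.sqrt(dim))] * dim

def fidelity(psi, phi):
    """Overlap |<psi|phi>| between two pure states."""
    return abs(sum(a.conjugate() * b for a, b in zip(psi, phi)))

for n in range(1, 8):
    print(n, qfi_collective_z(product_plus_state(n), n), qfi_collective_z(noon_state(n), n))
```

The product probe gives F_Q = N while the NOON state gives F_Q = N^2. The `fidelity` helper computes the same kind of overlap that is used to compare a learned probe with the NOON target.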
The algorithm, itself, however, is not designed to run purely on classical computers. Indeed, the computational costs to do so scale exponentially with N . Tracking the dynamics of controls, and resulting purity loss becomes quickly intractable. However, when the algorithm is executed on a quantum processor, such information does not need to be tracked. In particular, we do not need to know the mathematical descriptions of the controls, nor the strength of the internal spin-spin interactions.

Proof-of-principle experiment
Our proof-of-principle experiment was conducted on a Bruker Avance III 400 MHz spectrometer using the sample diethyl-fluoromalonate at room temperature. This three-qubit nuclear magnetic resonance (NMR) processor consists of three spins, ^13C, ^1H and ^19F, which we label as qubits 1, 2 and 3. This processor enables us to engineer controlled-SWAP gates that coherently swap qubits 2 and 3, controlled on qubit 1 (see Methods and Supplementary Note 3). Thus, provided we can initialize both qubits 2 and 3 in a designated state ρ, we can experimentally measure the purity Tr(ρ^2) [28]. This processor is thus capable of realising the quantum efficacy estimator for single-qubit probes.
We illustrate the use of this device to estimate φ, encoded within the single-qubit unitary U_φ = e^{−iI_zφ}. Here the probe state is a single spin, which we can rotate along the x and y directions.
Assuming M total pulse segments, each candidate control sequence is now described by 2M free parameters. The resulting propagator U_C is the ordered product of the propagators of the M segments. Note that we have omitted the ∆t[m] which was present in the numerical simulation for the general N case, as the lack of a drift Hamiltonian makes this unnecessary. Our goal is then to find a control sequence C such that U_C|0⟩ has maximal quantum Fisher information with respect to φ.
We implement our closed-loop learning algorithm with a population of n + 1 = 7 control sequences, and M = 3 pulse segments. Each pulse sequence was set to T = Mτ = 30 µs. The key difference here from the numerics is that the efficacy is now evaluated directly using our NMR processor. For a particular candidate control sequence C, we first initialize each of qubits 2 and 3 of our processor into the state |0⟩. The control sequence C is then applied to both qubits, setting them each to some resulting candidate probe state ρ_C. Application of the controlled-SWAP circuit then enables estimation of γ_C = Tr(ρ_C^2) (see Fig. 1c).
Determination of γ_avg = Tr(ρ_avg^2) requires us to simulate the effects of applying U_{φ+X}, where X is a Gaussian random variable with standard deviation ∆x. This is a little more complex in the NMR regime, but can be done using a variation of stratified sampling (see Methods). Once done, we can then directly evaluate the efficacy estimator ∆γ = γ_C − γ_avg (see Fig. 1b). Thus our NMR processor is able to function as an effective quantum efficacy estimator.
This gives us all the tools in place for a quantum assisted closed-loop learning algorithm. To begin, we generated a random selection of 7 control sequences, denoted as C (0) . By evaluating their efficacy using the NMR processor, and feeding results into the Nelder-Mead algorithm, we can systematically produce subsequent populations C (1) , C (2) , . . .. We emphasize that the entire procedure was fully automated, such that this procedure can proceed ad-infinitum without intervention till stopping conditions are met.
In our experiment, we set the stopping condition as g = 25. To verify that optimizing purity loss indeed optimizes the efficacy of the probe, we experimentally extracted the best candidate probe state ρ. Fig. 3a illustrates candidate probes at various iterations, showing how our controls quickly converge on engineering probe states that are maximally coherent with respect to the computational basis, which is the requirement for a probe to be optimal for estimating φ.

DISCUSSION
Here, we proposed a quantum enhanced machine learning protocol for synthesizing effective probes for the purposes of quantum metrology. The protocol enables an automated method to discover what control sequences one should apply to a many-body quantum system in order to steer it into a state ideally suited for probing the phase φ of some unitary process e^{−iHφ}. We experimentally realized a proof-of-principle experiment using a 3-qubit NMR processor, where the device was able to discover control sequences which prepare probe states whose sensitivity to a desired φ (as measured by quantum Fisher information) is within 1% of theoretical optimal values.
Our numerics indicate this methodology can remain effective when engineering probes involving a large number of entangled qubits -even when these qubits possess uncontrollable spin-spin interactions.
There are a number of open questions. The first is the issue of noise. One of the benefits of our approach is that it automatically accounts for noise during the control process, and naturally finds the optimal control sequence that accounts for such noise. However, the evaluation of purity loss does require the addition of an extra controlled-SWAP gate, and extra noise introduced at this stage can potentially skew the results. Fortunately, our analysis (see Methods) demonstrates that the protocol is highly resistant to one dominant source of noise in NMR, namely dephasing, such that any amount of dephasing noise can be corrected for by repeating our purity estimation protocol a fixed number of times that does not scale with the size of the system. Sensitivity to other noise sources needs further investigation, and will likely require full tomographic data of the experimentally realized controlled-SWAP gate to correct.
As with all learning algorithms for solving intractable problems, there are of course caveats.
The main one is that our algorithm will not always efficiently find the optimal probe. Like all optimization processes, the Nelder-Mead algorithm can be potentially trapped in local optima.
Thus one particularly important line of future study would be the performance landscape of purity loss. In instances where this landscape is not ideal, our techniques can support multiple pathways for modification. Nelder-Mead, for example, could be replaced with genetic algorithms, neural networks or other means of machine learning [37][38][39]. Meanwhile, there may exist other indicators of efficacy that outperform purity loss in certain settings. Thus our closed-loop architecture could be modified to incorporate many possible alternative means of quantum-aided probe design.
Meanwhile, there will always be an ultimate limit to such learning algorithms. The reason is there is a polynomial equivalence between time-complexity in optimal control and quantum gate complexity 40,41 . Coupled with knowledge that most quantum circuits cannot be efficiently decomposed into fundamental gates, this means that the optimal probes can easily lie outside the set of states that can be synthesized through a control sequence with free parameters that grow as a polynomial of N . In such instances, an ideal solution simply does not exist. However, such situations may in fact represent scenarios where such learning protocols are most useful -for its optimization represents all control sequences that can be implemented in some bounded amount of time. As such the solution presented could be a good approximation for the best quantum probe we can synthesize with limited computation power.

Purity measurement in NMR
To establish the purity of ρ (g) avg , we made use of stratified sampling. Let x k be drawn by the stratified sampling method from the discretized Gaussian distribution with K samples and a variance of (∆x) 2 = 1.0721.
In our experiments, we divided the Gaussian distribution into K = 9 strata, such that ρ^(g)_avg ≈ (1/K) Σ_{k=1}^K ρ_{x_k}. The purity can then be estimated as follows: Tr[(ρ^(g)_avg)^2] ≈ (1/K^2) Σ_{j,k=1}^K Tr(ρ_{x_j} ρ_{x_k}). Hence estimation of the purity Tr[(ρ^(g)_avg)^2] was achieved by measuring each term Tr(ρ_{x_j} ρ_{x_k}) using the scheme of Fig. 1c, where qubits 2 and 3 were prepared in ρ_{x_j} and ρ_{x_k}, respectively.
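The pairwise estimate can be sketched for a single-qubit pure probe as follows. The sketch assumes equal weights 1/K for the K samples and uses the symmetry Tr(ρ_{x_j} ρ_{x_k}) = Tr(ρ_{x_k} ρ_{x_j}), so only K(K − 1)/2 + K distinct overlaps are evaluated. The function names and the sample phases in the usage lines are hypothetical, and the overlaps are computed directly from state vectors rather than measured.

```python
import cmath
import math

def encoded_probe(theta, x):
    """State e^{-i I_z x}(cos(theta/2)|0> + sin(theta/2)|1>) as a 2-component tuple."""
    return (cmath.exp(-0.5j * x) * math.cos(theta / 2),
            cmath.exp(0.5j * x) * math.sin(theta / 2))

def overlap(psi, phi):
    """Tr(rho_psi rho_phi) = |<psi|phi>|^2 for pure states."""
    inner = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(inner) ** 2

def purity_of_average(theta, xs):
    """Estimate Tr(rho_avg^2) from pairwise overlaps; also count the distinct overlaps used."""
    K = len(xs)
    states = [encoded_probe(theta, x) for x in xs]
    total = 0.0
    experiments = 0
    for j in range(K):
        total += overlap(states[j], states[j])  # diagonal term j = k
        experiments += 1
        for k in range(j + 1, K):
            total += 2.0 * overlap(states[j], states[k])  # terms (j, k) and (k, j)
            experiments += 1
    return total / K ** 2, experiments

# Hypothetical sample phases, for illustration only
estimate, count = purity_of_average(math.pi / 2, [-1.0, 0.0, 1.0])
print(estimate, count)
```

With K = 9 samples the count is 45 distinct overlaps, in line with the K(K − 1)/2 + K scaling.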

The Nelder-Mead algorithm
The Nelder-Mead algorithm functions by iteratively performing a series of geometric transformations on a simplex to get closer to the optimal control sequence. The simplex is a geometric shape consisting of n + 1 vertices, and each vertex represents a candidate control sequence C_i with i = 1, 2, ..., n + 1. Here, n is the product of the number of control directions and the number of slices of the control sequence, so the number of vertices follows directly from n. Based on the performance function (related to the efficacy estimator) defined for each candidate control, namely f(C_i) = 1 − ∆γ(ρ_{C_i}), this algorithm attempts to replace the worst vertex by a new, better one through the geometric transformations of reflection, expansion, contraction and shrinkage. Concretely, the procedure of the Nelder-Mead algorithm used in this study is as follows.
Step 1: Randomly generate an initial simplex with vertices {C_1, C_2, ..., C_{n+1}} and calculate their performance functions f_i = f(C_i). The amplitude of C_i in each slice is set in the range [−1000, 1000].
Step 2: Sort the vertices so that f(C_1) ≤ f(C_2) ≤ ... ≤ f(C_{n+1}), and calculate the centroid of the best n points by C̄ = (1/n) Σ_{i=1}^n C_i.
Step 3: Calculate the reflected point, C_r = C̄ + α(C̄ − C_{n+1}), and evaluate the performance function f_r = f(C_r), where the reflection factor is set as α = 1.
Step 4: Replace the worst vertex C_{n+1} and its performance function f_{n+1} by a better generated vertex according to one of the following conditions: (1) if f_1 ≤ f_r < f_n, accept the reflected point C_r. (2) if f_r < f_1, calculate the expanded point C_e = C̄ + γα(C̄ − C_{n+1}), evaluate its performance function f_e = f(C_e), where γ is the expansion factor, and accept the better of C_e and C_r. (3) if f_r ≥ f_n, calculate a contracted point between C̄ and the better of C_r and C_{n+1}, where the contraction factor is set as β = 0.5, and accept it if it improves on that point. (4) if none of the above produces a better vertex, shrink the simplex towards the best vertex by letting C_i = C_1 + δ(C_i − C_1) for i = 2, ..., n + 1, where δ is the shrinkage factor and set as 0.5.
Step 5: Check the stopping conditions, if not satisfied, change the iteration number with g = g + 1 and continue at Step 2.
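The steps above can be sketched as a compact routine. This is a minimal sketch under several assumptions rather than the implementation used in the experiment: the expansion factor is taken as the common textbook value γ = 2, the 2M = 6 control parameters are treated directly as rotation angles about in-plane axes (rather than pulse amplitudes), and the performance function uses the analytic single-qubit purity loss for Gaussian fluctuations with (∆x)^2 = 1 in place of an NMR measurement. All names are illustrative.

```python
import math
import random

ALPHA, GAMMA, BETA, DELTA = 1.0, 2.0, 0.5, 0.5  # GAMMA = 2 is an assumed textbook value

def rotate(r, ux, uy):
    """Rotate Bloch vector r by angle |u| about the in-plane axis (ux, uy, 0) (Rodrigues formula)."""
    angle = math.hypot(ux, uy)
    if angle == 0.0:
        return r
    k = (ux / angle, uy / angle, 0.0)
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(ki * ri for ki, ri in zip(k, r))
    cross = (k[1] * r[2] - k[2] * r[1],
             k[2] * r[0] - k[0] * r[2],
             k[0] * r[1] - k[1] * r[0])
    return tuple(ri * c + ci * s + ki * dot * (1 - c) for ri, ci, ki in zip(r, cross, k))

def performance(controls, dx2=1.0):
    """f(C) = 1 - purity loss for a qubit starting in |0> (Bloch vector +z)."""
    r = (0.0, 0.0, 1.0)
    for m in range(0, len(controls), 2):
        r = rotate(r, controls[m], controls[m + 1])
    dgamma = 0.5 * (r[0] ** 2 + r[1] ** 2) * (1.0 - math.exp(-dx2))
    return 1.0 - dgamma

def nelder_mead(f, simplex, iterations=25):
    """Minimise f over an (n+1)-vertex simplex in n dimensions; returns (f_best, C_best)."""
    pts = sorted(((f(c), list(c)) for c in simplex), key=lambda p: p[0])
    n = len(pts) - 1
    for _ in range(iterations):
        worst_f, worst = pts[-1]
        centroid = [sum(p[1][j] for p in pts[:-1]) / n for j in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst)
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        refl = along(ALPHA)
        f_r = f(refl)
        if pts[0][0] <= f_r < pts[-2][0]:      # reflection accepted
            pts[-1] = (f_r, refl)
        elif f_r < pts[0][0]:                  # try expansion
            expd = along(GAMMA * ALPHA)
            f_e = f(expd)
            pts[-1] = (f_e, expd) if f_e < f_r else (f_r, refl)
        else:                                  # outside or inside contraction
            cont = along(BETA if f_r < worst_f else -BETA)
            f_c = f(cont)
            if f_c < min(f_r, worst_f):
                pts[-1] = (f_c, cont)
            else:                              # shrink towards the best vertex
                best = pts[0][1]
                shrunk = [[b + DELTA * (x - b) for b, x in zip(best, p[1])] for p in pts[1:]]
                pts = [pts[0]] + [(f(v), v) for v in shrunk]
        pts.sort(key=lambda p: p[0])
    return pts[0]

random.seed(1)
start = [[random.uniform(-math.pi, math.pi) for _ in range(6)] for _ in range(7)]
best_f, best_c = nelder_mead(performance, start, iterations=25)
print(best_f)
```

Because the best vertex is never replaced by a worse one, the best performance value cannot increase from one iteration to the next. In this toy setting its lowest possible value is 1 − (1 − e^{−1})/2, reached by probes on the equator of the Bloch sphere.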

Effects of Decoherence
Here we analyze the effect of decoherence in our algorithm. This is relevant because our process for benchmarking the efficacy of a control sequence makes use of the same quantum device that will be used during the actual metrological process. Specifically, each iteration of the learning algorithm can be cast as the following procedure:
(A) Synthesize two copies of candidate probe states ρ_C corresponding to a candidate control sequence C.
(B) Synthesize two copies of the state ρ avg , by first preparing a second pair of copies of the candidate probe state ρ C , and then applying the physical encoding process of parameter φ subject to stochastic fluctuations separately to each copy.
(C) Estimate the purity loss due to stochastic fluctuations by experimentally measuring the purity of the resulting states from step (A) and step (B).
We can now consider the impact of decoherence in each of these three steps. The first thing to note is that decoherence in steps (A) and (B) respectively represents the intrinsic decoherence of our probe preparation device and that of the physical process it is trying to sense. As such, their inclusion in our learning process is actually desired. That is, as the device that is used to estimate the efficacy of the probes is the device that will eventually be used for metrology, we naturally want all decoherence within this device to be accounted for while benchmarking the efficacy of candidate control sequences. A similar argument also holds for decoherence when applying the physical process, as this decoherence will also exist during sensing.
Given these considerations, the only undesired decoherence is that which occurs during estimation of purity loss (step C). This procedure is done via the SWAP test, summarized as follows:
(i) Take two copies of ρ_C, and one ancillary qubit initialized in state |+⟩ = (|0⟩ + |1⟩)/√2.
(ii) Apply a controlled-SWAP gate to swap the pair of ρ_C, add a Hadamard gate to the ancillary qubit, and measure the expectation value ⟨I_z⟩ of the ancillary control qubit in the I_z basis (see Fig. 1c), to estimate Tr(ρ_C^2).
(iii) Repeat the above procedure for ρ avg to estimate Tr(ρ 2 avg ).
(iv) The difference ∆γ = Tr(ρ 2 C ) − Tr(ρ 2 avg ) is then used to estimate the efficacy of the control sequence C.
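Steps (i) and (ii) can be simulated directly for a pair of single-qubit pure states. The sketch below tracks the eight amplitudes of the ancilla plus the two probe qubits, and assumes the convention I_z = σ_z/2, so that the ancilla readout is ⟨I_z⟩ = Tr(ρσ)/2 for input states ρ and σ. The function name and this normalisation are assumptions made for illustration; mixed inputs such as ρ_avg can be handled by averaging over pairs of pure states.

```python
import math

def swap_test_iz(psi, phi):
    """Ancilla <I_z> after the SWAP test on two pure single-qubit states psi and phi."""
    s = 1 / math.sqrt(2)
    # Amplitudes indexed [a][i][j]: ancilla a starts in |+>, probes in psi (index i) and phi (index j)
    amp = [[[s * psi[i] * phi[j] for j in range(2)] for i in range(2)] for _ in range(2)]
    # Controlled-SWAP: exchange the two probe qubits when the ancilla is |1>
    amp[1] = [[amp[1][j][i] for j in range(2)] for i in range(2)]
    # Hadamard on the ancilla
    plus = [[s * (amp[0][i][j] + amp[1][i][j]) for j in range(2)] for i in range(2)]
    minus = [[s * (amp[0][i][j] - amp[1][i][j]) for j in range(2)] for i in range(2)]
    p0 = sum(abs(plus[i][j]) ** 2 for i in range(2) for j in range(2))
    p1 = sum(abs(minus[i][j]) ** 2 for i in range(2) for j in range(2))
    return 0.5 * (p0 - p1)  # <I_z> = <sigma_z>/2

zero = (1.0, 0.0)
one = (0.0, 1.0)
plus_state = (1 / math.sqrt(2), 1 / math.sqrt(2))
print(2 * swap_test_iz(zero, zero), 2 * swap_test_iz(zero, one), 2 * swap_test_iz(zero, plus_state))
```

Doubling the readout recovers the overlap Tr(ρσ): 1 for identical states, 0 for orthogonal states and 1/2 for |0⟩ against |+⟩.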
Noise and decoherence during this procedure can affect the accuracy with which we estimate purity loss. In general, the effect is likely non-trivial, and tomography will be needed to work out how noise alters ⟨I_z⟩ so that this error can be corrected for.
In the case of NMR, the dominant source of noise is dephasing. This dephasing noise can be described by a non-unitary channel ε_i(ρ) = (1 − p)ρ + 4p I^i_z ρ I^i_z that acts on each qubit separately, where I^i_z denotes the angular momentum operator acting on the i-th qubit and p is the strength of the dephasing. Following an error analysis similar to that of other NMR experiments that employ controlled gates [42], we see that this noise does not change the relative order of our purity loss estimates. That is, provided there is a sufficient number of repetitions, our conclusion of which control sequence has greater purity loss between two candidates will not change under dephasing.
In particular, let ⟨I_z⟩_p denote the expectation value of I_z under dephasing strength p. Then our measured purity has expectation value ⟨I_z⟩_p = (1 − p)^2 ⟨I_z⟩ with variance bounded above by 1.
To correctly compare two probe states whose purity loss differs by at least δ requires each purity measurement to have a variance less than δ^2/4 (as differences in purity loss involve four additive purity measurements). This is guaranteed provided we repeat our measurement process a number of times that depends only on p and δ. Notably, this overhead does not scale with N, and thus the protocol remains efficient. In our experiment, p is approximately 0.025, thus we are able to correctly rank control sequences whose purity loss differs by more than 0.045.

DATA AVAILABILITY
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

COMPETING INTERESTS
The authors declare that there are no competing interests.

AUTHOR CONTRIBUTIONS
X.P. initiated the project. X.P., J.T. and M.G. conceived the basic procedure. X.P. and X.Y designed the experimental protocol. X.Y. carried out the experiment and analysed the data. All authors contributed to discussing the results and writing the manuscript.

ADDITIONAL INFORMATION
Supplementary Information is available for this paper.
Fig. 1. a, General procedure of quantum metrology, including probe state preparation (applying controls to the initial probe state to generate a candidate probe ρ_C), encoding some parameter φ by application of U_φ to ρ_C, and measurement read-out. b, A candidate control sequence C is evaluated for efficacy through quantum information processing. This involves using C to prepare copies of the candidate probe state ρ_C, half of which are transformed into ρ_avg. The purities of ρ_C and ρ_avg are then measured, and their difference, the purity loss, is used as a proxy for efficacy. c illustrates implementation of this process in experiment. The parameter encoding with fluctuations in the dotted box is switched off to determine the purity of ρ_C, and on to determine the purity of ρ_avg. Their difference is fed into a classical computer running a Nelder-Mead algorithm that generates candidate control sequences for subsequent iterations. In our experiment, this process is automated, such that control fields are tuned automatically at each iteration.

We see that both begin at low values at g = 0 as expected for random probes, and improve markedly during the learning process. c and d illustrate that the efficacy of the discovered probes approaches the Heisenberg limit, indicating their near optimality. Meanwhile setting ∆x to be smaller seemed to be marginally more advantageous, likely owing to the closer agreement between purity loss and quantum Fisher information in this regime. In e, we show a table of the fidelity between the quantum probe states generated and the closest theoretically optimal probe.

Supplemental Material to "Probe optimization for quantum metrology via closed-loop learning control"

Note S1. Numerical simulation details when sensing with spin chains
To show the advantages of the proposed protocol, we consider estimating the phase φ on N-qubit spin chains evolving under nearest-neighbor Ising couplings H_S = 2πJ Σ_{i=2}^N I^{i−1}_z I^i_z, as described in the main text. Local control fields along the x and y directions of each qubit are applied to search for the optimal probe, therefore each slice of the control sequence can be expressed as an evolution operator U_m, where I^i_{x,y,z} are the spin angular momentum operators of the i-th qubit and J is a fixed coupling constant. In order to search for the optimal controls more conveniently, we transform the above evolution operator U_m into another commonly used form. Theory tells us the optimal states for this case can be expressed in the form of NOON states, i.e., |Ψ_t⟩ = (|0⟩^{⊗N} + e^{iθ}|1⟩^{⊗N})/√2. We have listed their overlaps in the main text, and here we list the non-zero matrix elements of ρ_f = |Ψ_f⟩⟨Ψ_f|, i.e., ρ^{11}_f, ρ^{1n}_f, ρ^{n1}_f, ρ^{nn}_f, in Table S1, where n = 2^N is the matrix dimension.
In our protocol, the to-be-estimated parameter φ is supposed to undergo some stochastic fluctuations. Though the types of these fluctuations can be quite diverse, provided that they are sharply distributed around φ [S1], a common and reasonable expectation is a Gaussian distribution. Here, we establish a systematic method to simulate this kind of Gaussian fluctuation. The density function for the Gaussian distribution can be expressed as Y(x) = (1/(√(2π)∆x)) e^{−(x−µ)^2/(2(∆x)^2)}, the corresponding probability function is G(x) = (1/2)[1 + erf((x−µ)/√(2(∆x)^2))], and the inverse probability function is x = G^{−1}(z) = √2 ∆x erfinv(2z − 1) + µ, where erf(·) is the error function and erfinv(·) is the inverse error function. In order to conveniently choose K samples from the distribution, we use the stratified sampling method. First, we divide the total probability into K equal ranges, thus the range bounds are t_k = G^{−1}(k/K), k = 1, 2, ..., K − 1. Afterwards, we select one sample per range. Finally, K samples are successfully selected which can fit a very accurate target Gaussian distribution.
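The stratification can be sketched with the inverse probability function provided by Python's standard library. The range bounds follow t_k = G^{−1}(k/K) as stated above. The rule for picking the sample inside each range is not recoverable from the text, so the sketch assumes the point at the probability midpoint of each range, x_k = G^{−1}((k − 1/2)/K). This choice and the function name are assumptions for illustration only.

```python
from statistics import NormalDist

def stratified_gaussian_samples(K, mu=0.0, dx=1.0):
    """Return the K - 1 range bounds and one sample per equal-probability range."""
    dist = NormalDist(mu, dx)
    # Range bounds t_k = G^{-1}(k/K), k = 1, ..., K - 1
    bounds = [dist.inv_cdf(k / K) for k in range(1, K)]
    # Assumed selection rule: the probability midpoint of each range
    samples = [dist.inv_cdf((k + 0.5) / K) for k in range(K)]
    return bounds, samples

bounds, samples = stratified_gaussian_samples(9)
print(bounds)
print(samples)
```

Each sample falls strictly inside its own range, and for odd K the middle sample sits at the mean µ.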
With the above Gaussian distribution simulation method in hand, we now choose the parameters ∆x and K. Consider single-particle probe states here, we show F L Q for several probe states (just take three examples) when the parameter φ undergoes Gaussian fluctuation of different strength ∆x in Fig. S1a. One can easily find that the controls will engineer different initial pure probe states to the same optimal one (maybe with some global phase). Therefore, the red dots show the biggest F L Q we can get for the chosen ∆x no matter what the initial probe state is. What's more, we should balance the biggest F L Q with the direct observation value, i.e., the purity loss ∆γ = F L Q (∆x) 2 /2, so we plot the corresponding maximum ∆γ with respect to (∆x) 2 in Fig. S1b. Both of F L Q and ∆γ should be big enough to get reliable and observable results, thus we choose (∆x) 2 = 1 in our experiments. On one hand, as we analyzed in the Methods, suppose we select K samples from the Gaussian distribution, the total number of the experiments required to measure Tr(ρ 2 avg ) is C(K, 2) + K = K(K − 1)/2 + K. This indicates that we can't choose too many samples as the complexity increases quadratically. On the other hand, the sample size K should be large enough to fit a reasonable Gaussian distribution while meeting the precision requirements. Using the sampling method described above, we set µ = 0, (∆x) 2 = 1 and vary the sample size in a very broad range. Finally, in consideration of the experimental time and the precision, we choose K = 9 to implement our demonstrative experiments. We show two cases K = 9 and  i<j J ij I i z I j z , where ∆ω i 0 represents the offset of the i-th spin in the rotating frame, J ij is the J-coupling strength between the i-th and j-th spin, and I i z is the spin angular momentum operator of i-th spin. 
Starting from the thermal equilibrium state, we used the line-selective method [S2] to prepare the pseudo-pure state $\rho_{\mathrm{pps}} = \frac{1-\varepsilon}{8} I + \varepsilon\,|000\rangle\langle 000|$, where ε ≈ 10⁻⁵ represents the thermal polarization of this three-qubit system. Since the identity matrix has no observable effect, the system is effectively initialized in the state |000⟩. Standard state tomography [S3] confirmed this preparation with a fidelity of 0.992.
Decouple unwanted interactions. The parameter-encoding operator $U_\varphi = e^{-iI_z\varphi}$ was applied to the probe spins. Because we used a copy of the probe and an ancilla, the total evolution operator should be $I \otimes e^{-i\varphi I_z} \otimes e^{-i\varphi I_z}$. To realize the same evolution on the last two coupled spins, we used decoupling techniques [S4] to cancel the effects of the J-coupling evolution. Based on the system Hamiltonian $H_S$, the explicit pulse sequences that achieve this are shown in Fig. S2c, where the offsets of the three spins are set to $\Delta\omega_0^1 = 0$ and $\Delta\omega_0^2 = \Delta\omega_0^3 = 2\pi \times 50$ rad/s. Thus, for a given encoding time $T_e$, the remaining free evolution of spins 2 and 3 induces an encoding parameter $\varphi = \Delta\omega_0^2 T_e = \Delta\omega_0^3 T_e$. For the chosen parameter φ = π/3, $T_e$ = 0.003333 s.
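As a quick consistency check of the quoted encoding time, the relation between phase, offset and time can be evaluated directly. The sketch assumes the offset is specified as a frequency of 50 Hz (angular frequency 2π × 50 rad/s), so that φ = 2π × 50 Hz × T_e; the function name is hypothetical.

```python
import math


def encoding_time(phi, offset_hz):
    """Encoding time T_e (s) that accumulates phase phi (rad) at the given offset (Hz)."""
    return phi / (2.0 * math.pi * offset_hz)


# Values from the text: phi = pi/3, offset 50 Hz.
T_e = encoding_time(math.pi / 3, 50.0)
```

The result is 1/300 s, which rounds to the 0.003333 s stated above.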
Construct pulse sequences for circuits. Fig. S2a shows the explicit quantum circuit of our proposed control-sequence learning protocol and of the parameter readout procedure. To realize these quantum circuits, we need to decompose every quantum gate into a sequence of basic operations that can be implemented directly, a decomposition that depends on the specific quantum architecture in use. Any quantum circuit in an NMR system can be decomposed into single-qubit rotations and free evolution under two-spin $I_z I_z$ couplings. Based on the molecular structure and Hamiltonian parameters of the sample used in the experiments, ¹³C-unlabeled diethyl-fluoromalonate, which are listed in Fig. S2b, the pulse sequence for the purity measurement circuit, comprising the controlled-SWAP gate and two Hadamard gates, is shown in Fig. S2d.
State tomography for measuring $F_Q$. As stated above, the quantum Fisher information $F_Q(\rho_C^{(g)})$ of the best candidate probe state in each iteration was obtained by extracting the probe state $\rho_C^{(g)}$ from a full three-qubit state tomography (using a partial trace to obtain the density matrix of the probe state). Here, we show the three-qubit full tomography and the reduced single-qubit tomography results for the 1st, 10th, 20th and 25th iterations in Fig. S4. In each row of this figure, from left to right, the subfigures are the real part of the total three-qubit system state, the imaginary part of the total three-qubit system state, the real part of the single-qubit probe state and the imaginary part of the single-qubit probe state, respectively.

Note S4. Optimal controls learning: another independent trial

Fig. S3: Each candidate probe state in iteration g is specified by $\rho_C^{(g)} = |\psi\rangle\langle\psi|$ with $|\psi\rangle = \cos(\delta/2)|0\rangle + \sin(\delta/2)e^{i\phi}|1\rangle$. a plots the candidate probes discovered during iterations 1, 10, 20 and 25 as overhead projections on the Bloch sphere. Here (ϕ, sin δ) are mapped to polar coordinates, such that sin δ is the magnitude (the displacement of the point from the center) and ϕ is the angle relative to the x-axis. The points in each plot are color coded according to their efficacy. The plots directly depict the convergence of the sequentially discovered candidate probes to the optimal probe (demarcated by δ = π/2). b plots the purity loss of these probe states (circles, with values given by the left axis), together with the maximal purity loss over all candidates in each iteration (solid blue line). The red line (with values given by the right axis) plots the quantum Fisher information achievable by the associated probe state, should it be used to sense φ. These results illustrate that the learning algorithm converges quickly to near-optimal values by the 10th iteration. c plots the associated control fields in the x and y directions (orange and pink bars) used to generate the optimal probe of iterations 1, 10, 20 and 25.
To further verify our experimental results, we ran the whole controls learning experiment again. As before, we set the stopping condition to g = 25. Fig. S3b plots the resulting purity loss of the various control sequences in $C^{(g)}$ for each iteration g. Fig. S3c shows the sliced control sequences along the x and y directions that give the maximum purity loss in the 1st, 10th, 20th and 25th iterations. We see that these control sequences quickly converge, and that the resulting purity loss becomes almost maximal within 15 iterations.
To verify that optimizing purity loss indeed optimizes the efficacy of the probe, we experimentally extracted the best candidate probe state, $\rho_C^{(g)}$, in each iteration and evaluated its quantum Fisher information. The results are plotted in Fig. S3b, illustrating that the increase in efficacy of the probes closely follows the increase in purity loss. Moreover, the final Fisher information obtained is 0.9836 ± 0.0017 (statistics over the last 5 iterations), which is very close to the theoretical maximum of 1. Finally, Fig. S3a shows the candidate probes at various iterations, illustrating how our controls quickly converge on engineering probe states that are maximally coherent with respect to the computational basis, which is the requirement for a probe to be optimal for estimating φ.
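For a single-qubit probe and the generator $I_z = \sigma_z/2$, the quantum Fisher information can be read directly off a tomographic density matrix: the standard qubit result is $F_Q = r_x^2 + r_y^2 = 4|\rho_{01}|^2$, where $(r_x, r_y, r_z)$ is the Bloch vector. The sketch below applies this formula to the pure-state parametrization used in the Fig. S3 caption; it is an illustration with hypothetical function names, not the data-analysis code behind the reported values.

```python
import cmath
import math


def probe_state(delta, phi):
    """Density matrix of cos(delta/2)|0> + sin(delta/2) e^{i phi}|1>."""
    a = math.cos(delta / 2)
    b = math.sin(delta / 2) * cmath.exp(1j * phi)
    return [[a * a, a * b.conjugate()],
            [b * a, abs(b) ** 2]]


def qfi_iz(rho):
    """QFI of a single-qubit state for the generator I_z: 4 |rho_01|^2."""
    return 4.0 * abs(rho[0][1]) ** 2


f_equator = qfi_iz(probe_state(math.pi / 2, 0.3))  # delta = pi/2: optimal probe
f_pole = qfi_iz(probe_state(0.0, 0.0))             # |0>: insensitive to phi
f_mixed = qfi_iz([[0.5, 0.0], [0.0, 0.5]])         # maximally mixed state
```

For pure states the formula reduces to sin²δ, so the maximum of 1 is reached exactly on the equator of the Bloch sphere, in line with the convergence towards δ = π/2 described above.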
Fig. S4: The full state tomography results for the entire three-qubit system state and the extracted single-qubit probe state. These probes are the optimal states discovered in the 1st, 10th, 20th and 25th iterations. In each row, from left to right, the figures are the real part of the total three-qubit system state, the imaginary part of the total three-qubit system state, the real part of the single-qubit probe state and the imaginary part of the single-qubit probe state, respectively.

Note S5. The procedure of reading out the encoded phase φ

After finding the optimal control pulses $C_{\mathrm{opt}}$, we could complete the metrology task of estimating φ = π/3 on a single probe spin governed by $e^{-iI_z\varphi}$. We first prepared the total system in the state |0⟩⟨0| ⊗ |0⟩⟨0| ⊗ |0⟩⟨0|. The optimal control $C_{\mathrm{opt}}$ found in the optimization stage was then applied to probe spin 2. Thereafter, we implemented the parameter encoding on probe spin 2 by setting its offset to $\Delta\omega_0^2/2\pi = 50$ Hz and varying the encoding time $T_e$ from 0 to 0.02 s. Finally, a Hadamard gate was applied, followed by a measurement in the computational basis (state |0⟩), with the other two spins decoupled. The measurement results form a curve of the form $f(\tilde\omega) = \frac{1}{2} + \tilde{a}_2\cos(\tilde\omega t) - \tilde{a}_3\sin(\tilde\omega t) + h$. The fitted $\tilde\omega$ gives the parameter of interest through $\tilde\varphi = \tilde\omega T_e$, where $T_e$ = 0.003333 s for φ = π/3. For comparison, we performed a similar experiment without controls learning, in which the probe spin was prepared in ρ₁ (ρ₂ for the second trial in Note S4), the state induced by the controls found in the 1st iteration.

Table S2 lists the fitting results of the readout curves, and the experimental results of the parameter readout process are shown in Fig. S5. We used the least-squares method to fit the curves and estimate the phase. In addition, we used the fringe visibility $v = (A_{\max} - A_{\min})/(A_{\max} + A_{\min})$ to directly estimate the phase uncertainty [S5] in this noisy experiment, where $A_{\max}$ and $A_{\min}$ are the maximum and minimum of the observed amplitude. Without controls learning, for the probe state ρ₁, we obtained $\tilde\varphi$ = 0.3128π and v = 0.4945. The optimal controls improved this result to $\tilde\varphi$ = 0.3217π with v = 0.9863, as shown in Fig. S5a, giving a phase sensitivity enhancement factor of 0.9863/0.4945 = 1.9945. For the probe state ρ₂ the results are similar: $\tilde\varphi$ = 0.3065π with v = 0.2569 without controls learning and $\tilde\varphi$ = 0.3157π with v = 0.9142 with controls learning, as shown in Fig. S5b, giving a phase sensitivity enhancement factor of 0.9142/0.2569 = 3.5586.
From these results, we clearly see that, when the optimal controls are included, the estimated encoding parameter is closer to the true value (π/3) and is obtained with higher precision.
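The figures of merit quoted in this note can be reproduced from the reported numbers with a few lines of Python. The sketch below takes the visibilities and fitted phases from the text as inputs; the helper names are hypothetical, and the least-squares fit itself is not reproduced here.

```python
import math


def visibility(a_max, a_min):
    """Fringe visibility v = (A_max - A_min) / (A_max + A_min)."""
    return (a_max - a_min) / (a_max + a_min)


def enhancement(v_with, v_without):
    """Ratio of visibilities with and without the learned controls."""
    return v_with / v_without


def phase_error(phi_est_over_pi, phi_true=math.pi / 3):
    """Absolute deviation of an estimate (given in units of pi) from the true phase."""
    return abs(phi_est_over_pi * math.pi - phi_true)


# Reported values for the two trials (rho_1 and rho_2).
gain_1 = enhancement(0.9863, 0.4945)
gain_2 = enhancement(0.9142, 0.2569)
err_1_without, err_1_with = phase_error(0.3128), phase_error(0.3217)
err_2_without, err_2_with = phase_error(0.3065), phase_error(0.3157)
```

In both trials the deviation from π/3 shrinks once the learned controls are applied, consistent with the statement above.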