Preparation of Metrological States in Dipolar-Interacting Spin Systems

Spin systems are an attractive candidate for quantum-enhanced metrology. Here we develop a variational method to generate metrological states in small dipolar-interacting ensembles with limited qubit controls and unknown spin locations. The generated states enable sensing beyond the standard quantum limit (SQL) and approaching the Heisenberg limit (HL). Depending on the circuit depth and the level of readout noise, the resulting states resemble Greenberger-Horne-Zeilinger (GHZ) states or Spin Squeezed States (SSS). Sensing beyond the SQL holds in the presence of finite spin polarization and a non-Markovian noise environment.

Introduction. Spin systems have emerged as a promising platform for quantum sensing [1][2][3][4] with applications ranging from tests of fundamental physics [5,6] to mapping fields and temperature profiles in condensed matter systems and life sciences [3]. Improving the sensitivity of these qubit sensors has so far largely relied on increasing the number of sensing spins and extending spin coherence through material engineering and coherent control. However, with increasing spin density, dipolar interactions between individual sensor spins cause single-qubit dephasing [7,8] and, in the absence of advanced dynamical decoupling [9][10][11], set a limit to the sensitivity.
Although dipolar interactions in dense spin ensembles lead to complex evolution, they can provide a resource for the creation of metrological states that enable sensing beyond the SQL. Current approaches to create such states (i.e., GHZ states and SSS) either require all-to-all interactions [12][13][14] or single-qubit addressability [15,16], which are challenging to implement experimentally. An alternative approach that relies on adiabatic state preparation requires less control but results in preparation times that increase exponentially with system size [17,18], leaving this method susceptible to dephasing.
Variational methods provide a powerful tool for controlling many-body quantum systems [19][20][21][22]. Such methods have been proposed for Rydberg-interacting atomic systems [23,24] and demonstrated in trapped ions [25]. However, these techniques rely on effective all-to-all interactions (e.g. almost constant interaction strength inside the Rydberg radius [23,26,27]) which are generally absent in solid-state spin ensembles. In this work, we develop a variational algorithm that drives dipolarinteracting spin systems [ Fig. 1(a)] into highly entangled states. The resulting states can be subsequently used for Ramsey-interferometry-based single parameter estimation [1]. The required system control relies solely on uniform single-qubit rotations and free evolution under dipolar interactions. The optimization can be directly performed on an experimental device using only its measurement outcomes without the need to know the spatial distribution of the spins (later referred as 'spin configuration'). Potential experimental platforms include dipolarinteracting ensembles of NV centers, nitrogen defects in diamond (P1), rare-earth-doped crystals, and ultra-cold molecules.
Variational Ansatz. As shown in Fig. 1(b), the variational circuit S(θ) = U m ...U 2 U 1 is constructed by m layers of unitary operations. Each U i consists of the parameterized control gates where R µ (ϑ) = exp −iϑ . The coupling strength between two spins at positions r i and r j is with µ 0 the vacuum permeability, the reduced Planck constant, γ the spin's gyromagnetic ratio, and β ij the angle between the line segment connecting (r i , r j ) and the direction of bias magnetic field. An evolutionary algorithm [29,30] is applied on the m-layer circuit which contains 3m free parameters constituting the vector θ = (τ 1 , ϑ 1 , τ 1 , ..., τ i , ϑ i , τ i , ..., τ m , ϑ m , τ m ). Each τ i is restricted to τ i ∈ [0, 1/f dd ] wheref dd is the average nearest-neighbor interaction strength for the considered spin configuration. The Ansatz in Eq. (1) is the most general set of global single-qubit gates that preserves the initial collective spin direction i S i /| i S i |, here chosen to be the x-direction [23,30]. Although this Ansatz The quantum circuit consists of three parts: a sequence for generating entanglement (entangler), phase accumulation (Ramsey) and single-qubit readout in the Sz basis. Dipolar interactions during Ramsey interference are eliminated by dynamical decoupling [7,9,28]. The measurement outcome is processed on a classical computer and used to determine the next generation for θ. (c) Gate sequence of each variational layer and the Wigner distributions for a 5-spin state after each gate. (d) Illustration of an optimization process on a 3-spin system with m = 1. The contour plots show the 2D projection of the multidimensional θ space for fixed ϑ1. The orange points mark the sampling positions in the parameter space. Convergence to the global maximum is reached in the 63th generation.
does not enable universal system control [30][31][32][33], we show that with increasing circuit depth, sensing near the HL can be achieved.
Metrological cost function. The Ramsey protocol shown in Fig. 1(b) encodes the quantity of interest in the accumulated phase φ = ωt R , with ω the detuning frequency and t R the Ramsey sensing time. The Classical Fisher Information (CFI) [1,30] quantifies how precisely one can estimate an unknown parameter φ under a measurement basis. Our variational approach treats the spin systems as a black-box for which the algorithm finds a control sequence that maximizes the CFI associated with the parameter estimation problem The sum runs over the 2 N basis states |z = ⊗ N i=1 |s z i , where s z i are the eigenvalues of S z i . P z ≡ |z z| denotes the corresponding measurement operator and ρ φ the density matrix. The CFI is chosen as cost function because it is a measure for the maximal achievable sensitivity for a given measurement basis [1,34]. Likewise, P z provides the maximal information that can be gained from singlequbit measurements. However, measurement operators such as parity or total spin polarization result in a smaller outcome space and are therefore more efficient in experimental implementations. While this study optimizes the measurement for P z , the obtained results likewise hold for parity and total spin polarization [30].
Numerical results for regular and disordered spin configurations. We start by testing our approach for three distinct regular spin configurations. Figure 2(a) shows the CFI after optimization for spins arranged on a linear chain (blue), a two-dimensional (2D) square lattice (orange), and a circle (green). All three configurations result in states with CFI above the SQL. When multiple circuit layers are added, the CFI further improves. Next, we simulate the case of disordered three-dimensional (3D) spin configurations (later referred as 3D-random). In our simulations the spins are randomly located in a box of length L ∝ N 1/3 (constant spin density). Compared to the regular spin array, the disordered case shows a noticeable saturation of the CFI as a funciton of N . With increased circuit depth, sensing precision beyond the SQL Average CFI for 50 configurations of 3D-randomly distributed spins. (c) Average number of layers required to achieve a CFI within a given percentage of the HL in the case of 3D-random configuration. The fit m = aN b +c with b = 2.45 (goodness of fit R 2 = 0.996) serve as a guide to the eye. The same data also fits to an exponential model with slightly lower R 2 = 0.995.
is maintained. The required circuit depth increases drastically with N [ Fig. 2(c)]. Characterization of entanglement. We investigate the N -qubit entangled states created by our variational method. Figure 3(a) shows the corresponding Wigner distributions [35][36][37] for a regular 2D spin array (top) and the average Wigner distributions for 50 different 3Drandom spin configurations (bottom). In both cases, the optimized states resemble GHZ states when N is small and m is large. For large N and small m, the states are close to SSS. Non-Gaussian states that provide sensitivity beyond the SSS but lower than GHZ states are also generated. Our algorithm tends to drive the systems into a GHZ state, as it has the unique property of attaining the HL in Ramsey spectroscopy [38].
For quantitatively analysing the buildup of entanglement, the von-Neumann entanglement entropy (E vN = − Tr(ρ s log 2 ρ s )) [39] is used as a measure for the degree of entanglement between a spin subsystem (ρ s = Tr s ρ tot ) and the remaining system. As an example, we explore one case of a 3D-random configuration of 9 spins. Figure 3(b) shows the von-Neumann entropy of each spin after employing a 2-layer circuit (left) and a 7-layer cir-cuit (right). In the case of m = 2, the achieved degree of entanglement is modest with spin No.6 for example showing no substantial entanglement with the remaining spins. When the circuit depth is increased to 7, all spins display substantial entanglement. While the singleparticle entropy detects spins unentangled with the remaining system, it does not determine whether all spins are entangled with each other or entanglement is local. We distinguish these two scenarios by identifying the smallest clusters with E vN ≤ 0.4. For m = 2, the spin ensemble segments into 5 clusters [ Fig. 3(b)], while for m = 7 only 2 clusters are found. The results verify that multiple layers are required to overcome the anisotropy of the dipolar interaction (Eq. (2)) when building up entanglement over the entire system. Finally, in Fig. 3(c) we analyze the size of the largest cluster for each of the 50 spin configurations and observe an overall increase of the largest cluster size and a decrease of the variance.
State preparation time. Minimizing the preparation time is central in practical applications, as it increases bandwidth, reduces decoherence, and enables more measurement repetitions [1]. Figure 3(d) shows the average state preparation time for 8 spins in 50 different 3D- random configurations as a function of layer number. The preparation time increases with the layer number and is inversely proportional to the average dipole coupling strength of the nearest-neighbour spinsf dd . Compared to adiabatic methods [17], our approach results in an 11× reduction of the preparation time to reach the same CFI for identical spin number and density [30].
State preparation under decoherence, initialization and readout errors Until now our analysis assumed full coherence and perfect spin initialization and readout. However, dephasing, initializaiton, and readout errors will be limiting factors in experimental implementations. We next examine the impact of such imperfections on state preparation and sensing. Figure 4(a) shows the CFI in the presence of imperfect initialization and finite readout fidelity for spins on a 2D square lattice. For N ≤ 10, beyond-SQL precision is reached with 90% initialization and 95% readout fidelity, respectively.
Next, we investigate the resulting states when readout errors are added into the optimizer. Figure 4(b) indicates that without readout errors, the Wigner distribution of the resulting state is close to a GHZ state. However, with a finite readout error rate, our algorithm drives the system into a state resembling a SSS. When the read-out noise is further increased, the SSS transforms into a coherent spin state (CSS). The results agree with the fact that GHZ states are sensitive to single-spin readout errors while SSS are more robust [40].
During the state preparation, decoherence (T 2 ) reduces entanglement. We assume independent, Markovian dephasing of each spin as described by a Lindblad master equation [39]. Figure 4(c) shows the CFI for various T 2 times using the previously optimized gate parameters for 2D square lattice. While a finite T 2 decreases the CFI, coherence times exceeding 5/f dd result in states with beyond-SQL sensitivity. Here, f dd denotes the nearest-neighbor interaction strength for 2D square lattice. Since performing optimization with imperfections is numerically expensive, the results in Fig. 4(a), (c), (d) are obtained by optimizing the parameters in the absence of imperfections and using those parameters to compute the CFI under imperfect conditions. Thus, better results are expected if the optimization is directly run on experiments.
Sensitivity in a non-Markovian environment. In addition to impacts on state preparation, dephasing affects performance in Ramsey interferometry. In the presence of spatially uncorrelated Markovian noise, entanglement does not lead to a beyond-SQL scaling [48,49]. In a non-Markovian environment, such as that of a solid-state spin system, this limitation does not hold [50,51]. We examine the performance of our optimized states in a non-Markovian noise environment. We adopt a noise model [50] in which the amplitude of single-spin coherence reduces according to where ν is the stretch factor set by the noise properties. The time evolution under Ramsey propagation is simulated with a generalized Lindblad master equation [30,51]. The sensing performance of optimized states is characterized by the square of the signal-noise-ratio SNR 2 ∝ CFI ω /t R [30]. Figure 4(d) shows their performance compared to the CSS and the GHZ states for a ν = 2 and ν = 4 noise exponent [8]. The created entangled states provide an advantage over uncorrelated states. For small spin numbers, the SNR follows the HL scaling [50].
Proposed experimental platforms. Candidate systems for realizing the proposed variational approach need to possess long T 2 coherence time, strong dipolarinteracting strength, and high initialization and readout fidelity. Recent developments in solid-state spin systems and ultracold molecules have demonstrated coherence times that exceed dipolar coupling times (1/f dd ) as well as high-fidelity spin initialization and readout. Table S1 lists the experimentally observed parameters for different candidate systems, including Nitrogen Vacancy (NV) ensembles, P1 centers in diamond, rare-earth doped crystals, and ultracold molecule tweezer systems [30].
Conclusion and Outlook. This work introduces a variational circuit that generates entangled metrological states in a dipolar-interacting spin system without requiring knowledge of the actual spin configuration. The required system parameters are within the reach of several experimental platforms. While this study remains limited to small system sizes (N ≤ 10, limited by computational resource), our results are of immediate interest to nanoscale quantum sensing where spatial resolution is paramount and the finite sensor size limits the number of spins that can be utilized. Extending our results to N > 10 can either be achieved by utilizing symmetries in regular arrays or directly testing our optimization algorithms on an actual experimental platform. The developed method is also potentially applicable for preparing other relevant highly entangled states in quantum computing and quantum communication.  In this section, we discuss how to choose the experimentally realizable elementary gates in the variational sequence in the entangler based on limited quantum resource [23,24].

Entanglement generation gates from two-body interaction Hamiltonian and global rotations
Consider a two-body interaction Hamiltonian: (S1) In this Hamiltonian, S = (S x , S y , S z ) is the vector of spin-1/2 operators, V ij is the interaction strength between spin i and j which depends on their locations, and J I ( = 0), J S are the Ising and symmetric coupling constant respectively. The elementary gates in each layer of the variational circuit for preparing metrological states ( Fig.1(c) main text) include two free evolutions under the interaction Hamiltonian D(τ ), D(τ ), one global rotation along the x-axis R x (ϑ) and two fixed π/2 rotations R y (− π 2 ), R y ( π 2 ) along the y-axis. We define the interaction gate in the z-direction as The interaction gates in other directions can be obtained by π/2 rotations: In Eqs. (S2) (S3), the symmetric interaction term stay unchanged because inner product is conserved under global rotation and the 'direction of interaction' is only determined by the Ising term. Using these definitions, we simplify the gate set in each layer as In the next two subsections, it will be shown that the sequence in Eq. (S4) is the most general gate set that uses only global rotations and preserves the collective spin direction along x-direction.

Preservation of the collective spin direction
Define the x-parity operator P x ≡ Π N i σ xi = P † x , with P 2 x = I. This operator describes the parity of a state in x-direction and is related to the global π rotation along x-axis up to a phase constant, Applying the x-parity operator onto individual spin's angular momentum operator gives P x S µj P x = (σ x S µ σ x ) j = ±S µj . Thus the interaction gates along x-and z-direction conserve the x-parity, Similarly, the only global rotation that conserves x-parity for arbitrary angles is R x (ϑ). Then, based on Eq.(1) in the main text, the unitary operator of the whole control sequence conserves the x-parity The initial spin state pointing to the +x-direction is an eigenstate of P x : P x |↑ x ⊗N = |↑ x ⊗N . Thus, any state produced by this variational circuit remains an eigenstate of P x : Now consider the expectation value of the total spin angular momentum operator J µ ≡ i S µi (µ ∈ {x, y, z}): Thus, the collective spin direction J /| J | always points along the x-direction.
Choosing the most general gate set To preserve the collective spin direction along x-axis, the global rotation and interaction gates that can be chosen are R x , D x , D ⊥ where D ⊥ stands for the interaction gates along any direction perpendicular to the x-direction.
Combining R x and D z can generate any D ⊥ , thus the simplest gate set fulfilling all the requirements is D x R x D z , as described by Eq.(1) in the main text.
The derivations and results in this section about selecting the variational sequence agree with ref. [23]. However, the interaction Hamiltonian we discuss here is more general. In Eq. (S1), when J I = 1, J S = 0, the interaction becomes Ising type interaction which is equivalent to the Rydberg interaction in ref. [23,24]. The Ising interaction can also describe spin systems with large local disorder. The optimization results are shown in the next section. When J I = 3, J S = −1, Eq. (S1) becomes the dipolar interaction Hamiltonian between spin-1/2 particles. When J I = 2, J S = −1, it becomes the dipolar interaction Hamiltonian between spin-1 particles (such as NV centers). The simulation results for this case are shown in the next section. When J I = 1, J S = −1, the interaction can describe the dipolar interaction between cold molecules [7].

OPTIMIZATION ALGORITHM: COVARIANCE MATRIX ADAPTATION EVOLUTION STRATEGY
The optimization in the 3m dimensional parameter space is highly non-convex ( Fig.1(d) in main text) due to the large inhomogeneity of the interaction strength. In our setting, the previously used Dividing Rectangles algorithm [23,20] cannot converge to a beyond-SQL result despite large number of iterations. We address this challenge by using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as our optimization algorithm [29]. CMA-ES balances the exploration and exploitation process when searching in the parameter space so that convergence is reached after less than approximately 2,000 generations for N, m ≤ 10. This corresponds to about 10 8 repetitions of the Ramsey experiment, which can be further reduced if collective measurement observables are measured.
We reduce the complexity of the optimization by restricting τ i within [0, 1/f dd ] wheref dd is the average nearestneighbor interaction strength for the considered spin configuration. Setting a large parameter searching range for the interaction gates' time τ i would potentially ensure the global maximum CFI location is included in the parameter space. However, when the upper bound of τ i is much bigger than 1/f dd , the evolution of neighboring spin pairs is fast when sweeping τ i . This would introduce a huge amount of local maximum points in the parameter searching so that it is impractical for the black-box optimization algorithm to converge to that global maximum point.

OPTIMIZATION RESULTS OF DIFFERENT TYPES OF DIPOLE-DIPOLE INTERACTION HAMILTONIAN
The magnetic dipole-dipole interaction Hamiltonian under secular approximation has the general form [52,53]: where µ 0 is the vacuum permeability, γ is the geomagnetic ratio of the spin, β ij is the angle between the line segment connecting (r i ,r j ) and the direction of the bias external magnetic field (along z-direction in this case). Eq. (S8) is able to describe the dipolar interaction for the spin systems with arbitrary spin number as long as the spin angular momentum operators S µ obey the commutation relation [S i , S j ] = i ijk S k . It applies to the spin-1/2 systems we discussed in the main text and Nitrogen-Vacancy (NV) centers which are spin-1 systems.

NV ensemble
Here we consider NV ensemble and only |m s = 1 and |m s = 0 are used as a 2-level system. The spin-1 operators are If we only take the |m s = 1 , |m s = 0 subspace into consideration, the relations between the 'truncated' spin-1 operators and the spin-1/2 operators are: Plugging Eq. (S11) into Eq. (S8), we get the effective dipole-dipole interaction Hamiltonian for NV ensemble |m s = 1 , |m s = 0 subspace [52,54]: yj ) (S12) Fig. S1(a) shows the Classical Fisher Information (CFI) optimization results for 2D square lattice spin configuration. They are similar to the results we get in Fig.(2) of the main text for spin-1/2 systems. Ising type interaction (large local disorder) When the system has large local disorder, the flip-flop terms in the dipolar interaction Hamiltonian Eq. (S8) are suppressed because of the large energy gap: This location-dependent single-spin energy shift (δ i ) can be canceled by spin-echo pulse sequence where the interaction gate D(τ ) needs to be applied: (S14) Eq. (S14) is also valid when the local disorder δ i is comparable with the interaction strength V ij . If there is local disorder in the dipolar-interacting spin ensemble, applying spin-echo will generate the interaction gate D(τ ) where the local disorder terms are canceled.
The CFI optimization results by using the effective Ising type interaction Hamiltonian H DD, zj is shown in Fig. S1 (b). From Fig. S1, the CFI results close to the Heisenberg Limit are observed, indicating that the variational method can be applied to different kinds of spins in solid state systems and generate highly entangled state for high-precision quantum metrology. We also observe that for shallow variational circuits, the CFI 'oscillation' between even and odd spin numbers only appears when there are flip-flop terms in the Hamiltonian. For Ising type interaction, the 'oscillation' disappears.
OPTIMIZATION RESULTS BY USING P tot z , P π z AS MEASUREMENT BASES The optimization results shown in Fig.2 in the main text are obtain by using P z as the measurement basis for the CFI (cost function) calculation. Although measuring all the diagonal elements in the density matrix of the resulting states provides the maximum information one can get from single-qubit measurement and a large Hilbert space for the optimizer, it leads to an exponentially large (2 N ) experimental repetition number when the CFI needs to be estimated from experimental data. Thus, we test the variational method on two other measurement bases which require less repetitions for readout.
The measurement basis on total spin polarization along z-direction is given by S2. CFI optimization data for (a) 2D square lattice using observable P tot z , (b) 3D-random configuration using observable P tot z (averaged over 5 cases), (c) 2D square lattice using observable P π z , (d) 3D-random configuration using observable P π z (averaged over 5 cases).
provides a constant (2) dimensional outcome space for experimental readout. Improvements are also observed in 2D square lattice and 3D-random spin configurations (Fig. S2(c)(d)).

SUPPLEMENTARY DATA
Complete CFI data for Fig.2

in main text
The complete data for dipolar-interacting spin systems' CFI optimization is shown in this section. Fig. S3(a) shows the 50-cases averaged optimization results for 3D-random spin configurations, the variational circuit layer number m is chosen from 1 to 10. The optimized CFI results are approaching to the Heisenberg Limit (HL) when more layers (m) are used. However, when m > 7, the CFI results stop increasing. This CFI 'saturation' effect might be caused by two reasons. First, when m is larger, the number of the local maximum points in the high dimensional parameter space increases. This could potentially cause the optimizer to stuck in the local maximum point. Sometimes, take N = 7, m = 10 data in Fig. S3(a) as an example, adding more variational layers even leads to a lower CFI optimization result. The 'local maximum' problem could be solved by more advanced and powerful optimization algorithms, such as reinforcement learning [55][56][57], and more computational resources. Second, the 'saturation' effect reflects the global maximum CFI one can reach, no matter what kind of optimization algorithm is applied. It's still an open question what is the highest CFI the spin ensemble could reach for a given configuration. Fig. S3(b)-(d) show the CFI optimization result for 1D chain, 2D square lattice and 2D symmetric cycle spin configurations. The results of regular patterns are better than those of 3D-random pattern. As shown by the schematic on the left, the distances between spin No.4 and spin No.5, 7, and 9 are the same, so the interaction strengths between each pair are the same. Similarly, the distance between spin No.4 and spin No.2, 3, 6, and 8 are the same (smaller). Therefore, the plateau features in Fig. S4 are likely due to this symmetry: adding one more spin to the lattice does not require an extra layer to reach a given percentage of the CFI.

Orders of interaction
Due to the decaying feature ( 1 r 3 ) of dipolar interaction strength, the resulting states might be mainly generated by nearest-neighbor interaction. For studying 'how much' interaction is essential for generating the resulting entangled states, we calculate the overlap (state fidelity [39]) between the original state and the new state, which is generated by using the cutoff Hamiltonian and optimized parameters. A cutoff interaction strength f cutoff is chosen, and all the pairwise potential V ij smaller than f cutoff are set to zero in the cutoff Hamiltonian. Fig. S5 shows the relation between the state fidelity F versus f cutoff . A state fidelity value less than 1 is observed when f cutoff is set to be equal to the averaged nearest-neighbor interaction strength f dd . This result reflects higher order interactions in the spins ensemble are utilized for the entangled state generation.

Optimized states with different readout fidelity
We run the optimization with imperfect readout for N = 4 and N = 10 2D square lattice spin configurations. The optimized states resemble GHZ states (high RF), SSS (low RF), CSS (RF close to 50%). For N = 4 case, the Gaussian state appears for RF lower than 92%, but for N = 10 case, the Gaussian states appears when RF is about 96%. We expected that for large spin system with finite RF, Gaussian states (e.g. SSS) are advantageous for quantum-enhanced metrology.  Based on the simulation results shown in Fig.4(c) in main text, we needf dd T 2 ≥ 5 to generate metrological states that beat the SQL. It's worth mentioning that the T 2 in this situation stands for the coherence time without the dipole-dipole interaction's influence. During the state preparation step, the dipolar interactions between the spins are included in the system Hamiltonian for the entanglement generation (D gate in Fig.1(c) in main text). Thus, the T 2 (DD) in Table S1 is a lower bound and T 2 (best) is a more precise estimation for the spin coherence time.

SUPPLEMENTARY DERIVATIONS CFI with respect to angle and frequency
In general, the Classical Fisher Information (CFI) measures the sensitivity of a statistical model to small changes of a parameter θ [58,59]. Let Z be a random variable and P z (θ) ≡ P (z|θ) be its probability distribution which depends on θ. Let Θ be an unbiased estimator of θ, i.e. (S17) From Eq. (S17) and the fact that the sum of probabilities of all outcomes is 1, Subtracting Eq. (S19) multiplied by θ from Eq. (S18), we get Letting X = Θ−θ and Y = ∂ ∂θ log P z (θ), by the Cauchy-Schwartz inequality for random variables: is the variance of Θ. Defining we have If the measurement is repeated M times, then by the additive property of CFI, we obtain the Cramér-Rao bound: In our variational circuit, we use CFI with respect to an infinitesimal angle φ as the cost function to generate entangled states. In our program, we use a method similar to parameter shift to calculate the CFI φ of our optimized states [19,60,61]. In the following notation, 1. z represents a multi-qubit state in the z-basis; 2. U(φ) = e −iφJy is the rotation operator where φ is a small angle; 3. ψ is the state we create from the variational circuit; 4. P z (θ) is the probability of measuring the state z with the state after rotation.
Note that assuming the rotation operator U(φ) = e −iφJy ≡ U y (φ) along y-axis is for calculation simplicity. In experiments, the signal (e.g. the external B-field) usually induces a rotation along z-axis, U z (φ) = e −iφJz . It's equivalent to assume that the prepared state is firstly rotated by a R x (π/2) pulse and then accumulates a signal φ along y-axis, or firstly accumulates a signal along z-axis and then rotated by R x (−π/2) pulse [59]. In another word, R x (−π/2)U z (φ) = U y (φ)R x (π/2), so the signal accumulation process we assumed in the calculation is able to simulate the experiments. After creating the entangled states, we want to know how useful they are in a Ramsey spectroscopy, where the signal we want to detect is a frequency ω. By the same calculation as above except the difference that we take derivative with respect to ω = φ tR where t R is the Ramsey sensing time, we have

Relation beteen CFIω and SNR in single qubit Ramsey experiment
We illustrate the Ramsey protocol for a single qubit.
1. The qubit is initialized into the ground state |0 .

2.
A π 2 pulse along the y-direction is applied to transform it into a superposition state 1 √ 2 (|0 + |1 ). Its matrix form is 3. After evolving under noise and a signal with frequency ω for time t, its state becomes where γ is the decoherence rate.

4.
A second π 2 pulse along the x-direction is applied for readout. The qubit is then in the state . 5. After the rotation, the probability of the qubit being in the ground state is The CFI with respect to ω is Assuming only quantum projection noise, the Signal-to-Noise Ratio (SNR) is where M is the total number of measurements. Then Assuming no time overhead, i.e., M = Ttot tR where T tot is the total measurement time and t R is the time between Ramsey pulses, we obtain the relationship In unit time (T tot = 1), when SNR= 1, the smallest signal we can measure is leading to the saturated Cramér-Rao bound.

Maximum Likelihood Estimator
Since a measurement collapses a quantum state to an eigenstate of the observable, it's impossible to directly measure P (θ). In experiments, we repeat the sequence to obtain the results for estimating the P (θ) and then get an estimate value of θ. To understand the relation between the variance of the estimation and CFI, we introduce the Maximum Likelihood Estimator (MLE), which has asymptotic properties to saturate the Cramér-Rao bound. Below we summarize the proof given in [62].
Let X = {X 1 , X 2 , ..., X M } be a collection of independent and identically distributed (i.i.d.) random variables with a parametric family of probability distributions {P (X|θ)|θ ∈ Θ}, where θ is an unknown parameter and Θ is the parameter space. Let x = {x 1 , x 2 , ..., x M |x i ∈ X i } be the experimental data set from M repetitions. The goal is to estimate θ (the signal we want to measure) from x, i.e., find θ that is most likely to produce the outcome x. Thus, the normalized log-likelihood function is defined as A MLE maximizes the log-likelihood function In the following, we first show that In fact, θ 0 maximizes L(θ): Moreover, we show that θ 0 is the unique maximizer. Jensen's inequality states that for a strictly convex function f and a random variable Y , Taking f (y) = − log y and P (X|θ) = P (X|θ 0 ), we have for some θ 1 ∈ [Θ MLE , θ 0 ]. We analyze the numerator and denominator respectively. The numerator where the last equality is obtained from the linearity of expected value and derivative. By the CLT, the distribution of √ M L M (θ 0 ) converges to N 0, Var θ0 (log P (X i |θ 0 )) (S46) where the variance by the definition of CFI and that θ 0 maximizes L(θ). By the consistency property, Θ MLE converges to θ 0 , and thus θ 1 converges to θ 0 . The denominator by the WLLN. We further show that Eq. (S48) is in fact the additive inverse of CFI: Finally, Eq. (S44) becomes Thus, the MLE is asymptotically unbiased and saturates the Cramér-Rao bound.

Master equation for a non-Markovian environment
To simulate the performance of our optimized states during the Ramsey measurement with non-Markovian noise, we use a time-local master equation given by [51]. A brief summary of the derivation is given below.
1. Let L(C d ) be the Hilbert space of linear operators acting on C d , where the inner product is defined as σ, τ = Tr(σ † τ ) (the Hilbert-Schmidt inner product).
2. Let LL(C d ) be the Hilbert space of linear operators acting on L(C d ) which has dimension d 2 ×d 2 . Let {l i } i=1,...,d 2 be an orthonormal basis of LL(C d ). Then the action of Λ ∈ LL(C d ) on τ ∈ L(C d ) can be expressed as Thus, Λ has a unique correspondence with the matrix Λ with entries 3. Λ ∈ LL(C d ) is trace-and hermicity-preserving if and only if its matrix representation Λ can be written as where 0 is the zero row vector of length d 2 −1, m is a real column vector of length d 2 −1, and M is a (d 2 −1)(d 2 −1) real matrix.
4. For a single qubit, any operator ρ on C 2 can be written as where v is a three-dimensional real vector and σ is the vector of Pauli matrices. Then a map Λ whose matrix representation is given by Eq. (S53) acting on ρ gives 5. The noisy evolution of a state ρ is described by The time local master equation satisfies with the corresponding matrix representation 6. Consider the evolution of one qubit described by Λ(t) = U(t) • Γ(t) . U(t) is defined as where U (t) = e −i ωt 2 σz represents the signal accumulation. By Eq. (S51) and Eq. (S60), the matrix representation of U(t) is Γ(t) represents the noise which is trace-and hermicity-preserving, i.e., has the form in Eq. (S53).
From Eq. (S34), since T tot and δω 2 are constants, SNR is proportional to CFIω tR . Thus we choose CFIω tR as the result we show in Fig.4(d) in the main text.

Time Overhead
FIG. S8. 50 cases average sensing performance of the optimized states when using 7-layer circuit on 3D random spin configuration.
In experiments, the time overhead, including the state preparation and readout time, reduces the repetition number of the sensing sequence and thus decreases the sensitivity. If we consider a nonzero time overhead, i.e., M = Ttot tR+t oh , the expression for SNR 2 for an uncorrelated spin state becomes If t oh >> t R , we ignore the term t R in the denominator and Taking the derivative of Eq. (S74) with respect to t R gives us the best t R if the time overhead is significantly larger: Based on the data shown in Fig.3 from ref. (17), it takes about 200µs for the adiabatic method to prepare an 8-spin SSS with ξ 2 = 0.4 which corresponds to CFI = 20. The 2D spin density 8/(30nm × 30nm) corresponds tō f dd = 43.5kHz. According to Fig.3(d) in the main text, the variational method is able to prepare an 8-spin entangled state with CFI ≈ 20 by a 4-layer circuit withf dd T = 0.8. Plugging in the same average nearest neighbor dipolar interaction strengthf dd , we finally calculated the state preparation time of the variational method is T = 18.4µs, which is about 11 times faster than the adiabatic method under the same condition.

CONTROLLABILITY
Since all the black-box optimization algorithms cannot ensure that the optimized result is the global maximum/minimum point of in the parameter space, it is sill an open question that if the variational method is able to find the 'best' metrological state for a given spin configuration or not. In this section, we're interested in the theoretically achievable controllability of dipolar interacting spin systems. The question is, given any (possibly infinite) arbitrary sequence of evolution under each Hamiltonian governing the dynamics of our system, can we drive any arbitrary unitary operator? Quantum control systems of the general form governed by the Schrödinger equation, i d dt |ψ(t) = H(t) |ψ(t) , have been studied extensively [69,31]. H 0 is the unperturbed or free evolution Hamiltonian, H k are the control interactions, and u k (t) are the piecewise continuous control fields. There are several distinct but related notions of controllability that have different conditions for 'full' controllability. The notion of 'operator' or 'complete' controllability is the strictest condition and is defined as above. For generic interacting spin systems, all of these notions are equivalent. Complete controllability is equivalent to universal quantum computation (UQC) in quantum information processing (QIP) [70,71].

Controllability Test
The way we investigate the controllability of a generic system (S84) is by examining the so-called 'dynamical Lie algebra' L 0 ⊆ u(N ) or su(N ) generated by the operators {−iH 0 , −iH 1 , . . . , −iH K }, which are represented by N × N matrices in a basis we choose [69,31].
A quantum system of the form (S84) is completely controllable if either L 0 ∼ = u(N ) or L 0 ∼ = su(N ) [69], where u(N ) is the unitary Lie algebra represented by the set of skew-Hermitian N × N matrices and su(N ) is the special unitary Lie algebra represented by the same set of matrices with the extra condition that they are traceless. Note that dim u(N ) = N 2 and dim su(N ) = N 2 − 1, and the difference of 1 comes from counting identity operation (I) as a dimension or not. We must find a basis for L 0 by iteratively taking the Lie bracket [·, ·] of H 0 , H 1 , . . . , H K until we have a set of dim L 0 linearly independent matrices, where the Lie bracket is the commutator [A, B] = AB − BA for matrices A and B. Ref. [69] and ref. [31] present an algorithm for generating this basis. Thus, if dim L 0 = N 2 or N 2 − 1 we can say that the system is completely controllable. Note that for generic spin systems N = 2 N for N spins.

Controllability of Dipolar Interacting Spin Systems
We write our system in the form (S84) by defining the free evolution Hamiltonian to be the dipolar interaction H dd and two control interactions J x and J y , as these operators are generators of rotation, with respective independent control fields θ x (t) and θ y (t): Ref. [31,72] demonstrate that we cannot achieve complete controllability with global controls due to inherent symmetries, so we know that dim L 0 < 4 N − 1. However, complete controllability is a rather strict condition. Not being able to drive any arbitrary unitary does not mean we cannot drive unitaries that produce metrological states.
In fact, ref. [73] demonstrate for a long-range Ising spin model (all-to-all interactions) with global controls that metrological states, such as the GHZ and W states are reachable. Ref. [74] extend their result for symmetric Ising spin  TABLE S3. Lie algebra dimensions for the complete controllable system, dipolar interacting system and symmetric Ising system (lower bound for subspace controllability). Dipolar interacting spin systems' dim L0 is calculated using an implementation of [31]'s algorithm, and is necessarily bounded by the complete and subspace controllability dimensions. Lie algebra dimensions for dipolar interacting systems are only calculated up to N = 4 due to stability issues stemming from numerical errors in how matrix rank is calculated.
networks with global controls and demonstrate that one can reach any state that preserves spin permutation invariance. This is known as subspace controllability. The dimension of their dynamical Lie algebra, L Ising ≡ L PI ∩ su(2 N ), is shown to be N +3 N − 1. This is relevant to our system because [75] show that if we replace the Ising interaction with a more general two body interaction-which includes H dd -the dimension of the dynamical Lie algebra is necessarily greater than or equal to that of the symmetric Ising case, and it is therefore subspace controllable. This means that we can write N +3 N − 1 ≤ dim L 0 < 4 N − 1 and say that L 0 is subspace controllable but not completely controllable. Therefore, we can achieve arbitrary permutation invariant states, including metrological states such as a GHZ state.

Finding Reachable States
L 0 is associated with a Lie group e L0 by the Lie group-Lie algebra correspondence [69]. The Lie algebra u(N ) corresponds to the Lie group U (N ), and su(N ) corresponds to SU (N ). We can define R ≡ e L0 as the reachable set of unitaries we can drive under {H k } k=0,...,K , and so starting from an initial state |ψ 0 , R |ψ0 is the set of states we can reach.
As demonstrated in the previous section, our dynamical Lie algebra is a superset of L Ising and a strict subset of su(2 N ), so we can write e L Ising ⊆ e L0 ⊂ SU (2 N ). Because |GHZ ∈ R Ising |0 ⊗N we can write |GHZ ∈ R dipolar |0 ⊗N . In fact, this is true for any permutation invariant state, which includes all metrological states we're interested in.
While we know that metrological states are in the reachable set, determining the parameters that drive the unitaries to produce those states is a highly convex optimization problem equivalent to our variational circuit, using state fidelity between the ideal state and the current state instead of CFI as the cost function. That is we optimize the output unitary of the variational circuit,