Non-Gaussian noise spectroscopy with a superconducting qubit sensor

Accurate characterization of the noise influencing a quantum system of interest has far-reaching implications across quantum science, ranging from microscopic modeling of decoherence dynamics to noise-optimized quantum control. While the assumption that noise obeys Gaussian statistics is commonly employed, noise is generically non-Gaussian in nature. In particular, the Gaussian approximation breaks down whenever a qubit is strongly coupled to discrete noise sources or has a non-linear response to the environmental degrees of freedom. Thus, in order to both scrutinize the applicability of the Gaussian assumption and capture distinctive non-Gaussian signatures, a tool for characterizing non-Gaussian noise is essential. Here, we experimentally validate a quantum control protocol which, in addition to the spectrum, reconstructs the leading higher-order spectrum of engineered non-Gaussian dephasing noise using a superconducting qubit as a sensor. This first experimental demonstration of non-Gaussian noise spectroscopy represents a major step toward demonstrating a complete spectral estimation toolbox for quantum devices.

Supplementary Note 1 Device parameters and fabrication of the qubit sensor

Device parameters are summarized in Table 1.

Parameter                               Value
Qubit frequency ω_0/2π                  2.920 GHz
Qubit anharmonicity A/2π                1.163 GHz
Relaxation time T_1                     27.0 ± 2.7 µs
Spin-echo relaxation time T_2           35.9 ± 4.4 µs
Free-induction decay time T_2*          12.2 ± 1.0 µs
Readout cavity frequency ω_r/2π         7.348 GHz
Readout cavity linewidth κ/2π           2.548 MHz
Dispersive coupling strength χ/2π       0.130 MHz

The device was fabricated in the same way as in Ref. [1]. It is a generalized version of the capacitively shunted flux qubit [1], comprising a capacitively shunted small-area junction in parallel with a series array of N junctions. In the capacitively shunted flux qubit of Ref. [1], this array comprised N = 2 junctions. Here, the number of array junctions is N = 8, far fewer than used in the fluxonium regime of operation [S1]. The area of each Josephson junction forming the array is identical and designed to be 0.2 × 1.2 µm². The left junction in Fig. 1a is smaller in area by a factor of 8 than the right junction (α = 1/8). The critical current density J_c is measured to be 0.60 ± 0.01 µA/µm², and the shunt capacitance C_sh is designed to be 20 fF.

Supplementary Note 2 Randomized benchmarking of single-qubit gates
We characterized the average error rate of single-qubit gates by performing Clifford randomized benchmarking [S2] (Fig. 1). As mentioned in the main text, single-qubit operations are performed using cosine-shaped microwave pulses, with a quadrature correction (DRAG [2]) applied to minimize unwanted phase evolution and leakage due to the presence of higher levels.
Supplementary Figure 1. Randomized benchmarking of single-qubit gates. Standard single-qubit Clifford randomized benchmarking, shown as average sequence fidelity (magenta circles) vs. number of Clifford gates. There are 50 randomizations for each number of Clifford gates. The envelope of each microwave pulse is a cosine with a total length of 11 ns; a constant buffer time of 7 ns is inserted after each pulse to ensure complete separation of the pulses (inset).

Supplementary Note 3 Measurement setup

3.1 Cryogenic setup
The experiments were performed using a Leiden CF-450 dilution refrigerator, capable of cooling to a base temperature of 15 mK. The samples were magnetically shielded with a superconducting can surrounded by a Cryoperm-10 cylinder. A schematic of the cryogenic circuitry is shown in Fig. 2. There are two RF lines for the input and output of the samples, used to apply the microwave readout tone and to measure the transmission of the sample, respectively. Thermal noise from room temperature on the RF drive lines is attenuated by 20 dB at the 3 K stage, followed by 6 dB at the still and 26 dB at the 20 mK stage. All attenuators in the cryogenic setup are made by XMA. Note that there is one additional RF line for pumping the Josephson traveling-wave parametric amplifier (JTWPA) [S3], used as a first-stage pre-amplifier to amplify the readout signal at base temperature. The effective noise temperature is determined primarily by the JTWPA, with a total system noise measured to be about 600 mK. To avoid any back-action of the pump signal from the JTWPA, we added a microwave isolator between the samples and the JTWPA. On the RF output line, there is a high-electron-mobility transistor (HEMT) amplifier (Cryo-1-12 SN508D) at the 3 K stage. Two microwave isolators allow the signal to pass through to the amplifier without being attenuated, while routing any noise reflected off the amplifier into a 50 Ω termination instead of letting it reach the sample.
There are two additional lines for qubit flux bias: one is for the DC flux bias, which is applied globally through a coil installed in the device package; the other is used to apply magnetic flux to the qubit through a local antenna. The primary requirement on the DC flux bias line is the ability to tune through at least a single flux quantum on the SQUID of the qubit with high precision and low noise. The local flux bias line is attenuated by 20 dB at the 3 K stage, 6 dB at the still, and 20 dB at the 20 mK stage to remove excess thermal photons from higher-temperature stages.

3.2 Room temperature control
Outside of the cryostat, we have all of the control components that allow us to apply microwave signals addressing the cavity and the qubits, as well as the components necessary to resolve the readout signal. All signals are added using microwave power splitters (Marki PD0R413) used in reverse. Direct digital synthesis of the qubit signals is performed using a high-speed arbitrary waveform generator (AWG, Keysight M8195A), which has a 65 GS/s sampling rate and sufficient bandwidth for this purpose. The output line is further amplified outside of the cryostat with an amplifier (MITEQ AMF-5D-00101200-23-10P) with a quoted noise figure of 2.3 dB, and a preamplifier (Stanford Research SR445A). A detailed schematic is given in Fig. 2. We use an IQ demodulation technique: the signal entering the RF port of a mixer is mixed down with a reference signal, detuned by 40 MHz, applied to the LO port, resulting in signals down-converted to 40 MHz. All components are frequency-locked via a common SRS rubidium frequency standard (10 MHz).

3.3 Pulse generation
Qubit control pulse generation is performed via a Keysight M8195A AWG. The pulses are programmed in Labber and then uploaded to the Keysight M8195A.

3.4 Generation of engineered noise
To synthesize a zero-mean flux-noise process ∆Φ(t) ≡ Φ(t) − Φ_0/2 with a given PSD, we use an AWG to produce sample waveforms consisting of N_h harmonics,

∆Φ(t) = Σ_{m=1}^{N_h} [ a_m cos(ω_m t) + b_m sin(ω_m t) ],

where ω_m = 2πm/T_0, and with 2π/T_0 the fundamental angular frequency. The Fourier coefficients a_m and b_m are zero-mean, mutually uncorrelated random variables. Further taking a_m and b_m to have normal (Gaussian) distributions with variance σ_m² = 2S_Φ(ω_m)/T_0, the waveforms ∆Φ(t) become a discrete approximation of a Gaussian stochastic process with frequency-domain PSD S_Φ(ω).
In all experiments presented in the main text, we consider a spectrum with noise power P_0 and cutoff frequency ω_c/2π = 0.5 MHz. For the experiment presented in Fig. 2 of the main text, to produce a discrete approximation of a noise process with this spectrum, we take T_0 = 20 µs and N_h = 10³, corresponding to harmonics separated by the fundamental frequency 1/T_0 = 50 kHz, with a high-frequency cutoff at ω_{N_h}/2π = N_h/T_0 = 50 MHz.

Supplementary Note 4 Control pulse sequences
The set of control pulse sequences designed for reconstructing the bispectrum are summarized in Table 2 and visualized in Fig. 3. Note that all control pulse sequences start and end with a π/2 pulse for the purposes of state preparation and tomography.

Supplementary Note 5 Monte Carlo simulations
For the Monte Carlo simulations that are presented in the main text, we consider a single qubit controlled via a microwave drive at angular frequency ω_d, which is used to apply pulses about σ_x and σ_y. In contrast with the main text, here we do not assume that these pulses are instantaneous. To describe the time evolution of the qubit under the combined action of these finite-width pulses and classical noise described by the process B(t), we consider the lab-frame Hamiltonian

H(t) = (1/2)[ω_q + B(t)] σ_z + ε(t) cos[ω_d t + θ(t)] σ_x,

where ω_q is the qubit angular frequency, and ε(t) and θ(t) are the drive amplitude and phase, respectively. We next move to the frame that rotates at the drive frequency by applying the unitary transformation R_d(t) = exp(−iω_d t σ_z/2). This gives the rotating-frame Hamiltonian

H_d(t) = (1/2)[D + B(t)] σ_z + H_c(t),     (5)

where D ≡ ω_q − ω_d is the drive detuning. Assuming that ε(t) ≪ ω_d and that θ(t) varies on a timescale much longer than 2π/ω_d allows us to invoke the rotating-wave approximation, under which terms oscillating like exp{±i[2ω_d t + θ(t)]} are neglected. The resulting control Hamiltonian may be simplified as

H_c(t) = (1/2)[ η_I(t) σ_x + η_Q(t) σ_y ],     (6)

where η_I(t) ≡ ε(t) cos θ(t) and η_Q(t) ≡ ε(t) sin θ(t) are the envelopes of the in-phase and quadrature components of the microwave control signal, respectively. Equation (6) then describes finite-width pulses about the x or y axes. We perform Monte Carlo simulations by solving the time-dependent Schrödinger equation associated with the Hamiltonian H_d(t) given by Eq. (5), under the control Hamiltonian of Eq. (6). The drive detuning D is set to zero in the simulations. The envelope of each control pulse is a cosine with total pulse duration 11 ns (see inset of Fig. 1).
In our Monte Carlo simulations, we account for non-Gaussian noise by setting B(t) ≡ β Φ ∆Φ(t) 2 in Eq. (5), and producing random samples of Gaussian flux-noise ∆Φ(t) through the approach described in Section 3.4 of this Supplement. Because this approach relies on an exact solution of the qubit evolution under the noise samples, it is equivalent to accounting for all the terms in the cumulant expansion. The fundamental frequency 1/T 0 and the number of harmonics N h involved in the Fourier-series representation of the noise process are the same as in Section 3.4. To perform the simulations, we generate 100,000 noise samples for the data presented in Fig. 2 of the main text, and 80,000 noise samples for Figs. 4 and 5.
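The structure of such a Monte Carlo average can be illustrated with a deliberately simplified sketch: instantaneous π/2 pulses, D = 0, and a single Ramsey-type sequence, whereas the actual simulations integrate the Schrödinger equation with 11 ns cosine pulses. Function names are ours:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
half_pi_x = (np.eye(2) - 1j * sx) / np.sqrt(2)   # exp(-i*pi*sx/4)

def ramsey_sz(b_t, dt):
    """<sigma_z> after pi/2 -- free dephasing under H = B(t)*sigma_z/2 --
    pi/2, for one noise realization sampled at time step dt."""
    phi = np.sum(b_t) * dt                        # accumulated phase
    psi = half_pi_x @ np.array([1.0, 0.0], dtype=complex)
    psi = np.exp(np.array([-1j, 1j]) * phi / 2) * psi  # free evolution
    psi = half_pi_x @ psi
    return float(abs(psi[0])**2 - abs(psi[1])**2)

def monte_carlo_sz(noise_samples, dt):
    """Average <sigma_z> over many noise realizations."""
    return float(np.mean([ramsey_sz(b, dt) for b in noise_samples]))
```

For zero noise this sequence returns ⟨σ_z⟩ = −cos(0) = −1, and for Gaussian phase noise of variance v the Monte Carlo average tends to −e^{−v/2}, the familiar Gaussian decay.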
Supplementary Note 6 Estimation of the noise mean

6.1 Ramsey estimation protocol

To measure the noise mean µ_B, we use the Ramsey sequence illustrated in Fig. 4a. In this sequence, π/2 pulses about σ_x and σ_y are applied at times t = 0 and t = T, respectively, followed by a measurement of the qubit in the σ_z eigenbasis at time t_f = T + ∆T, where ∆T is a buffer time much shorter than T, but longer than the pulse width. To lay down the theoretical basis of the procedure, we start from the rotating-frame Hamiltonian H_d(t) introduced in Eq. (5) above. To describe the effects of control with finite-width pulses, it is useful to move to the toggling frame using the unitary transformation

R_c(t) = T exp[ −i ∫_0^t ds H_c(s) ],

with T the time-ordering operator, and where H_c(t) is given by Eq. (6). In this toggling frame, the Hamiltonian is

H̃(t) = (1/2)[D + B(t)] y_p(t) · σ,     (7)

where y_p(t) has components y_{p,a}(t) ≡ Tr[ R_c†(t) σ_z R_c(t) σ_a ]/2, with a ∈ {x, y, z}, for pulse sequence p. Remark that a pulse sequence consisting of instantaneous π pulses (instead of the Ramsey sequence considered here) would result in y_p(t) = [0, 0, y_p(t)], where y_p(t) is the switching function used in the main text.
Moving back to the lab frame, the expectation of the z component of the qubit polarization after the pulse sequence is

⟨σ_z(t_f)⟩ = Tr[ σ_z U_T(t_f) ρ_0 U_T†(t_f) ],     (8)

where U_T(t) = T exp[ −i ∫_0^t ds H̃(s) ] is the time-evolution operator in the toggling frame, and ρ_0 is the initial qubit density matrix, before application of the pulses. We evaluate ⟨σ_z(t_f)⟩ perturbatively by performing a Dyson expansion of U_T(t).
Supplementary Figure 4. Ramsey protocol for estimation of the noise mean. a, Envelopes η_I(t) (thin blue line) and η_Q(t) (thick red line) of the in-phase and quadrature components of the π/2 pulses applied about σ_x and σ_y, respectively (see Eq. (6) in the text). b, Illustration of the technique for estimation of the mean by linear regression. Black error bars: Monte Carlo simulations of the z component of the qubit polarization ⟨σ_z(t_f)⟩ after the pulse sequence, as a function of the detuning of the drive from the qubit frequency. Blue line: linear regression. According to Eq. (10), the x-intercept of the blue line gives −µ_B, and its slope gives the filter function F_{0,y}(0, t_f), defined below Eq. (9).
Assuming that t_f is sufficiently short and that D + B(t) is sufficiently small, we truncate this expansion to first order in D + B(t). Upon substitution into Eq. (8), this approach is equivalent to neglecting any contribution of cumulants of the noise beyond order 1. Taking ρ_0 ≡ |0⟩⟨0|, with |0⟩ the eigenstate of σ_z with eigenvalue −1, then yields

⟨σ_z(t_f)⟩ ≈ [D + C^(1)(0)] F_{p,y}(0, t_f),     (9)

where we have introduced the filter functions F_{p,a}(ω, t) ≡ ∫_0^t ds e^{−iωs} y_{p,a}(s), with a ∈ {x, y, z}. For the pulse sequence illustrated in Fig. 4a (which we label by p = 0), neglecting any overlap between the pulses, it is straightforward to show that

⟨σ_z(t_f)⟩ ≈ (D + µ_B) F_{0,y}(0, t_f),     (10)

where F_{0,y}(0, t_f) = T̃ can be viewed as an effective pulse interval accounting for the shape of the pulses. For instantaneous pulses, T̃ = T.
According to Eq. (10), setting D = 0, the first noise cumulant C^(1)(0) ≡ µ_B may be estimated simply by measuring ⟨σ_z(t_f)⟩ and evaluating ⟨σ_z(t_f)⟩/F_{0,y}(0, t_f). However, this approach requires accurate knowledge of F_{0,y}(0, t_f), and thus of the shape of the control pulses. For the short Ramsey sequences required here to neglect cumulants of order higher than one in Eqs. (9)-(10), the estimate of µ_B then becomes excessively sensitive to distortions of the pulse envelopes that occur in practice. As a result, estimates of µ_B become significantly biased. Since the bispectrum estimation technique that will be discussed in Supplementary Note 8 requires precise knowledge of µ_B, this bias precludes accurate non-Gaussian QNS.

6.2 Robust implementation via linear regression
Crucially, the vulnerability of the above Ramsey scheme to pulse-width effects can be alleviated by estimating µ B with a linear regression procedure. Indeed, according to Eq. (10), plotting σ z (t f ) as a function of the detuning D results in a straight line that intersects with the abscissa at D = −µ B (Fig. 4b), irrespective of F 0,y (0, t f ). Therefore, measuring σ z (t f ) as a function of D and performing a linear fit of the resulting data leads to an estimate of µ B that is insensitive to the pulse shape to the first order in D + B(t).
To apply this idea to our experimental data, we now explicitly construct an estimator of µ_B based on linear regression. For each drive detuning D_j, with j ∈ {1, 2, . . . , N_D}, we consider N projective measurements of σ_z yielding outcomes Z_{j,i} = +1 or Z_{j,i} = −1, where i labels measurements. In the limit N ≫ 1, the sample mean Z_j of the projective measurements for detuning D_j becomes Gaussian distributed,

Z_j ~ N( ⟨σ_z(t_f)⟩_j , var(σ_z)_j / N ),     (11)

where var(σ_z)_j is the expected variance of σ_z averaged over realizations of the noise process, for detuning D_j. Assuming that var(Z_j) = var(σ_z)_j/N is the same for all relevant detunings, var(Z_j) ≡ var(Z) ∀ j, Eq. (11) then corresponds to the conditional normal model of linear regression [S4], in which deviations of the data points from the expected linear behavior are given by independent and identically distributed (i.i.d.) Gaussian random variables. This assumption of uniform variance is justified in an approximate sense for ideal projective measurements. Indeed, in this situation, var(Z_j) = [1 − ⟨σ_z(t_f)⟩_j²]/N, so that var(Z_j) is independent of D_j to first order in ⟨σ_z(t_f)⟩_j ≈ (D_j + µ_B) F_{0,y}(0, t_f), with var(Z_j) ≈ 1/N. Limiting ourselves to detunings for which |⟨σ_z(t_f)⟩_j| ≲ 0.05 (see Fig. 4a), we find that var(Z_j) (estimated from the sample mean of measurements of σ_z) varies by less than 5% across values of D_j.
To define our estimator of µ_B within the conditional normal model of linear regression, we first introduce the quantities a ≡ µ_B F_{0,y}(0, t_f) and b ≡ F_{0,y}(0, t_f), corresponding to the y-intercept and slope of the linear relation E(Z_j) = a + b D_j. Maximizing the likelihood of a and b with respect to measurement outcomes Z_j with the probability distribution given by Eq. (11) then yields the estimators

a_est = Z̄ − b_est D̄,     (12)

b_est = Σ_j (D_j − D̄)(Z_j − Z̄) / Σ_j (D_j − D̄)²,     (13)

where D̄ and Z̄ denote the averages of D_j and Z_j over the N_D detunings. The estimators defined by Eqs. (12) and (13) are Gaussian random variables with E(a_est) = a, E(b_est) = b, and

var(a_est) = var(Z) [ 1/N_D + D̄²/Σ_j (D_j − D̄)² ],   var(b_est) = var(Z) / Σ_j (D_j − D̄)².

To estimate µ_B, we use

µ_B^est ≡ a_est / b_est.     (14)

When var(a_est)^{1/2} and var(b_est)^{1/2} are sufficiently small, we expand µ_B^est in powers of δa_est ≡ a_est − a and δb_est ≡ b_est − b. Truncating to first order in δa_est and δb_est gives

var(µ_B^est) ≈ var(a_est)/b² + a² var(b_est)/b⁴ − 2a cov(a_est, b_est)/b³,     (15)

with cov(a_est, b_est) = −D̄ var(Z)/Σ_j (D_j − D̄)². For the experimental data presented in Fig. 4a of the main text, var(µ_B^est) is estimated by replacing a → a_est and b → b_est in Eq. (15), with a_est and b_est given by Eqs. (12) and (13), respectively.
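The regression estimator and its first-order (delta-method) variance can be sketched as follows (a minimal sketch with a hypothetical function name, assuming uniform variance of the sample means as in the text):

```python
import numpy as np

def estimate_noise_mean(detunings, z_means, z_var):
    """Estimate mu_B from <sigma_z>(D) ~ a + b*D, with a = mu_B*F and
    b = F, by ordinary least squares. Returns (mu_est, var_mu_est),
    where the variance follows from first-order error propagation."""
    D = np.asarray(detunings, float)
    Z = np.asarray(z_means, float)
    X = np.column_stack([np.ones_like(D), D])      # design matrix [1, D]
    (a, b), *_ = np.linalg.lstsq(X, Z, rcond=None)
    cov = z_var * np.linalg.inv(X.T @ X)           # covariance of (a, b)
    g = np.array([1.0 / b, -a / b**2])             # gradient of mu = a/b
    return a / b, float(g @ cov @ g)
```

The x-intercept of the fitted line sits at D = −a/b = −µ_B, so the ratio a/b estimates µ_B regardless of the value of the slope F_{0,y}(0, t_f).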
Finally, to isolate the shift in the qubit frequency due to the first cumulant of the engineered source of noise, we apply the above procedure first in the absence of noise, and then in its presence. This yields the two sets of data points shown in Fig. 4a of the main text. Fitting a straight line through each data set and subtracting their x-intercepts then gives our final estimate of the noise mean, µ_B^est = µ_B^on − µ_B^off, where µ_B^on (µ_B^off) is the estimator defined by Eq. (14) with (without) engineered noise. The variance of µ_B^est is then simply var(µ_B^est) = var(µ_B^on) + var(µ_B^off). For the experimental data presented in the main text, we find µ_B^est/2π = 127.1 kHz with a standard deviation var(µ_B^est)^{1/2} = 3.86 kHz, corresponding to the 95% confidence interval µ_B^est/2π = 127.1 ± 7.56 kHz.

Supplementary Note 7 PSD estimation procedure
To estimate the PSD, we build on the frequency-comb approach introduced by Alvarez and Suter in Ref. [3]. As detailed in the main text, treating the FFs as frequency combs generates a system of linear equations whose solution determines the PSD sampled at the harmonic frequencies. Rather than solving this system by matrix inversion as in Ref. [3], we employ a statistically motivated maximum likelihood estimate (MLE), which takes experimental error into account. The likelihood function we use follows from the asymptotic Gaussian distribution of the measured decay constants, which we describe in Sec. 7.1. In Sec. 7.2, we determine the likelihood, the conditional probability of obtaining a set of decay data conditioned on the actual value of the PSD. The task of maximizing the likelihood can be cast as a linear problem, allowing clear comparison with Ref. [3].

7.1 Distribution of the decay constant
The PSD enters the dynamics of the qubit through the decay constant in Eq. (1), which can be obtained from measurements of the transverse Pauli operators, σ_x and σ_y. Let σ_i^est denote the estimated expectation value of σ_i for i ∈ {x, y} after the qubit has evolved for a time t under control sequence p. In the limit of a large number of measurements, σ_i^est is approximately Gaussian distributed with mean µ_i = E[⟨σ_i(t)⟩] and variance var[σ_i^est]. In terms of the estimated expectation values, the estimated decay constant is

χ_p^est(t) = −(1/2) ln[ (µ_x + σ̃_x^est)² + (µ_y + σ̃_y^est)² ],     (16)

where σ̃_i^est ≡ σ_i^est − µ_i. Note that σ_i^est, µ_i and σ̃_i^est on the right side of this expression depend implicitly on the time t. When var[σ_x^est], var[σ_y^est] ≪ 1, we can expand this expression about σ̃_x^est, σ̃_y^est ≈ 0, yielding

χ_p^est(t) ≈ −(1/2) ln(µ_x² + µ_y²) − (µ_x σ̃_x^est + µ_y σ̃_y^est)/(µ_x² + µ_y²).

Since it is a linear combination of Gaussian distributed random variables, the decay constant is also Gaussian distributed, with mean and variance

E[χ_p^est(t)] = −(1/2) ln(µ_x² + µ_y²),   var[χ_p^est(t)] = (µ_x² var[σ_x^est] + µ_y² var[σ_y^est])/(µ_x² + µ_y²)².     (18)

7.2 Maximum likelihood estimate
Recall from the main text that after M ≫ 1 repetitions of a control sequence p with cycle time T, the FF in Eq. (17) is an approximate frequency comb, enabling us to write the decay constant as

χ_p(MT) ≈ (M/2T) Σ_{k=−∞}^{∞} |F_p(kω_h, T)|² S(kω_h),

with ω_h = 2π/T. Using the symmetry S(ω) = S(−ω), and the decay of the PSD and FF at high frequencies, we can truncate the sum above to a finite number of harmonic frequencies,

χ_p(MT) ≈ (M/2T) [ |F_p(0, T)|² S(0) + 2 Σ_{k=1}^{K−1} |F_p(kω_h, T)|² S(kω_h) ].     (19)

The estimate of the PSD is based on the likelihood, or conditional probability, of measuring χ = [χ_1^est(MT), . . . , χ_P^est(MT)]^T for a set of control sequences p = 1, . . . , P with P ≥ K. Since the measurements of the decay constant are uncorrelated, the likelihood follows from Eq. (19),

P(χ | S) = [(2π)^P det Σ]^{−1/2} exp[ −(1/2)(χ − B S)^T Σ^{−1} (χ − B S) ],     (20)

where the P × P covariance matrix has elements Σ_{p,q} = var[χ_p^est(MT)] δ_{p,q}, and the P × K reconstruction matrix B depends on the FFs evaluated at the harmonic frequencies, B_{p,k} ∝ |F_p(kω_h, T)|². Since the likelihood is Gaussian, the maximum likelihood estimate of the PSD is equivalent to the value of S that minimizes the argument of the exponential in Eq. (20),

S_MLE = (B^T Σ^{−1} B)^{−1} B^T Σ^{−1} χ.     (21)

The least-squares estimate of the PSD originally used in Ref. [3], given by S_LS = B^{−1} χ, is recovered when Σ = I. This implies that all measurements of the decay constant contribute equally to the estimate. In contrast, the dependence of S_MLE on the actual variances in Σ ensures that measurements with more uncertainty contribute less to the estimate. In the experiment, the PSD is reconstructed at the harmonics k = 0, . . . , 7 using the P = 11 control sequences depicted in Fig. 3, all with cycle time T = 960 ns. This differs from the implementation of Ref. [3], which uses CPMG control sequences of varying cycle times. For each control sequence, the transverse Pauli components are measured to obtain σ_x^est, σ_y^est and χ_p^est(MT) from Eq. (16). The variances of the χ_p^est(MT), which comprise Σ, follow from Eq. (18) with µ_x and µ_y replaced by the estimated values σ_x^est and σ_y^est. In principle, the use of P = 11 control sequences would enable us to reconstruct the PSD at K = 11 harmonics. For the particular set of control sequences we used, however, the reconstruction matrix B becomes ill-conditioned for K > 8, limiting the number of reconstructable harmonics.
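Numerically, the maximum likelihood inversion described above is a weighted least-squares solve; a minimal sketch (the function name is ours):

```python
import numpy as np

def psd_mle(B, chi, chi_var):
    """Maximum-likelihood PSD estimate for chi ~ N(B @ S, diag(chi_var)):
    S = (B^T W B)^{-1} B^T W chi with W = Sigma^{-1}, so that noisier
    decay-constant measurements are down-weighted. With Sigma = I and a
    square B, this reduces to the matrix inversion S = B^{-1} chi."""
    W = np.diag(1.0 / np.asarray(chi_var, float))
    return np.linalg.solve(B.T @ W @ B, B.T @ W @ chi)
```

With P = 11 sequences and K = 8 harmonics, B is a tall 11 × 8 matrix and the weighted normal equations above give the unique minimizer of the Gaussian negative log-likelihood.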

Supplementary Note 8 Bispectrum estimation procedure
While this work is based on the non-Gaussian QNS protocol originally proposed in Ref. [4], the estimation procedure we implemented contains several innovations aimed at generalizing the noise model and improving robustness in the presence of experimental error and numerical instability. First, the zero-mean, non-Gaussian noise model of Ref. [4] is insufficient to describe the square noise engineered in our quarton-qubit sensor, which is inherently nonzero-mean. This complicates the estimation procedure, since both the bispectrum and the mean enter the qubit dynamics through the phase in Eq. (2). Estimating the bispectrum requires that we disambiguate the phase contribution of the bispectrum from that of the mean, which we accomplish by first estimating the noise mean and then isolating the dynamical contribution of the bispectrum in the non-Gaussian phase. A second key difference is the "single-shot" nature of the current estimation procedure. In Ref. [4], control sequences with non-zero filter order were first used to estimate the bispectrum at "non-zero harmonics", i.e., (k_1 ω_h, k_2 ω_h) for which k_1, k_2 ≠ 0. This estimate was combined with subsequent phase measurements to estimate the bispectrum at "zero harmonics", i.e., (k_1 ω_h, k_2 ω_h) for which k_1 = 0 and/or k_2 = 0. In the RMLE estimate of the bispectrum presented here, both the zero and the non-zero harmonics are estimated simultaneously, eliminating any compounding of error that can occur in the two-step procedure. The present work also departs from Ref. [4] significantly in its use of a statistically motivated maximum likelihood estimation procedure. As discussed in the main text, the least-squares estimate of Ref. [4] is susceptible to numerical instability and, additionally, does not take measurement error into account.
In the remainder of this section, we fully detail our bispectrum estimation procedure. We begin by describing the probability distribution of the estimated "non-Gaussian" phase angle ϕ_p, which enables us to derive the likelihood function for the probability of obtaining a particular set of phase data conditioned on the actual value of the bispectrum (Subsection 8.3). From the likelihood, the task of reconstructing the bispectrum can be mapped into an RMLE problem, as shown in Subsection 8.4. The RMLE approach increases numerical stability, accounts for experimental error, and allows us to deploy prior knowledge of the bispectrum in the estimation procedure. Since regularization can introduce error into the estimate if it is too strong, in Subsection 8.5 we determine an appropriate regularization strength for our problem using the L-curve criterion.

8.2 Distribution of the non-Gaussian phase
The non-Gaussian phase of the qubit is determined from the estimated expectation values of the transverse Pauli operators, σ_x^est and σ_y^est, when the qubit has evolved under control sequence p for a time t. Recall from Supplementary Note 7 that in the limit of a large number of measurements, σ_i^est is approximately Gaussian distributed with mean µ_i = E[⟨σ_i(t)⟩] and variance var[σ_i^est]. From the estimated expectation values, the ordinary phase is determined by

φ_p^est(t) = arctan( σ_y^est / σ_x^est ).     (22)

As a linear combination of Gaussian distributed random variables (to first order in the fluctuations of σ_x^est and σ_y^est about their means), the phase is also Gaussian distributed, with mean and variance

E[φ_p^est(t)] = arctan(µ_y/µ_x) = ϕ_p(t) + µ_B F_p(0, t),   var[φ_p^est(t)] = (µ_y² var[σ_x^est] + µ_x² var[σ_y^est])/(µ_x² + µ_y²)².

The second equality on the right-hand side of E[φ_p^est] follows from Eq. (2). Subtracting out the contribution of the noise mean from the phase produces the non-Gaussian phase,

ϕ_p^est(t) ≡ φ_p^est(t) − µ_B^est F_p(0, t),

where µ_B^est is the estimated noise mean described in Supplementary Note 6. Using the asymptotic Gaussian distribution of µ_B^est with mean µ_B and variance var[µ_B^est], the non-Gaussian phase is similarly Gaussian, with mean

E[ϕ_p^est(t)] = ϕ_p(t)     (25)

and variance

var[ϕ_p^est(t)] = var[φ_p^est(t)] + F_p(0, t)² var[µ_B^est].     (26)

Note that E[ϕ_p^est(t)] depends to leading order on the bispectrum, unlike E[φ_p^est(t)] above.
The symmetries and multiplicities simplify the expected value of the phase substantially. Recall that after M ≫ 1 repetitions of control sequence p with cycle time T, the frequency-comb approximation enables us to write the expected phase as a discrete sum depending on the bispectrum and the FF evaluated at the harmonic frequencies (k_1 ω_h, k_2 ω_h). In terms of the multiplicities m_{k_1,k_2}, we can rewrite this as a sum over the inequivalent harmonics only, and truncating the sum to a finite subset K of harmonics, we obtain

E[ϕ_p^est(MT)] ≈ Σ_{(k_1 ω_h, k_2 ω_h) ∈ K} m_{k_1,k_2} A_p(k_1 ω_h, k_2 ω_h) S_2(k_1 ω_h, k_2 ω_h),     (27)

where A_p(k_1 ω_h, k_2 ω_h) depends on the FF of sequence p at the corresponding harmonics.

8.3 Likelihood function P(ϕ | S_2)
Given the actual bispectrum on K, S_2 = [S_2(Ω_1), . . . , S_2(Ω_N)]^T, with Ω_n ≡ (k_1 ω_h, k_2 ω_h) the n-th harmonic pair in K, the conditional probability of measuring ϕ_p^est(MT) follows from Eqs. (25)-(27): it is Gaussian with mean (A S_2)_p and variance var[ϕ_p^est(MT)]. Reconstructing the bispectrum requires measurements of the non-Gaussian phase for a set of control sequences p = 1, . . . , P with P ≥ N, which we gather into the column vector ϕ = [ϕ_1^est(MT), . . . , ϕ_P^est(MT)]^T. Because the non-Gaussian phase measurements are uncorrelated, the likelihood, or probability of obtaining ϕ given S_2, is a product of the conditional probabilities for the complete set of control sequences,

P(ϕ | S_2) = [(2π)^P det Σ]^{−1/2} exp[ −(1/2)(ϕ − A S_2)^T Σ^{−1} (ϕ − A S_2) ].     (29)

Here, the P × P covariance matrix Σ is diagonal with elements Σ_{p,q} = var[ϕ_p^est(MT)] δ_{p,q}, and the P × N reconstruction matrix A depends on the filter functions evaluated at the harmonic frequencies [Eq. (30)]. In the experiment, the likelihood in Eq. (29) is determined by measuring the non-Gaussian phase for each of the P = 11 control sequences in Fig. 3. Since we also rely on these sequences to estimate the PSD, both the bispectrum and the PSD can be estimated with the same set of transverse Pauli measurements. For each of the control sequences, ϕ_p^est(MT) was determined from Eq. (22) using measurements of σ_x^est, σ_y^est and µ_B^est. The variances of the ϕ_p^est(MT), which constitute the covariance matrix Σ, are given by Eq. (26) with µ_x and µ_y replaced by the estimated values σ_x^est, σ_y^est. The reconstruction matrix A is determined from Eq. (30), with each FF evaluated on the set of harmonics K depicted in Fig. 3(b).

8.4 Regularized maximum likelihood estimation
For the Gaussian likelihood derived in the previous section, the maximum likelihood estimate (MLE) of the bispectrum is equivalent to the value of S_2 that minimizes the exponent in Eq. (29),

S_2^MLE = argmin_{S_2} (1/2)(ϕ − A S_2)^T Σ^{−1} (ϕ − A S_2).     (31)

In the special case where Σ ∝ I, we recover the least-squares estimate used in Ref. [4], with solution S_2^LS = A^{−1} ϕ. Even with a nonuniform covariance matrix, Eq. (31) is a simple convex optimization problem admitting an analytic solution for S_2^MLE. When A is ill-conditioned, however, the MLE suffers from numerical instability, which can introduce significant error into the estimate of the bispectrum, despite the existence of an analytic solution. The problem can be made more stable by introducing a regularization term or "regularizer" Q(S_2), producing the regularized maximum likelihood estimate (RMLE) of the bispectrum,

S_2^RMLE = argmin_{S_2} [ (1/2)(ϕ − A S_2)^T Σ^{−1} (ϕ − A S_2) + Q(S_2) ].     (32)

The regularizer imposes additional structure on the solution, making it more robust to numerical instability arising from A. It also prevents overfitting, in which the estimated bispectrum is unduly influenced by errors in ϕ and is, thus, a poor predictor of the qubit dynamics under more general control settings.

There are numerous methods of regularization for ill-conditioned and/or ill-posed problems. An approach particularly amenable to maximum likelihood estimation is Tikhonov regularization, which employs an L2 regularizer Q(S_2) = ||λ S_2||_2², with strength controlled by the regularization parameter λ ≥ 0 [S5]. In Eq. (32), this regularizer has the effect of penalizing S_2 with larger L2 norms. To estimate the bispectrum, we consider a variation of Tikhonov regularization in which

Q(S_2) = ||λ D S_2||_2²,     (33)

where D = diag(d_1, . . . , d_N) is the diagonal "smoothing matrix". Note that the Tikhonov regularizer is recovered when d_1 = . . . = d_N = 1 (Fig. 5a), and the standard maximum likelihood estimate is recovered when λ = 0. Using non-uniform values for the diagonal enables us to incorporate prior information about the bispectrum. For example, if the magnitude of the bispectrum is known to decay at the high-frequency border of K, we can make the corresponding entries of D large compared to those of the interior (Fig. 5b). Such a smoothing matrix favors a solution with small magnitude at the border. The connection between the smoothing matrix and prior knowledge of the bispectrum is more explicit in a Bayesian formulation of the estimation problem: the RMLE estimate in Eq. (32) with the regularizer in Eq. (33) is equivalent to a posterior mean estimate in which the prior distribution of the bispectrum is Gaussian and zero-mean with covariance matrix (2λ² D²)^{−1}, provided D is full rank. For any smoothing matrix, the regularized maximum likelihood estimate has the simple analytic solution

S_2^RMLE = (A^T Σ^{−1} A + 2λ² D²)^{−1} A^T Σ^{−1} ϕ.     (34)
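In code, the RMLE of Eqs. (32)-(34) is a regularized weighted solve; a minimal sketch, where the function name and the factor convention for λ follow our reading of the Gaussian-prior covariance (2λ²D²)^{−1} stated in the text:

```python
import numpy as np

def bispectrum_rmle(A, phi, phi_var, lam=0.0, d=None):
    """Regularized MLE: S2 = (A^T W A + 2*lam^2*D^2)^{-1} A^T W phi,
    with W = Sigma^{-1} and D = diag(d). lam = 0 recovers the plain
    MLE; d = ones recovers standard Tikhonov regularization."""
    A = np.asarray(A, float)
    d = np.ones(A.shape[1]) if d is None else np.asarray(d, float)
    W = np.diag(1.0 / np.asarray(phi_var, float))
    lhs = A.T @ W @ A + 2.0 * lam**2 * np.diag(d**2)
    return np.linalg.solve(lhs, A.T @ W @ phi)
```

Increasing λ shrinks the norm of the solution, and making selected entries of d large suppresses the corresponding harmonics, mirroring the smoothing-matrix construction above.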

8.5 The L-curve criterion
Although it guards against numerical instability and overfitting to errors in the measured data, the regularizer can introduce its own error into the estimate if λ is too large. A fundamental challenge in regularization is selecting a value of λ that balances these sources of error. While this problem is still an active area of research, one of the most widely used strategies for selecting λ is the L-curve criterion [S6]. A graphical technique, the L-curve criterion enables one to visualize the magnitude of the regularization error in proportion to other errors in the estimate, and to choose λ accordingly. For a given λ, the sources of error that contribute to the regularized maximum likelihood estimate in Eq. (34) are described by the residual norm

E(λ) ≡ (ϕ − A S_2^RMLE)^T Σ^{−1} (ϕ − A S_2^RMLE)

and the solution norm

R(λ) ≡ ||D S_2^RMLE||_2².

The regularization parameter λ enters both the residual norm and the solution norm implicitly through S_2^RMLE. The residual norm increases as λ grows, while the solution norm decreases. When E(λ) is too large relative to R(λ), the estimate does not account for the measured data, due to error introduced by the regularization. Conversely, when E(λ) is too small relative to R(λ), the estimate fits the measured data too closely, making it susceptible to overfitting and numerical error. The influence of λ on the error contributions is captured by the L-curve, a parametric plot of log R(λ) vs. log E(λ) as a function of λ. A typical L-curve, with its characteristic "L" shape, is shown in Fig. 6. Note that as λ increases from left to right, log R(λ) sharply decreases and then plateaus, while log E(λ) is initially stable, followed by a rapid increase. The corner of the L-curve marks a point at which the solution norm and the residual norm are simultaneously small. The corner thus signifies the optimal value of λ according to the L-curve criterion. Figure 7 shows L-curves generated from our experimental data for the two different smoothing matrices illustrated in Fig. 5. For both the uniform and non-uniform smoothing matrices, the L-curves lack corners. Unlike the typical L-curve in Fig. 6, log R(λ) does not exhibit a sharp increase as λ → 0. This indicates that, for the control sequences we have selected, the reconstruction matrix A is sufficiently well conditioned that regularization becomes the dominant source of error [S7]. Consequently, it is not optimal to utilize regularization in this setting, and the reconstruction presented in the main text uses λ = 0.
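The two norms can be traced along a sweep of λ as follows (a sketch; the function name and the weighted-norm definitions reflect our reading of the residual and solution norms, and plotting log R against log E then gives the L-curve):

```python
import numpy as np

def l_curve(A, phi, phi_var, lams, d=None):
    """Residual norm E = (phi - A s)^T W (phi - A s) and solution norm
    R = ||D s||^2 of the RMLE solution, swept over values of lam."""
    A = np.asarray(A, float)
    d = np.ones(A.shape[1]) if d is None else np.asarray(d, float)
    W = np.diag(1.0 / np.asarray(phi_var, float))
    E, R = [], []
    for lam in lams:
        lhs = A.T @ W @ A + 2.0 * lam**2 * np.diag(d**2)
        s = np.linalg.solve(lhs, A.T @ W @ phi)
        r = phi - A @ s
        E.append(float(r @ W @ r))            # misfit grows with lam
        R.append(float(np.sum((d * s)**2)))   # solution norm shrinks
    return np.array(E), np.array(R)
```

Along the sweep, E(λ) is non-decreasing and R(λ) is non-increasing, which is the trade-off the L-curve corner is meant to balance.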
Note that this finding is contingent on both A and the particular regularizer we employ. Estimating the bispectrum at a greater number of harmonics demands a larger A, which is more likely to be near-singular and/or poorly conditioned. That scenario would likely require some form of regularization. Additionally, the error introduced by regularization is reduced when prior knowledge of the bispectrum (if available) is used to select the regularizer. For example, suppose a noise model or a previous experiment indicates that S_2 takes a value in the vicinity of S_µ. This information is captured by the regularizer

Q(S_2) = ||λ D (S_2 − S_µ)||_2²,

which corresponds to a Gaussian prior distribution of S_2 with mean S_µ and covariance matrix (2λ² D²)^{−1}. In contrast, naively employing Tikhonov regularization amounts to assuming a zero-mean prior distribution of S_2.