A random-walk benchmark for single-electron circuits

Mesoscopic integrated circuits aim for precise control over elementary quantum systems. However, as fidelities improve, the increasingly rare errors and component crosstalk pose a challenge for validating error models and quantifying accuracy of circuit performance. Here we propose and implement a circuit-level benchmark that models fidelity as a random walk of an error syndrome, detected by an accumulating probe. Additionally, contributions of correlated noise, induced environmentally or by memory, are revealed as limits of achievable fidelity by statistical consistency analysis of the full distribution of error counts. Applying this methodology to a high-fidelity implementation of on-demand transfer of electrons in quantum dots we are able to utilize the high precision of charge counting to robustly estimate the error rate of the full circuit and its variability due to noise in the environment. As the clock frequency of the circuit is increased, the random walk reveals a memory effect. This benchmark contributes towards a rigorous metrology of quantum circuits.


SUPPLEMENTARY NOTE 1. BASELINE MODEL
Consider a time-homogeneous discrete-time random walk on the set of integers which starts at 0 and at each step moves +1 with probability P_+, moves −1 with probability P_−, and stays at the same vertex with probability P_0; here we assume P_+, P_−, P_0 ∈ (0, 1) and P_− + P_0 + P_+ = 1.
To describe this process formally, consider a random variable K = (K_{−1}, K_0, K_{+1}) following a multinomial distribution with t > 0 trials and three categories, with associated probabilities P_−, P_0 and P_+, respectively. Then the random variable X = K_{+1} − K_{−1} corresponds to the position of the random walker after t steps, since all steps can be modeled with independent discrete random variables with three possible outcomes (−1, 0 and +1, respectively) and respective probabilities P_−, P_0 and P_+. First we show that the probability mass function of the discrete variable X ∈ {−t, −t + 1, . . . , t − 1, t} is given by (1) in the main text; i.e., let p^t_x := Pr(X = x), then

Claim 1. The probability mass function (PMF) of the variable X is
  p^t_x = \binom{t}{|x|} P_{\pm}^{|x|} P_0^{t-|x|} \, {}_2F_1\!\left(\tfrac{|x|-t}{2}, \tfrac{|x|-t+1}{2}; |x|+1; \tfrac{4 P_+ P_-}{P_0^2}\right)
for x ∈ {−t, −t + 1, . . . , t − 1, t}, where P_± stands for P_+ if x ≥ 0 and for P_− if x < 0.
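As an illustration, the walk can be sampled directly; the following sketch (function names are ours, not from the paper) draws final positions either step by step or, equivalently, via the multinomial counts K:

```python
import numpy as np

def walk_positions(t, p_minus, p_zero, p_plus, n_walks, rng=None):
    """Simulate n_walks independent t-step walks; each step is -1, 0 or +1
    with probabilities (p_minus, p_zero, p_plus). Returns final positions."""
    rng = np.random.default_rng(rng)
    steps = rng.choice([-1, 0, 1], size=(n_walks, t),
                       p=[p_minus, p_zero, p_plus])
    return steps.sum(axis=1)

def walk_positions_multinomial(t, p_minus, p_zero, p_plus, n_walks, rng=None):
    """Equivalent sampler: draw K = (K_-1, K_0, K_+1) from the multinomial
    distribution and return X = K_+1 - K_-1."""
    rng = np.random.default_rng(rng)
    k = rng.multinomial(t, [p_minus, p_zero, p_plus], size=n_walks)
    return k[:, 2] - k[:, 0]
```

Both samplers produce the same distribution of X, which is the content of the reduction to the multinomial variable K used in the proof below.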
Proof. Suppose that x ≥ 0; then the event X = x, i.e., the event of the random walker being at the position x after t steps, is equivalent to the event that the multinomially distributed variable K = (K_{−1}, K_0, K_{+1}) satisfies K_{+1} − K_{−1} = x (i.e., to the event that the random walker has moved K_{+1} steps to the right and K_{−1} = K_{+1} − x steps to the left). Therefore Pr(X = x) can be obtained from the multinomial distribution's PMF by summing Pr(K = k) over all admissible values k.
Since K follows a multinomial distribution with t > 0 trials and three categories with respective probabilities P_−, P_0 and P_+, its PMF is given by
  \Pr(K = k) = \frac{t!}{k_{-1}!\, k_0!\, k_1!}\, P_-^{k_{-1}} P_0^{k_0} P_+^{k_1},
where k := (k_{−1}, k_0, k_1) is a vector of nonnegative integers satisfying k_{−1} + k_0 + k_1 = t. Notice that such k can additionally satisfy k_1 − k_{−1} = x ≥ 0 only if k is of the form k = (s, t − x − 2s, x + s) for some nonnegative integer s. Moreover, since k_0 ≥ 0, we obtain the constraint t − x − 2s ≥ 0, i.e., s ≤ (t − x)/2. We conclude that all suitable vectors k are parametrized by a nonnegative integer s, which is upper-bounded by (t − x)/2 (more precisely, the maximal valid s is ⌊(t − x)/2⌋). For each such s the corresponding vector is k = (s, t − x − 2s, x + s) and the probability of the event K = k is
  \Pr(K = k) = \frac{t!}{s!\,(t-x-2s)!\,(x+s)!}\, P_-^{s} P_0^{t-x-2s} P_+^{x+s}.
Thus p^t_x := Pr(X = x) is simply the sum of these multinomial probabilities:
  p^t_x = \sum_{s=0}^{\lfloor (t-x)/2 \rfloor} \frac{t!}{s!\,(t-x-2s)!\,(x+s)!}\, P_-^{s} P_0^{t-x-2s} P_+^{x+s}.

Let us show that
  \sum_{s=0}^{\lfloor (t-x)/2 \rfloor} \frac{t!}{s!\,(t-x-2s)!\,(x+s)!}\, P_-^{s} P_0^{t-x-2s} P_+^{x+s} = \binom{t}{x} P_+^{x} P_0^{t-x} \, {}_2F_1\!\left(\tfrac{x-t}{2}, \tfrac{x-t+1}{2}; x+1; \tfrac{4 P_+ P_-}{P_0^2}\right),   (2)
which will establish (1) and conclude the proof. We start by rewriting the LHS of (2). Observe that (x + s)! = x!\,(x + 1)_s, where (a)_s := a(a + 1) · · · (a + s − 1) stands for the Pochhammer symbol. Furthermore,
  \frac{(t-x)!}{(t-x-2s)!} = (t-x-2s+1)_{2s} = (x-t)_{2s} = 4^s \left(\tfrac{x-t}{2}\right)_s \left(\tfrac{x-t+1}{2}\right)_s,
where the middle step applies the identity (−a)_s = (−1)^s (a − s + 1)_s. Therefore
  p^t_x = \binom{t}{x} P_+^{x} P_0^{t-x} \sum_{s \ge 0} \frac{\left(\tfrac{x-t}{2}\right)_s \left(\tfrac{x-t+1}{2}\right)_s}{(x+1)_s \, s!} \left(\frac{4 P_+ P_-}{P_0^2}\right)^{s}.
The upper limit ⌊(t − x)/2⌋ in the latter sum can be replaced with infinity, since the numerator ((x−t)/2)_s ((x−t+1)/2)_s is zero for the additional terms with s > (t − x)/2. It remains to recognize that the sum coincides with the definition of the Gaussian hypergeometric function {}_2F_1. We have verified (2), which concludes the proof of (1) when x ≥ 0. The case x < 0 follows from similar considerations, with the roles of P_+ and P_− exchanged.
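Numerically, the finite sum over s derived above is the most direct way to evaluate p^t_x; a minimal sketch (our naming):

```python
from math import factorial

def pmf_x(x, t, p_minus, p_zero, p_plus):
    """p^t_x: sum the multinomial weights over s, the number of -1 steps.
    For x < 0 the roles of p_plus and p_minus are exchanged."""
    if x < 0:
        x, p_minus, p_plus = -x, p_plus, p_minus
    return sum(
        factorial(t) // (factorial(s) * factorial(t - x - 2 * s) * factorial(x + s))
        * p_minus**s * p_zero**(t - x - 2 * s) * p_plus**(x + s)
        for s in range((t - x) // 2 + 1)
    )
```

Summing over all x from −t to t recovers total probability 1, a quick sanity check on the derivation.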
We note that discrete distributions similar to X have been considered before. In particular, [1] considers an analogue of our random variable X and computes p^t_0 (termed "return probability p_0(t)" in that paper). X is also closely related to the inverse trinomial distribution [2, 3], defined via a random walk on the line. Nevertheless, we are not aware of prior work establishing the PMF (1) of X.
The return probability for the random walk, p^t_0, is the probability of error-free electron transfer in the context of our benchmarking application, and hence can also be interpreted as the transfer fidelity. As long as the contribution of the return paths is negligible, p^t_0 decays exponentially, but for larger t the exponential decay is modified. Below we derive explicit asymptotics that characterize both sides of this crossover.
Proof. By (1) we have
  p^t_0 = P_0^t \, {}_2F_1\!\left(-\tfrac{t}{2}, \tfrac{1-t}{2}; 1; z\right),
where we denote z = 4P_+P_−/P_0^2. We start by observing that by a quadratic transformation [4, Eq. 15.8.13] we have
  {}_2F_1\!\left(-\tfrac{t}{2}, \tfrac{1-t}{2}; 1; z\right) = (1 + \sqrt{z})^{t} \, {}_2F_1\!\left(-t, \tfrac{1}{2}; 1; \zeta\right),   (3)
where the variable ζ is defined by ζ/(2 − ζ) = √z, i.e., ζ = 2√z/(1 + √z). Using the equality (3) we arrive at
  p^t_0 = \left(P_0 + 2\sqrt{P_+ P_-}\right)^{t} \, {}_2F_1\!\left(-t, \tfrac{1}{2}; 1; \zeta\right).   (4)
Since the hypergeometric function on the right-hand side of (4) is a degree-t polynomial in the variable ζ ≪ 1, for small values of t we can approximate P_0 + 2√(P_+P_−) ≈ P_0 and {}_2F_1(−t, 1/2; 1; ζ) ≈ 1, leading to the first part of the claim. Now we consider (4) with fixed ζ as t → +∞. We apply an asymptotic expansion of the hypergeometric function with a large parameter due to Erdélyi [5, p. 77, Eq. 15], which exploits the relation between the (Gaussian) hypergeometric function {}_2F_1 and the confluent hypergeometric function {}_1F_1. Combining this with (4) and substituting ζ = 2√z/(1 + √z) gives us the second part of the claim.
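The crossover can be checked numerically from the exact return probability; in the sketch below (our naming) the per-step decay rate of p^t_0 approaches P_0 + 2√(P_+P_−) at large t, while for small t it stays close to P_0:

```python
from math import comb, sqrt

def return_prob(t, p_minus, p_zero, p_plus):
    """Exact p^t_0: the walker takes s steps left, s steps right and
    stays put t - 2s times; the multinomial coefficient is
    t!/(s! s! (t-2s)!) = C(t, s) * C(t-s, s)."""
    return sum(comb(t, s) * comb(t - s, s)
               * (p_minus * p_plus) ** s * p_zero ** (t - 2 * s)
               for s in range(t // 2 + 1))

# Small t: plain exponential decay with rate ~P_0;
# large t: the decay rate crosses over to P_0 + 2*sqrt(P_+ P_-).
rate_inf = 0.75 + 2 * sqrt(0.05 * 0.2)  # = 0.95 for the values used below
```

The ratio return_prob(t+1)/return_prob(t) converges to rate_inf up to an algebraic (in t) correction, consistent with the asymptotic expansion above.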
Finally, consider N independent observations of the random variable X, i.e., i.i.d. random variables X_1, . . . , X_N ∼ X. Let Z_x = |{j ∈ {1, . . . , N} : X_j = x}| be the number of times the value x ∈ {−t, . . . , t} appears among these N observations. Then the random variable Z_{N,t} := (Z_{−t}, . . . , Z_t) follows a multinomial distribution with N trials and 2t + 1 categories, labeled from −t to t, and respective probabilities p^t_x. When there is no ambiguity, this notation is simplified to Z. The probability to observe a particular vector z ∈ ℕ_0^{2t+1}, Σ_{x=−t}^{t} z_x = N (where ℕ_0 stands for the set of nonnegative integers), is
  \Pr(Z = z) = \frac{N!}{\prod_{x=-t}^{t} z_x!} \prod_{x=-t}^{t} \left(p^t_x\right)^{z_x}.   (5)

Under the assumption that the P^t_±-values are independent of the position x of the random walker, they can be extracted by deconvolution of p^t_x and p^{t+1}_x. For that, let us expand the model used so far and consider a random walk on the set of integers which at time t performs the transition x → x + j with probability P^t_j, x, j ∈ ℤ. Here P^t_j ∈ (0, 1) for all j and Σ_{j∈ℤ} P^t_j = 1. The experiment yields two vectors from ℝ^{2m+1}, m ∈ ℕ, representing the distributions p^t = (p^t_{−m}, . . . , p^t_{−1}, p^t_0, p^t_1, . . . , p^t_m) and p^{t+1} = (p^{t+1}_{−m}, . . . , p^{t+1}_{−1}, p^{t+1}_0, p^{t+1}_1, . . . , p^{t+1}_m). We shall assume P^t_j = 0 for all j ∈ ℤ s.t. |j| ≥ m. The distribution p^{t+1} represents the position of the random walker after t + 1 steps and satisfies
  p^{t+1}_i = \sum_{j} P^t_j \, p^t_{i-j},
i.e., p^{t+1} = P^t * p^t is the discrete convolution of P^t = (P^t_{−m}, . . . , P^t_{−1}, P^t_0, P^t_1, . . . , P^t_m) and p^t. Therefore P^t can be extracted by discrete deconvolution, which is performed as follows.
Let p̂^{t+1}, p̂^t and P̂^t stand for the discrete Fourier transforms of p^{t+1}, p^t and P^t, respectively; then the convolution theorem gives P̂^t = p̂^{t+1}/p̂^t (with pointwise division). The vector p̂^t is calculated from p^t by the discrete Fourier transform, and similarly for p̂^{t+1}. We then obtain P^t by applying the inverse discrete Fourier transform to P̂^t. Further details on how the deconvolution is performed and how the uncertainty propagates can be found in [6]. Having extracted P^t, we find, for the experiment described in the main text, that P^t_j ≈ 0 for |j| > 1, which allows us to approximate P^t_+ = P^t_{+1} and P^t_− = P^t_{−1}.
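A minimal numpy sketch of this deconvolution (our naming; index 0 of each array corresponds to position −m and the center entry to position 0; real data additionally require regularization wherever the spectrum of p^t is small):

```python
import numpy as np

def deconvolve_steps(p_t, p_t1):
    """Recover P^t from p^{t+1} = P^t * p^t by pointwise division in
    Fourier space. Both inputs are length-(2m+1) arrays centered on
    position 0; ifftshift/fftshift handle the centered indexing."""
    ft = lambda v: np.fft.fft(np.fft.ifftshift(v))
    P_hat = ft(p_t1) / ft(p_t)
    return np.fft.fftshift(np.fft.ifft(P_hat)).real
```

With supports well inside the window, the circular convolution implemented by the FFT coincides with the linear one, so the step distribution is recovered exactly up to floating-point error.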

SUPPLEMENTARY NOTE 3. COMPARISON BETWEEN MEASURED AND PREDICTED P±
The characterization of the single-electron pumps gives us their transport statistics q^{(i)}_m, i.e., the probability of pump i ∈ {1, 2} transporting m ∈ ℤ electrons. Assuming independence of simultaneous pump operation, we can calculate the probability P_x that the charge on the island increases by x ∈ ℤ electrons (here P_{±1} is equivalent to P_± in (1) in the main text) as
  P_x = \sum_{m \in \mathbb{Z}} q^{(1)}_m \, q^{(2)}_{m-x},
the island charge changing by x = m_1 − m_2 when pump 1 adds m_1 electrons and pump 2 removes m_2. Table S1 provides an example of the agreement between measured and predicted values of P_± for non-interacting pumps.
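For two independent pumps this prediction is a discrete cross-correlation of the single-pump statistics; a sketch with hypothetical q arrays (our naming and indexing convention):

```python
import numpy as np

def island_step_probs(q1, q2):
    """q1[k], q2[k]: probability of pump i transporting m = k - m_max
    electrons per cycle, for k = 0 .. 2*m_max. Returns P_x for
    x = -(2*m_max) .. 2*m_max, assuming independent pumps and an island
    charge change x = m1 - m2 (cross-correlation of q1 and q2)."""
    return np.convolve(q1, q2[::-1])
```

The center entry of the returned array is P_0; its neighbors give P_∓ and P_±, matching the step probabilities of the random walk.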
For the measurement in Table S2, the waveform of the pump drive was changed from a low-frequency sinusoid to a sharp voltage transient. Here we see a strong disagreement between the prediction from single-pump characterization and the measurement of the P_±-values. This disagreement is caused by a strong shift of the operation point which occurs as soon as the pumps are operated simultaneously, indicating a strong correlation between the pumps.

SUPPLEMENTARY NOTE 4. MODEL CONSISTENCY AND FISHER'S SIGNIFICANCE TESTING
The experimental data consist of observations (more precisely, rebinned observations as described in Supplementary Note 5) of random variables Z_{N_1,t_1}, Z_{N_2,t_2}, . . . , Z_{N_L,t_L}, for several different pairs (N_1, t_1), . . . , (N_L, t_L), which, according to the model outlined in Supplementary Note 1, all share the same step probabilities (P_−, P_+).
We consider the problem of determining whether there is a parameter P_± such that the experimental data do not contradict the model at a fixed significance level. More generally, we are interested in extracting a region in the parameter space such that the experimental data do not contradict the model for each choice of the parameter from the region; for brevity, we will refer to this region as the consistency region. It should be stressed that this approach differs from the parameter estimation problem: here we are interested in parameter values which cannot be statistically rejected as incompatible with the data, whereas parameter estimation techniques deal with estimating the values of the parameters in some fashion, e.g., by finding the values of parameters under which the experimental data are most probable under the assumed model.
The problem of testing consistency of the model with a specific parameter value is twofold: since the data correspond to several pairs (N, t), with different parameters N, t but the same step probabilities (P_−, P_+), there are two questions to be asked:
1. Are the data for a particular value of (N, t) consistent with the model for some parameter P_±?
2. Are all the data consistent with the model for some fixed value of P_±?
Let z_0 be an observation of the random variable Z := Z_{N,t}, with prescribed parameters t, N but unknown probabilities P_−, P_0, P_+.
We employ Fisher's significance testing framework in order to extract the consistency regions for the parameter θ = (P_−, P_0, P_+). In its simplest form, a Fisherian test formulates [7] a single hypothesis, the null hypothesis H_0, which specifies the null distribution (in our case H_0 : θ = θ* for some fixed θ*); then a certain test statistic T is computed from the observation z_0, leading to a value T(z_0). The p-value of the test is the tail probability of T(Z) under H_0. In our setting, the test statistic will be non-negative and smaller values will indicate stronger disagreement with the null hypothesis. Then the p-value of the test is
  p = \sum_{z :\, T(z) \le T(z_0)} \Pr(Z = z),
where Pr(Z = z) stands for the probability of the event Z = z under the null hypothesis and the sum is over all those values z of the random vector Z that satisfy T(z) ≤ T(z_0). In Fisher's significance testing framework the p-value is interpreted as "a measure of extent to which the data do not contradict the model" [7, p. 122]. Therefore Fisher's significance testing allows one to check whether H_0 must be rejected (at the chosen significance level) for a particular value θ*; next, we shall employ it to extract the region of those θ values for which the respective H_0 cannot be rejected, see Supplementary Note 5.
The problem of testing whether the parameters of a multinomial distribution equal specified values has been well-investigated [8-11]. The common approaches (such as Pearson's χ² test, the G² test or the power-divergence test [10], which subsumes the former two) are asymptotic tests which can be highly biased. This is due to the fact that under the null hypothesis the random variable X has vanishingly small tail probabilities (and the actual observed samples z have zero observed counts in the respective positions). This phenomenon makes the asymptotic tests ill-suited for the actual data.
An alternative to the aforementioned tests is the exact multinomial test [10], which enumerates all possible multinomial outcomes; its test statistic T is the probability of obtaining the particular outcome under the null hypothesis. Then the p-value of the test is
  p = \sum_{z :\, \Pr(z) \le \Pr(z_0)} \Pr(z).
However, the exhaustive enumeration quickly becomes computationally intractable as N grows. We instead apply a Monte Carlo test (proposed in [12], see also [11, 13, 14]), which can be seen as an extension of the exact multinomial test. In the Monte Carlo hypothesis testing procedure, a large number (say, N_sim) of samples from the multinomial distribution under the null hypothesis are simulated; for each sample z the test statistic Pr(z) is calculated (i.e., the probability to draw z from the distribution Z under the null hypothesis). Let k be the number of samples for which the test statistic is at least as extreme as for the observed vector z_0 (i.e., the number of samples z for which Pr(z) ≤ Pr(z_0)). Then the p-value of the test is (k + 1)/(N_sim + 1).
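A sketch of this Monte Carlo multinomial test (our naming; the test statistic is the multinomial probability of the sample under H_0, computed in log space for numerical stability):

```python
import numpy as np
from math import lgamma, log

def log_multinomial_pmf(z, probs):
    """log Pr(Z = z) for a multinomial with category probabilities probs;
    zero counts are skipped (they contribute nothing)."""
    n = int(np.sum(z))
    return (lgamma(n + 1) - sum(lgamma(c + 1) for c in z)
            + sum(c * log(p) for c, p in zip(z, probs) if c > 0))

def monte_carlo_p_value(z_obs, probs, n_sim=999, rng=None):
    """Simulate n_sim samples under H0 and count those at most as
    probable as z_obs; return the (k + 1)/(n_sim + 1) p-value."""
    rng = np.random.default_rng(rng)
    n = int(np.sum(z_obs))
    t_obs = log_multinomial_pmf(z_obs, probs)
    sims = rng.multinomial(n, probs, size=n_sim)
    k = sum(log_multinomial_pmf(z, probs) <= t_obs + 1e-12 for z in sims)
    return (k + 1) / (n_sim + 1)
```

A typical observation yields a large p-value, an extreme one the minimal value 1/(n_sim + 1), mirroring the behavior of the exact test without the exhaustive enumeration.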

SUPPLEMENTARY NOTE 5. CONSISTENCY REGIONS
Since P_0 = 1 − P_+ − P_−, the Monte Carlo tests are applied to extract a 95% consistency region for the pair (P_−, P_+). This region is defined as the set of all admissible (P_−, P_+) values for which the p-value obtained by testing the hypothesis H_0 : θ = (P_−, 1 − P_− − P_+, P_+) is at least 0.05.
In practice, the observed vector z_0 has many zero entries (as N is too small to observe "X = x" when |x| is large) and the experimental data are limited to small |x|, so the data are rebinned. Depending on the dynamic range of the detector and the available computational resources, we consider a rebinned random variable Z̃, obtained from Z by merging the extreme categories, and perform the aforementioned tests against an observation z̃_0 of Z̃. Further on, this subtlety will be assumed implicitly, i.e., when talking of the random variable Z or its observation z_0, the rebinned counterparts Z̃ and z̃_0 are to be understood.

SUPPLEMENTARY NOTE 6. COMBINING THE p-VALUES
The discussion above attempts to answer whether the data are consistent with some P_± for a particular value of (N, t); the challenge now is to combine the statistical tests done for all L pairs (N, t). While for each fixed pair (N, t) the 95% consistency region can be constructed from the observation of the respective Z_{N,t}, the goal is to obtain a global measure of discrepancy between the data and the hypothesis H_0 : θ = (P_−, 1 − P_− − P_+, P_+), taking into account the observations for all pairs (N, t).
This task can be viewed as the problem of combining several independent p-values, which arises in meta-analysis [15]. When testing a true point null hypothesis with an absolutely continuous test statistic, it can be shown that the p-values under the null hypothesis are uniformly distributed in [0, 1]. This allows one to apply, e.g., Fisher's method of testing uniformity [16] (for an overview of other ways to combine p-values, see [17, Appendix A]). In our case both the random variables Z_{N,t} and the test statistic are discrete, thus under the null hypothesis the p-values obtained for each pair (N, t) only approximate the uniform distribution. Fisher's method is used to approximately determine the combined p-value, even though in the case of a sparse discrete distribution this approximation may yield conservative results [18].
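Fisher's method itself is elementary: X = −2 Σ ln p_i is chi-squared with 2L degrees of freedom under H_0, and since the degrees of freedom are even, the chi-squared tail has a closed form, so the combined p-value can be computed with the standard library alone (a sketch):

```python
from math import exp, factorial, log

def fisher_combined_p(p_values):
    """Fisher's method for combining L independent p-values.
    X = -2 * sum(ln p_i) ~ chi^2 with 2L dof under H0 (exact only for
    continuous, uniform p-values); for even dof 2L the survival function
    is exp(-x/2) * sum_{i<L} (x/2)^i / i!."""
    L = len(p_values)
    x = -2.0 * sum(log(p) for p in p_values)
    return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(L))
```

For a single p-value the method reduces to the identity, and a collection of small p-values is aggregated into a much smaller combined p-value.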
In practice, due to the computational cost involved in computing the combined p-value, this global consistency test is only performed for a single value of θ. The value (P_+, P_−) = (2.13 × 10⁻⁵, 6.92 × 10⁻⁵) on which we performed the combined test is the one under which the observed data are most probable, i.e., the maximum likelihood estimate, see Supplementary Note 7. However, the combined p-value 2.23 × 10⁻⁶ means that H_0 must be rejected; also visually (see Figure 2c, triangles) it is clear that the distribution of p-values is far from uniform. Hence one concludes that this model, with fixed P_± for all pairs (N, t), is incompatible with the experimental data.

SUPPLEMENTARY NOTE 7. MAXIMUM LIKELIHOOD ESTIMATION
The preceding discussion aims to determine whether the data contradict the model at the given level of significance. If, however, one only seeks the most suitable choice of the parameters P_±, a natural approach is to maximize the likelihood function, i.e., (in the case of a single observation for a single pair (N, t)) to maximize the expression in (5), with z_x being the actual observed values, with respect to the unknown parameters P_±. The task is equivalent to maximizing the logarithm of the likelihood,
  \ell_{N,t}(\theta) = \ln\Gamma(N+1) - \sum_{x} \ln\Gamma(z_x + 1) + \sum_{x} z_x \ln p^t_x(\theta),
where p^t_x(θ) stands for the RHS of (1) and ln Γ is the natural logarithm of the gamma function. Since the observations across the L different pairs (N_i, t_i) are assumed to be independent, the joint probability of observing the complete data is the product of the individual probabilities for each separate (N_i, t_i), i.e., the global log-likelihood function to be maximized is Σ_{i=1}^{L} ℓ_{N_i,t_i}(θ).
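A sketch of the baseline-model log-likelihood for a toy dataset (our naming; constants independent of θ are dropped, so values are comparable only across parameter choices for the same data):

```python
from math import factorial, log

def pmf_x(x, t, pm, pp):
    """p^t_x of the baseline model with P_0 = 1 - pm - pp."""
    if x < 0:
        x, pm, pp = -x, pp, pm
    p0 = 1.0 - pm - pp
    return sum(
        factorial(t) // (factorial(s) * factorial(t - x - 2 * s) * factorial(x + s))
        * pm**s * p0**(t - x - 2 * s) * pp**(x + s)
        for s in range((t - x) // 2 + 1)
    )

def log_likelihood(pm, pp, data):
    """data: list of (t, counts) with counts[x + m] = z_x, x = -m..m.
    Multinomial log-likelihood up to the theta-independent constant."""
    ll = 0.0
    for t, counts in data:
        m = (len(counts) - 1) // 2
        for x, z in zip(range(-m, m + 1), counts):
            if z:
                ll += z * log(pmf_x(x, t, pm, pp))
    return ll
```

The maximum likelihood estimate is then found by maximizing this function over (P_−, P_+), e.g., by a grid search or a standard numerical optimizer.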

SUPPLEMENTARY NOTE 8. DIRICHLET DISTRIBUTION-BASED RANDOM-WALK MODELS
We now consider the case when the step probabilities P_−, P_+ are themselves random variables. We assume that (P_−, P_0, P_+) follows a Dirichlet distribution, which is [19] "one of the key multivariate distributions for random vectors confined to the simplex". The Dirichlet distribution also becomes important when the observed data are superficially similar to the multinomial distribution but exhibit more variance than the multinomial distribution permits. As the authors of [19, p. 199] note, "One possibility of this kind of extra variation is that the multinomial probabilities" are not constant across the trials, and the vector of probabilities can be interpreted as a random vector in the standard simplex; in this case the Dirichlet distribution is a convenient choice, resulting in a compound probability distribution, the Dirichlet-multinomial distribution [19, Definition 6.1].
The Dirichlet distribution on the standard 2-simplex ∆_2 with positive parameter vector α = (α_0, α_1, α_2), denoted by Dir(α), is a probability distribution with [19, Definition 2.1] the density function
  f(\theta_0, \theta_1, \theta_2) = \frac{\Gamma(\alpha_0 + \alpha_1 + \alpha_2)}{\Gamma(\alpha_0)\,\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\, \theta_0^{\alpha_0 - 1} \theta_1^{\alpha_1 - 1} \theta_2^{\alpha_2 - 1}.   (8)
The mean value and the variance of θ_i, i = 0, 1, 2, are given by
  \mathrm{E}[\theta_i] = \frac{\alpha_i}{\bar\alpha}, \qquad \operatorname{Var}[\theta_i] = \frac{\alpha_i(\bar\alpha - \alpha_i)}{\bar\alpha^2(\bar\alpha + 1)}, \qquad \bar\alpha := \sum_{i=0}^{2} \alpha_i,
respectively, i.e., the mean value of θ_i is proportional to the parameter α_i, while the variance of θ_i decreases as Σ_{i=0}^{2} α_i is increased. This allows one to employ the Dirichlet distribution to model the scattering of the vector (P_−, P_0, P_+) ∈ ∆_2 around its mean value, with a single additional parameter characterizing the magnitude of the scattering.
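These moments are easy to verify against samples (a sketch; numpy's rng.dirichlet draws from Dir(α)):

```python
import numpy as np

def dirichlet_moments(alpha):
    """Mean and variance of each component of Dir(alpha):
    E[theta_i] = a_i / a0,  Var[theta_i] = a_i (a0 - a_i) / (a0^2 (a0 + 1)),
    where a0 is the sum of the parameters."""
    a = np.asarray(alpha, dtype=float)
    a0 = a.sum()
    mean = a / a0
    var = a * (a0 - a) / (a0**2 * (a0 + 1))
    return mean, var
```

Scaling all α_i by a common factor leaves the mean unchanged while shrinking the variance, which is exactly the one-parameter control of scatter used below.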
We proceed by considering two extensions of the baseline model: one where the variable θ = (P_−, P_0, P_+) is chosen independently for each of the N separate random walks, and another where θ = (P_−, P_0, P_+) is the same for all N random walks (but a new θ ∼ Dir(α) is independently drawn if either N or t is changed).

Model description
Let α = (α_{−1}, α_0, α_1) be a fixed vector of positive parameters. For each pair (N, t) we consider the following process: repeat N times: choose a random vector θ = (P_−, P_0, P_+) ∼ Dir(α) (independently each time); perform t steps of the random walk with the respective step probabilities (P_−, P_0, P_+); observe the position of the random walker X ∈ {−t, −t + 1, . . . , t − 1, t}. Given the N observations X_1, . . . , X_N, denote Z_x = |{j ∈ {1, . . . , N} : X_j = x}| and define the random variable Z_{N,t} := (Z_{−t}, . . . , Z_t). This model corresponds to choosing the step probabilities P_−, P_+ independently for each repetition of a random walk of fixed length t. This way the random variable Z_{N,t} again has a multinomial distribution, but now with modified (compared to the baseline model) probabilities incorporating the underlying Dirichlet distribution.
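A sketch of one draw of Z_{N,t} under this model (our naming):

```python
import numpy as np

def fast_fluctuator_sample(alpha, t, N, rng=None):
    """Draw Z_{N,t} for the fast-fluctuator model: a fresh
    theta ~ Dir(alpha) for every one of the N walks."""
    rng = np.random.default_rng(rng)
    thetas = rng.dirichlet(alpha, size=N)            # rows (P_-, P_0, P_+)
    counts = np.array([rng.multinomial(t, th) for th in thetas])
    x = counts[:, 2] - counts[:, 0]                  # final positions
    return np.bincount(x + t, minlength=2 * t + 1)   # histogram over -t..t
```

Marginalizing θ walk by walk like this is what turns the per-walk counts K into Dirichlet-multinomial variables while keeping Z_{N,t} multinomial.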
After N independent observations the multinomial vector Z_{N,t} is obtained. The variable Z_{N,t} still has a multinomial distribution, as in the baseline model, and Eq. (10) is the same as (5) but with p^t_x given by (9). However, in contrast to the baseline model, the vector K has the Dirichlet-multinomial distribution instead of the multinomial distribution. That, in turn, implies that the probabilities p^t_x are not calculated from (1), but are given by (9) instead. In effect, Z_{N,t} is a multinomial random vector with different probabilities associated with its categories, when compared to the baseline model. The value α* to be tested is again the maximum likelihood estimate, obtained by maximizing the log-likelihood in which p^t_x(α) is given by the RHS of (9). Maximizing this function over the parameter space using the experimental data gives the maximum likelihood estimate α* = (9.08 × 10¹, 1.31 × 10⁶, 2.78 × 10¹). However, the combined p-value 2.08 × 10⁻⁶ again indicates that H_0 must be rejected; as seen in Figure 2c (squares), the distribution of p-values still remains far from uniform. Consequently, this model is also incompatible with the experimental data.

SUPPLEMENTARY NOTE 11. SLOW-DRIFT MODEL

Model description
Let again α = (α_{−1}, α_0, α_1) be a vector of positive parameters. Now we consider the following process for each pair (N, t): choose a random vector θ = (P_−, P_0, P_+) ∼ Dir(α); repeat N times: perform t steps of the random walk with the respective step probabilities (P_−, P_0, P_+); observe the position of the random walker X ∈ {−t, −t + 1, . . . , t − 1, t}. Given the N observations X_1, . . . , X_N, denote Z_x = |{j ∈ {1, . . . , N} : X_j = x}| and define the random variable Z_{N,t} := (Z_{−t}, . . . , Z_t). This way, the vector θ = (P_−, P_0, P_+) ∼ Dir(α) is drawn independently across different pairs (N, t), yet for each particular (N, t) it is fixed for all N random walks (the N random walks are assumed to be conditionally independent given θ). The resulting random variable Z_{N,t} has a discrete compound distribution, akin to the Dirichlet-multinomial distribution; however, Z_{N,t} is no longer multinomially distributed.
Technically, the key difference from the previous model is that all N random walks use the same vector θ (randomly drawn from Dir(α)), so the marginalization over θ happens only after forming the counts vector Z_{N,t}. In contrast, in the previous model the Dirichlet variable is marginalized after forming the vector K, resulting in the Dirichlet-multinomial distribution for K and a standard multinomial variable Z_{N,t}.
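The only change relative to the previous model's sampler is that θ is drawn once per (N, t) pair (a sketch, our naming):

```python
import numpy as np

def slow_drift_sample(alpha, t, N, rng=None):
    """Draw Z_{N,t} for the slow-drift model: one theta ~ Dir(alpha)
    shared by all N walks of this (N, t) pair."""
    rng = np.random.default_rng(rng)
    theta = rng.dirichlet(alpha)                     # (P_-, P_0, P_+)
    counts = rng.multinomial(t, theta, size=N)
    x = counts[:, 2] - counts[:, 0]                  # final positions
    return np.bincount(x + t, minlength=2 * t + 1)   # histogram over -t..t
```

Because the N walks share θ, repeated draws of Z_{N,t} scatter more than any multinomial model allows, which is the signature the consistency tests below are sensitive to.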

Probability mass function
To characterize the model more formally, for each pair (N, t) and a fixed vector θ = (P_−, P_0, P_+) let p^t_x(θ), |x| ≤ t, be defined as the RHS of (1). The distribution of Z_{N,t} is obtained by compounding the multinomial distribution with the Dirichlet distribution:
  \Pr(Z_{N,t} = z) = \int_{\Delta_2} \frac{N!}{\prod_x z_x!} \prod_x \bigl(p^t_x(\theta)\bigr)^{z_x} \, f_\alpha(\theta)\, d\theta,   (11)
where the integration is over the standard 2-simplex ∆_2 and f_α is the PDF of the Dirichlet distribution (adapted from (8)). It is worth mentioning that since only the parameters P_±, P_0 are chosen from the Dirichlet distribution, instead of all 2t + 1 event probabilities associated with the multinomial distribution, the resulting compound distribution is not Dirichlet-multinomial.

SUPPLEMENTARY NOTE 12. CONSISTENCY OF SLOW-DRIFT MODEL
Given an observation z_0 of Z_{N,t}, we again perform a Monte Carlo test of the hypothesis H_0 : α = α*, for some fixed α*. However, now the probability Pr(Z_{N,t} = z) has the complicated analytical form (11), which is difficult to compute numerically. Therefore Pr(Z_{N,t} = z) is also estimated via Monte Carlo approximation, i.e., for the particular parameter α* and the observed vector z_0 we draw N_sim independent samples θ ∈ ∆_2 from Dir(α*); for each of the sampled vectors θ = (P_−, P_0, P_+) we draw a sample z from the multinomial distribution specified by (5) (where the probabilities p^t_x are computed using the sampled values P_−, P_+).
This way N_sim vectors z_1, . . . , z_{N_sim} are obtained, many of which may coincide. Suppose that m distinct vectors z_1, . . . , z_m were obtained, with respective frequencies k_1, k_2, . . . , k_m, Σ_i k_i = N_sim. We can assume that k_1 ≤ k_2 ≤ . . . ≤ k_m.
Suppose that z_j coincides with the actual observation z_0 and (provided that j < m) k_j < k_{j+1}; then the p-value of the test is declared to be (k + 1)/(N_sim + 1), where k := k_1 + k_2 + . . . + k_j. In case z_0 does not occur among the N_sim obtained vectors, the p-value is declared to be 0.
By employing the outlined procedure, we can perform consistency testing as before. Consistency of the model taking into account all L different pairs (N, t) is assessed by combining the L obtained p-values via Fisher's method of testing uniformity.
The value α* to be tested is now found differently compared to the previous models. This is because the probabilities Pr(Z = z) are estimated only approximately via Monte Carlo, which complicates maximizing the likelihood function.
Instead, we fix the fractions α_i/α_• to the best values of P_± found in the baseline model (Supplementary Note 7) and optimize the parameter α_•, i.e., α is of the form α_• · (P_−, P_0, P_+), where (P_+, P_−) = (2.13 × 10⁻⁵, 6.92 × 10⁻⁵). The cost function associated with α_• measures the distance between the empirical distribution function of the ordered p-values p_(1) ≤ . . . ≤ p_(L) (where p_(i) stands for the i-th smallest value among the p-values p_1, p_2, . . . , p_L returned by the L tests of the hypothesis H_0 : α = α_• · (P_−, P_0, P_+)) and the cumulative distribution function of the uniform distribution. The minimization of the cost function over α_• yields the estimate α* = (2.43 × 10³, 3.50 × 10⁷, 7.47 × 10²), with precision limited by the numerical expense of the Monte Carlo trials for the p_(i). The combined p-value equals 0.71, therefore the null hypothesis H_0 : α = α* cannot be rejected. Also, as seen in Figure 2c (diamonds), the distribution of p-values visually conforms to the uniform distribution. Hence, the experimental data do not contradict this model.
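One concrete realization of such a cost function is the Kolmogorov-Smirnov distance between the empirical CDF of the p-values and the uniform CDF (a sketch; the exact functional form used in the analysis is not reproduced here):

```python
def ecdf_uniform_distance(p_values):
    """Max distance between the empirical CDF of the p-values and the
    CDF of the uniform distribution on [0, 1] (a KS-type statistic)."""
    ps = sorted(p_values)
    L = len(ps)
    return max(max(abs((i + 1) / L - p), abs(i / L - p))
               for i, p in enumerate(ps))
```

The distance is small for p-values spread evenly over [0, 1] and approaches 1 when they cluster near 0, so minimizing it over α_• pushes the model toward uniformly distributed p-values.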

SUPPLEMENTARY NOTE 13. EXCESS NOISE SIMULATIONS
The purpose of this Supplementary Note is to illustrate that our excess-noise models can be consistent with, and give reasonable estimates of, the realistic parametric variability one expects from an ensemble of TLFs (multi-timescale 1/f noise) or from a single but strong TLF (bimodal excess noise), despite the generic Dirichlet distribution and the single correlation timescale underpinning the "fast-fluctuator" and "slow-drift" tests.
First we define a model that simulates the fluctuating environment of the real experiments, and then analyze the simulated error counts using the same statistical tests as applied in the main text and described in Supplementary Notes 4, 9 and 11. The results in Section 14 below illustrate the three main findings summarized in the main text: (i) a threshold in excess-noise amplitude above which the advanced statistical tests become useful; (ii) an example of a simulated environment with results of statistical tests similar to the experiment, and a comparison between the estimated and actual measures of parametric variability ∆P_±; (iii) a summary of single-fluctuator behavior w.r.t. the "fast fluctuator" and "slow drift" statistical tests.
Our approach of simulating 1/f noise by an ensemble of two-level fluctuators (TLFs) follows the principles reviewed in [21]. The timeline of the experiment is shown schematically in Figure S1. The differential error signal x is measured after each burst (i.e., transfer sequence) of length t_i, repeated consecutively N_i times, for a fixed set of burst lengths i = 1 . . . L. We explore the regime in which the duration of one burst, t_i f^{−1}, is much shorter than the detector-limited time interval τ_0 between the repetitions (e.g., up to a few µs for the former and on the order of a ms for the latter for device A). Switching events in the environment can only contribute significantly on time scales of τ_0 and longer, and are neglected during the short bursts. This means that P_±(t) remain fixed during a single instance of the random walk starting at an absolute time t (an integer multiple of τ_0), and the particular value of x is distributed according to Eq. (1) with that particular instance of P_±(t). In the statistical models defined in Supplementary Notes 9-11, the values of P_± are drawn from a Dirichlet distribution after the time τ_0 ("fast fluctuator") or after the time τ_0 N_i ("slow drift"). Here, in contrast, we generate P_±(t) from a continuous-time Markov process characterized by a set of switching rates Γ_m > 0. The value of P_±(t) is determined by an average over M two-level fluctuators,
  P_\pm(t) = \frac{1}{M} \sum_{m=1}^{M} P^{(m)}_\pm[\xi_m(t)],
where ξ_m(t) = 0 or 1 is the state of the m-th fluctuator at time t. Each fluctuator is characterized by two modes, P^{(m)}_±[0] and P^{(m)}_±[1], both drawn once for each full-timeline simulation from a globally fixed Dirichlet distribution whose parameters are the mean values ⟨P_±⟩ and the total concentration α_noise; the latter is a free scatter parameter which controls the level of excess noise.
Switchings ξ_m → 1 − ξ_m happen randomly with a rate Γ_m (the same in both directions), independently of the other fluctuators. Hence each fluctuator is described by a 2 × 2 continuous-time Markov chain with transition rate matrix
  \begin{pmatrix} -\Gamma_m & \Gamma_m \\ \Gamma_m & -\Gamma_m \end{pmatrix}.
An example of a time trace of one fluctuator switching between two modes is shown by the colored boxes in Figure S1.
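The fluctuator states can be generated on the τ_0 grid using the standard result that a symmetric two-state chain with rate Γ flips within time τ with probability (1 − e^{−2Γτ})/2 (a sketch, our naming):

```python
import numpy as np

def telegraph_states(rate, tau0, n_steps, rng=None):
    """States xi(t) of a symmetric two-level fluctuator sampled every
    tau0: between consecutive samples the state flips with probability
    (1 - exp(-2 * rate * tau0)) / 2; the initial state is random."""
    rng = np.random.default_rng(rng)
    p_flip = 0.5 * (1.0 - np.exp(-2.0 * rate * tau0))
    flips = rng.random(n_steps) < p_flip
    return (np.cumsum(flips) + rng.integers(0, 2)) % 2
```

Averaging M such traces through the fluctuators' mode values yields the simulated time trace P_±(t) described above.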
For a single TLF we set M = 1 and treat Γ_1 as an adjustable parameter.
In a simulation, N_tot values of x are available. For each burst length t_i, the corresponding vector z_i containing the number of counts for each category of x is compiled in the same way as in the experiment. For the statistical evaluation of the simulated time traces, we have binned the simulated counts into five categories, instead of seven as in Eq. (7).
We fixed the set of L = 42 burst lengths with 1 ≤ t_i ≤ 100 and sample sizes {N_1 . . . N_L} with N_tot = Σ_{i=1}^{L} N_i = 38 022 642 and 5.00 × 10⁵ ≤ N_i ≤ 1.40 × 10⁶ to be the same as for the experimental results on device A reported in Figure 2 in the main text.
Similarly, the parameters of the Dirichlet distribution for the fluctuator modes are chosen to be comparable to the experimental values, (⟨P_+⟩, ⟨P_−⟩) = (2.10 × 10⁻⁵, 7.00 × 10⁻⁵). We quantify the excess-noise level by computing directly the relative standard deviation (RSD) of the simulated time traces P_+(t) and P_−(t), and choosing the maximum: RSD = max_{s=+,−} σ_s/μ_s, where μ_s and σ_s are respectively the mean and the standard deviation of the particular simulated time trace P_s(t); the time along the data-acquisition timeline, t = n τ_0, is measured from 0.

Detection of excess noise
We have run a number of 1/f-noise simulations, varying the noise strength parameter α_noise from 10^{7.5} down to 10^{3.6}, and checked the statistical consistency of the accumulated error counts with the three simple models discussed in the main text: the baseline model (no excess noise), the "fast fluctuator" and the "slow drift". The results are plotted in Figure S2. Each symbol represents a simulation; its abscissa equals the RSD of the respective time trace and its ordinate is the combined p-value obtained via consistency testing of the respective maximum likelihood estimate.

First we consider the baseline model results, shown by triangles in Figure S2. For very low noise (essentially constant P_±(t)) the null hypothesis is expected to be true, and the p-values should be scattered approximately uniformly between 0 and 1, while for stronger noise we expect small p-values to become progressively more likely.
To illustrate this transition more clearly, we have added to Figure S2 box-and-whisker plots of the same dataset, obtained by binning the baseline p-values by the corresponding RSD, i.e., the first group with RSD < 2%, the second with 2% ≤ RSD < 4%, and so on, up to 18% ≤ RSD < 20%. The box-and-whisker plots are computed in the standard way: if Q_1, Q_2, Q_3 are the quartiles of the group and IQR = Q_3 − Q_1 is the interquartile range, then the lower and upper limits of the box are placed at Q_1 and Q_3, respectively, and the horizontal line within the box represents the median Q_2. Furthermore, the lower and upper fences are computed as Q_1 − 1.5 IQR and Q_3 + 1.5 IQR, respectively; any value in the group below the lower fence or above the upper fence is considered an outlier and plotted with a filled symbol. Finally, whiskers are drawn so that the upper whisker is located either at the upper fence or at the maximal value in the group (whichever is smaller); similarly, the lower whisker is located either at the lower fence or at the minimal value in the group (whichever is larger).
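These grouping and fence rules translate directly into code. The sketch below is our illustration; note that quartile conventions differ slightly between implementations, and here we use the "inclusive" method of Python's statistics.quantiles:

```python
import statistics

def box_whisker_summary(values):
    """Box-plot summary with Tukey fences: box at (Q1, Q3), median Q2,
    fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR, whiskers clipped at the
    fences, values beyond the fences reported as outliers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "median": q2,
        "box": (q1, q3),
        "whiskers": (min(inside), max(inside)),   # clipped at the fences
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }
```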
As can be observed in Figure S2, the first two groups, with RSD up to 4%, appear to be consistent with the standard uniform distribution. However, for relative standard deviations above 4% the distribution of the resulting p-values quickly deteriorates, and for RSD ≥ 6% the median of the obtained p-values is already well below the conventional p = 0.05 threshold. We see clear evidence of a threshold in environmental noise strength (quantified by the RSD of the simulation timetrace): if the noise is sufficiently low, the data are consistent with the baseline model; for larger values of RSD, the p-values for the baseline model collapse to very low values, indicating strong rejection of the null hypothesis of no excess noise.
Next we evaluate to what extent the data collected from the 1/f noise simulations are consistent with the single-timescale models discussed in the main text and in Supplementary Notes 9 and 11. We have found that for this particular type of noise the "fast fluctuator" test offers only a marginal improvement over the baseline model, hence the corresponding p-values are not plotted.
In contrast, the "slow drift" model is generally consistent with the simulated data even if the excess noise level is high, provided that the parameters of the Dirichlet distribution in the "slow drift" model (see Eq. (12)) are chosen appropriately.
In Figure S2 the diamonds show the combined p-values for the simulated noise data tested against the "slow drift" model using the Monte Carlo estimation procedure of Supplementary Note 12. Here we have used max_{s=+,−} σ_s computed from the simulated timetrace P_±(t) to set a priori the concentration parameter α_• of the Dirichlet distribution used in the "slow drift" test. Despite the value of α_• not being optimized (and hence potentially biasing the test), the results indicate that the "slow drift" model for the data is not rejected: the p-values are high for the whole range of noise levels considered.

Example of statistical methodology applied to the simulated data
Here we illustrate in more detail the application of the statistical tests to a particular simulated timetrace of the 1/f noise model. The simulation used α_noise = 10^5.2 and P_+ = 2.12 × 10^−5, P_− = 6.90 × 10^−5.
Next we describe how the spectrum of the simulated noise was analyzed. Let R = 2^18 and K = ⌊N_tot/R⌋ = 145. Given the signal P_s(n τ_0), n = 1, 2, . . . , N_tot, where s ∈ {−, +}, its power spectral density (PSD) is estimated by splitting the signal into K non-overlapping segments of length R (the signal components with n > KR are discarded), computing the periodogram of each segment, and averaging the results over the K segments. Each periodogram is found by computing the squared magnitude of the discrete Fourier transform and dividing the result by R; moreover, the values at all frequencies except 0 and 0.5 are doubled, since the one-sided periodogram is considered.
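This segment-averaging (Bartlett-type) estimate can be sketched as follows; the code is our illustration, not the analysis code of the paper. For clarity it uses a naive O(R²) DFT and a small segment length, whereas in practice an FFT and R = 2^18 would be used:

```python
import cmath

def psd_estimate(signal, seg_len):
    """One-sided PSD by averaged periodograms: split the signal into
    non-overlapping segments of length seg_len (discarding the remainder),
    average the per-segment periodograms |DFT|^2 / seg_len, and double all
    bins except DC (f = 0) and Nyquist (f = 0.5). seg_len is assumed even."""
    n_seg = len(signal) // seg_len
    n_freq = seg_len // 2 + 1                      # one-sided frequency bins
    psd = [0.0] * n_freq
    for k in range(n_seg):
        seg = signal[k * seg_len:(k + 1) * seg_len]
        for j in range(n_freq):
            amp = sum(v * cmath.exp(-2j * cmath.pi * j * n / seg_len)
                      for n, v in enumerate(seg))
            psd[j] += abs(amp) ** 2 / seg_len
    psd = [p / n_seg for p in psd]
    for j in range(1, n_freq - 1):                 # double, except f = 0 and f = 0.5
        psd[j] *= 2.0
    return psd                                     # bin j is frequency j / seg_len
```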
In Figure S3a we show the PSD estimates of the simulated timetrace, which show the signature 1/f roll-off. Next, we analyze the simulated sample in the same way as the experimental data for device A, cf. Figure 2 in the main text.
For the baseline model, the maximum likelihood gives the estimate (P_+, P_−) = (2.07 × 10^−5, 6.96 × 10^−5), which agrees well with the underlying μ_± ± σ_±. Next, we employ Fisherian significance tests separately for each block of N_i counts at a fixed burst length t_i to define consistency regions, of p-value greater than 0.05, in the parameter space (P_+, P_−) where the baseline model cannot be rejected at this significance level. These quasielliptic consistency regions are depicted in Figure S3b; again, their overlap is only partial. Fisher's meta-analysis method yields the combined p-value 0.04, thus the baseline model is nominally rejected.
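The combination step uses Fisher's method: X = −2 Σ_i ln p_i follows a χ² distribution with 2L degrees of freedom under the joint null hypothesis. Since the number of degrees of freedom is even, the χ² survival function has a closed form, which the following minimal sketch (ours, stdlib-only) exploits:

```python
import math

def fisher_combined_p(p_values):
    """Fisher's meta-analysis p-value for L independent tests.

    X = -2 * sum(log p_i) is chi-square with 2L degrees of freedom under the
    joint null; for even degrees of freedom the survival function is
    sf(x; 2L) = exp(-x/2) * sum_{k=0}^{L-1} (x/2)^k / k!."""
    L = len(p_values)
    x_half = -sum(math.log(p) for p in p_values)   # X/2
    term, total = 1.0, 1.0                         # k = 0 term
    for k in range(1, L):
        term *= x_half / k                         # (x/2)^k / k!, built iteratively
        total += term
    return math.exp(-x_half) * total
```

For L = 1 the function returns the input p-value unchanged, as it should.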

Simulations of a single fluctuator
We have explored the excess noise model with a single fluctuator (M = 1), with the switching rate Γ_1 varied in simulations from 10^−6 τ_0^−1 to τ_0^−1. The distribution of P_±(t) in this case is bimodal, with the two modes, P_±[0] and P_±[1], chosen randomly before the start of the simulation. The corresponding noise power spectral density is a Lorentzian that crosses over from a constant to 1/f² at frequency f ∼ Γ_1.
With respect to the baseline and the "slow drift" statistical tests, the single-fluctuator model behaves similarly to the 1/f case considered above, regardless of the choice of Γ_1: a sufficiently large distance between the two modes generates counts inconsistent with the baseline model but compatible with the "slow drift" model with an appropriately chosen ΔP_±.
The results of the "fast fluctuator" tests do, in general, depend on the switching speed Γ_1: for fast switching, Γ_1 ∼ τ_0^−1, the simulated data are typically consistent with the "fast fluctuator" model, while for slower switching, Γ_1 ≪ τ_0^−1, the simulated data reject the "fast fluctuator" noise model. These observations can be understood by considering the characteristic frequencies to which our statistical tests are sensitive: the "slow drift" test probes the low-frequency part of the noise spectrum (switching rates on the order of 1/(N_i τ_0) ∼ 10^−5 τ_0^−1), while the "fast fluctuator" test probes relatively high frequencies (∼ τ_0^−1), at which the noise from a slow (Γ_1 ≪ τ_0^−1) single fluctuator is sufficiently suppressed by the 1/f² roll-off.

SUPPLEMENTARY NOTE 15. SLOW-DRIFT MODEL: VARIANCE OF RANDOM WALKER'S POSITION
In this section we derive the formula for the variance of the random walker's position, ⟨Δx²⟩, used in the concluding part of the main text.
Consider repeatedly sampling the random walker's position in the slow-drift model; we are interested in the sample variance. However, the samples are correlated, since the model assumes using the same parameter θ ∼ Dir(α) for several (N) successive random walks. Hence, we consider the following scenario: draw θ^(1) ∼ Dir(α) and run t steps of the random walk with step probabilities θ^(1) for N times; let X^(1)_j, j = 1, . . . , N, be the random walker's position after the j-th of these random walks has been completed. Afterwards, the step probabilities reset, i.e., a new parameter θ^(2) ∼ Dir(α) is drawn independently, and the random walk of t steps is again run N times with step probabilities θ^(2), resulting in random variables X^(2)_1, . . . , X^(2)_N, and so on. Consequently, positions within the same block of N walks are correlated through the shared parameter θ, while positions from different blocks are independent, and by the law of total variance

⟨Δx²⟩ = Var X = E_θ[Var(X | θ)] + Var_θ(E[X | θ]).

Let us show that

E[X | θ] = t (P_+ − P_−),   Var(X | θ) = t [(P_+ + P_−) − (P_+ − P_−)²];

then we will arrive at

⟨Δx²⟩ = t E[(P_+ + P_−) − (P_+ − P_−)²] + t² Var(P_+ − P_−),

with the expectation and variance taken over θ ∼ Dir(α), as desired.
Indeed, conditionally on θ the position after t steps is X = K_+1 − K_−1 with K multinomial (Supplementary Note 1), so E[X | θ] = t (P_+ − P_−) and

Var(X | θ) = Var K_+1 + Var K_−1 − 2 Cov(K_+1, K_−1) = t P_+(1 − P_+) + t P_−(1 − P_−) + 2 t P_+ P_− = t [(P_+ + P_−) − (P_+ − P_−)²].

Substituting these conditional moments into the law of total variance, we arrive at the formula above, which concludes the proof.
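As a numerical cross-check of this decomposition, the sketch below compares the empirical variance of positions sampled in blocks sharing θ with the law-of-total-variance prediction t E[(P_+ + P_−) − (P_+ − P_−)²] + t² Var(P_+ − P_−). The Dirichlet parameters here are hypothetical illustration values, not fitted to the experiment:

```python
import random
import statistics

def sample_dirichlet(alphas, rng):
    """Draw theta ~ Dir(alphas) via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [v / s for v in g]

def walk_position(p_minus, p_plus, t, rng):
    """Position after t steps of the trinomial random walk."""
    x = 0
    for _ in range(t):
        u = rng.random()
        if u < p_plus:
            x += 1
        elif u < p_plus + p_minus:
            x -= 1
    return x

rng = random.Random(42)
alphas = (2.0, 6.0, 2.0)           # hypothetical Dir(alpha) for (P-, P0, P+)
t, n_walks, n_draws = 20, 50, 400  # N = n_walks positions per drawn theta

positions, cond_means, cond_vars = [], [], []
for _ in range(n_draws):
    p_minus, _p0, p_plus = sample_dirichlet(alphas, rng)
    positions.extend(walk_position(p_minus, p_plus, t, rng) for _ in range(n_walks))
    cond_means.append(t * (p_plus - p_minus))
    cond_vars.append(t * ((p_plus + p_minus) - (p_plus - p_minus) ** 2))

# law of total variance: Var X = E[Var(X|theta)] + Var(E[X|theta])
predicted = statistics.fmean(cond_vars) + statistics.pvariance(cond_means)
empirical = statistics.pvariance(positions)
```

For the sample sizes above the two numbers agree to within Monte Carlo fluctuations.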

SUPPLEMENTARY NOTE 16. PROOF FOR SPREAD CONDITION
This section contains the proof for the spread condition shown in (2) in the main text (restated here for convenience).
Claim 4. For any random process on the line with steps of length at most 1, the spread condition

Σ^{x−1}_{y=−∞} p^t_y ≤ Σ^{x}_{y=−∞} p^{t+1}_y ≤ Σ^{x+1}_{y=−∞} p^t_y    (21)

must be satisfied for all x and t.
Proof. If the random process is at a location y ≤ x − 1 after t time steps, it must be at a location y ≤ x after t + 1 time steps. Hence, Σ^{x−1}_{y=−∞} p^t_y ≤ Σ^{x}_{y=−∞} p^{t+1}_y. On the other hand, if the process is at a location y ≤ x after t + 1 time steps, it has been at a location y ≤ x + 1 after t time steps. Hence, Σ^{x}_{y=−∞} p^{t+1}_y ≤ Σ^{x+1}_{y=−∞} p^t_y. We note that the claim applies not just to Markov processes but to any random process that can move at most distance 1 in one time step. For example, it applies to processes where the transition probabilities depend not only on the current location but also on locations at previous time steps.
Claim 5. If two probability distributions (p^t_x)_{x∈Z} and (p^{t+1}_x)_{x∈Z} satisfy the spread condition (21), then there exists a set of transition probabilities P^{(x,t)}_{±1} such that (p^{t+1}_x)_{x∈Z} is generated from (p^t_x)_{x∈Z} by a Markov process.
Proof. Let q^t_x = Σ^{x}_{y=−∞} p^t_y; then the spread condition is equivalent to

q^t_{x−1} ≤ q^{t+1}_x ≤ q^t_{x+1}.

Consider a Markov process in which the probability of moving left from a location x at time t is defined by

P^{(x,t)}_{−1} = max(0, q^{t+1}_{x−1} − q^t_{x−1}) / p^t_x if p^t_x > 0, and P^{(x,t)}_{−1} = 0 otherwise,

and the probability of moving right is defined by

P^{(x,t)}_{+1} = max(0, q^t_x − q^{t+1}_x) / p^t_x if p^t_x > 0, and P^{(x,t)}_{+1} = 0 otherwise.

The probability to stay at x is defined as P^{(x,t)}_0 = 1 − P^{(x,t)}_{−1} − P^{(x,t)}_{+1}. For these to be valid transition probabilities we need P^{(x,t)}_{−1} + P^{(x,t)}_{+1} ≤ 1; in the only non-trivial case, when both maxima are positive, this amounts to

(q^{t+1}_{x−1} − q^t_{x−1}) + (q^t_x − q^{t+1}_x) ≤ p^t_x = q^t_x − q^t_{x−1},

i.e., to q^{t+1}_{x−1} ≤ q^{t+1}_x. However, this inequality clearly holds, since q^{t+1} is a cumulative distribution function and hence non-decreasing in x. (If only one of the maxima is positive, the bound follows directly from the spread condition.) Therefore we have defined valid transition probabilities. To see that this process produces (p^{t+1}_x)_{x∈Z}, let q̃^{t+1}_x be the probability of being at a location y ≤ x after applying these transition probabilities to the distribution (p^t_x)_{x∈Z}. We show that q̃^{t+1}_x = q^{t+1}_x for all x; then the probability of being at any particular x_0 equals q̃^{t+1}_{x_0} − q̃^{t+1}_{x_0−1} = p^{t+1}_{x_0}. The probability mass crossing the boundary between x and x + 1 to the right is p^t_x P^{(x,t)}_{+1} = max(0, q^t_x − q^{t+1}_x), and the mass crossing it to the left is p^t_{x+1} P^{(x+1,t)}_{−1} = max(0, q^{t+1}_x − q^t_x) (if p^t_{x+1} = 0, the spread condition gives q^{t+1}_x ≤ q^t_{x+1} = q^t_x, so this expression correctly evaluates to zero). Hence,

q̃^{t+1}_x = q^t_x − max(0, q^t_x − q^{t+1}_x) + max(0, q^{t+1}_x − q^t_x).

We consider two cases: if q^{t+1}_x ≤ q^t_x, then q̃^{t+1}_x = q^t_x − (q^t_x − q^{t+1}_x) = q^{t+1}_x; otherwise, q̃^{t+1}_x = q^t_x + (q^{t+1}_x − q^t_x) = q^{t+1}_x.
A consequence of these two claims is that, given just the probabilities p^t_x, we cannot distinguish whether they come from a (possibly non-stationary, non-translation-invariant) Markov process or from a more general process that moves at most distance 1 in one time step. (In the second case, the spread condition will be satisfied and then, by Claim 5, there will be a time- and location-dependent Markov process that gives the same p^t_x.)
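Claims 4 and 5 can be verified numerically for concrete distributions. The sketch below (our illustration) generates p^t and p^{t+1} for a trinomial walk, checks the spread condition, and rebuilds p^{t+1} from p^t using flow-balance transition probabilities of the form P_{−1} = (q^{t+1}_{x−1} − q^t_{x−1})_+/p^t_x and P_{+1} = (q^t_x − q^{t+1}_x)_+/p^t_x, one valid choice consistent with Claim 5:

```python
def convolve_step(p, p_minus, p0, p_plus):
    """One step of the trinomial walk applied to a distribution given as a
    list over consecutive integer locations; the result spans one extra
    location on each side."""
    out = [0.0] * (len(p) + 2)
    for i, w in enumerate(p):
        out[i] += w * p_minus        # moved one step left
        out[i + 1] += w * p0         # stayed
        out[i + 2] += w * p_plus     # moved one step right
    return out

def cdf(p):
    q, s = [], 0.0
    for w in p:
        s += w
        q.append(s)
    return q

def reconstruct_markov_step(p_t, p_t1, eps=1e-12):
    """Check the spread condition for (p_t, p_t1), build flow-balance Markov
    transition probabilities, and return the distribution they generate from
    p_t.  Both lists must cover the same index range; index i - 1 is one
    step to the left of index i."""
    n = len(p_t)
    q_t, q_t1 = cdf(p_t), cdf(p_t1)
    get = lambda q, i: 0.0 if i < 0 else (1.0 if i >= n else q[i])
    for x in range(n):               # spread condition: q_t[x-1] <= q_t1[x] <= q_t[x+1]
        assert get(q_t, x - 1) - eps <= q_t1[x] <= get(q_t, x + 1) + eps
    new_p = [0.0] * n
    for x in range(n):
        if p_t[x] == 0.0:
            continue
        left = max(0.0, get(q_t1, x - 1) - get(q_t, x - 1)) / p_t[x]
        right = max(0.0, q_t[x] - q_t1[x]) / p_t[x]
        assert left + right <= 1.0 + eps          # valid transition probabilities
        if left > 0.0:
            new_p[x - 1] += p_t[x] * left
        if right > 0.0:
            new_p[x + 1] += p_t[x] * right
        new_p[x] += p_t[x] * max(0.0, 1.0 - left - right)
    return new_p
```

In a quick check with (P_−, P_0, P_+) = (0.2, 0.5, 0.3), t = 3, the reconstructed step reproduces p^{t+1} to machine precision.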