Estimating the randomness of quantum circuit ensembles up to 50 qubits

Random quantum circuits have been utilized in the contexts of quantum supremacy demonstrations, variational quantum algorithms for chemistry and machine learning, and black hole information. The ability of random circuits to approximate random unitaries has consequences for their complexity, expressibility, and trainability. To study this property of random circuits, we develop numerical protocols for estimating the frame potential, which measures the distance between a given ensemble and exact Haar randomness. Our tensor-network-based algorithm has polynomial complexity for shallow circuits and is high-performing using CPU and GPU parallelism. We study (1) local and parallel random circuits to verify the linear growth in complexity as stated by the Brown-Susskind conjecture, and (2) hardware-efficient ansätze to shed light on their expressibility and the barren plateau problem in the context of variational algorithms. Our work shows that large-scale tensor network simulations could provide important hints toward open problems in quantum information science.

* mliu6@uchicago.edu
† junyuliu@uchicago.edu
‡ yuri@anl.gov
§ liangjiang@uchicago.edu

I. INTRODUCTION
Quantum computing might provide significant improvements in computational power for current information technologies [1][2][3]. In the noisy intermediate-scale quantum (NISQ) era, an important question for near-term quantum computing is whether quantum devices can realize a strong computational advantage over existing classical devices and resolve hard problems that no existing classical computer can resolve [4]. Recently, Google and the University of Science and Technology of China, in experiments involving random circuit sampling and boson sampling [5,6], claimed to have realized quantum advantage using their quantum devices, challenging the extended Church-Turing thesis. These experiments are considered milestones toward full-scale quantum computing. Another recent study suggests the possibility of achieving a quantum runtime advantage over specialized state-of-the-art heuristic algorithms for the Maximum Independent Set problem using Rydberg atom arrays [7]. Despite the great experimental success of quantum devices, however, the capability of classical computation is also rapidly developing. It is interesting and important to ask where the boundary of classical computation of the same processes lies and to understand the underlying physics of the quantum supremacy experiments through classical simulation [5]. Tensor network methods are incredibly useful for simulating quantum circuits [8][9][10].
Originally developed to approximate ground states of quantum many-body systems, tensor network methods find approximate solutions when the bond dimension of the contracted tensors and the required entanglement of the system are under control [8]. Tensor network methods are also widely used for investigating sampling experiments with random quantum architectures, which are helpful for verifying the quantum supremacy experiments [11][12][13][14].
In this work, we develop novel tensor network methods and perform classical random circuit sampling experiments with up to 50 qubits. Random circuit sampling experiments are important components of near-term characterizations of quantum advantage [15]. Ensembles of random circuits could provide implementable constructions of approximate unitary k-designs [16][17][18], quantum information scramblers [19], solvable many-body physics models [20], predictable variational ansätze for quantum machine learning [21][22][23], good quantum decouplers for quantum channels and quantum error correction codes [24,25], and efficient representatives of quantum randomness. To measure how close a given random circuit ensemble is to Haar-uniform randomness over the unitary group, we develop algorithms to evaluate the frame potential, which quantifies the 2-norm distance to full Haar randomness [26][27][28]. The frame potential is a user-friendly measure of how random a given ensemble is in terms of operator norms: the smaller the frame potential, the more chaotic and complicated the ensemble, and the more easily we can achieve computational advantages [29,30]. In fact, certain quantum cryptographic tools use concepts identical or similar to approximate k-designs, making use of the exponential separation of complexities between classical and quantum computations [31][32][33][34][35][36][37][38].
It is critical to perform simulations of quantum circuits efficiently. To achieve this, we developed an efficient tensor network contraction algorithm in the QTensor package [39][40][41]. The QTensor package is optimized to simulate large quantum circuits on supercomputers. For this project, we implemented a modified tensor network and fully utilized QTensor's ability to simulate quantum circuits efficiently at scale.
In particular, we show the following applications of our computational tools. First, we evaluate the k-design time of local and parallel random circuits through the frame potential. A long-standing open problem is to prove the linear scrambling property of random circuits, namely that they approach approximate k-designs at depth O(nk) with n qudits [16-18, 29, 31, 42-47]. Although lower and upper bounds are known, there is no proof of the k-design time for general local dimension q and k ≥ 3 [18,47]. According to [47], a linear increase of the k-design time would lead to a proof of the Brown-Susskind conjecture, which states that the circuit complexity of random circuits grows linearly, with insights from black hole physics [48,49]. Recently, the complexity statement was proved in [50] for a different definition of circuit complexity than that of [47]. Thus, a validation of the k-design time measured via the frame potential immediately yields an alternative verification of the Brown-Susskind conjecture, with the complexity defined in [47]. Using our tools, we verify the linear scaling of the k-design time up to 50 qubits for q = 2. Our research also provides important numerical data on the prefactors beyond scaling, which may help further theoretical understanding.
Moreover, we use our tools to evaluate the frame potential of the randomized hardware-efficient variational ansätze used in [21]. Barren plateau is a term referring to the slowness of the variational angle updates during the gradient descent dynamics of quantum machine learning. When the variational ansätze for variational quantum simulation, variational quantum optimization, and quantum machine learning [51][52][53][54][55][56][57][58][59][60][61][62][63] are random enough, the gradient descent updates of variational angles will be suppressed by the dimension of the Hilbert space, requiring exponential precision to implement quantum control of variational angles [23]. The quadratic fluctuations considered in [21] are suppressed under a 2-design assumption, which is claimed to be satisfied by their hardware-efficient variational ansätze. For higher moments, higher k-designs are required. Studying how far a given variational ansatz is from a unitary k-design is important for understanding how large the barren plateau is and how to mitigate it through the design of variational circuits. In our work, we find that for several values of k, the randomized hardware-efficient ansätze are efficient scramblers: the frame potential decays exponentially with increasing circuit depth, and nondiagonal entangling gates are more efficient.

A. Frame potential
Given an ensemble E of unitaries with a probability measure, we are interested in its randomness and, therefore, its closeness to the unitary group. Truly random unitaries from the unitary group are distributed with the Haar measure. Such closeness is measured by how well the ensemble approximates the first k moments of the unitary group. To this end, a k-fold twirling channel is defined for the ensemble,

Φ_E^(k)(ρ) = ∫_E dU U^⊗k ρ (U†)^⊗k.

If the unitary ensemble approximates the kth moment of the unitary group, the distance between the k-fold channel of the ensemble and that of the Haar unitaries (measured by the diamond norm) is bounded by ε:

‖Φ_E^(k) − Φ_Haar^(k)‖_⋄ ≤ ε.

Such an E is said to be an ε-approximate k-design. The diamond norm of the channels is not numerically friendly, however. A quantity more suitable for numerical evaluation, which is also discussed in the context of k-designs, is the frame potential F, given by [64]

F_E^(k) = ∫_{U,V∈E} dU dV |Tr(U†V)|^{2k}.

Specifically, it relates to the aforementioned definition of ε-approximate k-designs as follows [18]:

‖Φ_E^(k) − Φ_Haar^(k)‖_⋄^2 ≤ d^{2k} (F_E^(k) − F_Haar^(k)),

where d = q^n is the Hilbert space dimension, q is the local dimension of the qudits, and F_Haar^(k) = k!. If we obtain the frame potential F_E^(k), we are guaranteed to have at least an ε_max-approximate k-design, where

ε_max = d^k √(F_E^(k) − F_Haar^(k)).

Similarly, we have the following condition for the ensemble to be an ε-approximate k-design:

F_{E(l)}^(k) − F_Haar^(k) ≤ ε^2 / q^{2nk},   (7)

where the ensemble E(l) depends on the number of layers l. Assuming an exponentially decreasing frame potential approaching the Haar value, F_{E(l)}^(k) − F_Haar^(k) = A e^{−l/C}, we have

A e^{−l/C} ≤ ε^2 q^{−2nk}   (8)
⇒ l ≥ C(2nk log q + log A + 2 log(1/ε)).   (9)
Under this assumption, A and C could still have n and k dependence. Therefore, for there to be linear scaling in n and k, A cannot be exponential, and C must be sublinear.
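To make these quantities concrete, the conversion from a measured frame potential to ε_max and to the required depth can be sketched numerically. This is a hedged illustration of the formulas above under our reconstruction of Eqs. (8)-(9); the helper names are ours, not from the QTensor code:

```python
import math

def eps_max(F, n, k, q=2):
    """Guaranteed epsilon for an ensemble with frame potential F:
    eps_max = d^k * sqrt(F - F_Haar), with d = q^n and F_Haar = k!."""
    d = q ** n
    return d ** k * math.sqrt(F - math.factorial(k))

def layers_needed(A, C, n, k, eps, q=2):
    """Depth bound assuming F - F_Haar = A * exp(-l / C):
    l >= C * (2 n k log q + log A + 2 log(1/eps))."""
    return C * (2 * n * k * math.log(q) + math.log(A) + 2 * math.log(1.0 / eps))

# An exactly Haar ensemble (F = k!) gives eps_max = 0.
print(eps_max(2.0, n=5, k=2))  # -> 0.0
```

Tightening ε (or increasing n or k) increases the required depth logarithmically and linearly, respectively, which is the source of the linear scaling discussed below.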
As an example, the exponential decay of F^(2) for the parallel random unitary ansätze is given by [18]

F^(2) < 2 [1 + 2 (2q/(q^2+1))^{2(l−1)}]^{n_g − 1},   (10)

where n_g = ⌊n/2⌋. This is plotted in Fig. 1. For fixed ε, this leads to a linear scaling of l in n, given by

l ≥ C(2n log q + log n + log(1/ε)),   (11)

where C = [log((q^2+1)/(2q))]^{−1} is independent of n. We emphasize that the linear scaling in n is for fixed ε, not fixed F.
FIG. 1. Theoretical fractional deviation of the k = 2 frame potential from the Haar value as a function of layers for the parallel random unitary ansätze. In this plot, the number of layers required to reach a fixed F does not scale linearly with n; the linear scaling holds only for fixed ε.

II. RESULTS
We obtain numerical results for ansätze with local dimension q = 2. Specifically, we evaluate frame potential values up to 50 qubits and k = 5. We compute the frame potentials for local random unitary ansätze, parallel random unitary ansätze, and hardware-efficient ansätze, which are illustrated in Fig. 2.

A. Algorithm description
The unitary ensembles we are interested in are parameterized by a large number of parameters. Therefore, evaluating the integral is a high-dimensional integration problem, and a numerical Monte Carlo approach is suitable. We approximate the frame potential as the mean value of the trace over N sampled pairs,

F_E^(k) ≈ (1/N) Σ_{i=1}^N |Tr(U_i† V_i)|^{2k},   U_i, V_i ∼ E.

Therefore, we need to evaluate the trace of the sampled unitaries on n target qudits.
A quantum circuit unitary can be written as a tensor U^{αβγ...}_{ijk...}, where i, j, k are input qubit indices and α, β, γ are output qubit indices. The trace of the unitary is

Tr U = Σ_{ijk...} U^{ijk...}_{ijk...}.

This is a tensor contraction operation that can be expressed as the tensor network in Fig. 3a. In this representation, each node is an index, and edges that form cliques are unitaries. This is different from the tensor network representations that are more familiar in other works (MPS, MPO, MERA, etc.). For more details on the representation, see [15,40,65]. The circuit shown there is a parallel random unitary circuit with 4 qubits. For efficient contraction, when the number of qubits is large, the contraction is ordered along the direction indicated in Fig. 3a such that the maximum number of exposed indices is minimized.
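The trace-as-contraction identity can be checked on a toy two-layer, 4-qubit brickwork circuit: reshaping each two-qubit gate into a rank-4 tensor and summing with output indices identified with input indices reproduces the dense trace. A small NumPy sketch; the gate placement is illustrative, not the circuit of Fig. 3a:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_gate():
    # Any 4x4 matrix suffices to check the contraction identity;
    # reshape to tensor index order (out0, out1, in0, in1).
    m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    return m.reshape(2, 2, 2, 2)

# Layer 1: A on (q0, q1), B on (q2, q3); layer 2: C on (q1, q2).
A, B, C = rand_gate(), rand_gate(), rand_gate()

# Tensor-network trace: identify every circuit output index with the
# matching input index and sum (a repeated label within one operand
# takes the diagonal).
tn_trace = np.einsum('iqij,rlkl,jkqr->', A, B, C)

# Dense reference: Tr[(I ⊗ C ⊗ I)(A ⊗ B)].
I2 = np.eye(2)
dense = np.kron(np.kron(I2, C.reshape(4, 4)), I2) @ np.kron(A.reshape(4, 4), B.reshape(4, 4))
assert np.isclose(tn_trace, np.trace(dense))
```

The einsum never builds the 16 × 16 matrix, which is the point: for shallow circuits the contraction cost is set by the number of simultaneously open indices, not by the Hilbert space dimension.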
Directly implementing this tensor network requires modification of QTensor. We propose an alternative tensor network in Fig. 3c with a similar topology that gives the trace as a single probability amplitude of the form ⟨ψ|U|ψ⟩ for a basis state |ψ⟩. The quantum circuit to achieve this is illustrated in Fig. 3c, and we proceed with a proof.
For simplicity, we describe the algorithm for qubits (q = 2). We assign an ancillary qubit to each target qubit. The quantum state of the n ancillary and n target qubits is initialized to

|Ψ_0⟩ = |0⟩_a^⊗n ⊗ |0⟩_t^⊗n.

After a layer of Hadamard gates on the ancillary qubits, we get

(H^⊗n ⊗ I) |Ψ_0⟩ = (1/√(2^n)) Σ_μ |μ⟩_a ⊗ |0⟩_t^⊗n,

where |μ⟩ is an n-ancillary-qubit basis state in the computational basis. Applying a CNOT gate on each target qubit controlled by its respective ancillary qubit yields

(∏_{j=1}^n U_CNOT^{a=j,t=j}) (H^⊗n ⊗ I) |Ψ_0⟩ = (1/√(2^n)) Σ_μ |μ⟩_a ⊗ |μ⟩_t,

where U_CNOT^{a=j,t=j} is a CNOT gate with the jth ancillary qubit as control and the jth target qubit as target. For simplicity, we can combine the aforementioned Hadamard layer and the CNOT layer into a single operator

M = (∏_{j=1}^n U_CNOT^{a=j,t=j}) (H^⊗n ⊗ I).

Consider the following probability amplitude:

⟨Ψ_0| M† (I ⊗ U†V) M |Ψ_0⟩.

This is simply the probability amplitude of measuring the |Ψ_0⟩ state after applying the unitary M† (I ⊗ U†V) M to the initialized |Ψ_0⟩ state. Moreover, this probability amplitude is proportional to the trace of U†V:

⟨Ψ_0| M† (I ⊗ U†V) M |Ψ_0⟩ = (1/2^n) Σ_{μ,ν} ⟨μ|ν⟩_a ⟨μ| U†V |ν⟩_t = (1/2^n) Tr(U†V).

Therefore, evaluating the trace becomes evaluating the probability amplitude of obtaining the |Ψ_0⟩ state, which QTensor is able to simulate with complexity linear in the number of qubits and exponential in the circuit depth. This is helpful for evaluating the trace of unitaries that can be efficiently represented by shallow circuits, especially those with limited qubit connectivity, such as hardware-efficient ansätze.
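The identity above can be verified directly with a small statevector calculation (NumPy, n = 2 target qubits plus 2 ancillas; a sanity-check sketch, not the QTensor implementation — Haar unitaries are drawn here with the standard QR-of-Ginibre method):

```python
import numpy as np

rng = np.random.default_rng(1)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def cnot(n, ctrl, targ):
    # CNOT on an n-qubit register, built from projectors on the control.
    P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    ops0 = [I2] * n; ops0[ctrl] = P0
    ops1 = [I2] * n; ops1[ctrl] = P1; ops1[targ] = X
    return kron_all(ops0) + kron_all(ops1)

n = 2  # target qubits; qubits 0,1 are ancillas, 2,3 are targets
M = cnot(2 * n, 0, 2) @ cnot(2 * n, 1, 3) @ kron_all([H, H, I2, I2])

def haar(d):
    # Haar-random unitary via QR of a complex Ginibre matrix (phase-fixed).
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

U, V = haar(4), haar(4)
W = U.conj().T @ V
psi0 = np.zeros(16, dtype=complex); psi0[0] = 1.0

amp = psi0.conj() @ M.conj().T @ kron_all([np.eye(4), W]) @ M @ psi0
assert np.isclose(amp, np.trace(W) / 2 ** n)  # amplitude = Tr(U†V) / 2^n
```

The amplitude carries the extra 1/2^n normalization, so the trace is recovered by multiplying the simulated amplitude by the Hilbert space dimension of the target register.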
For qudits with general local dimension q, the generalization is straightforward. We replace the Hadamard gate H with the generalized Hadamard gate H_q, and the CNOT gate with the SUM_q gate [66]:

H_q |j⟩ = (1/√q) Σ_{k=0}^{q−1} e^{2πi jk/q} |k⟩,   SUM_q |i⟩|j⟩ = |i⟩|(i + j) mod q⟩.

Similar to the qubit case, applying the generalized gates to |Ψ_0⟩ yields an entangled uniform superposition (1/√(q^n)) Σ_μ |μ⟩_a ⊗ |μ⟩_t of all basis states. The expectation value of any target qudit unitary with respect to this state is the normalized trace. Graphically, this can be understood through the tensor equivalence shown in Fig. 3b, where for the CNOT tensor U_CNOT^{kij}, k is the control qubit index (CNOT does not change the control qubit in the computational basis and therefore carries only one index for the control qubit), and i, j are the target qubit output and input indices.

B. Verifying the Brown-Susskind conjecture from frame potentials

Local and parallel random unitaries are commonly discussed in the context of quantum circuit complexity and the Brown-Susskind conjecture. For both ansätze, the composing random unitaries are drawn from the Haar measure on U(q^2).

Parallel random unitaries
Results for parallel random unitaries are presented in Figs. 4 and 5. In Fig. 4, the frame potential shows super-exponential decay in the few-layer regime and converges to exponential decay as the number of layers increases, matching the theoretical prediction in Fig. 1.
To obtain the layer scaling for reaching ε-approximate designs, we fit F − F_Haar to an exponential function according to Eq. 7, and l is estimated using Eq. 9. Note that our numerical results are in the regime of large ε, but we are extrapolating to small ε values, the validity of which depends on a tightly exponentially decaying F.
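The fit-and-extrapolate step can be sketched as a log-linear least-squares fit on synthetic data (a hedged illustration with made-up A and C; the actual analysis also handles the super-exponential early-layer regime and bootstrap uncertainties):

```python
import numpy as np

# Synthetic frame-potential deviations F(l) - k! = A * exp(-l / C).
A_true, C_true = 50.0, 1.7
layers = np.arange(2, 12)
dev = A_true * np.exp(-layers / C_true)

# Log-linear fit: log(F - F_Haar) = log A - l / C.
slope, intercept = np.polyfit(layers, np.log(dev), 1)
C_fit, A_fit = -1.0 / slope, np.exp(intercept)

# Extrapolate to the depth where the deviation first drops below the
# epsilon-design threshold eps^2 / q^(2nk), following the text.
n, k, q, eps = 10, 2, 2, 0.1
target = eps ** 2 / q ** (2 * n * k)
l_needed = C_fit * (np.log(A_fit) - np.log(target))
assert l_needed > layers[-1]  # the threshold lies beyond the fitted range
```

Because the fit is performed in log space, rare upward fluctuations of F skew the residuals; this is one reason the uncertainty analysis below relies on bootstrapping rather than naive error propagation.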
For robust error analysis, we use bootstrapping to quantify the uncertainties. We randomly sample a subset of computed frame potentials and perform curve fitting to calculate the number of layers needed to reach ε < 0.1. This is repeated multiple times to obtain a distribution of layer values, and we use the median of this distribution as our estimate. More details can be found in the supplementary materials.
Assuming the validity of extrapolation, the results for ε = 0.1 are shown in Fig. 5. Brandão et al. [17] established upper bounds on the number of layers needed to approximate k-designs for local and parallel random unitaries, which are quadratic and linear in n, respectively. They further proved that this bound cannot be improved by more than polynomial factors as long as ε ≤ 1/4. Therefore, for us to verify linear growth in n, we need to reach ε ≤ 1/4, which informed our choice of the ε = 0.1 threshold. We observe a linear scaling of the number of needed layers in n, which agrees with the theoretical prediction and non-trivially restricts the F scale factor A and decay rate C, as discussed in Section I A. Explanations for missing data points are provided in the supplemental materials. Further, we compare the theoretical prediction in Eq. 11 against our numerical findings. Figure 5 shows the experimental and fitted k-design layer scaling as a function of the number of qubits. Specifically, we fit a linear curve, ignoring the log n and the constant log(1/ε) terms. We find a slope of 4.38 in the case of k = 2, which is lower than the theoretical value 6.2 predicted by Eq. 10. We note, however, that the theoretical value gives an upper bound on the frame potential, since there is overcounting in the contributing domain walls [18]. Therefore, the analytical expression predicts a larger number of layers needed to approximate 2-designs than necessary. This is apparent in the n = 2 case, where Eq. 11 requires 16 layers but a single layer already samples from the Haar measure. This accounts for the discrepancy between the theoretical and experimental values.
In the inset of Fig. 5, we show the slopes of the scaling curves for different k values. A linear O(nk) scaling in k is predicted for the number of layers l (or O(n^2 k) for the circuit size T) needed to approach k-designs [18], and a linear relationship between k and complexity is established in [47]. Together, these findings imply that complexity grows linearly in the circuit size [47,50]. Our results support the linear scaling of T in k, with the slope growing linearly in k.

Local random unitaries
Results for local random unitaries are presented in Figs. 6 and 7. Since each layer in the local random circuit has only one gate, we simulate a number of layers proportional to the number of qubits and plot layers/qubits on the x-axis to maintain a linear scale. We observe that this layers/qubits ratio scales linearly with the number of qubits. This is the same gate-count scaling as the parallel random unitary ansätze, both quadratic in n. The scaling in k is close to linear, but the confidence is lower because of a lack of data points for k = 4, 5 at large n. Explanations for missing data points are provided in the supplemental materials.

Hardware-efficient ansätze

Originally proposed as ansätze for variational quantum eigensolvers [53], hardware-efficient ansätze utilize gates and connectivity readily available on the quantum hardware and are attractive because of their relaxed hardware requirements [67][68][69]. In addition, these ansätze have been simulated in the context of the barren plateau problem [21], where the variance of gradients vanishes exponentially with the number of qubits in sufficiently deep circuits. In fact, the proof of the barren plateau problem assumes that the circuits before and after the gate whose derivative we are computing are approximate 2-designs, an assumption especially suitable for the hardware-efficient ansätze because they are believed to be efficient scramblers. We simulated these circuits with controlled-phase gates and controlled-NOT gates as the two-qubit gates, respectively. Figure 8 shows that the controlled-NOT-based ansätze approach the Haar measure sooner, and therefore further analysis is conducted on CNOT-based ansätze only. Figure 9 shows a linear dependence on the number of qubits, as well as a positive dependence on k. Explanations for missing data points are provided in the supplemental materials.
We note that these ansätze reach lower frame potential values with far fewer layers, albeit having far fewer parameters per layer. This result is partially explained by the observation that each layer in the hardware-efficient ansätze contains two layers of two-qubit gate walls, whereas each layer in the parallel random unitary ansätze contains only one wall. Further, random unitaries from U(q^2) are not all maximally entangling. The hardware-efficient ansätze can therefore generate highly entangled states much more efficiently, exploring a much larger space with fewer parameters.
Further, unlike the previously discussed ansätze, where the frame potential decay rate is constant, the hardware-efficient ansätze's decay rate increases with n, as shown in the inset of Fig. 8. This does not contradict the observed linear scaling as long as the decay rate scaling is sublinear.
This observation confirms that hardware-efficient ansätze are highly expressive, a property that is crucial to the utility of variational quantum algorithms. Ansätze with higher expressibility better represent the Haar distribution and are therefore better able to approximate the target unitary or minimize the objective. This links expressibility to the frame potential [70]. The high expressibility of hardware-efficient ansätze and their close relatives, and consequently the desirable noise properties due to their shallow depths, are precisely the argument in favor of these ansätze over their deeper, more complex, problem-aware counterparts [71]. With the recent discovery of the relation between expressibility and gradient variance [72], the analysis of frame potentials can play an important role in theoretically and empirically determining the usefulness of various ansätze for variational algorithms.

III. DISCUSSION
Evaluating the distance from a given random circuit ensemble to exact Haar randomness is important for understanding several perspectives in quantum information science, including recent experiments on near-term quantum advantage. Explicitly constructing the unitaries requires memory complexity of O(4^n). A more efficient classical algorithm decomposes a unitary into gates in a universal set (H, T, and CNOT), which allows us to estimate the normalized trace by sampling allowed Feynman paths [73]. Exact evaluation using this method is NP-complete, and approximation to fixed precision requires a number of Feynman path samples exponential in the number of Hadamard gates in the circuit. Fortunately, for shallow circuits, tensor-network-based algorithms can obtain the exact trace with linear complexity in n.
In our paper, we simulate large-scale random circuit sampling experiments classically up to 50 qubits, the number of noisy physical qubits we are able to control in the NISQ era, using the QTensor package. As examples, we provide two applications of our computational tools: a numerical verification of the Brown-Susskind conjecture and a numerical estimation relating to the barren plateau in quantum machine learning and the randomized hardware-efficient variational ansätze.
Through our examples, we show that classical tensor network simulations are useful for our understanding of open problems in theoretical computer science and numerical examinations of quantum neural network properties for quantum computing applications. We believe that tensor networks and other cutting-edge tools are useful for probing the boundary of classical simulation and improving the understanding of quantum advantage in several subjects of quantum physics, for instance, quantum simulation [74,75]. Moreover, it will be interesting to connect our algorithms to the current research on classical simulation of boson sampling experiments.

A. Tensor network simulator
For all the trace evaluations, we use the QTensorAI library [76], originally developed to simulate quantum machine learning with parameterized circuits. This library allows quantum circuits to be simulated in parallel on CPUs and GPUs, which is a highly desirable property for sampling a large number of circuits. The library is based on the QTensor simulator [39][40][41], a tensor-network-based quantum simulator that represents the network as an undirected graph.
In this method of simulation, the computation is memory bound, and the memory complexity is exponential in the "treewidth," the largest rank of tensor that needs to be stored during computation. The graphical formalism utilized by QTensor allows the tensor contraction order to be optimized to minimize the treewidth. For shallow quantum circuits, the treewidth is determined mainly by the number of layers in the quantum circuit, and therefore QTensor is particularly well suited for simulating shallow circuits such as those used in the Quantum Approximate Optimization Algorithm (QAOA).
The simulation of both parallel and local random unitary circuits requires random two-qubit unitary gates. We implement these gates and sample Haar unitaries according to the scheme proposed for unitary neural networks [77], using a PyTorch implementation [78]. The universality of this decomposition scheme was first proved in the context of optical interferometers [79,80]. This implementation parameterizes two-qubit unitaries using 16 phase parameters, and uniformly sampling these parameters leads to uniform sampling from the Haar measure. Further, the parameterization is fully differentiable, although we do not use this property in this work.
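Any two-qubit Haar sampler can be sanity-checked against the frame potential itself: for k ≤ d, the empirical F^(k) should converge to the Haar value k!. A sketch using the standard QR-decomposition sampling method as a stand-in for the interferometer parameterization of [77]:

```python
import math
import numpy as np

rng = np.random.default_rng(42)

def haar(d):
    # Haar-random unitary via QR of a complex Ginibre matrix,
    # with the phases of R's diagonal divided out (standard method).
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

d, N = 4, 20000  # two-qubit unitaries, Monte Carlo sample size
traces = np.array([np.trace(haar(d).conj().T @ haar(d)) for _ in range(N)])

# Empirical frame potential vs. the Haar value k! (valid for k <= d).
for k in (1, 2):
    F = np.mean(np.abs(traces) ** (2 * k))
    assert abs(F - math.factorial(k)) < 0.3
```

The tolerance here is loose because the estimator's variance grows quickly with k, which is the same heavy-tail effect that motivates the bootstrap analysis in the supplementary methods.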

C. High-performance computing
For hardware-efficient and parallel random unitary ansätze, once the number of qubits and the number of layers are chosen, the circuit topology will remain the same throughout the ensemble. This is in contrast to the local random unitary ansätze, where a two-qubit gate is applied to random neighboring qubits in each layer, which means that the circuit topologies are very different within an ensemble. For fixed-topology ensembles, the algorithm can optimize the contraction order for all circuits at once. This optimization significantly reduces the computational complexity, and the optimization time is on the order of minutes depending on the circuit size.
However, local random unitary circuits cannot benefit from this optimization, since the contraction order would need to be optimized for each sample, while the actual simulation time per circuit is usually much shorter than the optimization time.
Further, for fixed-topology circuits, the tensor contraction operations are identical, which is very suitable for single-instruction multiple-data (SIMD) parallel execution on GPUs. For ensembles with the smallest treewidths, we can compute the trace values of millions of circuits in parallel on a single GPU. Local random unitary circuits, however, are not compatible with this parallelization and must instead be simulated in parallel on a CPU cluster.
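The fixed-topology batching can be mimicked in NumPy by adding a batch axis to every gate tensor and contracting all circuits in a single einsum call. A schematic of the SIMD idea on a toy 4-qubit brickwork network; on a GPU the same contraction would run via a tensor library such as PyTorch:

```python
import numpy as np

rng = np.random.default_rng(3)
B = 1000  # batch of circuits sharing one topology

def gates(batch):
    # Batched two-qubit gate tensors, shape (batch, out0, out1, in0, in1).
    m = rng.normal(size=(batch, 4, 4)) + 1j * rng.normal(size=(batch, 4, 4))
    return m.reshape(batch, 2, 2, 2, 2)

# Layer 1: A on (q0, q1), Bg on (q2, q3); layer 2: C on (q1, q2).
A, Bg, C = gates(B), gates(B), gates(B)

# One einsum computes all B traces at once (leading label b is the batch).
batched = np.einsum('biqij,brlkl,bjkqr->b', A, Bg, C)

# Reference: a per-circuit loop over dense matrices, Tr[(I⊗C⊗I)(A⊗Bg)].
I2 = np.eye(2)
ref = np.array([np.trace(np.kron(np.kron(I2, C[i].reshape(4, 4)), I2)
                         @ np.kron(A[i].reshape(4, 4), Bg[i].reshape(4, 4)))
                for i in range(B)])
assert np.allclose(batched, ref)
```

Because every circuit in the batch follows the same index pattern, the contraction reduces to identical dense kernels over the batch axis, which is exactly the structure GPUs execute efficiently.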

DATA AVAILABILITY
Data containing the bootstrap frame potential values used to generate the figures are available in the GitHub repository [81], and data for the calculated trace values of the sampled random circuits are available upon request from the authors.

CODE AVAILABILITY
The code used to generate the data and figures is available in the GitHub repository [81]. The tensor network quantum simulators QTensor and QTensorAI are open source and available at [82,83].
1 Supplementary Methods

Determination of sample sizes
Since the simulation of large circuits is expensive, we adaptively terminate sampling simulations based on the estimated standard error in the frame potential, according to various rules we implement. For example, the simulation stops if the frame potential is larger than the Haar value by a few standard errors or if the simulation takes too long. If the estimated frame potential is less than the Haar value or smaller than that of a shallower circuit, the simulation runs longer to steer away from unphysical results.

Bootstrapping for uncertainty quantification
As discussed in the main manuscript, we use bootstrapping to analyze the uncertainties in the data. This is because of the highly asymmetric nature of the error, due to the highly skewed distribution of |Tr(U†V)|^{2k} (rare cases where U and V nearly coincide make the quantity very large, and the skewness worsens for larger k). Performing naive error propagation using the standard error after curve fitting and solving for the intersection, without taking the asymmetry into account, is therefore unreliable.
Bootstrapping is a good method for uncertainty quantification because it does not assume the distribution of the estimator. Given a set S of N samples obtained from a population P, we are interested in the estimator value E and its uncertainty. Bootstrapping resamples N samples with replacement from S to form S_i, and calculates E_i each time to form a set of bootstrap samples {E_i}. This means each S_i repeats some samples from S and omits others. Assuming S is a good representation of the population P, it is as if each E_i were sampled from P. Therefore, the distribution of E_i should approximate the actual distribution of E.
In our analysis, P is the distribution of the trace over the ansatz measure, and E can be the frame potential, the number of layers needed to reach ε, etc. We use 300 bootstrap samples in our analysis.
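The bootstrap procedure described above can be sketched in a few lines (illustrative only; the estimator here is a simple mean of a skewed synthetic sample, standing in for the frame potential):

```python
import numpy as np

rng = np.random.default_rng(7)
# Heavy-tailed synthetic data, standing in for |Tr(U†V)|^{2k} samples.
sample = rng.lognormal(mean=0.0, sigma=2.0, size=2000)

n_boot = 300  # matches the number of bootstrap samples used in our analysis
boots = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

estimate = np.median(boots)                 # robust point estimate
lo, hi = np.percentile(boots, [2.5, 97.5])  # asymmetric 95% interval
assert lo <= estimate <= hi
```

Because the interval is read off the empirical bootstrap distribution rather than from a symmetric ±σ rule, it captures the skewness of the underlying trace distribution directly.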

Choice of curve fitting region
Assuming exponential scaling of F in l, we can fit an exponential curve. As can be seen from Supplementary Figure 1

Sample sizes
We show in Supplementary Figure 1 the number of independently sampled circuits used to obtain the frame potential for a single data point. We compute and store the individual trace values Tr(U † V ) for each sampled circuit, and use the same trace values to compute frame potentials with different k values. Therefore, the number of samples has only qubit and layer dependence. As we can see, most data points are obtained with at least 1000 samples.

Detailed uncertainties obtained from bootstrapping
The main manuscript does not show the uncertainties obtained from bootstrapping for the percentage deviation of the frame potential. This is because we prioritize illustrating traces for all qubit counts on the same plot, and bootstrapping errors can obfuscate the graphs. Supplementary Figures 2, 3, and 4 are violin plots that show the bootstrap distributions of the frame potential percentage deviations as shadows around the data points. To avoid cluttering the plots, we show traces for only half of the qubit counts.
Supplementary Figure 2: Percentage deviation of the k = 3 frame potential from the Haar value as a function of layers for the parallel random unitary ansatz.
Supplementary Figure 3: Percentage deviation of the k = 3 frame potential for the local random unitary ansatz.
Supplementary Figure 4: Percentage deviation of the k = 3 frame potential from the Haar value for the CNOT hardware-efficient ansatz.

Validation of calculated frame potentials
To show the convergence process, we take the parallel random unitary ansatz as an example, specifically the n = 50, l = 11 case, which is the largest number of qubits we simulated and the largest number of layers simulated for this qubit count. We choose this as a worst-case data point, since it is hard to simulate and to obtain good statistics for. Supplementary Figure 5 shows the decrease in uncertainties as the number of samples increases. In this particular instance, the ansatz is close to a 3-design.

Supplementary Figure 5: Convergence of the estimated frame potential as the number of samples increases. The plot shows the data for the parallel random unitaries with 50 qubits and 11 layers. Error analysis is similarly performed using bootstrapping. Solid points are medians of the bootstrap samples, and the vertical shadows represent the sample distribution, where the width corresponds to the density. Horizontal bars show the extrema of the bootstrap samples.

Validation of sampling Haar random two-qubit unitaries
Parallel and local random unitaries require random two-qubit unitaries sampled from the Haar measure. The decomposition approach we use indeed satisfies this criterion. We observe for n = 2 and l = 1, which is a single random unitary gate, that the frame potential agrees with the Haar value k! even at high k values, verifying that we are sampling from the Haar measure. Specifically, we show the convergence process for k = 6 and k = 8 in Supplementary Figures 6 and 7.

Increasing k or n leads to an increase in the frame potential percentage deviation from the Haar value, which means that the valid exponential regime starts later in l for large k and n.
Further, because of the difficulty of simulating large-n circuits, we may not be able to simulate large l and have to terminate the curves early. This means that curves at large k and n have shorter exponential regimes. Together with the larger uncertainties at large k and n, there may not be enough data points for a proper fit, which is why data points in the layer scaling plots (Figures 5, 7, and 9 of the main manuscript) are missing for large k and n. One or more bootstrap samples for the missing data points fail to converge during the fit.

Weak spurious correlation
Since the frame potentials for different k are computed from the same set of Tr(U†V) values, there is a correlation between different k's. For example, the layer scaling in Figures 5 and 9 of the main manuscript shows some weakly correlated fluctuations for both k = 2 and k = 3. However, if we split the data into three equal partitions and plot three curves, such fluctuations are no longer correlated and therefore do not represent genuine patterns.