A quantum Hamiltonian simulation benchmark

Hamiltonian simulation is one of the most important problems in quantum computation, and quantum singular value transformation (QSVT) is an efficient way to simulate a general class of Hamiltonians. However, the QSVT circuit typically involves multiple ancilla qubits and multi-qubit control gates. In order to simulate a certain class of $n$-qubit random Hamiltonians, we propose a drastically simplified quantum circuit that we refer to as the minimal QSVT circuit, which uses only one ancilla qubit and no multi-qubit controlled gates. We formulate a simple metric called the quantum unitary evolution score (QUES), which is a scalable quantum benchmark and can be verified without any need for classical computation. Under the globally depolarized noise model, we demonstrate that QUES is directly related to the circuit fidelity, and the potential classical hardness of an associated quantum circuit sampling problem. Under the same assumption, theoretical analysis suggests there exists an `optimal' simulation time $t^{\text{opt}}\approx 4.81$, at which even a noisy quantum device may be sufficient to demonstrate the potential classical hardness.


Introduction
Recent years have witnessed tremendous progress in quantum hardware and quantum algorithms. As near-term quantum devices become increasingly accessible, the need for holistic benchmarking of such devices is also rapidly growing. Indeed, while most of the frequently used quantum benchmarks, such as randomized benchmarking [1] and gateset tomography [2], still focus on the performance of one or a few qubits, over the past three years a number of 'whole machine' benchmarks have been proposed that aim at assessing the performance of quantum devices beyond a small number of qubits [3][4][5][6][7][8][9][10].
While results from such generic benchmarks certainly provide important characteristics of the quantum devices themselves, we are ultimately interested in applying the devices to carry out specific computational tasks. However, the circuit structure of quantum algorithms can be vastly different for different algorithms. Generic quantum benchmarks can miss structural information that is specific to a particular algorithm and which may amplify either quantum errors of certain types or errors amongst a certain group of qubits, and/or reduce errors elsewhere. In this work we address the benchmarking of quantum simulations for time-independent Hamiltonians. Such a simulation can be stated as follows: given an initial state $|\psi_0\rangle$ and a Hamiltonian $H$, evaluate the quantum state at time $t$ according to $|\psi(t)\rangle = \exp(-itH)|\psi_0\rangle$. Hamiltonian simulation is of immense importance in characterizing quantum dynamics for a diverse range of systems and situations in quantum physics, chemistry and materials science. Simulation of one quantum Hamiltonian by another quantum system was also one of the motivations of Feynman's 1982 proposal for design of quantum computers [11]. Hamiltonian simulation is also used as a subroutine in numerous other quantum algorithms, such as quantum phase estimation [12] and solving linear systems of equations [13].
* Electronic address: linlin@math.berkeley.edu
Following the conceptualization of a universal quantum simulator using a Trotter decomposition of the time evolution operator $e^{-itH}$ [14], a number of quantum algorithms for Hamiltonian simulation have been proposed [15][16][17][18][19]. Detailed assessment of these algorithms, with continued improvement of theoretical error bounds, has since emerged as a very active area of research [20][21][22][23][24][25][26][27][28][29][30][31]. In this context, one of the most significant developments in recent years is the quantum signal processing (QSP) method [17], and its generalization, the quantum singular value transformation (QSVT) method [32]. For sparse Hamiltonian simulation, the query complexity of QSVT matches the complexity lower bound with respect to all parameters [17,32]. The QSVT method also enjoys another advantage, namely that the quantum circuit is relatively simple, and requires very few ancilla qubits. QSVT allows one to use essentially the same parameterized quantum circuit to perform a wide range of useful computational tasks, including Hamiltonian simulation [33], solution of linear systems [32,34,35], and finding eigenstates of quantum Hamiltonians [36]. In this sense, it provides a 'grand unification' of a large class of known quantum algorithms [37].
Despite these advantages, QSVT is generally not viewed as a suitable technique for near-term quantum devices today. This is largely because these techniques rely on an input model called 'block encoding', which views the Hamiltonian $H$ as a submatrix of an enlarged unitary matrix $U_H$. For Hamiltonians arising from realistic applications (e.g., linear combinations of products of Pauli or fermionic operators, and sparse matrices in general), the construction of $U_H$ often involves multiple ancilla qubits and multi-qubit control gates. Taken together, these requirements can make QSVT very difficult to implement with high fidelity, and to date there has been no QSVT-based Hamiltonian simulation on realistic devices.

Overview
In this work we remedy this situation by identifying and demonstrating an application for QSVT on near term quantum devices that allows benchmarking of Hamiltonian simulation for a class of Hamiltonians that are relevant to recent efforts to demonstrate supremacy of quantum computation over classical computation [6]. This is the class of random Hamiltonians generated from block encoding of random unitary operators that correspond to random unitary circuits. We show that for this class of Hamiltonians it is possible to formulate a simple metric, called the quantum unitary evolution score (QUES), for the success of quantum unitary evolution. This metric is the primary output from the Hamiltonian simulation benchmark, and is directly related to the circuit fidelity. This allows verification of Hamiltonian simulation on near-term quantum devices without any need for classical computation, and the approach can be scaled to a large number of qubits.
The main result of this paper is a very simple quantum circuit (Figure 1), called the minimal QSVT (mQSVT) circuit. With proper parameterization, the mQSVT circuit is able to propagate a certain class of random Hamiltonians H to any given target accuracy. In fact, we argue that the mQSVT circuit is not only the simplest quantum circuit for carrying out a QSVT based Hamiltonian simulation, but that it is actually the simplest possible circuit for all tasks based on QSVT. Here H is not a Hamiltonian corresponding to a given physical system, but a random Hamiltonian generated using a simple random unitary circuit, called a Hermitian random circuit block encoded matrix (H-RACBEM) [9]. However, for the purpose of benchmarking the capability of a quantum device to perform arbitrary Hamiltonian simulations, averaging over a distribution of the underlying arbitrary Hamiltonians is precisely what is required to generate a holistic benchmark protocol that samples from all possible instantiations.
The quantum circuit in Figure 1 consists of two components: an arbitrary random unitary matrix $U_A$ that implicitly defines the Hamiltonian $H$, together with its Hermitian conjugate $U_A^{\dagger}$, and a series of $R_z$ gates with carefully chosen phase factors $\{\varphi_i\}_{i=0}^{2d}$ (see Supplementary Note 3). The mQSVT circuit makes $d$ queries to $U_A$ and $U_A^{\dagger}$, two of which are shown explicitly in Figure 1. For an $n$-qubit matrix $H$, the total number of qubits needed is always $n+1$, i.e., only 1 ancilla qubit, hereafter referred to as the signal qubit, is required. This is even smaller than the simplest QSVT circuit [17], which requires at least 2 ancilla qubits. However, more important than the reduction of the number of qubits is the fact that Figure 1 removes all two-qubit and multi-qubit gates outside of the unitary $U_A$. This means that one can choose any convenient entangling two-qubit gate (e.g., CZ, CNOT, $\sqrt{i\mathrm{SWAP}}$) and any coupling map that is native to a quantum device to construct the random $U_A$. Combining this with the sequence of single-qubit $R_z$ gates then makes the resulting benchmarking quantum circuit of Figure 1 readily executable.
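The structure just described can be sketched numerically. The following Python snippet is a sketch under assumptions: the exact gate layout of Figure 1 and the true QSVT phase factors of Supplementary Note 3 are not reproduced here, and the phase values below are arbitrary placeholders. It assembles an alternating product of signal-qubit $R_z$ rotations and $U_A$/$U_A^{\dagger}$ blocks on $n+1$ qubits, and checks unitarity of the result:

```python
import numpy as np

def haar_unitary(dim, rng):
    # Haar-random unitary via QR of a complex Gaussian matrix with phase fix
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def rz_signal(phi, n_sys_dim):
    # exp(i*phi*Z) on the top (signal) qubit, identity on the n system qubits
    phases = np.concatenate([np.full(n_sys_dim, np.exp(1j * phi)),
                             np.full(n_sys_dim, np.exp(-1j * phi))])
    return np.diag(phases)

def mqsvt_sketch(u_a, phis):
    # Alternate signal-qubit rotations with U_A and U_A^dagger (assumed layout)
    dim = u_a.shape[0]
    circ = rz_signal(phis[0], dim // 2)
    for j, phi in enumerate(phis[1:], start=1):
        block = u_a if j % 2 == 1 else u_a.conj().T
        circ = rz_signal(phi, dim // 2) @ block @ circ
    return circ

rng = np.random.default_rng(0)
n = 2                                   # system qubits; circuit acts on n + 1
U_A = haar_unitary(2 ** (n + 1), rng)   # stand-in for the random circuit U_A
phis = np.linspace(0.1, 1.0, 5)         # placeholder phases, NOT real QSVT phase factors
circuit = mqsvt_sketch(U_A, phis)
assert np.allclose(circuit @ circuit.conj().T, np.eye(2 ** (n + 1)))
```

Only the unitarity of the assembled product is checked here; producing an actual $\epsilon$-accurate approximation of $e^{-itH}$ requires the phase factors of Supplementary Note 3.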
Quantum unitary evolution score (QUES)

Figure 1 implements $f_t(H)|0^n\rangle$ on the system qubits, where $f_t(H)$ is a matrix polynomial (see Supplementary Note for details), with approximation error in the operator norm upper bounded by $\|f_t(H) - e^{-itH}\|_2 \le \epsilon$. Therefore, in the absence of quantum errors, after applying the circuit to the input state $|0^{n+1}\rangle$, the probability $P_t(U_A) := \|f_t(H)|0^n\rangle\|^2$ of measuring the top ancilla qubit with outcome 0 will be close to 1, indicating that the underlying Hamiltonian evolution is unitary.
From now on, we will primarily consider mQSVT circuits with a fixed set of phase factors {ϕ i } and hence fixed simulation time t. For notational simplicity, we will drop the t-dependence in quantities such as P t (U A ), unless specified otherwise.
On a real quantum device, the probability $P(U_A)$ should be replaced by $P^{\exp}(U_A)$, which is the experimentally measured probability. We define the quantum unitary evolution score (QUES) by

$$\mathrm{QUES} := \mathbb{E}\left[P^{\exp}(U_A)\right], \qquad (1)$$

where the expectation is taken over the ensemble of random quantum circuit instances $U_A$. The deviation of QUES from 1 then measures the average performance of the quantum computer under a Hamiltonian simulation task.
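Estimating QUES from raw measurement data is straightforward bookkeeping. The sketch below uses made-up counts for two hypothetical circuit instances; the dictionary keys are $(n+1)$-bit outcome strings with the signal qubit written first, an assumed convention. It computes $P^{\exp}(U_A)$ per instance as the fraction of shots with signal-qubit outcome 0, and averages over instances:

```python
import numpy as np

def p_exp_from_counts(counts):
    """Estimate P_exp(U_A): the fraction of shots in which the signal
    (top) qubit is measured as 0. `counts` maps outcome strings to shot counts."""
    total = sum(counts.values())
    good = sum(c for s, c in counts.items() if s[0] == "0")
    return good / total

def ques(counts_per_instance):
    # QUES: average of P_exp over the ensemble of random circuit instances
    return float(np.mean([p_exp_from_counts(c) for c in counts_per_instance]))

# toy data for two hypothetical circuit instances (n = 2 system qubits)
inst1 = {"000": 450, "001": 430, "100": 70, "110": 50}
inst2 = {"000": 480, "011": 420, "101": 100}
score = ques([inst1, inst2])
assert abs(score - 0.89) < 1e-12   # (880/1000 + 900/1000) / 2
```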
There is no unique prescription for constructing random quantum circuits. To fix the choice of $U_A$, we employ here the model random quantum circuit construction used to analyze the concept of quantum volume in [4]. Here, given a number of qubits $n$, $U_A$ is constructed to contain $n$ layers, each consisting of a random permutation of the qubit labels followed by random two-qubit gates between the $n$ qubits. Given this construction, the QUES in Eq. (1) will then depend only on $n$ and $d$, and the overall depth of the circuit is approximately $2d$ times the circuit depth of $U_A$. Note that given the basic quantum gate set of a particular quantum device, alternative constructions of $U_A$ using random choices of specific one- and two-qubit gates are possible.

Figure 2 shows the results of computing the QUES across 8 different IBM Q quantum devices [38], each having 5 qubits and one of three distinct coupling maps (panel b). When the number of qubits $n \le 3$, the QUES on all devices is relatively high ($\gtrsim 0.7$), but it decreases sharply for $n \ge 4$. In contrast, the QUES decreases only relatively mildly as $d$ increases. This is particularly noticeable for $n = 2$, which may indicate that the quantum circuit transpiler provided by the IBM Q may be particularly effective for this device with very small qubit number.

[Figure 1 caption: The overall circuit implements a complex matrix polynomial $f_t(H)$ of degree $d$ on the Hamiltonian $H$ that is defined in terms of a pseudo-random quantum circuit $U_A$. The circuit acts on $n+1$ qubits, consisting of $n$ system qubits and 1 ancilla qubit. After measuring the top ancilla qubit and post-selecting on the 0 outcome, the action on the bottom $n$ system qubits accurately approximates $\exp(-itH)|0^n\rangle$.]

We emphasize that compared to generic benchmark measures such as the quantum volume, the QUES is specific to the computational task of the Hamiltonian simulation, and any information specific to this is not diluted by additional averaging over output distributions from other computational tasks. In particular, we find that even for quantum devices with relatively small quantum volume (8-QV), the performance in terms of QUES is only mildly worse than for those with a larger quantum volume (32-QV).

Circuit fidelity and system linear cross-entropy score (sXES)
The quality of a noisy implementation of a quantum circuit is often characterized by the circuit fidelity. Loosely speaking, the output quantum state of a noisy circuit can be characterized as a convex combination of the correct result and the result obtained under noise, i.e., 'output' = $\alpha$ × 'correct result' + $(1-\alpha)$ × 'noise', where $0 \le \alpha \le 1$ is the circuit fidelity. Let $p(U_A, x)$ be the noiseless bitstring probability of measuring the mQSVT circuit with outcome 0 in the ancilla qubit and an $n$-bit binary string $x$ in the $n$ system qubits. Let $p^{\exp}(U_A, x)$ be the corresponding experimental bitstring probability, which can be estimated from the frequency of occurrence of the bitstring $0x$ in the measurement outcomes. Since the random circuit $U_A$ is approximately drawn from the Haar measure, we make analogous assumptions to those in [3,6], and assume the following global depolarized error model:

$$p^{\exp}(U_A, x) = \alpha\, p(U_A, x) + (1-\alpha)\, 2^{-(n+1)}. \qquad (2)$$

We discuss the justification and potential generalization of such an error model in Supplementary Note 4. Under the global depolarized error model, we now analyze the effect of noise on the circuit and show how measuring the QUES allows the circuit fidelity to be extracted. Prior work has made use of a combination of quantum and classical computation to obtain the circuit fidelity $\alpha$. Such analysis relies on the possibility of evaluating the noiseless bitstring probability $p(U_A, x)$ classically, given $U_A$ and $x$, e.g., via tensor network contraction [39]. This enabled the estimation of $\alpha$ from measurements of cross-entropy, referred to as XEB in this setting [3,6]. We adapt this approach to the Hamiltonian simulation problem by defining a system linear cross-entropy score (sXES):

$$\mathrm{sXES}(U_A) := 2^n \sum_{x \ne 0^n} p(U_A, x)\, p^{\exp}(U_A, x). \qquad (3)$$

The prefix 'system' is added because the ancilla qubit is fixed to be the $|0\rangle$ state in the definitions of $p(U_A, x)$ and $p^{\exp}(U_A, x)$, and the $|x\rangle$ state belongs to the system register.

[Figure 2(a): each layer of $U_A$ applies a random qubit permutation $\pi$ followed by random SU(4) two-qubit gates.]
In order to connect to the problem of generating heavy-weight samples later, our definition of sXES excludes the bitstring $0^n$. This is also necessary because the statistical properties of the bitstring $0^n$ differ from those of the other bitstrings in the system register.
Taking the expectation with respect to the distribution of $U_A$, and rearranging Eq. (2), then gives an expression for the circuit fidelity:

$$\alpha = \frac{\mathbb{E}\left[\mathrm{sXES}(U_A)\right] - \frac{1}{2}\,\mathbb{E}\left[\sum_{x\ne 0^n} p(U_A,x)\right]}{2^n\, \mathbb{E}\left[\sum_{x \ne 0^n} p(U_A, x)^2\right] - \frac{1}{2}\,\mathbb{E}\left[\sum_{x\ne 0^n} p(U_A,x)\right]}. \qquad (4)$$

This expression holds for any ensemble of random matrices, and relies only on the assumption that the noise model is depolarizing.
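The depolarized error model and the resulting fidelity extraction can be sanity-checked numerically. In the sketch below, the sXES normalization $2^n \sum_{x\ne 0^n} p\, p^{\exp}$ is an assumption made for illustration, in the spirit of linear XEB; we mix a random 'noiseless' distribution with the uniform distribution at a known fidelity and recover that fidelity exactly by rearranging the model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                         # system qubits; n + 1 qubits are measured in total
dim = 2 ** (n + 1)

# a random 'noiseless' distribution over the 2^(n+1) outcomes (bitstrings ax)
p_full = rng.random(dim)
p_full /= p_full.sum()
alpha_true = 0.37             # assumed circuit fidelity
p_exp_full = alpha_true * p_full + (1 - alpha_true) / dim   # global depolarized model

# restrict to outcomes 0x with x != 0^n (indices 1 .. 2^n - 1 in row-major order)
p = p_full[1:2**n]
p_exp = p_exp_full[1:2**n]

# sXES-style score (assumed normalization, in the spirit of linear XEB)
sxes = 2**n * np.dot(p, p_exp)
# invert the model: sXES = alpha*2^n*sum(p^2) + (1-alpha)*(1/2)*sum(p)
alpha_rec = (sxes - 0.5 * p.sum()) / (2**n * np.dot(p, p) - 0.5 * p.sum())
assert abs(alpha_rec - alpha_true) < 1e-9
```

For a single circuit instance the inversion is exact algebra; in an experiment, the expectations over $U_A$ and the measurement statistics introduce additional sampling error.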
Once the probability distribution of $U_A$ is specified (e.g., the Haar measure [40]), the only term in Eq. (4) that requires a quantum computation is $\mathrm{sXES}(U_A)$; all other terms can be evaluated classically. However, evaluation of the right-hand side of Eq. (4) often requires a significant amount of classical computation when $n$ becomes large [3].

Inferring circuit fidelity from QUES
Based on the discussion so far, it might seem surprising that an alternative, very good approximation to the circuit fidelity can readily be obtained from the QUES metric in Eq. (1). This is arrived at by first defining $P^{\exp}(U_A) = \sum_x p^{\exp}(U_A, x)$, i.e., the sum over all possible output bitstrings $x$ of the probability of measuring the given bitstring as the outcome of the action of the circuit on the input state $|0^{n+1}\rangle$. Summing both sides of Eq. (2) with respect to all bitstrings $x$, and further taking the expectation value of both sides over all possible $U_A$, yields a fidelity estimate $\alpha_{\mathrm{QUES}}$ that can be obtained directly from the measured QUES value, namely

$$\alpha_{\mathrm{QUES}} := 2\,\mathrm{QUES} - 1. \qquad (5)$$

The approximation error is defined as the maximal error for simulating a bounded Hamiltonian using the mQSVT circuit, namely $\epsilon := \max_{\|H\|_2 \le 1} \|f_t(H) - e^{-itH}\|_2$. It determines the extent of deviation of $\alpha_{\mathrm{QUES}}$ from $\alpha$. Specifically, under the globally depolarized noise model, we have the following bound (Supplementary Note 6):

$$|\alpha_{\mathrm{QUES}} - \alpha| = O(\epsilon). \qquad (6)$$

Here the error bound is derived without including the Monte Carlo measurement error due to the finite number of measurement shots. The analysis of the resulting statistical error is given in Supplementary Note 12 C. It is evident that, unlike Eq. (4), there is no classical overhead for evaluating $\alpha_{\mathrm{QUES}}$ for any $n$. Since the circuit fidelity $\alpha$ should be non-negative, combining Eq. (5) and Eq. (6) also indicates that, under the assumption of the depolarizing noise model, we have $\mathrm{QUES} \ge \frac{1}{2} - O(\epsilon)$.

To numerically verify the relation between QUES and circuit fidelity, we make use of the digital error model of [3], in which each quantum gate in the circuit is subject to a depolarizing error channel with a certain error rate. We test the resulting noisy quantum circuit with different two-qubit gate error rates $r_2$ and set the one-qubit gate error rate to $r_1 = r_2/10$. We also discard the rotation gate with phase factor $\varphi_{2d}$, since this just adds a global phase to the exact Hamiltonian simulation.
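The QUES-based estimator can be checked in the same spirit: summing the depolarized model over the $2^n$ ancilla-0 outcomes gives $P^{\exp} = \alpha P + (1-\alpha)/2$, so $2 P^{\exp} - 1$ recovers $\alpha$ whenever the noiseless success probability $P$ is close to 1. A minimal sketch with synthetic distributions, not real circuit data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
dim = 2 ** (n + 1)
p_full = rng.random(dim)
# make the noiseless circuit nearly deterministic on the signal qubit:
# P (the total ancilla-0 probability) is close to 1, as for an accurate simulation
p_full[2**n:] *= 1e-4
p_full /= p_full.sum()
P = p_full[:2**n].sum()

alpha = 0.25                                     # assumed circuit fidelity
p_exp_full = alpha * p_full + (1 - alpha) / dim  # global depolarized model
P_exp = p_exp_full[:2**n].sum()

# summing the depolarized model over the 2^n ancilla-0 outcomes gives
# P_exp = alpha*P + (1-alpha)/2 exactly
assert abs(P_exp - (alpha * P + (1 - alpha) / 2)) < 1e-12
alpha_ques = 2 * P_exp - 1                       # the QUES-based estimate
assert abs(alpha_ques - alpha) < 1e-3            # error is 2*alpha*(1-P)
```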
Then, given a circuit with a total of $g_1$ one-qubit gates and $g_2$ two-qubit gates, the reference value of the circuit fidelity can be set to $\alpha_{\mathrm{ref}} := (1-r_1)^{g_1}(1-r_2)^{g_2}$ [3,6]. We assume $U_A$ is Haar-distributed (numerically verified in Supplementary Note 7) to simplify the computation of classical expectations. Figure 3 summarizes the estimated circuit fidelity for random quantum circuits with different depth parameter $d$, variable coupling maps, and a range of error parameters. In all cases, we find that the circuit fidelity derived from QUES ($\alpha_{\mathrm{QUES}}$), the circuit fidelity $\alpha$ obtained from sXES, and the reference value $\alpha_{\mathrm{ref}}$ are generally consistent with each other. Numerical results also show that $\alpha_{\mathrm{QUES}}$ tends to slightly overestimate the fidelity $\alpha$ (see Table 5 for numerical values of the fidelities). We also see that for a given set of error rates $r_1, r_2$, the circuits with the highest connectivity show the best performance. This is because random circuits on these architectures converge faster to the Haar measure, which reduces the required circuit depth (see Supplementary Note 7).
In the next two subsections we show how to assess whether Hamiltonian simulation with the mQSVT circuit can be a classically hard task. We first define the analog of XHOG for Hamiltonian simulation, which we refer to as sXHOG, and give conditions for its hardness. We then show that potential classical hardness can be inferred directly from the value of the circuit fidelity obtained from the QUES, i.e., from $\alpha_{\mathrm{QUES}}$.

Classical hardness and system linear cross-entropy heavy output generation (sXHOG)
The complexity-theoretic foundation of the Google claim of 'quantum supremacy' in [6] is based on a computational task called linear cross-entropy heavy output generation (XHOG) with Haar-distributed unitaries [3,6,41,42]. Specifically, given a number $b > 1$ and a random $n$-qubit unitary $U$, the task is to generate $k$ nonzero bitstrings $x_1, x_2, \cdots, x_k \in \{0,1\}^n$ such that

$$\frac{1}{k} \sum_{j=1}^{k} q(U, x_j) \ge b\, 2^{-n},$$

where $q(U, x) := |\langle x| U |0^n\rangle|^2$. Here we use $U$ without the subscript to distinguish the XHOG problem from the sXHOG problem, which will be defined later. For $k$ randomly generated bitstrings, we expect that $\frac{1}{k}\sum_{j=1}^k q(U, x_j) \approx 2^{-n}$. Therefore any value $b > 1$ corresponds to a 'heavy-weight' output. When $k$ is large enough, successful solution of the XHOG problem is considered to be classically hard for every value $b > 1$ [41,42]. This holds for every circuit fidelity estimate $\alpha > 0$ obtained from the XEB metric, leading to the claim of supremacy in [6] based on extraction of a value $\alpha \approx 0.002$ from the experiments.
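For intuition, the XHOG condition can be illustrated classically at small $n$. This sketch 'cheats' by computing $q(U, x)$ directly from the unitary, which is exactly the step assumed to be hard at large $n$; for a Haar-random $U$, a constant fraction of bitstrings is heavy by Porter-Thomas statistics:

```python
import numpy as np

def haar_unitary(dim, rng):
    # Haar-random unitary via QR of a complex Gaussian matrix with phase fix
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(3)
n, b = 8, 1.5
N = 2 ** n
U = haar_unitary(N, rng)
q_probs = np.abs(U[:, 0]) ** 2            # q(U, x) = |<x|U|0^n>|^2
# a quantum computer would sample heavy bitstrings; here we select them classically
heavy = np.nonzero(q_probs > b / N)[0]
assert heavy.size > 0                     # Porter-Thomas: ~exp(-b) of strings are heavy
assert q_probs[heavy].mean() >= b / N     # XHOG condition for these k samples
```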
For the Hamiltonian simulation benchmark, we can define an analogous linear cross-entropy heavy output generation problem for the $n$ system qubits. Note that the heavy-weight samples are now defined only for the system qubits. We shall refer to this heavy output generation problem for Hamiltonian simulation as the sXHOG problem, to emphasize this important feature and the difference from the standard XHOG problem. Specifically, given a number $b > 1$, a Hamiltonian simulation benchmark circuit with sufficiently small approximation error $\epsilon$, and a random $(n+1)$-qubit unitary $U_A$ defining a random Hamiltonian on the $n$ system qubits, the task is to generate $k$ nonzero bitstrings $x_1, \cdots, x_k \in \{0,1\}^n \setminus \{0^n\}$ such that $\frac{1}{k}\sum_{j=1}^{k} p(U_A, x_j) \ge b\, 2^{-n}$. The probability $p(U_A, x)$ remains comparable to $2^{-n}$ for any $x \ne 0^n$ at all $t$, but $p(U_A, 0^n)$ can be much larger (for more details see Figure 5(b) in Supplementary Note 11). The state $0^n$ is then by definition 'heavy', and we must therefore exclude it from the measure in order to avoid a trivial outcome. This is what distinguishes the sXHOG problem from the original XHOG problem.
The potential classical hardness of the XHOG problem is justified by a reduction to a complexity-theoretic conjecture called the linear cross-entropy quantum threshold assumption (XQUATH) [41]. For completeness, we give a similar variant of the reduction of the sXHOG problem to a conjecture named the system linear cross-entropy quantum threshold assumption (sXQUATH) in Theorem 6 of Supplementary Note 9. The concept of sXQUATH directly parallels that of XQUATH, with a similar restriction as above to exclude the output bitstring $0^n$ (for more details see Supplementary Note 9). Similar to the construction in Ref. [41], a classically efficient solution to the sXHOG problem yields a violation of sXQUATH, which assumes that $p(U_A, x)$ for $x \ne 0^n$ cannot be efficiently estimated on classical computers to sufficient precision.

Inferring classical hardness from QUES
In order to decide whether a noisy implementation of the Hamiltonian simulation benchmark is potentially in the classically hard regime, we need to establish whether or not the sXHOG problem can be solved for b > 1.
Under the assumption that $U_A$ is drawn from the Haar measure, and that the approximation error $\epsilon$ of the mQSVT circuit is sufficiently small, we derive the following relation between $b$ and the circuit fidelity $\alpha$:

$$b = 1 + \gamma\,(\alpha - \alpha^*). \qquad (7)$$

Here $\alpha^*$ is a fidelity threshold (not the complex conjugate of $\alpha$) and $\gamma$ a constant. Explicit expressions for the threshold value $\alpha^*$ and the constant $\gamma$ are given in Supplementary Note 10. Both quantities are independent of the circuit fidelity $\alpha$ and depend only on the number of system qubits $n$ and the simulation time $t$. Eq. (7) thus shows that when $\gamma > 0$ and $\alpha > \alpha^*$, we will have $b > 1$, so that the sXHOG problem solved by the mQSVT circuit might be classically hard. This is qualitatively different from the situation for XEB experiments, for which every $\alpha > 0$ implies $b > 1$ [6].
Using the relation between QUES and $\alpha$ in Eqs. (5) and (6), and assuming that $\epsilon$ is sufficiently small, we immediately arrive at the conclusion that when $\mathrm{QUES} > (1 + \alpha^*)/2$, the corresponding sXHOG problem might be classically hard for a sufficiently large value of $n$. This is a surprising result, since as noted above, the estimation of QUES does not require intensive classical computation. In fact, it is not even necessary to actually generate any heavy-weight samples; instead we just need to measure the value of QUES, Eq. (1), which is readily done by repeatedly running the circuit in Figure 1 with random circuit parameters as described above. Of course, should one wish to actually solve the sXHOG problem itself, the heavy-weight samples would need to be generated using a quantum computer, and intensive classical computation of $\frac{1}{k}\sum_{j=1}^{k} p(U_A, x_j)$ would then also be required. But in order to demonstrate the potential regime of classical hardness for Hamiltonian simulation, i.e., the minimal values of $n$ and $d$ needed to reach this regime, this is not required.
To further investigate the implications of Eq. (7), we now explicitly indicate the time dependence of all quantities (i.e., we employ the notation $\gamma \to \gamma_t$, $\alpha^* \to \alpha^*_t$). In Figure 4 we plot the values of $\gamma_t$ and $\alpha^*_t$ according to the expressions given in Supplementary Note 10, as a function of the simulation time $t$, for $n = 4, 8, 12$ qubits. Figure 4 shows that $\gamma_t > 0$ for all $t$, so we only need to determine whether it is possible to have fidelity $\alpha \ge \alpha^*_t$. It is evident from Figure 4 that both $\alpha^*_t$ (panel (a)) and $\gamma_t$ (panel (b)) show oscillatory behavior. We now analyze this behavior to identify an optimal time at which the potential classical hardness of Hamiltonian simulation in this random Hamiltonian setting can be demonstrated for a sufficiently large number of qubits $n$.
For very short times, i.e., when $t \approx 0$, we have $\alpha^*_t > 1$. This means that we cannot have $b > 1$ for any value of the circuit fidelity $0 \le \alpha \le 1$. To see why this is the case, consider the limit $t = 0$. Here $p_t(U_A, 0^n) = 1$, and $p_t(U_A, x) = 0$ for any $x \ne 0^n$. By continuity, when $t$ is very small, the magnitude of $p_t(U_A, x)$ for most bitstrings $x \ne 0^n$ is still very small and cannot reach the heavy output regime. Figure 4(a) also shows that there is a critical simulation time $t_{\mathrm{thr}} \approx 2.26$, such that $\alpha^*_t < 1$ for any $t > t_{\mathrm{thr}}$.
When $t > t_{\mathrm{thr}}$, ideally we would like to have $\alpha^*_t \approx 0$, so that a very low experimental circuit fidelity $\alpha$ is sufficient to reach the heavy output regime. To this end we investigate what happens at vanishing circuit fidelity, i.e., $\alpha = 0$. Detailed analysis shows that in the large-$n$ limit, we have $\gamma_t \alpha^*_t = \mathbb{E}\left(p_t(U_A, 0^n)\right)$, and Eq. (7) can be simplified as (see Supplementary Note 10)

$$b = 1 + \gamma_t\,\alpha - \mathbb{E}\left(p_t(U_A, 0^n)\right),$$

where the expectation value is taken with respect to the random unitaries $U_A$ as before. Thus when the expectation value is positive, i.e., $\mathbb{E}\left(p_t(U_A, 0^n)\right) > 0$, in the large-$n$ limit we have $b|_{\alpha=0} < 1$ and the task should not be classically hard. Moreover, since $b$ is a continuous function of $\alpha$, even at finite circuit fidelity $\alpha$, when $\alpha$ is small enough we can still find $b < 1$. This provides an alternative explanation of Eq. (7), namely, that the circuit fidelity $\alpha$ needs to be larger than the finite positive threshold value $\alpha^*_t > 0$ for most values of $t > t_{\mathrm{thr}}$.
As a result of these considerations, when $n$ is large enough, it is important to focus on the regimes where the expectation value $\mathbb{E}\left(p_t(U_A, 0^n)\right) \approx 0$, which from Eq. (7) implies that the threshold fidelity $\alpha^*_t \approx 0$. The numerical results shown in Figure 4 indicate that this can happen in two different scenarios. The first is when the simulation time $t \to \infty$ (see the analytic justification of this statement in Supplementary Note 12 D). Of course this requires a very large circuit depth and is a physically 'trivial' limit that is impractical on near-term quantum devices. The second scenario, which is much more relevant in practice, is when $\alpha^*_t$ reaches its first minimum, which defines an optimal time $t = t^{\mathrm{opt}}$. In the large-$n$ limit, the value of $t^{\mathrm{opt}}$ can be rationalized as the first node of the Bessel function $J_0(t/2)$ (see Supplementary Note 11). Figure 4(a) shows that for $t^{\mathrm{opt}} \approx 4.81$, we already have $\mathbb{E}\left(p_t(U_A, 0^n)\right) \approx 0$ and $\alpha^*_t \approx 0$. Therefore simulating to the time $t = t^{\mathrm{opt}}$ is highly desirable, since this is a relatively short time at which the Hamiltonian simulation benchmark is nevertheless guaranteed to solve the sXHOG problem even for a very small circuit fidelity. Our numerical results indicate that the values of $t_{\mathrm{thr}}$ and $t^{\mathrm{opt}}$ depend only weakly on $n$, and their values are nearly converged for $n$ as small as 12. Therefore this value of $t^{\mathrm{opt}}$ can be used in a future quantum simulation in the heavy output regime.
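The quoted value $t^{\mathrm{opt}} \approx 4.81$ is consistent with the first node of $J_0(t/2)$: the first zero of $J_0$ lies at $\approx 2.4048$, hence $t^{\mathrm{opt}} \approx 4.8097$. A small self-contained check, using the integral representation of $J_0$ and bisection (no special-function library assumed):

```python
import numpy as np

def j0(x):
    # Bessel J_0 via its integral representation: J0(x) = (1/pi) int_0^pi cos(x sin t) dt
    t = np.linspace(0.0, np.pi, 20001)
    v = np.cos(x * np.sin(t))
    dt = t[1] - t[0]
    return (v.sum() - 0.5 * (v[0] + v[-1])) * dt / np.pi   # trapezoid rule

# J0(4/2) > 0 and J0(6/2) < 0, so the first node of J0(t/2) lies in (4, 6); bisect
lo, hi = 4.0, 6.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if j0(lo / 2) * j0(mid / 2) <= 0:
        hi = mid
    else:
        lo = mid
t_opt = 0.5 * (lo + hi)
assert abs(t_opt - 4.8097) < 1e-3   # 2x the first zero of J0 (~2.40483)
```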

Discussion
We have presented a quantum benchmark for Hamiltonian simulation on quantum computers. The Hamiltonian simulation problem is solved using a minimal quantum singular value transformation (mQSVT) circuit. The primary output of the Hamiltonian simulation benchmark is a single number called QUES, which can be verified without any classical computation, even in the regime that is potentially hard for classical computation. Therefore the Hamiltonian simulation benchmark provides a scalable benchmark of the circuit fidelity under the global depolarized error model, and can be executed and verified on future quantum devices with a large number of qubits.
As the current quantum computing technologies advance, the possibility of implementing some error correction is improving [43]. Here the highly structured mQSVT circuit provides useful indications of where best to implement error correction under limited resources for this. Recall that the mQSVT circuit consists of a series of repetitions of a random circuit U A and its conjugate U † A , interleaved with single-qubit Z rotation operators characterized by carefully selected phase factors. Thus given a specific random Hamiltonian block encoded in U A , the time dependent evolution operator for this Hamiltonian is defined entirely in terms of the phase angles for the single-qubit Z rotation operators. Since these phases should moreover be precisely determined, this suggests that on near-term quantum devices that may allow for some error correction but have overall limited resources, quantum error correction for these single-qubit rotations should be prioritized.
It is also useful to consider here the applicability of this Hamiltonian simulation approach to general Hamiltonians, i.e., not restricted to random Hamiltonians, on near-term quantum computers. Unfortunately, it appears that for current quantum technologies there is potentially a large gap between the feasible simulation of an H-RACBEM given in this work and that of a general Hamiltonian relevant to, e.g., molecular or solid-state physics. The main reason is that the block encoding of most Hamiltonians of practical interest will involve significant numbers of ancilla qubits, as well as multi-qubit control gates, all of which are extremely expensive on near-term quantum devices. In contrast to this general situation, the construction of H-RACBEM uses only whatever one-qubit and two-qubit gates are available for a given quantum device and is thus considerably easier. Nevertheless, it is possible that undertaking Hamiltonian simulation with H-RACBEM may also yield interesting physical applications in the various settings in which quantum chaotic dynamics are relevant. One immediate possibility in this direction is to use H-RACBEM to simulate the dynamics of quantum scrambling or quantum chaos in strongly interacting quantum systems. Scrambling dynamics can be studied by simulating out-of-time-order correlators (OTOCs) for effective Hamiltonians that can be defined implicitly in terms of a random circuit for time $t$ (see e.g., [44]). We note that one can easily perform a Hamiltonian simulation backward in time, merely by reversing the sign of $t$, so the mQSVT circuit for an OTOC at any time $t$ of a random Hamiltonian encoded in an H-RACBEM can be readily constructed by adding local operators between forward and backward implementations of the mQSVT. Evaluation of the circuit at different times $t$ can be implemented by reevaluating the phase factors (which may require building a longer circuit, depending on the accuracy required).
The circuits can also be adapted to Hamiltonian simulation at finite temperatures and hence also to scrambling at finite temperatures. From a theoretical perspective it would also be useful to explore to what extent the structure of the H-RACBEM influences the speed of scrambling [45].
Our theoretical analysis of the sensitivity of the Hamiltonian benchmarking scheme in this work was based on a fully depolarized noise model, which is often assumed to be a good model for superconducting qubits [6]. In general, the Pauli stochastic noise model on which it is based may be biased or non-uniform across qubits. In addition, thermal noise and coherent errors are important for some qubit architectures. It will be useful to extend the current analysis to more general noise models, and some of these aspects are discussed in Supplementary Note 4.
Finally, we note that while this Hamiltonian simulation benchmark is restricted to the specific class of random Hamiltonians, it might also provide information relevant to more general Hamiltonian simulations. Efforts to analyze the complexity of analog Hamiltonian simulations have often focused on the relation of such simulations to classical sampling tasks [46][47][48], and are closely related to the cross-entropy analysis for sampling of random quantum circuits of [3,6]. As noted recently [48], classical hardness can be shown for certain classes of analog quantum Hamiltonian simulation [49,50]. Note that the potential classical hardness of the original XHOG problem corresponding to Google's supremacy experiment is justified by a reduction to a complexity-theoretic conjecture called XQUATH [41]. However, a recent paper [51] that appeared after submission of the current work has provided evidence that can refute XQUATH, at least for some classes of quantum circuits. Therefore it is possible that our sXQUATH assumption can be refuted on the same basis. It could be useful to explore generalizations of other classical sampling tasks to the QSVT setting, as was done here for the cross-entropy heavy output generation, to help guide the search for Hamiltonians whose simulation by QSVT can exhibit quantum advantages. Finally, the current approach of analyzing alternative fidelity measures under Hamiltonian simulation using mQSVT may prove useful for the analysis of recent fidelity-based experimental studies of analog Hamiltonian simulations that exploit the emergent random nature of a projected ensemble of states [52].

Details of numerical simulations
All numerical tests are implemented in Python 3.7 and Qiskit [53]. Quantum circuits in Figure 2 are optimized by the transpiler provided by Qiskit before being executed on a real quantum device. The number of measurements (shots) is fixed to 1,000 for the experiments on real quantum devices in Figure 2, and is set to 1,000,000 for those on classical simulators in Figure 3. The classical generation of Haar random unitaries in Figure 4 is performed by QR factorization of random complex matrices with i.i.d. Gaussian entries, according to the recipe in [54].
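The recipe of [54] amounts to QR-factorizing a complex Ginibre matrix and correcting the phases of the diagonal of the R factor; a minimal numpy sketch (the function name and seed are our own choices):

```python
import numpy as np

def haar_unitary(n, rng=None):
    """Sample an n x n unitary from the Haar measure by QR-factorizing a
    complex Ginibre matrix and fixing the phases of diag(R), following the
    recipe of [54]."""
    rng = np.random.default_rng(rng)
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))   # column-wise phase correction makes the law exactly Haar

U = haar_unitary(8, rng=0)
```

Without the phase correction, the distribution induced by a plain QR factorization is not unitarily invariant, which is why the diagonal of R must be normalized.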

Data availability
The experimental data that support the findings are available from the authors upon request.

Code availability
The code that supports the findings is available from the authors upon request.

Supplementary Note 1. NOTATIONS
We first introduce the definition of block encoding. Let A ∈ C^{N×N} be an n-qubit Hermitian matrix (N = 2^n). Suppose we can find an (n+1)-qubit unitary matrix U_A such that

U_A = ( A  * ; *  * )   (1)

holds, where * stands for a matrix block whose entries are not of interest, i.e. A is the upper-left matrix block of U_A. Then we may get access to the action of A on an n-qubit state |ψ⟩ via the unitary matrix U_A by

U_A (|0⟩ ⊗ |ψ⟩) = |0⟩ ⊗ (A|ψ⟩) + |⊥⟩,

where |⊥⟩ is an unnormalized (n+1)-qubit state not of interest, satisfying (|0⟩⟨0| ⊗ I_n)|⊥⟩ = 0. Here we follow the row-major order convention, so Eq. (1) can also be written as A = (⟨0| ⊗ I_n) U_A (|0⟩ ⊗ I_n). Clearly, when the operator norm ‖A‖_2 is larger than 1, A cannot be encoded by any unitary U_A as in Eq. (1). More generally, if we can find α, ε ∈ R_+ and an (m+n)-qubit unitary U_A such that

‖A − α (⟨0^m| ⊗ I_n) U_A (|0^m⟩ ⊗ I_n)‖_2 ≤ ε,

then U_A is called an (α, m, ε)-block-encoding of A. Here m is called the number of ancilla qubits for block encoding. We refer to [32] for more details on block encoding. When the block encoding is exact with ε = 0, U_A is called an (α, m)-block-encoding of A. The special case of the (1, 1)-block-encoding may also be called a 1-block-encoding.
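The defining property of a 1-block-encoding can be checked numerically; the following is our own minimal sketch (the unitary is randomly generated for illustration, not taken from the paper):

```python
import numpy as np

# Our own minimal check of Eq. (1): for a generic (n+1)-qubit unitary U_A with
# upper-left block A, projecting the ancilla of U_A(|0>|psi>) onto |0> gives A|psi>.
rng = np.random.default_rng(1)
N = 4                                       # N = 2^n with n = 2 system qubits
z = rng.standard_normal((2 * N, 2 * N)) + 1j * rng.standard_normal((2 * N, 2 * N))
UA, _ = np.linalg.qr(z)                     # a generic (n+1)-qubit unitary
A = UA[:N, :N]                              # upper-left n-qubit block (row-major order)

psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
psi /= np.linalg.norm(psi)
out = UA @ np.kron([1.0, 0.0], psi)         # U_A (|0> tensor |psi>)
top = out[:N]                               # (<0| tensor I_n) applied to the output
```

The second assertion below reflects the norm constraint discussed in the text: the upper-left block of a unitary always has operator norm at most 1.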
In the Supplementary Note, for notational simplicity, we may use U without a subscript to represent an (n+1)-qubit quantum circuit drawn at random from a certain probability distribution. Unless otherwise noted, A denotes the upper-left n-qubit submatrix of U, i.e. U is the 1-block-encoding of A. This matrix A is also called a random circuit block encoded matrix (RACBEM), and H = A†A is the corresponding Hermitian random circuit block encoded matrix (H-RACBEM) [9].
We use N = 2^n to represent the dimension of the Hilbert space of the system qubits, and I_n to denote the n-qubit identity matrix. For a complex square matrix A with singular value decomposition (SVD) A = WΣV†, its singular value transformation through an even function g is defined as g^▷(A) = V g(Σ) V†. Here, the right triangle in the notation indicates that only the right singular vectors V are kept in the transformation. If we consider |A| := √(A†A) = VΣV†, then the singular value transformation of A is equal to the eigenvalue transformation of the Hermitian matrix |A|, namely, g^▷(A) = g(|A|). Furthermore, due to the even parity of g, there is a function f so that g(x) = f(x²) and g^▷(A) = f(|A|²) = f(H). In particular, when g_t(x) is an even polynomial approximation to s_t(x) = e^{−itx²}, we can write g_t(x) = f_t(x²). Hence f_t(x) approximates e^{−itx}, and g_t^▷(A) = f_t(H) approximates the Hamiltonian evolution e^{−itH}.
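The identity g^▷(A) = f(H) for g(x) = f(x²) can be verified directly; a small numpy sketch (our own, with f(y) = e^{−ity} and t = 1):

```python
import numpy as np

# Our own check that the singular value transformation g^>(A) = V g(Sigma) V^dag
# coincides with f(H), H = A^dag A, when g(x) = f(x^2); here f(y) = exp(-i t y).
rng = np.random.default_rng(2)
N, t = 8, 1.0
z = rng.standard_normal((2 * N, 2 * N)) + 1j * rng.standard_normal((2 * N, 2 * N))
U, _ = np.linalg.qr(z)
A = U[:N, :N]                                  # RACBEM: top-left block of a unitary

W, sigma, Vh = np.linalg.svd(A)                # A = W Sigma V^dag, with Vh = V^dag
g_svt = Vh.conj().T @ np.diag(np.exp(-1j * t * sigma ** 2)) @ Vh

lam, V = np.linalg.eigh(A.conj().T @ A)        # H = A^dag A, eigenvalue transformation
f_H = V @ np.diag(np.exp(-1j * t * lam)) @ V.conj().T
```

Both constructions are the same matrix function of A†A, so they agree up to floating-point error even when singular values are (nearly) degenerate.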
We use U_{f,U} to represent the minimal quantum singular value transformation (mQSVT) circuit in Figure 1, which has only a single ancilla qubit, m = 1. For any n-qubit input state |ψ⟩, the mQSVT circuit performs the following transformation of the input quantum state:

U_{f,U} (|0⟩ ⊗ |ψ⟩) = |0⟩ ⊗ f(H)|ψ⟩ + |1⟩ ⊗ |⊥⟩,

where |⊥⟩ ∈ C^N is an unnormalized quantum state. In other words, U_{f,U} is the 1-block-encoding of f(H) ≡ g^▷(A). ‖A‖_2 := σ_max(A) is the operator norm of a matrix, which is equal to its maximal singular value.
‖f‖_∞ := max_{x∈[−1,1]} |f(x)| is the infinity norm of continuous functions on [−1,1]. E(·) stands for the average over the random matrix ensemble (most commonly, the ensemble of U). Both z̄ and z* stand for the complex conjugate of a complex number z. For a complex polynomial P(x) = Σ_i c_i x^i ∈ C[x], its complex conjugate is P*(x) = Σ_i c_i* x^i. For a matrix A, the transpose, Hermitian conjugate, and complex conjugate are denoted by A^T, A†, and A*, respectively. Unless otherwise noted, an n-bit binary string x ∈ {0,1}^n is identified with its decimal representation. Specifically, when an n-bit binary string appears in the subscript of a matrix or a vector, it is identified with its decimal representation (we use zero-based indexing). For example, A_{0^n,1^n} := A_{0,2^n−1}. Table 1 summarizes the main notations used in the Supplementary Note. In the context of Hamiltonian simulation, many quantities depend on the value of the simulation time t. Such t-dependence is usually added as a subscript, as in p_t(U, x). Most of the discussion focuses on simulation at a fixed time t, so when the context is clear we may drop the t-dependence for simplicity.
Table 1: Summary of the main notations.
U_{f,U}: mQSVT circuit in Figure 1 implementing a 1-block-encoding of f(H)
A: upper-left n-qubit submatrix of an (n+1)-qubit random unitary matrix U
H: A†A, the Hermitian random circuit block encoded matrix
g_t(x): an even polynomial approximation to s_t(x) = e^{−itx²}, also denoted by P(x, Φ) with phase factors Φ
f_t(x): polynomial defined by g_t(x) = f_t(x²), which approximates e^{−itx}
P(·): probability density function of random quantum circuits
P_exp(·): probability density function associated with the noisy implementation of random quantum circuits

Supplementary Note 2. EQUIVALENCE BETWEEN MINIMAL AND STANDARD QSVT CIRCUITS
The standard implementation of the QSVT circuit [32] uses one extra ancilla qubit, called the signal ancilla qubit. In this section, we show that when the system matrix A is block encoded with only one ancilla qubit, the signal ancilla qubit is no longer needed. Therefore, the only ancilla qubit is the one due to the block encoding of A, and the resulting circuit, shown in Figure 1, is called the minimal QSVT (mQSVT) circuit. Furthermore, the circuit in Figure 1 contains no two-qubit or multi-qubit gates outside of the unitary U, which greatly simplifies the implementation on a given quantum device. We prove the equivalence between the mQSVT and the standard QSVT circuits in this section for completeness.
For any (n+1)-qubit unitary U, let the singular value decomposition of its upper-left n-qubit submatrix A be A = W_1 Σ V_1†. Following the cosine-sine decomposition (CSD), there exist n-qubit unitaries W_2, V_2 so that U can be decomposed (up to sign conventions in the off-diagonal blocks) as

U = ( W_1  0 ; 0  W_2 ) ( Σ  √(I_n − Σ²) ; −√(I_n − Σ²)  Σ ) ( V_1†  0 ; 0  V_2† ).

This decomposition also implies that any n-qubit non-unitary matrix A can, up to a scaling factor, in principle be block encoded using only one ancilla qubit.
Then, consider the unitary matrix representation of the quantum circuit in Figure 1, which alternates the ancilla phase rotations e^{iφ_j (Z ⊗ I_n)} with applications of U and U†. Let K be the permutation matrix permuting the j-th and the (N+j)-th columns, and let V = diag{V_1, V_2}. Upon conjugating by Ṽ := V K, each factor in the product is simplified into a direct sum of N 2-by-2 blocks, with the j-th block acting on the subspace associated with the singular value σ_j of A.
Let W(x) := e^{i arccos(x) X}, so that the j-th 2-by-2 block of the conjugated U is W(σ_j). It is straightforward to show that the mapping from [−1,1] to SU(2),

x ↦ e^{iφ_0 Z} Π_{j=1}^{2d} [ W(x) e^{iφ_j Z} ] = ( P(x)  iQ(x)√(1−x²) ; iQ*(x)√(1−x²)  P*(x) ),

defines an even polynomial P(x) of degree at most 2d and an odd polynomial Q(x) of degree at most 2d−1, so that |P(x)|² + (1−x²)|Q(x)|² = 1 holds for any x ∈ [−1,1]. The matrix representation of the quantum circuit is then block-diagonal with these SU(2) blocks evaluated at the singular values of A. For example, when g_t(x) is an even polynomial approximation to s_t(x) = e^{−itx²}, we can define g_t(x) = f_t(x²), and the diagonal n-qubit submatrices are g_t^▷(A) = f_t(A†A) and (g_t^▷(A))† = (f_t(A†A))†, respectively. This remarkably simple structure of the QSVT circuit is due to the use of the 1-block-encoding. In general, if an n-qubit matrix A is block encoded in an (n+m)-qubit unitary U, the standard QSVT circuit has the structure in Figure 1. In particular, even when m = 1, two CNOT gates are needed to implement each phase rotation. This introduces additional errors and can be practically cumbersome on near-term devices whose default two-qubit gate is not CNOT.

Figure 1: Quantum circuit for quantum singular value transformation (QSVT) of an even complex matrix polynomial P of degree 2d. The dashed boxes represent the controlled rotation with phase factor φ_j, where two m-qubit Toffoli gates controlled at |0^m⟩ are used. The QSVT circuit queries the (n+m)-qubit quantum circuit U and its inverse recursively d times. For the given QSVT circuit, measuring the ancilla qubits with outcome 0 0^m yields the matrix polynomial acting on the system qubits.
In the mQSVT circuit in Figure 1, the phase factors (φ_0, ..., φ_{2d}) are determined by an optimization procedure that provides an even polynomial g_t(x) satisfying ‖g_t(x) − s_t(x)‖_∞ ≤ ε for some given precision parameter ε (see Supplementary Note 3), and the evolution time t is encoded in the choice of phase factors. We then measure the top ancilla qubit and post-select on the 0 outcome of this measurement. This ensures that the action on the lower n system qubits approximates the Hamiltonian evolution e^{−itH}|0^n⟩ ≈ f_t(H)|0^n⟩, where H = A†A. Here f_t(H) is a matrix polynomial, and the approximation error in the operator norm is upper bounded by ‖f_t(H) − e^{−itH}‖_2 ≤ ε. In the absence of quantum errors, the probability of measuring the top ancilla qubit with outcome 0, i.e. P_t(U) := ‖f_t(H)|0^n⟩‖², will be close to 1. Specifically, writing the eigendecomposition H = VΛV†, we have P_t(U) = Σ_j p_j |f_t(λ_j)|², where p_j = |V_{0,j}|². By the triangle inequality, the probability of measuring the top ancilla qubit with outcome 0 is bounded:

1 − 2ε ≤ (1 − ε)² ≤ P_t(U) ≤ 1.   (3)
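The post-selection bound P_t(U) ≥ 1 − 2ε can be illustrated with any polynomial approximation f_t of e^{−itx}; the sketch below (our own) uses a truncated Taylor series in place of the QSVT phase-factor construction:

```python
import numpy as np
from math import factorial

# Our own illustration of the bound P_t(U) >= 1 - 2 eps, using a degree-k
# Taylor polynomial f_t(x) ~ e^{-i t x} instead of the QSVT construction.
rng = np.random.default_rng(3)
N, t, k = 8, 1.0, 10
z = rng.standard_normal((2 * N, 2 * N)) + 1j * rng.standard_normal((2 * N, 2 * N))
U, _ = np.linalg.qr(z)
A = U[:N, :N]
lam, V = np.linalg.eigh(A.conj().T @ A)        # eigenvalues of H = A^dag A lie in [0, 1]

f_lam = sum((-1j * t) ** j / factorial(j) * lam ** j for j in range(k + 1))
eps = t ** (k + 1) / factorial(k + 1)          # Taylor remainder bound on [0, 1]

e0 = np.zeros(N)
e0[0] = 1.0                                    # the state |0^n>
P_t = np.linalg.norm(V @ (f_lam * (V.conj().T @ e0))) ** 2
```

Since the exact evolution preserves the norm, any ε-accurate f_t gives a success probability within 2ε of 1, which is what the assertion checks.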
We also find that the probability p_t(U, x) = |⟨0x|U_{f_t,U}|00^n⟩|² ≈ |⟨x|exp(−itH)|0^n⟩|² characterizes the dynamics of the propagation from 0^n to x for any n-bit string x ∈ {0,1}^n. If the simulation time t is short, then exp(−itH) ≈ I, and p_t(U, 0^n) can be much larger than p_t(U, x) for any bitstring x ≠ 0^n. This issue will be particularly important when defining the 'heavy weight samples' in later discussions. Therefore we shall primarily focus on the case when x ≠ 0^n.
As an illustrative example, Figure 2 shows a quantum circuit implementing a 2-qubit matrix A encoded by a 3-qubit unitary matrix U. The construction uses only the basic gate set {U1, U2, U3, CNOT}. In Supplementary Note 3 we describe an optimization based method to obtain the phase factors for a relatively short time t with a small approximation error ε. In this example we set t = 1. To obtain a theoretical error bound at larger t, we can use the phase factor concatenation technique in Supplementary Note 12 B to obtain the simulation at t from 2 to 10; the corresponding error bound ε_t is given by Theorem 14. Using these circuits, we may measure the outcome of the system qubits in the computational basis, and follow the dynamics of the probability Σ_{x≠0^n} p_t(U, x), i.e. that of the quantum state moving away from the initial state |0^n⟩. Figure 2(b) shows that this agrees very well with the result using the exact dynamics Σ_{x≠0^n} |⟨x|e^{−itH}|0^n⟩|². Furthermore, according to Eq. (3), the probability P_t(U) in Figure 2(c) satisfies the theoretical bounds 1 − 2ε_t ≤ P_t(U) ≤ 1 and is very close to 1.

Supplementary Note 3. OPTIMIZATION BASED METHOD FOR FINDING PHASE FACTORS IN THE HAMILTONIAN SIMULATION BENCHMARK
In order to implement the Hamiltonian simulation benchmark at time t, we need to find the phase factors Φ that generate an even polynomial P(x, Φ) = g_t(x) so that ‖g_t(x) − s_t(x)‖_∞ ≤ ε for a sufficiently small ε. For a large class of polynomials, the existence of such phase factors is established in [32, Theorem 4] and summarized in Theorem 1. The phase factors in the theorem and those used in the quantum circuit in the main text are related by a simple linear relation. In order to find the phase factors, the standard practice follows a two-step procedure. We first identify the approximating polynomial P(x). Then the phase factors for P(x) are computed following a recursive relation described in [32, Theorem 4]. In the case of the Hamiltonian simulation benchmark, it is highly nontrivial to find an approximating polynomial P(x) satisfying the conditions in Theorem 1 while approximating the function e^{−itx²} sufficiently well. Therefore we cannot follow the standard procedure to evaluate the phase factors.
On the other hand, the recently developed optimization based method [33] provides an alternative route to streamline this process. Instead of following a two-step procedure, the optimization based method allows one to find both the approximate polynomial and the phase factor sequence in a single step. Note that the optimization procedure in [33] only addresses the case when the target function is real. Here the target function s t (x) is complex. Below we present a modified optimization procedure to find the phase factor sequence for complex polynomials.
Specifically, given an arbitrary set of phase factors Φ ∈ R^{d+1}, Theorem 1 defines a mapping R^{d+1} → C[x] giving a complex polynomial of degree at most d via P(x, Φ) := ⟨0|U(x, Φ)|0⟩. Given the parity constraint, the number of (complex) degrees of freedom is d̃ := ⌈(d+1)/2⌉. Therefore, to fix the polynomial, one needs to sample d̃ points, given that the polynomial is complex valued. In practice, to ensure numerical stability, we sample the function on x_k = cos((2k−1)π/(4d̃)), k = 1, ..., d̃, which are the positive Chebyshev nodes of T_{2d̃}(x). The optimization based method views P as a nonlinear approximation ansatz. The objective function is defined as

L(Φ) := (1/d̃) Σ_{k=1}^{d̃} |P(x_k, Φ) − s_t(x_k)|².

Taking the (2π)-periodicity into account, the optimization problem is

Φ* = argmin_{Φ ∈ [−π, π)^{d+1}} L(Φ),

which is numerically solved by a quasi-Newton method. Table 2 describes the approximation error for polynomials, measured by ‖P(x, Φ*) − s_t(x)‖_∞, at simulation time t = 1 for polynomial degrees between 6 and 20. When the polynomial degree is 20, the approximation error is as small as 10^{−8}, which demonstrates the effectiveness of the optimization based method. According to Figure 4, there exists an 'optimal' simulation time t_opt = 4.8096, for which the threshold fidelity α*(t_opt) ≈ 0 (the derivation is in Supplementary Note 11). Table 3 lists the phase factor sequences that can be directly used in Figure 1 to perform Hamiltonian simulation at time t_opt. In order to reach low (3.0 × 10^{−2}), medium (9.4 × 10^{−5}), and high (1.6 × 10^{−6}) accuracy, the degrees of the polynomial found by the optimization procedure are 10, 18, and 26, respectively. Figure 3 further shows the pointwise approximation error on the interval [0, 1] (the error on [−1, 0] is the same due to the even parity). Compared to Table 2, in order to reach precision ε = 3.3 × 10^{−6} at simulation time t = 1, the polynomial degree still needs to be 14.
So even though t_opt is nearly 5 times larger, the polynomial degree increases by less than a factor of two to reach similar accuracy. Since t_opt is still relatively small, this does not violate the 'no-fast-forwarding' theorem of Hamiltonian simulation [15].
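The optimization setup above can be sketched numerically; the code below (our own rendering) assembles P(x, Φ) from the QSP product, evaluates the objective at the Chebyshev nodes, and takes a single guaranteed-descent gradient step from a common symmetric initial guess Φ⁰ = (π/4, 0, ..., 0, π/4), standing in for the full quasi-Newton iteration of [33]:

```python
import math
import numpy as np

# Sketch of the optimization setup for the phase factors (our own rendering).
d, t = 20, 1.0
dtil = math.ceil((d + 1) / 2)                       # complex degrees of freedom
nodes = np.cos((2 * np.arange(1, dtil + 1) - 1) * np.pi / (4 * dtil))

def qsp_value(x, phi):
    # P(x, Phi) = <0| e^{i phi_0 Z} prod_j [ W(x) e^{i phi_j Z} ] |0>
    w = np.array([[x, 1j * np.sqrt(1 - x ** 2)],
                  [1j * np.sqrt(1 - x ** 2), x]])   # W(x) = e^{i arccos(x) X}
    u = np.diag([np.exp(1j * phi[0]), np.exp(-1j * phi[0])])
    for p in phi[1:]:
        u = u @ w @ np.diag([np.exp(1j * p), np.exp(-1j * p)])
    return u[0, 0]

def objective(phi):
    # Mean squared deviation from s_t(x) = exp(-i t x^2) at the Chebyshev nodes.
    return float(np.mean([abs(qsp_value(x, phi) - np.exp(-1j * t * x ** 2)) ** 2
                          for x in nodes]))

phi0 = np.zeros(d + 1)
phi0[0] = phi0[-1] = np.pi / 4                      # symmetric initial guess
L0 = objective(phi0)

# One finite-difference gradient step with a backtracking line search, in place
# of the quasi-Newton solver used in the text.
eye = np.eye(d + 1)
grad = np.array([(objective(phi0 + 1e-6 * eye[j]) - L0) / 1e-6 for j in range(d + 1)])
step = 1.0
while step > 1e-12 and objective(phi0 - step * grad) >= L0:
    step /= 2
phi1 = phi0 - step * grad
```

In practice a quasi-Newton method drives the objective down to the accuracies quoted in Table 2; the sketch only demonstrates the parametrization and a single descent step.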

Supplementary Note 4. NOISE MODEL
In the experimental setting, the density matrix after the application of the quantum circuit U_{f,U} can be written as

ρ_exp = α |ψ_{f,U}⟩⟨ψ_{f,U}| + (1 − α) χ_{f,U}.   (8)

Here, |ψ_{f,U}⟩ := U_{f,U} (|0⟩ ⊗ |ψ⟩) is the ideal quantum state generated by the exact implementation of the quantum circuit, and the operator χ_{f,U} is due to the noise channel. Under the global depolarizing noise model for the mQSVT circuit, we have χ_{f,U} = I/2^{n+1}, and the diagonal entries of Eq. (8) then yield the probabilities of Eq. (2). However, in practice χ_{f,U} may not be a scaled identity matrix, or even a diagonal matrix. If so, the system linear cross-entropy score (sXES) in Eq. (3) should be expressed more generally in terms of the noisy bitstring probability p_exp(U, x) = α p(U, x) + (1 − α)⟨0x|χ_{f,U}|0x⟩. The sXES then splits into the depolarized contribution plus a correction term ε_χ that represents the effects of correlations between noise channels. Under the global depolarized noise channel we have ε_χ = 0. Refs. [3] and [6, Supplementary information IV.B] argue that when the noise ⟨0x|χ_{f,U}|0x⟩ is uncorrelated with the signal p(U, x), the statistical fluctuation error ε_χ can be of higher order, as a result of the concentration of measure phenomenon and Levy's lemma in high dimensional spaces [55]. On the other hand, even though each component U in the mQSVT circuit can be taken to be the same as the random circuit in the supremacy experiment of Ref. [6], the overall mQSVT circuit exhibits additional structure due to the presence of the R_z (phase) gates and the multiple repetitions of each U and U† pair. Thus the global depolarized error model may not hold in practice. To overcome this difficulty, the randomized compilation method in Ref.
[56] can be applied to the mQSVT circuits to convert all noise in the circuit into stochastic Pauli errors. The theory of error randomization can then be used to guarantee that the effect of these Pauli errors can be accurately modeled by global depolarization [3,57]. Specifically, the method of [56] divides a universal gate set into a set of 'easy' gates and a set of 'hard' gates, and then uses randomization applied to the easy gates to twirl the errors on the hard gates. Ref. [56] shows that for a set of elementary gates consisting of Clifford plus T gates, making the easy set equal to the one-qubit Clifford operations allows one to tailor general multi-qubit noise into Pauli noise on the hard gates. In the mQSVT circuit, thanks to the full flexibility in generating the random circuit U, we may explicitly construct U so that most gates are taken directly from an easy gate set. This enables the twirling operations to be applied independently within all of the U, U† blocks and thereby reduces the noise in each layer of gates of an mQSVT circuit to stochastic Pauli channels. Since the U circuits consist of random layers that generate a 2-design, multiple layers of this circuit implement an approximate 2-design. A 2-design twirls any errors into global depolarization (see also [1,58,59]), and so the overall error on a single U or U† can be approximated by a global depolarizing channel [3]. The repeated structure of the U and U† subroutines in an mQSVT circuit means that an error that is randomized by U is then 'de-randomized' by U†. To mitigate this effect, it is possible to apply the randomized compilation method of Ref.
[56] to each U, U† pair independently. The impact of such effects has been studied in the setting of randomized mirror circuit benchmarks [7,10,57], which also use circuits with a U, U† structure and employ a form of randomized compilation. These studies have explicitly shown that with this approach the overall error can still be modeled by a global depolarizing channel.

Supplementary Note 5. STRUCTURE OF THE PROBABILITY SPACE OF MEASURING NOISY RANDOM QUANTUM CIRCUITS AND SXES
There are two sources of randomness when measuring noisy random quantum circuits. The first is due to the random choice of U with probability density P(U). The second is due to the Monte Carlo nature of quantum measurement. Specifically, given the choice of U and a noisy implementation of the quantum circuit, the probability of obtaining 0x as the measurement outcome is p_exp(U, x) := ⟨0x|ρ_exp|0x⟩. The joint probability of U and the measurement outcome 0x is P(U) p_exp(U, x). Here, for simplicity, we only focus on measurement results in which the ancilla qubit is measured with outcome 0. When P(U) is given by the Haar measure and the noise channel is depolarizing, the noisy (experimental) bitstring probability is

p_exp(U, x) = α p(U, x) + (1 − α)/2^{n+1},

which is the convex combination of the exact bitstring probability and the uniform distribution [3]. The information about the circuit fidelity can then be encoded in the experimental average of various quantities. For example, averaging over nonzero bitstrings gives

E(Σ_{x≠0^n} p_exp(U, x) p(U, x)) = α E(Σ_{x≠0^n} p(U, x)²) + ((1 − α)/2^{n+1}) E(Σ_{x≠0^n} p(U, x)).   (12)

Eq. (12) connects quantities evaluated from quantum experiments and classical computation on the left-hand side with those from classical computations on the right-hand side. The left-hand side is given by the system linear cross-entropy score (sXES) in Eq. (3), which can be evaluated by multiplying the bitstring frequency p_exp(U, x) obtained from the quantum experiment with the bitstring probability p(U, x) computed classically. The quantities on the right-hand side can be evaluated fully classically. The circuit fidelity α is then the only unknown, and can be solved for by substituting the quantum experimental and classically computed quantities into Eq. (4) of the main text.
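The linearity in α underlying Eq. (12) can be checked directly; in the sketch below (our own, with a Dirichlet-distributed stand-in for p(U, ·)), solving the linear relation recovers the fidelity exactly:

```python
import numpy as np

# Our own check: with p_exp = alpha * p + (1 - alpha)/2^{n+1}, the sXES-type
# moment over nonzero bitstrings is linear in alpha, so alpha is recovered from
# classically computable moments of p (here a Dirichlet stand-in, not real data).
rng = np.random.default_rng(5)
N = 16                                     # N = 2^n with n = 4
alpha = 0.73
p = rng.dirichlet(np.ones(2 * N))[:N]      # stand-in for p(U, x), ancilla outcome 0
p_exp = alpha * p + (1 - alpha) / (2 * N)  # global depolarizing noise

lhs = np.sum(p_exp[1:] * p[1:])            # "experimental" side, one circuit sample
flat = np.sum(p[1:]) / (2 * N)
alpha_est = (lhs - flat) / (np.sum(p[1:] ** 2) - flat)
```

In a real experiment the left-hand side is an ensemble and shot average, so the recovery is only approximate; here it is exact because the noisy probabilities are constructed from the model.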

Supplementary Note 6. ESTIMATING CIRCUIT FIDELITY FROM QUANTUM UNITARY EVOLUTION SCORE
The experimental average of the probability of measuring the ancilla qubit with outcome 0 is

E(P_exp(U)) = α E(P(U)) + (1 − α)/2.   (13)

Here P(U) := Σ_x p(U, x), and P_exp(U) is the corresponding probability for the noisy circuit, which can be approximately determined from the bit frequencies of the measurement outcomes in the experiment. Rearranging the terms in Eq. (13), the circuit fidelity can be alternatively estimated via

α = (E(P_exp(U)) − 1/2) / (E(P(U)) − 1/2).   (14)

From Eq. (3), we have P(U) ∈ [1 − 2ε, 1]. Hence a lower and an upper bound on the fidelity follow:

α̲ := 2 (E(P_exp(U)) − 1/2) ≤ α ≤ (E(P_exp(U)) − 1/2)/(1/2 − 2ε) =: ᾱ.

The difference between the upper and lower bounds vanishes as ε → 0, i.e. lim_{ε→0}(ᾱ − α̲) = 0, and the derived bounds are tight. Let us choose the ε-independent form of the estimate

α_QUES := 2 E(P_exp(U)) − 1.   (17)

Then |α_QUES − α| ≤ ᾱ − α̲ ≤ 16ε + O(ε²). Furthermore, the estimate can be determined using only the experimentally measurable quantity P_exp(U), and is independent of the classical computation of P(U), which may be hard to evaluate for large n. This remarkable fact, namely the evaluation of circuit fidelity without any classical computation, arises from the approximate implementation of Hamiltonian simulation by the overall circuit. Since only one ancilla qubit is measured, the QUES defined in Eq. (17) cannot entirely capture whether the circuit is implemented correctly. However, under the assumption that the noise channel is depolarizing, and when the polynomial approximation to s_t(x) is sufficiently accurate, α_QUES provides a very good estimate of the circuit fidelity.
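A minimal numerical illustration of the QUES estimate (our own sketch of the depolarizing relation and the ε-independent estimate; the parameter values are illustrative only):

```python
# Sketch of the QUES relation (our own): under global depolarizing noise,
# P_exp(U) = alpha * P(U) + (1 - alpha)/2, so with P(U) in [1 - 2 eps, 1] the
# eps-independent estimate alpha_QUES = 2 * P_exp - 1 is close to alpha.
alpha, P_U = 0.62, 0.999999     # P(U) nearly 1 for an accurate polynomial
P_exp = alpha * P_U + (1 - alpha) / 2
alpha_ques = 2 * P_exp - 1
```

The residual error is of order α·ε, consistent with the bound |α_QUES − α| = O(ε) above.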

Supplementary Note 7. ALGORITHM FOR CONSTRUCTING RANDOM QUANTUM CIRCUITS AND NUMERICAL CONVERGENCE TO HAAR MEASURE
In order to theoretically analyze the circuit fidelity, we need the additional assumption that P(U) is the Haar measure. This has the advantage that several terms in Eq. (4) can be evaluated analytically. Under the Haar measure, the statistics of an ensemble of random Hamiltonians simplify considerably and can be computed from the statistics of truncations of Haar unitaries [60-64]. Details of the statistics are given in Supplementary Note 12 A. Furthermore, if U is Haar-distributed, then the noise effect of directly sampling U is well captured by a fully depolarized error channel, due to the nearly maximal entanglement in the output state [3,6]. The need to choose an appropriate circuit depth such that the circuit statistics approximate those of Haar unitaries motivates an investigation of the statistics of Haar random quantum circuits with a finite number of qubits.

ALGORITHM 1: Constructing random quantum circuits
Input: Coupling map G = (V, E), where V is the set of n qubits and E is the set of qubit pairs on which CNOT is available; basic gates Γ = {U1, U2, U3, CNOT}; the total number of one-qubit gates g1; and the density of one-qubit gates p1 ∈ (0, 1).
Set the number of two-qubit gates to g2 = ⌊(1 − p1)/(2 p1) · g1⌋. Set the maximal number of two-qubit gates in each layer to y2 = ⌊(1 − p1) n / 2⌋. Set m1 = m2 = 0 and initialize an empty quantum circuit C.
while m1 ≤ g1 and m2 ≤ g2 do
  Draw x2 ≤ y2 pairs of qubits from E so that each pair (q1, q2) and its permutation (q2, q1) were not selected in the previous layer; the choice of x2 also satisfies m2 + x2 ≤ g2.
  Draw x1 = min{n − 2x2, g1 − m1} one-qubit gates uniformly at random from Γ\{CNOT} and act them on the remaining qubits in this layer.
  Update the numbers of one- and two-qubit gates: m1 ← m1 + x1 and m2 ← m2 + x2.
end while
if m1 < g1 then
  Append layers of g1 − m1 one-qubit gates sampled uniformly at random from Γ\{CNOT}.
else if m2 < g2 then
  Append layers of g2 − m2 CNOT gates acting on random operands.
end if
Return: A random quantum circuit C with g1 one-qubit gates and g2 two-qubit gates.
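A runnable sketch of Algorithm 1 (our own Python rendering; gates are represented as simple tuples, and the bookkeeping formulas for g2 and y2 follow our reading of the text):

```python
import random

def random_circuit(n, edges, g1, p1=0.5, seed=0):
    """Sketch of Algorithm 1 (our own rendering): a layered gate list with g1
    one-qubit gates drawn from {U1, U2, U3} and g2 CNOTs drawn from `edges`."""
    rng = random.Random(seed)
    g2 = int((1 - p1) / (2 * p1) * g1)        # number of two-qubit gates
    y2 = max(1, int((1 - p1) * n / 2))        # max two-qubit gates per layer
    circuit, m1, m2, prev = [], 0, 0, set()
    while m1 < g1 and m2 < g2:
        # Draw pairs from E not used (in either order) in the previous layer.
        avail = [e for e in edges if e not in prev and (e[1], e[0]) not in prev]
        rng.shuffle(avail)
        pairs, used = [], set()
        for (q1, q2) in avail:
            if len(pairs) == min(y2, g2 - m2):
                break
            if q1 not in used and q2 not in used:
                pairs.append((q1, q2))
                used.update((q1, q2))
        # Fill the remaining qubits of this layer with one-qubit gates.
        x1 = min(n - 2 * len(pairs), g1 - m1)
        free = [q for q in range(n) if q not in used]
        singles = [(rng.choice(["U1", "U2", "U3"]), q) for q in rng.sample(free, x1)]
        circuit.append([("CNOT", e) for e in pairs] + singles)
        m1, m2, prev = m1 + x1, m2 + len(pairs), set(pairs)
    while m1 < g1:   # pad whichever gate budget is left, as in the algorithm
        circuit.append([(rng.choice(["U1", "U2", "U3"]), rng.randrange(n))])
        m1 += 1
    while m2 < g2:
        circuit.append([("CNOT", rng.choice(edges))])
        m2 += 1
    return circuit, m1, m2

circ, m1, m2 = random_circuit(4, [(0, 1), (1, 2), (2, 3), (3, 0)], g1=12)
```

With p1 = 0.5 and a ring coupling map on 4 qubits, the budget g2 = g1/2 is met exactly, matching the gate counts stated in the algorithm's return.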
We first construct random quantum circuits by using the algorithm given in Supplementary Note 7, which follows a recipe similar to that of Ref. [9]. We set the basic one-qubit gates to the U1, U2, and U3 gates. Up to a global phase factor, the U3 gate is

U3(θ, φ, λ) = ( cos(θ/2)  −e^{iλ} sin(θ/2) ; e^{iφ} sin(θ/2)  e^{i(φ+λ)} cos(θ/2) ),

which is a generic single-qubit operation parameterized by three Euler angles. The U1 and U2 gates are defined by restricting to one or two Z-rotation angles respectively, i.e.

U1(λ) = U3(0, 0, λ) = diag(1, e^{iλ}),  U2(φ, λ) = U3(π/2, φ, λ).
Taken together with the CNOT gate, these form a continuously parameterized gate set that is universal.
Although we specify the choice of one-qubit gates and the use of the CNOT gate here, the algorithm in Supplementary Note 7 can be directly generalized to an arbitrary basic gate set. The random quantum circuit generated by the algorithm respects the architecture of a quantum computer. In practice, we set the density of one-qubit gates to p1 = 0.5. Then, for an n-qubit random quantum circuit with ℓ layers, the number of one-qubit gates is g1 = nℓ/2 and that of two-qubit gates is g2 = nℓ/4. To measure the numerical convergence of random circuits to the Haar measure, we first summarize some statistical properties of the Haar measure. Given an n-qubit Haar-distributed unitary U, we denote p_ij := |U_ij|². As a special case of the more general Theorem 7 (to be presented in Supplementary Note 12 A), the p_ij's are identically Beta-distributed.
Theorem 2. The probability density of p_ij is Beta(1, N − 1), i.e.

f(p) = (N − 1)(1 − p)^{N−2},  p ∈ [0, 1].

Proof. Let the submatrix of interest be the upper-left 1-by-1 block, namely a single matrix element a := U_00, and note that p_00 := |a|². Eq. (30) indicates that the probability density of a is proportional to (1 − |a|²)^{N−2}. The polar decomposition of the complex number a = re^{iθ} yields the Jacobian da = r dr dθ ∝ dp_00 dθ. Then, integrating with respect to dθ, the marginal distribution of p_00 is (N − 1)(1 − p_00)^{N−2} dp_00, which is the Beta(1, N − 1) distribution. When i ≠ 0 or j ≠ 0, let K_1 be the matrix permuting the i-th row and the 0-th row by left multiplication, and let K_2 be the matrix permuting the j-th column and the 0-th column by right multiplication. Then U' := K_1 U K_2 is Haar distributed by the bi-invariance of the Haar measure. Furthermore, U'_00 = U_ij. Therefore, the previous argument shows that p_ij is also Beta(1, N − 1) distributed.
Note that in the limit N ≫ 1, the distribution of p_ij is well approximated by the exponential distribution Exp(N), a.k.a. the Porter-Thomas distribution derived in [3]. The statistics below follow from straightforward computation by integrating against the probability density.
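Theorem 2 can be checked empirically; a small numpy sketch (our own, reusing the Haar sampling recipe of [54]):

```python
import numpy as np

# Empirical check of Theorem 2 (our own): |U_00|^2 for Haar-distributed U is
# Beta(1, N-1)-distributed, with mean 1/N.
rng = np.random.default_rng(7)
N, trials = 16, 2000
samples = np.empty(trials)
for i in range(trials):
    z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    q = q * (np.diag(r) / np.abs(np.diag(r)))   # phase fix, as in the recipe of [54]
    samples[i] = abs(q[0, 0]) ** 2
mean_err = abs(samples.mean() - 1 / N)
```

The Beta(1, N−1) mean is 1/N, so the empirical mean over a few thousand samples should agree to well within one percent.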
We remark that in the limit N ≫ 1, the k-th moment satisfies M_k^Haar ≈ k!/N^{k−1} and the entropy satisfies S^Haar ≈ ln(N) + γ − 1, where γ is Euler's constant. These asymptotic results are the same as those derived in [3]. The variance of the k-th moment, V_k^Haar, can be computed similarly. From the variance, we conclude two important features of the statistics. Given N, the variance (i.e. fluctuation) increases with the order of the moment. For each moment, the statistics become concentrated as N increases, namely the variance V_k^Haar vanishes as N → ∞. By Taylor expansion, the entropy has the same concentration behavior, which can be numerically observed in Figure 4. Figure 4 presents the statistics of random quantum circuits for several different structures of the circuit coupling map, and shows that for all three coupling maps studied in the main text, the distribution of circuits of sufficient depth converges to the Haar measure. In Figure 4(a), we plot the normalized entropy S/S^Haar. The figure shows that the entropy converges to that of the Haar measure, i.e. S/S^Haar → 1, after the circuit depth of U increases beyond specific values that depend only weakly on the number of qubits n. Since the random quantum circuit is constructed by combining layers of random one- and two-qubit gates, we test the convergence of random quantum circuits for different coupling maps, thereby varying the qubit pairs on which the two-qubit gates can act. The numerical results in Figure 4(a) show that coupling maps with greater connectivity converge significantly faster to the Haar measure. We attribute this to the larger number of possible allocations of two-qubit gates enhancing state entanglement within the system and thereby leading to faster mixing of information.
In addition to showing the convergence in terms of circuit entropy, we also quantify the convergence to the Haar measure for the first five moments in Figure 4(b). The minimal depth to achieve approximately Haar random circuits deduced from the convergence in moments is highly consistent with that derived from the convergence in entropy. We list the depths used in the computation of the sXES in Table 4.

Table 4: Depths of the random quantum circuits used in the computation of the system linear cross-entropy score. Each depth is chosen so that both the entropy and the moments are close to those derived from the Haar measure.
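The asymptotics M_k^Haar ≈ k!/N^{k−1} quoted above can be cross-checked against the exact Beta(1, N−1) moments; a short stdlib sketch (our own):

```python
from math import factorial

# Exact Beta(1, N-1) moments: E[p^k] = k! / (N (N+1) ... (N+k-1)); our check
# that N * E[p^k] approaches the quoted asymptotics k!/N^{k-1} as N grows.
def scaled_moment(N, k):
    m = float(factorial(k))
    for j in range(k):
        m /= (N + j)
    return N * m

N = 2 ** 10
rel_errs = [abs(scaled_moment(N, k) - factorial(k) / N ** (k - 1))
            / (factorial(k) / N ** (k - 1)) for k in (2, 3, 4)]
```

The relative error scales like k(k−1)/(2N), so at N = 2^10 all three moments agree with the asymptotics to better than one percent.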

Supplementary Note 8. CIRCUIT FIDELITY FROM THE SYSTEM LINEAR CROSS-ENTROPY SCORE
The system linear cross-entropy score is based on Hamiltonian simulation. Note that in the circuit fidelity of Eq. (4), only the terms involving the system linear cross-entropy score (sXES) in the numerator contain quantities that must be experimentally evaluated. All other terms can be simplified by using the statistical properties that the ensemble of random matrices inherits from the Haar measure of U, given in Supplementary Note 12 A (in particular Theorem 13).
The procedure of computing the system linear cross-entropy score can be summarized as follows.
1. Draw quantum circuits U i 's approximately from the Haar measure at random.
2. For each U_i, build the mQSVT circuit for the Hamiltonian simulation benchmark, and measure all qubits to count the bitstring frequencies of 0x, where x ∈ {0,1}^n. The bitstring frequency is an estimate of p_exp(U_i, x). Furthermore, the sum of the bitstring frequencies over all x is an estimate of P_exp(U_i) = Σ_{x∈{0,1}^n} p_exp(U_i, x).
3. For each U i and bitstring 0x, compute the noiseless bitstring probability p(U i , x) on classical computers.

4. Compute estimates of the fidelity according to Eqs. (4) and (5).
We list the circuit fidelity estimated by different methods in Table 5. The agreement shows the consistency of the quantum Hamiltonian simulation benchmark. Here, the theoretical reference value is estimated from the depolarizing noise model. Given U with a total of g_1 one-qubit gates and g_2 two-qubit gates, with respective error rates r_1 and r_2, the value α_ref := (1 − r_1)^{2d(g_1+1)} (1 − r_2)^{2d g_2} follows by assuming, approximately, that each quantum error fully mixes the quantum state.
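A sketch of the reference-fidelity formula (our reading of α_ref; the parameter values below are illustrative only, not from Table 5):

```python
# Our reading of the depolarizing reference value alpha_ref from the text:
# 2d applications of U / U^dagger, each contributing g1 one-qubit gates (plus
# one phase-gate layer, hence g1 + 1) and g2 two-qubit gates, with per-gate
# error rates r1 and r2 (illustrative values below).
def alpha_ref(d, g1, g2, r1, r2):
    return (1 - r1) ** (2 * d * (g1 + 1)) * (1 - r2) ** (2 * d * g2)

a = alpha_ref(d=5, g1=20, g2=10, r1=1e-3, r2=1e-2)
```

As expected for a product of survival probabilities, α_ref lies in (0, 1) and decreases monotonically as either error rate grows.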

Supplementary Note 9. CLASSICAL HARDNESS OF SXHOG
Definition 4 (sXHOG, or System Linear Cross-entropy Heavy Output Generation). Given as input a number b > 1, a random (n+1)-qubit unitary U, and the mQSVT circuit for the Hamiltonian simulation benchmark with sufficiently small approximation error ε, output nonzero bitstrings x_1, x_2, ..., x_k ∈ {0,1}^n \ {0^n} so that

(1/k) Σ_{i=1}^k p(U, x_i) ≥ b · 2^{−n}.   (18)

The classical hardness of the XEB experiment is justified by reducing the XHOG problem to a complexity assumption referred to as the Linear Cross-entropy Quantum Threshold Assumption (XQUATH) [41]. Similarly, the hardness of the sXHOG problem can be reduced to an assumption that we refer to by analogy as sXQUATH.
Definition 5 (sXQUATH, or System Linear Cross-entropy Quantum Threshold Assumption). Given a random (n+1)-qubit unitary U, and the mQSVT circuit for the Hamiltonian simulation benchmark with sufficiently small approximation error ε, for a uniformly random x ∈ {0,1}^n \ {0^n}, there is no polynomial-time classical algorithm that produces an estimate p of p_x := p(U, x) whose mean squared error improves on that of the trivial estimate 2^{−n} by a non-negligible margin. Here, the expectation is taken over random circuits U, the internal randomness of the algorithm, and the random bitstring x.
The reduction of the XHOG problem is given in the following theorem, which is directly parallel to that in [41, Theorem 1].
Proof. Suppose that A is a classical algorithm solving sXHOG in Definition 4 with a success probability s as stated in the theorem. Given U and the mQSVT circuit as the input of A, it outputs S := {x_i ≠ 0^n : i = 1, ..., k}. When A successfully solves the sXHOG problem, the set S satisfies Eq. (18). Specifically, let x ∈ {0,1}^n \ {0^n} be a bitstring sampled uniformly at random. We now construct an algorithm that produces an estimate p of p(U, x): given such a bitstring x, the algorithm outputs p = b·2^{−n} if x ∈ S, and p = 2^{−n} if x ∉ S. Consider the random variable X(U, x) measuring the advantage of this estimate over the trivial estimate 2^{−n}. Here, the randomness comes from the uniformly random bitstring x, the random unitary U and its corresponding mQSVT circuit, and whether the classical algorithm A succeeds; we write these explicitly as subscripts of the expectation. Furthermore, we denote by S_U^(s) and S_U^(f) the output sets when A succeeds and fails, respectively, and note that X(U, x) ≡ 0 if x ∉ S, regardless of whether A succeeds or not. The expectation of X then follows by the law of total expectation (20). Note that S_U^(s) and S_U^(f) are sets of k distinct bitstrings. Since x is uniformly distributed, the advantage is bounded from below and becomes non-negligible when k ≥ 1/[((2s − 1)b − 1)(b − 1)]. This violates sXQUATH and thereby proves the classical hardness of sXHOG.

Supplementary Note 10. CIRCUIT FIDELITY AND SXHOG
In this section we demonstrate that the success of sXHOG can be verified by an experimental evaluation of the circuit fidelity. Due to the relation between the circuit fidelity and QUES established in Supplementary Note 8, this means that the success of sXHOG can be verified by QUES, which does not involve any classical computation.
First, since we are only interested in measurement outcomes whose ancilla qubit returns 0, we normalize the bitstring probability as a conditional probability $p_{\text{exp}}(U, x \mid \text{ancilla} = 0) := p_{\text{exp}}(U, x)/P_{\text{exp}}(U)$.
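As a concrete illustration, this normalization is a simple post-processing of the measurement records. In the sketch below (the convention that the ancilla is the first bit of each record is our assumption), shots with ancilla outcome 1 are discarded and the remaining counts are renormalized.

```python
from collections import Counter

def conditional_probs(samples):
    # Keep only shots whose ancilla bit (assumed to be the first bit of the
    # record) reads 0, then renormalize over the n system bits.
    counts = Counter(s[1:] for s in samples if s[0] == "0")
    kept = sum(counts.values())            # estimates M_meas * P_exp(U)
    return {x: c / kept for x, c in counts.items()}

# Four shots on n = 3 system qubits; one shot is discarded (ancilla = 1).
probs = conditional_probs(["0010", "0010", "1110", "0101"])
```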
The physical interpretation of this normalization is that we discard any measurement result in which the ancilla is measured as 1. We also remark that when the Hamiltonian simulation benchmark circuit is sufficiently accurate, we have $P(U) \approx 1$, and it is not necessary to normalize the noiseless bitstring probability in Eq. (18). Under the globally depolarized noise model with circuit fidelity $\alpha$, the probability that the experimental measurement on the ancilla qubit outputs 0 is
$$P_{\text{exp}}(U) = \alpha P(U) + \frac{1-\alpha}{2}.$$
Given the circuit fidelity $\alpha$, we denote the conditional probability density by $Q_\alpha$, and the corresponding expectation by $\mathbb{E}_{Q_\alpha}(\cdot)$. The conditional average bitstring probability then determines the parameter $b$ in the sXHOG problem as a function $b(\alpha)$; the last step of its evaluation uses results in Supplementary Note 12 A. Here, $H_1$ and $H_2$ are the cosine transformations of the joint eigenvalue distributions $P^{(2)}_{\text{eig}}$ and $P^{(4)}_{\text{eig}}$, respectively. Thus, for large $n$, $b(\alpha)$ depends on the noise only through $\alpha$, $H_1$, and $H_2$, with $\gamma = 2 - 5H_1 + 4H_2$ and $\alpha^* = H_1/\gamma$. When $H_1, H_2$ are sufficiently small, $b(\alpha)$ is monotonically increasing. The hardness of classical spoofing also requires $b(\alpha) \geq 1$ (see Theorem 6). Thus, we define the threshold
$$\alpha^* := \frac{H_1}{2 - 5H_1 + 4H_2}$$
to be the fidelity at which $b(\alpha^*) = 1$. To achieve supremacy, the fidelity is required to satisfy $\alpha \geq \alpha^*$. To see the existence of the threshold fidelity, consider fully contaminated noise with $\alpha = 0$: the conditional bitstring distribution becomes uniform, and the average bitstring probability falls below the sXHOG requirement. By continuity, a threshold fidelity $\alpha^*$ exists that ensures $b(\alpha) > 1$ for $\alpha > \alpha^*$. This also indicates that the threshold is very close to zero when the diagonal elements of the time evolution vanish simultaneously in the ensemble, namely when $\mathbb{E}(p(U, 0^n)) \approx 0$. The threshold can be suppressed by choosing a larger simulation time $t$, since $\alpha^* \to 0$ as $t \to \infty$. Furthermore, when $\alpha^* \ll 1$, the conditional average bitstring probability simplifies accordingly. Note that at $t^{\text{opt}}$, the threshold satisfies $\alpha^*|_{t^{\text{opt}}} \approx H_1/2$. Eq. (43) implies that $H_1 \geq -\frac{2}{N-1}$. Therefore, when the number of qubits $n$ is not sufficiently large, $\alpha^*|_{t^{\text{opt}}}$ can possibly be negative.
However, as $n$ increases, $\alpha^*|_{t^{\text{opt}}}$ converges to 0 exponentially fast, because the lower bound $-\frac{2}{N-1} \to 0$ in the large-$n$ limit. This agrees with the numerical behavior of the threshold $\alpha^*$ shown in Figure 4.
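A minimal numerical sketch of the threshold formula $\alpha^* = H_1/(2 - 5H_1 + 4H_2)$, showing the suppression of $\alpha^*$ as $H_1$ decreases (the sample values of $H_1, H_2$ below are illustrative only, not drawn from the actual ensemble):

```python
def alpha_star(H1: float, H2: float) -> float:
    """Threshold fidelity alpha* = H1 / (2 - 5*H1 + 4*H2)."""
    return H1 / (2.0 - 5.0 * H1 + 4.0 * H2)

# As H1 -> 0 (e.g. at t = t_opt or in the long-time regime), the threshold
# approaches H1 / 2 and is suppressed toward zero.
examples = [alpha_star(h, h / 10.0) for h in (0.2, 0.02, 0.002)]
```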

Supplementary Note 11. ANALYTIC ESTIMATION OF $t^{\text{opt}}$ FOR LARGE $n$
According to the results in Figure 4, at $t = t^{\text{opt}}$ we have $\mathbb{E}(p_t(U, 0^n)) = \mathbb{E}\,\big|\langle 0^n|e^{-iHt}|0^n\rangle\big|^2 \approx 0$.
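The location of $t^{\text{opt}}$ can be reproduced numerically. The sketch below assumes that the ensemble-averaged diagonal element is proportional to $e^{-it/2}J_0(t/2)$ (so that $t^{\text{opt}}$ is twice the first zero of the Bessel function $J_0$, an assumption consistent with the value $4.81$ quoted in the text); it evaluates $J_0$ by its integral representation and locates the first sign change by bisection.

```python
import numpy as np

def j0_of_half_t(t, K=20000):
    # J_0(t/2) via the integral representation J_0(z) = (1/pi) * int_0^pi cos(z sin th) dth;
    # under our assumption, E<0^n|e^{-iHt}|0^n> is proportional to e^{-it/2} J_0(t/2).
    theta = (np.arange(K) + 0.5) * np.pi / K   # midpoint rule on (0, pi)
    return float(np.mean(np.cos((t / 2.0) * np.sin(theta))))

# First sign change of J_0(t/2) for t in [4, 6], located by bisection.
lo, hi = 4.0, 6.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if j0_of_half_t(lo) * j0_of_half_t(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
t_opt = 0.5 * (lo + hi)      # = 2 * (first zero of J_0), approximately 4.81
```

The result $t^{\text{opt}} = 2\,j_{0,1} \approx 4.8096$ agrees with $t^{\text{opt}} \approx 4.81$ in the main text.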

Jensen's inequality gives
$$\mathbb{E}\,\big|\langle 0^n|e^{-iHt}|0^n\rangle\big|^2 \geq \big|\mathbb{E}\,\langle 0^n|e^{-iHt}|0^n\rangle\big|^2.$$
If $H$ is an H-RACBEM and the corresponding $U$ is drawn from the Haar measure, then the averaged diagonal matrix element reduces to an integral of $e^{-it\lambda}$ against the marginal eigenvalue density. Therefore, for any $t$, $\mathbb{E}\,\langle 0^n|e^{-iHt}|0^n\rangle$ can be expressed in terms of the Bessel function of the first kind, using the integral representation
$$J_0(z) = \frac{1}{\pi}\int_0^{\pi} e^{-iz\cos\theta}\,\mathrm{d}\theta.$$

Supplementary Note 12. STATISTICAL PROPERTIES OF THE RANDOM MATRIX ENSEMBLE

The solution of the system heavy output generation problem and the analytic evaluation of the system linear cross-entropy score require statistical properties of the ensemble of random matrices obtained from the Haar measure. In this section, we derive these statistical properties. We consider a generic block encoding with $m$ extra ancilla qubits; namely, an $n$-qubit matrix $A$ is a submatrix of an $(n+m)$-qubit unitary $U$. We use $M = 2^m$ to denote the dimension of the Hilbert space generated by the $m$ ancilla qubits, and we assume that $U$ is drawn from the $(n+m)$-qubit Haar measure.
Given the identification $\mathbb{C}^{N\times N} \simeq \mathbb{C}^{N^2}$, the uniform measure on the space of complex matrices is identified with the pushforward of the Lebesgue measure on $\mathbb{C}^{N^2}$, for example by taking the matrix elements as coordinates. We denote this uniform measure by $\mathrm{d}A$. Assuming that $A$ is an $n$-qubit submatrix of a Haar-distributed $(n+m)$-qubit unitary $U$, the first theorem characterizes the induced probability distribution of $A$.
The probability density of $A$ is
$$p(A) = \frac{1}{Z}\,\det\big(1 - A^\dagger A\big)^{(M-2)N}\,\mathbb{1}_{A^\dagger A \preceq 1},$$
where $Z := \int \det\big(1 - A^\dagger A\big)^{(M-2)N}\,\mathbb{1}_{A^\dagger A \preceq 1}\,\mathrm{d}A$ is a normalization constant. Here, $\mathbb{1}$ is an indicator function: it equals 1 when the condition in the subscript is satisfied, and 0 otherwise. More generally, let $A$ be an $n_1$-by-$n_2$ submatrix of an $n$-by-$n$ Haar-distributed unitary $U$ with $n \geq n_1 + n_2$. Then the probability density is
$$p(A) \propto \det\big(1 - A^\dagger A\big)^{n - n_1 - n_2}\,\mathbb{1}_{A^\dagger A \preceq 1}.$$
In particular, for a 1-block-encoded matrix with $m = 1$, the exponent of the determinant is 0, and $A$ is uniformly distributed in the unit ball $\{A \in \mathbb{C}^{N\times N} : A^\dagger A \preceq 1\}$. Consider the singular value decomposition $A = W\Sigma V^\dagger$ with $\operatorname{diag}\Sigma = (\sigma_1, \cdots, \sigma_N)$. The Jacobian of this decomposition is
$$\mathrm{d}A \propto \mathrm{d}V\,\mathrm{d}W\,\Delta(\sigma_1^2, \cdots, \sigma_N^2)^2\,\prod_{j=1}^N \sigma_j\,\mathrm{d}\sigma_j,$$
where $\mathrm{d}W, \mathrm{d}V$ are the Haar measures on their respective compact manifolds and $\Delta(x_1, \cdots, x_n) := \prod_{i<j}(x_i - x_j)$ is the Vandermonde determinant. The distributions of $V, W, \Sigma$ then follow immediately from the theorem.
Since $H = A^\dagger A$, the eigenvalues $\lambda_j$ of the random Hermitian matrix $H$ and the singular values $\sigma_j$ of the complex matrix $A$ are related by $\lambda_j = \sigma_j^2$. By a direct change of variables, the joint distribution of all eigenvalues has the density
$$P_{\text{eig}}(\lambda_1, \cdots, \lambda_N) = \frac{1}{Z_N}\,\Delta(\lambda_1, \cdots, \lambda_N)^2\,\prod_{j=1}^N w(\lambda_j)\,\mathbb{1}_{0 \leq \lambda_j \leq 1},$$
where $w(\lambda) := (1-\lambda)^{(M-2)N}$ is the weight function. The normalization constant is precisely given by Selberg's integral [67]; for $m = 1$ (where $w \equiv 1$), it reads
$$Z_N = \int_{[0,1]^N} \Delta(\lambda_1, \cdots, \lambda_N)^2\,\mathrm{d}\lambda = \prod_{j=0}^{N-1} \frac{j!\,j!\,(j+1)!}{(N+j)!}.$$
The distribution is invariant under the relabeling of eigenvalues $(\lambda_1, \cdots, \lambda_N) \to (\lambda_{\pi(1)}, \cdots, \lambda_{\pi(N)})$ for any permutation $\pi$. This feature is inherited from the bi-invariance of the Haar measure on the compact Lie group.
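The induced eigenvalue statistics can be checked by direct sampling. The sketch below draws Haar unitaries via the QR decomposition with phase correction, truncates to the 1-block-encoded submatrix ($m = 1$), and verifies that the eigenvalues of $H = A^\dagger A$ lie in $[0, 1]$ with ensemble-averaged mean $1/2$ (the latter follows from $\mathbb{E}\,|U_{ij}|^2 = 1/(2N)$).

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(dim, rng):
    # QR of a complex Ginibre matrix, with the diagonal phases of R fixed,
    # yields a Haar-distributed unitary.
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

N = 4                       # n = 2 system qubits, m = 1 ancilla => 2N x 2N unitary
samples = []
for _ in range(3000):
    U = haar_unitary(2 * N, rng)
    A = U[:N, :N]                           # 1-block-encoded matrix
    samples.append(np.linalg.eigvalsh(A.conj().T @ A))
lam = np.concatenate(samples)
mean_lam, max_lam = lam.mean(), lam.max()   # expect mean near 1/2, support in [0, 1]
```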
In practice, only the marginal distribution involving a few eigenvalues will be used. However, the Vandermonde determinant in the joint distribution couples all eigenvalues together, which makes it hard to compute the marginal distribution analytically. Nonetheless, a semi-analytical representation by orthogonal polynomial expansion can be derived as follows.
Let $(f, g)_w := \int_0^1 f(x)g(x)w(x)\,\mathrm{d}x$ denote the inner product induced by the weight function $w$.
Theorem 9 ([40, Theorem 5.7.1]). Let $\{C_i(x) : \deg C_i = i,\; i = 0, \cdots, N-1\}$ be a set of linearly independent monic polynomials that are orthogonal with respect to $(\cdot,\cdot)_w$. Let $c_i := \|C_i\|_w^2$. Define a bivariate function
$$K_N(x, y) := \sqrt{w(x)w(y)}\,\sum_{i=0}^{N-1} \frac{C_i(x)\,C_i(y)}{c_i}.$$
Then, the joint distribution of the eigenvalues follows a determinantal process,
$$P_{\text{eig}}(\lambda_1, \cdots, \lambda_N) = \frac{1}{N!}\,\det\big[K_N(\lambda_i, \lambda_j)\big]_{i,j=1}^N.$$
The orthogonal polynomials can be generated by the three-term recursion formula. Specifically, for 1-block-encoding (i.e., $m = 1$ and $M = 2$), the weight function is $w \equiv 1$, and the orthogonal polynomials are the shifted Legendre polynomials.
Corollary 10. For 1-block-encoding, the joint eigenvalue distribution can be expressed in terms of the bivariate function
$$K_N(x, y) = \sum_{i=0}^{N-1} (2i+1)\,P_i(2x-1)\,P_i(2y-1),$$
where $P_i$ is the $i$-th Legendre polynomial.
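Our reading of the kernel normalization (an assumption: constant factors may differ from the paper's convention) can be checked numerically through the reproducing property $\int_0^1 K_N(x,z)K_N(z,y)\,\mathrm{d}z = K_N(x,y)$ and the trace identity $\int_0^1 K_N(z,z)\,\mathrm{d}z = N$.

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

def K(x, y, N):
    """Assumed determinantal kernel for m = 1: sum_{i<N} (2i+1) P_i(2x-1) P_i(2y-1)."""
    total = 0.0
    for i in range(N):
        c = np.zeros(i + 1); c[i] = 1.0        # coefficient vector selecting P_i
        total += (2 * i + 1) * legval(2 * x - 1, c) * legval(2 * y - 1, c)
    return total

# Gauss-Legendre quadrature mapped from [-1, 1] to [0, 1].
nodes, weights = leggauss(64)
z = 0.5 * (nodes + 1.0)
w = 0.5 * weights

x0, y0, N = 0.3, 0.7, 4
lhs = float(sum(wi * K(x0, zi, N) * K(zi, y0, N) for zi, wi in zip(z, w)))
rhs = float(K(x0, y0, N))
trace_K = float(sum(wi * K(zi, zi, N) for zi, wi in zip(z, w)))   # should equal N
```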
With Corollary 10, the averages of bitstring probabilities can be evaluated semi-analytically. For a generic complex even polynomial $g$, we define
$$R^{q_1,\cdots,q_{\ell_2}}_{k_1,\cdots,k_{\ell_1}\,|\,r_1,\cdots,r_{\ell_3}} := \mathbb{E}\Big(\prod_{j=1}^{\ell_1} f^{k_j}(\lambda_j)\,\prod_{j=1}^{\ell_2} \overline{f^{q_j}(\lambda_{\ell_1+j})}\,\prod_{j=1}^{\ell_3} \big|f^{r_j}(\lambda_{\ell_1+\ell_2+j})\big|\Big).$$
The complex polynomial $f \in \mathbb{C}[x]$ is determined by setting $f(x^2) = g(x)$; note that $g(\sigma_j) = f(\lambda_j)$ relates the singular value transformation and the eigenvalue transformation. The expectation can be expressed exactly as an integration against the joint eigenvalue distribution. For Hamiltonian simulation, $e^{-itx^2}$ has unit absolute value for all $x \in \mathbb{R}$. For simplicity we assume the approximation error is sufficiently small, so that $g(x) = s_t(x) = e^{-itx^2}$, or equivalently $f(x) = e^{-itx}$. This allows us to omit the terms due to $|f^{r_j}(\lambda_{\ell_1+\ell_2+j})|$. Furthermore, when the upper and lower indices of $R$ in Eq. (39) are all equal to 1 ($k_j = q_j = 1$), the relabeling invariance of the eigenvalues implies that the defined quantity reduces to
$$H^{(t)}_{\ell} := \mathbb{E}\Big(\prod_{j=1}^{\ell} f(\lambda_j)\,\prod_{j=\ell+1}^{2\ell} \overline{f(\lambda_j)}\Big),$$
which is directly related to Hamiltonian simulation. When the time dependence is irrelevant to the analysis, we drop the $t$ dependence in $H^{(t)}_{\ell}$ and write it as $H_{\ell}$ for simplicity. We can compute $H_{\ell}$ as follows.
Theorem 11. Let the degree-$d$ complex polynomial $f(x) = \sum_{q=0}^d c_q P_q(2x-1)$ be decomposed in terms of Legendre polynomials. Then $H_\ell$ is given by a finite sum, over permutations in the symmetric group $S_{2\ell}$, of products of the coefficients $c_q$ with the coupling tensor $F_{i,j,k}$, where
$$F_{i,j,k} = \frac{(2s-2i)!\,(2s-2j)!\,(2s-2k)!}{(2s+1)!}\left[\frac{s!}{(s-i)!\,(s-j)!\,(s-k)!}\right]^2$$
if $2s = i+j+k$ is even and $|i-j| \leq k \leq i+j$, and $F_{i,j,k} = 0$ otherwise.
Proof. Define $f^{(j)}(x) := f(x)$ when $j \leq \ell$ and $f^{(j)}(x) := \overline{f(x)}$ when $j > \ell$. Then, directly applying Theorem 9 and Corollary 10, the quantity $H_\ell$ can be evaluated immediately, and the conclusion follows.
The constraint in Eq. (42) will be referred to as the triangle rule. Due to the triangle rule, $F_{i,j,k}$ is a sparse tensor: many terms in the $(2\ell)$-fold summation vanish, which can be used to accelerate the evaluation. Note that Legendre polynomials are bounded by 1 on $[-1, 1]$, which implies that $|F_{i,j,k}| \leq 1$. To circumvent the numerical instability arising from the factorials, $F_{i,j,k}$ can be evaluated recursively, where $2s = i+j+k$. Using Algorithm 2, $F_{i,j,k}$ can be evaluated stably within $s$ recursions.
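These properties can be cross-checked numerically. The sketch below assumes the normalization $F_{i,j,k} = \int_0^1 P_i(2x-1)P_j(2x-1)P_k(2x-1)\,\mathrm{d}x$ (the paper's convention may differ by constant factors) and compares Gauss-Legendre quadrature against a factorial-based closed form, verifying the triangle rule and the bound $|F_{i,j,k}| \leq 1$.

```python
import numpy as np
from math import factorial
from numpy.polynomial.legendre import legval, leggauss

def F(i, j, k):
    """F_{i,j,k} as the triple-product integral of shifted Legendre polynomials
    over [0, 1], evaluated by Gauss-Legendre quadrature (assumed normalization)."""
    nodes, weights = leggauss(40)          # exact for polynomial degree < 80
    vals = np.ones_like(nodes)
    for d in (i, j, k):
        c = np.zeros(d + 1); c[d] = 1.0
        vals = vals * legval(nodes, c)     # integrate in u = 2x - 1 on [-1, 1]
    return float(0.5 * np.dot(weights, vals))   # dx = du / 2

def F_closed(i, j, k):
    """Closed form (squared Wigner 3j symbol); zero outside the triangle rule."""
    s2 = i + j + k
    if s2 % 2 or not (abs(i - j) <= k <= i + j):
        return 0.0
    s = s2 // 2
    pref = (factorial(s2 - 2 * i) * factorial(s2 - 2 * j) * factorial(s2 - 2 * k)
            / factorial(s2 + 1))
    trinom = factorial(s) / (factorial(s - i) * factorial(s - j) * factorial(s - k))
    return pref * trinom**2
```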
ALGORITHM 2: A stable recursive algorithm for computing $F_{i,j,k}$
Input: A triplet $(i, j, k)$ satisfying the triangle rule.
Order and relabel the triplet so that $i \leq j \leq k$.
If $i = j = k = 0$, return $F_{0,0,0} = 1$.
If $(i, j, k-2)$ satisfies the triangle rule, recursively call the algorithm to compute $F_{i,j,k-2}$ and return
$$F_{i,j,k} = \frac{s\,(s-k+1)\,(2s-1-2i)(2s-1-2j)}{(2s+1)(2s-2k+1)(s-i)(s-j)}\,F_{i,j,k-2}.$$
Otherwise, recursively call the algorithm to compute $F_{i,j-1,k-1}$ (note that $(i, j-1, k-1)$ preserves the triangle rule) and return
$$F_{i,j,k} = \frac{s\,(2s-1-2i)}{(2s+1)(s-i)}\,F_{i,j-1,k-1}.$$
The measurement on all qubits gives an $(n+1)$-bit string. We are interested in the bitstring $0x$, meaning the outcome of the ancilla qubit is 0 and that of the $n$ system qubits is $x \in \{0,1\}^n$. The probability of measuring the bitstring $0x$ is denoted $p(U, x)$. When $x = 0^n \equiv 0$,
$$p(U, 0^n) = \sum_{j,k} g(\sigma_j)\overline{g(\sigma_k)}\,|V_{j0}|^2\,|V_{k0}|^2$$
involves only one column of the Haar unitary $V$; yet when $x \neq 0^n$, the probability involves two columns. For $x \neq 0^n, 1^n$, let us consider another unitary $V'$ obtained by permuting columns $1^n$ and $x$ of $V$. By the bi-invariance of the Haar measure on the unitary group, $V$ and $V'$ are identically distributed. Therefore, we conclude the following lemma.
Lemma 12. For any nonzero bitstring $x \neq 0^n$, the probability $p(U, x)$ is identically distributed to $p(U, 1^n)$.
According to Corollary 8, the distributions of $\Sigma$ and $V$ are decoupled. The average over the singular values can be evaluated semi-analytically, and the average over the Haar unitary can be computed analytically using representation theory. We collect the relevant average values as follows.
For the bitstring $x = 0^n$, writing $p_i := |V_{i0}|^2$, we have
$$\mathbb{E}\big(p(U, 0^n)^2\big) = \mathbb{E}\Big(\sum_{i,j,k,l} g(\sigma_i)\overline{g(\sigma_j)}\,g(\sigma_k)\overline{g(\sigma_l)}\;p_i\,p_j\,p_k\,p_l\Big).$$
Using Eq. (46), the four-fold summation can be categorized into five equivalence classes: we define a partition $\alpha$ to be an array with at most 4 entries whose sum is exactly 4. The evaluation of the second moment of the probability with nonzero bitstrings $x$ follows a similar procedure but is more involved. Using Lemma 12,
$$\mathbb{E}\Big(\sum_{x \neq 0^n} p(U, x)^2\Big) = (N-1)\,\mathbb{E}\big(p(U, 1^n)^2\big).$$
Let
$$H(i,j,k,l) := \mathbb{E}\big(V_{1^n,i}\overline{V_{0^n,i}}\;\overline{V_{1^n,j}}V_{0^n,j}\;V_{1^n,k}\overline{V_{0^n,k}}\;\overline{V_{1^n,l}}V_{0^n,l}\big).$$
Note that the linear map $\phi_{ijkl}$ commutes with the action of the symmetric group $S_4$ and with that of the unitary group $\mathrm{U}(N)$ on the tensor product space $(\mathbb{C}^N)^{\otimes 4}$. Then, by Schur's lemma over $\mathbb{C}$, $\phi_{ijkl}$ acts as a scalar on each subspace in the Schur-Weyl decomposition. By taking the trace on each subspace, the linear map is determined as a linear combination of projectors. The generic formula was derived in [64], by which $H(i,j,k,l)$ is evaluated. The four-fold sum over $i, j, k, l$ is again broken into five equivalence classes. The asymptotic behavior of the ensemble is discussed in Supplementary Note 12 D.

B. Concatenating phase factors for long time Hamiltonian simulation
Although simulation at very long times is certainly beyond the regime of near-term applications, the unitarity of Hamiltonian simulation provides an alternative method to obtain phase factors. Specifically, given the phase factor sequence at some short time $t$, the phase factor sequence at a long simulation time $rt$ can be easily constructed for any integer $r > 1$. The procedure, called phase factor concatenation, is given in Eq. (47), and the quality of the approximation is described in Theorem 14.
By the triangle inequality, the approximation errors accumulate additively under concatenation. Let $\epsilon_r$ be the approximation error of $P^{(r)}$, and let $Q^{(r)}$ be the complementing polynomial as in Theorem 1. Similarly, we have $\big\|(1-x^2)\,|Q^{(r)}(x)|^2\big\|_\infty \leq 2\epsilon_r$. By the construction of the phase factor sequence in Eq. (47), we have
$$P^{(r)}(x) = P^{(r-1)}(x)P(x) - (1-x^2)\,Q^{(r-1)}(x)\,Q(x).$$
Suppose each circuit is measured with $M_{\text{meas}} \geq 2\delta^{-2}$ shots, so that the deviation in the estimated probability is bounded by $\delta$ with high probability. Then, when computing the QUES, defined as $\text{QUES} := \mathbb{E}\big(\hat P_{\text{exp}}(U)\big)$, the statistical error is bounded accordingly. The circuit fidelity is estimated by $\hat\alpha_{\text{QUES}} := 2\times\text{QUES} - 1$, and its error is bounded with the last inequality using Eq. (6). The derived error bound generalizes Eq. (6) by including the Monte Carlo measurement error due to the finite number of measurement shots.
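The shot-count requirement can be sketched as a small simulation. The noise model below (an assumption for illustration) takes $P_{\text{exp}}(U) = \alpha P(U) + (1-\alpha)/2$ with $P(U) \approx 1$, draws binomial counts with $M_{\text{meas}} = \lceil 2\delta^{-2}\rceil$ shots per circuit, and recovers the fidelity via $\hat\alpha_{\text{QUES}} = 2\times\text{QUES} - 1$.

```python
import numpy as np

rng = np.random.default_rng(7)

delta = 0.02
M_meas = int(np.ceil(2.0 / delta**2))        # shots per circuit: M >= 2 / delta^2

# Toy globally-depolarized model: P_exp(U) = alpha * P(U) + (1 - alpha)/2, P(U) ~ 1.
alpha_true = 0.8
P_exp = alpha_true * 1.0 + (1.0 - alpha_true) / 2.0

n_circuits = 50
P_hat = rng.binomial(M_meas, P_exp, size=n_circuits) / M_meas   # finite-shot estimates
QUES = float(P_hat.mean())
alpha_hat = 2.0 * QUES - 1.0                 # fidelity estimate from QUES
```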

D. Asymptotic behavior of the long time Hamiltonian simulation benchmark
The first and second moments of the bitstring probability are directly relevant to the construction of the system linear cross-entropy benchmarking based on Hamiltonian simulation. For simplicity, we define $K_1(x, t) := \mathbb{E}(p_t(U, x))$ and $K_2(x, t) := \mathbb{E}(p_t(U, x)^2)$, where the dependence on $t$ is encoded in the implementation of the quantum circuit. In this section, we investigate the behavior of these moments in different regimes of the Hamiltonian simulation time $t$ when $N$ is sufficiently large. According to Lemma 12, $K_1(x, t)$ and $K_2(x, t)$ are constant over nonzero bitstrings $x \neq 0^n$. Therefore, by using Theorem 13, we have
$$\mathbb{E}(p_t(U, 0^n)) = \frac{2 + (N-1)H_1(t)}{N+1},$$
and for any $x \neq 0^n$,
$$\mathbb{E}(p_t(U, x)) = \frac{1 - H_1(t)}{N+1}.$$
Note that by definition, $H_1(t)$ and $H_2(t)$ are the cosine transformations of the joint eigenvalue distributions $P^{(2)}_{\text{eig}}$ and $P^{(4)}_{\text{eig}}$, respectively. According to Theorem 9, these joint distributions are polynomials, so both $H_1(t)$ and $H_2(t)$ converge to zero as $t \to \infty$. In particular, there exists $t^*$ such that $H_1(t), H_2(t) = O(\frac{1}{N})$ for any $t > t^*$. In this regime, for any nonzero bitstring $x \neq 0^n$,
$$\mathbb{E}(p_t(U, x)) = \frac{1}{N} + O\Big(\frac{1}{N^2}\Big), \qquad \mathbb{E}(p_t(U, x)^2) = \frac{2}{N^2} + O\Big(\frac{1}{N^3}\Big).$$
Note that the bitstring probability of a Haar-distributed unitary $U$, $p_{ij} := |U_{ij}|^2$, has the first and second moments $\mathbb{E}(p_{ij}) = \frac{1}{N}$ and $\mathbb{E}(p_{ij}^2) = \frac{2}{N^2} + O(\frac{1}{N^3})$. Furthermore, by using Lemma 12 and Theorem 13, the cross moment of two different nonzero bitstrings $x \neq y$ can be evaluated; in the regime where $H_1(t), H_2(t) = O(\frac{1}{N})$, it equals $\frac{1}{N^2} + O(\frac{1}{N^3})$. Following Theorem 2, the cross moment for a Haar-distributed unitary agrees up to higher order: $\mathbb{E}(p_{ij}p_{kj}) = \frac{1}{N(N+1)} = \frac{1}{N^2} + O(\frac{1}{N^3})$. Remarkably, the correlation between different nonzero bitstrings is small, as quantified by the covariance
$$\operatorname{Cov}\big(p(U, x), p(U, y)\big) = \mathbb{E}\big(p(U, x)p(U, y)\big) - \mathbb{E}\big(p(U, x)\big)\,\mathbb{E}\big(p(U, y)\big) = O\Big(\frac{1}{N^3}\Big).$$
We conclude that in this regime, the ensemble of time evolution matrices induced by our construction has approximately the same statistics as that of Haar-distributed unitaries, up to at least the second moment.
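The quoted Haar moments can be verified by Monte Carlo sampling. The sketch below samples Haar unitaries via QR with phase correction and estimates $\mathbb{E}(p_{ij})$, $\mathbb{E}(p_{ij}^2)$, and the same-column cross moment $\mathbb{E}(p_{ij}p_{kj})$, whose exact values are $1/N$, $2/(N(N+1))$, and $1/(N(N+1))$, respectively.

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_unitary(dim, rng):
    # QR of a complex Ginibre matrix with fixed diagonal phases => Haar measure.
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

N, samples = 4, 20000
p00, p00_sq, cross = [], [], []
for _ in range(samples):
    U = haar_unitary(N, rng)
    p = np.abs(U[:, 0])**2                 # one column: bitstring probabilities
    p00.append(p[0]); p00_sq.append(p[0]**2); cross.append(p[0] * p[1])

m1, m2, m11 = np.mean(p00), np.mean(p00_sq), np.mean(cross)
# Exact Haar values: E p = 1/N, E p^2 = 2/(N(N+1)), E p_i p_j = 1/(N(N+1))
```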
Note that the threshold $\alpha^*_t := \frac{H_1(t)}{2 - 5H_1(t) + 4H_2(t)}$ is directly related to $H_1(t)$ and $H_2(t)$. When $t = t^{\text{opt}}$ in the short-time regime, or $t > t^*$ in the long-time regime, we have $H_1(t), H_2(t) = O(\frac{1}{N})$, which leads to an exponentially small threshold $\alpha^*_t = O(\frac{1}{N})$. This allows quantum supremacy to be achieved even for a small circuit fidelity $\alpha > \alpha^*_t$. Note that $t^*$ can be impractically large for near-term applications. Hence it is crucial that the simulation at $t = t^{\text{opt}} \approx 4.81$ is equally effective and much more tractable.