The complexity of NISQ

The recent proliferation of NISQ devices has made it imperative to understand their power. In this work, we define and study the complexity class NISQ, which encapsulates problems that can be efficiently solved by a classical computer with access to noisy quantum circuits. We establish super-polynomial separations among classical computation, NISQ, and fault-tolerant quantum computation for problems based on modifications of Simon's problem. We then consider the power of NISQ for three well-studied problems. For unstructured search, we prove that NISQ cannot achieve a Grover-like quadratic speedup over classical computers. For the Bernstein-Vazirani problem, we show that NISQ needs only a number of queries logarithmic in the number required by classical computers. Finally, for a quantum state learning problem, we prove that NISQ is exponentially weaker than classical computers with access to noiseless constant-depth quantum circuits.

Faulty oracle models. A number of works have studied the effect of imperfect oracles on quantum speedups. For instance, [1] studied whether the exponential speedup achieved by the quantum annealing algorithm for the welded tree problem persists when the oracle is subjected to various kinds of noise. Relevant to our Theorem 2.4, [2,3] considered the performance of Grover's algorithm when the phase oracle is subject to small phase fluctuations, and [4] showed that under this faulty oracle model no quantum algorithm can achieve a speedup; see also [5] for a different oracle noise model. Relevant to our Theorem 2.5, [6,7,8] showed that noise in an oracle for subset parity does not affect the computational complexity of quantum learning algorithms in the way it is conjectured to affect classical learning algorithms.
These results assume that noise occurs inside the oracle but that the quantum computation leveraging the faulty oracle is noiseless. Moreover, the lower bounds in [1,4,5] assume global noise on the oracle, as opposed to the local, qubit-wise noise considered in this work. In contrast, we study the effects of imperfections in quantum computation due to local noise, as well as noisy oracles with noise appearing before and after the oracle.
Noiseless hybrid quantum-classical models. A number of works have studied the power and limitations of hybrid quantum-classical models when the quantum computation is assumed to be noiseless. For example, the recent work [9] studies unstructured search in a noiseless hybrid setting where the algorithm can make queries to both classical and quantum versions of the search oracle. They show that any algorithm with constant success probability must make either Ω(√N) queries to the quantum oracle or Ω(N) queries to the classical oracle. Earlier works [10,11] showed that relative to various oracles, namely those of the recursive Simon's problem and the welded tree problem, BQP is strictly more powerful than classical computation assisted by a noiseless bounded-depth quantum device. The oracle we use in Theorem 2.3 is essentially a simplified version of the recursive Simon's problem, and in Supplementary Note 8 we show how the recursive Simon's problem itself can be used to simultaneously separate NISQ and the complexity classes considered in [10,11] from BQP. In [12], the authors used a very different proof technique to establish a statement equivalent to our Theorem 5.B.1, which is the main component in establishing Theorem 2.4 on the lack of a Grover-like quadratic speedup in NISQ for unstructured search. [12] uses an approach based on a potential function, whereas we use a technique for proving lower bounds for learning quantum states and processes with unentangled measurements. Moreover, the connection between lower bounds for bounded-depth quantum computation and lower bounds for NISQ is new to our work.
Noise resilience of specific algorithms. A number of works have studied how existing algorithms perform under various forms of noise in their implementation. For unstructured search, [13,14] studied whether Grover's algorithm is robust to deviations such as noisy Hadamard gates and Gaussian noise between iterations, while [15] demonstrated that recursive amplitude amplification is robust to noisy reflection operators. [16,17,18,19,20], among others, studied the resilience of specific quantum optimization algorithms like QAOA and VQE under noise models similar to our definition of λ-noisy circuits. [20] found that in various regimes, these algorithms suffer significant slowdown when implemented on noisy quantum devices due to the flattening of the cost landscape. [21] showed that estimating the output probability of random quantum circuits to exponentially small additive error remains #P-hard even in the presence of small noise. Furthermore, [16] showed that the presence of noise can make these quantum algorithms easy to simulate on classical computers. In contrast, in the present work we study the capabilities and limitations of NISQ without necessarily focusing on any particular algorithm.
Complexity of noisy quantum circuits. To our knowledge, the only other paper to study the relation between noisy quantum circuits and existing complexity classes is [22], which considered a generalization of our notion of λ-noisy circuits in which a random fraction of qubits at every layer are adversarially corrupted. Notably, they showed that polynomial-size noisy quantum circuits are no stronger than the complexity class $\mathrm{QNC}^1$ of logarithmic-depth noiseless circuits, whereas quasipolynomial-size noisy quantum circuits can compute any function in $\mathrm{QNC}^1$. In the present work, rather than study λ-noisy circuits in isolation, we consider the power of classical computation augmented by such circuits.

Supplementary Note 2 - The NISQ Complexity Class
In this section, we formally define the complexity class NISQ. Then we recall the notion of classical oracles in classical (BPP) and quantum (BQP) computation and generalize it to NISQ.
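We define noisy quantum circuits with noise level λ as follows.
Definition 2.A.3 (Output of a noisy quantum circuit). Let λ ∈ [0, 1] and n ∈ ℕ. Given T ∈ ℕ and a sequence of T depth-1 unitaries $U_1, \dots, U_T$, the output of the corresponding λ-noisy depth-T quantum circuit is a random n-bit string $s \in \{0,1\}^n$ sampled from the distribution
$$p(s) = \langle s|\, \mathcal{D}_\lambda^{\otimes n}\Big(U_T \cdots \mathcal{D}_\lambda^{\otimes n}\big(U_1\, \mathcal{D}_\lambda^{\otimes n}(|0^n\rangle\langle 0^n|)\, U_1^\dagger\big) \cdots U_T^\dagger\Big)\, |s\rangle, \qquad (1)$$
where $\mathcal{D}_\lambda(\rho) = (1-\lambda)\rho + \lambda\,\mathrm{tr}(\rho)\, I/2$ is the single-qubit depolarizing channel; that is, every quantum operation is followed by a layer of single-qubit depolarizing channels. When λ = 0, we say that this circuit is noiseless.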
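To make Eq. (1) concrete, the following is a minimal density-matrix sketch for a handful of qubits. It assumes the depolarizing channel $\mathcal{D}_\lambda(\rho) = (1-\lambda)\rho + \lambda\,\mathrm{tr}(\rho)\,I/2$ is applied to every qubit after initialization and after each layer, and that each depth-1 layer is supplied as a dense $2^n \times 2^n$ matrix; the function names are illustrative only. The single-qubit channel is implemented through the identity $\tfrac14\sum_{P\in\{I,X,Y,Z\}} P_q\rho P_q^\dagger = \mathrm{tr}_q(\rho)\otimes I/2$ on the target qubit q.

```python
import numpy as np
from functools import reduce

# Single-qubit Paulis I, X, Y, Z.
PAULIS = [
    np.eye(2),
    np.array([[0, 1], [1, 0]]),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]]),
]


def on_qubit(op, q, n):
    """Embed a 2x2 operator on qubit q of an n-qubit register (qubit 0 is the leftmost bit)."""
    return reduce(np.kron, [op if i == q else np.eye(2) for i in range(n)])


def depolarize_all(rho, n, lam):
    """Apply D_lam independently to each of the n qubits."""
    for q in range(n):
        paulis_q = [on_qubit(P, q, n) for P in PAULIS]
        # (1/4) sum_P P rho P^dagger replaces qubit q by I/2 (Pauli twirl).
        twirled = sum(P @ rho @ P.conj().T for P in paulis_q) / 4
        rho = (1 - lam) * rho + lam * twirled
    return rho


def noisy_output_distribution(layers, n, lam):
    """Return p(s) for all s in {0,1}^n, indexed by the integer value of s."""
    rho = np.zeros((2**n, 2**n), dtype=complex)
    rho[0, 0] = 1.0  # |0^n><0^n|
    rho = depolarize_all(rho, n, lam)  # noise after initialization
    for U in layers:  # each U is a dense 2^n x 2^n depth-1 layer
        rho = depolarize_all(U @ rho @ U.conj().T, n, lam)
    return np.clip(np.real(np.diag(rho)), 0.0, None)  # p(s) = <s|rho|s>


def sample_output(layers, n, lam, rng):
    """Draw one n-bit output string s from the noisy circuit."""
    p = noisy_output_distribution(layers, n, lam)
    s = rng.choice(2**n, p=p / p.sum())
    return format(s, f"0{n}b")
```

For example, a single layer applying X to qubit 0 of a 2-qubit register with λ = 0.1 gives p(10) = 0.905², since each qubit independently ends up in the wrong state with probability 0.095.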
Remark 2.A.1. We work with the single-qubit depolarizing channel as it is the most standard model for local noise. One could also consider stronger noise models, e.g., one in which every qubit is randomly corrupted with probability λ by an adversary rather than randomly decohered. The lower bounds we prove in this work carry over immediately to such stronger models. We also prove our upper bounds, namely Theorems 2.2 and 2.5, under this stronger model (see Remarks 4.A.1 and 6.B.1).
Definition 2.A.4 (Noisy quantum circuit oracle). We define $\mathrm{NQC}_\lambda$ to be an oracle that takes as input an n ∈ ℕ and a sequence of depth-1 n-qubit unitaries $\{U_k\}_{k=1,\dots,T}$ for any T ∈ ℕ, and outputs a random n-bit string s according to Eq. (1).
We define the time to query $\mathrm{NQC}_\lambda$ with T depth-1 n-qubit unitaries to be Θ(nT), which is linear in the time to write down the input to the query.
We now define NISQ algorithms, which are classical algorithms with access to the noisy quantum circuit oracle. This provides a formal definition of hybrid noisy quantum-classical computation.
Definition 2.A.5 (NISQ algorithm). A $\mathrm{NISQ}_\lambda$ algorithm with access to λ-noisy quantum circuits is a probabilistic Turing machine M that can query $\mathrm{NQC}_\lambda$ any number of times to obtain output bitstrings s; it is denoted by $A_\lambda \triangleq M^{\mathrm{NQC}_\lambda}$. The runtime of $A_\lambda$ is the classical runtime of M plus the sum of the times to query $\mathrm{NQC}_\lambda$.
The NISQ complexity class for decision problems is defined as follows. Observe that the definition recovers BPP when $M^{\mathrm{NQC}_\lambda}$ in the definition of $A_\lambda$ above is replaced by M.
Definition 2.A.6 (NISQ complexity). A language $L \subseteq \{0,1\}^*$ is in NISQ if there exists a $\mathrm{NISQ}_\lambda$ algorithm $A_\lambda$ for some constant λ > 0 that decides L in polynomial time, that is, such that
• for all $x \in \{0,1\}^*$, $A_\lambda$ produces an output in time poly(|x|), where |x| is the length of x;
• for all $x \in L$, $A_\lambda$ outputs 1 with probability at least 2/3;
• for all $x \notin L$, $A_\lambda$ outputs 0 with probability at least 2/3.
Remark 2.A.2. The noise parameter λ in our definition of NISQ is taken to be an absolute constant. One can also consider variants in which λ depends on the input length, or equivalently on the system size of the noisy quantum circuits. If λ decays sufficiently quickly in these parameters, the resulting complexity class is equivalent to BQP. For instance, if λ ≪ 1/N, where N is an upper bound on the width times the depth of any noisy quantum circuit call, then with high probability no noise is applied over the course of the quantum computation. It is an interesting direction to explore how the complexity landscape mapped out in this paper changes in the intermediate regime where λ decays only mildly in the input length.
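The λ ≪ 1/N claim in Remark 2.A.2 can be checked with a one-line mixture argument. The derivation below assumes N counts every (qubit, layer) location at which a copy of $\mathcal{D}_\lambda$ acts, and writes $p_\lambda$ for the output distribution of a λ-noisy circuit and $p_0$ for that of its noiseless counterpart.

```latex
\mathcal{D}_\lambda = (1-\lambda)\,\mathrm{id} + \lambda\,\mathcal{R},
\qquad \mathcal{R}(\rho) = \mathrm{tr}(\rho)\,\tfrac{I}{2}.
% Expanding all N copies, the single term in which every copy contributes
% (1-\lambda)\,\mathrm{id} is the noiseless circuit, so for some distribution q:
p_\lambda = (1-\lambda)^N p_0 + \bigl(1-(1-\lambda)^N\bigr)\, q,
\qquad
d_{\mathrm{TV}}(p_\lambda, p_0) \le 1-(1-\lambda)^N \le \lambda N .
```

The last step is Bernoulli's inequality; for instance, λ = 1/(100N) keeps the noisy and noiseless output distributions within total variation distance 0.01.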

2.B Algorithms with oracle access
In this work we study the complexity of learning a classical oracle or testing a property thereof. For instance, the unstructured search problem considers learning a classical oracle that marks one element among N elements. We recall the following definitions of a classical oracle O and of classical/quantum algorithms with access to O.
for some integer k ∈ ℕ and n′-qubit unitaries $V_{n,1}, \dots, V_{n,k}$, each given as a product of depth-1 unitaries. Here, I denotes the identity matrix over n′ − n qubits.
We now present the definition of NISQ algorithms with access to the classical oracle O, which requires first defining noisy quantum circuit oracles with access to O.
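Definition 2.B.4 (Noisy quantum circuit oracle with access to O). We define $\mathrm{NQC}^O_\lambda$ to be an oracle that takes as input an integer n′ and a sequence of n′-qubit unitaries $\{U_k\}_{k=1,\dots,T}$ for any T ∈ ℕ, where each $U_k$ is either a depth-1 unitary or $U_O \otimes I$, and outputs a random n′-bit string s sampled according to the distribution
$$p(s) = \langle s|\, \mathcal{D}_\lambda^{\otimes n'}\Big(U_T \cdots \mathcal{D}_\lambda^{\otimes n'}\big(U_1\, \mathcal{D}_\lambda^{\otimes n'}(|0^{n'}\rangle\langle 0^{n'}|)\, U_1^\dagger\big) \cdots U_T^\dagger\Big)\, |s\rangle,$$
where $\mathcal{D}_\lambda$ is the single-qubit depolarizing channel of Definition 2.A.3.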
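The explicit definition of $U_O$ is not reproduced above. The sketch below assumes the standard XOR-query form $U_O|x\rangle|y\rangle = |x\rangle|y \oplus O(x)\rangle$ for $O : \{0,1\}^n \to \{0,1\}^m$ and shows how it becomes a permutation matrix that can be padded with an identity on extra qubits to form $U_O \otimes I$. Both helper names are hypothetical.

```python
import numpy as np


def oracle_unitary(O, n, m):
    """Permutation matrix for |x>|y> -> |x>|y XOR O(x)> on n + m qubits (x is the high-order register)."""
    dim = 2 ** (n + m)
    U = np.zeros((dim, dim))
    for x in range(2**n):
        for y in range(2**m):
            # Column |x, y> is sent to row |x, y XOR O(x)>.
            U[(x << m) | (y ^ O(x)), (x << m) | y] = 1.0
    return U


def pad_with_identity(U, extra_qubits):
    """U tensor I on extra_qubits additional qubits."""
    return np.kron(U, np.eye(2**extra_qubits))
```

Because $y \mapsto y \oplus O(x)$ is a bijection for each fixed x, the matrix is a permutation and hence unitary, whether or not O itself is invertible.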

2.C Algorithms of bounded depth
In parts of this work we leverage the well-known connection [22] between noisy quantum circuits and noiseless bounded-depth circuits.Here we briefly recall some standard notions regarding the latter, presented in the language of Supplementary Note 2.A.
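Definition 2.C.1 (Noiseless hybrid quantum-classical computation of bounded depth). A noiseless depth-T algorithm is a $\mathrm{NISQ}_0$ algorithm A that only queries $\mathrm{NQC}_0$ on sequences of depth-1 n-qubit unitaries of length at most T. We denote by $\mathrm{BPP}^{\mathrm{QNC}^i}$ the class of languages decided in polynomial time, in the sense of Definition 2.A.6, by noiseless depth-$O(\log^i |x|)$ algorithms on input x. We also define $\mathrm{BPP}^{\mathrm{QNC}} \triangleq \bigcup_{i \ge 0} \mathrm{BPP}^{\mathrm{QNC}^i}$.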
Note that $\mathrm{BPP}^{\mathrm{QNC}}$ is contained in the class BQP, as BQP can implement arbitrary polynomial-depth quantum computation. We can also define noiseless depth-T algorithms with access to a classical oracle, as well as relativized versions of $\mathrm{BPP}^{\mathrm{QNC}^i}$, which we denote by $(\mathrm{BPP}^{\mathrm{QNC}^i})^O$, completely analogously to what is done in Section 2.B.

Supplementary Note 3 - Preliminaries
In this section, we summarize the basic mathematical techniques and ideas used in this work.

3.A Learning tree formalism and Le Cam's method
We begin by recalling the learning tree formalism of [23], adapted here to the setting of NISQ. This formalism will feature heavily in the proofs of our lower bounds against NISQ.
Definition 3.A.1 (Tree representation for NISQ algorithms). Given an oracle $O : \{0,1\}^n \to \{0,1\}^m$, a $\mathrm{NISQ}_\lambda$ algorithm with access to O can be associated with a pair $(\mathcal{T}, \mathcal{A})$ as follows. The learning tree $\mathcal{T}$ is a rooted tree, where each node encodes the transcript of all classical query results and noisy quantum circuit outputs the algorithm has seen so far. The tree satisfies the following properties:
• Each node u is associated with a value $p^O(u)$ corresponding to the probability that the transcript observed so far is given by the path from the root r to u. In this way, $\mathcal{T}$ naturally induces a distribution over its leaves. For the root r, $p^O(r) = 1$.
• At each non-leaf node u, we either classically query the oracle O at an input $x \in \{0,1\}^n$, or run a λ-noisy quantum circuit A with access to O.
(i) Classical query: u has a single child node v connected via an edge (u, x, O(x)), and we define $p^O(v) = p^O(u)$.
(ii) Noisy circuit query: the children v of u are indexed by the possible outcomes $s \in \{0,1\}^{n'}$. We refer to the edge between u and v as (u, A, s). We denote by $|\phi^O(A)\rangle$ the output state of the circuit, so that the probability of traversing (u, A, s) from node u to child v is $|\langle s|\phi^O(A)\rangle|^2$. We define $p^O(v) = p^O(u)\,|\langle s|\phi^O(A)\rangle|^2$.
• If the total number of classical and quantum queries to O made along any root-to-leaf path is at most N, we say that the query complexity of the algorithm is at most N.
The second component $\mathcal{A}$ is any classical algorithm that takes as input the transcript corresponding to a leaf node ℓ and attempts to determine the underlying oracle or predict some property thereof.
The following lemma shows that slight perturbations to the distributions over children for each node do not change the overall distribution over leaves of T by too much.
Lemma 3.A.1. Given a learning tree $\mathcal{T}$ corresponding to a $\mathrm{NISQ}_\lambda$ algorithm with query complexity N, suppose $\mathcal{T}'$ is a learning tree obtained from $\mathcal{T}$ as follows. For every node u at which a noisy quantum circuit A is run, replace A by another circuit A′ such that the new induced distribution over children of u is at most ε-far from the original distribution in total variation. Then the distributions over leaves of $\mathcal{T}$ and $\mathcal{T}'$ are at most εN-far in total variation.
Proof. Consider the sequence of trees $\mathcal{T}^{(i)}$, where $\mathcal{T}^{(0)} = \mathcal{T}$ and $\mathcal{T}^{(i)}$ is obtained by taking every node u in layer i of $\mathcal{T}^{(i-1)}$ that runs some noisy quantum circuit A and replacing A with the corresponding circuit A′ from $\mathcal{T}'$. By design, $\mathcal{T}^{(N)} = \mathcal{T}'$. Let $p^{(i)}$ denote the distribution over leaves of $\mathcal{T}^{(i)}$. By the triangle inequality, it suffices to show that $d_{\mathrm{TV}}(p^{(i)}, p^{(i-1)}) \le \varepsilon$ for every i. Note that $p^{(i-1)}$ is a mixture over distributions $p_v$, where $p_v$ is the distribution over leaves conditioned on reaching node v, a child of a node u in layer i; since $\mathcal{T}^{(i-1)}$ and $\mathcal{T}^{(i)}$ agree below layer i, the same distributions $p_v$ appear in both. In this mixture, v is sampled by sampling the parent node u by running the NISQ algorithm corresponding to $\mathcal{T}'$ for i − 1 steps and then running the corresponding quantum circuit A from $\mathcal{T}$. In contrast, $p^{(i)}$ is a mixture over the same distributions $p_v$, but v is sampled by sampling u in the same way and then running the corresponding quantum circuit A′ from $\mathcal{T}'$. For each fixed u, these two distributions over children are at most ε-far in total variation, so the two distributions over v are as well, and hence the two mixture distributions are also at most ε-far in total variation, as claimed.
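The last step of the proof uses the fact that two mixtures of the same components $p_v$ are no farther apart in total variation than their mixing weights, since $\|\sum_v (a_v - b_v) p_v\|_1 \le \sum_v |a_v - b_v|$. A small numerical check of this fact, with randomly drawn (hypothetical) weights and components:

```python
import numpy as np


def tv(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


def mixture(weights, components):
    """sum_v weights[v] * components[v]; components has one distribution per row."""
    return np.asarray(weights) @ np.asarray(components)


rng = np.random.default_rng(1)
components = rng.dirichlet(np.ones(16), size=5)  # p_v for 5 children v, over 16 leaves
a = rng.dirichlet(np.ones(5))  # child distribution under circuit A
b = rng.dirichlet(np.ones(5))  # child distribution under circuit A'
assert tv(mixture(a, components), mixture(b, components)) <= tv(a, b) + 1e-12
```

Summing this per-layer bound over the N layers with the triangle inequality gives the εN of Lemma 3.A.1.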
Our lower bounds will be based on Le Cam's method; see Section 4.3 of [23] for an overview in the context of the tree formalism of Definition 3.A.1. In every case we reduce to a distinguishing task in which the algorithm must discern whether the oracle it has access to comes from one family of oracles or from another. For example, for unstructured search, the distinguishing task is whether the oracle corresponds to some element in the search domain or whether the oracle is the identity channel.
More concretely, given two disjoint sets of oracles $S_0, S_1$, we will design distributions $D_0, D_1$ over $S_0, S_1$. Given any algorithm specified by some $(\mathcal{T}, \mathcal{A})$, we will upper bound the total variation distance between the following two distributions: the mixture of the distributions $p_i$ over leaves of the learning tree when the underlying oracle $O_i$ is sampled according to $D_0$ at the outset, and the corresponding mixture when the oracle is sampled according to $D_1$. The following lemma shows that upper bounding $d_{\mathrm{TV}}(\mathbb{E}_{i\sim D_0}[p_i], \mathbb{E}_{i\sim D_1}[p_i])$ suffices to show a query complexity lower bound for the distinguishing task.
Lemma 3.A.2 (Le Cam's two-point method, see e.g. Lemma 4.14 of [23]). Let $\{O_i\}_{i\in S_0}$ and $\{O_i\}_{i\in S_1}$ be two disjoint sets of oracles. Given a tree $\mathcal{T}$ as in Definition 3.A.1 corresponding to a NISQ algorithm that makes N oracle queries, let $p_i$ denote the induced distribution over leaves when the algorithm has access to $O_i$. If $d_{\mathrm{TV}}(\mathbb{E}_{i\sim D_0}[p_i], \mathbb{E}_{i\sim D_1}[p_i]) < 1/3$, then there is no algorithm $\mathcal{A}$ that maps transcripts corresponding to leaves of $\mathcal{T}$ to {0, 1} and distinguishes between $S_0$ and $S_1$ with advantage 1/3.
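The mechanism behind Lemma 3.A.2 is that no test can separate two distributions over leaves by more than their total variation distance, and the likelihood-ratio test attains it. The brute-force sketch below enumerates every deterministic test on a small, hypothetical leaf set; randomized tests are mixtures of deterministic ones and cannot do better. Here p0 and p1 stand for the leaf distributions averaged over $D_0$ and $D_1$.

```python
import itertools

import numpy as np


def tv(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(p - q).sum()


def best_test_advantage(p0, p1):
    """max over tests f: leaves -> {0,1} of |Pr_{p1}[f = 1] - Pr_{p0}[f = 1]|."""
    best = 0.0
    for bits in itertools.product([0, 1], repeat=len(p0)):
        f = np.array(bits)
        best = max(best, abs(f @ p1 - f @ p0))
    return best


rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(6))
p1 = rng.dirichlet(np.ones(6))
print(best_test_advantage(p0, p1), tv(p0, p1))  # equal up to floating-point rounding
```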

3.B Basic hybrid argument
Here we describe a standard template for showing quantum query complexity lower bounds via a hybrid argument.
Lemma 3.B.1. Let $\mathcal{E}_0, \mathcal{E}_1$ be quantum channels on n qubits such that for all pure states σ, we have $\|(\mathcal{E}_0 - \mathcal{E}_1)[\sigma]\|_{\mathrm{tr}} \le \varepsilon$. Let A be any depth-T quantum circuit with access to one of the two channels, and let $s \in \{0,1\}^n$ be the random string output by the circuit. Let $p_0, p_1$ denote the distributions over s when A has access to $\mathcal{E}_0, \mathcal{E}_1$, respectively. Then $d_{\mathrm{TV}}(p_0, p_1) \le \varepsilon T$.
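Proof. Fix $b \in \{0,1\}$, let $b' = 1 - b$, and for each unitary $U_i$ define the channel $\mathcal{U}_i(\sigma) = U_i \sigma U_i^\dagger$. We proceed via a hybrid argument. The output state of the circuit when it has access to $\mathcal{E}_b$ is
$$\sigma_b = \mathcal{E}_b \circ \mathcal{U}_T \circ \cdots \circ \mathcal{E}_b \circ \mathcal{U}_1\big(|0^n\rangle\langle 0^n|\big)$$
for some unitaries $U_1, \dots, U_T$. For $0 \le i \le T$, define the hybrid state in which the first i channel calls use $\mathcal{E}_{b'}$ and the remaining T − i calls use $\mathcal{E}_b$:
$$\sigma^{(i)} = \mathcal{E}_b \circ \mathcal{U}_T \circ \cdots \circ \mathcal{E}_b \circ \mathcal{U}_{i+1} \circ \mathcal{E}_{b'} \circ \mathcal{U}_i \circ \cdots \circ \mathcal{E}_{b'} \circ \mathcal{U}_1\big(|0^n\rangle\langle 0^n|\big),$$
so that $\sigma^{(0)} = \sigma_b$ and $\sigma^{(T)} = \sigma_{b'}$. Consecutive hybrids $\sigma^{(i-1)}$ and $\sigma^{(i)}$ differ only in the i-th channel call, and the channels applied after it cannot increase the trace norm, so
$$\|\sigma_b - \sigma_{b'}\|_{\mathrm{tr}} \le \sum_{i=1}^{T} \|\sigma^{(i-1)} - \sigma^{(i)}\|_{\mathrm{tr}} \le T \sup_{\sigma} \|(\mathcal{E}_0 - \mathcal{E}_1)[\sigma]\|_{\mathrm{tr}},$$
where the supremum is over all density matrices. By convexity of the trace norm, this bound still holds when the supremum is restricted to pure states σ, so by assumption the right-hand side is at most εT. The total variation distance between $p_0$ and $p_1$ as defined in the lemma statement is determined by the diagonals of $\sigma_0$ and $\sigma_1$ in the computational basis, and is therefore upper bounded by $\|\sigma_0 - \sigma_1\|_{\mathrm{tr}} \le \varepsilon T$.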
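A numerical sanity check of Lemma 3.B.1 on one qubit, under assumptions chosen only for illustration: $\mathcal{E}_0$ is the identity channel, $\mathcal{E}_1$ is conjugation by $R_x(\theta) = e^{-i\theta X/2}$, the circuit interleaves random single-qubit unitaries with T calls to the channel, and the trace norm is taken as the sum of singular values. For this pair, $\|(\mathcal{E}_0 - \mathcal{E}_1)[|\psi\rangle\langle\psi|]\|_{\mathrm{tr}} = 2\sqrt{1 - |\langle\psi|R_x(\theta)|\psi\rangle|^2}$, which is maximized at $\varepsilon = 2\sin(\theta/2)$ when $\langle\psi|X|\psi\rangle = 0$.

```python
import numpy as np


def random_unitary(d, rng):
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases


def rx(theta):
    X = np.array([[0, 1], [1, 0]])
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X


def output_distribution(unitaries, V):
    """Apply U_1, E, U_2, E, ..., U_T, E to |0><0| with E(rho) = V rho V^dagger, then measure."""
    rho = np.array([[1, 0], [0, 0]], dtype=complex)
    for U in unitaries:
        rho = U @ rho @ U.conj().T
        rho = V @ rho @ V.conj().T
    return np.real(np.diag(rho))


theta = 0.05
eps = 2 * np.sin(theta / 2)  # sup over pure states of the trace-norm difference
rng = np.random.default_rng(7)
for T in (1, 5, 20, 80):
    us = [random_unitary(2, rng) for _ in range(T)]
    p0 = output_distribution(us, np.eye(2))
    p1 = output_distribution(us, rx(theta))
    d_tv = 0.5 * np.abs(p0 - p1).sum()
    print(T, d_tv, eps * T)  # by the lemma, d_tv never exceeds eps * T
```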
Supplementary Note 4 - Super-Polynomial Oracle Separations
Our basic strategy is to modify Simon's oracle into a new classical oracle that is robust to noise. We note that a NISQ algorithm cannot implement known fault-tolerant quantum computation schemes capable of running an arbitrary quantum circuit with a polynomial number of gates. However, we will still take inspiration from a fault-tolerant quantum computation scheme [24] to define a certain "robustified Simon's oracle" relative to which we obtain a super-polynomial separation between BPP and NISQ. As we will show, because the fault-tolerant scheme of [24] is robust not just to local depolarizing noise but to arbitrary local noise occurring at a sufficiently small constant rate, the NISQ algorithm that we give will ultimately be robust under this stronger noise model as well (see Remark 4.A.1).

4.A.1 Recursively-defined concatenated code
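We consider a Calderbank-Shor-Steane (CSS) code built from two classical linear codes $C_1, C_2$, where $C_1 \triangleq C$ is a punctured doubly-even self-dual code and $C_2 \triangleq C^\perp$ (we refer the reader to [24] for background on these notions). We take $C_1, C_2$ to be over m classical bits, so the corresponding CSS code encodes a single logical qubit into m physical qubits. Let $\mathbf{1}_m$ denote the all-ones vector of length m (when the subscript is clear from context, we omit it). The two codewords of the CSS code are given by
$$|0_L\rangle = \frac{1}{\sqrt{|C^\perp|}} \sum_{w \in C^\perp} |w\rangle, \qquad |1_L\rangle = \frac{1}{\sqrt{|C^\perp|}} \sum_{w \in C^\perp} |w \oplus \mathbf{1}\rangle, \qquad (2)$$
where ⊕ denotes addition over $\mathbb{Z}_2^m$ (i.e., bitwise XOR). Denote by d the number of errors that can be corrected by the CSS code. The two parameters m and d are both considered to be constant. We define
$$A_0 \triangleq \{ w \oplus x : w \in C^\perp,\ x \in \{0,1\}^m,\ |x| \le d \}, \qquad (3)$$
$$A_1 \triangleq \{ w \oplus \mathbf{1} \oplus x : w \in C^\perp,\ x \in \{0,1\}^m,\ |x| \le d \}, \qquad (4)$$
where |x| denotes the Hamming weight of x.
Lemma 4.A.1 (Disjointness of $A_0$ and $A_1$). With the above definitions, we have $A_0 \cap A_1 = \emptyset$.
Proof. This lemma follows from the definition of d. Assume that the intersection is non-empty, i.e., $A_0 \cap A_1 \neq \emptyset$. Hence there exist $w_1, w_2 \in C^\perp$ and $x_1, x_2$ with $|x_1|, |x_2| \le d$ such that $w_1 \oplus x_1 = w_2 \oplus \mathbf{1} \oplus x_2$, i.e., $\mathbf{1} \oplus x_1 \oplus x_2 = w_1 \oplus w_2 \in C^\perp$. Then $\langle 0_L | E_1^\dagger E_2 | 1_L \rangle = 1 \neq 0$ for $E_1 = X^{x_1}$ and $E_2 = X^{x_2}$, where $E_1$ and $E_2$ are Pauli operators with weight at most d. This contradicts the definition of d, since a code that corrects d errors must satisfy $\langle 0_L | E_1^\dagger E_2 | 1_L\rangle = 0$ for all such Pauli operators.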
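As a concrete instance, assume the Steane code: C is the [7,4] Hamming code (the punctured extended Hamming code, which is doubly even and self-dual), so m = 7, $C^\perp$ is the [7,3] simplex code, and d = 1. The sketch below builds $A_0$ and $A_1$ as the Hamming-radius-d neighborhoods of $C^\perp$ and $C^\perp \oplus \mathbf{1}$, and checks Lemma 4.A.1 together with the fact that $C^\perp$ and $C^\perp \oplus \mathbf{1}$ are at Hamming distance at least 2d + 1. The choice of code is ours, for illustration.

```python
import itertools

M, D = 7, 1  # Steane instance: m = 7 physical bits, corrects d = 1 error
# Rows of the [7,4] Hamming parity-check matrix; their span is C^perp, the [7,3] simplex code.
GEN = [(0, 0, 0, 1, 1, 1, 1), (0, 1, 1, 0, 0, 1, 1), (1, 0, 1, 0, 1, 0, 1)]
ONES = (1,) * M


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def span(rows):
    """All GF(2) linear combinations of the given rows."""
    words = set()
    for coeffs in itertools.product([0, 1], repeat=len(rows)):
        w = (0,) * M
        for c, row in zip(coeffs, rows):
            if c:
                w = xor(w, row)
        words.add(w)
    return words


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


C_PERP = span(GEN)
LOW_WEIGHT = [x for x in itertools.product([0, 1], repeat=M) if sum(x) <= D]
A0 = {xor(w, x) for w in C_PERP for x in LOW_WEIGHT}  # radius-d neighborhood of C^perp
A1 = {xor(xor(w, ONES), x) for w in C_PERP for x in LOW_WEIGHT}  # same around C^perp + 1

min_dist = min(hamming(w1, xor(w2, ONES)) for w1 in C_PERP for w2 in C_PERP)
print(len(A0), len(A1), len(A0 & A1), min_dist)  # 64 64 0 3
```

Since the [7,4] Hamming code is perfect, $A_0 \cup A_1$ is all of $\{0,1\}^7$ in this instance; for other codes the two neighborhoods need not cover every string.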
We now recall the recursive concatenation construction from [24]. Given an integer r > 0, we define the following code via r + 1 levels of recursion. Each level encodes a single qubit using the above CSS code over m qubits from the previous level. Hence, a single qubit in the top level is encoded by a total of $m^r$ qubits in the bottom level. More formally, for each r, we will define two sets over $m^r$-bit strings as follows. These sets contain the computational basis states that are used to span the recursively-defined concatenated code.
The two codewords in the recursively-defined concatenated code are then given by For each r, we also define two sets over $m^r$-bit strings that correspond to the neighborhoods around We can prove the following two lemmas.
Lemma 4.A.2. For all r ≥ 1, we have
Proof. We consider a proof by induction on r ≥ 1. By definition of $A_0$ and $A_1$, we have , which establishes the base case of r = 1. For r > 1, we show that for any for all i with $x_{0i} = 0$ and the inductive hypothesis that $w_{0i} \oplus 1$ for all i with $x_{0i} = 0$. Hence, by considering $w_1 = w_0 \oplus \mathbf{1}$ and Similarly, we can show that for any $(v_1, \dots, v_m) \in A$
Lemma 4.A.3. For all r ≥ 1, we have
Proof. The base case is given in Lemma 4.A.1. Now for the induction step, assume 5) implies that the minimum Hamming distance between any two bitstrings in $C^\perp$ and $C^\perp \oplus \mathbf{1}$ is at least 2d + 1. Hence, for any $w_1 \in C^\perp$ and $w_2 \in C^\perp \oplus \mathbf{1}$, after removing at most 2d bits (the bits with $x_{1i} = 1$ or $x_{2i} = 1$), there still exists an index i among the rest of the bits (i.e., the bits with $x_{1i} = 0$ and

4.A.2 Robustified Simon's problem
Given a large enough integer n, we consider Simon's problem over $n' = 2^{\Theta(\log(n)^c)}$ bits for a constant 0 < c < 1. Here 1/c corresponds to the constant $c_2$ from Theorem 10 of [24]. We take r = Θ(log log(n′)) and encode each of the n′ bits using $m^r$ bits. Because m = O(1), we have $m^r = (\log n')^{O(1)}$, and hence $m^r n' = 2^{\Theta(\log(n)^c)} < n$ for large enough n.
Given a classical function $f_s : \{0,1\}^{n'} \to \{0,1\}^{n'}$ from Simon's problem with secret string $s \in \{0,1\}^{n'}$, we define a classical function $\tilde f_s : \{0,1\}^n \to \{0,1\}^{m^r n'}$ as follows. Let x be an n-bit string. We focus on the first $m^r n'$ bits of x and divide them into n′ strings $x_1, \dots, x_{n'}$ of $m^r$ bits each. We first define $f^0_s : \{0,1\}^n \to \{0,1\}^{n'}$ as follows.
We use ∃! to denote "there exists a unique choice". Because A The function $\tilde f_s$ can be considered the robust version of $f_s$, where the output bitstring is stable over a large number of input bitstrings. Let $U_{\tilde f_s}$ be the unitary from Eq. (9).
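The definition of $f^0_s$ decodes each $m^r$-bit block $x_i$ to the unique bit whose neighborhood contains it. A minimal sketch of that decoding step alone, assuming r = 1 and the Steane instance (m = 7, d = 1), where membership in $A_0$ versus $A_1$ can be read off by syndrome decoding: correct the (at most one) flipped bit, then take the parity, since words of $C^\perp$ have even weight and words of $C^\perp \oplus \mathbf{1}$ have odd weight. How $f^0_s$ then uses $f_s$, and what it outputs when no unique bit exists, is not reproduced here, so the sketch stops at decoding; the function names are hypothetical.

```python
# Rows of the [7,4] Hamming parity-check matrix; column j (1-based) is j written in binary.
H_ROWS = [(0, 0, 0, 1, 1, 1, 1), (0, 1, 1, 0, 0, 1, 1), (1, 0, 1, 0, 1, 0, 1)]
M = 7


def decode_block(block):
    """Return the b in {0, 1} with block in A_b (Steane instance, m = 7, d = 1)."""
    bits = list(block)
    syndrome = [sum(h * x for h, x in zip(row, bits)) % 2 for row in H_ROWS]
    pos = 4 * syndrome[0] + 2 * syndrome[1] + syndrome[2]  # 1-based error position, 0 if none
    if pos:
        bits[pos - 1] ^= 1  # correct the single flipped bit
    return sum(bits) % 2  # C^perp: even weight; C^perp + 1: odd weight


def decode_register(x, n_prime):
    """Decode the first M * n_prime bits of x block by block; later bits are ignored."""
    return tuple(decode_block(x[i * M:(i + 1) * M]) for i in range(n_prime))
```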
We denote by $O_{\tilde f_s}$ the oracle that applies this unitary. Then we have the following theorem, which is the main result of this section and implies Theorem 2.2.
Theorem 4.A.2. For λ sufficiently small, there is a $\mathrm{NISQ}_\lambda$ algorithm which, given oracle access to $O_{\tilde f_s}$, can determine whether $f_s$ is 2-to-1 or 1-to-1 with constant advantage in time poly(n). By contrast, any classical algorithm with access to $O_{\tilde f_s}$ requires time superpolynomial in n to determine whether $f_s$ is 2-to-1 or 1-to-1 with constant advantage. Thus, relative to oracles O of this form, $\mathrm{BPP}^O \subsetneq \mathrm{NISQ}^O$.
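Once samples from (an encoded simulation of) Simon's algorithm are available, deciding between 2-to-1 and 1-to-1 is classical linear algebra over GF(2): for a 2-to-1 $f_s$ with secret $s \neq 0$, every outcome y satisfies $y \cdot s = 0$, so the outcomes never span $\{0,1\}^{n'}$, whereas for a 1-to-1 function they are uniform and span it with high probability after O(n′) samples. The sketch below uses ideal, noiseless samples from a hypothetical helper in place of the NISQ procedure; it illustrates only the post-processing, not the noise-robust algorithm of Theorem 4.A.2.

```python
import random


def gf2_rank(vectors):
    """Rank over GF(2) of bit vectors stored as Python ints (XOR-basis elimination)."""
    basis = {}  # leading-bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)


def ideal_simon_samples(n, s, k, rng):
    """k ideal outcomes, uniform over {y : y . s = 0 mod 2}; s = 0 models a 1-to-1 function."""
    samples = []
    while len(samples) < k:
        y = rng.getrandbits(n)
        if bin(y & s).count("1") % 2 == 0:
            samples.append(y)
    return samples


def looks_two_to_one(samples, n):
    """Declare 2-to-1 iff the outcomes fail to span GF(2)^n."""
    return gf2_rank(samples) < n
```

A union bound over the $2^{n'} - 1$ nonzero vectors that could be orthogonal to every sample shows that, with k uniform samples, the 1-to-1 case is misclassified with probability less than $2^{n'-k}$.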
Our strategy for proving this theorem is to combine the following two ingredients.First, we draw on tools underlying the error correction scheme in Theorem 10 of [24].This scheme requires the algorithm to be able to initialize constant-error noisy zero states in the middle of the circuit, which our computational model does not allow for.Our second proof ingredient is then to leverage the noisy majority vote approach in Theorem 4 of [22] for distilling constant-error noisy zero states in the middle of the circuit from constant-error noisy zero states prepared at the beginning of the circuit.
Putting these together, we can use a noisy quantum machine with $n' \log(n')^{c_1} \times 2^{\Theta(\log(n')^c)} = \mathrm{poly}(n)$ qubits (and a similar number of gates) and noise rate λ < λ_0 (for a small constant λ_0) to run an encoded version of Simon's algorithm on the oracle $O_{\tilde f_s}$. We will find that our classical oracle, which essentially implements a classical error-correcting code, interfaces nicely with the quantum error correction scheme of [24], enabling our algorithm to work.

4.A.3 Simulating fault-tolerant Simon's in NISQ with exponential overhead
Recall that Simon's algorithm consists of three steps: (i) prepare the all-plus state on the first half of the qubits and the all-zero state on the second half, (ii) query the oracle, and (iii) apply Hadamard gates to the first half of the qubits and then measure them in the computational basis. In this section, we describe how to implement encoded versions of the first and last steps. The main guarantees for these steps are given by Lemmas 4.A.4-4.A.5 and Lemma 4.A.6 below, respectively.
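The three noiseless steps can be checked on a small instance with an exact statevector computation. The sketch below assumes $f_s(x) = \min(x, x \oplus s)$ as a hypothetical 2-to-1 function with period s, stores the 2n′-qubit state as a $2^{n'} \times 2^{n'}$ array indexed by the two registers, and returns the distribution of the first-register outcome y.

```python
import numpy as np


def simon_distribution(f, n):
    """Exact distribution of the first-register outcome y of Simon's algorithm for f on n bits."""
    N = 2**n
    # Step (i): |+>^n |0^n>, stored as psi[x, z] (x: first register, z: second register).
    psi = np.zeros((N, N))
    psi[:, 0] = 1 / np.sqrt(N)
    # Step (ii): the oracle |x>|z> -> |x>|z XOR f(x)> permutes the z index for each x.
    queried = np.zeros_like(psi)
    for x in range(N):
        for z in range(N):
            queried[x, z ^ f(x)] = psi[x, z]
    # Step (iii): Hadamard transform on the first register, then measure it.
    h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    hn = np.array([[1.0]])
    for _ in range(n):
        hn = np.kron(hn, h)
    final = hn @ queried
    return (np.abs(final) ** 2).sum(axis=1)  # marginal over the second register
```

For n′ = 3 and s = 101, the outcomes with nonzero probability are y ∈ {000, 010, 101, 111}, each with probability 1/4, i.e., exactly the y with $y \cdot s = 0 \pmod 2$; for a 1-to-1 function the outcome is uniform.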
To set up the proof, we require some definitions from [24] so that we can state the needed theorems from [24] and [22]. Accordingly, let us recall the following notions of (r, k)-sparse sets and (r, k)-deviated states from [24].
Definition 4.A.3 ((r, k)-sparse set). An (r, k)-sparse set of qubits over many blocks of $m^r$ qubits is defined recursively as follows. A set A of qubits over many blocks of m qubits is (1, k)-sparse if and only if every block has at most k qubits that are in A. A set A of qubits over many blocks of $m^r$ qubits is (r, k)-sparse if and only if for every block, treating the $m^r$ qubits as m sub-blocks of $m^{r-1}$ qubits, at most k sub-blocks are not (r − 1, k)-sparse.
Definition 4.A.4 ((r, k)-deviated state). A state ρ is said to be (r, k)-deviated from ρ′ if k is the minimum integer such that there exists an (r, k)-sparse set of qubits A with $\rho_{A^c} = \rho'_{A^c}$. Here, $\rho_{A^c}$ denotes the reduced density matrix of ρ on the qubits not in A.
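Definition 4.A.3 translates directly into a recursive check. In the sketch below, qubits are labelled by integers, block b of a register occupies labels $[b\,m^r, (b+1)\,m^r)$, and each sub-block consists of consecutive labels; this contiguous layout is an assumption made for illustration. The hypothetical helper min_k returns the smallest k for which a fixed set A is (r, k)-sparse; Definition 4.A.4 additionally minimizes over all sets A outside of which ρ and ρ′ agree.

```python
def is_sparse_block(A, r, k, m, offset=0):
    """Is A (r, k)-sparse inside the single block of m**r qubits starting at `offset`?"""
    size = m**r
    inside = [q for q in A if offset <= q < offset + size]
    if r == 1:
        return len(inside) <= k  # at most k qubits of the block lie in A
    sub = m ** (r - 1)
    # Count sub-blocks that are not (r - 1, k)-sparse.
    bad = sum(not is_sparse_block(inside, r - 1, k, m, offset + j * sub) for j in range(m))
    return bad <= k


def is_sparse(A, r, k, m, num_blocks):
    """Definition 4.A.3 for a register of num_blocks blocks of m**r qubits each."""
    return all(is_sparse_block(A, r, k, m, b * m**r) for b in range(num_blocks))


def min_k(A, r, m, num_blocks):
    """Smallest k for which A is (r, k)-sparse (terminates by k = m)."""
    k = 0
    while not is_sparse(A, r, k, m, num_blocks):
        k += 1
    return k
```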
We will also need the definitions of location and quantum computation code from [24].
Definition 4.A.5 (Location). A set $(q_1, \dots, q_a, t)$ is a location in a quantum circuit Q if the qubits $q_1, \dots, q_a$ participated in the same gate in Q at layer t, and no other qubit participated in that gate. If a qubit q did not participate in any gate at layer t, then (q, t) is a location in Q as well.
Definition 4.A.6 (Quantum computation code). A quantum code C encoding a single logical qubit into m physical qubits is called a quantum computation code if it is accompanied by a universal set of gates G with fault-tolerant procedures, including fault-tolerant encoding, decoding, and error correction procedures. Moreover, we require that (i) all procedures use only gates from G, and (ii) the error correction procedure takes any m-qubit density matrix to a state in the code space.
Apart from state encoding and decoding, the quantum computation code C also encodes any gate g ∈ G into a circuit P(g) such that for any pure state |ψ⟩, P(g) maps the encoding of |ψ⟩ under C to the encoding of g|ψ⟩. If g is a k-qubit gate, then the circuit P(g) acts on k blocks of m qubits (each block encoding one logical qubit), possibly together with other ancilla qubits.
We are now prepared to recall the threshold theorem of [24], which we state in the context of probabilistic qubit-wise noise (with probability λ, an arbitrary single-qubit quantum channel is applied to the qubit). In the rest of the section, we consider any failure probability δ > 0 and a (noiseless) quantum circuit Q with 2n′ input qubits, depth t, and v locations. Let C be a quantum computation code with gates G that corrects d errors. Let V be the encoding map for the code given by recursively concatenating C a total of r = O(log log(v/δ)) times. Using key lemmas for establishing the threshold theorem from [24] (Theorem 10 therein), we can extract the following two results.
Theorem 4.A.3 (Lemmas 8 and 10 from [24]). There is an absolute constant $\lambda_c \in [0, 1]$ such that for any λ < λ_c, there exists a λ-noisy quantum circuit Q′ which can initialize ancillary qubits at any time during the computation (these ancillary qubits are also subject to qubit-wise noise at rate λ) and satisfies the following. Q′ operates on $m^r n'$ qubits and has depth O(t polylog(v/δ)), and the output state with probability 1 − δ over the local noise.
Theorem 4.A.4 (Lemmas 8, 9, and 10 from [24]). There is an absolute constant $\lambda_c \in [0, 1]$ such that for any λ < λ_c, there exists a λ-noisy quantum circuit Q′ which can initialize ancillary qubits at any time during the computation (these ancillary qubits are also subject to qubit-wise noise at rate λ), and a classical post-processing algorithm A based on recursive majority vote, that satisfy the following. Q′ operates on $m^r n'$ qubits and has depth O(t polylog(v/δ)). Let σ be any n′-qubit state. Let D be the n′-bit string distribution generated by measuring $Q\sigma Q^\dagger$ in the computational basis. For any state ρ that is (r, d)-deviated from applying λ-noisy computational basis measurements to the output state of Q′ on input state ρ, followed by the classical algorithm A, produces a distribution D′ equal to D with probability 1 − δ over the local noise.
Our CSS code, defined at the beginning of Subsection 4.A.1 in Eqs. (2), (3), (4) and the surrounding text, corrects d errors and is a suitable quantum computation code for the purposes of Theorems 4.A.3 and 4.A.4. Indeed, the construction in the proofs of Theorems 4.A.3 and 4.A.4 builds the concatenated code described in Definition 4.A.1, Eq. (6), and the surrounding text, and then shows that it has suitable fault-tolerant quantum error correction properties. As mentioned above, the key difference between our model of noisy quantum circuits and the one considered in [24] is that the latter allows the circuit to initialize ancillary qubits at any point in the computation, whereas in our setting all qubits are present at time zero. In [22] it was shown how to pass from the latter setting to the former with a blowup in the total number of qubits that is exponential in the depth of the original circuit.
Theorem 4.A.5 (Theorem 4 from [22]). There exists an absolute constant $\lambda_c \in [0, 1]$ such that for any non-negative λ < λ_c and any t ∈ ℕ, there exists a λ-noisy quantum circuit of depth t operating on $3^t$ qubits initialized to the all-zero state such that the first qubit of the output state is in the state |0⟩ with probability at least 1 − λ.
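Theorem 4.A.5 is what lets a circuit with all qubits present at time zero supply fresh, low-error ancillas late in the computation. The classical toy model below is not the quantum construction of [22]; it is an analogue under our own simplifying assumptions (independent bit flips with probability λ at initialization and after every gate) that shows the two effects in play: a bit left idle for t layers drifts toward a uniformly random value, whereas a ternary tree of noisy 3-bit majority gates on $3^t$ fresh bits keeps its output error at O(λ) for every depth t.

```python
def idle_error(lam, t):
    """Error of a bit set to 0 and flipped w.p. lam at initialization and at each of t layers."""
    return (1 - (1 - 2 * lam) ** (t + 1)) / 2


def majority_tree_errors(lam, depth):
    """Output error after each level of a ternary tree of noisy 3-bit majority gates."""
    p = lam  # each of the 3**depth leaves is flipped w.p. lam at initialization
    errors = [p]
    for _ in range(depth):
        maj = 3 * p**2 - 2 * p**3  # majority of 3 independent bits, each wrong w.p. p
        p = lam + (1 - 2 * lam) * maj  # the gate output is then flipped w.p. lam
        errors.append(p)
    return errors


lam, depth = 0.01, 30
print(idle_error(lam, depth), majority_tree_errors(lam, depth)[-1])
```

In this toy model the tree's error settles slightly above λ rather than at most λ, so only the depth-independence, not the exact constant of Theorem 4.A.5, carries over.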
We now use Theorems 4.A.3, 4.A.4, and 4.A.5 to argue in Lemma 4.A.4 (resp. Lemma 4.A.5) that we can prepare an encoding of the all-plus state (resp. all-zero state) over n′ qubits using poly(n) total qubits including ancillas, thus realizing an encoded version of the first step in Simon's algorithm.
Lemma 4.A.4. Suppose $n' \le \exp(\log^c n)$ for a sufficiently small constant 0 < c < 1. There exists an absolute constant $\lambda_c \in [0, 1]$ such that for any non-negative λ < λ_c, there exists a λ-noisy quantum circuit which operates on n′ input qubits and poly(n) ancillary qubits and has polylog(n) layers, such that with probability at least 1 − O(1/n′) over the local noise, the output state is (r, d/2)-deviated from the state for r = log log(n′).
Proof. By Theorem 4.A.3, there is a λ-noisy circuit Q′ with polylog(n′) layers and n′ polylog(n′) qubits, which can initialize qubits at any time, such that with probability at least 1 − O(1/n′) over the local noise, the output state is (r, d/2)-deviated from the state in Eq. (10). By Theorem 4.A.5, for each qubit that is initialized at some time t ≤ polylog(n′), we can append to Q′ a λ-noisy circuit of depth t operating on $3^t$ qubits initialized at time zero, such that the first qubit of the output state of this circuit can play the role of the qubit initialized at time t in Q′. Altogether, we obtain a circuit whose depth is the same as that of Q′ but which now operates on at most $3^{\mathrm{polylog}(n')} \cdot n'\,\mathrm{polylog}(n')$ qubits. This is at most poly(n) provided that the constant c in the lemma is sufficiently small.
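The final claim of the proof can be made explicit. Assume, for illustration, that "polylog(n′)" means at most $C(\ln n')^{a}$ for constants C, a ≥ 1, and read $n' \le \exp(\log^c n)$ with natural logarithms, so that $\ln n' \le (\ln n)^c$. Then

```latex
\ln\Bigl(3^{C(\ln n')^{a}} \cdot n' \cdot C(\ln n')^{a}\Bigr)
  = C\ln 3\,(\ln n')^{a} + \ln n' + \ln C + a\ln\ln n'
  \le C\ln 3\,(\ln n)^{ca} + (\ln n)^{c} + \ln C + ac\,\ln\ln n .
```

If c < 1/a (which also gives c < 1), every term on the right is o(ln n), so the qubit count is $n^{o(1)}$ and in particular at most n for large n; this is the sense in which c must be sufficiently small.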
Using the same proof as the above lemma, we can establish the following.
Lemma 4.A.5. Suppose $n' \le \exp(\log^c n)$ for a sufficiently small constant 0 < c < 1. There exists an absolute constant $\lambda_c \in [0, 1]$ such that for any non-negative λ < λ_c, there exists a λ-noisy quantum circuit which operates on n′ input qubits and poly(n) ancillary qubits and has polylog(n) layers, such that with probability at least 1 − O(1/n′) over the local noise, the output state is (r, d/2)-deviated from the state
Next, we use Theorems 4.A.4 and 4.A.5 to argue that we can take any state on $m^r n'$ qubits which is not too deviated from a codeword, noisily apply the Hadamard transform, and apply a fault-tolerant measurement to the result. This realizes an encoded version of the last step of Simon's algorithm.
The noisy quantum circuit for this uses poly(n) total qubits including ancillas.
Lemma 4.A.6. Suppose $n' \le \exp(\log^c n)$ for $0 < c < 1$ a sufficiently small constant, and let $r = \mathrm{polylog}(n')$. There exists an absolute constant $\lambda_c \in [0,1]$ such that for any non-negative $\lambda < \lambda_c$, there exists a $\lambda$-noisy quantum circuit $Q'$ which operates on $m_r n'$ input qubits and $\mathrm{poly}(n)$ ancillary qubits and has $\mathrm{polylog}(n')$ layers, such that the following holds. Let $A$ be the classical post-processing procedure based on recursive majority vote from Theorem 4.A.4.
For $k \le d$, let the input state $\rho$ be $(r, k)$-deviated from the state $V^{\otimes n'} \sigma (V^{\otimes n'})^\dagger$, where $\sigma$ is an $n'$-qubit state. Let $D$ be the classical distribution over $\{0,1\}^{n'}$ generated by measuring $H^{\otimes n'} \sigma H^{\otimes n'}$ in the computational basis. Then if one applies $Q'$ to $\rho$, noisily measures the output state in the computational basis, and applies $A$ to the classical outcome, the resulting distribution over $\{0,1\}^{n'}$ is identical to $D$ with probability at least $1 - O(1/n')$ over the local noise.
Proof. Because $\rho$ is $(r, k)$-deviated from $V^{\otimes n'} \sigma (V^{\otimes n'})^\dagger$ for $k \le d$, we can apply Theorem 4.A.4 to obtain a $\lambda$-noisy quantum circuit $Q''$ operating on $m_r n'$ qubits with $\mathrm{polylog}(n')$ layers which satisfies the desiderata of the lemma. The caveat is that the circuit has to be able to initialize ancilla qubits at any time.
We can address this similarly to the proof of Lemma 4.A.4. By Theorem 4.A.5, for each qubit that is initialized at some arbitrary time $t \le \mathrm{polylog}(n')$, we can append to $Q''$ a $\lambda$-noisy circuit of depth $t$ operating on $3^t$ qubits initialized at time zero, such that the first qubit of the output state of this circuit can play the role of the qubit initialized at time $t$ in $Q''$. Altogether, we obtain a circuit $Q'$ whose depth is the same as that of $Q''$ but which now operates on at most $3^{\mathrm{polylog}(n')} \cdot n'\,\mathrm{polylog}(n') \le \mathrm{poly}(n)$ qubits.
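The post-processing $A$ is described only as "recursive majority vote". The sketch below illustrates that idea. It is hypothetical and is not the decoder of Theorem 4.A.4. It assumes that each logical bit is read out as $m_r = m^r$ physical bits, arranged as $r$ levels of $m$-fold repetition with $m$ odd. The function names are ours. The example shows why a recursive vote tolerates error patterns that are sparse level by level, even when a flat vote over all bits fails.

```python
from typing import List


def recursive_majority(bits: List[int], m: int) -> int:
    """Decode one logical bit from m**r physical bits by recursive majority vote.

    Hypothetical layout: r levels of m-fold repetition, with m odd so every vote has a winner.
    """
    if m % 2 == 0:
        raise ValueError("m must be odd")
    if not bits:
        raise ValueError("empty readout")
    if len(bits) == 1:
        return bits[0]
    if len(bits) % m != 0:
        raise ValueError("block length must be a power of m")
    size = len(bits) // m
    # Decode each of the m sub-blocks recursively, then take the majority of the m results.
    votes = [recursive_majority(bits[j * size:(j + 1) * size], m) for j in range(m)]
    return 1 if sum(votes) > m // 2 else 0


def decode_string(bits: List[int], n_logical: int, m: int) -> List[int]:
    """Split a readout into n_logical equal-size blocks and decode each block independently."""
    if len(bits) % n_logical != 0:
        raise ValueError("readout length must be a multiple of n_logical")
    size = len(bits) // n_logical
    return [recursive_majority(bits[i * size:(i + 1) * size], m) for i in range(n_logical)]


if __name__ == "__main__":
    # Logical 1 in 3**2 = 9 bits with 5 errors: one whole sub-block flipped,
    # plus one error in each of the other two sub-blocks.
    readout = [0, 0, 0, 0, 1, 1, 1, 0, 1]
    print(recursive_majority(readout, 3))          # 1: recursive vote recovers the bit
    print(int(sum(readout) > len(readout) // 2))   # 0: a flat vote over all 9 bits fails
```

The recursion mirrors the hierarchical notion of $(r,k)$-sparseness: a sub-block may be fully corrupted as long as a majority of its siblings decode correctly.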

4.A.4 Stability against deviation in the robustified Simon's oracle
So far, Lemma 4.A.4 implies that we can realize the first step of Simon's algorithm in an encoded fashion: noisily prepare a state, call it $\rho_1$, which is only slightly deviated from an encoding of the all-plus state. And Lemma 4.A.6 implies that given a state, call it $\rho_2$, which is only slightly deviated from the output of the robustified Simon's oracle, we can simulate the last step of Simon's algorithm in the presence of noise. In order to apply Lemma 4.A.6, however, it remains to verify that when we go from $\rho_1$ to $\rho_2$ by invoking the robustified oracle in the second step of Simon's algorithm, the sparsity of the deviations in $\rho_1$ is preserved. We show this in Lemma 4.A.7 below.
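First, recall the unitary $U_{f_s}$ from Eq. (9). Note that by construction, $f_s$ only depends nontrivially on the first $m_r n' < n$ bits of its input. Write an input as $x = (x_1, x_2)$ with $x_1 \in \{0,1\}^{m_r n'}$ and $x_2 \in \{0,1\}^{n - m_r n'}$. Defining $f^*_s : \{0,1\}^{m_r n'} \to \{0,1\}^{m_r n'}$ to be the function $f_s$ restricted to the first $m_r n'$ bits, we can rewrite (9) as
$$U_{f_s} |x_1\rangle |x_2\rangle |y\rangle = |x_1\rangle |x_2\rangle |y \oplus f^*_s(x_1)\rangle.$$
We see from the above equation that $U_{f_s}$ acts trivially on the $|x_2\rangle$ part of the input state. So let us define $U_{f^*_s}$ as the restriction of $U_{f_s}$ to its $|x_1\rangle$ and $|y\rangle$ subsystems, namely
$$U_{f^*_s} |x_1\rangle |y\rangle = |x_1\rangle |y \oplus f^*_s(x_1)\rangle.$$
With the above notation for the oracle, we can now prove the following lemma, which shows that the classical function $f^*_s$ preserves the deviation metric. We note that, on the other hand, the ordinary (non-robustified) Simon's function does not have the same property.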
Proof. Consider the set $S \subset \{0,1\}^{m_r n'}$ defined by Here $t|_A$ means the restriction of $t$ to its bits in locations $a_1, a_2, \ldots, a_{|A|}$. Because $\rho$ is $(r, k)$-deviated from $\rho_0$, there exists an $(r, k)$-sparse subset $A$ of qubits such that $\rho_{A^c} = (\rho_0)_{A^c}$. Decomposing the Hilbert space $\mathcal{H}_{\mathrm{main},1}$ as $\mathcal{H}_{\mathrm{main},1} \simeq \mathcal{H}_A \otimes \mathcal{H}_{A^c}$, then using our notation above we can write for some constants $c_{t,t'} \in \mathbb{C}$. We further know, by hypothesis, that we can write . If all we know about $\rho$ is that it is $(r, k)$-deviated from $\rho_0$, then a priori we only have the more general decomposition has trace zero and is positive semi-definite (as it is obtained by left- and right-multiplying $\rho$ by the projector $\sum_{t \notin S_{A^c}} |t\rangle\langle t|$), we must have $Q^A_{t,t'} = 0$ for all $t, t' \notin S_{A^c}$. On the other hand, let $|1\rangle, \ldots, |2^{|A|}\rangle$ be an orthonormal basis for $\mathcal{H}_A$. Since $\rho$ is positive semi-definite, its $2 \times 2$ principal minors are nonnegative. This implies that for any $1 \le i, j \le 2^{|A|}$, any $t \in S_{A^c}$, and any $t' \notin S_{A^c}$, we have
$$(Q^A_{t,t})_{ii}\,(Q^A_{t',t'})_{jj} \ge (Q^A_{t,t'})_{ij}\,(Q^A_{t',t})_{ji}.$$
Because we have already shown that $Q^A_{t',t'} = 0$, the left-hand side is zero. On the other hand, because $\rho$ is Hermitian, $(Q^A_{t',t})_{ji} = \overline{(Q^A_{t,t'})_{ij}}$. So the right-hand side is the squared magnitude $|(Q^A_{t,t'})_{ij}|^2 \ge 0$. We conclude that $Q^A_{t,t'} = 0$ for all $t, t'$ for which at least one of $t, t'$ is not in $S_{A^c}$. In other words, $\rho$ actually has the decomposition From the definition of $\sigma_0$, we have We can then use the fact that $\sigma$ is $(r, k)$-deviated from $\sigma_0$ to show that This is because the definition of $A^{(r)}_0$ ensures that it contains all bitstrings that can be generated by taking any bitstring in $B^{(r)}_0$ and flipping the bits in an $(r, d)$-sparse set of bits. We are now ready to study the effect under the oracle Recall that $A$ is $(r, k)$-sparse. Since $k \le d$, on account of (7), (8), and the definition of $(r, k)$-sparseness, we have that $f^*_s(ut)$ only depends on $t$ (and likewise $f^*_s(u't')$ only depends on $t'$). So, in a slight abuse of notation, we write $f^*_s(ut) = f^*_s(t)$ so that
(11) becomes We similarly write $f^0_s(ut) = f^0_s(t)$, and define $a_{t,t'} \triangleq \langle f^0_s(t') | f^0_s(t)\rangle$. Consider the following two cases, which cover all possible values of $a_{t,t'}$.
1. $a_{t,t'} = 0$: In this case, $f^0_s(t')$ and $f^0_s(t)$ have at least one bit which differs. Say their $j$-th bits differ, i.e. $[f^0_s(t)]_j \ne [f^0_s(t')]_j$. Let $[y]_{i,\ldots,j}$ for $i \le j$ denote the bitstring $(y_i, y_{i+1}, \ldots, y_j)$, with similar notation for $y'$. For the two bitstrings $[y]_{jm_r+1,\ldots,(j+1)m_r}, [y']_{jm_r+1,\ldots,(j+1)m_r} \in A$.

Using Lemma 4.A.3 on the disjointness of
Thus, we have ⟨y
2. $a_{t,t'} = 1$: All bits in $f^0_s(t')$ and $f^0_s(t)$ are the same. Hence, Tracing out the $\mathcal{H}_{\mathrm{anc},1}$ subsystem above gives Since we likewise have we find that (12) and (13) agree upon taking the partial trace of each over $\mathcal{H}_A$. Thus we find that tr , as claimed.
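The key step in the proof above is the following fact: a positive semi-definite matrix with a vanishing diagonal entry has a vanishing row and column. This follows from the $2 \times 2$ principal minor inequality $M_{ii}M_{jj} \ge |M_{ij}|^2$. The pure-Python sanity check below builds positive semi-definite matrices as Gram matrices and checks that inequality numerically, including a case with a zero diagonal entry. The helper names are ours, and the check is an illustration rather than a proof.

```python
import random
from typing import List, Sequence

Matrix = List[List[complex]]


def gram(vectors: Sequence[Sequence[complex]]) -> Matrix:
    """M[i][j] = sum_k v_i[k] * conj(v_j[k]); every Gram matrix is Hermitian and PSD."""
    return [[sum(a * b.conjugate() for a, b in zip(vi, vj)) for vj in vectors] for vi in vectors]


def minor_gap(M: Matrix, i: int, j: int) -> float:
    """2x2 principal minor M_ii M_jj - M_ij M_ji, i.e. M_ii M_jj - |M_ij|^2 for Hermitian M."""
    return (M[i][i] * M[j][j] - M[i][j] * M[j][i]).real


def random_vectors(count: int, width: int, zero: Sequence[int] = (), seed: int = 0):
    """Random complex vectors; indices listed in `zero` get the zero vector (zero diagonal entry)."""
    rng = random.Random(seed)
    return [[0j] * width if k in zero
            else [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(width)]
            for k in range(count)]


if __name__ == "__main__":
    M = gram(random_vectors(6, 3, seed=1))
    print(min(minor_gap(M, i, j) for i in range(6) for j in range(6)) >= -1e-9)
    M0 = gram(random_vectors(6, 3, zero=[2], seed=1))
    # M0[2][2] = 0, so the minor inequality forces row and column 2 to vanish.
    print([abs(M0[2][j]) for j in range(6)])
```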

4.A.5 Proof of super-polynomial separation between NISQ and BPP
We are now ready to complete the proof of the oracle separation between NISQ and BPP.
Proof of Theorem 2.2. We begin by showing hardness for BPP algorithms. Given a BPP algorithm for the robustified Simon's problem mapping $n$ bits to $m_r n'$ bits, we show that we can produce a BPP algorithm for the original Simon's problem on $n'$ bits with the same number of queries.
Because $n' = 2^{\Theta(\log(n)^c)}$ for a constant $0 < c < 1$, and the classical query complexity of Simon's problem over $n'$ bits is $\Omega(2^{n'/2})$ [25], this implies a super-polynomial query complexity lower bound of $\Omega(2^{2^{\Theta(\log(n)^c)}})$. Indeed, by Eqs. (7) and (8), whenever the BPP algorithm makes a query to the robustified Simon's oracle $f_s$, we can simulate that query with at most a single query to the standard Simon's oracle on $n'$ bits, which immediately implies the desired simulation. We now turn our attention to demonstrating a polynomial upper bound in NISQ. Let us decompose our total Hilbert space $\mathcal{H}$ as $\mathcal{H} \simeq \mathcal{H}_{\mathrm{main},1} \otimes \mathcal{H}_{\mathrm{main},2} \otimes \mathcal{H}_{\mathrm{anc},1} \otimes \mathcal{H}_{\mathrm{anc},2}$ where We begin with a state on $\mathcal{H}$ initialized in the all-zero state. By Lemma 4.A.4, we can prepare a state $\rho$ on $\mathcal{H}_{\mathrm{main},1}$, using the ancillas on $\mathcal{H}_{\mathrm{anc},2}$, such that $\rho$ is $(r, d/2)$-deviated from $\rho_0 = V^{\otimes n'} H^{\otimes n'} |0^{n'}\rangle\langle 0^{n'}| H^{\otimes n'} (V^{\otimes n'})^\dagger$. By Lemma 4.A.5, we can prepare a state $\sigma$ on $\mathcal{H}_{\mathrm{anc},1}$, using the ancillas on $\mathcal{H}_{\mathrm{anc},2}$, such that $\sigma$ is $(r, d/2)$-deviated from $\sigma_0 = V^{\otimes n'} |0^{n'}\rangle\langle 0^{n'}| (V^{\otimes n'})^\dagger$. At this point in the algorithm, our qubits on $\mathcal{H}_{\mathrm{main},2}$ are no longer in the all-zero state due to the local noise. We do not care what this state is, and denote it by $\rho^{(2)}$. We proceed by applying our oracle unitary $U_{f_s}$ to $(\rho \otimes \rho^{(2)}) \otimes \sigma$ on $\mathcal{H}_{\mathrm{main},1} \otimes \mathcal{H}_{\mathrm{main},2} \otimes \mathcal{H}_{\mathrm{anc},1}$. Since the oracle unitary acts as the identity on $\mathcal{H}_{\mathrm{main},2}$ by construction, we can equivalently just apply $U_{f^*_s}$ to $\rho \otimes \sigma$ on $\mathcal{H}_{\mathrm{main},1} \otimes \mathcal{H}_{\mathrm{anc},1}$. Doing so and subsequently discarding the $\mathcal{H}_{\mathrm{anc},1}$ register (i.e., tracing out those qubits), we obtain But by Lemma 4.A.7, this state is only $(r, d/2)$-deviated from where $s$ is the hidden string. Applying Hadamards to the encoded qubits of $\rho'$, measuring in the computational basis, and applying classical post-processing via recursive majority vote as per Lemma 4.A.6, we obtain an $n'$-bit string $z_0$ which with probability $1 - O(1/n')$ is sampled from the distribution $D$ defined as follows. If $f_s$ is a 1-to-1 function, then $D$ is the uniform distribution over $n'$-bit strings. If $f_s$ is a 2-to-1 function, then $D$ is the uniform distribution over $n'$-bit strings subject to the constraint $z_0 \cdot s = 0 \pmod 2$.
If we repeat the entire procedure $n'$ times, then with probability $(1 - O(1/n'))^{n'} = \Omega(1)$ we obtain $n'$ such bit strings $z_0, z_1, \ldots, z_{n'-1}$, each sampled from $D$. Call this event $E$. If $E$ happens, then we solve the $n'$ linear equations $z_i \cdot s = 0 \pmod 2$ for $i = 0, 1, \ldots, n'-1$. From the solution we can determine whether $s$ is the all-zero string, meaning $f_s$ is 1-to-1, or some nontrivial string, in which case $f_s$ is 2-to-1. If $E$ does not happen and we have obtained some arbitrary string $s$, we can detect this by querying the classical oracle to compare $f_s(0)$ with $f_s(s)$. So by repeating the entire procedure $O(\log(1/\delta))$ times, with probability at least $1 - \delta$ the event $E$ will happen at least once, and we will be able to determine whether $f_s$ is 1-to-1 or 2-to-1.
Remark 4.A.1. As alluded to at the beginning of this section, the proof above applies verbatim to the stronger noise model where at every layer, every qubit is independently corrupted with probability $\lambda$ by an adversary. As a result, although our definition of NISQ pertains to local depolarizing noise, the oracle separation between BPP and NISQ holds even when the local noise is adversarially chosen.
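The classical post-processing at the end of the proof has two parts: solving $z_i \cdot s = 0 \pmod 2$, then comparing $f_s(0)$ with $f_s(s)$. It can be sketched as below. The sketch makes these assumptions, which are ours:

- We already hold exact samples from $D$, so it conditions on the event $E$.
- The oracle is a hypothetical toy $f(x) = \min(x, x \oplus s)$ on integers.
- The function names are illustrative.

Gaussian elimination over GF(2) recovers the solution space. The classical check then rules out a spurious candidate when $f$ is 1-to-1.

```python
import random
from typing import Callable, Dict, List, Optional


def parity(x: int) -> int:
    """Parity of the set bits of x, so parity(z & s) is the inner product z . s mod 2."""
    return bin(x).count("1") & 1


def sample_orthogonal(s: int, n: int, rng: random.Random) -> int:
    """Uniform z in {0,1}^n with z . s = 0 (mod 2); uniform over all z when s == 0."""
    while True:
        z = rng.getrandbits(n)
        if parity(z & s) == 0:
            return z


def nullspace_gf2(rows: List[int], n: int) -> List[int]:
    """Basis of {s : z . s = 0 (mod 2) for all z in rows}, by Gauss-Jordan elimination over GF(2)."""
    pivots: Dict[int, int] = {}  # pivot column -> reduced row with that bit set
    for r in rows:
        for col, prow in pivots.items():
            if (r >> col) & 1:
                r ^= prow
        if r == 0:
            continue  # linearly dependent on earlier rows
        col = r.bit_length() - 1
        for c in pivots:
            if (pivots[c] >> col) & 1:
                pivots[c] ^= r  # keep column `col` zero in every other pivot row
        pivots[col] = r
    basis = []
    for free in range(n):
        if free in pivots:
            continue
        s = 1 << free
        for col, prow in pivots.items():
            if (prow >> free) & 1:
                s |= 1 << col
        basis.append(s)
    return basis


def simon_decide(samples: List[int], n: int, f: Callable[[int], int]) -> Optional[int]:
    """0 if f is 1-to-1, the hidden s if f is 2-to-1, or None if more samples are needed."""
    basis = nullspace_gf2(samples, n)
    if not basis:
        return 0  # only s = 0 satisfies every equation
    if len(basis) == 1 and f(basis[0]) == f(0):
        return basis[0]  # the classical check f(0) = f(s) confirms the candidate
    return None


def run(s: int, n: int, seed: int = 0, max_samples: int = 1000) -> Optional[int]:
    """Draw samples from D one at a time until simon_decide reaches a verdict."""
    rng = random.Random(seed)

    def f(x: int) -> int:
        # Hypothetical toy oracle: 2-to-1 with period s when s != 0, the identity when s == 0.
        return min(x, x ^ s)

    samples: List[int] = []
    for _ in range(max_samples):
        samples.append(sample_orthogonal(s, n, rng))
        verdict = simon_decide(samples, n, f)
        if verdict is not None:
            return verdict
    return None
```

When $f$ is 1-to-1, a one-dimensional solution space can still appear after $n'-1$ independent samples. The comparison of $f(0)$ with $f(s)$ is what prevents that candidate from being reported.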

4.B NISQ vs. BQP
In this section we show an oracle separation between NISQ and BQP via a simple "lifting" of Simon's problem. In fact, we will actually be able to separate NISQ and $\mathrm{BPP}^{\mathrm{QNC}^0}$ relative to this oracle. We begin by describing the modification of Simon's problem we will consider. For $n \in \mathbb{N}$, given a function $f : \{0,1\}^n \to \{0,1\}^n$, we define the lift of $f$ to be the function $\bar f : \{0,1\}^{2n} \to \{0,1\}^n$ given by Given the lifted function $\bar f$, we will abuse notation and let $O_f$ denote both the classical oracle given by evaluating $\bar f$ and the quantum oracle It is not hard to see that in the absence of depolarizing noise, a minor modification of Simon's algorithm, which can be implemented in $\mathrm{BPP}^{\mathrm{QNC}^0}$, still works under this lifting. In contrast, for NISQ algorithms, we show the following.
Theorem 4.B.1. Any $\mathrm{NISQ}_\lambda$ algorithm must have query complexity at least $\exp(\Omega(\lambda n))$ if it satisfies the following: given oracle access to $O_f$ for any lift of a function $f : \{0,1\}^n \to \{0,1\}^n$ which is either 2-to-1 or 1-to-1, it determines whether $f$ is 2-to-1 or 1-to-1 with constant advantage. Thus, relative to oracles $O$ of this form, $\mathrm{NISQ}^O \subsetneq \mathrm{BQP}^O$.
Our lifting operation is reminiscent of the shuffling Simon's problem introduced in [10] to give an oracle separation between $\mathrm{BPP}^{\mathrm{QNC}^0_d}$ and $\mathrm{BPP}^{\mathrm{QNC}^0_{2d+1}}$. As we show in Supplementary Note 8, the shuffling Simon's problem can also be used to separate NISQ from bounded-depth noiseless quantum computation. For instance, this implies the existence of an oracle relative to which $\mathrm{NISQ} \cup \mathrm{BPP}^{\mathrm{QNC}} \subsetneq \mathrm{BQP}$.
We now proceed to the proof of Theorem 4.B.1. We begin by recording the following basic fact about local depolarizing noise, whose proof we defer to Appendix 9.B.
Lemma 4.B.1. Given $n' \in \mathbb{N}$, let $\Omega$ denote some subset of $\{0,1\}^{n'}$, and let $\Pi$ denote the projection onto the span of $\{|x\rangle\}_{x \in \Omega}$. Then for any $\lambda \in [0,1]$ and any $n'$-qubit state $|\psi\rangle$, where the supremum is over probability distributions $D$ over $\{0,1\}^{n'}$, and $\tilde a$ is the random string obtained by sampling $a \sim D$ and flipping each of the bits of $a$ independently with probability $\lambda/2$.
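Note that the probability on the right-hand side of (14) is exponentially small when $\Omega \subset \{0,1\}^{2n}$ is the set of strings $x$ for which $x_{n+1} = \cdots = x_{2n} = 0$. For $\tilde a$ to land in $\Omega$, each of the last $n$ bits of $\tilde a$ must be $0$. For any $a$, this happens with probability at most $(1 - \lambda/2)^n \le e^{-\lambda n/2}$. We will now use this to show that the distribution over measurement outcomes from running a noisy quantum circuit that has query access to either $O_f$ or the identity oracle $\mathrm{Id}$ gives very little information about which oracle the circuit has access to.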
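The following is a quick numerical check of the bound above. It assumes only the bit-flip model of Lemma 4.B.1, in which each bit flips independently with probability $\lambda/2$. The helper names are ours. Only the last $n$ bits matter for membership in $\Omega$, so the code works with that tail directly.

```python
import math
import random
from typing import Sequence


def prob_tail_survives(a_tail: Sequence[int], lam: float) -> float:
    """Exact Pr[every bit of a_tail is 0 after flipping each bit independently w.p. lam/2]."""
    p = 1.0
    for bit in a_tail:
        p *= (1 - lam / 2) if bit == 0 else lam / 2
    return p


def estimate_tail_survives(a_tail: Sequence[int], lam: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        noisy = [bit ^ (rng.random() < lam / 2) for bit in a_tail]
        hits += all(b == 0 for b in noisy)
    return hits / trials


if __name__ == "__main__":
    n, lam = 20, 0.2
    exact = prob_tail_survives([0] * n, lam)  # best case: a already lies in Omega
    print(exact, math.exp(-lam * n / 2), estimate_tail_survives([0] * n, lam, 50000))
```

The all-zero tail is the worst case for the bound: any 1 in the tail replaces a factor $1-\lambda/2$ by the smaller factor $\lambda/2$.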
Lemma 4.B.2. Let $A$ be any $\lambda$-noisy quantum circuit which makes $N$ oracle queries. If $p_f$ (respectively $p_{\mathrm{id}}$) is the distribution over the random string $s$ output by the circuit when the oracle is $O_f$ (respectively the identity oracle $\mathrm{Id}$), then $d_{\mathrm{TV}}(p_f, p_{\mathrm{id}}) \le N \exp(-\Omega(\lambda n))$.
Proof. Let $n'$ denote the number of qubits on which $A$ operates. For convenience, we denote by $\tilde O_f$ the channel given by pre-composing $O_f$ with $\mathcal{D}_\lambda^{\otimes 3n}$. We will show that for all $n'$-qubit pure states $\sigma$, the outputs on $\sigma$ of $\tilde O_f$ and of the noisy identity oracle $\mathcal{D}_\lambda^{\otimes 3n}$ are close in trace distance. This lets us apply Lemma 3.B.1. Let $\Omega \subset \{0,1\}^{2n}$ be the set of all strings whose last $n$ bits are $0$. For any $a \in \{0,1\}^{2n}$, let $\tilde a$ be obtained by flipping each of the bits of $a$ independently with probability $\lambda/2$. Then $\Pr[\tilde a \in \Omega] \le (1 - \lambda/2)^n$, with equality when $a \in \Omega$. In particular, $\sup_D \Pr_{a \sim D}[\tilde a \in \Omega] \le (1 - \lambda/2)^n \le e^{-\lambda n/2}$. By Jensen's inequality, we can bound
Proof of Theorem 4.B.1. Let $\mathcal{T}$ be the learning tree corresponding to a $\mathrm{NISQ}_\lambda$ algorithm that makes at most $N$ classical or quantum oracle queries to $O_f$, as in Definition 3.A.1. Suppose we replace every noisy quantum circuit $A$ in the tree with a noisy quantum circuit $A'$ that makes queries to the identity oracle instead of to $O_f$. By Lemma 3.A.1 and Lemma 4.B.2, the new distribution over the leaves of $\mathcal{T}$ is at most $N^2 \exp(-\Omega(\lambda n))$-far in total variation from the original distribution $p_{O_f}$. For $N = \exp(o(\lambda n))$, this quantity is $o(1)$. For convenience, denote this new distribution by $p'_f$. To apply Lemma 3.A.2, we wish to bound $d_{\mathrm{TV}}(\mathbb{E}_{f\,\text{2-to-1}}[p'_f], \mathbb{E}_{f\,\text{1-to-1}}[p'_f])$. But note that because the quantum circuits $A'$ in the new learning tree are independent of the underlying function $f$, the learning tree is simply implementing a randomized classical query algorithm. We can thus think of $p'_f$ as a mixture over distributions $p'^r_f$, each corresponding to some fixing of the internal randomness $r$ of the algorithm (here the coefficients of the mixture are independent of $f$). It thus suffices to bound $\sup_r d_{\mathrm{TV}}(\mathbb{E}_{f\,\text{2-to-1}}[p'^r_f], \mathbb{E}_{f\,\text{1-to-1}}[p'^r_f])$. Henceforth fix any $r$. The rest of the argument follows the standard proof of the classical lower bound for Simon's problem. The algorithm queries the classical oracle at some deterministic sequence of inputs $x_1, \ldots, x_a$, which we may assume without loss of generality are distinct and lie in $\Omega$. For any $y_1, \ldots, y_a$ which are all distinct, $(x_1, y_1), \ldots, (x_a, y_a)$ and $r$ determine some leaf node $\ell$ of the tree. The probability of this leaf node under $\mathbb{E}_f$ For any $y_1, \ldots, y_a$ for which there is a collision, the probability of the corresponding leaf node under $\mathbb{E}_{f\,\text{1-to-1}}[p'^r_f]$ is clearly $0$. We conclude that the total variation distance between these two mixtures is upper bounded by the probability that there is a collision among $f(x_1), \ldots, f(x_a)$ for a random 2-to-1 function $f$. The latter is at most $\binom{a}{2}/(2^n - 1) \le a^2/2^n$, so for $a \ll 2^{n/2}$, this quantity is $o(1)$. As $\min(\exp(\Omega(\lambda n)), 2^{n/2}) = \exp(\Omega(\lambda n))$, the theorem follows by Lemma 3.A.2.
Remark 4.B.1. The reader may observe that apart from the classical lower bound for Simon's problem, our proof of the lower bound in Theorem 4.B.1 makes very little use of the fact that $f$ is either a 2-to-1 or 1-to-1 function. In fact, the above argument shows a more general statement. Consider any search problem over a family of Boolean functions. The query complexity of any NISQ algorithm for the lifted problem is essentially given by the classical query complexity of the original problem.
Our proof is composed of two parts. The first and main part is to establish a stronger result, namely a query complexity lower bound even for noiseless bounded-depth quantum computation. The second part is to verify, essentially via an argument of [22], that this implies a lower bound for NISQ.

5.A Proof preliminaries
We will still work with the tree formalism from Definition 3.A.1. Because our focus now is on noiseless bounded-depth quantum computation, the definition simplifies somewhat:
Definition 5.A.1 (Tree representation for bounded-depth algorithms). Given oracle $O$, a noiseless depth-$T$ algorithm with access to $O$ can be associated with a pair $(\mathcal{T}, \mathcal{A})$ as follows. The learning tree $\mathcal{T}$ is a rooted tree, where each node in the tree encodes the transcript of all quantum circuit results the algorithm has seen so far. The tree satisfies the following properties:
• Each node $u$ is associated with a value $p_O(u)$ corresponding to the probability that the transcript observed so far is given by the path from the root $r$ to $u$. In this way, $\mathcal{T}$ naturally induces a distribution over its leaves; we abuse notation and denote this distribution by $p_O$. For the root $r$, $p_O(r) = 1$.
• At each non-leaf node $u$, we run a noiseless depth-$T$ quantum circuit $A$ with access to $O$. The children $v$ of $u$ are indexed by the possible outcomes $s \in \{0,1\}^{n'}$ that could be obtained as a result. We refer to the edge between $u$ and $v$ as $(u, A, s)$. We denote by $|\phi_O(A)\rangle$ the output state of the circuit. The probability of traversing $(u, A, s)$ from node $u$ to child $v$ is then $|\langle s|\phi_O(A)\rangle|^2$, and hence $p_O(v) = p_O(u)\,|\langle s|\phi_O(A)\rangle|^2$.
• If the total number of queries to $O$ made along any root-to-leaf path is at most $N$, we say that the query complexity of the algorithm is at most $N$.
$\mathcal{A}$ is any classical algorithm that takes as input a transcript corresponding to any leaf node $\ell$ and attempts to determine the underlying oracle or predict some property thereof.
We note that one minor distinction between the tree formalism for bounded-depth computation versus the one for NISQ is that we do not consider classical queries to O.The reason is that because the quantum circuits A used at every node in Definition 5.A.1 are noiseless, we can simulate noiseless classical query access to O using quantum query access.
In the sequel, for $O = O_i$ with $i \in \{0, \ldots, d\}$, we will refer to $p_O$ and $\phi_O$ in Definition 5.A.1 as $p_i$ and $\phi_i$.
We will make use of the following well-known bound:
Lemma 5.A.1 (Eq. (7) in [26]). For any quantum circuit $A$ for unstructured search that makes $T$ oracle queries, if $|\phi_i(A)\rangle$ denotes the output state when the underlying oracle is $O_i$, then

5.B Lower bound against bounded-depth computation
We now use these tools to prove the following query complexity lower bound. This will be the main component in our proof of Theorem 5.1.

Theorem 5.B.1.
There is an absolute constant $c > 0$ for which the following holds. Let $d, T \in \mathbb{N}$ with $T \le d$. Then no noiseless quantum algorithm of depth $T$ with query complexity at most $cd/T$ can, given oracle access to $O_i$ for any $i \in [d]$, output $i$ with probability at least $2/3$.

5.B.1 Likelihood ratio calculations
To prove Theorem 5.B.1, our goal is to bound $d_{\mathrm{TV}}(p_0, \mathbb{E}_{i \sim [d]}[p_i])$. To that end, we will analyze the likelihood ratio between these two distributions. Given any path $z = ((u_1, A_1, s_1), (u_2, A_2, s_2), \ldots, (u_T, A_T, s_T))$ in the tree, the likelihood ratio $L_i(z)$ between traversing that path when the underlying oracle is $O_i$ versus when it is $O_0$ is given by
$$L_i(z) = \prod_{t=1}^{T} \frac{|\langle s_t|\phi_i(A_t)\rangle|^2}{|\langle s_t|\phi_0(A_t)\rangle|^2} \triangleq \prod_{t=1}^{T} \bigl(1 + Y_i(u_t, A_t, s_t)\bigr). \tag{15}$$
Our goal is to show that, with respect to the distribution over paths $z$ when the underlying oracle is $O_0$, the quantity $\mathbb{E}_{i \sim [d]}[L_i(z)]$ is with high probability not too small. This readily implies the desired upper bound on $d_{\mathrm{TV}}(p_0, \mathbb{E}_{i \sim [d]}[p_i])$, as the latter satisfies
$$d_{\mathrm{TV}}\bigl(p_0, \mathbb{E}_{i \sim [d]}[p_i]\bigr) = \mathbb{E}_{z \sim p_0}\Bigl[\max\bigl(0,\, 1 - \mathbb{E}_{i \sim [d]}[L_i(z)]\bigr)\Bigr].$$
We will show this by arguing that the successive partial products in (15) give rise to a multiplicative sub-martingale with suitably bounded increments $1 + Y_i(u_t, A_t, s_t)$. This lets us apply off-the-shelf martingale concentration inequalities. Key to bounding these increments are the following moment bounds on $Y_i(u, A, s)$, as a random variable in $s$.
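The total variation identity above can be checked on toy distributions. In the sketch below, the names are ours and the distributions are made up for illustration. `tv_via_likelihood` evaluates $\mathbb{E}_{z \sim p_0}[\max(0, 1 - \mathbb{E}_i L_i(z))]$ with $L_i(z) = p_i(z)/p_0(z)$. `mean_likelihood` shows that a single ratio has mean $1$ under $p_0$ whenever $p_i$ is supported inside the support of $p_0$. This illustrates why the increments $1 + Y_i$ have mean one when $|\langle s|\phi_0(A)\rangle|^2 > 0$ for every $s$.

```python
from typing import Dict, Hashable, List

Dist = Dict[Hashable, float]


def tv(p: Dist, q: Dist) -> float:
    """Total variation distance between two finite distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mixture(dists: List[Dist]) -> Dist:
    """Uniform mixture E_{i ~ [d]} p_i."""
    out: Dist = {}
    for d in dists:
        for k, v in d.items():
            out[k] = out.get(k, 0.0) + v / len(dists)
    return out


def tv_via_likelihood(p0: Dist, alternatives: List[Dist]) -> float:
    """E_{z ~ p0}[max(0, 1 - E_i L_i(z))] with L_i(z) = p_i(z) / p0(z)."""
    total = 0.0
    for z, w in p0.items():
        if w > 0:
            avg_ratio = sum(pi.get(z, 0.0) for pi in alternatives) / (len(alternatives) * w)
            total += w * max(0.0, 1.0 - avg_ratio)
    return total


def mean_likelihood(p0: Dist, pi: Dist) -> float:
    """E_{z ~ p0}[p_i(z) / p0(z)], i.e. the p_i-mass on the support of p0."""
    return sum(w * pi.get(z, 0.0) / w for z, w in p0.items() if w > 0)


if __name__ == "__main__":
    p0 = {"a": 0.5, "b": 0.3, "c": 0.2}
    p1 = {"a": 0.2, "b": 0.3, "c": 0.5}
    p2 = {"a": 0.6, "b": 0.1, "c": 0.1, "d": 0.2}
    print(tv(p0, mixture([p1, p2])), tv_via_likelihood(p0, [p1, p2]))
```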
Lemma 5.B.1.For any edge (u, A, s) in the tree and any i ∈ [d], we have that Here the expectations are with respect to the distribution over measurement outcomes when the underlying oracle is O 0 .
Proof.Recalling that we observe outcome s under this distribution with probability | ⟨ϕ 0 (A)|s⟩ | 2 , we have which establishes the first statement.For the second statement, we have

5.B.2 Good and balanced paths
In this section we define two conditions on paths $z$ of the tree (Definitions 5.B.1 and 5.B.2). For paths satisfying both conditions, we can show that Eq. (15) is, with high probability, not too small. We then prove that a random path in $\mathcal{T}$ under $p_0$ will satisfy these conditions with high probability. First, let $0 < \varepsilon < 1/2$ be a small enough constant that we will set later. Given a path $z = ((u_1, A_1, s_1), \ldots, (u_n, A_n, s_n))$ in the learning tree, if each $A_t$ queries the oracle $T_t$ times, define the potential $\tau(z) \triangleq \sum_{t=1}^{n} T_t^2$. Note that if every circuit in the tree queries the oracle $T$ times, then $\tau(z) = nT^2$. It may be helpful for the reader to focus on this case and think of $\tau(z)$ as $nT^2$ in the sequel. In general, Hölder's inequality implies the following basic fact:
Lemma 5.B.2. If $(\mathcal{T}, \mathcal{A})$ specifies a noiseless quantum algorithm of depth $T$, then its query complexity is at least $\frac{1}{T}\max_z \tau(z)$.
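The inequality behind Lemma 5.B.2 takes one line. We assume, as we read the setup, that a depth-$T$ circuit makes at most $T$ queries, so $T_t \le T$ for every $t$. This is Hölder's inequality with exponents $(\infty, 1)$:

```latex
\tau(z) = \sum_{t=1}^{n} T_t^2
  \le \Bigl(\max_{t} T_t\Bigr) \sum_{t=1}^{n} T_t
  \le T \sum_{t=1}^{n} T_t
\qquad\Longrightarrow\qquad
\sum_{t=1}^{n} T_t \ge \frac{\tau(z)}{T}.
```

Taking the maximum over root-to-leaf paths $z$ gives the lemma, since the query complexity is the maximum of $\sum_t T_t$ over such paths.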
We are now ready to define our two conditions on paths: Definition 5.B.1.We say that an edge (u, A, s) is i-good if We say that a path z is i-good if all of its constituent edges are i-good.
Let $I_{\mathrm{good}}(z)$ (respectively $I_{\mathrm{bad}}(z)$) denote the set of indices $i \in [d]$ for which $z$ is $i$-good (respectively not $i$-good).
Definition 5.B.2. We say a path $z$ is $i$-balanced if its constituent edges $((u_1, A_1, s_1), \ldots, (u_n, A_n, s_n))$ Let $I_{\mathrm{bal}}(z)$ (respectively $I_{\mathrm{imbal}}(z)$) denote the set of indices $i \in [d]$ for which $z$ is $i$-balanced (respectively not $i$-balanced).
Intuitively, goodness of a path ensures that as one goes from one partial product of (15) to the next, we never experience any significant multiplicative decreases.On the other hand, as we will see in the proof of Lemma 5.B.6 below, balancedness of a path ensures that the "variance" of these multiplicative changes is also not too large.These two conditions are important for applying off-the-shelf martingale concentration bounds like Freedman's inequality, which is governed by how large the changes can be in the worst case and how large they can be on average.We now argue that most paths are i-good and i-balanced for most i ∈ [d] (Lemmas 5.B.4 and 5.B.5 below).To do this, we will need the following consequence of Lemma 5.B.1.
Lemma 5.B.3.For any edge (u, A, s) in the tree and any i ∈ Here the probability is with respect to the distribution over measurement outcomes when the underlying oracle is O 0 .Proof.Suppose that ∥|ϕ i (A)⟩ − |ϕ 0 (A)⟩∥ 2 < ε/2 (otherwise the claim vacuously holds).Then by Chebyshev's inequality, We now combine Lemma 5.A.
where the penultimate step follows by Lemma 5.B.Proof.Note that where the second step follows by Markov's inequality and the last step follows by Lemma 5.A.

5.B.3 Martingale concentration
The paths that are both i-balanced and i-good are the ones over which the log-likelihood log L i (z) will concentrate as a random variable in z ∼ p 0 .As alluded to in the discussion above, being i-balanced ensures bounded variance, while being i-good ensures bounded differences.Together these yield the following Bernstein-type concentration which is the main technical ingredient in the proof of Theorem 5.B.1: Lemma 5.B.6.For any i ∈ [d], consider the following sequence of random variables , where the randomness is with respect to p 0 .For any η, ν > 0, we have Proof.Note that for any (u, A, s), where in the first step we used the fact that log(1 + z) ≥ z − z 2 for z ≥ −1/2, in the third step we used Lemma 5.B.1, in the fourth step we used Cauchy-Schwarz, and in the last step we used the last part of Lemma 5.B.1 and Lemma 5.B.3.Additionally, where in the first step we used the fact that log(1 + z) 2 ≤ 2z 2 for z ≥ −1/2, and in the last step we used the second part of Lemma 5.B.1.For every t, define the random variable where the randomness is with respect to p 0 .By Eq. ( 16), {Z t } t is a submartingale difference sequence satisfying Z t ≥ log(1 − ε) ≥ −2ε given 0 < ε < 1/2, so the lemma follows by Freedman's inequality and Eq. ( 17).
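The proof above uses three scalar inequalities. The check below tests them on a grid. It is a numerical check rather than a proof, and the grid range is our choice.

```python
import math


def check_inequalities(num_points: int = 20001) -> bool:
    """Grid check (not a proof) of the scalar facts used in the proof of Lemma 5.B.6."""
    for k in range(num_points):
        z = -0.5 + 10.0 * k / (num_points - 1)  # z in [-1/2, 9.5]
        lz = math.log1p(z)
        if lz < z - z * z - 1e-12:  # log(1 + z) >= z - z^2
            return False
        if lz * lz > 2 * z * z + 1e-12:  # log(1 + z)^2 <= 2 z^2
            return False
    for k in range(1, num_points):
        eps = 0.5 * k / num_points  # eps in (0, 1/2)
        if math.log1p(-eps) < -2 * eps:  # log(1 - eps) >= -2 eps
            return False
    return True


if __name__ == "__main__":
    print(check_inequalities())
    # The restriction z >= -1/2 matters: at z = -0.9 the first inequality fails.
    print(math.log1p(-0.9) < -0.9 - 0.81)
```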
We will take η to be a small constant to be tuned later, and In light of Lemma and/or the path is not i-balanced.Let I conc (z) (respectively I anticonc (z)) denote the set of indices i ∈ [d] for which z is i-concentrated (respectively not i-concentrated).

5.B.4 Completing the argument
We assume the query complexity of the algorithm is at most cd/T for a small constant 0 < c < 1 to be tuned later.Note that by Lemma 5.B.2, this implies that By union bound, with probability at least 0.97 over z ∼ p 0 , there are at least 0.97(N/G) indices i such that z is i-good, i-balanced, and i-concentrated.For an index i that satisfies all three conditions for a path z, we have where in the second, third, and fourth steps we used that z is i-good, i-concentrated, and i-balanced respectively.Hence, L i (z) ≥ 9/10 with probability at least 0. By Lemma 3.A.2, we conclude that provided the query complexity of the algorithm is at most cd/T , it cannot distinguish between the oracle O 0 and the oracle O i for a random choice of i with constant advantage.

5.C From bounded-depth to noisy computation
We now show how to extract from Theorem 5.B.1 a lower bound against NISQ. We begin with the following basic lemma, a proof of which we include in Appendix 9.A for completeness. It quantifies the amount of information that is lost from running many layers of noisy computation:
Lemma 5.C.1 (Lemma 8 from [22]). Let $A$ be a $\lambda$-noisy depth-$T$ quantum circuit on $n$ qubits with output state $\rho$. Then $I(\rho) \triangleq n - S(\rho) \le (1 - \lambda)^T \cdot n$, where $S(\cdot)$ denotes von Neumann entropy.
We will also use the following standard operational characterization of $I(\rho)$:
Lemma 5.C.2 (See e.g. Lemma 2 from [22]). Given any $n$-qubit state $\rho$ and any POVM, the distributions $p, q$ induced by respectively measuring $\rho$ and $I/2^n$ with the POVM satisfy $\mathrm{KL}(p \| q) \le I(\rho)$.
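Lemmas 5.C.1 and 5.C.2 can be combined with Pinsker's inequality $d_{\mathrm{TV}}(p,q) \le \sqrt{\mathrm{KL}(p\|q)/2}$ (with KL in nats). Together they bound how far any measurement distribution of a deep noisy circuit can be from uniform. The Pinsker step is our addition. The sketch assumes $S(\cdot)$ and $I(\cdot)$ are measured in bits, which is why a factor $\ln 2$ appears. The function names are hypothetical.

```python
import math


def tv_to_uniform_bound(n: int, depth: int, lam: float) -> float:
    """Bound on d_TV between the outcome distribution of a lam-noisy depth-T circuit's output
    (under any POVM) and that of I/2^n.

    Chains I(rho) <= (1 - lam)^T n (Lemma 5.C.1), KL(p||q) <= I(rho) (Lemma 5.C.2), and Pinsker's
    inequality d_TV <= sqrt(KL / 2) with KL in nats. Assumes I(rho) is in bits, hence the ln 2.
    """
    info_bits = (1 - lam) ** depth * n
    return min(1.0, math.sqrt(info_bits * math.log(2) / 2))


def depth_for_tv(n: int, lam: float, target: float) -> int:
    """Smallest depth T with tv_to_uniform_bound(n, T, lam) <= target, for 0 < lam, target < 1."""
    if not (0 < lam < 1 and 0 < target < 1):
        raise ValueError("need 0 < lam < 1 and 0 < target < 1")
    # Solve (1 - lam)^T * n * ln 2 / 2 <= target^2 for T.
    needed = math.log(n * math.log(2) / (2 * target ** 2)) / -math.log(1 - lam)
    return max(0, math.ceil(needed))


if __name__ == "__main__":
    T = depth_for_tv(100, 0.01, 0.1)
    print(T, tv_to_uniform_bound(100, T, 0.01))
```

The required depth grows only like $\log(n)/\lambda$, which is the sense in which noisy circuits of super-logarithmic depth retain little usable information.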
6.B Proof of Theorem 6.1
With these notations at hand, we are ready to prove Theorem 6.1. Suppose we (noisily) initialize in the state $|+\rangle^{\otimes n} \otimes |-\rangle$. We call the last qubit the ancilla qubit. There are two cases, depending on whether or not the ancilla qubit is corrupted prior to the oracle application in the algorithm. We handle these two cases separately in the following two lemmas.
Lemma 6.B.1.If the ancilla qubit is not corrupted prior to application of the oracle in the Bernstein-Vazirani algorithm, then for every i ∈ [n], with probability (1 − λ) 4 the ith output bit is given by s i and otherwise is given by a possibly incorrect bit.
Proof.When we reach the step of the Bernstein-Vazirani algorithm where we apply the oracle, suppose that k qubits out of the first n have been corrupted in the first two layers of noise (see Figure 2).Further suppose that the qubits are located at a 1 , ..., a k , where {a 1 , ..., a k } ⊂ {1, ..., n}.
Picking some permutation $\pi \in S_n$ such that $\pi(i) = a_i$ for $i = 1, \ldots, k$, we can write the state of the system as for some coefficients $\{\beta_{z,z'}\}$ satisfying $\sum_z \beta_{z,z} = 1$. The rest of the protocol proceeds as follows. We apply $O_f$ to get Next we trace out the ancilla to find Following this we apply a third layer of local noise, apply $H^{\otimes n}$, and then apply a fourth layer of local noise. Suppose that this procedure corrupts any number of the already corrupted qubits at positions $a_1, \ldots, a_k$, as well as $\ell$ qubits at positions different from $a_1, \ldots, a_k$. Suppose that the positions of these $\ell$ qubits are $a_{k+1}, \ldots, a_{k+\ell}$, where $\{a_1, \ldots, a_k, a_{k+1}, \ldots, a_{k+\ell}\} \subset \{1, \ldots, n\}$. Since the local noise and $H^{\otimes n}$ act qubit-wise, if we only want to track the uncorrupted qubits we can proceed as follows: at the outset we trace out the $k + \ell$ qubits which are to be corrupted, and then we apply $H^{\otimes(n-k-\ell)}$ to the residual qubits. We implement this procedure presently. Defining $\sigma \in S_n$ by $\sigma(i) = a_i$ for $i = 1, \ldots, k+\ell$, we can rewrite (20) as Finally, measuring in the computational basis, the probability of measuring All in all, we have seen the following. If we perform the usual Bernstein-Vazirani algorithm, we obtain the hidden bit string $s$ but with a fraction of its bits corrupted. The corrupted bits correspond precisely to the qubits that were corrupted by one of the four layers of local noise.
We can now conclude the proof of Theorem 6.1.
Proof of Theorem 6.1. By Lemma 6.B.1, for each bit $i$ of the $n$ output bits, the probability of obtaining $s_i$ is at least $(1-\lambda)^6$. This happens if two events both occur:
- the ancilla qubit is not corrupted in either of the two layers of noise prior to the application of the oracle (probability $(1-\lambda)^2$);
- the former of the two possible events in Lemma 6.B.1 happens (probability $(1-\lambda)^4$).

Let $f(\lambda) \triangleq (1-\lambda)^6$ and note that for $\lambda \le 1/10$, $f(\lambda) > 1/2$.
Let $X_i$ be a random variable which equals zero if $s_i$ is obtained correctly with our procedure, and one otherwise. Let $Y_i$ be the average of $M$ i.i.d. copies of $X_i$. Since $\mathbb{E}[X_i] \le 1 - f(\lambda)$, the Chernoff-Hoeffding bound tells us that $\Pr[Y_i \ge 1/2] \le \exp(-2M(f(\lambda) - 1/2)^2)$. This is an upper bound on the failure probability at the $i$th site when we repeat the Bernstein-Vazirani algorithm $M$ times and determine $s_i$ by majority vote. The probability that we fail for at least one of the $n$ sites is upper bounded by $n \exp(-2M(f(\lambda) - 1/2)^2)$. So if we want the right-hand side to be at most $\delta$, then we can pick some $M$ such that Since $(1-2f(\lambda))^2$ is an alternating series in $\lambda$, we can lower bound it by its first two terms as Then (21) can be written in a slightly simplified form as for $\lambda < 1/24$, as claimed.
Remark 6.B.1. As with the proof of Theorem 2.2, our proof applies to a stronger noise model where at every layer, every qubit is independently corrupted with probability $\lambda$ by a (potentially adversarially chosen) single-qubit channel. While our definition of NISQ focuses on local depolarizing noise, the quantum advantage in solving the Bernstein-Vazirani problem holds even when the local noise is adversarially chosen.
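The repetition-and-majority-vote argument can be simulated in a toy model. The sketch makes the following assumptions:

- Each run returns $s_i$ with probability exactly $f(\lambda) = (1-\lambda)^6$, and otherwise the flipped bit. This is a worst case for majority vote.
- $M$ is chosen from the Hoeffding bound $\Pr[Y_i \ge 1/2] \le \exp(-2M(f(\lambda)-1/2)^2)$ together with a union bound over the $n$ sites, so its constant need not match the one in (21).
- The function names are ours.

```python
import math
import random
from typing import List


def f(lam: float) -> float:
    """Per-bit success probability (1 - lam)^6 used in the proof of Theorem 6.1."""
    return (1 - lam) ** 6


def repetitions_needed(n: int, delta: float, lam: float) -> int:
    """Smallest M with n * exp(-2 M (f(lam) - 1/2)^2) <= delta (Hoeffding plus a union bound)."""
    gap = f(lam) - 0.5
    if gap <= 0:
        raise ValueError("need f(lam) > 1/2, e.g. lam <= 1/10")
    return math.ceil(math.log(n / delta) / (2 * gap ** 2))


def noisy_run(s: List[int], p_correct: float, rng: random.Random) -> List[int]:
    """Toy model of one noisy Bernstein-Vazirani run: each bit is right w.p. p_correct, else flipped."""
    return [b if rng.random() < p_correct else 1 - b for b in s]


def majority_decode(s: List[int], lam: float, M: int, rng: random.Random) -> List[int]:
    """Repeat the toy run M times and take a per-site majority vote (ties resolve to 0)."""
    ones = [0] * len(s)
    for _ in range(M):
        for i, b in enumerate(noisy_run(s, f(lam), rng)):
            ones[i] += b
    return [1 if 2 * c > M else 0 for c in ones]


if __name__ == "__main__":
    rng = random.Random(3)
    s = [rng.randint(0, 1) for _ in range(32)]
    M = repetitions_needed(len(s), 1e-6, 0.05)
    print(M, majority_decode(s, 0.05, M, rng) == s)
```

The number of repetitions grows only logarithmically in $n/\delta$, which is the source of the logarithmic query count claimed for NISQ.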
For the purposes of showing a lower bound in this oracle model, we will assume that in between oracle queries, the algorithm can perform arbitrary noiseless quantum computation, and at the end it can perform a noiseless measurement in the computational basis. The only noise that gets applied is local depolarizing noise after any call to $O_\rho$. Note that this is a stronger model of computation than a $\lambda$-noisy quantum algorithm or a $\mathrm{NISQ}_\lambda$ algorithm, which only makes the lower bound we show stronger. Furthermore, as there is no notion of a classical oracle in this setting, it is not necessary to work with the tree formalism of Definition 3.A.1.

7.B Exponential lower bound
We are now ready to state the main oracle separation of this section: On the other hand, there is an algorithm in $\mathrm{BPP}^{\mathrm{QNC}^0}$ that, given $O(n)$ oracle queries, can determine both $s$ and $P$ with high probability. In fact, even if $\rho$ is an arbitrary state, there is an algorithm in $\mathrm{BPP}^{\mathrm{QNC}^0}$ that, given $O(n)$ oracle queries, can estimate $|\mathrm{tr}(P\rho)|$ for all $n$-qubit Pauli operators $P$ to within small constant error with high probability.
We remark that [34] gives an upper bound of $(1-\lambda)^{-\Theta(n)}$ for $\mathrm{NISQ}_\lambda$ algorithms, so our exponential lower bound is qualitatively best possible. For the proof of Theorem 7.B.1, we consider the output state of any noisy quantum circuit. We argue that for any Pauli operator $P$, two output states are exponentially close unless the circuit makes exponentially many oracle queries. The first is the output when the unknown state $\rho$ is $\frac{1}{2^n}(I+P)$ (likewise when it is $\frac{1}{2^n}(I-P)$). The second is the output when $\rho$ is $\frac{I}{2^n}$: by $O_s$. We would like to show that for all $n'$-qubit states $\sigma$, $\|(\mathcal{D}_\lambda^{\otimes n'} \circ O_1)(\sigma) - (\mathcal{D}_\lambda^{\otimes n'} \circ O_0)(\sigma)\|_{\mathrm{tr}}$ is small so that we can apply Lemma 3.B.1. But note that for any By taking the channels $\mathcal{E}$ in Lemma 3.B.1 to be $\mathcal{D}_\lambda^{\otimes n'} \circ O_1$ and $\mathcal{D}_\lambda^{\otimes n'} \circ O_0$, we conclude the following. Consider any algorithm that alternates oracle queries, each followed by depolarizing noise, with arbitrary noiseless quantum computation, and finally measures in the computational basis. No such algorithm can distinguish whether the underlying oracle is $O_0$ or $O_1$ with probability at least $2/3$ unless it makes $\Omega((1-\lambda)^{-|P|})$ queries.
As this model is a stronger model of computation than NISQ_λ (note that there is no notion of a classical oracle in this setting), this implies the claimed lower bound for NISQ_λ.
Proof of Theorem 7.B.1. The second part of the theorem was shown in [28, Theorem 2]. The NISQ_λ part of the theorem follows by applying Lemma 7.B.1 with P ranging over all n-qubit Pauli operators that act nontrivially on all qubits, so that |P| = n. The lemma implies that regardless of which P defines the oracle, the algorithm is unable to distinguish whether it has access to O_ρ or O_{I/2^n} without making Ω((1 − λ)^{−n}) queries.
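The source of the (1 − λ)^{−|P|} scaling can be checked numerically: the two candidate states (I + P)/2^n and I/2^n differ by P/2^n, and local depolarizing noise multiplies each non-identity tensor factor of P by (1 − λ). The pure-Python sketch below assumes the single-qubit convention D_λ(A) = (1 − λ)A + λ tr(A) I/2 (our assumption about the paper's D_λ), applies it to every qubit of the difference, and confirms that the result is (1 − λ)^{|P|} P/2^n, whose trace norm is (1 − λ)^{|P|} because P has eigenvalues ±1. It only looks at the state register, not the full n′-qubit circuit, and the helper names are hypothetical; turning this per-query bound into a query count goes through Lemma 3.B.1, which is not reproduced here.

```python
from functools import reduce

PAULI = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def pauli_string(label):
    """Matrix of a Pauli string like 'XIZ'; label[0] acts on qubit 0 (most significant bit)."""
    return reduce(kron, (PAULI[c] for c in label))


def depolarize_qubit(A, k, n, lam):
    """Apply D_lam(A) = (1 - lam) A + lam tr(A) I/2 to qubit k of an n-qubit operator."""
    dim = 2 ** n
    bit = 1 << (n - 1 - k)
    out = [[(1 - lam) * A[i][j] for j in range(dim)] for i in range(dim)]
    for i in range(dim):
        for j in range(dim):
            if (i & bit) == (j & bit):  # (I/2) on qubit k is diagonal there
                i0, j0 = i & ~bit, j & ~bit
                partial_trace = A[i0][j0] + A[i0 | bit][j0 | bit]
                out[i][j] += lam * partial_trace / 2
    return out


def damping_factor(label, lam):
    """Trace norm of D_lam^{(x)n}[rho_1 - rho_0], rho_1 = (I+P)/2^n, rho_0 = I/2^n."""
    n = len(label)
    dim = 2 ** n
    P = pauli_string(label)
    diff = [[P[i][j] / dim for j in range(dim)] for i in range(dim)]  # rho_1 - rho_0
    noisy = diff
    for k in range(n):
        noisy = depolarize_qubit(noisy, k, n, lam)
    # The noisy difference should equal c * P / 2^n for one scalar c.
    ratios = [noisy[i][j] / diff[i][j]
              for i in range(dim) for j in range(dim) if diff[i][j] != 0]
    c = ratios[0]
    assert all(abs(r - c) < 1e-12 for r in ratios)
    assert all(abs(noisy[i][j]) < 1e-12
               for i in range(dim) for j in range(dim) if diff[i][j] == 0)
    # P has eigenvalues +-1, so ||c P / 2^n||_tr = |c|.
    return abs(c)
```

For instance, `damping_factor("XIY", 0.1)` gives about 0.81 = (1 − 0.1)^2: the identity factor is left untouched, which is why only |P| and not n enters the bound.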
Supplementary Note 8 - Separating NISQ and BPP^{QNC} from BQP
While the relatively simple oracle from Section 4.B allowed us to separate NISQ and BQP, it is insufficient even to separate BPP^{QNC^0} from BQP, as Simon's algorithm can also be implemented in the former. Here we show how to simultaneously separate NISQ and BPP^{QNC} from BQP, at the cost of relying on a more involved oracle construction, namely the shuffling Simon's problem from [10].
We first recall the setup of the shuffling Simon's problem.
We now describe the quantum oracle associated to SHUF(f, d).
We are now ready to describe the shuffling Simon's problem.
We would like to apply Lemma 4.B.1 to Ω consisting of strings which are separated in Hamming distance. To that end, we prove the following expansion result for such subsets of the hypercube.
Lemma 8..2. Let 3/10 ≤ q < 1/2. Let Ω ⊂ {0, 1}^n be a subset such that all strings are at least qn apart in Hamming distance. Then for any distribution D over {0, 1}^n and any 0 < λ ≤ 1, if ã is obtained by sampling a ∼ D and independently flipping each bit of a with probability λ/2, then Pr[ã ∈ Ω] ≤ exp(−Ω(λn)).
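A small numerical sanity check of Lemma 8..2 for one concrete Ω (an illustration, not a proof): take Ω = {0^n, 1^n}, whose two strings are n ≥ qn apart. Since Pr[ã ∈ Ω] is linear in D, it suffices to maximize over point masses a ∈ {0, 1}^n. For this Ω the maximum is (1 − λ/2)^n + (λ/2)^n, and both terms are at most exp(−λn/2) when 0 < λ ≤ 1, consistent with the exp(−Ω(λn)) bound. The function names below are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where bit tuples a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(omega):
    """Smallest pairwise Hamming distance within omega."""
    return min(hamming(a, b) for i, a in enumerate(omega) for b in omega[i + 1:])


def hit_probability(a, omega, lam):
    """Pr[a~ in omega] when each bit of a flips independently with probability lam/2."""
    flip = lam / 2
    n = len(a)
    return sum(flip ** hamming(a, w) * (1 - flip) ** (n - hamming(a, w)) for w in omega)


def worst_case_hit_probability(omega, n, lam):
    """Max over point masses a; this upper-bounds Pr[a~ in omega] for every distribution D."""
    return max(hit_probability(a, omega, lam) for a in product((0, 1), repeat=n))
```

With λ = 1 every bit is uniformly random, so for n = 10 each of the two strings is hit with probability 2^{−10} and the worst case is exactly 2/1024.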
Note that if q ≥ 0.28, this expression is at most exp(−Cn) for an absolute constant C > 0. We see that O_{f,d} ⊗ Id maps |ϕ_j⟩⟨ϕ_j| to E_{F∼D(f,d)} Σ_{x,x′,y,y′} v_{j,x,y} v_{j,x′,y} …

Definition 2.B.1 (Classical oracle O). A classical oracle O is a function from {0, 1}^n to {0, 1}^m for some n, m ∈ N. The (n + m)-qubit unitary U_O corresponding to the classical oracle O is given by U_O |x⟩|y⟩ = |x⟩|y ⊕ O(x)⟩ for all x ∈ {0, 1}^n, y ∈ {0, 1}^m.
Definition 2.B.2 (Classical algorithm with access to O). A classical algorithm M^O with access to O is a probabilistic Turing machine M that can query O by choosing an n-bit input x and obtaining the m-bit output O(x).
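The unitary U_O of Definition 2.B.1 simply permutes computational basis states, so it is unitary, and since y ⊕ O(x) ⊕ O(x) = y it is its own inverse. A minimal pure-Python sketch, encoding |x⟩|y⟩ as the integer index x·2^m + y (our convention) and using a hypothetical parity oracle as the example O:

```python
def oracle_permutation(oracle, n, m):
    """perm[idx(x, y)] = idx(x, y XOR O(x)), with idx(x, y) = x * 2**m + y."""
    mask = (1 << m) - 1
    perm = []
    for idx in range(2 ** (n + m)):
        x, y = idx >> m, idx & mask
        out = oracle(x)
        assert 0 <= out <= mask, "O(x) must be an m-bit string"
        perm.append((x << m) | (y ^ out))
    return perm


def apply_oracle(perm, amplitudes):
    """Apply U_O to a state vector by moving each amplitude to its image."""
    out = [0j] * len(amplitudes)
    for src, dst in enumerate(perm):
        out[dst] += amplitudes[src]
    return out


def parity(x):
    """Hypothetical example oracle O: {0,1}^n -> {0,1}, the parity of x."""
    return bin(x).count("1") % 2
```

Querying the uniform superposition over x with the output register in |0⟩ moves the amplitude of each |x⟩|0⟩ to |x⟩|O(x)⟩, which is the standard way a quantum algorithm accesses a classical oracle.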

Definition 2.B.3 (Quantum algorithm with access to O). A quantum algorithm Q^O with access to O is a uniform family of quantum circuits {U_n}_n, where U_n is an n′-qubit quantum circuit given by

Definition 2.B.5 (NISQ algorithm with access to O). Let λ ∈ [0, 1]. A NISQ_λ algorithm A^O_λ = (M^{NQC_λ})^O with access to O is a probabilistic Turing machine M that has the ability to classically query O by choosing the n-bit input x to obtain the m-bit output O(x), as well as the ability to query NQC^O_λ by choosing n′ and {U_k}_{k=1,...,T} to obtain a random n′-bit string s. The runtime of A^O_λ is given by the sum of the classical runtime of M, the number of classical queries to O, and the sum of the times to query NQC^O_λ.
With this definition in hand, we can extend the usual notions of relativized complexity to NISQ:
Definition 2.B.6 (Relativized NISQ). Given a sequence of oracles O : {0, 1}^n → {0, 1}^{m(n)} parametrized by n ∈ N, a language L ⊆ {0, 1}^* is in NISQ^O if there exists a constant λ > 0 and a NISQ_λ algorithm A^O_λ with access to O that decides L in polynomial time.

Supplementary Figure 1: Illustration of the tree representation for NISQ algorithms. (a) At every memory state u of the classical computer/algorithm, it could either make a noisy circuit query or a classical query. (b) The tree representation with a mix of noisy circuit queries and classical queries.

Lemma 4.A.2 (Structure of A …) … either exists a unique choice of b_1, …, b_{n′} or does not exist any choice of b_1 …

Lemma 4.A.7 (Stability of the robustified classical oracle). Consider a Hilbert space H which decomposes into subsystems as … ≤ 4 exp(−λn/4). By taking the channels E_1 and E_2 in Lemma 3.B.1 to be O_f ⊗ D_λ^{⊗(n′−3n)} and D_λ^{⊗n′}, we obtain the desired bound on d_TV(p_f, p_id).
We are ready to conclude the proof of Theorem 4.B.1. Roughly, because Lemma 4.B.2 tells us that running a noisy quantum circuit with oracle access gives negligible information about the underlying oracle, a NISQ algorithm with access to O_f is no more powerful than a classical algorithm with access to the corresponding classical oracle. The lower bound of Theorem 4.B.1 then follows from the classical lower bound for Simon's problem.
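For intuition about the classical bound invoked here: the natural classical strategy for Simon's problem is to query random inputs until two distinct ones collide, at which point their XOR reveals the hidden period s. By the birthday paradox this takes on the order of 2^{n/2} queries, matching the known Θ(2^{n/2}) randomized classical query complexity of Simon's problem. The sketch below uses a hypothetical 2-to-1 oracle f(x) = min(x, x ⊕ s) with s ≠ 0; it illustrates the classical setting only and is not the oracle O_f of Theorem 4.B.1.

```python
import random


def simon_function(s):
    """Hypothetical Simon oracle: f(x) = f(x ^ s), and f is otherwise injective."""
    if s == 0:
        raise ValueError("Simon's promise requires a nonzero period s")
    return lambda x: min(x, x ^ s)


def classical_collision_search(f, n, rng):
    """Query uniformly random inputs until two distinct ones collide; return (s, #queries)."""
    seen = {}  # output value -> an input that produced it
    queries = 0
    while True:
        x = rng.randrange(2 ** n)
        queries += 1
        y = f(x)
        if y in seen and seen[y] != x:
            return seen[y] ^ x, queries  # f(x) = f(x') with x != x' forces x' = x ^ s
        seen[y] = x
```

Averaging the returned query count over many runs for increasing n should show growth roughly proportional to 2^{n/2}, in contrast to the O(n) queries of Simon's quantum algorithm.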

The lemma follows by the fact that |I_bal(z)| + |I_imbal(z)| = d and by Markov's inequality.

Lemma 7.B.1. Let λ > 0 and let P ∈ {I, X, Y, Z}^{⊗n}. Any NISQ_λ algorithm that has oracle access to the state oracle O_ρ for either ρ = (I + P)/2^n or ρ = I/2^n and can distinguish which oracle it has access to with probability at least 2/3 must make Ω((1 − λ)^{−|P|}) queries, where |P| denotes the number of non-identity components in P.
Proof. Suppose the circuit operates on n′ qubits. For convenience, for s ∈ {0, 1} denote O_{(I+s·P)/2^n} by O_s.
F)] ⊗ |w_{j,x,y}⟩⟨w_{j,x′,y′}|. Noting that the only dependence on f in the above expression is in the definition of f_d, we see that for any f, f′ : {0, 1}^n → {0, 1}^n, (O_{f,d} ⊗ Id − O_{f′,d} ⊗ Id)[|ϕ_j⟩⟨ϕ_j|] = E_{F∼D(f,d)} …
5.B.6, we introduce one more property of paths z which, together with goodness (Definition 5.B.1) and balancedness (Definition 5.B.2), ensures that E_{i∼[d]}[L_i(z)] is not too small.