Benchmarks of nonclassicality for qubit arrays

We present a set of practical benchmarks for N-qubit arrays that economically test the fidelity of achieving multi-qubit nonclassicality. The benchmarks are measurable correlators similar to two-qubit Bell correlators, and are derived from a particular set of geometric structures from the N-qubit Pauli group. These structures prove the Greenberger–Horne–Zeilinger (GHZ) theorem, while the derived correlators witness genuine N-partite entanglement and establish a tight lower bound on the fidelity of particular stabilizer state preparations. The correlators need only M ≤ N + 1 distinct measurement settings, as opposed to the $2^{2N} − 1$ settings that would normally be required to tomographically verify their associated stabilizer states. We optimize the measurements of these correlators for a physical array of qubits that can be nearest-neighbor-coupled using a circuit of controlled-Z gates with constant gate depth to form N-qubit linear cluster states. We numerically simulate the provided circuits for a realistic scenario with N = 3, …, 9 qubits, using ranges of T1 energy relaxation times, T2 dephasing times, and controlled-Z gate fidelities consistent with Google’s 9-qubit superconducting chip. The simulations verify the tightness of the fidelity bounds and witness nonclassicality for all nine qubits, while also showing ample room for improvement in chip performance.


INTRODUCTION
As hardware is developed to implement quantum circuits on increasing numbers of qubits, it will be valuable to have economical benchmarks of fully quantum behavior. From the outset of quantum computing it has been clear that the advantage of a quantum computer lies somewhere in its ability to readily perform tasks that are physically challenging or impossible for a classical system. Therefore, ideal hardware benchmarks should certify the ability of the hardware to generate such nonclassical behavior. Indeed, a wide variety of benchmarking techniques have been developed recently, 1,2 including gate-fidelity benchmarks using randomized gate sequences that avoid state-preparation and measurement errors, and state-preparation benchmarks that certify particular states while avoiding the exponential scaling of state tomography.
Despite these recent achievements, quantifying the specific nonclassical resources that lead to quantum computational advantage has remained an elusive goal. 3 Several earlier proposals for suitable measures like entanglement, [4][5][6][7] Bell-nonlocality, [8][9][10][11][12][13] or quantum discord and its variations, [14][15][16] proved to be insufficient on their own due to the discovery of algorithmic counterexamples. [17][18][19][20][21] Recent advances suggest a strong connection between quantum advantage and contextuality, [22][23][24][25][26] which is a general structural feature of quantum mechanics that subsumes nonlocality. The most pragmatic metric of nonclassical behavior in quantum devices, however, has been the violation of two-qubit Bell inequalities, or similar entanglement witnesses that can apply to few-qubit subsets of a multi-qubit device. 27 In this article, we provide a set of practical hardware benchmarks that naturally generalize two-qubit Bell inequality tests to N qubits, based on the Greenberger-Horne-Zeilinger (GHZ) theorem. As with Bell inequalities, our nonclassicality benchmarks use the experimental violation of a classical bound to quantify the nonclassical behavior of the circuit. Beyond quantifying nonclassicality via a bound-violation, these benchmarks also provide tight lower bounds on the fidelities with which particular stabilizer subspaces have been prepared, and thus witness genuine N-qubit entanglement for all states that lie within the targeted subspaces. These benchmarks are optimized for testing controllable qubit arrays with nearest-neighbor coupling. As such, we provide efficient circuits for preparing cluster states that maximally violate these benchmarks with controlled-Z entangling gates, using a constant gate depth of 4 (up to hardware-specific decompositions of the controlled-Z gate [28][29][30][31][32][33] ). 
Though our benchmarks efficiently verify genuine N-qubit entanglement using cluster states, many of the benchmarks may be applied to other stabilizer states and we expect similar benchmarks to exist for all stabilizer states.
The benchmarks we present here generalize earlier work that was experimentally tested with N = 3, 4 photons, 34 where they were compared to previously proposed state-dependent methods for efficiently verifying the fidelity of particular entangled N-qubit preparations. 35,36 These prior methods have already been used to verify multi-qubit entanglement in state-of-the-art experiments with 12 qubits 37 and 18 qubits, 38 since the exponential scaling required for traditional state tomography is increasingly prohibitive. Notably, for large N our GHZ-based benchmarks produce a tighter preparation-fidelity bound than these existing methods and similarly produce entanglement witnesses with better scaling.

Nonclassicality benchmarks
Our benchmarks consist of measurable correlators that are compared to derived upper bounds; violation of these bounds characterizes nonclassicality. Each such benchmark corresponds to a specific prepare-and-measure circuit on N qubits with M ≤ N + 1 different measurement settings. The M observables form a structure called an ID (also called an identity product 39 ), which is a set of mutually commuting N-qubit Pauli operators whose overall product is the N-qubit identity, up to a sign. We express an ID as an M × N table of single-qubit Pauli operators and the identity {Z, X, Y, I}, labeled $O_{ij}$ with i = 1, …, M and j = 1, …, N. We also define the shortened label $O_i = \bigotimes_{j=1}^{N} O_{ij}$ to indicate the N-qubit observable obtained as the product of the ith row of an ID. We omit tensor product symbols for compactness.
To obtain the Bell inequality for each ID, 34 we choose a particular eigenspace Π represented by a projector of rank $2^{N-M+1}$, which is specified by the set of N-qubit Pauli observables $\{O_i\}$ that form the M rows of the ID (see Figs 1 and 2), and a specific choice of their respective eigenvalues $\{\lambda_i\}$. We then define the correlator observable for this chosen eigenspace,
$$\alpha = \sum_{i=1}^{M} \lambda_i O_i,$$
such that its expectation value in a state ρ has an upper bound of $\beta_{QM} = M$,
$$\langle\alpha\rangle = \mathrm{Tr}(\rho\,\alpha) \le \beta_{QM} = M,$$
saturated by the chosen eigenspace ρ = Π. For example, if we prepare the joint eigenstate of an ID with M = 4 rows, then 〈α〉 = Tr(Πα) = 4, since each term in the sum becomes +1.
In the spirit of Bell, 9,10 if one tries to explain the observed correlation by choosing a complete set of local hidden variables $v_{Zj}, v_{Xj}, v_{Yj} \in \{+1, -1\}$ that predict the outcomes of the single-qubit Pauli measurements, then at least one of the terms in the correlator sum becomes −1, resulting in a smaller upper bound,
$$\langle\alpha\rangle_{LHVT} \le \beta_{LHVT} = M - 2. \tag{3}$$
Experimental violation of this bound thus indicates nonclassicality in the form of a violation of local realism. Though the locality loophole is always open for neighboring qubits on a chip, this violation is still a useful witness for nonclassical states prepared by the chip, much like for Bell inequalities or Bell-Leggett-Garg inequalities. 40 The derivation of this bound is reviewed in the "Methods" section.
As an independent result, maximizing the expectation value of the correlator over all biseparable quantum states in the N-qubit Hilbert space produces the upper bound,
$$\langle\alpha\rangle_{bisep} \le \beta_{bisep} = M - 2, \tag{4}$$
which happens to coincide with the bound for local hidden variable theories. Experimental violation of the bound thus also witnesses genuine N-partite entanglement. In the "Methods" section, we provide the proof that the joint eigenspaces of the IDs in this article are maximally entangled, as well as the derivation of this bound. In light of the convenient fact that $\beta_{bisep} = \beta_{LHVT}$, we define the nonclassicality benchmark score for a given physical N-qubit device,
$$B = \frac{\langle\alpha\rangle_{exp} - M + 2}{2},$$
as the experimentally determined value, such that B ≤ 0 fails to witness either entanglement or the violation of local realism, while 0 < B ≤ 1 witnesses nonlocal N-partite-entangled states. The nonclassicality benchmark score thus serves as a metric of uniquely quantum behavior, with B = 1 indicating maximum nonclassicality that saturates the correlator bound. Each N-qubit ID provides a benchmark corresponding to a distinct nonclassical eigenspace of an N-qubit physical device, and thus the hierarchy of IDs presented in Fig. 1 provides a corresponding hierarchy of benchmarks.

Figure caption (beginning not recovered): … Fig. 3 for the corresponding N, which lies in the specific eigenspace stabilized by the ID. Combining the rows of each ID with the appropriate eigenvalue defines a correlator observable $\alpha = \sum_i \lambda_i O_i$, from which we obtain the experimental benchmark score $B = (\langle\alpha\rangle_{exp} - M + 2)/2$ that witnesses nonlocal N-partite entanglement when 0 < B ≤ 1, as well as the lower bound $F \ge F_{ID} = (B + 1)/2$ on the fidelity F for the state preparation to lie within the indicated eigenspace of the ID.
Lower bounding the fidelity
The correlator also serves to bound the fidelity from below, 34
$$F \ge F_{ID} = \frac{\langle\alpha\rangle_{exp} - M + 4}{4}, \tag{6}$$
where $F = \mathrm{Tr}(\rho_{exp}\Pi) \in [0, 1]$ is the fidelity with which the experimentally prepared state $\rho_{exp}$ lies within the eigenspace Π stabilized by the chosen ID. We provide a general derivation of this bound in the "Methods" section. Importantly, in the limit $\langle\alpha\rangle_{exp} \to \beta_{QM} = M$, we have $F_{ID} \to 1$, and thus as the fidelity of the preparation is improved, this lower bound obviates the need for full tomography of these preparations.
Taken together, the inequalities of Eqs. (3), (4) and (6) provide a practical and efficient characterization of the prepared N-qubit state, as well as a robust benchmark of its nonclassical behavior, using only M ≤ N + 1 measurement settings. We present minimal benchmark IDs in Fig. 1 for N = 3, …, 9, and detail minimal IDs up to N = 33 qubits in Supplementary Figs 1 through 5. These minimal IDs saturate the conjectured bound N ≤ (M − 2)(M − 1)/2. We also present a family of maximal benchmark IDs in Fig. 2 for all N ≥ 10 that saturate the bound M − 1 ≤ N.
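As an illustration of how the benchmark score and fidelity bound follow from measured data, the following minimal Python sketch evaluates the correlator, B = (〈α〉 − M + 2)/2, and F_ID = (B + 1)/2. The function name `benchmark_score` and the four expectation values are hypothetical and chosen only for illustration; they are not data from the simulations reported here.

```python
def benchmark_score(expectations, eigenvalues):
    """Return (alpha, B, F_ID) from measured <O_i> and target eigenvalues lambda_i."""
    if len(expectations) != len(eigenvalues):
        raise ValueError("need one expectation value per row of the ID")
    if any(lam not in (+1, -1) for lam in eigenvalues):
        raise ValueError("target eigenvalues must be +1 or -1")
    m = len(eigenvalues)  # number of measurement settings M
    # Correlator alpha = sum_i lambda_i <O_i>
    alpha = sum(lam * value for lam, value in zip(eigenvalues, expectations))
    score = (alpha - m + 2) / 2        # B > 0 violates the bound M - 2
    fidelity_bound = (score + 1) / 2   # F >= F_ID = (alpha - M + 4) / 4
    return alpha, score, fidelity_bound


# Hypothetical measured expectation values for an ID with M = 4 rows.
measured = [0.90, -0.85, -0.88, -0.80]
target = [+1, -1, -1, -1]
alpha, B, F_ID = benchmark_score(measured, target)
print(alpha, B, F_ID)  # roughly 3.43, 0.715, 0.8575
```

A positive B for these illustrative numbers would witness nonclassicality, and F_ID would certify a fidelity of at least about 0.86 with the target eigenspace.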

Benchmark circuits and simulation
The IDs in this article have been specially chosen so that the prepare-and-measure circuit for each measurement setting requires a gate depth of 4 on any array of N physical qubits with only nearest-neighbor controlled-Z couplings, making them a scalable and uniform set of benchmarks for implementations of this type. Figure 3 shows the circuits for N = 4, 5, from which the generalization to all N should be straightforward. In general, each circuit prepares an N-qubit linear cluster state, which is contained within the maximally entangled subspace of the corresponding ID.
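The ideal output of such a circuit can be checked with a small state-vector sketch in Python. It assumes the textbook construction of a linear cluster state (a Hadamard on every qubit followed by a controlled-Z on each neighboring pair), which may differ from the circuits of Fig. 3 by local gates; the helper names are hypothetical. The sketch confirms that the standard generators Z X Z have expectation value +1, and that the ZYYZI setting quoted in the Fig. 3 caption has expectation value +1 for the ideal five-qubit state under this assumed construction.

```python
import itertools
import math


def linear_cluster_state(n):
    """Amplitudes of the ideal n-qubit linear cluster state, keyed by bit tuples."""
    norm = 1 / math.sqrt(2 ** n)
    state = {}
    for bits in itertools.product((0, 1), repeat=n):
        # The Hadamards give a uniform superposition; each nearest-neighbor
        # controlled-Z flips the sign when both of its qubits are 1.
        parity = sum(bits[j] * bits[j + 1] for j in range(n - 1)) % 2
        state[bits] = norm * (-1) ** parity
    return state


def expectation(state, pauli):
    """<psi|P|psi> for a Pauli string such as 'ZXZI' (one letter per qubit)."""
    total = 0j
    for bits, amp in state.items():
        phase = 1 + 0j
        flipped = list(bits)
        for j, (op, b) in enumerate(zip(pauli, bits)):
            if op == "Z":
                phase *= -1 if b else 1
            elif op == "X":
                flipped[j] = 1 - b
            elif op == "Y":
                # Y|0> = i|1>, Y|1> = -i|0>
                phase *= -1j if b else 1j
                flipped[j] = 1 - b
        total += state[tuple(flipped)].conjugate() * phase * amp
    return total.real


cluster4 = linear_cluster_state(4)
generators = ["XZII", "ZXZI", "IZXZ", "IIZX"]
print([round(expectation(cluster4, g), 9) for g in generators])
print(round(expectation(linear_cluster_state(5), "ZYYZI"), 9))
```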
In order to evaluate the usefulness of these benchmarks in real-world physical implementations, we simulated the performance of these circuits for each of the IDs in Fig. 1. We simulated each circuit over a range of T 1 energy relaxation times, T 2 dephasing times, and angular jitter for the controlled-Z gate rotations, using the ranges given in Figs 4 and 5. We also considered the effect of qubit initialization error. Judging by our simulated data shown in Figs 4 and 5, we expect the nine-qubit Google chip to be able to violate the classicality bounds for all nine qubits. We can see clearly that the qubit initialization error is the dominant source of error as we try to move to larger N. This shows that our benchmarking scheme is immediately relevant, since it appears that similar hardware fidelity would only violate the bound for one or two more qubits (but certainly not all 72 on the Bristlecone chip 41 ) once suitable IDs have been found beyond the nine presented here.

DISCUSSION
The IDs and implementation circuits presented in this article are good benchmark tests for any physical implementation of qubits in a nearest-neighbor-connected array. They work naturally on a chip with more connectivity than this as well. While our simulations targeted a particular recent chip implementation for concreteness, this does not constrain the general usefulness of this protocol for other multi-qubit systems.
Although some other families of IDs with the same properties as those in Figs 1 and 2 are known, 39,42 the minimal IDs, with the largest possible value of N for a given M, are not known in general (see the Supplementary Discussion and Supplementary Figs 1 through 5 for the best known cases). Because of their geometric nature, enumerating all of the representative IDs for given values of N and M is a highly nontrivial problem, related to solving the graph isomorphism problem on N × M colored vertices, and it is thus limited by computational resources. Furthermore, not every ID can be constructed using only nearest-neighbor couplings in linear circuits as in Fig. 3. The increased connectivity of more modern chips, like the Bristlecone chip from Google, should allow the use of more general IDs, although the circuit depth will likely increase by one or two gates.
Each of the IDs presented here also gives rise to a complete proof of the Kochen-Specker (KS) theorem for contextuality, 22,43,44 which can be implemented for any initial state with a few alternative circuits for the different measurement contexts. In general, IDs are the natural building blocks of proofs of the KS theorem in the N-qubit Pauli group. This is a slightly more complicated setup, which could inspire different contextuality-based benchmarks in future work.
Finally, maximally entangled IDs with M < N + 1 give rise to maximally entangled eigenspaces, each of dimension $2^{N-M+1}$, which generalize the codespaces of error-correcting codes, 45,46 and L = N − M + 1 is the number of logical qubits (where N is the number of physical qubits). All N-qubit stabilizer-based error-correcting codes (including the toric code 47 ) belong to the family of IDs, and while all IDs of this type are error-detecting codes, they cannot all be used to diagnose the syndrome of an error in order to correct it. Many of the well-known error-correcting codes generate an ID which proves the GHZ theorem, and all can be used as entanglement witnesses in the manner of this article. 48 Nevertheless, these more general maximally entangled subspaces may be of significant interest for other applications in quantum information processing, which warrants further investigation. One straightforward application for these subspaces is to perform benchmarks that measure the physical qubits as described in this paper, while simultaneously benchmarking the performance of the logical qubits in some additional way. The two tests may be performed simultaneously because any general logical L-qubit state can be prepared for each benchmark, although the circuit is likely to be longer and more complex than Fig. 3, and the performance will be commensurately worse. It is remarkable to note that if the conjectured bound N ≤ (M − 2)(M − 1)/2 can be saturated, then the number of logical qubits is bounded by L ≤ (M − 2)(M − 1)/2 − M + 1, and thus the ratio L/N → 1 in the limit M → ∞.

Proving the GHZ theorem
All of the IDs in Fig. 1 have sign −1, and for each qubit j, the number of entries $O_{ij} = Z$ in the ID is even, as is the number of entries with $O_{ij} = X$ and with $O_{ij} = Y$. These properties indicate that these IDs give rise to proofs of the GHZ theorem, 11 which is a logical version of Bell's nonlocality theorem, 9,10 without any inequalities. To see this, suppose that a joint eigenstate (i.e., any state in a joint eigenspace) of these observables is prepared. This eigenstate has M eigenvalues $\lambda_i$ corresponding to the M observables, and $\prod_{i=1}^{M} \lambda_i = -1$, since the product of these M observables is $-I^{\otimes N}$. Suppose that the N qubits are now mutually space-like separated, and each is subjected to random local Pauli measurements, and label their outcomes $\lambda_{ij}$ when all N local measurement settings happen to correspond to observable i of the ID. The entanglement correlations that are obeyed by this state are $\prod_{j=1}^{N} \lambda_{ij} = \lambda_i$. Putting these relations together we have $\prod_{i=1}^{M} \prod_{j=1}^{N} \lambda_{ij} = -1$.

Fig. 3 Illustrative circuit diagrams for preparing the states for IDs in Fig. 1, with a corresponding to N = 4 in Fig. 1b and b corresponding to N = 5 in Fig. 1c. These two examples generalize to N qubits, and produce linear cluster states. The local measurement settings for each observable $O_{ij}$ in the ID are implemented by the unitary operations $U_{ij}$, assuming detectors that naturally measure the Z basis. This circuit allows the M different settings of an ID to be implemented with different $U_{ij}$ for different observables and qubits. For example, in the five-qubit ID of Fig. 1c the first setting is ZYYZI, meaning that for the first and fourth qubits $U_{11} = U_{14} = I$, for the second and third qubits $U_{12} = U_{13} = e^{i\pi X/4}$, and the fifth qubit is ignored.
Now, in order for a local hidden variable theory (LHVT) to explain these entanglement correlations, each qubit j must carry local hidden variables $v_{Zj}, v_{Xj}, v_{Yj} \in \{+1, -1\}$ which predict the outcomes $\lambda_{ij}$, and are pre-arranged to satisfy the entanglement constraints. However, for such hidden variables we would have $\prod_{i=1}^{M} \prod_{j=1}^{N} \lambda_{ij} = \prod_{j=1}^{N} v_{Zj}^{n_j} v_{Xj}^{m_j} v_{Yj}^{l_j} = +1$, where $n_j$, $m_j$, and $l_j$ count the Z, X, and Y entries for qubit j, since $n_j$, $m_j$, and $l_j$ are all even for the IDs of this article, and thus it is impossible to choose local hidden variables which can satisfy the entanglement correlations of this state. This logical proof without inequalities can be converted into a Bell inequality for use as a benchmark of N-qubit nonlocality, as shown in the main text, by noting that for any complete assignment of local hidden variables $v_{Zj}, v_{Xj}, v_{Yj} \in \{+1, -1\}$ to the ID, at least one of the observables has the wrong eigenvalue.
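The classical bound can be confirmed by brute force for a small case. The sketch below uses the textbook three-qubit Mermin set {XXX, XYY, YXY, YYX} with the GHZ-state eigenvalues (+1, −1, −1, −1) as a stand-in; this is an assumed example and is not necessarily the ID drawn in Fig. 1a. It enumerates every assignment of local hidden variables and reports the largest attainable correlator value, which should equal M − 2 rather than the quantum value M.

```python
import itertools

# Assumed example: the textbook Mermin/GHZ set, not necessarily an ID from Fig. 1.
ROWS = ["XXX", "XYY", "YXY", "YYX"]
EIGENVALUES = [+1, -1, -1, -1]  # eigenvalues of (|000> + |111>)/sqrt(2)


def lhvt_maximum(rows, eigenvalues):
    """Largest correlator value reachable by any local hidden variable assignment."""
    n = len(rows[0])
    best = -len(rows)
    # One predetermined +/-1 outcome per qubit for each of X, Y, Z.
    for values in itertools.product((+1, -1), repeat=3 * n):
        hidden = [dict(zip("XYZ", values[3 * j:3 * j + 3])) for j in range(n)]
        alpha = 0
        for row, lam in zip(rows, eigenvalues):
            outcome = 1
            for j, op in enumerate(row):
                if op != "I":
                    outcome *= hidden[j][op]
            alpha += lam * outcome
        best = max(best, alpha)
    return best


classical_bound = lhvt_maximum(ROWS, EIGENVALUES)
print(classical_bound, len(ROWS) - 2)  # both equal 2 for this example
```

The same function accepts any table of Pauli rows, although the exhaustive search grows as $2^{3N}$ and is only practical for small N.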
In general, proving the GHZ theorem does not prove that nonlocal correlations exist between more than just a single pair of qubits among the N, 49-52 nor does it generally witness genuine N-qubit entanglement. In contrast, the benchmark IDs we present in this article prove the GHZ theorem and are constructed to be N-partite entanglement witnesses. 53

Each experimentally obtained quantity $\langle\alpha_k\rangle$ enables us to put a lower bound on the fidelity with which an experimentally prepared pure state |ψ〉 lies within the eigenspace $\Pi_k$. 34 With no loss of generality, we will henceforth use correlator $\alpha_1$ and the target eigenspace $\Pi_1$. We begin by expanding |ψ〉 in this eigenbasis as
$$|\psi\rangle = \sum_j \Big( a_j |\kappa_{j1}\rangle + \sum_{k=2}^{2^{M-1}} b_{jk} |\kappa_{jk}\rangle \Big), \tag{7}$$
such that $\sum_j \big( |a_j|^2 + \sum_{k=2}^{2^{M-1}} |b_{jk}|^2 \big) = 1$.
Since the expansion is in an eigenbasis of $\alpha_1$, we find
$$\langle\psi|\alpha_1|\psi\rangle = \sum_j \Big( |a_j|^2 \langle\kappa_{j1}|\alpha_1|\kappa_{j1}\rangle + \sum_{k=2}^{2^{M-1}} |b_{jk}|^2 \langle\kappa_{jk}|\alpha_1|\kappa_{jk}\rangle \Big).$$
Note that $\langle\kappa_{j1}|\alpha_1|\kappa_{j1}\rangle = \sum_{k=1}^{M} \lambda_k^2 = M$, since all eigenvalues of $|\kappa_{j1}\rangle$ match those in the correlator $\alpha_1$ by construction, and thus square to 1. However, any other $|\kappa_{jk}\rangle$ does not lie within $\Pi_1$, so is characterized by eigenvalues distinct from those characterizing $\Pi_1$. Moreover, since the product of all eigenvalues for the observables of a given ID is fixed for any eigenstate, only even numbers of eigenvalues can differ from those characterizing $\Pi_1$, which necessarily causes at least two terms of $\langle\kappa_{jk}|\alpha_1|\kappa_{jk}\rangle$ to become −1, resulting in an upper bound of $\langle\kappa_{jk}|\alpha_1|\kappa_{jk}\rangle \le M - 4$ for those eigenstates. Using these two observations we obtain
$$\langle\alpha_1\rangle \le M F + (M - 4)(1 - F) = 4F + M - 4,$$
where $F = \sum_j |a_j|^2$, and we have used $\sum_j \big( |a_j|^2 + \sum_{k=2}^{2^{M-1}} |b_{jk}|^2 \big) = 1$. We can rewrite this relation as
$$F \ge \frac{\langle\alpha_1\rangle - M + 4}{4} \equiv F_{ID}.$$
Noting that the left-hand side of this relation is the fidelity F for the preparation |ψ〉 to lie within the eigenspace $\Pi_1$, the right-hand side $F_{ID}$ gives a lower bound $F \ge F_{ID}$ for the fidelity. For IDs with M = N + 1, the target subspace $\Pi_1$ contains only one eigenvector, so the fidelity F is also a state preparation fidelity for the particular target eigenstate $|\kappa_1\rangle$. For IDs with M < N + 1, the target subspace $\Pi_1$ is degenerate, so the fidelity F is the fidelity for |ψ〉 to lie within that subspace.

Next we generalize the above derivation to the case of mixed states. For a general convex combination of m pure states,
$$\rho = \sum_{l=1}^{m} c_l |\psi_l\rangle\langle\psi_l|,$$
where $\sum_l c_l = 1$, we can expand each $|\psi_l\rangle$ using appropriate eigenbases of the ID as in Eq. (7) and follow the same arguments to obtain
$$\langle\alpha_1\rangle = \mathrm{Tr}(\rho\,\alpha_1) \le \sum_{l=1}^{m} c_l (4 F_l + M - 4) = 4 \sum_{l=1}^{m} c_l F_l + M - 4,$$
where $F_l \equiv \langle\psi_l|\Pi_1|\psi_l\rangle$. We can rewrite this as
$$\sum_{l=1}^{m} c_l F_l \ge \frac{\langle\alpha_1\rangle - M + 4}{4}.$$
As in the pure state case, the left-hand side is the fidelity F for the mixed state ρ to lie within the target subspace $\Pi_1$, while the same expression for the right-hand side $F_{ID}$ places a lower bound on this fidelity.

Witnessing genuine N-partite entanglement
An N-qubit ID provides an entanglement witness if it is maximally entangled. 39,56 Entanglement is usually discussed in reference to the separability of states. However, there is a way to reason about the entanglement of a set of observables directly without reference to states. We define a maximally entangled set of N-qubit observables as one with the property that there exists no bipartition of the N qubits into subsets of R and N − R qubits, such that all of the observables in each subset, $\bigotimes_{k\in[1,R]} O_{ik}$, mutually commute. It follows from this definition that the joint eigenstates of this set are maximally entangled N-qubit stabilizer states. To see this, consider that every stabilizer state (space) of N qubits has a stabilizer group of $b = 2^g$ mutually commuting Pauli observables $\{S_i\}$ and corresponding eigenvalues $\{\lambda_i\}$, and its density operator can be written as
$$\rho = \frac{1}{d} \sum_{i=1}^{b} \lambda_i S_i,$$
where g is the number of independent generators in the set, and $d = 2^N$ is the dimension of the Hilbert space. Note that if g < N, then ρ projects onto a subspace of rank $r = 2^{N-g} > 1$, and that g = M − 1 for a minimal ID, which is just a specific subset of one or more complete stabilizer groups. If a stabilizer state is the tensor product of two smaller stabilizer states on subsystems A and B, it follows that its density operator can be written as
$$\rho = \rho_A \otimes \rho_B,$$
with each factor of the same stabilizer form on its own subsystem, so that the restricted observables $S_i^A = \bigotimes_{k\in A} O_{ik}$ mutually commute by definition. It follows that one can find such a mutually commuting bipartition for any separable state, and therefore if no such bipartition exists, then the set of observables is maximally entangled. All of the IDs presented in this article are maximally entangled in this way, which results in a witness inequality with the same bound as the Bell inequality.

Figure caption (fragment): … 50] μs. c Same ranges as the center plot, but with $P_e = 0$ to show the asymptotic approach $F_{ID} \to F$ as F → 1.
All states within a maximally entangled eigenspace of an ID are maximally entangled, meaning that for all of them, the maximum squared Schmidt coefficient across all bipartitions is 1/2. For such an eigenstate |ψ〉, a standard entanglement witness is $W = 1/2 - |\psi\rangle\langle\psi|$, and an experimental measurement of $\langle W\rangle < 0$ is a witness of genuine N-partite entanglement. 54 Noting that a superposition state $a|\psi\rangle + b|\psi^{\perp}\rangle$ can only violate this bound for $F = |a|^2 > 1/2$, we obtain $F_{ID} \le F \le 1/2$ for all biseparable states. Plugging this into $F_{ID} = (\langle\alpha\rangle_{exp} - M + 4)/4$ yields $\langle\alpha\rangle_{bisep} \le M - 2$, which is Eq. (4).

Numerical simulation details
In the simulation, the state is first degraded by initialization error. That is, ideally the N qubits are prepared in an initial ground state $\bigotimes_{i=1}^{N} |0\rangle$. However, each qubit has an initialization error probability $P_e$. Each gate in Fig. 3 is then applied to the initial state ρ. For the Hadamard gate, it is sufficient to use a $Y_{90}$ rotation, exp(−iYπ/4). We decompose the controlled-Z gate into an implementable $ZZ_{90}$ entangling gate and single-qubit corrections, exp(iπ/4)[exp(iZπ/4) ⊗ exp(iZπ/4)]exp(−iZZπ/4), which reproduces the controlled-Z gate up to an irrelevant global phase. We degraded each gate by T 1 energy relaxation and T 2 dephasing processes for the corresponding gate times Δt. For the energy relaxation time T 1 , the first-order corrections for each individual qubit are accumulated and then applied to ρ. For each qubit, $\Delta\rho_i = \big( a_i \rho a_i^{\dagger} - \tfrac{1}{2}\{\rho, a_i^{\dagger} a_i\} \big)\, \Delta t / T_1^{i}$, where $a_i$ is the lowering operator of the ith qubit tensored with identity for the other qubits, and $\rho \to \rho + \sum_{i=1}^{N} \Delta\rho_i$. This linear-order Lindblad-form update is sufficient, since $\Delta t / T_1^{i} \ll 1$. For the dephasing time T 2 , we directly construct the dephasing matrix D for efficiency and apply gate dephasing using element-wise multiplication (MATLAB syntax .*), as ρ → ρ .* D.
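The controlled-Z decomposition quoted above can be checked directly, since every factor is diagonal in the computational basis. The following short Python sketch (with hypothetical helper names, and independent of the MATLAB code used for the simulations) evaluates the diagonal of exp(iπ/4)[exp(iZπ/4) ⊗ exp(iZπ/4)]exp(−iZZπ/4) and divides out the common phase.

```python
import cmath
import math


def decomposed_phase(z1, z2):
    """Diagonal entry of exp(i pi/4) [exp(i Z pi/4) x exp(i Z pi/4)] exp(-i ZZ pi/4).

    z1 and z2 are the Z eigenvalues of the two qubits (+1 for |0>, -1 for |1>).
    """
    angle = (math.pi / 4) * (1 + z1 + z2 - z1 * z2)
    return cmath.exp(1j * angle)


# Basis order |00>, |01>, |10>, |11>.
diagonal = [decomposed_phase(z1, z2) for z1 in (+1, -1) for z2 in (+1, -1)]
global_phase = diagonal[0]
relative = [entry / global_phase for entry in diagonal]
print([round(r.real) for r in relative])  # diagonal of controlled-Z: [1, 1, 1, -1]
```

The common factor divided out here is a global phase, so it has no effect on the density-matrix update ρ → UρU†.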
For simulating gate infidelity, we assume that the single-qubit gate fidelities are high enough for their errors to be neglected, and so simulate only a range of fidelities for the two-qubit controlled-Z gates. As a crude model for infidelity of a controlled-Z gate, we add a random angular jitter δφ only to the ZZ rotation step, exp[−iZZ(π/2 + δφ)/2], and average over the effect of this jitter using a raised cosine distribution with a width w, dP(δφ) = d(δφ)[1 + cos(πδφ/w)]/(2w), where δφ ∈ [−w, w] has compact angular support. This yields the averaged state update,
$$\rho \to \int e^{-i\zeta_i(\pi/2+\delta\varphi)/2}\, \rho\, e^{i\zeta_i(\pi/2+\delta\varphi)/2}\, dP(\delta\varphi) = \frac{1}{2}\left[ \rho + \zeta_i \rho \zeta_i - i(\zeta_i \rho - \rho \zeta_i) \left( \frac{\sin w}{w} - \frac{\sin w}{2(w+\pi)} - \frac{\sin w}{2(w-\pi)} \right) \right],$$
where $\zeta_i$ is the tensor product of Pauli Z for the two qubits the controlled-Z is acting on, and identity for all of the other qubits. The limit as w → 0 restores the unperturbed gate. This crude error model includes only one possible physical mechanism of infidelity for the controlled-Z gate, but gives an indication of the gate sensitivity to imprecise angular control.
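The scalar factor sin w/w − sin w/[2(w + π)] − sin w/[2(w − π)] in the averaged update is the mean of cos(δφ) over the raised cosine distribution. As a sanity check, the Python sketch below compares this closed form against a simple midpoint-rule integration; the function names and the sample width w = 0.3 rad are illustrative assumptions and are not taken from the simulation ranges.

```python
import math


def jitter_factor(w):
    """Closed-form average of cos(dphi) over the raised cosine distribution."""
    s = math.sin(w)
    return s / w - s / (2 * (w + math.pi)) - s / (2 * (w - math.pi))


def jitter_factor_numeric(w, steps=20000):
    """Midpoint-rule estimate of the same average, for comparison."""
    h = 2 * w / steps
    total = 0.0
    for k in range(steps):
        x = -w + (k + 0.5) * h
        density = (1 + math.cos(math.pi * x / w)) / (2 * w)
        total += math.cos(x) * density * h
    return total


w = 0.3  # illustrative jitter width in radians
print(jitter_factor(w), jitter_factor_numeric(w))
print(jitter_factor(1e-4))  # approaches 1 as w -> 0
```

The closed form is singular at w = π, where the two expressions must be compared by a limit, so the sketch is intended for widths well below π.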
Since the initialization error dominates the infidelity, the effect of the angular jitter is small.

DATA AVAILABILITY
The data that support our findings are available from the authors on reasonable request.

CODE AVAILABILITY
The MATLAB code used to generate our data is available from the authors on reasonable request.