Benchmarks of Nonclassicality for Qubit Arrays

We present a set of practical benchmarks for $N$-qubit arrays that economically test the fidelity of achieving multi-qubit nonclassicality. The benchmarks are measurable correlators similar to 2-qubit Bell correlators, and are derived from a particular set of geometric structures from the $N$-qubit Pauli group. These structures prove the Greenberger-Horne-Zeilinger (GHZ) theorem, while the derived correlators witness genuine $N$-partite entanglement and establish a tight lower bound on the fidelity of particular stabilizer state preparations. The correlators need only $M \leq N+1$ distinct measurement settings, as opposed to the $2^{2N}-1$ settings that would normally be required to tomographically verify their associated stabilizer states. We optimize the measurements of these correlators for a physical array of qubits that can be nearest-neighbor-coupled using a circuit of controlled-$Z$ gates with constant gate depth to form $N$-qubit linear cluster states. We numerically simulate the provided circuits for a realistic scenario with $N=3,\dots,9$ qubits, using ranges of $T_1$ energy relaxation times, $T_2$ dephasing times, and controlled-$Z$ gate fidelities consistent with Google's 9-qubit superconducting chip. The simulations verify the tightness of the fidelity bounds and witness nonclassicality for all nine qubits, while also showing ample room for improvement in chip performance.


INTRODUCTION
As hardware is developed to implement quantum circuits on increasing numbers of qubits, it will be valuable to have economical benchmarks of fully quantum behavior. From the outset of quantum computing it has been clear that the advantage of a quantum computer lies somewhere in its ability to readily perform tasks that are physically challenging or impossible for a classical system. Therefore, ideal hardware benchmarks should certify the ability of the hardware to generate such nonclassical behavior. Indeed, a wide variety of benchmarking techniques have been developed recently [1,2], including gate-fidelity benchmarks using randomized gate sequences that are insensitive to state-preparation and measurement errors, and state-preparation benchmarks that certify particular states while avoiding the exponential scaling of state tomography.
In this article we provide a set of practical hardware benchmarks that naturally generalize two-qubit Bell inequality tests to $N$ qubits, based on the Greenberger-Horne-Zeilinger (GHZ) theorem. As with Bell inequalities, our nonclassicality benchmarks use the experimental violation of a classical bound to quantify the nonclassical behavior of the circuit. Beyond quantifying nonclassicality via a bound violation, these benchmarks also provide tight lower bounds on the fidelities with which particular stabilizer subspaces have been prepared, and thus witness genuine $N$-qubit entanglement for all states that lie within the targeted subspaces. These benchmarks are optimized for testing controllable qubit arrays with nearest-neighbor coupling. As such, we provide efficient circuits for preparing cluster states that maximally violate these benchmarks with controlled-Z entangling gates, using a constant gate depth of 4 (up to hardware-specific decompositions of the controlled-Z gate [28-33]). Though our benchmarks efficiently verify genuine $N$-qubit entanglement using cluster states, many of the benchmarks may be applied to other stabilizer states, and we expect similar benchmarks to exist for all stabilizer states.
The benchmarks we present here generalize earlier work that was experimentally tested with $N = 3, 4$ photons [34], where they were compared to previously proposed state-dependent methods for efficiently verifying the fidelity of particular entangled $N$-qubit preparations [35,36]. These prior methods have already been used to verify multi-qubit entanglement in state-of-the-art experiments with 12 qubits [37] and 18 qubits [38], since the exponential scaling required for traditional state tomography is increasingly prohibitive. Notably, for large $N$ our GHZ-based benchmarks produce a tighter preparation-fidelity bound than these existing methods and similarly produce entanglement witnesses with better scaling.

RESULTS
Nonclassicality Benchmarks: Our benchmarks consist of measurable correlators that are compared to derived upper bounds; violation of these bounds characterizes nonclassicality. Each such benchmark corresponds to a specific prepare-and-measure circuit on $N$ qubits with $M \leq N+1$ different measurement settings. The $M$ observables form a structure called an ID (also called an identity product [39]), which is a set of mutually commuting $N$-qubit Pauli operators whose overall product is the $N$-qubit identity, up to a sign. We express an ID as an $M \times N$ table of single-qubit Pauli operators and the identity $\{Z, X, Y, I\}$, labeled $O_{ij}$ with $i = 1,\dots,M$ and $j = 1,\dots,N$. We also define the shortened label $O_i = \bigotimes_{j=1}^{N} O_{ij}$ to indicate the $N$-qubit observable obtained as the product of the $i$th row of an ID. We omit tensor product symbols for compactness.
To obtain the Bell inequality for each ID [34], we choose a particular eigenspace $\Pi$, represented by a projector of rank $2^{N-M+1}$, which is specified by the set of $N$-qubit Pauli observables $\{O_i\}$ that form the $M$ rows of the ID (see Figs. 1 and 2), and a specific choice of their respective eigenvalues $\{\lambda_i\}$. We then define the correlator observable for this chosen eigenspace,
$$\alpha = \sum_{i=1}^{M} \lambda_i O_i, \tag{1}$$
such that its expectation value in a state $\rho$ has the upper bound
$$\langle\alpha\rangle = \mathrm{Tr}(\rho\,\alpha) \leq \beta_{\mathrm{QM}} = M, \tag{2}$$
saturated by states in the chosen eigenspace (e.g. $\rho = \Pi$ when $\Pi$ has rank 1). For example, we could prepare the joint eigenstate of the ID of Fig. 1(a), with negative eigenvalue $\lambda_1 = -1$ for the 3-qubit Pauli observable $O_1 = YXY$, and positive eigenvalues $\lambda_2 = \lambda_3 = \lambda_4 = +1$ for the remaining observables $O_2 = YYZ$, $O_3 = ZXZ$, and $O_4 = ZYY$. Then $\langle\alpha\rangle = \mathrm{Tr}(\Pi\alpha) = 4$, since each term in the sum becomes $+1$.
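As a quick consistency check of the Fig. 1(a) example, the following Python/NumPy sketch (an illustration of ours, not the authors' MATLAB code; the helper name `pauli_string` is hypothetical) builds the four 3-qubit observables as matrices, confirms that they mutually commute and multiply to $-I$, forms the joint-eigenspace projector $\Pi = \prod_i (I + \lambda_i O_i)/2$, and evaluates $\mathrm{Tr}(\Pi\alpha)$.

```python
import numpy as np
from functools import reduce

# Single-qubit Pauli matrices
PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(s):
    """Tensor product of single-qubit Paulis, e.g. 'YXY' -> Y (x) X (x) Y."""
    return reduce(np.kron, [PAULI[c] for c in s])

rows = ["YXY", "YYZ", "ZXZ", "ZYY"]     # rows O_i of the Fig. 1(a) ID
lams = [-1, +1, +1, +1]                 # chosen eigenvalues lambda_i
ops = [pauli_string(r) for r in rows]
dim = ops[0].shape[0]

# The rows must mutually commute ...
commute = all(np.allclose(A @ B, B @ A) for A in ops for B in ops)
# ... and multiply to the identity up to a sign (here -I).
prod_all = reduce(np.matmul, ops)
sign = prod_all[0, 0].real
is_id = np.allclose(prod_all, sign * np.eye(dim))

# Projector onto the joint eigenspace with eigenvalues lambda_i
Pi = reduce(np.matmul, [(np.eye(dim) + l * O) / 2 for l, O in zip(lams, ops)])
rank = int(round(np.trace(Pi).real))

# Correlator alpha = sum_i lambda_i O_i evaluated on that eigenspace
alpha = sum(l * O for l, O in zip(lams, ops))
value = np.trace(Pi @ alpha).real
print(commute, is_id, sign, rank, value)
```

All checks pass, the projector has rank $2^{N-M+1} = 1$, and the correlator evaluates to $M = 4$.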
In the spirit of Bell [9,10], if one tries to explain the observed correlation by choosing a complete set of local hidden variables $v_{Zj}, v_{Xj}, v_{Yj} \in \{+1, -1\}$ that predict the outcomes of the single-qubit Pauli measurements, then at least one of the terms in the correlator sum becomes $-1$, resulting in the smaller upper bound
$$\langle\alpha\rangle_{\mathrm{LHVT}} \leq \beta_{\mathrm{LHVT}} = M - 2. \tag{3}$$
Experimental violation of this bound thus indicates nonclassicality in the form of a violation of local realism. Though the locality loophole is always open for neighboring qubits on a chip, this violation is still a useful witness for nonclassical states prepared by the chip, much like for Bell inequalities or Bell-Leggett-Garg inequalities [40]. The derivation of this bound is reviewed in the Methods section. As an independent result, maximizing the expectation value of the correlator over all biseparable quantum states in the $N$-qubit Hilbert space produces the upper bound
$$\langle\alpha\rangle_{\mathrm{bisep}} \leq \beta_{\mathrm{bisep}} = M - 2, \tag{4}$$
which happens to coincide with the bound for local hidden variable theories. Experimental violation of the bound thus also witnesses genuine $N$-partite entanglement. In the Methods section, we prove that the joint eigenspaces of the IDs in this article are maximally entangled and derive this bound. In light of the convenient fact that $\beta_{\mathrm{bisep}} = \beta_{\mathrm{LHVT}}$, we define the nonclassicality benchmark score for a given physical $N$-qubit device as the experimentally determined value
$$B = \frac{\langle\alpha\rangle_{\mathrm{exp}} - M + 2}{2}, \tag{5}$$
such that $B \leq 0$ fails to witness either entanglement or the violation of local realism, while $0 < B \leq 1$ witnesses nonlocal, genuinely $N$-partite-entangled states. The nonclassicality benchmark score thus serves as a metric of uniquely quantum behavior, with $B = 1$ indicating maximum nonclassicality that saturates the correlator bound. Each $N$-qubit ID provides a benchmark corresponding to a distinct nonclassical eigenspace of an $N$-qubit physical device, and thus the hierarchy of IDs presented in Fig. 1 provides a corresponding hierarchy of benchmarks.
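Because any local hidden variable model is a mixture of deterministic assignments, the classical bound of the correlator can be confirmed for a small ID by enumerating all $2^{3N}$ deterministic choices of $v_{Zj}, v_{Xj}, v_{Yj}$. A minimal Python sketch for the Fig. 1(a) ID, together with the score $B$ (the function names are ours, for illustration only):

```python
from itertools import product

rows = ["YXY", "YYZ", "ZXZ", "ZYY"]   # Fig. 1(a) ID
lams = [-1, +1, +1, +1]               # chosen eigenvalues
M, N = len(rows), len(rows[0])

def lhv_correlator(assignment):
    """sum_i lam_i prod_j v_{O_ij, j} for one deterministic hidden-variable assignment.

    assignment[j] maps 'X', 'Y', 'Z' to +/-1 for qubit j; identity entries contribute +1.
    """
    total = 0
    for lam, row in zip(lams, rows):
        outcome = 1
        for j, p in enumerate(row):
            if p != "I":
                outcome *= assignment[j][p]
        total += lam * outcome
    return total

# Enumerate every assignment (v_Z, v_X, v_Y) in {+1,-1}^3 for each qubit
best = max(
    lhv_correlator([dict(zip("ZXY", bits[3 * j: 3 * j + 3])) for j in range(N)])
    for bits in product([+1, -1], repeat=3 * N)
)

def benchmark_score(alpha_exp, M):
    """Nonclassicality score B = (<alpha>_exp - M + 2) / 2."""
    return (alpha_exp - M + 2) / 2

print(best, M - 2, benchmark_score(4, M), benchmark_score(best, M))
```

The enumeration returns $M - 2 = 2$, so the best classical model only reaches $B = 0$, while the ideal quantum value $\langle\alpha\rangle = 4$ gives $B = 1$.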
Lower Bounding the Fidelity: The correlator also serves to bound the fidelity from below [34],
$$F \geq F_{\mathrm{ID}} = \frac{\langle\alpha\rangle_{\mathrm{exp}} - M + 4}{4}, \tag{6}$$
where $F = \mathrm{Tr}(\rho_{\mathrm{exp}}\Pi) \in [0, 1]$ is the fidelity that the experimentally prepared state $\rho_{\mathrm{exp}}$ lies within the eigenspace $\Pi$ stabilized by the chosen ID. We provide a general derivation of this bound in the Methods section. Importantly, in the limit $\langle\alpha\rangle_{\mathrm{exp}} \to \beta_{\mathrm{QM}} = M$, we have $F_{\mathrm{ID}} \to 1$, and thus as the fidelity of the preparation is improved, this lower bound obviates the need for full tomography of these preparations. Taken together, the inequalities of Eqs. (3), (4), and (6) provide a practical and efficient characterization of the prepared $N$-qubit state, as well as a robust benchmark of its nonclassical behavior, using only $M \leq N+1$ measurement settings. We present minimal benchmark IDs in Fig. 1.
Benchmark Circuits and Simulation: The IDs in this article have been specially chosen so that the prepare-and-measure circuit for each measurement setting requires a gate depth of 4 on any array of $N$ physical qubits with only nearest-neighbor controlled-Z couplings, making them a scalable and uniform set of benchmarks for implementations of this type. Figure 3 shows the circuits for $N = 4, 5$, from which the generalization to all $N$ should be straightforward. In general, each circuit prepares an $N$-qubit linear cluster state, which is contained within the maximally entangled subspace of the corresponding ID.
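One way to reach a constant depth is sketched below in NumPy. The layering is our assumption rather than a transcription of Fig. 3: a layer of Hadamards, then controlled-Z gates on every other adjacent pair, then controlled-Z gates on the remaining adjacent pairs, with the per-setting rotations $U_{ij}$ as a fourth layer (omitted here). The sketch checks that the resulting state has expectation value $+1$ for every linear-cluster stabilizer $K_j = Z_{j-1}X_jZ_{j+1}$; helper names are hypothetical.

```python
import numpy as np
from functools import reduce

N = 5                                   # illustrative array size
I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def on_qubits(ops):
    """Tensor product over N qubits; `ops` maps a qubit index to a 2x2 matrix (default I)."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(N)])

def cz(a, b):
    """Diagonal controlled-Z between qubits a and b (qubit 0 = most significant bit)."""
    idx = np.arange(2 ** N)
    bit_a = (idx >> (N - 1 - a)) & 1
    bit_b = (idx >> (N - 1 - b)) & 1
    return np.diag(1.0 - 2.0 * (bit_a & bit_b))

# Assumed layering: Hadamards, then CZ on pairs (0,1),(2,3),..., then on (1,2),(3,4),...
# Gates inside a layer act on disjoint qubits, so each layer has depth 1.
layers = [
    [on_qubits({q: H for q in range(N)})],
    [cz(q, q + 1) for q in range(0, N - 1, 2)],
    [cz(q, q + 1) for q in range(1, N - 1, 2)],
]

psi = np.zeros(2 ** N)
psi[0] = 1.0                            # |00...0>
for layer in layers:
    for gate in layer:
        psi = gate @ psi

def cluster_stabilizer(j):
    """K_j = Z_{j-1} X_j Z_{j+1}, with the missing neighbour dropped at the chain ends."""
    ops = {j: X}
    if j > 0:
        ops[j - 1] = Z
    if j < N - 1:
        ops[j + 1] = Z
    return on_qubits(ops)

expectations = [float(psi @ cluster_stabilizer(j) @ psi) for j in range(N)]
print(np.round(expectations, 12))
```

Since every gate in the two entangling layers is diagonal, their order does not matter; the split into two layers only serves to keep the depth independent of $N$.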
In order to evaluate the usefulness of these benchmarks in real-world physical implementations, we simulated the performance of these circuits for each of the IDs in Fig. 1. We simulated each circuit over a range of $T_1$ energy relaxation times, $T_2$ dephasing times, and angular jitter for the controlled-Z gate rotations, using the ranges given in Figs. 4 and 5. We also considered the effect of initialization and readout error for each qubit. The ranges of values were chosen to match the reported values of the 9-qubit Google chip [31,32], with the experimental values roughly in the center of each simulated range. We ran one version of the simulation using a nominal initialization error of $P_e = 2\%$ for each qubit, and another version using the observed initialization errors for each of the nine qubits on the Google chip. Final readout error has been neglected as correctable for ensemble statistics. Selected plots from the simulations are shown in Fig. 4, while scatter plots of the lower fidelity bound, $F_{\mathrm{ID}}$, are shown in Fig. 5 for the full ranges of simulated values. Note that to minimize the effect of the two worst qubits on the chip (boldface values in Figs. 4 and 5), we always used the last $N$ qubits on the chip to form our $N$-qubit IDs in the simulation. See the Methods section for additional details about how the numerical simulations were performed. Judging by our simulated data shown in Figs. 4 and 5, we expect the 9-qubit Google chip to be able to violate the classicality bounds for all nine qubits. The simulations also clearly show that qubit initialization error is the dominant source of error as $N$ grows. This shows that our benchmarking scheme is immediately relevant, since similar hardware fidelity would apparently violate the bound for only one or two more qubits (and certainly not for all 72 qubits on the Bristlecone chip [41]), once suitable IDs have been found beyond the nine qubits treated here.
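To illustrate how the score and the fidelity bound respond to initialization error alone, here is a hedged toy model for the 3-qubit ID of Fig. 1(a), not the full simulation described above. It assumes that each qubit is independently initialized in $|1\rangle$ with probability $p_e = 2\%$, that all gates are ideal, and that the circuit is Hadamards followed by nearest-neighbor controlled-Z gates; for this ID and eigenvalue choice, the resulting 3-qubit linear cluster state lies in the target eigenspace. No $T_1$, $T_2$, or jitter is included, and the helper names are ours.

```python
import numpy as np
from functools import reduce

PAULI = {"I": np.eye(2), "X": np.array([[0, 1], [1, 0]]),
         "Y": np.array([[0, -1j], [1j, 0]]), "Z": np.diag([1, -1])}
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def op(s):
    """N-qubit Pauli string as a matrix, e.g. op('ZXZ')."""
    return reduce(np.kron, [PAULI[c] for c in s])

def cz_chain(n):
    """Product of CZ gates on all nearest-neighbour pairs of an n-qubit line."""
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n - 1, -1, -1)) & 1
    return np.diag((-1.0) ** np.sum(bits[:, :-1] & bits[:, 1:], axis=1))

rows, lams = ["YXY", "YYZ", "ZXZ", "ZYY"], [-1, 1, 1, 1]   # Fig. 1(a) ID
M, N = len(rows), len(rows[0])
p_e = 0.02                          # assumed per-qubit initialization error

U = cz_chain(N) @ reduce(np.kron, [H] * N)                  # ideal preparation circuit
rho0 = reduce(np.kron, [np.diag([1 - p_e, p_e])] * N)      # |1> with probability p_e
rho = U @ rho0 @ U.conj().T

alpha = sum(l * op(r) for l, r in zip(lams, rows))
Pi = reduce(np.matmul, [(np.eye(2 ** N) + l * op(r)) / 2 for l, r in zip(lams, rows)])

alpha_exp = np.trace(rho @ alpha).real
F = np.trace(rho @ Pi).real              # true fidelity with the target eigenspace
F_ID = (alpha_exp - M + 4) / 4           # correlator lower bound on F
B = (alpha_exp - M + 2) / 2              # nonclassicality score
print(round(F, 6), round(F_ID, 6), round(B, 6))
```

In this toy model each initialization error maps the cluster state to an orthogonal Pauli-$Z$-flipped state, so $F = (1-p_e)^N$; $F_{\mathrm{ID}}$ sits slightly below $F$ because a flip of the middle qubit lands in the eigenstate with $\langle\alpha\rangle = -M$ rather than $M - 4$.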

DISCUSSION
The IDs and implementation circuits presented in this article are good benchmark tests for any physical implementation of qubits in a nearest-neighbor-connected array. They work naturally on a chip with more connectivity than this as well. While our simulations targeted a particular recent chip implementation for concreteness, this does not constrain the general usefulness of this protocol for other multi-qubit systems.
Although some other families of IDs with the same properties as those in Figs. 1 and 2 are known [39,42], the minimal IDs, with the largest possible value of N for a given M , are not known in general (see the Supplementary Discussion and Supplementary Figures 1 through 5 for the best known cases). Because of their geometric nature, enumerating all of the representative IDs for given values of N and M is a highly nontrivial problem, related to solving the graph isomorphism problem on N × M colored vertices, and it is thus limited by computational resources. Furthermore, not every ID can be constructed using only nearest-neighbor couplings in linear circuits as in Fig. 3. The increased connectivity of more modern chips, like the Bristlecone chip from Google, should allow the use of more general IDs, although the circuit depth will likely increase by one or two gates.
Each of the IDs presented here also gives rise to a complete proof of the Kochen-Specker (KS) theorem for contextuality [22,43,44], which can be implemented for any initial state with a few alternative circuits for the different measurement contexts. In general, IDs are the natural building blocks of proofs of the KS theorem in the $N$-qubit Pauli group. This is a slightly more complicated setup, which could inspire different contextuality-based benchmarks in future work.
Finally, maximally entangled IDs with $M < N+1$ give rise to maximally entangled eigenspaces, each of dimension $2^{N-M+1}$, which generalize the codespaces of error correcting codes [45,46], with $L = N - M + 1$ logical qubits (where $N$ is the number of physical qubits). All $N$-qubit stabilizer-based error correcting codes (including the toric code [47]) belong to the family of IDs, and while all IDs of this type are error-detecting codes, not all of them can be used to diagnose the syndrome of an error in order to correct it. Many of the well-known error correcting codes generate an ID which proves the GHZ theorem, and all can be used as entanglement witnesses in the manner of this article [48]. Nevertheless, these more general maximally entangled subspaces may be of significant interest for other applications in quantum information processing, which warrants further investigation. One straightforward application for these subspaces is to perform benchmarks that measure the physical qubits as described in this paper, while simultaneously benchmarking the performance of the logical qubits in some additional way. The two tests may be performed simultaneously because any general logical $L$-qubit state can be prepared for each benchmark, although the circuit is likely to be longer and more complex than Fig. 3, and the performance will be commensurately worse. Notably, if the conjectured bound $N \leq (M-2)(M-1)/2$ can be saturated, then the number of logical qubits is bounded by $L \leq (M-2)(M-1)/2 - M + 1$, and thus the ratio $L/N \to 1$ in the limit $M \to \infty$.
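The scaling claim is easy to tabulate. Assuming, as conjectured (not proven), that $N = (M-2)(M-1)/2$ is reachable for every $M$, the number of logical qubits is $L = N - M + 1$ and the ratio simplifies to $L/N = 1 - 2/(M-2)$. A tiny Python sketch (the function name is hypothetical):

```python
def conjectured_max_qubits(M):
    """Conjectured largest N for a benchmark ID with M rows: N = (M-2)(M-1)/2."""
    return (M - 2) * (M - 1) // 2   # (M-2)(M-1) is always even

table = []
for M in (4, 5, 10, 100, 1000):
    N = conjectured_max_qubits(M)
    L = N - M + 1                   # logical qubits in a 2^(N-M+1)-dimensional eigenspace
    table.append((M, N, L, L / N))
    print(M, N, L, round(L / N, 4))
```

For $M = 4$ this recovers the 3-qubit ID of Fig. 1(a) with no logical qubits, while at $M = 10$ three quarters of the physical qubits would already be logical.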

METHODS
Proving the GHZ Theorem: All of the IDs in Fig. 1 have sign $-1$, and for each qubit $j$, the number of entries $O_{ij} = Z$ in the ID is even, as is the number of entries with $O_{ij} = X$ and with $O_{ij} = Y$. These properties indicate that these IDs give rise to proofs of the GHZ theorem [11], which is a logical version of Bell's nonlocality theorem [9,10], without any inequalities. To see this, suppose that a joint eigenstate (i.e., any state in a joint eigenspace) of these observables is prepared. This eigenstate has $M$ eigenvalues $\lambda_i$ corresponding to the $M$ observables, and $\prod_{i=1}^{M}\lambda_i = -1$, since the product of these $M$ observables is $-I^{\otimes N}$. Suppose that the $N$ qubits are now mutually space-like separated, and each is subjected to random local Pauli measurements, and label their outcomes $\lambda_{ij}$ when all $N$ local measurement settings happen to correspond to observable $i$ of the ID. The entanglement correlations obeyed by this state are $\prod_{j=1}^{N}\lambda_{ij} = \lambda_i$. Putting these relations together we have $\prod_{i=1}^{M}\prod_{j=1}^{N}\lambda_{ij} = -1$. Now, in order for a local hidden variable theory (LHVT) to explain these entanglement correlations, each qubit $j$ must carry local hidden variables $v_{Zj}, v_{Xj}, v_{Yj} \in \{+1, -1\}$ which predict the outcomes $\lambda_{ij}$ and are pre-arranged to satisfy the entanglement constraints. However, for such hidden variables we would have
$$\prod_{i=1}^{M}\prod_{j=1}^{N}\lambda_{ij} = \prod_{j=1}^{N} v_{Zj}^{\,n_j}\, v_{Xj}^{\,m_j}\, v_{Yj}^{\,l_j} = +1,$$
where $n_j$, $m_j$, and $l_j$ are the numbers of $Z$, $X$, and $Y$ entries in column $j$ of the ID (identity entries contribute a trivial factor of $+1$). The product is $+1$ since $n_j$, $m_j$, and $l_j$ are all even for the IDs of this article, and thus it is impossible to choose local hidden variables which can satisfy the entanglement correlations of this state. This logical proof without inequalities can be converted into a Bell inequality for use as a benchmark of $N$-qubit nonlocality, as shown in the main text, by noting that for any complete assignment of local hidden variables $v_{Zj}, v_{Xj}, v_{Yj} \in \{+1, -1\}$ to the ID, at least one of the observables has the wrong eigenvalue.
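The conditions used in this proof (commuting rows, overall sign $-1$, and even numbers of $Z$, $X$, and $Y$ entries in every column) can be checked symbolically, without building any matrices. The sketch below is ours (function names hypothetical); it multiplies Pauli strings column by column while tracking the phase. As a counterexample it uses the two-qubit set $\{XX, ZZ, YY\}$, which also has sign $-1$ but odd column counts, and therefore yields no GHZ-type contradiction.

```python
from collections import Counter
from itertools import combinations

# Single-qubit Pauli products a*b -> (phase, result); e.g. X*Y = iZ
MULT = {
    ("I", "I"): (1, "I"), ("I", "X"): (1, "X"), ("I", "Y"): (1, "Y"), ("I", "Z"): (1, "Z"),
    ("X", "I"): (1, "X"), ("X", "X"): (1, "I"), ("X", "Y"): (1j, "Z"), ("X", "Z"): (-1j, "Y"),
    ("Y", "I"): (1, "Y"), ("Y", "X"): (-1j, "Z"), ("Y", "Y"): (1, "I"), ("Y", "Z"): (1j, "X"),
    ("Z", "I"): (1, "Z"), ("Z", "X"): (1j, "Y"), ("Z", "Y"): (-1j, "X"), ("Z", "Z"): (1, "I"),
}

def commute(a, b):
    """Pauli strings commute iff they differ (both non-identity) on an even number of sites."""
    return sum(p != "I" and q != "I" and p != q for p, q in zip(a, b)) % 2 == 0

def id_sign(rows):
    """Sign s with prod_i O_i = s * I, or None if the rows do not multiply to +/- I."""
    phase, cols = 1, ["I"] * len(rows[0])
    for row in rows:
        for j, p in enumerate(row):
            ph, cols[j] = MULT[(cols[j], p)]
            phase *= ph
    return phase if all(c == "I" for c in cols) else None

def proves_ghz(rows):
    """Commuting rows, sign -1, and even Z/X/Y counts in every column."""
    if not all(commute(a, b) for a, b in combinations(rows, 2)):
        return False
    even = all(
        count % 2 == 0
        for j in range(len(rows[0]))
        for p, count in Counter(row[j] for row in rows).items()
        if p != "I"
    )
    return id_sign(rows) == -1 and even

print(id_sign(["YXY", "YYZ", "ZXZ", "ZYY"]), proves_ghz(["YXY", "YYZ", "ZXZ", "ZYY"]))
print(id_sign(["XX", "ZZ", "YY"]), proves_ghz(["XX", "ZZ", "YY"]))
```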
In general, proving the GHZ theorem does not prove that nonlocal correlations exist between more than just a single pair of qubits among the $N$ [49-52], nor does it generally witness genuine $N$-qubit entanglement. In contrast, the benchmark IDs we present in this article prove the GHZ theorem and are constructed to be $N$-partite entanglement witnesses [53,54], such that their corresponding Bell inequalities can only be violated by genuinely $N$-qubit-entangled states. To go further than the results we present here and prove that nonlocal correlations exist between every pair of qubits among the $N$, one must violate the corresponding Svetlichny inequalities [49,55] instead, at the cost of a number of required measurement settings that grows exponentially with $N$ [49].
Bounding the Fidelity: An $N$-qubit ID with $M$ observables $\{O_i\}$ has a complete set of eigenspaces $\{\Pi_k\}$ satisfying $\sum_k \Pi_k = I$, each of which is identified by a distinct set of eigenvalues $\{\lambda_{ik}\}$ of $\{O_i\}$. Only $M - 1$ of the observables in an ID are independent, and if $M - 1 < N$ the eigenspaces $\Pi_k$ are degenerate, each containing $2^{N-M+1}$ mutually orthogonal vectors $|\kappa_{jk}\rangle$ which share the eigenvalues $\{\lambda_{ik}\}$, with $j = 1,\dots,2^{N-M+1}$, such that $\{|\kappa_{jk}\rangle\}$ is a complete orthonormal eigenbasis of the ID. Each of the $2^{M-1}$ eigenspaces $\Pi_k$ corresponds to a unique correlator $\alpha_k = \sum_{i=1}^{M}\lambda_{ik}O_i$. Each experimentally obtained quantity $\langle\alpha_k\rangle$ enables us to put a lower bound on the fidelity that an experimentally prepared pure state $|\psi\rangle$ lies within the eigenspace $\Pi_k$ [34].
With no loss of generality, we will henceforth use correlator $\alpha_1$ and the target eigenspace $\Pi_1$. We begin by expanding $|\psi\rangle$ in this eigenbasis as
$$|\psi\rangle = \sum_j a_j|\kappa_{j1}\rangle + \sum_{k=2}^{2^{M-1}}\sum_j b_{jk}|\kappa_{jk}\rangle, \tag{7}$$
such that $\sum_j |a_j|^2 + \sum_{k=2}^{2^{M-1}}\sum_j |b_{jk}|^2 = 1$. Since the expansion is in an eigenbasis of $\alpha_1$, we find
$$\langle\psi|\alpha_1|\psi\rangle = \sum_j |a_j|^2\langle\kappa_{j1}|\alpha_1|\kappa_{j1}\rangle + \sum_{k=2}^{2^{M-1}}\sum_j |b_{jk}|^2\langle\kappa_{jk}|\alpha_1|\kappa_{jk}\rangle.$$
Note that $\langle\kappa_{j1}|\alpha_1|\kappa_{j1}\rangle = \sum_{i=1}^{M}\lambda_{i1}^2 = M$, since all eigenvalues of $|\kappa_{j1}\rangle$ match those in the correlator $\alpha_1$ by construction, and thus square to 1. However, any other $|\kappa_{jk}\rangle$ does not lie within $\Pi_1$, so it is characterized by eigenvalues distinct from those characterizing $\Pi_1$. Moreover, since the product of all eigenvalues for the observables of a given ID is fixed for any eigenstate, only an even number of eigenvalues can differ from those characterizing $\Pi_1$, which necessarily causes at least two terms of $\langle\kappa_{jk}|\alpha_1|\kappa_{jk}\rangle$ to become $-1$, resulting in the upper bound $\langle\kappa_{jk}|\alpha_1|\kappa_{jk}\rangle \leq M - 4$ for those eigenstates. Using these two observations we obtain
$$\langle\psi|\alpha_1|\psi\rangle \leq M F + (M - 4)(1 - F),$$
where $F = \sum_j |a_j|^2$, and we have used $\sum_j |a_j|^2 + \sum_{k=2}^{2^{M-1}}\sum_j |b_{jk}|^2 = 1$. We can rewrite this relation as
$$F = \sum_j |a_j|^2 \geq \frac{\langle\psi|\alpha_1|\psi\rangle - M + 4}{4} \equiv F_{\mathrm{ID}}.$$
Noting that the left hand side of this inequality is the fidelity $F$ for the preparation $|\psi\rangle$ to lie within the eigenspace $\Pi_1$, the right hand side $F_{\mathrm{ID}}$ gives a lower bound $F \geq F_{\mathrm{ID}}$ for the fidelity. For IDs with $M = N+1$, the target subspace $\Pi_1$ contains only one eigenvector, so the fidelity $F$ is also a state preparation fidelity for the particular target eigenstate $|\kappa_{11}\rangle$. For IDs with $M < N+1$, the target subspace $\Pi_1$ is degenerate, so the fidelity $F$ is the fidelity for $|\psi\rangle$ to lie within that subspace. Next we generalize the above derivation to the case of mixed states. For a general convex combination of $m$ pure states,
$$\rho = \sum_{l=1}^{m} c_l\,|\psi_l\rangle\langle\psi_l|,$$
where $c_l \geq 0$ and $\sum_l c_l = 1$, we can expand each $|\psi_l\rangle$ using appropriate eigenbases of the ID as in Eq. (7) and follow the same arguments to obtain
$$\mathrm{Tr}(\rho\,\alpha_1) = \sum_{l=1}^{m} c_l\langle\psi_l|\alpha_1|\psi_l\rangle \leq \sum_{l=1}^{m} c_l\left[M F_l + (M - 4)(1 - F_l)\right],$$
where $F_l \equiv \langle\psi_l|\Pi_1|\psi_l\rangle$.
We can rewrite this as
$$F = \sum_{l=1}^{m} c_l F_l = \mathrm{Tr}(\rho\,\Pi_1) \geq \frac{\mathrm{Tr}(\rho\,\alpha_1) - M + 4}{4} \equiv F_{\mathrm{ID}}.$$
As in the pure state case, the left hand side is the fidelity $F$ for the mixed state $\rho$ to lie within the target subspace $\Pi_1$, while the same expression for the right hand side $F_{\mathrm{ID}}$ places a lower bound on this fidelity.
Witnessing Genuine $N$-Partite Entanglement: An $N$-qubit ID provides an entanglement witness if it is maximally entangled [39,56]. Entanglement is usually discussed in reference to the separability of states. However, there is a way to reason about the entanglement of a set of observables directly, without reference to states. We define a maximally entangled set of $N$-qubit observables as one with the property that there exists no bipartition of the $N$ qubits into subsets of $R$ and $N - R$ qubits such that the restrictions of all of the observables to each subset (e.g. $\bigotimes_{k=1}^{R} O_{ik}$ for the first subset) mutually commute. It follows from this definition that the joint eigenstates of this set are maximally entangled $N$-qubit stabilizer states.
To see this, consider that every stabilizer state (or space) of $N$ qubits has a stabilizer group of $b = 2^g$ mutually commuting Pauli observables $\{S_i\}$ with corresponding eigenvalues $\{\lambda_i\}$, and its density operator can be written as
$$\rho = \frac{1}{d}\sum_{i=1}^{b}\lambda_i S_i,$$
where $g$ is the number of independent generators in the set, and $d = 2^N$ is the dimension of the Hilbert space. Note that if $g < N$, then $\rho$ is proportional to a projector onto a subspace of rank $r = 2^{N-g} > 1$, and that $g = M - 1$ for a minimal ID, which is just a specific subset of one or more complete stabilizer groups. If a stabilizer state is the tensor product of two smaller stabilizer states on subsystems $A$ and $B$, it follows that its density operator can be written as
$$\rho = \rho_A \otimes \rho_B = \frac{1}{d_A d_B}\sum_{i=1}^{b_A}\sum_{i'=1}^{b_B}\lambda^A_i\lambda^B_{i'}\,S^A_i \otimes S^B_{i'}.$$
For the bipartition of the system into $A$ and $B$, all of the stabilizer operators $S^A_i = \bigotimes_{k\in A} O_{ik}$ mutually commute by definition. It follows that one can find such a mutually commuting bipartition for any separable stabilizer state, and therefore if no such bipartition exists, then the set of observables is maximally entangled. All of the IDs presented in this article are maximally entangled in this way, which results in a witness inequality with the same bound as the Bell inequality.
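The maximal-entanglement criterion above is purely combinatorial, so it can be tested directly on an ID table. A minimal sketch (function names are ours) that searches for a bipartition on which all restricted rows commute; it uses the fact that, because the full rows commute, the restrictions to a subset commute exactly when the restrictions to its complement do, so subsets of size at most $N/2$ cover every bipartition.

```python
from itertools import combinations

def commute(a, b):
    """Pauli strings commute iff they differ (both non-identity) on an even number of sites."""
    return sum(p != "I" and q != "I" and p != q for p, q in zip(a, b)) % 2 == 0

def commuting_bipartition(rows):
    """Return a subset A of qubit indices on which all restricted rows commute, or None."""
    n = len(rows[0])
    for size in range(1, n // 2 + 1):
        for A in combinations(range(n), size):
            parts = ["".join(row[k] for k in A) for row in rows]
            if all(commute(x, y) for x, y in combinations(parts, 2)):
                return A
    return None

def is_maximally_entangled(rows):
    return commuting_bipartition(rows) is None

print(is_maximally_entangled(["YXY", "YYZ", "ZXZ", "ZYY"]))   # Fig. 1(a) ID
print(commuting_bipartition(["XI", "IX", "XX"]))               # ID of the product state |++>
```

The Fig. 1(a) ID admits no commuting bipartition, whereas the ID $\{XI, IX, XX\}$, which stabilizes a product state, splits at the first qubit.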
All states within a maximally entangled eigenspace of an ID are maximally entangled, meaning that for all of them, the maximum squared Schmidt coefficient across all bipartitions is $1/2$. For such an eigenstate $|\psi\rangle$, a standard entanglement witness is $\mathcal{W} = \frac{1}{2}I - |\psi\rangle\langle\psi|$, and an experimental measurement of $\langle\mathcal{W}\rangle < 0$ is a witness of genuine $N$-partite entanglement [54]. Noting that a superposition state $a|\psi\rangle + b|\psi^{\perp}\rangle$ can only violate this bound for $F = |a|^2 > 1/2$, we obtain $F_{\mathrm{ID}} \leq F \leq 1/2$ for all biseparable states. Plugging this into $F_{\mathrm{ID}} = (\langle\alpha\rangle_{\mathrm{exp}} - M + 4)/4$ yields $\langle\alpha\rangle_{\mathrm{bisep}} \leq M - 2$, which is Eq. (4).
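The fidelity bound $F \geq F_{\mathrm{ID}} = (\mathrm{Tr}(\rho\alpha) - M + 4)/4$ derived in this section, including its tightness, can be spot-checked numerically for the 3-qubit ID of Fig. 1(a). This is a sketch under our own choices (random Gaussian density matrices, a fixed seed, 2000 samples, hypothetical helper names); it is a sanity check, not a proof.

```python
import numpy as np
from functools import reduce

PAULI = {"I": np.eye(2), "X": np.array([[0, 1], [1, 0]]),
         "Y": np.array([[0, -1j], [1j, 0]]), "Z": np.diag([1, -1])}

def op(s):
    return reduce(np.kron, [PAULI[c] for c in s])

rows, lams = ["YXY", "YYZ", "ZXZ", "ZYY"], [-1, 1, 1, 1]   # Fig. 1(a) ID, M = 4
M, dim = len(rows), 2 ** len(rows[0])
alpha = sum(l * op(r) for l, r in zip(lams, rows))
Pi = reduce(np.matmul, [(np.eye(dim) + l * op(r)) / 2 for l, r in zip(lams, rows)])

def f_id(rho):
    """Lower bound F_ID = (Tr(rho alpha) - M + 4) / 4."""
    return (np.trace(rho @ alpha).real - M + 4) / 4

rng = np.random.default_rng(seed=7)
gaps = []
for _ in range(2000):
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    rho /= np.trace(rho).real                       # random full-rank mixed state
    t = rng.uniform()                               # mix in the target to probe F near 1
    rho = (1 - t) * rho + t * Pi / np.trace(Pi).real
    gaps.append(np.trace(rho @ Pi).real - f_id(rho))

# Tightness: mixing the target with an eigenvector of alpha with eigenvalue M - 4 gives F = F_ID
evals, evecs = np.linalg.eigh(alpha)                # ascending order
target, other = evecs[:, -1], evecs[:, 1]           # eigenvalues 4 and 0
rho_tight = 0.7 * np.outer(target, target.conj()) + 0.3 * np.outer(other, other.conj())
tight_gap = np.trace(rho_tight @ Pi).real - f_id(rho_tight)
print(min(gaps) >= -1e-9, abs(tight_gap) < 1e-9, np.round(evals, 6))
```

The spectrum of $\alpha$ shows the structure used in the derivation: a single eigenvalue $M = 4$ on the target and all others at most $M - 4 = 0$. The same function also reproduces the biseparable threshold, since $F_{\mathrm{ID}} = 1/2$ exactly when $\langle\alpha\rangle = M - 2$.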
Numerical Simulation Details: In the simulation, the state is first degraded by initialization error. That is, ideally the $N$ qubits are prepared in the initial ground state $\bigotimes_{i=1}^{N}|0\rangle$. However, each qubit $i$ has an error probability $P_e^i$ of instead being initialized in the excited state $|1\rangle$, so that $\rho = \bigotimes_{i=1}^{N}\left[(1 - P_e^i)|0\rangle\langle 0| + P_e^i|1\rangle\langle 1|\right]$. Each gate in Fig. 3 is then applied to the initial state $\rho$. For the Hadamard gate, it is sufficient to use a $Y_{90}$ rotation, $\exp(-iY\pi/4)$. We decompose the controlled-Z gate into an implementable $ZZ_{90}$ entangling gate and single-qubit corrections: $e^{-i\pi/4}\,[e^{iZ\pi/4} \otimes e^{iZ\pi/4}]\,e^{-iZZ\pi/4}$. We degraded each gate by $T_1$ energy relaxation and $T_2$ dephasing processes for the corresponding gate times $\Delta t$. For the energy relaxation time $T_1$, the first-order corrections for each individual qubit are accumulated and then applied to $\rho$. For each qubit,
$$\Delta\rho_i = \frac{\Delta t}{T_1^i}\left(a_i\,\rho\,a_i^\dagger - \frac{1}{2}a_i^\dagger a_i\,\rho - \frac{1}{2}\rho\,a_i^\dagger a_i\right),$$
where $a_i$ is the lowering operator of the $i$th qubit tensored with identity for the other qubits, and $\rho \to \rho + \sum_{i=1}^{N}\Delta\rho_i$. This linear-order Lindblad-form update is sufficient since $\Delta t/T_1^i \ll 1$. For the dephasing time $T_2$, we directly construct a dephasing matrix $D$ for efficiency, and apply gate dephasing using element-wise multiplication (MATLAB syntax .*), as $\rho \to \rho \circ D$, with $\circ$ denoting the element-wise product.
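The gate decompositions and the first-order $T_1$ update can be sketched in NumPy (the original code is MATLAB and is not reproduced here). Two points are our assumptions: the helper names are hypothetical, and since the entries of the dephasing matrix $D$ are not specified, `dephasing_matrix` assumes that each coherence $\rho_{mn}$ is damped by $e^{-\Delta t/T_2}$ for every qubit whose bit differs between basis states $m$ and $n$. The sketch also checks that the decomposition, with global phase $e^{-i\pi/4}$, reproduces the controlled-Z gate exactly, and that $Y_{90}$ maps $|0\rangle$ to $|+\rangle$ as a Hadamard would.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])      # a = |0><1|

def rot(P, theta):
    """exp(-i theta P) for an operator with P @ P = I."""
    return np.cos(theta) * np.eye(P.shape[0]) - 1j * np.sin(theta) * P

# Gate decompositions: Y_90 = exp(-iY pi/4) and CZ from a ZZ_90 rotation plus Z corrections
hadamard_sub = rot(Y, np.pi / 4)
cz = np.exp(-1j * np.pi / 4) * np.kron(rot(Z, -np.pi / 4), rot(Z, -np.pi / 4)) @ rot(np.kron(Z, Z), np.pi / 4)

def embed(op, i, n):
    """Single-qubit operator on qubit i of n (qubit 0 = most significant), identity elsewhere."""
    return reduce(np.kron, [op if q == i else I2 for q in range(n)])

def t1_step(rho, dt, T1):
    """Linear-order amplitude damping summed over qubits; T1 lists per-qubit times."""
    n = len(T1)
    drho = np.zeros_like(rho)
    for i in range(n):
        a = embed(LOWER, i, n)
        num = a.conj().T @ a
        drho = drho + (dt / T1[i]) * (a @ rho @ a.conj().T - 0.5 * (num @ rho + rho @ num))
    return rho + drho

def dephasing_matrix(n, dt, T2):
    """ASSUMED form of D: rho_mn is damped by exp(-dt/T2[q]) for each qubit q whose
    bit differs between basis states m and n."""
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n - 1, -1, -1)) & 1
    differ = bits[:, None, :] != bits[None, :, :]
    return np.exp(-(differ * (dt / np.asarray(T2))).sum(axis=2))

# One gate-time step on |++> with hypothetical times (all in the same unit)
n, dt, T1, T2 = 2, 0.025, [20.0, 15.0], [10.0, 12.0]
plus2 = np.full(4, 0.5, dtype=complex)
rho = np.outer(plus2, plus2.conj())
rho = t1_step(rho, dt, T1) * dephasing_matrix(n, dt, T2)   # element-wise, like MATLAB .*
print(np.allclose(cz, np.diag([1, 1, 1, -1])), np.round(hadamard_sub @ [1, 0], 6), np.trace(rho).real)
```

Both updates preserve the trace: the linear-order Lindblad terms cancel under the trace, and $D$ has ones on its diagonal.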
For simulating gate infidelity, we assume that the single-qubit gate fidelities are high enough for their errors to be neglected, and so simulate only a range of fidelities for the two-qubit controlled-Z gates. As a crude model for infidelity of a controlled-Z gate, we add a random angular jitter $\delta\phi$ only to the ZZ rotation step, $\exp[-iZZ(\pi/2 + \delta\phi)/2]$, and average over the effect of this jitter using a raised cosine distribution with width $w$, $dP(\delta\phi) = d(\delta\phi)\,[1 + \cos(\pi\delta\phi/w)]/(2w)$, where $\delta\phi \in [-w, w]$ has compact angular support. Writing $\rho$ for the state after the unperturbed ZZ rotation, this yields the averaged state update
$$\rho \to \frac{1+\gamma}{2}\,\rho + \frac{1-\gamma}{2}\,\zeta_i\,\rho\,\zeta_i, \qquad \gamma = \langle\cos\delta\phi\rangle = \frac{\pi^2\sin w}{w\,(\pi^2 - w^2)},$$
where $\zeta_i$ is the tensor product of Pauli Z for the two qubits the controlled-Z is acting on, and identity for all of the other qubits. The limit as $w \to 0$ restores the unperturbed gate. This crude error model includes only one possible physical mechanism of infidelity for the controlled-Z gate, but gives an indication of the gate sensitivity to imprecise angular control. Since the initialization error dominates the infidelity, the effect of the angular jitter is small.
Code availability: The MATLAB code used to generate our data is available from the authors on reasonable request.
Data availability: The data that support our findings are available from the authors on reasonable request.
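The jitter-averaged controlled-Z update has a simple closed form because $e^{-i\zeta\delta\phi/2}\rho\,e^{i\zeta\delta\phi/2} = \cos^2(\delta\phi/2)\rho + \sin^2(\delta\phi/2)\zeta\rho\zeta + \tfrac{i}{2}\sin\delta\phi\,[\rho,\zeta]$, and the last term averages to zero over the symmetric raised-cosine distribution, leaving only $\langle\cos\delta\phi\rangle$. The NumPy sketch below (ours, with an arbitrary width $w = 0.3$, a two-qubit test state, and hypothetical function names) compares that closed form against direct midpoint-rule quadrature over the distribution.

```python
import numpy as np

def mean_cos(w):
    """<cos(dphi)> for dP = [1 + cos(pi*dphi/w)]/(2w) d(dphi) on [-w, w], with 0 < w < pi."""
    return np.pi ** 2 * np.sin(w) / (w * (np.pi ** 2 - w ** 2))

def jitter_average(rho, zeta, w):
    """Closed-form average of exp(-i zeta dphi/2) rho exp(+i zeta dphi/2) over the jitter.
    rho is the state right after the unperturbed ZZ rotation."""
    g = mean_cos(w)
    return 0.5 * (1 + g) * rho + 0.5 * (1 - g) * (zeta @ rho @ zeta)

Z = np.diag([1.0, -1.0])
zeta = np.kron(Z, Z)                          # Z on the two CZ qubits
plus2 = np.full(4, 0.5, dtype=complex)        # |++>, sensitive to ZZ phase errors
rho = np.outer(plus2, plus2.conj())

# Brute-force midpoint-rule average over the raised-cosine distribution
w, K = 0.3, 4000
h = 2 * w / K
dphis = -w + (np.arange(K) + 0.5) * h
weights = (1 + np.cos(np.pi * dphis / w)) / (2 * w) * h
avg = np.zeros_like(rho)
for d, p in zip(dphis, weights):
    U = np.cos(d / 2) * np.eye(4) - 1j * np.sin(d / 2) * zeta   # exp(-i zeta d/2)
    avg += p * (U @ rho @ U.conj().T)

err = np.max(np.abs(avg - jitter_average(rho, zeta, w)))
print(round(mean_cos(w), 6), err)
```

Because the update only mixes $\rho$ with $\zeta\rho\zeta$, the jitter acts on the two qubits as a correlated $ZZ$ dephasing channel with flip probability $(1-\gamma)/2$.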
[56] M. Waegell, "A bonding model of entanglement for N-qubit graph states," International Journal of Quantum Information 12, 1430005 (2014).
... $O_{ij}$. Eigenvalues $\lambda_i$ of $O_i$ are also shown in each table, chosen to correspond to the state prepared by the circuit of Fig. 3 for the corresponding $N$, which lies in the specific eigenspace stabilized by the ID. Combining the rows of each ID with the appropriate eigenvalue defines a correlator observable $\alpha = \sum_i \lambda_i O_i$, from which we obtain the experimental benchmark score $B = (\langle\alpha\rangle_{\mathrm{exp}} - M + 2)/2$ that witnesses nonlocal $N$-partite entanglement when $0 < B \leq 1$, as well as the lower bound $F \geq F_{\mathrm{ID}} = (B + 1)/2$ on the fidelity $F$ for the state preparation to lie within the indicated eigenspace of the ID.
... Fig. 1, with (a) corresponding to $N = 4$ in Fig. 1b and (b) corresponding to $N = 5$ in Fig. 1c. These two examples generalize to $N$ qubits and produce linear cluster states. The local measurement settings for each observable $O_{ij}$ in the ID are implemented by the unitary operations $U_{ij}$, assuming detectors that naturally measure the Z basis. This circuit allows the $M$ different settings of an ID to be implemented with different $U_{ij}$ for different observables and qubits. For example, in the 5-qubit ID of Fig. 1c the first setting is $ZYYZI$, meaning that for the first and fourth qubits $U_{11} = U_{14} = I$, for the second and third qubits $U_{12} = U_{13} = e^{i\pi X/4}$, and the fifth qubit is ignored.
Figure 5. Scatterplots of the fidelity lower bound $F_{\mathrm{ID}}$ vs. true fidelity $F$ for all simulated data. The lower bound is tight, thus as $F \to 1$ so too does $F_{\mathrm{ID}}$. All plots contain data for the nonideality ranges: ...
... Supplementary Figures 1, 2. This suggests that the number of qubits $N$ in a benchmark ID with $M$ observables is bounded by
$$N \leq \frac{(M-2)(M-1)}{2},$$
and we conjecture that this will hold for all $M$. If we can derive a proof that this conjecture is true, this may help to identify a pattern to generate IDs with $N = (M-2)(M-1)/2$ for all $M$.

SUPPLEMENTARY NOTES
Maximal Benchmark IDs for All $N$: The benchmark IDs for all $N$ in the main text belong to the stabilizer group of the $N$-qubit linear cluster state. That is, these maximal IDs stabilize a particular maximally entangled eigenstate, which is rank 1, as opposed to the code spaces of maximum rank stabilized by the minimal IDs. Unlike the case with minimal IDs, we exploit an emergent pattern in the maximal IDs that allows them to be extended to all $N$ in a straightforward way that is provably maximally entangled.
To prove that these IDs stabilize the linear cluster state, we select the specific separable $N$-qubit IDs shown in Supplementary Figure 6 from the stabilizer group of $(H|0\rangle)^{\otimes N}$. Applying a controlled-Z gate to every nearest-neighbor pair of qubits in the array transforms this separable ID into the corresponding benchmark ID from Fig. 2.
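The mechanism behind this argument is the standard Heisenberg-picture action of controlled-Z gates on a line, $X_j \mapsto Z_{j-1}X_jZ_{j+1}$ with $Z_j$ unchanged, which turns the single-qubit stabilizers of $(H|0\rangle)^{\otimes N} = |+\rangle^{\otimes N}$ into the linear-cluster stabilizers. Since the specific separable IDs of Supplementary Figure 6 are not reproduced here, this brute-force sketch (ours, with $N = 4$ and hypothetical helper names) only conjugates the generators $X_j$; any separable ID built from products of them maps the same way, because conjugation preserves operator products.

```python
import numpy as np
from functools import reduce
from itertools import product

N = 4
PAULI = {"I": np.eye(2), "X": np.array([[0, 1], [1, 0]]),
         "Y": np.array([[0, -1j], [1j, 0]]), "Z": np.diag([1, -1])}

def op(s):
    return reduce(np.kron, [PAULI[c] for c in s])

bits = (np.arange(2 ** N)[:, None] >> np.arange(N - 1, -1, -1)) & 1
CZ_ALL = np.diag((-1.0) ** np.sum(bits[:, :-1] & bits[:, 1:], axis=1))  # CZ on every bond

def conjugate(s):
    """Return (sign, string) such that CZ_ALL op(s) CZ_ALL = sign * op(string)."""
    target = CZ_ALL @ op(s) @ CZ_ALL
    for letters in product("IXYZ", repeat=N):
        cand = "".join(letters)
        overlap = np.trace(op(cand) @ target) / 2 ** N   # Pauli strings are Hermitian
        if abs(abs(overlap) - 1) < 1e-9:
            return int(round(overlap.real)), cand
    raise ValueError("result is not a single Pauli string")

images = {}
for j in range(N):
    s = "".join("X" if q == j else "I" for q in range(N))   # generator X_j of |+>^N
    images[s] = conjugate(s)
    print(s, "->", images[s])
```

Each $X_j$ acquires $Z$ factors on its nearest neighbors with sign $+1$, reproducing the cluster stabilizers $K_j$.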