Benchmarking highly entangled states on a 60-atom analogue quantum simulator

Quantum systems have entered a competitive regime in which classical computers must make approximations to represent highly entangled quantum states1,2. However, in this beyond-classically-exact regime, fidelity comparisons between quantum and classical systems have so far been limited to digital quantum devices2–5, and it remains an open problem how to estimate the actual entanglement content of experiments6. Here, we perform fidelity benchmarking and mixed-state entanglement estimation with a 60-atom analogue Rydberg quantum simulator, reaching a high-entanglement entropy regime in which exact classical simulation becomes impractical. Our benchmarking protocol involves extrapolation from comparisons against an approximate classical algorithm, introduced here, with varying entanglement limits. We then develop and demonstrate an estimator of the experimental mixed-state entanglement6, finding our experiment is competitive with state-of-the-art digital quantum devices performing random circuit evolution2–5. Finally, we compare the experimental fidelity against that achieved by various approximate classical algorithms, and find that only the algorithm we introduce is able to keep pace with the experiment on the classical hardware we use. Our results enable a new model for evaluating the ability of both analogue and digital quantum devices to generate entanglement in the beyond-classically-exact regime, and highlight the evolving divide between quantum and classical systems.

Fig. 1 | Entanglement in quantum and classical systems. a. In quantum systems, entanglement spreads between neighboring particles before saturating at an extensive level. However, entanglement growth is hampered by experimental errors which reduce the fidelity, limiting entanglement build-up. b. On the other hand, classical computers employ approximate simulation algorithms which often can only capture a limited degree of entanglement to avoid an exponential increase in cost, meaning they cannot exactly simulate dynamics at large system sizes and long evolution times. c. Here we compare quantum devices and classical algorithms in their ability to prepare highly entangled states using a Rydberg quantum simulator with up to 60 atoms (shown as a fluorescence image).
One such approach is to study the fidelity of preparing a highly entangled target state of interest 2, with several efficient fidelity estimators 7,15–18 having been introduced in recent years. However, in the beyond-classically-exact regime, these protocols have only been applied to digital quantum devices, with no such demonstrations on analog quantum simulators 19, i.e. quantum devices tailored to efficiently encode select problems of interest 20–27.
In this work, we perform fidelity estimation with an analog quantum simulator targeting highly entangled states which are impractical to represent exactly on a classical computer. Our Rydberg quantum simulator 17,21 has recently demonstrated 28 two-qubit entanglement fidelities of ∼0.999, spurring this study with up to 60 atoms 29 (Fig. 1c). We stress that we target high-entanglement-entropy states which require an exponential number of coefficients to represent classically, as distinct from Greenberger-Horne-Zeilinger (GHZ), cluster, or stabilizer states, which are efficiently representable on a classical computer at all system sizes 30 (Ext. Data Fig. 1).

Fig. 2 | a. We use a Rydberg quantum simulator and a classical computer to simulate a time-independent, high-temperature quench starting from the all-zero state, targeting an ideal pure state, |ψ⟩. b. The classical algorithm is characterized by a bond dimension, χ, which limits the maximum simulable entanglement, resulting in smaller-than-unity classical simulation fidelity, C. We estimate the quantum fidelity, F, with a cross-correlation between measurement outcomes of the classical and quantum systems, termed 7 F_d. c, d, e. (Top) Half-cut von Neumann entanglement entropy of |ψ⟩; (Middle) classical simulation fidelity; (Bottom) estimated experimental quantum fidelity. We study benchmarking against an exact simulation (gray) or an approximate simulation with limited bond dimension (blue). c. For a system size of N = 30 (left panels), using too small a bond dimension caps the simulation entanglement. d. This causes the classical fidelity to fall at a time, t_ex, when the entanglement of the target state becomes too large. e. At approximately t_ex, the estimated experimental quantum fidelity also drops. For the largest system size, N = 60 (right panels), t_ex is well before the entanglement saturates, even for the largest bond dimension we employ. The time axis is normalized by the Rabi frequency (Methods). f. The estimated fidelity (averaged over all times in e) increases with bond dimension (open markers), before saturating (closed markers) at a bond dimension capturing the necessary entanglement. For the largest system sizes, saturation is not achieved using the available classical resources.
Our fidelity estimation is based on extrapolation from benchmarking against many approximate classical simulations, namely, matrix product state (MPS) algorithms which cap the maximum simulation entanglement to avoid the aforementioned exponential increase in classical cost 30–32 (Fig. 1b). In one dimension, early-time entanglement growth is system-size independent, so at short times the MPS representation is exact for essentially arbitrarily large systems. When entanglement growth surpasses the entanglement cap, the MPS is no longer a faithful reference, but we can extrapolate the fidelity through a combination of varying the system size, evolution time and simulation entanglement limit.
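The entanglement cap can be made concrete with a small numerical sketch: compressing a state across a bipartition down to χ Schmidt values is exactly what an MPS bond-dimension cap does, and the discarded Schmidt weight directly gives the fidelity cost of the truncation. The toy function below illustrates this idea with dense vectors; it is not the Lightcone-MPS algorithm itself.

```python
import numpy as np

def truncate_bipartition(psi, dim_left, chi):
    """Keep only the chi largest Schmidt values across a bipartition,
    mimicking an MPS bond-dimension cap; returns the truncated state
    and its fidelity |<trunc|psi>|^2 with the original."""
    theta = psi.reshape(dim_left, -1)
    u, s, vh = np.linalg.svd(theta, full_matrices=False)
    kept = s[:chi]
    fidelity = np.sum(kept**2) / np.sum(s**2)  # retained Schmidt weight
    trunc = (u[:, :chi] * kept) @ vh[:chi]
    trunc /= np.linalg.norm(trunc)  # renormalize the compressed state
    return trunc.flatten(), fidelity

# A product state has Schmidt rank 1, so chi = 1 is already exact,
# while a random (highly entangled) state is poorly captured.
rng = np.random.default_rng(0)
prod = np.kron(rng.normal(size=8), rng.normal(size=8))
prod /= np.linalg.norm(prod)
_, f_prod = truncate_bipartition(prod, 8, 1)

rand = rng.normal(size=64)
rand /= np.linalg.norm(rand)
_, f_rand = truncate_bipartition(rand, 8, 1)
print(round(f_prod, 6), f_rand < 0.9)  # -> 1.0 True
```

The same retained-weight quantity underlies the truncation-error fidelity estimates used for MPS simulations later in the text.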
Using the fidelity, we derive and demonstrate a simple proxy of the experimental mixed state entanglement 6, which to date has been notoriously difficult to measure in large systems. Our proxy serves as a universal quality-factor requiring only the fidelity with, and the entanglement of, the ideal target pure state. This enables comparisons between our experiment and state-of-the-art digital quantum devices 2–5,33, with which we are competitive.
Ultimately, we compare the fidelity of our experiment against that achieved by a range of approximate classical algorithms. Using a single node of the Caltech central computing cluster, none of the tested algorithms is able to match the experimental fidelity in the high-entanglement regime, except for an improved algorithm we introduce, termed Lightcone-MPS. Even with this new algorithm, classical costs reach a regime requiring high-performance computing to match the experiment's performance.

Fidelity estimation with approximate algorithms
A key quantity when studying quantum systems is the fidelity 34, F = ⟨ψ|ρ_exp|ψ⟩, where |ψ⟩ is a pure state of interest, and ρ_exp is the experimental mixed state. For digital devices studying deep circuits, the fidelity can be estimated via the linear cross-entropy 2,15, a cross-correlation between measurement outcomes of an experiment and an exact classical simulation. A modified cross-entropy, termed 7 F_d, was proposed for both analog and digital systems, and demonstrated on Rydberg 17 and superconducting 35 analog quantum simulators (Methods). However, a stringent requirement remains: access to an exact classical simulation, precluding direct fidelity estimation at large system sizes. We circumvent this constraint by introducing a method to estimate the fidelity by benchmarking against approximate classical simulations.
We consider a comparison (Fig. 2b) between an ideal high-entanglement target pure state, |ψ⟩, the experimental mixed state, ρ_exp, and a pure state from classical MPS simulation, |Ψ_sim⟩. We introduce an improved MPS time-evolution algorithm using an optimal decomposition of Hamiltonian dynamics into quantum circuits 36, which we term Lightcone-MPS (Methods). The MPS is parameterized by a bond dimension, χ, that defines the maximum simulable entanglement, which scales as log(χ). Starting from an all-zero state, we program a time-independent, global quench under the one-dimensional Ising-like Rydberg Hamiltonian (Fig. 2a, Ext. Data Fig. 3). Hamiltonian parameters lead to high-temperature thermalization 37,38, such that describing |ψ⟩ at late times requires an exponential number of classical coefficients 17.
For a system size of N = 30 (Fig. 2c-e, left), we can exactly classically simulate these dynamics (Fig. 2d, grey); by exact, we mean the classical fidelity, C = |⟨Ψ_sim|ψ⟩|², stays near unity for all times. We numerically observe that the entanglement of the target state increases linearly at early times, before eventual near-saturation (Fig. 2c). Moreover, the estimated experimental quantum fidelity, F_d, shows apparent exponential decay due to experimental errors 17 (Fig. 2e, grey).
However, the situation changes when using an approximate classical simulation. Now, the classical fidelity begins to decay (Fig. 2d, blue) after the time, t_ex, when the ideal entanglement exceeds the limit set by the bond dimension (Fig. 2c, blue), meaning the classical simulation is no longer a faithful reference of the ideal dynamics. Most importantly, we find that after t_ex the experimental benchmarked fidelity also deviates downwards (Fig. 2e, blue), indicating that F_d no longer accurately estimates the fidelity to the ideal state. For the largest system sizes (for instance, N = 60 in Fig. 2c-e, right), t_ex occurs well before the entanglement is predicted to saturate, even for the largest bond dimension we can realistically use.
Essentially, F_d appears to be an amalgam of both classical and quantum fidelities, only estimating the quantum fidelity to the ideal state in the limit of the classical simulation being perfect. To test this behavior for all system sizes, we study the benchmarked value of F_d averaged over all experimental times (Fig. 2f). Consistently, we see that for too small a bond dimension (open markers), F_d is reduced, and can even become negative. As bond dimension increases, F_d rises, before reaching a saturation bond dimension, χ_0(N, t), which depends on system size and time (closed markers). For the largest system sizes and times, however, the saturation bond dimension is beyond the capabilities of any current classical hardware 14.
One could extrapolate the late-time fidelity from short times where we can reach the saturation bond dimension, but this is non-trivial due to non-Markovian noise sources often affecting analog quantum systems. In particular, with analytic and numerical analysis we show that shot-to-shot Hamiltonian parameter fluctuations (e.g. laser intensity variations) induce sub-exponential fidelity decay at low fidelities (Methods, Theorem 1, Ext. Data Fig. 8).
Instead, we use a model-agnostic extrapolation by leveraging a large amount of data with three independent parameters: evolution time, system size, and bond dimension normalized by its saturation value (Fig. 3a, Methods). We can calculate F_d in seven of the octants of this parameter space; the only outlier is the high-entanglement regime of interest. We thus use a Monte Carlo inference approach by training an ensemble 39 of initially randomized neural networks to predict F_d given an input N, χ, and t; F_d at large system sizes and long evolution times is then estimated as the ensemble average when χ → χ_0 (Ext. Data Fig. 9). We emphasize that, essentially, we are simply performing curve fitting of the smoothly varying function F_d(N, χ, t), for which we can directly simulate many ground-truth data.

Fig. 3 | Fidelity-benchmarking a 60-atom system. a. We employ a Monte Carlo inference approach to extrapolate the fidelity at large system sizes and long evolution times. Specifically, we train 1500 neural networks, each instantiated with randomized (hyper)parameters, to predict F_d as a function of size, time, and bond dimension, and take the ensemble average as the predicted value. b. We test this procedure using error model simulations from N = 8 to 18 with increased laser intensity noise to emulate the fidelity expected for the experimental N = 60 dataset. For t > 6.6 cycles and N > 15, we only train on bond dimensions below the level necessary for exact simulation in order to mimic constraints at large system sizes. We observe two behaviors: 1) the ensemble prediction is consistent with the ground truth, and 2) the fidelity appears to follow a non-exponential form. See Methods for further cross-checks, as well as analytic evidence for the origin of the non-exponential decay due to non-Markovian noise. c. Experimental fidelities for N up to 60; markers are grayscale where the classical fidelity (with χ = 3072) is less than 0.99. d. Early-time fidelity decay rate as a function of system size, consistent with linear system-size scaling. e. Fidelity at the time (inset) where the pure state entanglement saturates, with F_d = 0.095(11) at N = 60; the error bar is the standard error over Monte Carlo inferences added in quadrature with the underlying sampling error.

Fig. 4 | a. We develop an experimentally measurable proxy that lower-bounds the log negativity, which is a measure of mixed state entanglement. Here we demonstrate this proxy with error model simulations of random unitary circuit and Rydberg evolution. b. The experimental mixed state entanglement-proxy; solid lines are guides to the eye. c. The maximum entanglement-proxy for our experiment can be compared against that of literature examples performing global fidelity estimation with digital quantum processors: Sycamore 2,5, Zuchongzhi 3,4, and H2 33 (text indicates release year). For literature examples, the x-axis is the number of qubits, while for our experiment the effective system size is defined as the number of qubits with the same Hilbert space dimension as our experiment under the Rydberg blockade constraint (Methods), and is, for instance, ∼42 at N = 60.
We check that this protocol consistently reproduces fidelities at small system sizes, does not appear to overfit the experiment (Ext. Data Fig. 12), is insensitive to hyperparameters such as the neural-net topology and size, and that predictions are converged as a function of the bond dimension (Ext. Data Fig. 13). We further reaffirm that our method extrapolates correctly by replicating our entire procedure at a smaller scale wherein the quantum device is replaced by numerical error model simulations up to N = 18 atoms (Methods). For t > 6.6 cycles and N > 15, the training data only consists of low bond dimensions to emulate the limitations of the large-N experimental data. Even so, the extrapolated fidelity is in excellent agreement with the ground truth data (Fig. 3b, Ext. Data Fig. 12), and reproduces the sub-exponential fidelity decay predicted analytically (Theorem 1).
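The Monte Carlo inference step is, at heart, ensemble-averaged curve fitting. As a toy illustration, the sketch below uses bootstrap-resampled polynomial fits as a stand-in for the 1500 randomized neural networks; the decay curve, noise level, and all parameters are synthetic, not the experimental data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic smooth "fidelity" curve with measurement noise (illustrative only).
t = np.linspace(0.0, 5.0, 40)
f_obs = np.exp(-0.4 * t) + rng.normal(scale=0.01, size=t.size)

# Each ensemble member sees a bootstrap resample of the data and uses a
# randomly drawn polynomial degree, standing in for randomized
# neural-network (hyper)parameters.
t_query = 6.0  # extrapolate beyond the training window
preds = []
for _ in range(200):
    idx = rng.integers(0, t.size, t.size)       # bootstrap resample
    deg = int(rng.integers(1, 4))               # random "hyperparameter"
    coef = np.polyfit(t[idx], np.log(np.clip(f_obs[idx], 1e-6, None)), deg)
    preds.append(np.exp(np.polyval(coef, t_query)))

preds = np.array(preds)
# The ensemble mean is the prediction; the spread gives an uncertainty.
print(f"{preds.mean():.3f} +/- {preds.std():.3f}")
```

The spread across ensemble members plays the role of the Monte Carlo error bar quoted in Fig. 3e.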
Ultimately, we apply Monte Carlo inference to the full experimental dataset for system sizes up to N = 60 atoms (Fig. 3c; see Ext. Data Figs. 6 and 11 for all data). At high fidelities (≳0.2), we observe nearly exponential decay, with a rate scaling linearly with system size (Fig. 3d). At low fidelity, however, the Monte Carlo prediction again reproduces the expected sub-exponential response. We estimate the fidelity to produce the target state when the entanglement is expected to saturate (Fig. 3e), yielding F_d = 0.095(11) at N = 60.
To our knowledge, this is the first global fidelity estimation in the classically-inexact regime with an analog quantum simulator, and it furthermore showcases benchmarking a quantum device by extrapolating from approximate classical simulations.

Experimental mixed state entanglement
Having benchmarked the fidelity of our Rydberg quantum simulator, we now turn to investigate the actual half-chain bipartite entanglement content of the experiment. In the past, several studies have investigated entanglement properties of (nearly) pure states by estimating the second Rényi entropy in (sub-)systems of up to 10 particles 37,40–42. However, the actual output of an experiment can be a highly mixed state, with markedly different entanglement content than the target pure state. For this reason, it is desirable to directly quantify mixed state entanglement measures. Unfortunately, extensions of most pure state entanglement measures to the case of mixed states are defined variationally, and as such are incalculable for even moderately sized systems 43.
An alternative, computable measure of mixed state entanglement is the log negativity 6, E_N, which is an upper bound to the distillable entanglement of the system 43. However, measuring the value of the negativity naïvely requires tomography of the full system density matrix, which is infeasible even for intermediate-scale quantum systems 45,46. In the past, experiments have been limited to demonstrating necessary conditions for a non-vanishing negativity, which can only reveal the binary presence of mixed state entanglement 47,48.

Fig. 5 | a. The equivalent classical cost of the experiment, as quantified by the minimum bond dimension, χ*, for the classical simulation to maintain a higher fidelity than the experiment across all times, i.e. for C > F_d (inset). We consider several classical algorithms (for example, time-evolving block decimation, TEBD 44), all of which become impractical at moderate system sizes. This necessitates the introduction of our Lightcone-MPS algorithm (Methods), which reaches a maximum value of χ* = 3400 for N = 60. b. Predicted MPS costs (simulation time, sampling time, and peak memory usage) to operate at χ* as a function of the experimental per-atom fidelity (see text). Times are representative of a single 16-core node on the Caltech cluster (Methods).
Here we derive and demonstrate an entanglement-proxy, E_P, which can lower-bound the extensive mixed state entanglement (quantified by log negativity). For a mixed state, ρ, with fidelity, F, to a target pure state, |ψ⟩, with known entanglement, E_N(|ψ⟩), our mixed state entanglement-proxy is

E_P(ρ) ≡ E_N(|ψ⟩) + log2(F).    (1)

Intuitively, E_P evaluates the competition between the growth of the error-free entanglement, E_N(|ψ⟩), versus the error-sensitive fidelity, as F < 1 reduces the mixed state entanglement. When ρ is an isotropic state (an admixture of a maximally entangled state and a maximally mixed state), it has been shown 6,49 that E_N(ρ) = max(E_P(ρ), 0) at large system sizes. Further, we show the same holds for a Haar-random state admixed with a maximally mixed state (the expected output 50 of deep noisy random unitary circuits, RUCs), as long as the fidelity is large compared to the inverse of the half-chain Hilbert space dimension (Methods).
More generally, we prove E_P is a lower bound for E_N for any mixed state assuming |ψ⟩ is the highest-fidelity state to ρ, and becomes tighter as the system size increases (Ext. Data Fig. 15). Importantly, analytic (Theorems 3 and 4) and numeric (Ext. Data Figs. 16 and 18) evidence shows that under physically realistic conditions with local or quasi-static errors, violations of the assumption can only lead to small O(1) violations of our bound.
We demonstrate the efficacy of E_P on both noisy RUC evolution and error model simulation of our Rydberg dynamics (Fig. 4a, Methods). In both cases, the target pure state log negativity increases and saturates, while the exactly calculated mixed state log negativity reaches a maximum before decaying at late times, behavior which the entanglement-proxy E_P replicates as a lower bound.
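As a minimal numerical check of the proxy's form (a two-qubit toy case, not the large-system regime discussed in the text), consider an isotropic state targeting a Bell state, for which E_N(|ψ⟩) = 1 ebit; here the proxy E_N(|ψ⟩) + log2(F) in fact coincides with the exact log negativity.

```python
import numpy as np

def log_negativity(rho, dA, dB):
    # E_N = log2 || rho^{T_B} ||_1 (trace norm of the partial transpose)
    pt = rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)
    return np.log2(np.abs(np.linalg.eigvalsh(pt)).sum())

# Target: a two-qubit Bell state (E_N = 1 ebit); mixed state: admixture
# with the maximally mixed state (an isotropic state).
bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)
P = np.outer(bell, bell)
p = 0.8
rho = p * P + (1 - p) * np.eye(4) / 4

F = float(bell @ rho @ bell)   # fidelity to the target pure state
E_P = 1 + np.log2(F)           # proxy: E_N(|psi>) + log2(F)
E_N = log_negativity(rho, 2, 2)
print(f"F = {F:.3f}, proxy = {E_P:.3f}, exact = {E_N:.3f}")
# -> F = 0.850, proxy = 0.766, exact = 0.766
```

For larger systems the proxy is only a lower bound, as in Fig. 4a; the exact agreement here is special to the two-qubit isotropic state.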
We then plot the experimental entanglement-proxy (Fig. 4b), where E_N(|ψ⟩) is extrapolated from small system sizes (Ext. Data Fig. 14), and F is found from Monte Carlo inference. We observe the entanglement-proxy peaks before falling at late times; this peak value increases (Fig. 4c) as a function of effective system size, defined as the number of qubits with the same Hilbert space dimension as our experiment under the Rydberg blockade constraint (∼42 for N = 60). Importantly, with Eq. (1) we can directly compare the results of our present study against RUC evolution in state-of-the-art digital quantum devices 2–5,33 (Fig. 4c). We find we are within ∼2 ebits of early tests of quantum advantage 2 (an ebit is the entanglement of a two-qubit Bell state). For literature examples, we assume targeted states are Haar-random 51,52, while for our experiment we conservatively use the extrapolated log negativity, which is ∼2 ebits below the expectation for Haar-random states at the largest system sizes (Ext. Data Fig. 14).
The mixed-state entanglement-proxy E_P can serve as a useful quality-factor for comparing the ability of different experiments to produce highly entangled states, and could be a more widely applicable alternative to other measures, such as quantum volume 16, for directing efforts to improve NISQ-era quantum systems. We summarize the calculation of the entanglement-proxy in Ext. Data Fig. 2.
The classical cost of quantum simulation

We finally ask: which device, quantum or classical, has a higher fidelity of reproducing a high-entanglement pure target state of interest? Equivalently, in terms of fidelity, what are the minimum classical resources required for a classical computer to outperform the quantum device?
To answer this, we compare the fidelity of the experiment against that of the MPS with varying bond dimension. For system sizes larger than 30 atoms, we estimate the classical fidelity using the MPS truncation error; this is accurate for RUCs 32, and we find it applies for Rydberg Hamiltonian evolution as well (Ext. Data Fig. 25). We define the critical bond dimension for a given system size, χ*, as the minimum bond dimension for which the classical fidelity always exceeds the estimated experimental fidelity. Importantly, this controls the costs of classical simulation; for instance, MPS simulation time scales as O(Nχ³). We find χ* continually increases as a function of system size (Fig. 5a), reaching a maximum value of χ* = 3400 for N = 60 (Ext. Data Fig. 27), and apparently continuing to increase beyond that point.
In performing this study, we used our new Lightcone-MPS algorithm, but considered several alternative approximate classical algorithms, including path integral 2 , matrix product operator 53 , time-dependent variational principle 54 , Schrieffer-Wolff transformation 55 , and neural net 56 approaches (Methods); however, we found the equivalent classical cost of these methods quickly became infeasible, typically well before N = 60. As an example, we show χ * for a more conventional MPS approach using time-evolving block decimation 44 (Fig. 5a).
All calculations used a single 16-core node of the Caltech central computing cluster (Methods). On this machine, we estimate that running the Lightcone-MPS simulation for N = 60 and χ* = 3400 would entail a peak memory usage of ∼110 GB (scaling as O(Nχ²)), and would take ∼11.3 days, or 11.3 × 16 ≈ 180 core-days; sampling from the resultant MPS would take ∼0.3 core-seconds per sample (scaling as O(Nχ²)). For comparison, the experimental cycle time is ∼1.7 s, limited by array loading and imaging; the actual quantum simulation time is only ∼1 µs per shot. While the classical computer can utilize multiple cores, so too can the experiment be parallelized over multiple atom-array chains simultaneously, which in fact we already do at small system sizes.
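The quoted scalings allow simple back-of-the-envelope cost estimates. The sketch below anchors the O(Nχ³) simulation-time scaling to the ∼180 core-days quoted for (N = 60, χ = 3400); the normalization to a single reference point is an illustrative assumption, not a measured benchmark of any particular machine.

```python
def core_days(N, chi, ref=(60, 3400, 180.0)):
    """Estimated MPS simulation cost from the O(N * chi^3) time scaling,
    normalized to a reference point (N0, chi0, days0)."""
    N0, chi0, days0 = ref
    return days0 * (N / N0) * (chi / chi0) ** 3

# Halving the bond dimension cuts the estimated cost by ~8x.
print(round(core_days(60, 3400), 1), round(core_days(60, 1700), 1))  # -> 180.0 22.5
```

The steep χ³ dependence is why modest increases in the critical bond dimension χ* translate into large jumps in classical cost.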
We predict these classical costs are highly sensitive to the effective per-atom fidelity, ℱ, defined by ℱ^(Nt) ≡ F(N, t) (Fig. 5b, Methods). For instance, the simulation time scales as ∼(1 − ℱ)^−10 around the experimental ℱ. While specialized classical hardware 14,57,58 may more readily perform the present approximate classical simulations, we thus expect small improvements in the quantum fidelity may soon put the experiment out of reach of even these more advanced classical systems.

Outlook
As quantum systems tackle tasks of rising complexity, it is increasingly important to understand their ability to produce states in the highly entangled, beyond-classically-exact regime. Here we have studied this regime directly by measuring the global fidelity of an analog quantum simulator with up to 60 atoms, the first such demonstration to our knowledge.
Our inference protocol may help scale other fidelity-estimation schemes such as quantum volume 16, and could be used by digital quantum devices 2–5,33, or analog quantum simulators for itinerant particles 7,20,24,59. Further, one may imagine applying the same basic technique to cross-platform comparisons 60–62 between erroneous quantum devices by varying the decoherence of each, a form of zero-noise extrapolation 63,64.
In addition, we have addressed a longstanding problem by introducing a simple proxy of the experimental mixed state entanglement. This entanglement-proxy can serve as a universal quality-factor comparable amongst analog and digital quantum devices as a guide for improving future systems, and may act as a probe for detecting topological order 65,66 and measurement-induced criticality 67 .
Finally, we have studied the equivalent classical cost of our experiment on the level of global fidelity, which we note could be greatly increased through the use of erasure conversion 28,68,69. Similar techniques could be applied to quantify the classical cost of measuring physical observables 10,70, and to benchmark the performance of approximate classical algorithms themselves through comparison to high-fidelity quantum data. While here we have focused on one-dimensional systems to exploit the power of MPS representations, using higher-dimensional systems 71,72, while maintaining high fidelities, may prove even more difficult for classical algorithms. Ultimately, our results showcase the present and potential computational power of analog quantum simulators, encouraging an auspicious future for these platforms 20.

Ext. Data Fig. 1 | Map of the relevant regions of Hilbert space. For any intermediate or large system size, N, an average Haar-random state in the Hilbert space will have nearly maximal entanglement entropy (up to the Page correction 1), and correspondingly is described by an MPS with approximately maximal bond dimension. By contrast, states such as GHZ and cluster states, which are colloquially referred to as maximally entangled, actually have very little extensive entanglement entropy, and can be written in an efficient MPS representation with only χ = 2.
Ext. Data Fig. 2 | Schematic for mixed state entanglement estimation of a target quantum state. a. The quantum device prepares a target state 2 of interest, |ψ⟩, before performing an ergodic quench, and measuring a set of resulting bitstrings. The experimental fidelity drops both during state preparation and the quench due to errors (not pictured). b. A classical computer then performs noise-free simulations of the entire dynamics (either exactly or approximately), calculates both the bitstring probabilities and the time-averaged bitstring probabilities, p_avg, and further estimates the second moment of the bitstring probability distribution. c. The fidelity is then estimated via a cross-entropy-type quantity 2–4, F_d. If the classical simulation is approximate, F_d is predicted either via Monte Carlo inference or direct extrapolation. d. The negativity of the target state is estimated or assumed, and the experimental mixed state entanglement-proxy is calculated.
A. Description of the Experiment

State preparation and readout
Our experiment is a Rydberg atom array quantum simulator 2,5 trapping individual strontium-88 atoms in optical tweezers 6,7; see also Ext. Data Fig. 3. Up-to-date details of our apparatus may be found in previous works 2,8. In brief, we use dark-state enhanced loading 9 to load a 68-tweezer array with ∼61 atoms, which we then rearrange 10,11 into defect-free arrays of various sizes, with the array spacing calibrated with a laser-based ruler 12. Atoms are initially in the 5s² ¹S₀ state, and are cooled on the narrow-line 5s² ¹S₀ ↔ 5s5p ³P₁ transition close to their motional ground state. Atoms are prepared into the long-lived 5s5p ³P₀ clock state with a preparation fidelity of 0.9956(1) per atom, which we then treat as a metastable ground state, |0⟩. Atoms are then driven globally to the 5s61s ³S₁, m_J = 0 Rydberg state, |1⟩, while the traps are briefly blinked off.
Following Hamiltonian evolution, state projection is performed by autoionizing 13 the Rydberg atoms, leaving them dark to our fluorescent imaging, with a fidelity of ∼0.9996 per atom. Atoms in the clock state are pumped into the imaging cycle, from which we map atomic fluorescence to qubit state 13,14 with a detection fidelity ≳0.9995 per atom 8,14 . For an initially defect-free array, this results in a series of bitstrings associated with the measured qubit states in the array; note that we discard any experimental bitstrings for which initial rearrangement failed. As we load many more atoms than are needed when benchmarking small system sizes, we simultaneously excite and detect multiple non-interacting subensembles when possible, and accrue several thousand bitstrings per system size and time (Ext. Data Fig. 7a, inset).
Ext. Data Fig. 3 | Description of the experiment. a. We use a Rydberg quantum simulator, based on trapping an array of single atoms of bosonic strontium-88. We treat the metastable 5s5p ³P₀ state as an effective ground state, and consider dynamics in the Rydberg manifold. State projection is performed by autoionizing the Rydberg state 13 (not shown). b. The Rydberg Hamiltonian is Ising-like, with single-particle terms characterized by a detuning, ∆, and Rabi frequency, Ω, between the two qubit levels. Rydberg atoms are strongly interacting, characterized by the C6 coefficient, with a strength falling off approximately as 1/R⁶, where R is the interatomic distance. When two atoms are close enough together, simultaneous excitation to the Rydberg state is strongly suppressed, causing the so-called Rydberg blockade effect. We operate in a regime in which the nearest-neighbor interaction strength is ∼13× the Rabi frequency, which is strongly blockaded (see also Ext. Data Fig. 5d). c. We study one-dimensional chains of atoms (fluorescence image), with the Rydberg excitation laser aligned along the longitudinal array axis to minimize inhomogeneity. d. A classical control system 2 continuously interleaves data-taking with automated atom-based calibration of Hamiltonian parameters, resulting in highly stable operation over the course of multiple weeks. For example, the Rabi frequency is held constant with a relative standard deviation of only 0.0023(1). e. A single run of the experiment takes ∼1.7 s, almost entirely limited by array loading and imaging. The actual time required for quantum evolution is ∼1 µs. We use dark-state enhanced loading 9 to reach the largest array sizes used in this work. f. Representative per-atom fidelities; the reported per-atom evolution fidelity is defined as in Fig. 5b of the main text.

The Rydberg Hamiltonian
The Hamiltonian of our system is well approximated by

Ĥ/ħ = (Ω/2) Σ_i σ̂_x^(i) − Δ Σ_i n̂_i + (C6/a⁶) Σ_{i<j} n̂_i n̂_j / |i − j|⁶,

describing a set of interacting two-level systems, labeled by site indices i and j, driven by a laser with Rabi frequency Ω and detuning ∆. The interaction strength is determined by the C6 coefficient and the lattice spacing a. Operators are n̂_i = |1⟩⟨1|_i and σ̂_x^(i) = |0⟩⟨1|_i + |1⟩⟨0|_i, where |0⟩_i and |1⟩_i denote the electronic ground and Rydberg states at site i, respectively. Hamiltonian parameters are summarized in Ext. Data Table I, and are chosen to lead to high-temperature thermalization 15,16, and chaotic behavior consistent with previous studies 2.
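For concreteness, a dense-matrix construction of an Ising-like Rydberg Hamiltonian of this form can be sketched for a few atoms as follows. The parameter values are placeholders, chosen only so that the nearest-neighbor interaction is ∼13× the Rabi frequency as in the text, not the calibrated experimental values.

```python
import numpy as np

def rydberg_hamiltonian(N, Omega, Delta, V_nn):
    """Dense 1D Rydberg Hamiltonian (in units of hbar) on N sites.
    V_nn = C6/a^6 is the nearest-neighbor interaction strength."""
    dim = 2 ** N
    H = np.zeros((dim, dim))
    occ = lambda z, i: (z >> i) & 1  # Rydberg occupation of site i in bitstring z
    for z in range(dim):
        # Diagonal: detuning plus 1/R^6 van der Waals interactions.
        e = -Delta * sum(occ(z, i) for i in range(N))
        for i in range(N):
            for j in range(i + 1, N):
                e += V_nn * occ(z, i) * occ(z, j) / (j - i) ** 6
        H[z, z] = e
        # Off-diagonal: the global Rabi drive flips one site at a time.
        for i in range(N):
            H[z ^ (1 << i), z] += Omega / 2
    return H

H = rydberg_hamiltonian(3, Omega=1.0, Delta=0.0, V_nn=13.0)
# Nearest-neighbor pair costs 13x the Rabi frequency; next-nearest only 13/64.
print(H[0b011, 0b011], H[0b101, 0b101])  # -> 13.0 0.203125
```

The rapid 1/R⁶ fall-off visible in the diagonal entries is what makes the blockade effectively a nearest-neighbor constraint.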
The main experimental dataset was taken continuously over the course of 17 days. Parameters such as Rabi frequency, detuning, beam alignment, state preparation, and others were automatically calibrated via our home-built control architecture 2 , resulting in high stability even over such a long data-taking period. The overall duty cycle of our experiment (the ratio of time spent on 'science shots' versus the total wall time) was roughly 36% over that period.
An important feature of our system arises from the Rydberg blockade 5 effect (Fig. 2b of the main text). Notably, the nearest neighbor interaction strength is 13× larger than the next largest energy scale (the Rabi frequency), greatly reducing the probability of simultaneous excitation of neighboring atoms to Rydberg states. To first order, this reduces the Hilbert space dimension from 2^N to Fib(N + 2) ≈ 1.62^N, where Fib is the Fibonacci function. This means that for a system size of N atoms, the effective Hilbert space size is only that of ∼0.7N qubits, restricting the maximum entanglement build-up.
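This counting is easy to verify directly: enumerating length-N bitstrings with no two adjacent Rydberg excitations gives Fib(N + 2), and taking log₂ recovers the ∼0.7N effective qubit number. A minimal sketch:

```python
import math

def blockaded_dim(n):
    """Count length-n bitstrings with no two adjacent 1s (no simultaneous
    nearest-neighbor Rydberg excitations); equals Fib(n + 2)."""
    a, b = 1, 2  # dimensions for n = 0 (empty chain) and n = 1
    for _ in range(n):
        a, b = b, a + b
    return a

def effective_qubits(n):
    """Number of qubits with the same Hilbert space dimension."""
    return math.log2(blockaded_dim(n))

dims = [blockaded_dim(n) for n in range(1, 7)]  # Fibonacci sequence: 2, 3, 5, ...
```

For N = 60, `effective_qubits(60)` is ≈ 41.9, i.e. ≈ 0.7 × 60, matching the text.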
We account for this effect in our benchmarking protocol (discussed below), and, more concretely, in Fig. 4 of the main text we plot the experimental mixed state entanglement-proxy against the effective system size, Fib(N + 2).

Defining the experimental time unit
Throughout this manuscript, we define the time unit in terms of 'cycles', given by t cycle = 1/Ω, and thus concretely t cycle ≈ 145 ns. This timescale was chosen as Ω is the dominant energy-scale in the system (after the blockading nearest neighbor interaction energy). However, it is instructive to compare cycles versus a more natural timescale, namely the 'unit-entanglement-time', t ebit , required to generate 1 ebit of entanglement in the early time linear growth regime (see Fig. 2c). For our system parameters, we numerically find this timescale is given by t ebit ≈ 0.83 × t cycle ≈ 121 ns. This definition is convenient, because it then allows us to directly compare our analog evolution times against the equivalent depth for digital circuit evolution.
We numerically fit the early time linear entanglement growth for one-dimensional random unitary circuit (RUC) evolution, and compare against our experiment. For the gate-set of Ref. 17, we find that t_ebit ≈ 1.1 layers, meaning 1 ebit of entanglement is built up in 1.1 layers. While this value is gate-set dependent, the exact numerical values are not of great importance; they serve to demonstrate that the t_cycle used throughout this manuscript can be treated roughly as 1 circuit depth. For further details on comparing the evolution of analog and digital devices, see Ref. 2.

B. Fidelity benchmarking

We employ a recently developed fidelity estimator 3, F_d, to evaluate the many-body fidelity of ergodic quench dynamics 18 producing large entanglement. Importantly, F_d can be efficiently sampled using only a small number of measurements as

F_d = 2 [ (1/M) Σ_{m=1}^{M} p(z_m)/p_avg(z_m) ] / [ Σ_z p(z)²/p_avg(z) ] − 1,

where M is the number of measurements, z_m is the experimentally measured bitstring at the m-th repetition, p(z) is the theoretically calculated probability of measuring z by a classical algorithm, and p_avg(z) is the time-averaged probability of measuring z over a certain time window. The denominator can be efficiently calculated by importance-sampling from the MPS 19 as E_{z∼p(z)}[p(z)/p_avg(z)]. For small system sizes where we accrue bitstrings from multiple non-interacting subarrays, we calculate F_d for each individual 'island' and then average their values, weighting by the number of measured bitstrings from each. This procedure is summarized schematically in Ext. Data Fig. 2.
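The estimator of Refs. 2,3 can be sketched in a few lines. Here the distributions are invented toy data, not experimental bitstrings: sampling from the target distribution itself returns F_d ≈ 1, while structureless uniform samples give a markedly reduced value:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_fd(samples, p, p_avg):
    """F_d = 2 <p(z_m)/p_avg(z_m)>_m / sum_z p(z)^2/p_avg(z) - 1."""
    num = np.mean(p[samples] / p_avg[samples])
    den = np.sum(p ** 2 / p_avg)
    return 2 * num / den - 1

# toy 'theory' distribution over 2^6 bitstrings and a stand-in for its
# time-averaged counterpart
dim = 64
p = rng.random(dim)
p /= p.sum()
p_avg = 0.5 * (p + np.ones(dim) / dim)

# bitstrings sampled from p itself: a perfect 'experiment'
good = rng.choice(dim, size=200_000, p=p)
fd_good = estimate_fd(good, p, p_avg)

# bitstrings sampled uniformly: a fully decohered 'experiment'
bad = rng.integers(dim, size=200_000)
fd_bad = estimate_fd(bad, p, p_avg)
```

In the perfect case the numerator's expectation equals the denominator exactly, so F_d → 1 up to statistical fluctuations of order 1/√M.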
Ext. Data Fig. 4 | Accuracy of the fidelity estimator. a. Comparing fidelities estimated from experiment against both the true fidelity and the estimated fidelity for our error model, showing good agreement both between experiment and theory, and between the estimator F_d and the true fidelity. F_d shows systematic typicality fluctuations around the true fidelity, but these decrease in amplitude with increasing system size; see Ext. Data Fig. 7b. b. F_d does exhibit a systematic multiplicative offset from the true fidelity in the time regime we study, due to a time delay needed for errors to scramble and become visible 2,3. We find that F_d ≈ 1.02F around t = 15 cycles, regardless of system size.
We compute p_avg(z) via time-averaging the calculated bitstring probabilities. Specifically, for each time point t_i of the experiment, we estimate p_avg(z) as a discrete average of the probability distributions within a window of t_i ± 1.4 µs, with a step size of approximately 28 ns. This means around 100 points are averaged, which limits statistical sampling fluctuations. Points included in the averaging are weighted to emphasize data with similar values of classical fidelity, namely using a weight factor of min{F_svd(t_i)/F_svd(t_j), F_svd(t_j)/F_svd(t_i)}, where F_svd is the product of MPS truncation errors 20,21. We weight in this way under the hypothesis that averaging classical simulations of very different accuracies (as typified by F_svd) will lead to a poorer estimation of p_avg(z).
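The accuracy-weighted time-average can be sketched as follows, assuming the symmetric ratio weight described above; the distributions and F_svd values are purely illustrative:

```python
import numpy as np

def weighted_p_avg(p_list, f_svd, i):
    """Time-averaged distribution for window center i, weighting each time t_j
    by min(F_svd(t_i)/F_svd(t_j), F_svd(t_j)/F_svd(t_i)) so that simulations
    of similar classical accuracy dominate the average."""
    w = np.array([min(f_svd[i] / f, f / f_svd[i]) for f in f_svd])
    p_avg = np.average(p_list, axis=0, weights=w)
    return p_avg / p_avg.sum()

rng = np.random.default_rng(1)
p_list = rng.random((5, 16))
p_list /= p_list.sum(axis=1, keepdims=True)  # 5 time points, 16 bitstrings
f_svd = [1.0, 0.99, 0.95, 0.9, 0.8]
p_avg = weighted_p_avg(p_list, f_svd, 0)
```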
To account for the Rydberg blockade mechanism, we can slightly modify 2 the F_d formula. Indexing the different blockade sectors by the number of blockade violations in that sector, s, we can write an estimator for the blockaded sector, F_{d,s=0}, only:

F_{d,s=0} = B_exp ( 2 [ (1/M′) Σ_{m=1}^{M′} p′(z′_m)/p′_avg(z′_m) ] / [ Σ_{z∈(s=0)} p′(z)²/p′_avg(z) ] − 1 ).

Here B_exp (B_thy) is the total probability for an experimental (simulation) bitstring to be in the blockaded Hilbert space, s = 0, which further redefines the normalized probabilities, p′(z) = p(z)/B_thy (and analogously p′_avg). Note here M′ is the number of bitstrings, z′_m, measured in the blockaded Hilbert space. Importantly, we find F_{d,s=0} is more robust against the failure of the classical simulation algorithm, as compared to F_{d,s>0}. This is because MPS truncations more strongly affect the s > 0 sectors in ways which are not necessarily visible only by looking at the MPS truncation fidelity (Ext. Data Fig. 5bc).
For this reason, when estimating the quantum fidelity, we employ a two-pronged approach. Where F svd > 0.99 we approximate F d as F d,s=0 , which from measurements at small system sizes we find is always conservative (Ext. Data  Fig. 5a). Utilizing F d,s=0 allows us to more confidently directly measure the fidelity out to intermediate times at the largest system sizes, and measure the according effective decay rate (Fig. 3c of the main text). However, where F svd < 0.99, we estimate F d from our Monte Carlo inference procedure (described below), which is trained directly on the values of F d .
To be concrete, in the main text, all data shown as markers in Figs. 2 and 3 is F d,s=0 for χ = 3072. We summarize all of the data, both F d,s=0 and the Monte Carlo fidelities, in Ext. Data Fig. 6.
Ext. Data Fig. 5 | Benchmarking in the blockaded Hilbert space. a. F_d versus F_{d,s=0} (the fidelity benchmarked in only the blockaded Hilbert space, see text) for system sizes up to N = 30, consistent with a unity-slope correlation, but with a slight ∼0.01 offset. b. For exactly simulatable systems, F_d typically exceeds F_{d,s=0} (left), but for larger systems (right), F_d systematically begins underestimating F_{d,s=0}, even before the nominal exact simulation time. c. We understand this behavior as arising from the MPS preferentially truncating higher blockade subspaces, leading to the non-blockaded classical fidelity dropping below unity before it is visible in the full Hilbert space fidelity. This implies F_{d,s=0} remains an accurate estimator slightly longer than F_d. d. Probability to be in the blockaded subspace for both experiment and simulation. Note the different color-scale axes for the simulation and experiment, as many experimental errors can lead to apparently non-blockaded measurements.
Ext. Data Fig. 6 | Fidelity benchmarking for all system sizes and times. Fidelity (quantified by F d,s=0 with χ = 3072) as well as Monte Carlo inference for all system sizes. As in the main text, grayscale markers are those for which the classical fidelity is less than 0.99.

Sample complexity
To estimate error bars on the raw data, we use the sampling error defined as

σ_{F_d} = (2 / Σ_z p(z)²/p_avg(z)) √( Var_z[ p(z)/p_avg(z) ] / M ),   (4)

where p is the bitstring probability distribution obtained from MPS simulation, the variance is taken over the set of experimentally sampled bitstrings, and M is the number of samples. We expect the statistical error of our benchmarking protocol 2,3 to scale as A/√M, where A is the sample complexity, which scales exponentially in system size with a base which depends on the fidelity 3. We measure A at an approximately fixed fidelity of F_d = 0.5 by performing an average of all the experimentally measured error bars from Eq. (4), weighted by their inverse distance from F_d = 0.5, such that

A = Σ_{N,t} w(N,t) σ_{F_d}(N,t) √(M(N,t)) / Σ_{N,t} w(N,t),

with w(N,t) = 1/|F_d(N,t) − 0.5|. With this approach, we find A = 1.0083(4)^N (Ext. Data Fig. 7a).
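The weighted average defining A can be sketched with synthetic error bars. One sanity check: if every error bar is exactly A/√M, the estimator returns A regardless of the weights:

```python
import numpy as np

def sample_complexity(sigma, fd, M):
    """Weighted average of sigma * sqrt(M), with weights 1/|F_d - 0.5|."""
    sigma, fd, M = map(np.asarray, (sigma, fd, M))
    w = 1.0 / np.abs(fd - 0.5)
    return np.sum(w * sigma * np.sqrt(M)) / np.sum(w)

# synthetic data: error bars exactly A_true / sqrt(M)
A_true = 1.5
M = np.array([1000, 2000, 4000])
fd = np.array([0.45, 0.501, 0.6])
sigma = A_true / np.sqrt(M)
A_est = sample_complexity(sigma, fd, M)
```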
Ext. Data Fig. 7 | Uncertainty sources of the benchmarking protocol. There are two fundamental sources of noise in calculating F_d: statistical error, and systematic error arising from so-called typicality fluctuations 2. a. We first evaluate the statistical error behavior. At a fixed fidelity, the statistical error bars of our protocol are given 3 by A/√M, where M is the number of samples and A is the sample complexity. At an approximately fixed fidelity of F_d = 0.5 (see text), we find A grows weakly with system size, indicating favorable scalability of our protocol from a statistical perspective. For this work, M ∼ 40000 for all system sizes, summing over all times (inset). b. Typicality error manifests as temporal fluctuations of the estimator F_d with respect to the true fidelity F, due to benchmarking in a finite-sized Hilbert space. The standard deviation of these fluctuations scales 2 as 1/√D, where D is the Hilbert space dimension. From error model simulations, we calculate the standard deviation of temporal fluctuations of F_d in a 1-cycle period at late times, which we find decreases with system size consistent with the expected 1/√D scaling.
C. Origins of non-exponential fidelity decay

In Fig. 3 of the main text, we experimentally observe that the many-body fidelity appears to decrease exponentially at intermediate times, before bending sub-exponentially at late times, a behavior which seems to grow stronger with increasing system size (also visible in Ext. Data Fig. 6). We believe this behavior originates from globally-correlated, non-Markovian Hamiltonian parameter fluctuations, which are often found in analog quantum simulators. In particular, a dominant noise source in our experiment is shot-to-shot variation of the Rabi frequency, for instance arising from fluctuations of the driving laser intensity, which we measure to be around ∼0.5%, corresponding to a shot-to-shot variation of σ ∼ 40 kHz (Ext. Data Fig. 8a). Here, we present analytical and numerical support that such an error source can induce the observed fidelity decay behavior.

Fidelity response to coherent Hamiltonian errors
We discover a surprisingly simple form for the fidelity between states subject to differing coherent Hamiltonian errors.
Theorem 1. Consider states |Ψ(t, θ)⟩ = e^{−i(Ĥ_0 + θV̂)t} |Ψ_0⟩, where V̂ = Σ_x V̂_x is a sum of local terms and θ quantifies the perturbation strength. We assume the Eigenstate Thermalization Hypothesis (ETH), i.e. that the expectation values ⟨E|V̂_x|E⟩ of each term V̂_x with respect to the eigenstates |E⟩ of Ĥ_0 are independent random samples from a fixed distribution. Then in the limit of large system size N and long time t, the fidelity F between two such states |Ψ(t, θ_1)⟩ and |Ψ(t, θ_2)⟩ is a Gaussian function of (θ_1 − θ_2)t:

F(θ_1, θ_2) ≈ exp[ −λN(θ_1 − θ_2)² t² ],

for some constant λ defined in Eq. (15).
This result is illustrated in Ext. Data Fig. 8b and will be instrumental to showing our observed non-exponential fidelity decay (Corollary 1). This fidelity response is related to the Loschmidt echo, and our result is consistent with existing numerical observations in e.g. Refs. [22][23][24] .
Proof. Without loss of generality, it suffices to let θ_1 = 0 and θ_2 = θ. We compute the Taylor series of F(θ) ≡ F(0, θ) and show that for large N, it agrees to all orders with the Gaussian series

F(θ) = Σ_{n≥0} (−λNθ²t²)^n / n!.

It is instructive to compute the first non-trivial term: the second derivative ∂²_θ F(θ)|_{θ=0}. In order to evaluate these derivatives, we use the relation 25:

∂_θ ⟨Ψ(t, 0)|Ψ(t, θ)⟩|_{θ=0} = −it Σ_E |c_E|² V_EE + Σ_{E≠E′} c*_{E′} c_E V_{E′E} (1 − e^{i(E′−E)t})/(E′ − E),   (10)

where |E⟩ are the eigenstates of Ĥ_0 with eigenvalues E, c_E ≡ ⟨E|Ψ_0⟩, and we have defined V_{EE′} ≡ ⟨E′|V̂|E⟩. For our purposes, Eq. (10) is valid even when Ĥ_0 has degeneracies: the initial state |Ψ_0⟩ projects any degenerate subspace onto a single eigenstate |E⟩.
The first term in Eq. (10) grows linearly with time t, while the summands in the second term oscillate and are sub-dominant; we subsequently neglect such terms. We obtain a similar expression for the second derivative. This gives

∂²_θ F(θ)|_{θ=0} = −2t² [ Σ_E |c_E|² V²_EE − ( Σ_E |c_E|² V_EE )² ],

where |c_E|² ≡ |⟨E|Ψ_0⟩|² are the populations of |Ψ_0⟩ in the energy eigenbasis. We interpret the sums Σ_E |c_E|² V_EE and Σ_E |c_E|² V²_EE as averages of V_EE and V²_EE with respect to the distribution |c_E|²: we denote them ⟨V⟩ and ⟨V²⟩ respectively; the expression ⟨V²⟩ − ⟨V⟩² is the variance, which we denote κ₂ for reasons made clear below.
Within the approximation made in Eq. (10), ∂^n_θ F(θ)|_{θ=0} = 0 for all odd n. The next term can be expressed in terms of the moments ⟨V^k⟩, and more generally the series organizes into the cumulants of V with respect to the distribution |c_E|². Recall that V̂ = Σ_x V̂_x is the sum of identical local terms: the Eigenstate Thermalization Hypothesis (ETH) predicts that their eigenstate overlaps V^x_EE fluctuate about a smooth function of energy 26. Furthermore, for sufficiently high-temperature states, correlations ⟨V̂_x V̂_y⟩ are short-ranged, and we can treat each V̂_x as an effectively independent random variable. All cumulants of V therefore scale linearly in N, so at large N the variance contribution κ₂ = λN dominates each order of the Taylor series, which then agrees with the Gaussian series above.

Non-exponential decay for Rydberg quantum simulators
The non-exponential fidelity decay immediately follows from Theorem 1.
Corollary 1. When the parameter value θ follows a Gaussian probability distribution P(θ) with mean θ′ and standard deviation σ, Theorem 1 and a straightforward integration give a universal form for the fidelity with respect to the target state |Ψ(t, θ_0)⟩:

F = (1 + 2Λ)^{−1/2} exp[ −δ²Λ/(1 + 2Λ) ],   (20)

where Λ ≡ N t² λ σ² and δ = (θ′ − θ_0)/σ is the overall relative parameter miscalibration.
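As a sanity check, the Gaussian average in Corollary 1 can be evaluated by direct quadrature and compared against the closed form (1 + 2Λ)^{−1/2} exp(−δ²Λ/(1 + 2Λ)); the factor-of-2 convention here follows our own integration of the Theorem-1 Gaussian, and the parameter values are arbitrary illustrations:

```python
import numpy as np

def fidelity_avg_numeric(lam, N, t, theta0, theta_mean, sigma, n=200_001):
    """Average F(theta) = exp(-lam*N*(theta-theta0)^2*t^2) over a Gaussian
    P(theta) with mean theta_mean and std sigma, by trapezoidal quadrature."""
    th = np.linspace(theta_mean - 10 * sigma, theta_mean + 10 * sigma, n)
    P = np.exp(-((th - theta_mean) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    F = np.exp(-lam * N * (th - theta0) ** 2 * t ** 2)
    y = P * F
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(th)))

def fidelity_avg_closed(lam, N, t, theta0, theta_mean, sigma):
    """Closed form: (1 + 2*Lam)^(-1/2) * exp(-delta^2 * Lam / (1 + 2*Lam)),
    with Lam = N*t^2*lam*sigma^2 and delta = (theta_mean - theta0)/sigma."""
    Lam = N * t ** 2 * lam * sigma ** 2
    delta = (theta_mean - theta0) / sigma
    return np.exp(-delta ** 2 * Lam / (1 + 2 * Lam)) / np.sqrt(1 + 2 * Lam)

num = fidelity_avg_numeric(0.02, 60, 5.0, 0.0, 0.1, 0.5)
closed = fidelity_avg_closed(0.02, 60, 5.0, 0.0, 0.1, 0.5)
```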
This functional form leads to markedly different behavior as compared to simple exponential decay, as it becomes essentially a power law at late times (Ext. Data Fig. 8d). In simulation, we benchmark an N = 16 atom system experiencing only global intensity fluctuations (but otherwise no noise) and see excellent agreement between the simulated fidelity and a fit to Eq. (20) with λ free. Notably, Eq. (20) exhibits three distinct regimes: super-exponential behavior at very early times, effectively simple-exponential behavior at intermediate times, and finally sub-exponential behavior at late times. By studying multiple system sizes, we find the divergence between Eq. (20) and an effective exponential decay model is mainly a function of fidelity, not time, and becomes visible when the fidelity reaches F ∼ 0.2, which in our dataset is only the case for the highest N (Fig. 3 of the main text).
As a quantitative comparison, we define an effective model of the fidelity: Eq. (20) multiplied by an exponential decay factor to account for the other noise sources, where we have redefined the free parameter λ′ ≡ λNσ². We perform error model simulations for 7 < N < 21 for two cases: (1) only global, shot-to-shot Rabi and detuning noise; and (2) all other noise sources. Then, assuming multiplicativity of the error sources (Ext. Data Fig. 28a), from case (1) we fit the expected behavior of λ′, and from case (2) we fit the expected behavior of γ. We extrapolate the observed parameters all the way to N = 60, and compare against the experiment (inset of Ext. Data Fig. 8d). Even assuming perfect calibration (δ = 0), we see general agreement between the fidelity decay curve of the error model prediction and the experiment. However, the relative proportion of exponential versus non-exponential decay is difficult to calibrate against experimental data at small system sizes, as this requires evolving out to very late times, when other error sources (such as finite recapture probability of atoms in their traps) become noticeable. In addition, we expect that Hamiltonian parameter calibration is likely imperfect, at the very least up to our uncertainty in the Hamiltonian parameters. As a general sanity check, then, we fit with γ, λ′, and δ free. Fitting N = 60 (with F_0 fixed by the separately calibrated single-atom preparation fidelity), we find δ ≈ 0.2, γ ≈ γ_0/4, and λ′ ≈ 4λ′_0, where γ_0 and λ′_0 are the extrapolated predictions from the error model. We note δ ≈ 0.2 corresponds to a potential error of ≈10 kHz in the calibrated Rabi frequency, which is within our measured error bar (Ext. Data Table I).
We also note that other error sources, in particular atomic position fluctuations, also likely have some contribution to the non-exponential decay behavior, which may improve the agreement above. However, the analytical expression for their effect becomes quite involved, and so we disregard them here for simplicity, and leave a more in-depth analytic and numerical study of the various error sources to future work.
Ext. Data Fig. 8 | Origins of the non-exponential fidelity decay. a. A major noise source for our platform is Rabi frequency fluctuations (for instance arising from variation in the driving laser intensity), which have a shot-to-shot standard deviation of ∼0.5% (∼40 kHz). b. For global, correlated intensity noise, the fidelity is Gaussian with respect to error in the Rabi frequency. c. The standard deviation of this Gaussian dependence decreases as 1/(t√N). d. This behavior leads to an analytic solution for the fidelity decay of F ∼ (1 + λNt²)^{−1/2}, for free parameter λ. This decay is characterized by super-exponential curvature at early times, and sub-exponential curvature at late times. In the inset, we use our error model with all error sources enabled, and extrapolate to the expected fidelity behavior at N = 60, seeing generally good agreement with the experiment, which is improved by directly fitting and allowing unity-level changes in the extrapolated error model parameters.

D. Fidelity extrapolation through Monte Carlo inference
To extrapolate fidelities into the high-entanglement (i.e. late-time, large system-size) regime, we use a Monte Carlo inference approach. More specifically, we train an ensemble 29 of fully-connected regression neural networks, implemented in MATLAB, with three inputs (system size, evolution time, and bond dimension), and one output (estimated fidelity). The neural networks are instantiated with randomized hyperparameters (including the initial seed for the weights and biases). We emphasize that this approach is not trying to learn directly from experimental observations [30][31][32][33] , and instead we are simply extrapolating a smoothly varying 3-to-1 function. Fundamentally, the goal of this approach is to utilize all aspects of the data simultaneously, rather than extrapolating along only a single axis at a time, for instance either via time-extrapolation or system-size scaling alone.

Method description
For training data, we use the fidelities from all combinations of the 14 system sizes, 18 evolution times, and 91 simulated bond dimensions (from χ = 2 to 3072), resulting in a total dataset of ∼23000 points (Ext. Data Fig. 9a). The network is then trained to learn the functional dependence of F(N, χ, t).
To improve generalization and avoid fine-tuning, we employ a Monte Carlo approach through ensemble learning 29 , where the training is repeated 1500 times. Each iteration is instantiated with a randomized seed, and every three iterations we randomly select from a range of hyperparameters, the domain of which are specified in Ext. Data Fig. 9c. The fidelity for a given set of inputs is then taken to be the unweighted ensemble average, while the error bar is taken as the ensemble standard error of the mean added in quadrature with the sampling error at a given time.
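The ensemble estimate and its error bar can be sketched as follows; the predictions below are synthetic stand-ins, not actual Monte Carlo instance outputs:

```python
import numpy as np

def ensemble_estimate(preds, sampling_err):
    """Unweighted ensemble mean, with an error bar given by the standard error
    of the mean added in quadrature with the sampling error."""
    preds = np.asarray(preds)
    mean = preds.mean()
    sem = preds.std(ddof=1) / np.sqrt(len(preds))
    return mean, np.sqrt(sem ** 2 + sampling_err ** 2)

rng = np.random.default_rng(2)
preds = 0.095 + 0.01 * rng.standard_normal(1500)  # 1500 synthetic instances
mean, err = ensemble_estimate(preds, sampling_err=0.005)
```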
To preprocess the data, we log-normalize the bond dimensions as χ̃ = log(χ)/log(χ₀), where χ₀ is the size- and time-dependent saturation bond dimension required for the simulation to be exact (Ext. Data Fig. 10e). This makes the learned function have a more consistent domain for different system sizes and evolution times. We also test training using simply log(χ). We stress that in doing so, the largest trained log(χ) ≈ 8, but for instance for N = 60 we predict the fidelity at log(χ) ≈ 14.5, with no training data in the interim regime. We find consistent mean predictions between the two methods, but the standard deviation of Monte Carlo instances is reduced by training on χ̃.
Ext. Data Fig. 9 | Monte Carlo fidelity inference. a. All the data in this study, mapping the three inputs (system size, N, evolution time, t, and log-normalized bond dimension, χ̃ = log(χ)/log(χ₀)) to an estimated fidelity. Here χ₀ is the time- and size-dependent saturation bond dimension (Ext. Data Fig. 10e). b. To improve generalization into the unknown regime, we use a Monte Carlo inference approach. We train an ensemble of 1500 fully-connected neural networks, each instantiated and trained with random choices of parameters within reasonably specified ranges (defined in c); the Monte Carlo predicted fidelity is then taken as the ensemble average for a given input. d. Monte Carlo prediction for the fidelity of the N = 60 system at the entanglement saturation time. Over ∼98% of the individual Monte Carlo instance predictions exceed the expectation from the early time exponential decay (see Fig. 3 of the main text), resulting in an average of 0.095(11), where the error bar represents the standard error on the mean, added in quadrature with the intrinsic sampling error of the underlying data. e. Maximum log-normalized bond dimension available for training as a function of system size and time, for χ = 3072.
Ext. Data Fig. 10 | Determining the log-normalized bond dimension. a, b. As a function of bond dimension, the classical simulation fidelity rises before saturating to 1; we define χ₀ (the saturation bond dimension) as the minimum value for which the classical fidelity is greater than 0.999. c, d. By χ₀, the experimentally benchmarked fidelities have also saturated. e. We fit the dependence of χ₀ as a function of system size and time (solid lines) in order to extrapolate beyond the range of χ values which we directly simulate in this work.
We determine χ₀ by finding the minimum χ, at a given system size and evolution time, for which the MPS fidelity is greater than 0.999. We extrapolate the behavior of χ₀ into the high-entanglement regime of large system sizes and late evolution times (Ext. Data Fig. 10e). We further normalize the input (subtracting off the mean and dividing by the standard deviation) to improve learnability.
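Determining χ₀ is a simple threshold search; a sketch with synthetic fidelity curves (the values below are invented for illustration):

```python
import math

def saturation_bond_dim(chis, fidelities, threshold=0.999):
    """Minimum bond dimension chi for which the MPS fidelity exceeds threshold."""
    for chi, f in zip(chis, fidelities):
        if f > threshold:
            return chi
    return None  # not yet saturated at the available bond dimensions

chis = [2, 4, 8, 16, 32, 64]
fidelities = [0.42, 0.71, 0.93, 0.9975, 0.9995, 1.0]
chi0 = saturation_bond_dim(chis, fidelities)

# log-normalized bond dimension used as the network input
chi_tilde = math.log(8) / math.log(chi0)
```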
To avoid overfitting, we randomly select a few N from 21 to 30, for which we excise from the training any data with χ̃ > 2/3 and t > 6.6 cycles. We then use the data for these system sizes and times with χ̃ ∼ 1 as the validation dataset, and stop the training when the validation loss, defined by the prediction root mean square error (RMSE), does not decrease for 50 epochs in a row (see for instance Ext. Data Fig. 12b).
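This stopping rule is standard patience-based early stopping (halt once the best validation loss has not improved for 50 consecutive epochs, then keep the best epoch); a schematic version, not the MATLAB implementation:

```python
def train_with_early_stopping(val_losses, patience=50):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation
    losses, stopping once the best value has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# toy validation curve: decreases for 60 epochs, then plateaus slightly higher
losses = [1.0 / (e + 1) for e in range(60)] + [0.02] * 100
best_epoch, stop_epoch = train_with_early_stopping(losses, patience=50)
```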
We summarize a partial subset of the data showing the Monte Carlo inference as a function of N , t, and χ in Ext. Data Fig. 11.

Consistency checks
In order to test that this approach is accurate in estimating the fidelity, we perform several consistency checks, ranging from varying the number of learnable parameters, testing the result as a function of the amount of experimental training data, checking against small system sizes in experiment and error model where we have exact ground truth data, and more. Several of these tests are described below.
Small system sizes, experimental-For the largest system sizes we do not have ground truth data for F d in the late-time regime, and so cannot directly verify our model's efficacy. However, we can emulate the effect in this regime at smaller system sizes where we do have ground truth data in all regimes.
We employ the same training methodology as on the full dataset, but only consider N ≤ 30 where we have the full ground truth. For t > 6.6 cycles and N ≥ 24, we excise from the training data all bond dimensions for which χ̃ > 2/3. This value is chosen as it is approximately the maximum normalized bond dimension available at N = 60 (Ext. Data Fig. 9e). Concretely, this approach essentially treats N ≥ 24 as if it were beyond the exactly simulable regime. In reality, of course, we have the ground truth data to compare against, and can for instance use it to study the behavior of the loss functions (Ext. Data Fig. 12ab).
Averaging over all Monte Carlo instantiations, we find the training loss decreases consistently as a function of training epoch, but the validation loss (where the ground truth N ≥24 results are used as the validation data set) reaches a minimum after ∼50 epochs, after which it proceeds into an overfitting regime. As with the full dataset, we thus stop the training for each instantiation at the point where the validation loss is non-decreasing for ∼50 epochs in a row, and then take the result from the best epoch. We note that both losses drop most dramatically after only a single epoch of training from initially randomized configurations, meaning the gradient descent quickly finds a nearly optimal regime, implying that the loss function is both smooth and non-flat.
We then compare directly between the Monte Carlo predicted F d and the ground truth value for N = 30 across all times. We see good agreement even in the emulated breakdown regime at long times (Ext. Data Fig. 12c), with no indication that the learned fidelities are misestimating the true values, despite not having access to the underlying ground truth data at large bond dimensions.
Small system sizes, error model-As described, we can check our model's efficacy at small system sizes directly on experimental data, but the non-exponential fidelity decay behavior is not yet strongly visible. In the main text, we reach this regime using Monte Carlo fidelity inference on error model simulations (Fig. 3 of the main text). Here we further describe this analysis. We perform error model simulations for N = 8–18 up to the maximum experimental time, but artificially increase the shot-to-shot Rabi frequency variation by a factor of 4 in order to roughly emulate the predicted experimental fidelities at large N. We choose to do this, rather than simulate for much longer timescales where the behavior would become naturally evident (Ext. Data Fig. 8d), because we want to replicate as much as possible the fact that for the experimental dataset the entanglement is still growing for most times for the largest system sizes. Performing error model simulations out to ∼100 cycles would dominate the training by a regime where the entanglement had already saturated for these small system sizes.
We perform MPS simulations for all bond dimensions up to χ₀(N = 18, t → ∞) = 89, and benchmark the error model with each. Though we run the error model with a finer timestep of ∼10 ns, we only select data at the experimentally measured times for training. We then emulate the experimental dataset by treating all N < 16 as "small N" where we have full ground truth data. We excise from the training any data for which χ̃ > 2/3, N > 15, and t > 6.6 cycles, to emulate the available data for the full experimental dataset. We use N = 13–15 as the validation dataset to prevent overfitting. We then train in the same way as for the experiment, and compare Monte Carlo predictions versus ground truth output (Ext. Data Fig. 12d). We find the two are consistent, both in terms of the values obtained, and the non-exponential fidelity decay, indicating the Monte Carlo inference is able to learn this behavior.
Ext. Data Fig. 12 | Evaluating the Monte Carlo inference at small system sizes. a, b. As a cross-check of our approach, we use the same training methodology employed on the full dataset, but truncate to only include N ≤ 30 where we have the full ground truth. We bipartition the data (inset), and for the large system sizes and late evolution times we only directly train with χ̃ < 2/3. Averaging over 1500 instantiations, we find the training loss decreases consistently, but validation loss at χ̃ ∼ 1 reaches a minimum after ∼50 epochs, and as expected proceeds into an over-fitting regime after. c. Model predictions show the Monte Carlo inference accurately predicts fidelities at late times, despite not having access to the underlying ground truth data at large bond dimensions in this regime. d. We perform a similar test using error model simulations of our experiment, as for instance shown in Fig. 3b of the main text. We train with N = 8 to 18, excising χ̃ > 2/3 for N > 15 and t > 6.6 cycles. The resultant Monte Carlo inference predictions are well correlated with the ground truth, across the whole range of fidelities. x-error bars represent estimated typicality fluctuations, y-error bars represent the Monte Carlo inference standard deviation.
Number of learnable parameters-For the full dataset, a total of ∼23000 training data points are generated, while depending on the choice of hyperparameters, the number of learnable parameters can vary from ∼350 to ∼83000. For ∼70% of the Monte Carlo instances, the number of learnable parameters is lower than the number of training points (Ext. Data Fig. 13ab). Further, we do not see any strong correlation between the predicted fidelity and the number of learnable parameters, or more generally between the fidelity and either network depth or width.
Reducing the experimental dataset-We perform the same training procedure on several variants of reduced experimental datasets, where we excise from the training half of the bond dimensions, system sizes, or evolution times used. We find the resulting Monte Carlo mean is always consistent with the prediction from training on the full dataset (Ext. Data Fig. 13c).
Maximum bond dimension used-The maximum bond dimension used in the training is constrained by our computational resources, but we can still study if the Monte Carlo prediction appears to be converged as a function of the bond dimension. In Ext. Data Fig. 13d we systematically scale the training bond dimension used in the breakdown regime of large system sizes and long evolution times, and find that the results appear converged with a bond dimension of ∼1500, about a factor of 2 lower than the maximum bond dimension we consider in this work.
Ext. Data Fig. 13 | Robustness of Monte Carlo inference. a, b. For the majority (∼70%) of hyperparameter combinations sampled during the Monte Carlo inference (Ext. Data Fig. 9c), the number of learnable parameters is smaller than the number of training points (∼23000). Further, we see no correlation between the number of learnable parameters (nor any individual specific choice of hyperparameter) and the predicted fidelity. c. The Monte Carlo mean prediction is insensitive to the choice of data: removing half of the data along any axis leads to the same predicted fidelity. Note that while the "Full training" is composed of 1500 models, the "excised" ensembles only feature 192 trained models. Markers show mean and standard deviation for each distribution. d. To check the convergence of our protocol, we repeat the Monte Carlo inference procedure with varying maximum bond dimension allowed in the breakdown regime of large system sizes and long evolution times. We find the fidelities are generally consistent over the entire range, with convergence achieved around a bond dimension of 1500.

E. Extrapolating pure state entanglement growth
We target the generation of high entanglement entropy states through effectively infinite-temperature quench dynamics. In order to estimate the mixed state negativity, we first must estimate the pure state entanglement, even for system sizes for which we can no longer exactly simulate the dynamics. To do this, we extrapolate the scaling behavior from small system sizes. Specifically, we fit the entanglement growth, S, as a function of time and system size, with a functional form describing a smooth crossover between an early-time linear growth and a shallow late-time growth. To respect the physical constraints of the entanglement growth, we fix m_1 (the early time linear entanglement growth slope) to be system size independent. The Rydberg Hamiltonian induces fast entanglement growth over the first ∼0.5 cycles of evolution, which makes fitting the linear growth erroneous without also including a system-size independent offset, S_0. We then expect t_ent, which we identify with the approximate entanglement saturation time, to scale linearly with N, which we empirically find is true also for σ (a smoothness parameter). We further empirically find that m_2 (the shallow entanglement growth slope after t_ent) scales approximately quadratically with N.
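One convenient smooth-crossover parameterization with exactly these parameters (S_0, m_1, m_2, t_ent, σ) uses a logistic interpolation between the two slopes; this specific functional form is our illustrative choice, a hypothetical stand-in for the fitted form:

```python
import numpy as np

def entanglement_model(t, S0, m1, m2, t_ent, sigma):
    """Smooth crossover: slope m1 for t << t_ent, slope m2 for t >> t_ent,
    with crossover sharpness set by sigma, and S(0) = S0 exactly.
    Hypothetical parameterization for illustration; obtained by integrating
    the logistic slope m2 + (m1 - m2) / (1 + exp((t - t_ent)/sigma))."""
    x0 = np.logaddexp(0.0, t_ent / sigma)        # log(1 + e^{t_ent/sigma})
    xt = np.logaddexp(0.0, (t_ent - t) / sigma)  # log(1 + e^{(t_ent - t)/sigma})
    return S0 + m2 * t + (m1 - m2) * sigma * (x0 - xt)

# example: offset 0.2 ebits, early slope 1 ebit/cycle, late slope 0.01,
# saturation at t_ent = 10 cycles
S_early = entanglement_model(2.0, 0.2, 1.0, 0.01, 10.0, 1.0) \
    - entanglement_model(1.0, 0.2, 1.0, 0.01, 10.0, 1.0)
S_late = entanglement_model(31.0, 0.2, 1.0, 0.01, 10.0, 1.0) \
    - entanglement_model(30.0, 0.2, 1.0, 0.01, 10.0, 1.0)
```

The `np.logaddexp` form evaluates the softplus terms stably even for large t_ent/σ.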
We show the results of these fits in Ext. Data Fig. 14, where markers are measured directly from simulation, and lines are fits. Though here we show log negativity, the von Neumann entropy scales and acts in the same way.
It is important to note that as system size increases, we gradually reach relatively less entangled states at t_ent as compared to the expectation for a Haar random state (dashed lines in Ext. Data Fig. 14). Given the Rydberg blockade constraint, we define the Haar random prediction as S_Haar ≈ log_2(D_0) − 0.47, where D_0 is the blockaded Hilbert space dimension of the half-cut chain, and the numerical offset of −0.47 is a type of Page correction1, equal to log_2(64/(9π^2)) ≈ −0.4734,35, as explained further below.
Ext. Data Fig. 14 | Pure state entanglement growth dynamics for the Rydberg Hamiltonian. We directly calculate the entanglement (here the log negativity) for system sizes up to N = 60, and fit the resulting data (see text) to extrapolate the entanglement dynamics for large N at late times. We also compare against the prediction for Haar random states at various system sizes (black dashed lines), and find that as system size increases we gradually undershoot this expectation more; we tentatively attribute this to imperfect parameter selection making the dynamics effectively below infinite temperature at large N.

F. Mixed state entanglement
In this work, we have introduced the entanglement-proxy, E_P, which estimates the mixed state entanglement, E_N(ρ̂), in terms of the pure state entanglement, E_N(|ψ⟩), and the fidelity, F ≡ ⟨ψ|ρ̂|ψ⟩ (see Eq. (1) of the main text).
Explicitly, consider a system bipartitioned into two subsystems, A and B, and described by a density matrix ρ̂_AB. The log negativity36 is given by E_N(ρ̂_AB) = log_2 ∥ρ̂_AB^{T_A}∥_1, where ρ̂_AB^{T_A} is the partially transposed density matrix and ∥·∥_1 is the trace norm. More plainly, the log negativity measures the log of the sum of the absolute values of the eigenvalues of ρ̂_AB^{T_A}. If ρ̂_AB is separable, the log negativity is zero; for entangled states it generically serves as an upper bound to the distillable entanglement of the system37, and is an entanglement monotone37. For pure states the log negativity is equivalent to the Rényi-1/2 entropy.
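The definition above is easy to check numerically. A minimal sketch of the log negativity via the partial transpose, using only numpy (the two-qubit example states are illustrative choices):

```python
import numpy as np

def log_negativity(rho, dA, dB):
    """E_N = log2 || rho^{T_A} ||_1 for a state on the bipartition A ⊗ B."""
    rho = rho.reshape(dA, dB, dA, dB)
    rho_TA = rho.transpose(2, 1, 0, 3).reshape(dA * dB, dA * dB)  # transpose A indices
    eigs = np.linalg.eigvalsh(rho_TA)        # rho^{T_A} is Hermitian
    return np.log2(np.abs(eigs).sum())

# A two-qubit Bell state carries one ebit: E_N ≈ 1
bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
print(log_negativity(np.outer(bell, bell), 2, 2))    # ≈ 1.0

# A separable product state: E_N ≈ 0
prod = np.zeros(4); prod[0] = 1.0
print(log_negativity(np.outer(prod, prod), 2, 2))    # ≈ 0.0
```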
The following discussion is generally split into three subsections: In the first, we study lower bounds and estimators of the mixed state entanglement. We prove the validity of our entanglement-proxy, E_P, for the case when the target pure state, |ψ⟩, is an eigenstate of the mixed state, ρ̂, as for instance is the case if |ψ⟩ is the highest-fidelity pure state to ρ̂. We further improve our result for the case where |ψ⟩ is a Haar-random state. Finally, we prove more general unconditional upper and lower bounds to argue that our entanglement-proxy becomes exponentially tighter as the system size increases.
Next, we discuss possible violations of the lower bound given by E_P if we allow |ψ⟩ to no longer be an eigenstate of ρ̂. We consider two physically realistic possibilities: 1) globally correlated coherent errors, and 2) incoherent local errors. For each case we provide numerical or analytical evidence showing that, at worst, our entanglement-proxy lower bound is potentially violated by an O(1) amount.
Finally, we make a specific connection between the entanglement content of an erroneous mixed state, and of a truncated MPS state. We show that at equal fidelity, the truncated MPS is less entangled than the depolarized Haar random state, which explains an observed discrepancy between the experimental mixed state entanglement and χ * found in the main text.
As a brief aside, we note an important point arising from the fact that in the main text we have used either F_d as the fidelity for the experiment, or the linear cross-entropy, F_XEB, as the fidelity for the literature examples. Both F_d and F_XEB are lowered by measurement error, which strictly speaking does not affect the mixed state entanglement; this only serves to make our lower bound less tight. In principle the mixed state entanglement-proxy could be measurement-corrected to account for this effect, but we choose to include measurement errors as a more general quality-factor of the experiments.

Ext. Data Fig. 15 | Estimation of log negativity from fidelity. a. We demonstrate the validity of the lower bound Eq. (1) of the main text by plotting the difference E_N(ρ̂) − E_P, with |ψ⟩ the pure state with highest fidelity to ρ̂. Here, we generate 1000 random two-qubit mixed states as uniform incoherent mixtures of two Haar random states. The difference is always above zero, indicating the validity of the lower bound. b. Dependence between the log negativity E_N(ρ̂) and fidelity F for globally depolarized Haar random states, Eq. (31). For small values of F, the negativity E_N(ρ̂) (black dashed) deviates from the lower bound E_P (red), illustrated here for the half-chain negativity of six-qubit depolarized Haar random states. Remarkably, for such states, E_N(ρ̂) is uniquely determined by the fidelity F and Hilbert space dimension D. We find an analytic series expression for this dependence, Eq. (33), with the first-order correction illustrated in blue. We also illustrate general lower and upper bounds for the log negativity, Eq. (38), in light blue. Inset: The lower bound (red) improves exponentially as a function of system size, quantified here as the minimum fidelity for which the lower bound is 90% of the actual value.

Negativity lower bounds and estimators
The validity of our entanglement-proxy has been shown in the past38 for the case of isotropic states; here we extend this result and validate our entanglement-proxy when the target pure state, |ψ⟩, is an eigenstate of the mixed state, ρ̂ (Theorem 2). We then improve our result for the case of depolarized Haar-random pure states (Lemma 1), and then show more general lower bounds which allow us to argue that our entanglement-proxy should tighten exponentially with system size (Lemma 2). Numerical support of these analytical claims is shown in Ext. Data Fig. 15.
We begin with the case of |ψ⟩ being an eigenstate of ρ̂.
Theorem 2. For any mixed state ρ̂ and pure state |ψ⟩ which is an eigenstate of ρ̂, the logarithmic negativities E_N(ρ̂) and E_N(|ψ⟩) are related by E_N(ρ̂) ≥ E_N(|ψ⟩) + log_2 F, where F ≡ ⟨ψ|ρ̂|ψ⟩ is their fidelity.
Proof. The proof of this Theorem follows in three steps. It is equivalent to derive the bound ∥ρ̂^{T_A}∥_1 ≥ F ∥|ψ⟩⟨ψ|^{T_A}∥_1 [Eq. (24)]. First, we express the state as ρ̂ = F|ψ⟩⟨ψ| + (1 − F)ρ̂_⊥ for some positive semi-definite state ρ̂_⊥, which is guaranteed to exist because |ψ⟩ is an eigenstate of ρ̂.
Next, we use the variational definition of the trace norm of a Hermitian, trace-one operator as a maximum over projectors P̂, i.e. operators with eigenvalues either 0 or 1: ∥ρ̂^{T_A}∥_1 = 2 max_P̂ tr(P̂ ρ̂^{T_A}) − 1. This maximum is attained when P̂ is the projector onto the positive-eigenvalue subspace of ρ̂^{T_A}. This variational definition gives the lower bound ∥ρ̂^{T_A}∥_1 ≥ 2 tr(P̂_ψ ρ̂^{T_A}) − 1, where P̂_ψ is the projector onto the positive-eigenvalue subspace of |ψ⟩⟨ψ|^{T_A}. Finally, we utilize an explicit construction of P̂_ψ to show that tr(P̂_ψ ρ̂_⊥^{T_A}) ≥ 1/2, giving the desired bound Eq. (24). This last step is the most involved, and we show it below.
The eigenvalues λ and eigenvectors |v⟩ of |ψ⟩⟨ψ|^{T_A} can be expressed in terms of the Schmidt values s_j and Schmidt bases |a_j⟩, |b_j⟩ of |ψ⟩, with respect to the same bipartition38. With |ψ⟩ = Σ_j s_j |a_j⟩|b_j⟩, the eigenvalues λ are s_j^2 or ±s_i s_j (for i ≠ j), and their corresponding eigenvectors are respectively |v_jj⟩ ≡ |a_j⟩|b_j⟩ and |v^±_ij⟩ ≡ (1/√2)(|a_i⟩|b_j⟩ ± |a_j⟩|b_i⟩). Then P̂_ψ has the explicit expression P̂_ψ = Σ_j |v_jj⟩⟨v_jj| + Σ_{i<j} |v^+_ij⟩⟨v^+_ij| + (1̂ − Π̂), where Π̂ is the projector onto the span of the product basis {|a_i⟩|b_j⟩} (the zero-eigenvalue subspace may be freely included in P̂_ψ). Next, we use the relation tr(M̂^{T_A} N̂^{T_A}) = tr(M̂ N̂), valid for all operators M̂ and N̂ defined on AB. Thus tr(P̂_ψ ρ̂_⊥^{T_A}) = tr(P̂_ψ^{T_A} ρ̂_⊥). P̂_ψ^{T_A} can be expressed as P̂_ψ^{T_A} = 1̂ − (1/2)(Π̂ − |Φ⟩⟨Φ|) ≥ (1/2)(1̂ + |Φ⟩⟨Φ|), where |Φ⟩ ≡ Σ_j |a_j⟩|b_j⟩ is the unnormalized maximally entangled state in the Schmidt bases {|a_j⟩} and {|b_j⟩}. Since ρ̂_⊥ is a positive semidefinite state, we have tr(P̂_ψ^{T_A} ρ̂_⊥) ≥ (1/2)(tr ρ̂_⊥ + ⟨Φ|ρ̂_⊥|Φ⟩) ≥ 1/2. This proves the desired bound Eq. (24).
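Theorem 2 can be sanity-checked numerically in the same spirit as Ext. Data Fig. 15a. The construction below, which orthogonalizes a second Haar random state so that |ψ⟩ is an eigenstate of the resulting mixture, is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_negativity(rho, dA=2, dB=2):
    # E_N = log2 of the trace norm of the partial transpose
    rho_TA = rho.reshape(dA, dB, dA, dB).transpose(2, 1, 0, 3).reshape(dA * dB, dA * dB)
    return np.log2(np.abs(np.linalg.eigvalsh(rho_TA)).sum())

def haar_state(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

violations = 0
for _ in range(1000):
    psi = haar_state(4)
    phi = haar_state(4)
    phi -= psi * (psi.conj() @ phi)       # orthogonalize, so psi is an
    phi /= np.linalg.norm(phi)            # eigenstate of the mixture below
    F = rng.uniform(0.3, 1.0)
    rho = F * np.outer(psi, psi.conj()) + (1 - F) * np.outer(phi, phi.conj())
    E_P = log_negativity(np.outer(psi, psi.conj())) + np.log2(F)
    if log_negativity(rho) < E_P - 1e-9:
        violations += 1
print(violations)   # → 0: the bound holds in every instance
```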
This Theorem validates the entanglement proxy in the simplest case. We next move to improve it, first specifically in the case where |ψ⟩ is taken to be a Haar random state, |ψ_Haar⟩.
Lemma 1. For depolarized Haar random states, i.e. states of the form ρ̂ = F′|ψ_Haar⟩⟨ψ_Haar| + (1 − F′)1̂/D, where F′ = F − (1 − F)/(D − 1) (chosen so the fidelity ⟨ψ_Haar|ρ̂|ψ_Haar⟩ equals F), the mixed state entanglement has the form E_N(ρ̂) = E_N(|ψ_Haar⟩) + log_2 f, where the more general function f can be expressed to low order as in Eq. (33), and where D_A is the Hilbert space dimension of subsystem A, here taken for a half-chain bipartition.
Proof. Here, we restrict our discussion to equal bipartitions, although our results can be generalized to any bipartition. First, for Haar random pure states, with high probability we have34,35 E_N(|ψ_Haar⟩) ≈ log_2 D_A + log_2(64/(9π^2)) ≈ log_2 D_A − 0.47. This expression follows from random matrix theory (RMT), which states that the Schmidt values s_j of a Haar random state follow the quarter-circle law, while the s_j^2 follow the Marchenko-Pastur distribution.
This enables us to compute the distribution of eigenvalues λ of |ψ⟩⟨ψ|^{T_A}. The eigenvalues of the form λ = s_j^2 contribute a constant amount to ∥ρ̂^{T_A}∥_1. The dominant (exponentially larger) contribution to the log negativity comes from the eigenvalues of the form λ = ±s_i s_j, which follow a known distribution39 P(λ̃), supported on λ̃ ∈ [−4, 4] (where λ̃ ≡ D_A λ) and expressed in terms of the complete elliptic integrals K and E. The negativity of the depolarized state ρ̂ can be calculated using this distribution: the eigenvalues λ′ of ρ̂^{T_A} are given by λ′ = F′λ + (1 − F′)/D. Therefore, the trace norm can be calculated as an integral of |λ′| against P(λ̃). The first term evaluates to (64/(9π^2))F′D_A, while the second term can be systematically computed using the expansion P(λ̃) ≈ −(2/π^2) ln|λ̃| + (8 ln 2 − 4)/π^2, valid for small λ̃, giving Eq. (33).
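The RMT prediction E_N(|ψ_Haar⟩) ≈ log_2 D_A − 0.47 can be checked by direct sampling; the system size below is an arbitrary small choice:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12                                  # arbitrary small qubit number
D_A = 2 ** (N // 2)                     # half-cut Hilbert space dimension

# A normalized complex Gaussian matrix is a Haar random pure state,
# viewed as a D_A x D_A matrix across the half cut.
psi = rng.normal(size=(D_A, D_A)) + 1j * rng.normal(size=(D_A, D_A))
psi /= np.linalg.norm(psi)

s = np.linalg.svd(psi, compute_uv=False)       # Schmidt values s_j
E_N = np.log2(s.sum() ** 2)                    # pure-state log negativity

prediction = np.log2(D_A) + np.log2(64 / (9 * np.pi ** 2))   # ≈ log2(D_A) - 0.47
print(E_N, prediction)
```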
Importantly, if more terms are added in the expansion of Eq. (33), Lemma 1 becomes an exact estimator of the mixed state entanglement. Further, if D_A is large, as is the case for large system sizes, then the original simple proxy is restored, since f ∼ F′ ∼ F. We show the efficacy of this estimator, with f(F) evaluated to first order, in Ext. Data Fig. 15b.
Finally, we prove an even more general set of upper and lower bounds. While not amenable to being computed experimentally, they allow us to argue that our entanglement-proxy likely becomes exponentially tight as the system size increases.
Lemma 2. For any pure state |ψ⟩ and mixed state ρ̂ with decomposition ρ̂ = F|ψ⟩⟨ψ| + (1 − F)ρ̂_⊥, where ρ̂_⊥ is the (not necessarily positive semi-definite) remainder of the noisy state, we can bound the mixed state log negativity by log_2[F·2^{E_N(|ψ⟩)} − (1 − F)∥ρ̂_⊥^{T_A}∥_1] ≤ E_N(ρ̂) ≤ log_2[F·2^{E_N(|ψ⟩)} + (1 − F)∥ρ̂_⊥^{T_A}∥_1] [Eqs. (38)].
Proof. Unlike above, ρ̂_⊥ does not necessarily commute with |ψ⟩⟨ψ|.
To bound the negativity for such a state, we simply apply the triangle inequality to the trace norm ∥·∥_1, yielding Eqs. (38).
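The triangle-inequality bounds of Lemma 2 can be verified on a small example; the depolarized Bell pair below is an illustrative choice of ρ̂ and |ψ⟩:

```python
import numpy as np

def trace_norm_TA(M, dA, dB):
    # trace norm of the partial transpose over subsystem A
    MTA = M.reshape(dA, dB, dA, dB).transpose(2, 1, 0, 3).reshape(dA * dB, dA * dB)
    return np.abs(np.linalg.eigvalsh(MTA)).sum()

# Illustrative choice: a Bell pair under global depolarization
bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
proj = np.outer(bell, bell)
rho = 0.9 * proj + 0.1 * np.eye(4) / 4

F = np.real(bell.conj() @ rho @ bell)
rho_perp = (rho - F * proj) / (1 - F)        # remainder; need not be PSD

t_psi = trace_norm_TA(proj, 2, 2)            # = 2**E_N(|psi>)
t_rho = trace_norm_TA(rho, 2, 2)
t_perp = trace_norm_TA(rho_perp, 2, 2)

lower = F * t_psi - (1 - F) * t_perp
upper = F * t_psi + (1 - F) * t_perp
print(lower, t_rho, upper)                   # lower <= t_rho <= upper
```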
This bound is particularly useful in settings such as ours. Here, we can expect the noisy state to be weakly entangled, i.e. ∥ρ̂_⊥^{T_A}∥_1 = O(1), while the pure state is highly entangled: ∥|ψ⟩⟨ψ|^{T_A}∥_1 = O(exp(N)), where N is the system size. Then the above bounds imply that our entanglement-proxy is in fact exponentially close to the true mixed state negativity: E_N(ρ̂) = E_P + O(exp(−N)), as observed in the inset of Ext. Data Fig. 15b.

Robustness of our bounds to realistic experimental scenarios
In the section above, we presented several proofs which let us argue that our entanglement-proxy, E_P, is a tight lower bound for the mixed state entanglement under specific assumptions. Here, we relax these assumptions, and show that under realistic noise sources the maximum possible violations of our bound are at most O(1).
Error model simulations-To start, we reiterate and emphasize the results of Fig. 4a of the main text. There, we showed numerically that the entanglement-proxy appears to be a genuine lower bound of the mixed state entanglement either for a random unitary circuit (RUC) undergoing incoherent local noise, or for our experiment undergoing the full set of error sources (Ext. Data Fig. 28). Now, our aim is to provide further analytical and numerical evidence for the robustness of the entanglement-proxy under: 1) global coherent errors, or 2) incoherent local errors.
Global coherent errors-In this section, we consider potential violations of our bound due to global coherent errors. To study this, we first prove a Theorem for a more general bound of the mixed state entanglement, based on studying the norm of the commutator between the mixed stateρ and pure state |ψ⟩. This bound is equivalent to our original entanglement-proxy up to a correction term, which we then show analytically is small if the normalized entanglement of the target state is large. Finally, we present numerical support of this discussion.
Theorem 3. For any mixed state ρ̂ and pure state |ψ⟩ with fidelity F ≡ ⟨ψ|ρ̂|ψ⟩ and commutator norm C ≡ ∥[ρ̂, |ψ⟩⟨ψ|]∥_F, the logarithmic negativities satisfy E_N(ρ̂) ≥ E_N(|ψ⟩) + log_2 F + log_2(1 − (C/F)√(2(1 − α)/α)), where α ≡ 2^{E_N(|ψ⟩)}/D_A is the normalized entanglement of |ψ⟩.
Proof. We write ρ̂ in a suitable basis as a block matrix, where the first row/column denotes the basis vector |ψ⟩, and the second denotes the subspace perpendicular to |ψ⟩. With P̂_⊥ the projector onto this subspace, the matrix ρ̂′ ≡ P̂_⊥ ρ̂ P̂_⊥ is positive semi-definite. The off-diagonal element B ≡ P̂_⊥ ρ̂ |ψ⟩ is a (d − 1) × 1 vector, which we take to be an unnormalized vector |B⟩ in the full Hilbert space (with first entry equal to zero). Using the variational definition of the trace norm from the proof of Theorem 2, we write ∥ρ̂^{T_A}∥_1 ≥ ⟨Φ|ρ̂|Φ⟩ [Eq. (42)], where |Φ⟩ is the unnormalized maximally entangled state in the full Hilbert space, in the Schmidt bases {|a_j⟩} and {|b_j⟩} of |ψ⟩. We then bound the terms of this expression. First, the Frobenius norm of the commutator, C = ∥[ρ̂, |ψ⟩⟨ψ|]∥_F, has the expression C^2 = 2⟨B|B⟩. Therefore, ⟨B|B⟩ ≤ C^2/2. We then use the Schur complement condition for positive semi-definite matrices: with our decomposition, the fact that ρ̂ is positive semi-definite implies that the Schur complement ρ̂′ − F^{−1}|B⟩⟨B| is positive semi-definite. We conclude that ⟨Φ|ρ̂′|Φ⟩ ≥ F^{−1}|⟨Φ|B⟩|^2 ≥ 0. Using the fact that |B⟩ ⊥ |ψ⟩ and that |Φ⟩ is an unnormalized state with norm √D_A and overlap |⟨Φ|ψ⟩|^2 ≡ αD_A, and accounting for the norms of |Φ⟩ and |B⟩, the maximum possible magnitude of the cross term is 2|⟨Φ|ψ⟩||⟨B|Φ⟩| ≤ C D_A √(2α(1 − α)). Combining these ingredients with Eq. (42), we then have ∥ρ̂^{T_A}∥_1 ≥ ⟨Φ|ρ̂|Φ⟩ ≥ FαD_A − C D_A √(2α(1 − α)). Taking the logarithm of both sides, we arrive at Theorem 3.
With Theorem 3 in hand, to validate our entanglement proxy we need only bound the size of the quantity C/F and show that it is O(1). Here, we show this is the case under the physically realistic scenario of the system undergoing coherent global errors, which are a dominant error source in our experiment (Ext. Data Fig. 28). First, we prove a useful Lemma rewriting the coefficient of the correction term, C/F, in terms of only F.
Lemma 3. For normally distributed global coherent quasistatic parameter errors, the coefficient of the correction term in Theorem 3, C/F, can be written entirely in terms of the fidelity as C/F = √2 [2/√((1 + F^2)(3 − F^2)) − 1]^{1/2} [Eq. (48)].
Proof. To begin, consider evolution under a fixed parameter value, which results in the pure state |Ψ(t, θ)⟩ ≡ e^{−i(Ĥ_0+θV̂)t}|Ψ_0⟩. In Theorem 1, we found a Gaussian fidelity dependence between states corresponding to any two parameter values θ_1 and θ_2: |⟨Ψ(t, θ_1)|Ψ(t, θ_2)⟩|^2 = e^{−λ(θ_1−θ_2)^2 t^2}, for some constant λ [Eq. (6)]. Assuming a normal distribution P(θ) of parameter values θ with mean θ_0 and standard deviation σ, we obtain the mixed state ρ̂(t) = ∫dθ P(θ)|Ψ(t, θ)⟩⟨Ψ(t, θ)|.
In Eq. (20), we computed the fidelity of ρ̂(t) to the pure state |Ψ(t, θ_0)⟩ and found it to be F = (1 + 2λσ^2 t^2)^{−1/2}. The Frobenius norm of the commutator, C, then has the expression C^2 = 2(⟨Ψ(t, θ_0)|ρ̂(t)^2|Ψ(t, θ_0)⟩ − F^2), which can be evaluated by similar Gaussian integrals. In this computation, the unknown phases of the inner products ⟨Ψ(t, θ_i)|Ψ(t, θ_j)⟩ cancel, allowing the use of Theorem 1. Rearranging, it is straightforward to then arrive at Eq. (48).
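The identity C^2 = 2(⟨ψ|ρ̂^2|ψ⟩ − F^2) used in this computation holds for any pure/mixed state pair, and can be checked numerically (the random d = 8 states below are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                        # arbitrary dimension for the check

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real                    # random full-rank mixed state

P_psi = np.outer(psi, psi.conj())
C = np.linalg.norm(rho @ P_psi - P_psi @ rho)     # Frobenius norm of [rho, |psi><psi|]
F = np.real(psi.conj() @ rho @ psi)
rhs = np.sqrt(2 * (np.real(psi.conj() @ (rho @ rho) @ psi) - F ** 2))
print(C, rhs)                                # the two expressions agree
```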
We can then combine Theorem 3 and Lemma 3 to arrive at the simplified result E_N(ρ̂) ≥ E_P + log_2(1 − (C/F)√(2(1 − α)/α)), with C/F the function of F alone given by Lemma 3 [Eqs. (54), (55)]. This shows that our entanglement-proxy is robust in the presence of quasistatic parameter fluctuations, up to an O(1) deviation which decreases with the normalized entanglement α ≡ 2^{E_N(|ψ⟩)}/D_A. More concretely, let us assume a normalized entanglement of α = 0.8 (as is approximately the expectation at N = 60 and t = 14.3 cycles, see Ext. Data Fig. 14). In that case, Eq. (55) reduces to a correction of around half an ebit at worst. In other words, we expect the maximum possible violation of our bound is at most around half an ebit of entanglement. We emphasize that this does not mean the bound is in fact violated by this amount; only that it potentially could be.
In order to validate that the coefficient of the correction term is small, in Ext. Data Fig. 16a we show C/(F√2) as a function of F for various system sizes, calculated with our full Rydberg error model (which includes both Markovian and non-Markovian terms, and both global and local errors). Despite the error model including more noise terms than just global parameter fluctuations, we see that the curves increase as a function of N but appear to converge near the prediction of Eq. (48). Then, assuming the prediction of Eq. (48), we show the maximum possible violation of the lower bound provided by E_P (i.e. the correction term in Eq. (54)) as a function of normalized entanglement and fidelity. We further include estimated experimental values up to N = 60 (Ext. Data Fig. 16b). We find that the experimental values appear to saturate around a maximum possible violation of ≈0.6 ebits.
Incoherent local errors-Next, we turn to prove that our entanglement proxy is a valid lower bound for the case of random unitary circuits (RUCs) undergoing realistic noise, namely local depolarization error. To do so, we will introduce a new theorem, which bounds the violation of our entanglement proxy by a value related to the fidelity overlap with eigenstates of ρ̂, and then show that this correction term is small.
We proceed first by proving a more general Lemma which will assist us; the proof has similarities to the proof of Theorem 3, but we present it again here for completeness.
Lemma 4. For any mixed state ρ̂ and pure state |ψ⟩, the logarithmic negativities E_N(ρ̂) and E_N(|ψ⟩) are related by the bound Eq. (57), where F ≡ ⟨ψ|ρ̂|ψ⟩ is their fidelity, α ≡ 2^{E_N(|ψ⟩)}/D_A ∈ [0, 1] is a normalized entanglement of |ψ⟩, and D_A is the Hilbert space dimension of the subsystem A (assumed to be smaller than its complement).
Proof. In deriving Theorem 2, we utilized the fact that when |ψ⟩ is an eigenstate of ρ̂, the remainder state ρ̂_⊥ is positive semi-definite, which allowed us to bound ⟨Φ|ρ̂_⊥|Φ⟩ ≥ 0. To show Lemma 4, we instead use constraints that arise from the fact that ρ̂ itself is positive semi-definite. We write ρ̂ in a suitable basis as a block matrix, where the first row/column denotes the basis vector |ψ⟩, and the second denotes the subspace perpendicular to |ψ⟩. With P̂_⊥ the projector onto this subspace, the matrix ρ̂′ ≡ P̂_⊥ ρ̂ P̂_⊥ is positive semi-definite. The off-diagonal element B ≡ P̂_⊥ ρ̂ |ψ⟩ is a (d − 1) × 1 vector, which we take to be an unnormalized vector |B⟩ in the full Hilbert space (with first entry equal to zero). Using Eq. (27), the negativity can be obtained from ⟨Φ|ρ̂|Φ⟩ = F|⟨Φ|ψ⟩|^2 + 2Re(⟨Φ|ψ⟩⟨B|Φ⟩) + ⟨Φ|ρ̂′|Φ⟩. Only the last two terms above may be negative. We bound their magnitude using the Schur complement condition for positive semi-definite matrices. With our decomposition, the fact that ρ̂ is positive semi-definite implies that the Schur complement ρ̂′ − F^{−1}|B⟩⟨B| is positive semi-definite. We conclude that ⟨Φ|ρ̂′|Φ⟩ ≥ F^{−1}|⟨Φ|B⟩|^2, which bounds the magnitude of the negative cross term. We then use the fact that |Φ⟩/√D_A is a normalized state. Since |⟨Φ|ψ⟩|^2 = αD_A = ∥|ψ⟩⟨ψ|^{T_A}∥_1, tr ρ̂′ = 1 − F, and ⟨ψ|ρ̂′|ψ⟩ = 0, we conclude that |⟨Φ|B⟩|^2 ≤ F(1 − F)(1 − α)D_A. Then, when F ≥ 1 − α, we obtain the claimed Eq. (57).

Ext. Data Fig. 16 | Bounding the entanglement-proxy in the presence of errors. a. We consider full error model simulations of the Rydberg dynamics studied in the main text, i.e. simultaneously under both global and local errors, each of which can be either Markovian or non-Markovian. We study the ratio of the commutator norm to the fidelity, versus the fidelity, truncating to only show data for which F > 1/D_A, where D_A is the half-chain Hilbert space dimension. The data appears to converge as a function of system size, approaching the theory prediction obtained for the case of purely global, coherent Hamiltonian fluctuations; see Eq. (54). b. Here we plot the expected maximum violation of our entanglement-proxy lower bound as a function of the fidelity and the normalized entanglement; more specifically, we plot the correction term from Eq. (54). We find that in the reasonable parameter regime shown, the violation is always ≲ 1 ebit. From the analysis shown in Ext. Data Fig. 14, we plot the expected experimental values from N = 12 to 60, and find the maximum violation appears to saturate around ≈0.6 ebits. Together these results imply that our entanglement-proxy is robust up to O(1) corrections in the face of realistic noise.
We then generalize Lemma 4 to the case where |ψ⟩ has fidelity overlap with a particular eigenstate of ρ̂.
Theorem 4 (Eigenstate-fidelity based bound). For any mixed state ρ̂ and pure state |ψ⟩ which has a fidelity f ≡ |⟨ψ|λ⟩|^2 with an eigenstate |λ⟩ of ρ̂, the logarithmic negativities E_N(ρ̂) and E_N(|ψ⟩) are related by a tightened version of the bound of Lemma 4 [Eq. (57)], with the correction controlled by f, where F ≡ ⟨ψ|ρ̂|ψ⟩ is the fidelity, α ≡ 2^{E_N(|ψ⟩)}/D_A ∈ [0, 1] is a normalized entanglement of |ψ⟩, and D_A is the Hilbert space dimension of the subsystem A (assumed to be smaller than its complement).
Proof. We first write ρ̂ = F_t|v⟩⟨v| + (1 − F_t)ρ̂_⊥, where |v⟩ is an eigenstate of ρ̂ with eigenvalue F_t, and ρ̂_⊥ is the state projected onto the subspace orthogonal to |v⟩. We write the state |ψ⟩ that we benchmark against ρ̂ in terms of |v⟩ and an orthogonal component. Then, writing ρ̂ in the same basis as Eq. (58) gives the block form of Eq. (64). As above, the only negative contribution to ⟨Φ|ρ̂|Φ⟩ arises from the off-diagonal terms of Eq. (64). This allows us to improve our bound to the claimed Theorem 4.
We now show that under incoherent local errors, the correction term presented in Theorem 4 is small. We study the case of random unitary circuits (RUCs) in the presence of local depolarization in order to treat the problem analytically and make connection to the digital quantum circuits we compare against in the main text.
We bound the overlap f ≡ |⟨ψ(t)|v⟩|^2 between the ideal evolved state |ψ(t)⟩ and the largest eigenstate |v⟩ of the noisy evolved state ρ̂(t) by computing both the purity of ρ̂(t) and its fidelity to |ψ(t)⟩. The relationship between these quantities bounds the fidelity of |ψ(t)⟩ to the largest eigenstate of ρ̂, allowing us to apply Theorem 4 to evaluate the robustness of our negativity proxy.
Physically, our result means that the state ρ̂ is a mixture of a particular pure state |v_1⟩ with large population F = λ_1 and many other orthogonal states with smaller populations. For a circuit with spacetime volume V = Nt, the largest eigenvalue decays exponentially, λ_1 ≈ (1 − p)^V, yet remains larger than all other eigenvalues. The other states correspond to trajectories with errors. The average number of errors is k = pV, and there are roughly V!/(k!(V − k)!) different spacetime locations for k errors. When the number of trajectories is much less than the Hilbert space dimension D, we expect the resultant wavefunctions to be orthogonal to one another. This is true if the k errors are sparsely spread over the system in spacetime (such that their separations are sufficiently large) and the effects of errors are scrambled quickly. When p is sufficiently small and the quantum dynamics is sufficiently chaotic, these assumptions are expected to hold.
We proceed to justify this intuition by averaging over random circuits. At every time step, we apply random two-qubit unitary gates in a brickwork circuit geometry and subject the state to local depolarization, which depolarizes ρ at every site with probability p. This is equivalent to applying the local channel ρ → (1 − p)ρ + p(I_j/d) ⊗ tr_j(ρ) independently at every site j, where d = 2 is the local Hilbert space dimension (Ext. Data Fig. 17a). For simplicity, we also take the initial state |ψ_0⟩ to be the product state |0⟩^⊗N. After averaging over RUCs, the purity P ≡ tr[ρ(t)^2] and fidelity F ≡ ⟨ψ(t)|ρ(t)|ψ(t)⟩ can be expressed as the partition functions of two different Ising ferromagnets, with the Ising spins σ = ±1 respectively denoting the local identity and swap permutations (Ext. Data Fig. 17b).
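A minimal numerical sketch of this setup (small N and depth, chosen arbitrarily; the depolarizing channel is written in its equivalent Pauli-Kraus form) confirms the guaranteed relations P ≥ F^2, and F^2 ≤ P·max_i μ_i used in the Cauchy-Schwarz argument of this section:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(3)
N, depth, p = 4, 4, 0.05          # small, arbitrary choices for the sketch
d = 2 ** N
I2 = np.eye(2, dtype=complex)
paulis = [np.array([[0, 1], [1, 0]], complex),
          np.array([[0, -1j], [1j, 0]], complex),
          np.array([[1, 0], [0, -1]], complex)]

def site_op(op, j):
    # embed a single-site operator at site j of the N-qubit chain
    return reduce(np.kron, [op if k == j else I2 for k in range(N)])

def haar_u(dim):
    # Haar random unitary via QR of a Ginibre matrix, with phase fix
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    Q, R = np.linalg.qr(A)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

def gate_on(U2, i):
    # two-qubit gate acting on neighboring sites (i, i+1)
    return reduce(np.kron, [np.eye(2 ** i), U2, np.eye(2 ** (N - i - 2))])

def depolarize(rho, j):
    # rho -> (1-p) rho + p (I_j/2) ⊗ tr_j rho, in Pauli-Kraus form
    out = (1 - 3 * p / 4) * rho
    for P_ in paulis:
        Pj = site_op(P_, j)
        out = out + (p / 4) * (Pj @ rho @ Pj)
    return out

psi = np.zeros(d, complex); psi[0] = 1.0      # |0...0>
rho = np.outer(psi, psi.conj())
for layer in range(depth):                     # brickwork geometry
    for i in ([0, 2] if layer % 2 == 0 else [1]):
        U = gate_on(haar_u(4), i)
        psi = U @ psi                          # ideal (noiseless) evolution
        rho = U @ rho @ U.conj().T             # noisy evolution
    for j in range(N):
        rho = depolarize(rho, j)

F = np.real(psi.conj() @ rho @ psi)            # fidelity <psi|rho|psi>
P = np.real(np.trace(rho @ rho))               # purity tr(rho^2)
mu = np.abs(np.linalg.eigh(rho)[1].conj().T @ psi) ** 2   # eigenstate overlaps
print(F, P, mu.max())
```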
The derivation of the mapping between RUCs and classical spin models is described in detail in many works, including Refs. 40, 41. Here we simply quote the result [Eqs. (65), (66)]: P and F are given by sums over all configurations of classical Ising spin variables σ = ±1, of products over all downward-facing triangles in the resulting triangular lattice (Ext. Data Fig. 17c) of local "Boltzmann weights" W_{P/F}(σ_1, σ_2, σ_3; p), parameterized by the error probability p. Their values are enumerated in Ext. Data Table II.

Ext. Data Table II | Ising Boltzmann weights W_P(σ_1, σ_2, σ_3; p) and W_F(σ_1, σ_2, σ_3; p) of local downward-facing triangle configurations for the effective statistical mechanical models. Their partition functions equal the purity P [Eq. (65)] and fidelity F [Eq. (66)] of random unitary circuits (RUCs) with local depolarization. The geometric arrangement of σ_1, σ_2, σ_3 is indicated in Fig. 17c. In this work, we consider RUCs of qubits, with local Hilbert space dimension d = 2, and only consider the O(1) and O(p) terms. In this table, however, we state the full expressions in terms of general d for future reference.
Ext. Data Fig. 17 | Purity and fidelity in random unitary circuits (RUCs) with local depolarization. a. We consider a brickwork RUC, where local depolarization with strength p is applied after every layer and at every site (red dots). b. Typical values of purity or fidelity can be computed by random circuit averages. In both quantities, swap boundary conditions are applied on four copies of the circuit, representing two copies of the states. For purity, dephasing is applied to both copies (depicted here), while for fidelity, dephasing is only applied to one copy. c. After averaging over random circuits, the purity and fidelity can be expressed as the partition functions of effective statistical mechanical Ising models, with three-spin Boltzmann weights on every downward-facing triangle (listed in Table II).

Ext. Data Fig. 18 | Bounding the entanglement-proxy in the presence of errors. We consider the case of incoherent local errors, specifically for random unitary circuits undergoing local depolarization. The overlap, f, between the target pure state and the highest fidelity state to ρ̂ remains near unity. Further, 1 − f is linearly bounded by the quantity log(F)^2/(Nt) with a small proportionality constant 0.03 (inset), consistent with our RUC analysis leading up to Eq. (72), and giving the overall bound Eq. (73).

When p = 0, notice that W_{P/F}(−, −, +) = 0. Combined with the fact that all top boundary spins have σ = −, the only non-vanishing contribution to P or F is the one where all σ = − (Ext. Data Fig. 17d). In this case, P = Π_▽ W_P(−, −, −) = 1 and similarly F = 1. This is expected, since in the absence of errors ρ is a pure state with perfect fidelity. Now consider small, nonzero p. P and F have contributions from configurations where some of the spin variables are flipped to the '+' state, with weight O(p) (Ext. Data Fig. 17e). We call such spin flips "bubbles"; bubbles are penalized from growing by an effective line-tension term, which weights larger bubbles by a factor of (2/5)^l, where l is the length of the boundary of the bubble. These small bubbles will effectively have renormalized weights, which remain O(p) and whose precise values are not consequential to our conclusions. Therefore, in this regime, the
number of such spin flips will be rare and sparse. To this end, we approximate P and F by simply summing over the contributions from sparse, disjoint bubbles (this approximation breaks down at larger values of p, as discussed below). Within this approximation, P ≈ e^{A_P V} and F ≈ e^{A_F V}, where V is the spacetime volume (number of Ising spins) and A_P and A_F are the renormalized relative weights of a single bubble. We are interested in the ratio P/F^2 ≈ e^{(A_P − 2A_F)V} ≥ 1. When p ≪ 1, because each bubble already has O(p) weight, the leading-order contribution to A_P − 2A_F is determined by that of W_P(−, −, +) and W_F(−, −, +), which have ratio precisely 2. Therefore, we conclude that all O(p) terms vanish and P/F^2 = e^{O(p^2)V} [Eq. (70)]. In order to translate this into a bound on the eigenstate-fidelity f, we denote the fidelities between |ψ⟩ and the eigenstates |v_i⟩ of ρ̂ as μ_i ≡ |⟨v_i|ψ⟩|^2, with f ≡ μ_1. With λ_i the eigenvalues of ρ̂, we have F = Σ_i λ_i μ_i and P = Σ_i λ_i^2. The Cauchy-Schwarz inequality implies that F^2 ≤ P Σ_i μ_i^2 ≤ P f [Eq. (71)]. Using Eq. (70), we obtain 1 − f ≤ C_1 log(F)^2/V for an O(1) constant C_1 [Eq. (72)]. Using Theorem 4, we obtain E_N(ρ̂) ≥ E_P − C_2 log(F)^2/(Nt) for another O(1) constant C_2 [Eq. (73)], indicating that the entanglement-proxy E_P is valid up to a small correction which is negligible as long as F ≫ exp(−√V) = exp(−√(Nt)) ∼ exp(−N), i.e. F is not trivially small. We give numerical support for this in Ext. Data Fig. 18.
Finally, we remark that additional configurations may come to dominate the partition function sums, Eqs. (65), (66), when the circuit is sufficiently deep. This is easiest to understand in the statistical mechanical picture. For both the purity and fidelity, the bulk has a small preference for the '+' state, because depolarization breaks the Ising symmetry and favors the identity element. Meanwhile, the boundary condition (associated with evaluating the purity or fidelity) pins the boundary to the '−' state, acting as a strong boundary field. When the circuit is shallow, the boundary term dominates, giving rise to the global '−' domain assumed above. However, when the circuit is sufficiently deep, it becomes energetically favorable for a large domain wall to form, dividing the spins into a '−' domain near the boundary and a '+' domain in the bulk (Ext. Data Fig. 17f), at the cost of a domain-wall free energy scaling with the spatial size. In our calculations above, we neglect this phenomenon because it is only relevant when the fidelity is trivially small, O(D^{−1}), where D is the total Hilbert space dimension. In this regime, the purity and fidelity decouple from the bulk dynamics and assume this constant value, which arises from the boundary Ising '−' domain.

Negativity of truncated MPS states
Finally, we investigate an interesting observation from the main text: at N = 60, the entanglement-proxy found for the experiment in Fig. 4 is ∼3 ebits higher than log_2(χ*) in Fig. 5. In other words, the fidelity-equivalent MPS simulation is less entangled than the experiment. Here we study the fidelity and entanglement (negativity) of truncated pure state MPS representations to explain this behavior.
For a theoretical treatment, we consider a simplified setup where we start with an ideal pure state and its equal-bipartition Schmidt decomposition |ψ⟩ = Σ_{j=1}^{D_A} s_j|a_j⟩|b_j⟩, where D_A is the Hilbert space dimension of the half system and the Schmidt values s_j are assumed to be in descending order. Then the Schmidt rank is truncated to a maximum bond dimension χ ≤ D_A. Denote the truncated, renormalized state as |ψ_χ⟩ = (1/√F) Σ_{j=1}^{χ} s_j|a_j⟩|b_j⟩. It is straightforward to see that the fidelity is F = |⟨ψ|ψ_χ⟩|^2 = Σ_{j=1}^{χ} s_j^2. To derive the entanglement negativity, recall that the eigenvalues λ of the partially transposed state |ψ⟩⟨ψ|^{T_A} are given by s_j^2 or ±s_i s_j, as mentioned above. This gives ∥|ψ⟩⟨ψ|^{T_A}∥_1 = (Σ_j s_j)^2, and therefore E_N(|ψ_χ⟩) = log_2[(Σ_{j=1}^{χ} s_j)^2/F]. For generic states with unknown s_j, it is hard to draw any meaningful relationship between fidelity and entanglement negativity. Therefore, we consider two cases: maximally entangled states and Haar random states.
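The truncated-state fidelity and negativity formulas above can be evaluated directly on a sampled Haar random state (D_A = 64 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
D_A = 64                                   # arbitrary half-cut dimension

# Haar random state, viewed as a D_A x D_A matrix across the half cut
M = rng.normal(size=(D_A, D_A)) + 1j * rng.normal(size=(D_A, D_A))
M /= np.linalg.norm(M)
s = np.linalg.svd(M, compute_uv=False)     # Schmidt values, descending order

def truncated_F_E(chi):
    sk = s[:chi]
    F = np.sum(sk ** 2)                    # fidelity of the truncated state
    E = np.log2(np.sum(sk) ** 2 / F)       # log negativity after renormalization
    return F, E

for chi in (64, 32, 16, 8):
    F, E = truncated_F_E(chi)
    print(chi, F, E)                       # both F and E shrink under truncation
```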
Maximally entangled states-For a maximally entangled state, the Schmidt values are given by s_j = 1/√D_A. It follows directly that E_N(|ψ_χ⟩) = log_2 D_A + log_2 F, with log_2 D_A the log negativity of the ideal maximally entangled pure state. Thus, the log negativity for a fidelity-equivalent truncated maximally entangled state is equal to that of an isotropic mixed state.
Haar random states-The case of Haar random states is more interesting, as such states arise from random unitary circuit evolution, and are closer to the output of our Rydberg experiment as well. As mentioned previously, the Schmidt values of Haar random states follow the quarter-circle law, P(s̃) = √(4 − s̃^2)/π, where s̃ ≡ √D_A s. Setting a maximum Schmidt rank can be translated to setting a minimum cutoff s_min (with corresponding 0 ≤ s̃_min ≤ 2) of the Schmidt values (keeping only Schmidt values above s_min), where the relation between χ and s̃_min is given by χ/D_A = ∫_{s̃_min}^{2} P(s̃) ds̃. Depending on the cutoff, the fidelity and log negativity can then be written in parametric form as F = ∫_{s̃_min}^{2} s̃^2 P(s̃) ds̃ and E = log_2 D_A + 2 log_2[∫_{s̃_min}^{2} s̃ P(s̃) ds̃] − log_2 F. It is easy to verify that when s̃_min = 0, the equations reduce to the case of ideal Haar random states, with χ = D_A, F = 1, and E = log_2 D_A + log_2(64/(9π^2)), where 64/(9π^2) ≈ 0.72 is the Page correction1. Importantly, we can then compare this prediction against the prediction for a depolarized Haar random state, where we use the first-order approximation from Eq. (33), in other words taking D_A → ∞. At a given fidelity for each, the difference in estimated entanglement, δ, can be computed; performing series expansions of both F and δ in terms of s̃_min yields a non-parametric form. In Ext. Data Fig. 19 we plot δ (as well as the non-parametric approximation) as a function of F; it is always positive, indicating that for all F the MPS representation of a truncated Haar random state is less entangled (by O(1) ebits) than an equal-fidelity depolarized Haar random mixed state. Given that the experiment does not produce a perfect Haar random state, as well as the fact that a real MPS simulation performs many truncations between potentially correlated timesteps, these results are in good agreement with the experimentally found discrepancy of ∼3 ebits.
Ext. Data Fig. 19 | Entanglement difference between truncated and depolarized Haar random states. At fixed fidelity, MPS truncation and depolarization affect the entanglement content of a pure state differently. For a Haar random state, we can solve this relation analytically, finding the depolarized state has a higher entanglement (log negativity) content than the MPS-truncated state at equal fidelity.

Description of the Caltech computing cluster
This work extensively utilizes the Caltech Resnick High Performance Computing (HPC) Center, running CentOS Linux 7. Each simulation is assigned to a single node, using 16 cores of an Intel Cascade Lake CPU (at 2.2 or 2.4 GHz), with a total node memory of 386 GB (at 2933 MT/s). We note that many classical simulation times are reported as core-time, which we compute as the wall time multiplied by the number of cores.
In terms of raw wall time, the largest simulation we performed (for N = 60 and χ = 3072) took ∼8.3 days (∼11.1 days if including the additional time points past the last experiment time in order to calculate p_avg). Representative times to simulate up to the last experimental time point are shown as a function of bond dimension in Ext. Data Fig. 20; extrapolating the expected O(χ³) scaling, we predict that simulation for N = 60 and χ* = 3400 would entail a simulation time of 11.3 wall-days (180 core-days, as reported in the main text). In total, we generated roughly 100 TB of data in order to create and store the MPS tensors at various bond dimensions used in this work.
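The extrapolation above can be reproduced directly from the quoted numbers; a minimal sketch (the 8.3-day measurement, the 16-core node, and the cubic scaling are taken from the text):

```python
# Cubic-scaling extrapolation of MPS simulation time with bond dimension.
# Measured point from the text: N = 60, chi = 3072 took ~8.3 wall-days on a
# 16-core node; chi* = 3400 is the target bond dimension.
chi_meas, t_meas_days = 3072, 8.3
chi_star, n_cores = 3400, 16

t_star_wall = t_meas_days * (chi_star / chi_meas) ** 3  # expected O(chi^3) cost
t_star_core = t_star_wall * n_cores                     # core-days = wall-days x cores

print(f"{t_star_wall:.1f} wall-days, {t_star_core:.0f} core-days")
```

This reproduces the 11.3 wall-days (180 core-days) quoted in the main text.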
Ext. Data Fig. 20 | Simulation time as a function of bond dimension. After an initial overhead plateau, the Lightcone-MPS simulation time (measured in raw wall time on the Caltech central cluster) to reach the last experimental time increases as a function of χ, approaching the expected asymptotic χ³ scaling.

Exact simulation with a Schrieffer-Wolff transformation
We first consider the possibility of exact brute-force simulation of our system. The current state-of-the-art in exact RUC simulation is 45 qubits 42 , while the state-of-the-art for continuous Hamiltonian time evolution is 38 qubits 43 . While clearly neither of these approaches would work natively for the largest experimental system size of N = 60, it is possible they could become applicable through a Schrieffer-Wolff transformation, rewriting the Rydberg Hamiltonian only in the blockaded subspace and applying appropriate level shifts to mimic the effect of higher blockade sectors, as described in the supplement of Ref. 44 . Because of the Rydberg blockade constraint, the effective system size is given by $\tilde{N} = \log_2(\mathrm{Fib}(N + 2)) \approx 0.7N$, where Fib is the Fibonacci function. For the 60-atom system, $\tilde{N} \approx 42$ qubits, so such brute-force techniques could potentially allow for pseudo-exact simulation.
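The effective system size can be checked directly: the number of blockade-satisfying bitstrings on N sites (no two adjacent excitations) is Fib(N + 2). A short sketch, assuming the convention Fib(1) = Fib(2) = 1:

```python
import math

def fib(n: int) -> int:
    """Fibonacci number with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

N = 60
dim = fib(N + 2)         # dimension of the blockaded Hilbert space
n_eff = math.log2(dim)   # effective number of qubits, ~0.7 N
```

For N = 60 this gives n_eff ≈ 41.9, consistent with the Ñ ≈ 42 quoted above.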
However, here we discount such approaches for multiple reasons. First, they both rely on specialized hardware: a supercomputing system 42 with 100s of TB of RAM, or tensor processing units 43 , which are not widely available. Further, even with this hardware it is not clear that our system would be easily simulable, because of difficulties associated with treating the blockaded subspace. In particular, applying local gate operations becomes more complicated to implement efficiently, because the blockaded subspace is not factorizable into a product of local bases.
Ext. Data Fig. 21 | Using an effective Hamiltonian simulation. a. The Rydberg blockade constraint stratifies the energy levels of the Rydberg Hamiltonian, enabling approximation of the dynamics in the full Hilbert space by an effective set of dynamics in the lowest-energy Hilbert space (relative sizes of energy sectors are not to scale). b. While a lowest-order approximation (blue) quickly causes the fidelity of the reduced-Hilbert-space simulation to drop compared to simulation in the full space, adding the second-order term from a Schrieffer-Wolff Hamiltonian transformation 44 (orange) improves the fidelity. c. Even with the second-order correction, the simulation fidelity decreases as a function of system size. It is possible that numerically solving for higher-order terms in the Schrieffer-Wolff transformation will improve this fidelity, but it likely can never achieve unity given the finite blockade violation probability (Ext. Data Fig. 5d).
Finally, applying this transformation is still an approximation to the true dynamics, for which it is not clear if the resulting fidelities will actually be competitive at these large system sizes and late evolution times. In Ext. Data Fig. 21 we test the accuracy of the Schrieffer-Wolff transformation at second order for system sizes up to N = 20 (Ñ ∼ 14), where we see the fidelity falls by the latest experimental time. Importantly, the fidelity decay increases with system size. A simple linear extrapolation implies a simulation accuracy of ∼0.48 for N = 60. Further, employing this approach will always have some fundamental inaccuracy as the blockade constraint is imperfect, and we expect the ideal target state to always have some finite population in the blockade-violating sectors, a symptom which grows stronger with increasing system size and evolution time (Ext. Data Fig. 5d). However, we do not discount the possibility that by adding successively higher-order terms in the Schrieffer-Wolff transformation, the accuracy of this method could be at least improved.
We strongly emphasize that storing even a single copy of the wavefunction at $\tilde{N} \approx 42$ still requires ∼65 TB of memory, and that $\tilde{N} \approx 42$ exceeds the current state-of-the-art for Hamiltonian simulation 43 of N = 38. Still, we welcome collaboration on studying the boundaries of nearly-exact classical simulation with the most advanced hardware.
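The memory figure follows directly from the Hilbert-space dimension; a one-line check, where the use of complex128 amplitudes (16 bytes each) and binary terabytes is our assumption:

```python
# Memory to store one state vector on Ntilde ~ 42 effective qubits,
# assuming complex128 amplitudes (16 bytes per amplitude).
n_eff = 42
n_amplitudes = 2 ** n_eff
mem_bytes = n_amplitudes * 16
mem_tib = mem_bytes / 2**40  # binary terabytes (TiB)
```

This gives 64 TiB, consistent with the ∼65 TB quoted above.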

Standard Trotterization-based TEBD-MPS
A standard approach for simulating large one-dimensional (1D) quantum systems is to use a matrix product state (MPS) representation 45 . In this representation, the quantum state is decomposed into a tensor network in the form of a 1D chain through repeated Schmidt decompositions. The key insight of such schemes is that if particles are weakly entangled at any bipartition of the system, then the on-site tensors will be low rank, and so can be efficiently truncated to a smaller size while preserving most of the information. The size to which the tensor is truncated is user-controllable, and is known as the bond dimension, χ . Thus, larger bond dimensions equate to truncating less information when converting a state into its MPS representation. For a more general review of MPS, see Ref. 20 .
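The truncation step at the heart of MPS compression can be sketched with a single Schmidt decomposition across one bipartition; a minimal numpy example, where the state size, cut location, and χ are illustrative toys rather than the experimental parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random pure state on 10 qubits, bipartitioned into 5 + 5 sites
psi = rng.normal(size=2**10) + 1j * rng.normal(size=2**10)
psi /= np.linalg.norm(psi)

# Schmidt decomposition across the cut = SVD of the reshaped state
M = psi.reshape(2**5, 2**5)
u, s, vh = np.linalg.svd(M, full_matrices=False)

# Truncate to bond dimension chi and renormalize the kept Schmidt values
chi = 8
kept = s[:chi] / np.linalg.norm(s[:chi])
psi_trunc = (u[:, :chi] * kept) @ vh[:chi, :]

# Overlap with the exact state: the truncation fidelity is sum of kept s^2
fidelity = abs(np.vdot(M, psi_trunc)) ** 2
```

A larger χ keeps more Schmidt values and hence more of the state's information, at cubically growing cost.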
An important aspect of the tensor decomposition of the MPS is that unitary evolution can be efficiently applied to just part of the tensor network. This enables the standard time-evolving block decimation (TEBD) algorithm 46 , in which Hamiltonian evolution is discretized (or Trotterized) using the Suzuki-Trotter formula, $e^{-i\hat{H}\,dt} = e^{-i\sum_i \hat{H}_i\, dt} \approx \prod_i e^{-i\hat{H}_i\, dt}$. For a many-body Hamiltonian with (geometrically) local interactions $\hat{H}_i$, each term is applied sequentially onto the MPS. With suitable implementation (sweeping left-to-right and then right-to-left), Trotterization results in a second-order decomposition of the time evolution operator. Thus, the basic algorithmic loop of the TEBD approach is: 1) perform Schmidt decompositions to write the state as a tensor network, 2) truncate the tensor ranks, 3) Trotterize the Hamiltonian, and 4) apply it to the relevant tensors, after which we repeat step 1. The two fundamental sources of error in this approach come from steps 2 (the truncation) and 3 (the Trotterization) (Ext. Data Fig. 22).
The error of the Trotter decomposition depends on the product of the commutator of the terms in the Hamiltonian and the time step dt (here dt is defined as a full cycle of application of local evolution from the leftmost site to the rightmost site and then back to the leftmost site). For the Rydberg system, the commutator is on the order of the nearest-neighbor interaction V_nn. Therefore, even with the second-order Suzuki-Trotter decomposition, the perturbative expansion converges only for V_nn · dt ≲ 1. Due to the Rydberg blockade, the nearest-neighbor interaction for the Rydberg Hamiltonian is strong, putting a significant constraint on the maximum dt possible for the simulation. In practice, with a second-order decomposition of the time-evolution operator, we found that a maximum timestep of dt ≈ 1/100 of a Rabi cycle is necessary to keep the Trotterization error below 0.5%.
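The dt scaling of the Trotter error can be illustrated on a toy two-term Hamiltonian; this is a generic demonstration of second-order (symmetric) splitting under assumed random Hermitian terms, not a simulation of the Rydberg Hamiltonian:

```python
import numpy as np

def expm_h(H, t):
    """exp(-i H t) for a Hermitian matrix H, via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)); A = A + A.T  # two non-commuting
B = rng.normal(size=(4, 4)); B = B + B.T  # Hermitian terms
H = A + B

def trotter2_error(dt, t=1.0):
    """Spectral-norm error of symmetric (second-order) Trotterization."""
    n = round(t / dt)
    step = expm_h(A, dt / 2) @ expm_h(B, dt) @ expm_h(A, dt / 2)
    approx = np.linalg.matrix_power(step, n)
    return np.linalg.norm(approx - expm_h(H, t), 2)

# Halving dt should reduce the accumulated error by ~4x (O(dt^2) overall)
e1, e2 = trotter2_error(0.02), trotter2_error(0.01)
```

The size of the commutator [A, B] sets the prefactor of this error, which is why the strong blockade interaction forces such a small dt for TEBD.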
Importantly, however, for too small of a timestep, the TEBD algorithm will perform many more truncations, each time lowering the fidelity. This leads to a fundamental tension, where an optimal dt must be chosen which balances these competing effects of truncation and Trotterization errors (Ext. Data Fig. 22d).
Moreover, a truncation of the long-range interaction must be chosen to efficiently use the TEBD algorithm. In practice, we found that keeping an interaction length of 5 results in the desired error of less than 0.5% (Ext. Data Fig. 22c). This results in a local evolution involving at most 6 atoms.
We emphasize that we do not believe this performance could be strongly improved through the use of a Schrieffer-Wolff transformation as described above. This is because in performing the Schmidt decompositions, MPS-based methods already find the optimal working basis, which is nearly entirely within the blockaded Hilbert space (Ext. Data Fig. 5d). In fact, it is likely such a transformation will decrease the MPS performance, because of the added complexity in local gate application due to the non-locally-factorizable structure of the blockaded subspace.

The Lightcone-MPS algorithm
One of the challenges of the Trotterization TEBD algorithm is the small dt required to faithfully simulate the quantum dynamics. However, this difficulty can be circumvented by efficiently utilizing the lightcone dynamics of quantum systems, allowing for simulation with much larger dt, and thus higher fidelity, as truncation errors are minimized. In particular, in Ref. 47 it was shown that a global time evolution operator could be decomposed into small blocks of local evolution by combining forward and backward time evolution. In this proposed non-perturbative decomposition, the time step is only limited by the Lieb-Robinson velocity v and the block size L of the local time evolution, not the blockade interaction strength.
Ext. Data Fig. 22 | Comparing Lightcone-MPS and TEBD algorithms. a. Illustration of the Lightcone-MPS Hamiltonian decomposition over a single time step dt. The long-range Rydberg interaction is truncated after a cutoff distance of L. b. Illustration of the MPS update procedure. c. We choose the Lightcone-MPS block size to be L = 6, which we estimate as introducing an error of < 0.5% at N = 60. d, e. For MPS algorithms there exists an optimal simulation timestep; for too large a timestep, the Hamiltonian Trotterization introduces simulation errors. For too small a timestep, the MPS is contracted and truncated too often, leading to loss of accuracy. For the TEBD algorithm, the maximum timestep is set by the inverse blockade energy, but the Lightcone-MPS algorithm can be used accurately with much larger timesteps. This ultimately means that at equal bond dimension, χ, the Lightcone-MPS algorithm can be optimized to higher fidelities than the TEBD.
Here, we propose a modified version of the Hamiltonian decomposition in Ref. 47 , which is especially amenable to efficient integration with MPS state representations, and which we call Lightcone-MPS. This modification allows for a smaller block size L with the same cutoff of long-range interactions. The algorithm consists of alternating forward-time and backward-time evolution, where the forward-time evolution has a block size L and the backward evolution has a block size of L − 1. Each block performs the time evolution for all Hamiltonian terms strictly inside the block. Crucially, in this implementation, dt is no longer limited by blockade interactions at the edges of the local evolution blocks (Ext. Data Fig. 22d).
Here, the order of applying the time evolution operator is reversed in every other layer (i.e. left to right for odd layers and right to left for even layers). In practice, each forward evolution can be combined with the backward evolution into a single block, reducing the simulation cost. The simulation cost can be further reduced by always keeping a tensor with L − 1 atoms fused together, without performing additional decompositions (Ext. Data Fig. 22a).
The Lightcone-MPS algorithm allows for a maximum dt ∼ L/v. Empirically, we find that dt ≈ 0.5 Rabi cycle is sufficient for a 0.5% decomposition error with a block size of 6 (long-range interaction cutoff at 5). Again, dt is defined as a full cycle of application of local evolution from the left-most site to the right-most site and then back to the left-most site.
Crucially, there is still a tension on dt from the competing effects of truncation and Trotterization errors, but for the Lightcone-MPS algorithm the Trotterization errors remain minimal up to much larger dt. This means that at equal bond dimension, if selecting the optimal dt, the Lightcone-MPS can reach higher fidelities than TEBD (Ext. Data Fig. 22d,e). Equivalently, reaching the same fidelity requires TEBD to use a larger bond dimension, as is visible in Fig. 5 of the main text.
Additionally, even if a larger bond dimension is used for the TEBD (where truncation error is small), the optimal dt in TEBD is empirically ∼50× smaller than that for the Lightcone-MPS, increasing the raw simulation time further. In practice, to benchmark the experiment, we need to choose dt such that we can match the time step chosen in the experiment (around 0.77 Rabi cycle). Therefore, we use half the experimental time step as the dt for the Lightcone-MPS algorithm, and 1/80 of the experimental time step as the dt for the TEBD algorithm. This still results in a 40-fold reduction in the number of simulation steps thanks to the Lightcone-MPS algorithm.

Time-dependent variational principle
The Time-Dependent Variational Principle 48 (TDVP) algorithm is another MPS time evolution algorithm. In this algorithm, the Schrödinger equation is projected onto the variational manifold of MPS, where an effective (nonlinear) partial differential equation (PDE) is obtained to update the MPS parameters. This method is better than TEBD in certain situations such as simulating emergent hydrodynamics, due to its ability to preserve all symmetries and conservation laws. In addition, TDVP can naturally handle long-range interactions.
However, the TEBD algorithm (and the Lightcone-MPS algorithm) is expected to be better suited to our setup. First, the TEBD algorithm is optimal for minimizing the infidelity in each evolution time step: the Schmidt coefficients are truncated in a way that minimizes the infidelity. In the TDVP algorithm, by contrast, preserving conserved quantities comes at the cost of not necessarily minimizing the infidelity. Second, when combined with the MPS representation, the TDVP implementation is nontrivial. This is due to the gauge degree of freedom in the MPS representation: the representation of an MPS is not unique under gauge transformations, which makes it difficult to implement MPS-based TDVP by leveraging existing PDE solvers. Ref. 49 presents a solution to this problem, showing that TDVP can be implemented using the canonical form of the MPS, but it relies on evolving the system locally in small time steps, akin to Trotterization of the Hamiltonian. In addition, it turns out that this solution is a minor modification of the original TEBD algorithm in which a backward time evolution is added; this backward evolution is already naturally incorporated in our Lightcone-MPS algorithm.
Moreover, regardless of the implementation of the TDVP algorithm, the PDE needs to be solved numerically, with the maximum time step limited by the largest term of the Hamiltonian (or the largest term in the PDE). Therefore, the TDVP algorithm has the same bottleneck as the TEBD algorithm: the time step must be small compared to the blockade interaction strength, limiting its utility. In our Lightcone-MPS algorithm, however, this limitation is lifted by utilizing extra physical properties, such as the Lieb-Robinson bound and our knowledge of the geometry of the system (Ext. Data Fig. 22). Given this, we do not explicitly test a TDVP implementation in this work.

Noisy MPO-based simulation
Instead of simulating perfect quantum dynamics with limited fidelity, an alternative method of benchmarking against the experiment is to (almost) exactly simulate noisy quantum dynamics ρ(t) with similar fidelity. We explore this possibility using the recently developed matrix product operator (MPO) based simulation 50 . In an MPO simulation, each local tensor has an extended Hilbert space dimension of 4 instead of 2. The increased local dimension significantly increases the cost of simulation using the efficient decomposition of Ref. 47 . Therefore, we fall back to simulating the Trotterized Hamiltonian dynamics and ignore the long-range interactions. In addition, we choose dt to be half the Rabi cycle and ignore the Trotterization error, in order to consider this method in the most favorable light.
For simplicity, we consider local dephasing error as the only error source and find the effective dephasing rate γ per qubit as a function of system size such that the fidelity of the simulation matches the fidelity of the experiment (Ext. Data Fig. 23), extrapolated using the exponential decay fit. As shown in the figure, the effective dephasing rate decreases as the system size increases, which means the dephasing rate at N = 12 overestimates the dephasing rates at larger system sizes. Since the MPO entanglement decreases as the decoherence rate increases, this leads to an underestimation of the MPO simulation complexity. To make the case more favorable for the MPO method, we thus use the dephasing rate at N = 12 for the simulation at N = 60.
Assuming the early-time exponential fit of the experimental fidelity, the Lightcone-MPS algorithm achieves a fidelity similar to the experiment at N = 60 using χ = 913 (Ext. Data Fig. 27). This motivates us to test whether the MPO method with a similar or larger bond dimension can achieve the same fidelity. We simulate the noisy quantum dynamics at N = 60 using χ = 1100 and the dephasing rate upper bound, yielding the un-normalized simulated mixed state ρ_trun, where we do not explicitly normalize ρ_trun at each time step. Here, tr(ρ_trun) serves as a measure of the simulation accuracy: it equals 1 when the simulation is perfect and decreases as we truncate the density matrix. We measure tr(ρ_trun) as a function of time (Ext. Data Fig. 23, inset) and find that it drops to ∼10⁻⁷. This shows that the MPO algorithm is far from an exact simulation of the noisy quantum dynamics. We note that tr(ρ_trun) measures the accuracy of the algorithm compared to the noisy dynamics; it does not measure the fidelity of the simulation compared to the ideal dynamics, which can be even smaller. Thus, this study, in combination with the Trotterization errors not considered here, implies the MPO-based approach will achieve far lower fidelity than the experiment and the Lightcone-MPS method for reasonable simulation parameters.
Ext. Data Fig. 23 | Matrix product operator (MPO) simulation. Effective dephasing rate per qubit needed to obtain noisy dynamics with a fidelity similar to the exponential decay fit of the experiment. (Inset) Truncation accuracy of the MPO method with χ = 1100 at N = 60. The effective dephasing rate is taken as the dephasing rate at N = 12, which should serve as an upper bound on the effective dephasing rate at N = 60.

Path integral formulation
A hybrid path integral algorithm was introduced in Ref. 17 for quantum circuit simulation. In this algorithm, the system is cut into two patches. When no gate is applied between the two patches, each patch evolves independently under the Schrödinger equation. When a gate is applied across the two patches, the gate is decomposed into a sum of gates that act independently on the two patches, and one path is sampled out of this decomposition. More formally, we write the gate as $\hat{U} = \sum_i \hat{V}_i \otimes \hat{W}_i$. For each path, we choose a particular $\hat{V}_i \otimes \hat{W}_i$, so that the two patches stay in a product state. The wave functions of the paths can then be summed over to obtain the desired result. If each gate across the cut can be minimally decomposed into r product gates and there are m gates across the cut in total, then $r^m$ paths are required to fully describe the system.
In this work, we adapt the hybrid path integral algorithm to long-range Hamiltonian dynamics. We first decompose 47 the global time evolution operator into local gates with block size L = 10. We further reverse the order of the gate application at every other layer so the gates can be fused between the two layers (Ext. Data Fig. 24). This reduces the number of gates that need to be decomposed at the cut, and allows us to have a minimum of 19 cuts throughout the simulation to match the experimental time steps. Similar to the original hybrid path integral algorithm, we cut the system into two halves. For gates applied across the two halves of the system, we use a Schmidt decomposition to write the gate as $\hat{U} = \sum_i \sigma_i \hat{V}_i \otimes \hat{W}_i$, where $\sigma_i$ can be absorbed into either of the halves. However, a naïve decomposition results in a large interaction between the blockade and non-blockade sectors, which results in poor convergence. Therefore, we project the gate into the blockade sector before performing the decomposition.
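The operator Schmidt decomposition of a two-qubit gate amounts to a reshape and an SVD; a minimal sketch using a CNOT gate as a stand-in (the actual gates here come from the lightcone decomposition, not CNOTs):

```python
import numpy as np

# CNOT as a stand-in two-qubit gate (4x4, qubit A = control)
U = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=complex)

# Reshape gives indices (iA, iB, jA, jB); transposing to (iA, jA, iB, jB)
# groups the A and B operator indices, so an SVD of the resulting matrix
# yields U = sum_i sigma_i V_i (x) W_i (the operator Schmidt decomposition)
M = U.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
v, sigma, wh = np.linalg.svd(M)

# The number of non-negligible sigma_i is the Schmidt rank r of the gate;
# a cut crossed by m such gates needs r**m paths
r = int(np.sum(sigma > 1e-12))
```

For CNOT this gives r = 2, the two terms |0><0| ⊗ I and |1><1| ⊗ X.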
Ext. Data Fig. 24 | A hybrid path integral simulation. a. Illustration of the hybrid lightcone-path integral (Lightcone-PI) method. b. Simulation fidelity of the Lightcone-PI with 4 × 10⁶ paths, showing decreasing fidelity as a function of N. We compare against F_svd for the Lightcone-MPS algorithm at χ = 1024 and N = 60, which outperforms the Lightcone-PI method. c, d. At a fixed N, the infidelity in the Lightcone-PI method from using too few paths is visible when benchmarking directly against experimental data, as was the case for the Lightcone-MPS algorithm (Fig. 2 of the main text).

We test this algorithm for up to 24 atoms, where we truncate the rank of the gates across the cut to r_i to mimic actually simulating the path integral with $\prod_i r_i$ paths. We find that the resulting fidelity mostly depends on the number of paths and only weakly depends on the system size, with the dependence becoming weaker as the system size increases. This is because, for all system sizes, we have the same number of gates across the cut, which we expect to be the only factor affecting the fidelity for large enough system sizes. Extrapolating to 60 atoms, we find that we need at least 4 × 10⁶ paths to obtain a fidelity of ∼0.01, where each path requires an exact simulation of two 30-atom systems. Even using our Lightcone-MPS method, simulating each 30-atom system takes 60 core-hours (with χ = 1536, which is nearly exact). This adds up to a formidable amount of computational resources. In comparison, a direct application of the Lightcone-MPS algorithm for 60 atoms achieves better fidelity with a bond dimension of χ = 1024 (Ext. Data Fig. 24b).
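The scale of this cost can be made concrete from the figures quoted above; a naive estimate, assuming no computation is shared between paths (our assumption, likely pessimistic):

```python
# Naive cost estimate for the Lightcone-PI approach at N = 60,
# using only the numbers quoted in the text.
n_paths = 4e6               # paths needed for fidelity ~0.01
halves_per_path = 2         # each path simulates two 30-atom systems
core_hours_per_half = 60    # one 30-atom Lightcone-MPS run (chi = 1536)

total_core_hours = n_paths * halves_per_path * core_hours_per_half
total_core_years = total_core_hours / (24 * 365)
```

Roughly 5 × 10⁴ core-years, compared to the ∼180 core-days for a direct Lightcone-MPS run.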
Interestingly, we can still benchmark the experimental fidelity using the hybrid path integral approach, which shows qualitatively the same behavior as the Lightcone-MPS algorithm in Fig. 2 of the main text. That is, when the number of paths is too low, the classical simulation fidelity drops, and this drop is visible directly in the benchmarked experimental fidelity (Ext. Data Fig. 24c,d).
This raises the interesting potential prospect of using the experimental benchmarking protocol in reverse to determine the fidelity of the classical algorithm, which we leave to future work.

Neural network based Hamiltonian simulation
Neural networks have shown great potential in simulating quantum dynamics 51,52 . However, such protocols are still at an early stage of development, and their application to the Rydberg Hamiltonian with ergodic, long-time dynamics remains challenging. We considered two different neural network algorithms used in Refs. 53,54 , but neither results in a fidelity higher than the Lightcone-MPS algorithm, even when comparing both at system sizes as small as N = 12. While it is possible that neural network based algorithms require additional fine-tuning and hyperparameter optimization, as this is not the main focus of this work, we leave them for future study. We also note that neural network algorithms are under active development, and we welcome readers to benchmark their algorithms on our data.
H. Estimating the MPS fidelity

Approximating the MPS fidelity with truncation errors
While an MPS gives a controlled approximation of the exact wave function, an exact calculation of the MPS fidelity C, defined as the overlap between the MPS state and the ideal target state, is not possible over multiple time steps for system sizes greater than N ∼ 30. However, it is possible to estimate the MPS fidelity from the truncation fidelity $F_{\rm svd} = \prod_i \left( \sum_{\alpha=1}^{\chi} s_{i,\alpha}^2 \right)$, where i runs over all steps involving Schmidt value truncations, and $s_{i,\alpha}$ are the Schmidt values at truncation step i.
Here, we assume the wave function is normalized such that $\sum_{\alpha=1}^{\infty} s_{i,\alpha}^2 = 1$. While this estimation is only approximate, it can be extremely accurate when successive truncations are independent 21 . For N ≤ 30, a bond dimension of χ = 3072 almost exactly captures the dynamics; therefore, it is possible to compute the MPS fidelity, C, for various bond dimensions. In Ext. Data Fig. 25a, we plot F_svd versus the actual MPS fidelity, C, to show the general agreement when using the Lightcone-MPS algorithm. This is expected due to the large dt of the algorithm, which makes successive truncations approximately independent.
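The accumulation of the truncation fidelity over many truncation steps can be sketched in a few lines; the Schmidt spectra below are synthetic placeholders, not spectra from the simulation:

```python
import numpy as np

rng = np.random.default_rng(2)
chi = 64  # illustrative bond dimension

f_svd = 1.0
for _ in range(10):  # ten truncation steps with synthetic Schmidt spectra
    s = np.sort(rng.exponential(size=256))[::-1]
    s /= np.linalg.norm(s)             # normalize so sum_alpha s^2 = 1
    kept_weight = np.sum(s[:chi]**2)   # fidelity of this single truncation
    f_svd *= kept_weight               # product over all truncation steps
```

The product form is exact only if successive truncations are independent, which is the assumption discussed in the text.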
In addition, we notice that the agreement becomes better when the system size increases, indicating that F svd can be a good estimator of the MPS fidelity for large systems (Ext. Data Fig. 25b). If anything, it appears that in the parameter regime of evolution times shorter than the entanglement saturation time, F svd slightly over-estimates C, making it a conservative quantity to study (Ext. Data Fig. 25bc). Further, we emphasize that for N < 30, where we have access to both C and F svd , the values of χ * are consistent (Ext. Data Fig. 26).
Ext. Data Fig. 25 | Estimating the classical fidelity. a. MPS truncation accuracy, F_svd, and MPS fidelity, C, for various bond dimensions and system sizes. The MPS fidelity is measured against another MPS calculation with χ = 3072, which is almost exact for N ≤ 30. b. For a fixed log-normalized bond dimension, χ̃ ∼ 0.7, F_svd first over-estimates C, before under-estimating it at late times (left). However, rescaling the time by the entanglement saturation time (which is linearly proportional to system size, see Fig. 3e of the main text) causes the data to collapse (right). This implies that before t_ent, F_svd will over-estimate C, so in the parameter regimes we study in this work, using F_svd is the conservative choice when comparing quantum and classical fidelities. c. The ratio F_svd/C around the entanglement saturation time, showing the over-estimation increases with decreasing χ̃. Given the data collapse, we expect that F_svd over-estimates by a factor of 1.15 at the entanglement saturation time for N = 60. Taking this effect into account implies the actual χ*(N=60) ≈ 4000, higher than the value of 3400 quoted in the main text. However, we choose to present the lower value as a more conservative estimate. We note that this relationship between F_svd and C is only observed to hold for the Lightcone-MPS algorithm, because its timestep is long enough that truncations are relatively independent. The agreement is not observed with a TEBD MPS implementation (which requires much finer timesteps, see text).
Ext. Data Fig. 26 | Estimating χ* from F_d and the true classical fidelity. For system sizes N > 30, we cannot easily calculate the exact classical MPS fidelity (as it is impractical to simulate the exact reference state), so in the main text we used F_svd to compare against the experimental quantum fidelity. However, for N ≤ 30, we can directly compare against the classical MPS fidelity, C, which we find gives consistent values of χ* in the applicable range. Finally, for these small N, we can directly calculate F_d between an approximate MPS simulation and an exact simulation. We find the corresponding χ* is also consistent with the values based solely on fidelity.

Extrapolating the experimental χ * value
We directly perform Lightcone-MPS simulations only up to χ = 3072. At this highest χ, the MPS fidelity for N = 60 at the latest experimental time is 0.088, as quantified by F_svd. Given that this is lower than the experimental fidelity of 0.095(11), we need to extrapolate in order to find the predicted value of χ*. The fidelities are very close, so we make a simple linear extrapolation, from which we find χ* = 3392 ≈ 3400, as quoted in the main text (Ext. Data Fig. 27a).

Fitting and extrapolating the MPS fidelity
In the previous subsection, we used a linear extrapolation of MPS fidelity as a function of bond dimension in order to predict χ * beyond the regime we directly simulated. However, in Fig. 5b of the main text, we compare the experimental and MPS fidelity over a much larger hypothetical fidelity range, where the linear approximation breaks down (see for instance Ext. Data Fig. 10ab).
In order to make the comparisons in Fig. 5b, we perform fits of the MPS fidelity, approximated by F_svd. For RUCs, F_svd decays essentially exponentially past t_ex, as the truncations are independent 21 . In our case this behavior is not perfect, and so we fit the MPS fidelity as a stretched exponential decay after t_ex.
We are able to fit the exponential parameters, and in particular observe an apparent universality in the decay rate as a function of the log-normalized bond dimension (Ext. Data Fig. 27b), an interesting phenomenon which we leave to future study. We ultimately use these fits to predict the MPS fidelity at arbitrary bond dimension to compare against experiment; we find the fits generally agree with the true fidelities to within a few percent.
To generate Fig. 5b, we assume an effective per-atom fidelity, $\mathcal{F}$, and find the minimum χ* such that the fitted MPS fidelity is always greater than $F = \mathcal{F}^{N t}$. The experimental $\mathcal{F}$ is determined as the value required for the exponential decay of the experiment to match the fidelity value of 0.095(11) at t = 14.3 cycles and N = 60. We then convert χ* into runtime using the known conversion χ = 3072 → 136 core-days, measured empirically on the Caltech cluster, and assuming the runtime scales as O(χ³) (Ext. Data Fig. 20).
I. Error model simulations

Random unitary circuits
The dynamics of the one-dimensional random unitary circuit (RUC) shown in Fig. 4a are simulated using two-qubit SU(4) unitary gates randomly sampled from the Haar measure. Specifically, the time evolution of an N-qubit system starting from the initial state $|0\rangle^{\otimes N}$ can be described as $|\psi(t)\rangle = \hat{U}_t \cdots \hat{U}_2 \hat{U}_1 |0\rangle^{\otimes N}$, where $\hat{U}_{\rm odd} = \{\hat{U}_1, \hat{U}_3, \cdots\}$ and $\hat{U}_{\rm even} = \{\hat{U}_2, \hat{U}_4, \cdots\}$ are the odd- and even-time layer unitaries, composed of local two-qubit unitaries as $\hat{U}_{\rm odd} = \hat{U}_{1,2}\hat{U}_{3,4}\cdots$ and $\hat{U}_{\rm even} = \hat{U}_{2,3}\hat{U}_{4,5}\cdots$ with open boundary conditions; t is the circuit depth and $\hat{U}_{\mu,\nu}$ is a randomly sampled SU(4) gate acting on the two qubits at sites μ and ν. Note that at each circuit depth t, we randomly sample two-qubit random unitaries to generate a many-body unitary $\hat{U}$, which leads to chaotic dynamics. To emulate noisy quantum dynamics, we utilize a stochastic evolution method 55 incorporating local errors represented by the Pauli operators $\hat{S}^{x,y,z}$. These local errors are stochastically applied to individual qubits at a single-qubit error rate of p_err = 0.007 per circuit layer. By repeating the simulations of the noisy dynamics more than ∼2000 times with a fixed p_err for 10 different circuit realizations, we obtain good approximations of the density matrices, ρ_RUC, from which we extract the logarithmic negativity as a measure of entanglement in the mixed state.
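A stripped-down statevector sketch of a few noisy brickwork layers is given below; Haar sampling is done via QR of a Ginibre matrix, and the error channel is a simplified per-layer stochastic Pauli insertion standing in for the stochastic method of Ref. 55 (system size and depth are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_4x4():
    """Haar-random 4x4 unitary via QR of a complex Ginibre matrix.
    (Technically U(4) rather than SU(4); the global phase is irrelevant.)"""
    z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases

def apply_gate(psi, g, i, n, k):
    """Apply a k-site gate g on sites (i, ..., i+k-1) of an n-qubit state."""
    d = 2**k
    psi = psi.reshape(2**i, d, 2**(n - i - k))
    return np.einsum('ba,iaj->ibj', g, psi).reshape(-1)

n, p_err, depth = 8, 0.007, 4
psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0  # |0...0>

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

for layer in range(depth):
    # Brickwork: even layers pair (0,1),(2,3),...; odd layers (1,2),(3,4),...
    for i in range(layer % 2, n - 1, 2):
        psi = apply_gate(psi, haar_4x4(), i, n, 2)
    # Stochastic single-qubit Pauli errors at rate p_err per circuit layer
    for q_idx in range(n):
        if rng.random() < p_err:
            psi = apply_gate(psi, [X, Y, Z][rng.integers(3)], q_idx, n, 1)
```

Averaging projectors of many such noisy trajectories approximates the density matrix from which the log negativity is extracted.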

The Rydberg Hamiltonian
The chaotic dynamics of a one-dimensional Rydberg-atom array presented in this study can be described by time evolution governed by the Rydberg Hamiltonian given in Eq. (1) of the Methods. To simulate the open quantum dynamics of the Rydberg atom array, we employ an ab initio error model that incorporates realistic error sources observed in our experiment. For detailed information about the error model, please refer to Refs. 2,8 . Similar to the case of the RUC, we simulate the noisy quantum evolution using the stochastic wavefunction method 55 to obtain the density matrix. This allows us to calculate the logarithmic negativity (Fig. 4a of the main text). We generally find excellent agreement between our error model and the experimental fidelities across a range of system sizes (Ext. Data  Fig. 4), and note that this is the same error model employed in our recent study of record high-fidelity two-qubit Bell state generation 8 .
Of present interest is understanding the main limitations to our experimental fidelity, to best improve the system in the future. To this end, we perform error budget simulations, grouping error sources into four main categories: noise on the detuning, noise on the Rabi frequency, spontaneous decay, and atomic temperature effects (Ext. Data Fig. 28). We generally find that Rabi frequency noise (in particular, arising from shot-to-shot laser intensity fluctuations) is a dominant noise source across all times. Extrapolating from error model simulations at small system sizes to larger systems, we find that spontaneous decay can likely become of similar strength for large systems (Ext. Data Fig. 28). We note that in these simulations, collective Rydberg decay effects are not accounted for 56 , but we do not yet observe evidence of such effects in the t < 15 cycles time window in which we experimentally operate.