A lab-based test of the gravitational redshift with a miniature clock network

Einstein’s theory of general relativity predicts that a clock at a higher gravitational potential will tick faster than an otherwise identical clock at a lower potential, an effect known as the gravitational redshift. Here we perform a laboratory-based, blinded test of the gravitational redshift using differential clock comparisons within an evenly spaced array of 5 atomic ensembles spanning a height difference of 1 cm. We measure a fractional frequency gradient of [ − 12.4 ± 0. 7(stat) ± 2. 5(sys)] × 10−19/cm, consistent with the expected redshift gradient of − 10.9 × 10−19/cm. Our results can also be viewed as relativistic gravitational potential difference measurements with sensitivity to mm scale changes in height on the surface of the Earth. These results highlight the potential of local-oscillator-independent differential clock comparisons for emerging applications of optical atomic clocks including geodesy, searches for new physics, gravitational wave detection, and explorations of the interplay between quantum mechanics and gravity.

optical atomic clocks including geodesy, searches for new physics, gravitational wave detection, and explorations of the interplay between quantum mechanics and gravity.

Introduction
Einstein's theory of general relativity [1] has thus far been consistent with every experimental test performed [2], including classical [3,4], modern [5,6] and strong field cosmological tests [7][8][9]. However, despite the successful integration of special relativity and quantum mechanics as quantum field theory, there is currently no theory that successfully unifies general relativity with quantum mechanics. This motivates continued experimental tests at new length scales, and suggests that performing precision tests of general relativity with quantum systems may offer a way to explore the interplay between general relativity and quantum mechanics [10][11][12].
The gravitational redshift is a central prediction of general relativity. Thanks to rapid advancements in their stability and accuracy [13][14][15][16][17][18][19], atomic clocks have now enabled tests of the gravitational redshift over a wide range of length scales [20][21][22][23][24][25]. Recent tests of the gravitational redshift include a frequency comparison between two single-ion clocks with one of the clocks elevated by 30 cm [21], comparisons between terrestrial clocks and microwave atomic clocks in eccentric orbits which have produced the strongest limits on deviations from the expected redshift [22,23], a frequency comparison between two synchronously linked portable 87 Sr optical lattice clocks with a 450 m height difference at the Tokyo Skytree tower [24] resulting in the most precise terrestrial constraint on deviations from the gravitational redshift at the 10 −5 level, and a recent in-situ synchronous frequency gradient measurement across a millimeter-scale atomic ensemble with an unprecedented differential precision of 7.6 × 10 −21 [25].
For two otherwise identical clocks experiencing the same gravitational field with a height difference ∆h, their frequency difference δf due to the gravitational redshift is given by where f is the clock frequency, c is the speed of light, and g is the gravitational acceleration. Near the surface of Earth, this amounts to a fractional frequency shift of 1.1 × 10 −18 per centimeter of vertical displacement. With optical clocks now reaching instabilities and inaccuracies at the level of 10 −18 and below [15-17, 26, 27], they are becoming a sensitive probe of the point to point geopotential at the sub-centimeter scale, where they are expected to complement other methods of geodesy [15,[28][29][30][31][32][33][34][35]. For example, a blinded comparison between two independent Yb optical lattice clocks was recently performed with accuracy, instability and reproducibility all at the level required to resolve subcm height differences [15]. In addition, the frequency gradient due to the gravitational redshift across a single millimeter-scale atom ensemble was recently observed using Rabi spectroscopy of 87 Sr without the use of a blinding offset and taking advantage of an 8-mHz linewidth clock laser [25].
Emerging clock applications such as relativistic geodesy require transportable optical clocks, which currently have poorer stabilities and accuracies than those of laboratorybased clocks [32,34,36]. State-of-the-art laboratory-based optical clocks often make use of bulky and immobile reference cavities with second-scale coherence times [37][38][39] in order to achieve lower levels of clock instability and to aid in rapid systematic evaluation, limiting deployment in the field. It has recently been demonstrated that differential measurements between single ions [40] and neutral atom ensembles [41][42][43], as well as differential spectroscopy by phase-coherently linking a zero-dead-time optical lattice clock and a single ion clock [44], allow interrogation times beyond the limit set by the local oscillators, opening up the prospects for future applications with transportable or spacebased clocks [45][46][47][48][49].
In this work, we perform a local-oscillator independent, blinded test of the gravitational redshift at the sub-centimeter scale using a spatially multiplexed optical lattice clock [43] consisting of an array of 87 Sr atom ensembles trapped in a vertical, one-dimensional (1D) optical lattice (Fig. 1a). We prepare 5 atomic ensembles equally spaced by 2.5 mm, spanning a total height difference of 1 cm. Synchronous differential comparisons are performed between the 5 ensembles, resulting in 10 unique pairwise clock comparisons recorded simultaneously, including 4 pairs at 2.5 mm, 3 pairs at 5.0 mm, 2 pairs at 7.5 mm, and 1 pair at 1.00 cm height difference. The gravitational redshift is tested by mapping out the frequency differences between each ensemble pair as a function of height difference.

Results
Testing the gravitational redshift with a miniature clock network The principles and basic operation of the multiplexed optical lattice clock used in this work were first described and experimentally demonstrated in Ref. [43]. In that work we demonstrated differential clock comparisons between atom ensembles with fractional frequency imprecision below 1 × 10 −19 . However, there are a wide range of potential sources of differential frequency shifts at the mHz level, so the exact origin of the frequency differences we measured was not known, and the contributions from the relativistic gravitational redshift could not be independently extracted. In this work, we leverage several a, A representative camera image of a spatially multiplexed array with 5 ensembles of 87 Sr atoms (indexed 1-5 from top to bottom) trapped in a vertical 1D optical lattice for differential clock comparisons. The spacing between neighboring ensembles is 0.25 cm, spanning a total height difference of 1 cm. Due to the gravitational redshift, clocks at a higher gravitational potential are predicted to tick faster than clocks at a lower potential. The grey box shows the orientations of the applied bias magnetic field (B), and the lattice and clock laser polarizations ( ). b, A representative outcome from synchronous Ramsey spectroscopy on the | 1 S 0 , m F = +5/2 ↔ | 3 P 0 , m F = +3/2 clock transition with a 10 s free precession time using 5 atomic ensembles, resulting in 10 pairwise clock comparisons. In each plot, the excitation fractions of ensemble j are plotted against the excitation fractions of ensemble i (P j : P i , where j > i), tracing out an ellipse which is fitted to in order to extract the frequency difference between that pair. The frequency difference is dominated by the first-order differential Zeeman shift, which is rejected when averaging transitions between opposite spin states (Supplementary Note 3A).
key improvements to our apparatus and experimental procedure that have been made since our prior work, and perform a full systematic evaluation of all potential sources of differential frequency shifts at the 10 −19 level, enabling a blinded test of the gravitational redshift with mm-scale sensitivity to gravitational potential differences.
Our major improvements include employing a deeper initial optical lattice (130 E rec trap depth, where E rec is the recoil energy of a lattice photon) during atomic ensemble loading and in-lattice cooling provide larger atom numbers (> 2000 atoms per ensemble) with reduced atomic temperatures both axially (> 99% occupancy in the lattice ground band) and radially (< 200 nK). Synchronized Ramsey measurements are subsequently performed at a shallower operational lattice depth (u op ) of 15 E rec . Combined with common-mode rejection of clock laser noise, we achieve 32 s atom-atom coherence times, more than 300 times longer than our measured atom-laser coherence time of roughly 100 ms (Supplementary Note 2). This enables differential instabilities below 1 × 10 −17 / √ τ for all the ensemble pairs, a factor of 3 reduction in instability over our previous work (3 × 10 −17 / √ τ with 6 ensembles). In addition, we now suppress the residual magnetic field gradient along the lattice axis by a factor of more than 10, reducing systematic uncertainties arising from Zeeman shifts. These advances allow us to rapidly evaluate the differential systematic shifts at the 10 −19 level due to environmental perturbations and thus perform a precision test of the gravitational redshift. To the best of our knowledge, our work represents the first complete systematic evaluation of differential frequency shifts at the 10 −19 level making use of synchronous Ramsey spectroscopy. This technique, which unlike differential Rabi spectroscopy can be used to probe well beyond the coherence time of the local oscillator, enables a new modality of precision measurement with optical lattice clocks where both the achievable accuracy and precision are now unbounded by the quality of the local oscillator.
For our measurements, we utilize synchronous Ramsey spectroscopy in conjunction with spatially resolved fluorescence imaging to probe the clock transition along the ensem-ble array. The optical lattice is operated near the magic wavelength where the differential polarizability between the ground ( 1 S 0 , g) and clock ( 3 P 0 , e) states is zero [50,51]. We probe with an interleaved sequence using the magnetically insensitive |g, m F = ±5/2 ↔ |e, m F = ±3/2 clock transitions [18], where m F is the projection of total angular momentum along the quantization axis defined by the bias magnetic field. Taking the average of the clock transitions with opposite sign nuclear spin states rejects first-order Zeeman shifts and vector AC Stark shifts. The differential phase (φ d ) for each ensemble pair is extracted through least squares ellipse fitting, and is related to the differential frequency The differential frequency between each atomic ensemble pair includes a contribution from the gravitational redshift as well as other frequency shifts arising from differences between the two ensembles and their environments, necessitating an evaluation of potential sources of systematic effects. To avoid possible bias towards the expected result, we adopt a blinded measurement protocol. A blinded offset gradient was randomly drawn from a range of ±5×10 −18 /cm, roughly 10 times the size of the expected redshift gradient, and is automatically added to the extracted differential phase by our data analysis code following the ellipse fitting step. This blinded offset was not known to the authors during systematic evaluation and data taking. The blinded value of the measured frequency gradient across the array, the corrections for systematic shifts, and the systematic and statistical uncertainties were all finalized prior to the removal of the blind.

Systematic effects and error budget
The results of the full systematic evaluation are listed in measuring the contribution of each potential systematic is discussed in Supplementary Note 3. Several effects dominate the extracted differential frequency and the corresponding systematic uncertainty, and we highlight them here. First, atomic interactions due to p-wave collisions between on-site atoms lead to a frequency shift that scales linearly with atomic density [52][53][54]. In our differential measurement, the density shift is suppressed by a factor of roughly 10 compared to the absolute shift by balancing the number of atoms loaded into each ensemble. By intentionally varying the atom number differences, the differential density shift for a symmetric pair (2,4) at u op is evaluated to be The largest systematic shift in our system is the second-order Zeeman shift due to the background magnetic field gradient (∼ 1.5 mG/cm). The splitting of the transitions with opposite sign nuclear spin states provides a measurement of the magnetic field difference between each ensemble pair, while the overall magnitude of the applied bias magnetic field (∼ 5.5 G) is measured independently using a more magnetically sensitive transition. This bounds the uncertainty from the second-order Zeeman gradient to below 1 × 10 −19 /cm, limited by uncertainty in the second order Zeeman coefficients for the clock transition ( Fig. 2b and Supplementary Note 3A).
The frequency shift due to black body radiation (BBR) is the dominant source of systematic uncertainty for many room-temperature optical clocks [14][15][16][17] comparisons between ensembles within a single shared optical lattice, this is significantly suppressed. A differential lattice AC Stark shift between two ensembles is caused by the relative trap depth difference δu arising from the lattice beam profile, and scales linearly with δu to first-order (Supplementary Note 3D). In conjunction with the multiplexed ensemble technique, we can rapidly map out both δu and the differential light shifts. In doing so, we also observe a residual spatial light shift gradient of −8.0(1.1) × 10 −20 /E rec /cm, which depends linearly on the lattice depth and the spatial separation between ensembles, and does not depend on δu. We believe this gradient is likely due to the differential tensor Stark shift arising from slight variations in the orientation of the magnetic field vector across the ensemble array. This is supported by the observation of a differential vector Stark gradient of −2.5(2) × 10 −18 /E rec /cm in our system (Supplementary Note 3D). Regardless of the exact origin of the spatial light shift gradient, we are able to measure and account for it by varying the lattice depth. Subtracting off this residual spatial gradi-ent, we observe correlations between δu and the remaining shifts (Fig. 2d), as expected.
This allows us to extract the operational lattice detuning from the effective magic wavelength [25,51], and to independently evaluate the lattice light shift gradient upon removal of the δu shifts (Fig. 2e).

Data analysis and interpretation
We performed 14 blinded measurements of the gravitational redshift under normal operating conditions over a 3-week data taking campaign. In each measurement run (ranging in duration from 1 to 4 hours), the frequency gradient is determined by fitting a linear slope to the 10 measured differential frequencies as a function of the pairwise height differences, after taking into account systematic corrections such as ellipse fitting bias and density shift. In order to account for correlations that arise between clock comparison pairs that share the same clock, the covariance between the pairwise comparisons is included in the The statistical uncertainty is scaled up by the square root of the reduced χ 2 statistic, Our measurement is inconsistent with the hypothesis that there is no gravitational redshift on the surface of the Earth for mm-cm scale height differences at a confidence level of 4.9σ.
Deviations from the gravitational redshift predicted by general relativity can be pa-  Fig. 3|Gravitational redshift measurements. a, The measured fractional frequency gradients after accounting for all systematic corrections over 14 data taking runs (red scatter points), each of duration ranges from 1 to 4 hours. The weighted average (red solid line) yields a measured gradient of [−12.4 ± 0.7 (stat) ± 2.5 (sys) ] × 10 −19 /cm, consistent with the expected redshift gradient (black solid line). Red (blue) area represents ±1σ statistical (total) uncertainty, in which the statistical uncertainty is inflated by the square root of the reduced χ 2 statistic, χ 2 red = 1.16. The error bars correspond to 1σ standard deviation. b, Mean differential frequencies as a function of height difference across all measurements (red scatter points), analyzed using the same data set as in a. A linear fit (red solid line) yields a fractional frequency gradient of (−11.9 ± 2.6) × 10 −19 /cm, again fully consistent with the expected redshift gradient (black solid line). rameterized by defining a modification parameter α to first-order of the gravitational potential difference [2,24]. We constrain deviations from the predicted scaling by α = 0.13 ± 0.23 for millimeter to centimeter scale height differences. We note that while the most stringent constraints on α are at the 10 −5 level [22,23], those measurements were performed at very different length scales, with height differences roughly a factor of 10 9 times larger than those used here.
In the future it appears feasible to extend our measurement approach to a larger apparatus in order to achieve spatial separations between ensembles on the scale of roughly 1 meter, a scale of laboratory atomic physics experiment that has already been demonstrated with atom interferometers [55], which would offer an increase in the magnitude of the redshift by a factor of 100× over the separations used in this work. In addition, we see a clear path to reducing the systematic uncertainty of our differential comparisons by more than one order of magnitude, which can be accomplished by mitigating density shifts via operating at "magic" excitation fractions (as was recently demonstrated within a single ensemble [25]) or through modifications to the lattice geometry [19], reducing the differential tensor lattice light shift by eliminating background magnetic field gradients, and mitigating the black-body radiation gradients through improved control of the thermal environment. When combined, such an experiment would promise constraints on α at the 10 −4 level, indicating that laboratory-based tests of the gravitational redshift could soon be competitive with space-based tests [22,23] or tests with portable clocks [24].
As an alternative to the above analysis, we reanalyze the same experimental data by taking the weighted average of the differential frequencies of each ensemble pair from the 14 measurement runs, with systematic corrections applied individually to each pair. The weighted mean differential frequencies of ensemble pairs with the same height differences (0.25, 0.50, 0.75 and 1.00 cm) are shown in Fig. 3b. Through a final linear fit, we find a frequency gradient of (−11.9 ± 2.5) × 10 −19 /cm, again fully consistent with the expected redshift. Extrapolating the spatially varying systematic uncertainties to 0 cm of separation, we find our gravitational redshift resolution to be 1.3 mm, dominated by systematic uncertainty due to the differential density shift, which could potentially be further reduced by incorporating the recent technique for density shift cancellation [54] in future work (Supplementary Note 3B).

Sources
Gradient

Discussion
The measurement of gravitational potential differences between clocks with sub-centimeter resolution is a major goal of relativistic geodesy [15,[31][32][33][34]. In the preceding discussion and analysis we treated the height differences between each ensemble pair and the local gravitational acceleration g as known, as we measure them independently, but did not a priori trust the gravitational redshift predicted by general relativity. From another perspective, our measurements can be viewed as a proof-of-principle demonstration of relativistic gravitational potential measurement with millimeter scale resolution. Taking Fig. 4|Extracting relative height differences using relativistic gravitational potential measurements. The relative clock height differences across the array are determined using the measured gravitational redshifts and the independently measured local gravitational acceleration g. The double arrows represent the extracted height difference and the associated uncertainty for each clock pair. The true clock heights are shown on the right (red values), with the lowest clock (clock 5) defined as being at a height of 0 cm. All units are in cm unless otherwise specified. The uncertainties correspond to 1σ standard deviation.
as a given that the redshift is given by Eq. 1 and treating the ensemble array as a network of spatially distributed clocks with unknown height differences, we can extract the height ordering and relative height differences from the measured gravitational redshifts and g, which we measure independently. We find that we correctly assign the order of gravitational potential differences within the network, and that all of the extracted height differences are within 2 mm of the known values (Fig. 4). However, we note that we greatly benefit from the rejection of common-mode systematic shifts thanks to the ensembles sharing the same optical lattice and the same science chamber, which will not be possible when comparing two individual clocks at different geospatial locations. In addition, over a long baseline (> 1 km), phase noise from the frequency transfer will not be common-mode and will limit the coherence times of the differential comparison. Therefore, while these results demonstrate that relativistic gravitational potential measurements with mm-scale height resolution are achievable in the lab over short spatial separations, considerable challenges must be overcome before they can be applied to relativistic geodesy at length scales of interest.
In a recent work [25], Bothwell and collaborators resolved the gravitational redshift across a single 1 mm atom ensemble. While there are aspects in common between this work and Ref. [25], there are also several critical differences that set this work apart. First, we employed a blinded offset during data taking and systematic evaluation. Second, while Bothwell et al. made use of second-scale Rabi spectroscopy with an 8 mHz linewidth clock laser, we demonstrate comparable levels of differential stability and perform a full systematic evaluation at the 10 −19 level by employing synchronous Ramsey spectroscopy with a Hz linewidth clock laser. This demonstrates that measurements of this kind need not be limited be the stability of the local oscillator. Third, we measure between spatially resolved ensembles using techniques that are likely more relevant to applications that require spatially separated clocks such as relativistic geodesy and gravitational wave detection. Finally, we also observe and characterize several systematic effects that were not observed by Bothwell et al., such as a black-body radiation gradient shift and differential tensor lattice light shift, likely in part because of the larger range of spatial separations used in our work.
In conclusion, we perform a blinded, precision test of the gravitational redshift on the sub-centimeter scale with 5 spatially multiplexed ensembles of 87 Sr. We observe a gravitational redshift for millimeter to centimeter scale differences in height, and find that it is consistent with the expected general relativity gravitational redshift to within 1σ total uncertainty. Our result is inconsistent with zero gravitational redshift at a 4.9 σ confidence level and constrains deviations from the redshift predicted by general relativity to 0.13 ± 0.23 for mm to cm scale height differences. We demonstrate a gravitational redshift measurement resolution of 1.3 mm. Our results highlight the use of the spatially multiplexed ensemble techniques for achieving long coherence times and low differential instabilities without the need for a state-of-the-art clock laser, and demonstrate its utility for characterization of spatially varying systematic shifts in optical lattice clocks on the sub-centimeter scale and at the 10 −19 level. These results represent an important milestone along the way to gravitational potential measurements at the sub-centimeter scale with optical atomic clocks [15,[31][32][33][34], and explorations of the interplay between quantum mechanics and gravity [10][11][12].

Methods
Sample preparation and experimental procedure of above 80 %, the typical differential instability for each pairwise comparison is below where τ is the averaging time, consistent with the quantum projection noise (QPN) limit where f is the clock frequency, τ is the averaging time, and the factor of √ 2 assumes equal contribution from each clock.

Ellipse phase extraction
We perform synchronous Ramsey spectroscopy with up to 5 ensembles (indexing 1, 2, . . . where C i(j) is the contrast of ensemble i(j), θ L is the common-mode laser phase, and φ d is the differential phase which yields the differential frequency (δf ij = f j − f i ) between ensemble pair (i, j) through φ d = 2πδf ij T R for a given known Ramsey free evolution time Since we are operating at Ramsey dark times well beyond the laser coherence times, θ L is random and uniformly distributed from 0 to 2π. The data randomly samples from points lying on an ellipse (with slight deviations from the ellipse due to QPN). We then fit to this ellipse using a least-squares approach [56]. To extract the differential phase [57], we rewrite the data {P i , P j } (denoted as {x, y} below) in the form of a generalized conic section a 1 x 2 + a 2 xy + a 3 y 2 + a 4 x + a 5 y + a 6 = 0, which describes an ellipse when a 2 2 − 4a 1 a 3 < 0. We rewrite Eq. 4 as Through cancelling out θ L , we have which can be matched up with the coefficients a i from Eq. 5. The differential phase φ d is then extracted using: The associated Allan deviation is extracted via jackknifing technique [58], and is then fitted to a white frequency noise model with 1/ √ τ scaling. Extrapolating the fit to the full averaging time yields the statistical uncertainty of the differential frequency.
In our measurements, we probe with an interleaved sequence between clock transi-

Data blinding protocol
To eliminate possible bias of our data taking and systematic analysis towards an expected outcome, we employ a data blinding protocol. Our data software adds a large constant offset gradient to our measurements, including the data taken for systematic evaluations and data runs taken under normal operating conditions. The blinded offset gradient is pseudo-randomly drawn from a uniform distribution spanning over ±5 × 10 −18 /cm, 10 times the size of the expected redshift gradient. The blinded offset gradient is scaled by the height difference between each ensemble pair, and is then automatically added to the results from our data analysis code for ellipse phase extraction.
The blinded offset gradient was only unblinded after finalizing the corrections for all systematic effects, determining the measured value with blinded offset gradient taken under normal operating conditions, and finalizing the associated statistical and systematic uncertainties. No additional data was taken and no changes were made to the analysis, the error budget, the measured value, or the uncertainties after unblinding.

Normal operation, data taking, unblinding and analysis
We performed 14 blinded measurement runs of gravitational redshift data under normal operating conditions over a 3-week campaign. Each run ranged in duration from 1 to 4 hours, and was performed in conjunction with verification of several experimental parameters to ensure that the associated systematic effects are under control, such as the magnetic field gradient, density shift coefficients, δu, and clock and lattice beam alignments (See Methods). In each measurement run, the differential frequency of each ensemble pair was extracted through ellipse fitting, with the associated Allan deviation extrapolated to the full averaging time taken as the statistical uncertainty. The corrections for density shifts and bias error from the ellipse fitting are applied to each ensemble pair individually. The total uncertainty of each clock comparison is given by the quadrature sum of its statistical uncertainty and the uncertainties of systematic corrections. We analyze the measured frequency gradient using two approaches.
In the first approach, the extracted frequency differences for 10 ensemble pairs from each measurement run are plotted as a function of the height differences. A linear fit is applied to each measurement run. Many of the clock comparison pairs share a clock, e.g., pair (1, 2) and pair (2, 3) share clock 2. This means that the quantum projection noise is partially correlated between pairs, and not accounting for this would result in an underestimation of the error bar associated with the fit. To account for this, the covariance matrix is included in the fitting algorithm, where the covariance between clock pairs (a, b) and (b, c) is given by the jackknifing re-sampling approach [58] Cov whereφ JK =i is the extracted phase except the i th data point,φ JK is the mean ofφ JK =i , and N is the total number of measurements.
The associated fitted slopes from 14 measurement runs, after accounting for systematic gradient corrections, are then weighted averaged, yielding a statistical uncertainty of 0.7×10 −19 /cm, inflated by the square root of the reduced χ 2 statistic, χ 2 red = 1.16 (Fig. 3a).
Upon completion of the measurements, the pseudo-randomly generated offset blinding gradient was revealed and subtracted from the measurements. The offset gradient proved to be +3.7 × 10 −18 /cm, and the measurements before and after unblinding are shown in In the second approach, we re-analyze the data and apply systematic corrections to each pairwise clock comparison individually over the same raw data set. For each pair, the total uncertainty is calculated as the quadrature sum of the standard error of its weighted mean, systematic uncertainties that don't scale with height difference (density shift and ellipse fitting corrections), and other systematic uncertainties that scale with height difference. The weighted averaged frequency differences of each ensemble pair are given in Supplementary Table 1. Through a final linear fit to the differential frequencies as a function of the height differences, we find a frequency gradient of (−11.9 ± 2.5) × 10 −19 /cm (Fig. 3b), again fully consistent with the expected redshift gradient within 1σ total uncertainty.

Relativistic clock height difference measurements
Assuming that theory of general relativity is correct and that the gravitational redshift is given by Eq. 1, and treating the ensemble array as a network of spatially distributed clocks with unknown heights, we demonstrate relativistic gravitational potential measurement [15,32,34] using synchronous clock comparisons (Supplementary Table 1). The height difference ∆h for each clock pair is then given by where δf is the measured gravitational redshift, g is the independently measured local gravitational acceleration (g = −9.803 m/s 2 ), f is the clock transition frequency, and c is the speed of light. The uncertainties of the extracted height differences are dominated by the systematic uncertainties of the measured gravitational redshifts.

Data availability
The process data used in this study have been deposited on Zenodo digital repository under https://doi.org/10.5281/zenodo.8184043.

Code availability
The code used for experimental control, data analysis, and simulation in this work are available from the corresponding author upon reasonable request.

SUPPLEMENTARY NOTE 2: ATOMIC COHER-ENCE TIMES
We use a synchronized Ramsey sequence to measure the atomic coherence. We pre-

SUPPLEMENTARY NOTE 3: SYSTEMATIC EVAL-UATION A. Zeeman shifts
Coupling of the clock states to external magnetic fields give rise to Zeeman shifts of the clock transition frequency. For a single atomic ensemble with magnetic field amplitude B, the Zeeman shifts when probing the |g, m F = ±5/2 ↔ |e, m F = ±3/2 clock transitions can be expressed as where µ L and µ Q are the first and second-order Zeeman shift coefficients, respectively.
For differential clock comparison between ensemble pair (i, j), where j > i, with differing magnetic field strengths B i and B j , the differential Zeeman shifts, ∆ν ZS (B i , B j , ±) = ν ZS (B j , ±) − ν ZS (B i , ±), are given by where B = (B i + B j )/2 is the mean magnetic field amplitude, and δB = B j − B i is the magnetic field difference. The first-order differential Zeeman shift (on the order of ±8×10 −17 /cm) is rejected by averaging opposite spin state transitions, while their splitting yields δB. In the limit of |δB| B, the second-order differential Zeeman shift can be approximated as where ξ = µ Q /µ L , and ∆ν split is the splitting between the transitions with opposite spin states after subtraction of residual vector AC Stark shift.
The second-order Zeeman shift correction is applied for each measurement run, with uncertainty primarily limited by ξ, which is found to be 0.0105 (1)

B. Density shift
Due to the Pauli-exclusion principle, s-wave interactions are forbidden for identical Fermionic atoms within a single lattice site, while p-wave collisions are allowed, leading to a clock frequency shift that scales linearly with atomic density [3,4]. A recent study found that operation in a gravity-tilted shallow lattice allows for cancellation of density shifts at a "magic" lattice depth near 12 E rec , where the partially delocalized Wannier-Stark states enable tunability of on-site p-wave versus neighbouringsite s-wave atomic interactions [5]. In this work we did not observe such a cancellation effect at shallower lattice depths, likely due to the difference in dynamics between Rabi spectroscopy as was employed in Ref. [5] versus our use of Ramsey spectroscopy with a 50:50 superposition [6]. This offers the prospect of further reducing uncertainty from differential density shifts in future works.

C. Black body radiation shift
In our experiment, the optical lattice is orientated nearly vertically, with a tilt of about The BBR shift due to thermal gradients between the top and bottom viewports can be expressed as [9]: where h is the Planck constant, α Sr is the atom's DC polarizability, σ SB is the Stefan-Boltzmann constant, Ω k (z) corresponds to the solid angle as seen by the atom ensemble at z, and T k denotes the temperature on the surface of the viewport. The differential BBR shift between an ensemble pair at (z i , z j ) is then given by Because the viewport separation (2Z 0 = 15 cm) is much larger than the separation between the ensemble pairs (δz ji = z j −z i ≤ 1 cm), we approximate the solid angle difference as ∆Ω ji,top = πr 2 1 where r = 5.7 cm is the radius of the viewport. The differential BBR shift can be further simplified through a Taylor expansion up to order (δT /T ) 3 , where δT = T top − T bot is the temperature difference between the top and bottom viewports: Because the absolute temperature (T = 300 K) is a factor of 150× greater than the temperature difference we applied to the viewports (δT ≤ 2 K), the magnitude of the first-order term is roughly a factor of 100× larger than the second-order term. Therefore, we can approximate Eq. 17 as an expression with a linear scaling with both δT and δz ji : Measuring the frequency differences of the 10 ensemble pairs simultaneously, we find that as expected from Eq. 18, the resulting frequency shifts scale linearly with the temperature difference, and the corresponding slopes scale roughly linearly with the height differences (Supplementary Figure 3). Through a linear fit to the extracted slopes as a function of height differences, we find the BBR sensitivity in our system to be −4.2(1) × 10 −18 /cm per 1 K difference (Fig. 2c in the main text).
To monitor the temperature, we use commercially available 10 kΩ negative temperature coefficient thermistors (Amphenol Thermometrics, MC65F103A), rated for an in-

D. Lattice light shift
The lattice light shift for a single atomic ensemble is given by [10]: in whichα E1 ,α qm , andβ are the differential E1, E2-M 1 polarizabilities, and hyperpolarizability on the clock transition, respectively. u is the lattice trap depth in units of E rec , n z is the axial vibrational quanta, and δ L is lattice detuning from the effective magic wavelength, where the scalar and tensor shifts cancel. The vector shift is rejected by averaging the transitions with opposite nuclear spin states. Through clock sideband thermometry both axially and radially we find n z < 0.01 and radial temperature T r < 200 nK at u op = 15 E rec , where thermal averaging of the effective trap depth can be neglected. By operating at lattice depths below 35 E rec , the hyperpolarizability terms are also negligible.
To model the differential lattice light shift in our system, we introduce a dimensionless parameter δu, which characterizes the relative lattice trap depth difference between ensemble pair (i, j), where j > i For δu 1, the differential light shift (ν LS,j − ν LS,i ) can be approximated as which scales linearly with δu.
By modulating the lattice detuning between δ L + 100 MHz and δ L − 100 MHz at the operational depth u op = 15 E rec , we have Although the origin of the spatial light shift gradient is not definitively known, we also To independently evaluate the light shift gradient, we need to account for the δu dependent shifts, which require the knowledge of δu and δ L . The latter is done by first subtracting the residual spatial gradient from the measured differential light shifts. We then find that the remaining shifts are correlated with the extracted δu (Fig. 2d in the main text) as expected from Eq. 21. From this we find our operational lattice detuning δ L to be 16.9(1.5) MHz. With the extracted δ L and δu calibrated from each measurement day, we are able to monitor and account for the residual light shift gradient. A representative plot is shown in Fig. 2e in the main text.
Our operational lattice frequency is measured to be 368, 554, 780(30) MHz, limited by the accuracy of the wave-meter (HighFinesse, WS7). However, the lattice frequency is stabilized to a ultra-low-expansion cavity with typical drifts of less than −20 kHz per day, inferred from the narrow-linewidth 1 S 0 ↔ 3 P 1 MOT, providing sufficient long-term stability throughout the measurements. The lattice intensity is actively stabilized and controlled via feedback to the acoustic-optical modulator before the beam delivering optical fiber (NKT Photonics, LMA-PM-15). Both the incoming and retro-reflected lattice alignments are monitored on several cameras and photo-diodes, ensuring the daily-calibrated δu remained symmetric around 0 throughout the data taking campaign. Overall, the lattice light shift gradient in our system is evaluated to be −11.8(1.2) × 10 −19 /cm.

E. Probe Stark shift
The probe AC Stark shift arises from the clock light itself, and is suppressed when probing with a shared clock light, with uncertainties primarily arising from inhomogeneity and misalignment of the clock beam with respect to the lattice. The clock light beam waist indicating that the background electric field gradient is small.
To quantify our uncertainty in the differential shifts due to the background electric field gradient, we follow the approach laid out in Ref. [11]. In the presence of a background field E bg , which would most likely arise due to charge accumulation on the nearby top and bottom viewports, the DC Stark shift for a single atom ensemble is given by where c is the atomic coefficient. We rewrite the above form as [11] ν dc (V ) = ν bg + aV + bV 2 , where ν bg is the background stray field shift, and coefficients a and b are experimentally accessible parameters by modulating V . We note that aV represents the coupling between the background field and the applied field. When comparing ensemble i and ensemble j (j > i), the differential DC Stark shift (∆ν dc = ν j − ν i ) becomes where A = a j − a i , B = b j − b i , and ∆ν bg = ν bg,j − ν bg,i is the differential background shift due to charges on the viewports.
Due to the finite spatial extent (s) of the atom ensembles and inhomogeneity of the applied field, there is no V such that the applied field identically cancels ∆ν bg . We consider the extremum value of ∆ν dc (V ), denoted as ∆ν * . We then have the difference between the differential background shift and the extremum shift, ∆ν diff ≡ ∆ν bg − ∆ν * (refer to Fig. 1 in Ref. [11] for details). From Eq. 24, we find ∆ν diff = A 2 /(4B). We note that ∆ν * plays the role of a frequency correction for the field gradients, and can be written as ∆ν * = η∆ν diff , due to the fact that ∆ν * and ∆ν diff scale similarly with the stray field.
The differential background shift is then given by in which η is given by η = ζ 2 s 2 /R 2 to leading order of s, where ζ ≡ (q 1 + q 2 )/(q 1 − q 2 ) quantifies the charge symmetry between the viewports, and R is the effect length.
By applying voltages of up to ±100 V to the electrodes, we find A 2 /(4B) = 0.6(3) × 10 −20 for the ensemble pair of 1 cm separation. In our system, we obtain η ≈ 0.02 for s = 500 µm, R = 40 mm, and assuming 25% more charge on one viewport than the other. This bounds the background DC Stark shift gradient to below 0.1 × 10 −19 /cm.

G. Ellipse fitting
In the presence of QPN and numerical constraints which ensure an ellipse-specific solution, the least squares ellipse fitting approach is biased at phases close to 0 or π where the ellipse collapses into a straight line [12]. The differential phases across the ensemble array are dominated by the differential Zeeman shifts, such that through precise control of the magnetic field and gradient, we can operate in the regime where all the differential phases lie within the (π/6, 5π/6) range. For typical experiments with a 10 s Ramsey dark time and 2000 atoms per ensemble, the ellipse fitting bias error is bound to below 2 × 10 −19 (Supplementary Figure 6). During initial data processing, the bias error is corrected using Monte-Carlo simulations with the experiment parameters such as atom number and contrast as inputs. While the bias error does not scale with spatial separations, we estimate an upper bound limit of 0.5 × 10 −19 due to uncertainty in the input parameters used for Monte-Carlo simulation during bias error corrections. H. Wannier-Stark ladder, imaging, and the expected gravitational redshift The local gravitational acceleration in our laboratory was measured by the group of Prof. Tikoff from the Department of Geoscience at the University of Wisconsin-Madison using a LaCoste-Romberg gravimeter, and was then cross-validated with known values from several other survey points in Wisconsin. The local gravitational acceleration in our laboratory was measured to be g = −9.803 m/s 2 , rounded to the 4 th digit.
The lattice tilt (θ tilt ) with respect to gravity is independently measured using the splitting of the ±1 st order Wannier-Stark sidebands [13] to the clock transition at 5 E rec lattice depth (Supplementary Figure 7a), which is given by where m is the mass of 87 Sr, and λ L = 813.4 nm is the lattice wavelength. A representative plot of the measured Wannier-Stark sidebands is shown in Supplementary Figure 7b, and we find the weighted average splitting between ±1 st order sidebands to be 1729(2) Hz, corresponding to a tilt with respect to gravity of 5.0(1) • .
The overall spatial extent of the ensemble array is 1.00(1) cm, calculated based on the frequency chirp profile of the moving optical lattice loading sequence, which is precisely controlled using a direct digital synthesizer (DDS, Moglabs XRF). This yields an effective height difference of ∆h = 0.99(1) cm, consistent with the extraction of the height difference from the camera images. The effective camera pixel size in our imaging system is calibrated to be 34(1) µm per pixel using the standard time-of-flight imaging.
where c is the speed of light. This results in an expected redshift gradient of −10.9 × 10 −19 /cm, with the uncertainty bounded < 0.1 × 10 −19 /cm due to uncertainties in g and ∆h.