Quantum-enhanced interferometry with large heralded photon-number states

Quantum phenomena such as entanglement can improve fundamental limits on the sensitivity of a measurement probe. In optical interferometry, a probe consisting of $N$ entangled photons provides up to a $\sqrt{N}$ enhancement in phase sensitivity compared to a classical probe of the same energy. Here, we employ high-gain parametric down-conversion sources and photon-number-resolving detectors to perform interferometry with novel heralded quantum probes of sizes up to $N=8$ (i.e. measuring up to 16-photon coincidences). Our probes are created by injecting heralded photon-number states into an interferometer, and in principle provide quantum-enhanced phase sensitivity even in the presence of significant optical loss. Our work paves the way towards quantum-enhanced interferometry using large entangled photonic states.


INTRODUCTION
Optical interferometry provides a means to sense very small changes in the path of a light beam. These changes may be induced by a wide range of phenomena, from pressure and temperature variations that impact refractive index, to modifications of the space-time metric that characterize gravitational waves. In its simplest form, interferometry measures distortions via the phase difference φ between the two paths of the interferometer. The uncertainty ∆φ in a measurement of this phase difference is limited fundamentally by the quantum noise of the illuminating light beams. This noise can be reduced by employing light exhibiting nonclassical properties such as entanglement and squeezing in order to improve the sensitivity of an interferometer beyond classical limits [1]. Quantum states of light are most effective when it is desirable to maximize the phase sensitivity per photon inside an interferometer, such as in gravitational wave detectors [2,3] or when characterizing delicate photosensitive samples [4][5][6][7][8].
In principle, $N$-photon quantum states of light such as the highly entangled N00N state can provide up to a $\sqrt{N}$ precision enhancement over a classical state of equal energy [9][10][11][12][13][14][15][16][17]. Unfortunately, such highly entangled states are vulnerable to decoherence, especially at large photon numbers. In practice, their enhanced sensitivity disappears in the presence of loss, which may originate from interactions inside the interferometer (e.g. absorption in a sample) as well as external losses in the state preparation and detection [18].
Although a $\sqrt{N}$ enhancement is not achievable in the presence of loss, one can engineer states that trade away sensitivity for loss tolerance in order to still achieve some advantage over classical limits [19,20]. For example, squeezed light [21][22][23][24] and non-maximally entangled states such as Holland-Burnett states [25][26][27][28][29][30][31] can surpass classical limits despite some losses. Importantly, the precision enhancement achievable with such states can grow with $N$, even in the presence of loss [19]. Experimental demonstrations have prepared unheralded $N = 6$ [29,30] (or heralded $N = 2$ [28]) Holland-Burnett states, but further increase of $N$ is constrained by source brightness as well as detector efficiency and number resolution. This motivates developing experimental protocols that can produce and detect loss-tolerant states with larger photon numbers.
In this work, we address a number of key challenges in order to scale up quantum-enhanced interferometry using definite photon-number states of light. Firstly, we introduce novel probe states that are prepared by combining two photon-number states on a beam splitter, similarly to Holland-Burnett states. However, unlike the latter, we allow the initial photon-number states to be unequal. We show that these generalized Holland-Burnett states are more sensitive than both Holland-Burnett and N00N states in the presence of loss and approximate the performance of the optimal probe [19]. Secondly, we experimentally implement our scheme using high-gain parametric down-conversion sources [32,33] and state-of-the-art photon-number-resolving detectors [34] in order to access a large photon-number regime. We herald entangled probes of sizes up to $N = 8$ and measure up to 16-photon coincidences, thereby further increasing the scale of experimental multiphoton quantum technologies [35][36][37].
The idea is illustrated in Fig. 1(a). Two type-II parametric down-conversion (PDC) sources each produce pairs of light beams that are quantum-correlated in photon number, i.e. a two-mode squeezed vacuum state
$$|\chi\rangle = \sqrt{1-\lambda^2}\sum_{n=0}^{\infty}\lambda^n\,|n, n\rangle. \qquad (1)$$
Here, $\lambda$ is a parameter that determines the average number of photons in each beam, $\bar{n} = \lambda^2/(1-\lambda^2)$. Measuring one of the beams with an ideal lossless photon-number-resolving detector projects the second beam onto a known photon-number state $|h_1\rangle$. Duplicating this procedure with a second independent source and detector, we herald pairs of photon-number states that are not necessarily identical, i.e. the probe $|h_1, h_2\rangle$. When these states are combined on the first beam splitter, multiphoton interference generates a path-entangled probe inside the interferometer [38]. We quantify the phase sensitivity of the probe inside the interferometer by calculating the quantum Fisher information $Q$. The quantity $Q$ provides a lower limit on the best achievable phase uncertainty via the quantum Cramér-Rao bound, $\Delta\phi \geq 1/\sqrt{Q}$. The bound can be saturated using the optimal measurement strategy, which in the absence of loss is photon counting for the probes considered here [39].
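As a minimal numerical sketch of the source statistics described above, the snippet below tabulates the pair-emission probabilities of a two-mode squeezed vacuum state and checks them against the quoted mean photon number $\lambda^2/(1-\lambda^2)$. It assumes the pair probability $(1-\lambda^2)\lambda^{2n}$, which is the form consistent with that mean; the function names and the cutoff `n_max` are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def tmsv_pair_probabilities(lam, n_max=200):
    """Probability (1 - lam^2) * lam^(2n) of emitting n photon pairs, for n = 0..n_max."""
    n = np.arange(n_max + 1)
    return (1.0 - lam**2) * lam ** (2 * n)

def mean_photon_number(lam):
    """Mean photon number per beam, lam^2 / (1 - lam^2)."""
    return lam**2 / (1.0 - lam**2)

lam = 0.75  # high-gain value quoted later in the text
p = tmsv_pair_probabilities(lam)

# With an ideal lossless herald detector, p[h] is also the probability of heralding |h>.
n_bar_numeric = float(np.sum(np.arange(p.size) * p))
print(f"sum of probabilities: {p.sum():.6f}")
print(f"mean photon number:   {n_bar_numeric:.4f} (closed form {mean_photon_number(lam):.4f})")
```

The truncation at `n_max` only matters when $\lambda$ is very close to 1; for the gains used here the neglected tail is negligible.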
In Fig. 1(b), we plot $Q$ for several probes with the same total photon number $N = h_1 + h_2 = 8$, but different $\Delta = |h_1 - h_2|$, as a function of the signal transmissivity $\eta_s$, which we assume to be equal in both interferometer modes. Probes with a small $\Delta$ provide a greater advantage over the classical shot-noise limit but are more sensitive to losses. Since the probe is heralded in our scheme, one can choose the optimal $\Delta$ for a given $\eta_s$. Also shown in Fig. 1(b) is $Q$ for the optimal state that maximizes this parameter for a given $N$ and $\eta_s$. This state has been found in Ref. [19]; the derivation is reproduced in the Supplementary Materials. For the loss-free case ($\eta_s = 1$), the optimal state is the N00N state. However, for efficiencies below $\sim 90\%$, our probes significantly surpass the N00N state in terms of $Q$, exhibiting performance close to optimal. Moreover, in contrast to the N00N and Holland-Burnett states, our probe performs at least as well as the shot-noise limit for any amount of loss.
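To make the loss-free end of this comparison concrete, the sketch below evaluates $Q$ for the $N = 8$ probes at $\eta_s = 1$. It assumes the lossless closed form $Q = 2h_1h_2 + h_1 + h_2$, which follows from $Q = 4\,\mathrm{Var}(\hat{c}^\dagger\hat{c})$ for two Fock states combined on a balanced beam splitter. The N00N value $N^2$ and the shot-noise value $N$ are included for reference, and the function names are illustrative.

```python
import math

def qfi_lossless(h1, h2):
    """Lossless quantum Fisher information of the probe |h1, h2> after a balanced beam splitter."""
    return 2 * h1 * h2 + h1 + h2

def phase_uncertainty_bound(q):
    """Quantum Cramer-Rao bound on the phase uncertainty, 1 / sqrt(Q)."""
    return 1.0 / math.sqrt(q)

N = 8
print(f"shot-noise limit: Q = {N}, N00N state: Q = {N**2}")
for h2 in range(N // 2 + 1):
    h1 = N - h2
    q = qfi_lossless(h1, h2)
    print(f"|{h1},{h2}>  Delta = {h1 - h2}  Q = {q}  bound = {phase_uncertainty_bound(q):.3f}")
```

Under this assumption the $\Delta = N$ probe sits exactly at the shot-noise value and $Q$ grows as $\Delta$ shrinks, which is the lossless limit of the trend described above.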
We now turn to the experiment. Both PDC sources are periodically poled potassium titanyl phosphate (pp-KTP) waveguides pumped with ∼ 0.5 ps long pulses from a mode-locked laser at a repetition rate of 100 kHz. The four detectors are superconducting transition edge sensors which we use to count up to 10 photons with a detection efficiency exceeding 95% [34]. The interferometer is a fiber-based device in which we can control the distance between two evanescently-coupled fibers using a micrometer to vary φ, much like changing the path length difference between two arms of an interferometer. Further details on the experimental setup can be found in the Methods.
We measure interference fringes given by $\mathrm{pr}_{s_1,s_2,h_1,h_2}(\phi)$, the joint photon-number probability per pump pulse to obtain the herald outcome $(h_1, h_2)$ and measure $(s_1, s_2)$ at the output of the interferometer when the phase difference is $\phi$. We will refer to this as the $(s_1, s_2, h_1, h_2)$ rate. To quantify the phase sensitivity of the rates measured with a particular herald outcome $(h_1, h_2)$, we calculate the Fisher information:
$$F_{h_1,h_2}(\phi) = \sum_{s_1,s_2} \frac{\left[\partial_\phi\, \tilde{\mathrm{pr}}_{s_1,s_2,h_1,h_2}(\phi)\right]^2}{\tilde{\mathrm{pr}}_{s_1,s_2,h_1,h_2}(\phi)}, \qquad (2)$$
where $\partial_\phi$ denotes the partial derivative with respect to $\phi$, and $\tilde{\mathrm{pr}}_{s_1,s_2,h_1,h_2}(\phi)$ is a model fitted to the measured rates (see Supplementary Materials). Note that $F_{h_1,h_2}(\phi)$ quantifies the amount of information about $\phi$ in our measurement results, i.e. for a specific measurement strategy, and so $F_{h_1,h_2}(\phi) \leq Q$. Our primary figure of merit is the Fisher information per detected signal photon conditioned on measuring $(h_1, h_2)$ at the heralding detectors,
$$\tilde{F}_{h_1,h_2}(\phi) = \frac{F_{h_1,h_2}(\phi)}{\tilde{n}_{h_1,h_2}(\phi)}, \qquad \tilde{n}_{h_1,h_2}(\phi) = \sum_{s_1,s_2} (s_1 + s_2)\, \tilde{\mathrm{pr}}_{s_1,s_2,h_1,h_2}(\phi), \qquad (3)$$
where $\tilde{n}_{h_1,h_2}(\phi)$ is the total number of detected signal photons. Any normalization common to all rates, such as the heralding probability, cancels in this ratio. Injecting a coherent state into our interferometer would in principle yield the Fisher information $F = \tilde{n}$ when the detected mean photon number is $\tilde{n}$ [18]. Thus, our figure of merit can be easily compared to the shot-noise limit, which corresponds to $\tilde{F}_{h_1,h_2}(\phi) = 1$. We measured the total efficiency of both the heralding and signal modes to be between 47% and 55% (see Supplementary Materials). This includes $\sim 60\%$ mode coupling efficiency into fibers, 90% interferometer transmission, and 95% detector efficiency. Due to the latter two losses, the detected $\tilde{n}$ is 10-15% smaller than the mean photon number inside the interferometer. As such, the Fisher information per photon inside the interferometer (which is the relevant resource when e.g. probing a delicate sample) is about 10-15% smaller than $\tilde{F}_{h_1,h_2}(\phi)$.
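The figure of merit can be evaluated numerically from any fringe model. The sketch below is an illustration rather than the authors' analysis code: it assumes the model is available as a Python function returning a dictionary of rates indexed by $(s_1, s_2)$, approximates $\partial_\phi$ with a central finite difference, and divides the Fisher information by the mean detected photon number. The single-photon fringe used as an example is a hypothetical test case for which the result should equal the shot-noise value of 1.

```python
import numpy as np

def fisher_per_detected_photon(rates, phi, dphi=1e-5):
    """Return (F, F_tilde) at phase phi.

    rates(phi) must return a dict {(s1, s2): probability} for a fixed herald outcome.
    """
    p0 = rates(phi)
    p_plus = rates(phi + dphi)
    p_minus = rates(phi - dphi)
    fisher = 0.0
    n_detected = 0.0
    for outcome, p in p0.items():
        if p <= 0.0:
            continue  # outcomes with zero probability are skipped
        dp = (p_plus.get(outcome, 0.0) - p_minus.get(outcome, 0.0)) / (2.0 * dphi)
        fisher += dp**2 / p
        n_detected += p * sum(outcome)
    return fisher, fisher / n_detected

def single_photon_rates(phi):
    """Hypothetical lossless single-photon fringe used only as a sanity check."""
    return {(1, 0): np.cos(phi / 2) ** 2, (0, 1): np.sin(phi / 2) ** 2}

F, F_tilde = fisher_per_detected_photon(single_photon_rates, phi=1.0)
print(F, F_tilde)
```

Restricting the dictionary to outcomes with $s_1 + s_2 = h_1 + h_2$ before calling the function would correspond to a post-selected analysis.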

RESULTS
We begin with low pump power to test our setup in the weak gain regime ($\lambda \sim 0.25$, 10 µW per source). In Fig. 2, we show results for two different probes, (a) $|1, 1\rangle$, the well-studied $N = 2$ N00N or Holland-Burnett state, and (b) $|2, 1\rangle$, a probe studied here for the first time. We calculate $\tilde{F}_{h_1,h_2}(\phi)$ using two methods. In the first, we discard events in which we know photons were lost by only including rates where $s_1 + s_2 = h_1 + h_2$ in the sums of Eqs. (2) and (3). These rates are shown in the top panels of Fig. 2. Using this first method, $\tilde{F}_{h_1,h_2}(\phi)$ [green curves] surpasses the shot-noise limit by $0.09 \pm 0.01$ for $|1, 1\rangle$ and $0.10 \pm 0.04$ for $|2, 1\rangle$ at its highest point. In the second method, we include all measured events. Note that this may include events where $s_1 + s_2 < h_1 + h_2$ due to loss in the signal modes, but also $s_1 + s_2 > h_1 + h_2$ due to loss in the herald modes. Conditioned on obtaining the herald outcome $(h_1, h_2)$, the probability of the latter occurring can be minimized by reducing the pump power and hence $\lambda$. This increases the purity of the probe at the cost of reducing its heralding rate. Without post-selection, $\tilde{F}(\phi)$ [red curves] drops below the shot-noise limit, mainly due to losses.
In addition to loss, the spectral purity and distinguishability of our photons are also sources of imperfection that reduce the contrast of the fringes and hence diminish $\tilde{F}_{h_1,h_2}(\phi)$ [40]. Consider the probe $|1, 1\rangle$, for example. For $\phi = \pm\pi/2$, the whole interferometer acts as a balanced beam splitter, in which case Hong-Ou-Mandel interference should lead to a complete suppression in coincidences at its output. However, as can be seen in the orange (1, 1, 1, 1) fit in Fig. 2(a), the visibility of this interference effect is $\sim 75\%$. This visibility exceeds $\sqrt{0.5}$, which is the minimum required for demonstrating post-selected quantum-enhanced sensitivity with the probe $|1, 1\rangle$ [12,28,41]. In addition to spectral and polarization mismatch between the signal modes, the visibility is degraded by uncorrelated background photons ($\sim 5\%$ of detected photons) and the slight multi-mode nature of our sources, both of which reduce the purity of our heralded photons. The finite detector energy resolution also plays a small role, as the detectors have a $\sim 1\%$ chance to mislabel an event by $\pm 1$ photon [42].
Next, we increase the pump power to reach a high-gain regime ($\lambda \sim 0.75$, 135 µW per source) in which we can herald large photon numbers. We detect 16-photon events at a rate of roughly 7 per second, which is much higher than the state-of-the-art achievable with bulk crystal PDC sources [36] or quantum dots [37]. In Fig. 3(a), we plot $\tilde{F}_{h_1,h_2}(\phi)$ calculated without post-selection for all probes with $N = 8$. As expected given the amount of loss in our experiment, probes with larger $\Delta$ are more phase sensitive due to their increased robustness to loss [Fig. 1(b)]. In particular, the sensitivity of the $\Delta = N$ probe should be shot-noise limited regardless of losses [43]. However, in practice, the heralded detection of 0 photons could occur due to photon loss in the corresponding herald mode, resulting in the contamination of the signal with states for which $\Delta \neq N$. This degrades the performance of the $\Delta = 8$ probe [orange curve]. In the Supplementary Materials, we show that shot-noise limited performance with the $\Delta = N$ probe is recovered by blocking one of the sources.

DISCUSSION
The fringes produced by our probes exhibit a number of different features compared to those measured with N00N or Holland-Burnett states. For example, with these two states, the expected signature of $N$-photon interference is a fringe that oscillates as $\cos(N\phi)$. While our measured fringes do not exhibit such oscillations in the high-gain regime, they do exhibit sharper features than classical fringes. We show this explicitly by comparing our rates to those measured with distinguishable photons. This is achieved by temporally delaying photons coming from the top source with respect to photons coming from the bottom source by more than their coherence time. As an example, we consider the probe $|3, 2\rangle$ in Fig. 4. When the photons are injected inside the interferometer at the same time, the fringe contrast is significantly higher than when they are temporally delayed [Fig. 4(a)]. Likewise, when we calculate $\tilde{F}_{3,2}(\phi)$ without post-selection, we find an improvement in the probe's sensitivity in the former case [Fig. 4(b)]. This demonstrates that the probe sensitivity derives from multiphoton interference even at high photon numbers.
With any finite amount of loss, $\tilde{F}_{h_1,h_2}(\phi)$ vanishes when all fringes share a common turning point such as at $\phi = 0$. In the case of Holland-Burnett ($\Delta = 0$) and N00N states, there are also common turning points at $\phi = \pm\pi/2$ which cause the reduction in $\tilde{F}_{h_1,h_2}(\phi)$ around these phase values [Fig. 3(c)]. In contrast, the probes with $\Delta = 4, 6, 8$ do not have a dip in $\tilde{F}_{h_1,h_2}(\pm\pi/2)$. The origin of this effect for $\Delta = 8$ can be seen directly in the rates shown in Fig. 3(b). The region of the fringe with high sensitivity to $\phi$ (i.e. large gradient) is different for different values of $|s_1 - s_2|$. This feature of $\tilde{F}_{h_1,h_2}(\phi)$ allows the estimation of $\phi$ without prior knowledge of the range in which it lies, as is required for N00N or Holland-Burnett states, and thus provides a means for global phase estimation without using an adaptive protocol [27,44].

Fig. 4. Testing multiphoton interference. Benefits of multiphoton interference using the probe $|3, 2\rangle$. (a) Two sets of rates [blue: (5, 0, 3, 2), orange: (3, 2, 3, 2)] measured when the photons are injected inside the interferometer at the same time (data: circles, theory: bold lines) or at different times (data: crosses, theory: dashed line). In the latter case, the photons are well modelled by classical distinguishable particles. Error bars are one standard deviation assuming Poissonian counting statistics. (b) $\tilde{F}_{3,2}(\phi)$ shows a significant improvement in sensitivity in the former case (bold line) compared to the latter case (dashed line), demonstrating that multiphoton interference improved the sensitivity of our probe. Red shaded regions show 1σ confidence intervals obtained by fitting 50 simulated data sets that are calculated with a Monte Carlo method.
Finally, we briefly compare our results to other works reporting Fisher information per detected photon. The highest achieved here is $\sim 1.1$ using the herald outcome (2, 1), i.e. an $N = 3$ probe. Ref. [31] and Ref. [17] respectively report $\sim 1.25$ and $\sim 1.2$ using an $N = 2$ probe. The latter work also achieves a Fisher information per photon inside the interferometer (i.e. accounting for undetected photons) of $\sim 1.15$, which thus far is the only experiment demonstrating an unconditional improvement over the shot-noise limit. In the Supplementary Materials, we estimate that an efficiency of 80% (in all four modes) and a quantum interference visibility of 85% would be sufficient to demonstrate an improvement over the shot-noise limit with $N = 8$ photons without post-selection. Although we do not attain these parameters in our experiment, our results do demonstrate the robustness of our probes to losses despite their large size. For example, the Fisher information per photon calculated without post-selection for the $N = 8$ probe with $\Delta = 6$ [Fig. 3(a)] is slightly higher than that of the $N = 2$ N00N state [Fig. 2(a)]. This contradicts the usual expectation that large entangled probes will necessarily be more fragile to noise and loss.
In summary, we proposed and experimentally demonstrated a scheme for quantum-enhanced interferometry that exploits bright two-mode squeezed vacuum sources and photon-number-resolving detectors. We measured interference fringes involving up to 16 photons, which is significantly higher than the previous state-of-the-art [35,36]. Crucially, our scheme prepares probes that are nearly optimally robust to losses and hence addresses one of the principal challenges when scaling up to large entangled photonic states. With further improvements in the quality (e.g. coupling efficiency into optical fiber and purity) of bright two-mode squeezed vacuum sources compatible with transition edge sensors [33,45], we believe our loss-tolerant scheme provides a promising route towards achieving quantum-enhanced resolution using large entangled photonic states.

Sources
We pick 150-fs pulses from a mode-locked Ti:Sapphire laser (Coherent Mira-HP) at a rate of 100 kHz using a Pockels-cell-based pulse picker having a 50 dB extinction ratio. This repetition rate is chosen to accommodate the recovery time of the transition edge sensor detectors. The pump pulses are filtered to 783 ± 2 nm [full-width at half maximum] using a pair of angle-tuned bandpass filters. We split the pulses into two paths that are matched in length using a translation stage. In each path, we pump an 8 mm long ppKTP waveguide that is phase-matched for type-II parametric down-conversion. At the exit of the waveguide, the pump light is rejected with a longpass filter, and the orthogonally-polarized down-converted modes are separated using a polarizing beam splitter. Each down-converted mode is filtered with a bandpass filter whose bandwidth is chosen to transmit the main feature of the down-converted spectrum but reject its side-lobes. The herald modes (1566 ± 7 nm) are coupled into single-mode fibers and sent directly to the detectors. The signal modes (1567 ± 7 nm) are coupled into polarization-maintaining single-mode fibers and sent into the interferometer.

Interferometer
The interferometer is a fiber-based variable beam splitter (Newport F-CPL-1550-P-FP). The splitting ratio is adjusted by controlling the distance between two evanescently-coupled fibers using a micrometer, which is analogous to changing the path length difference between two arms of an interferometer. In fact, any variable beam splitter that coherently splits light into two modes can be described by the same transformation as a Mach-Zehnder-type interferometer [46].
At low powers, we find that the quantity $T(x)$ typically varies within [0.02, 0.98]. To obtain the corresponding phase, we rescale $T(x)$ to correct for the imperfect visibility, such that $T_{\mathrm{corr}}(x)$ varies between [0, 1]. For a single photon injected into a Mach-Zehnder-type interferometer with phase difference $\phi$ between its two arms, one expects $T_{\mathrm{corr}}(x) = [1 - \cos(\phi)]/2$. Solving for $\phi$, we find:
$$\phi(x) = \arccos\left[1 - 2\,T_{\mathrm{corr}}(x)\right].$$
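The calibration from transmission to phase can be sketched in a few lines. The snippet below inverts $T_{\mathrm{corr}} = [1 - \cos\phi]/2$ to obtain $\phi$. The min-max rescaling used to build $T_{\mathrm{corr}}$ from $T$ is an assumption made for illustration (the exact visibility correction is not reproduced here), and the function name and arguments are hypothetical.

```python
import numpy as np

def phase_from_transmission(T, T_min=None, T_max=None):
    """Convert measured transmission values T(x) into a phase in [0, pi].

    Assumption: the visibility correction is a min-max rescaling of T onto [0, 1].
    """
    T = np.asarray(T, dtype=float)
    T_min = T.min() if T_min is None else T_min
    T_max = T.max() if T_max is None else T_max
    T_corr = (T - T_min) / (T_max - T_min)
    T_corr = np.clip(T_corr, 0.0, 1.0)  # guard against noise pushing values outside [0, 1]
    return np.arccos(1.0 - 2.0 * T_corr)

# Synthetic example: a fringe with imperfect visibility spanning [0.02, 0.98]
phi_true = np.linspace(0.0, np.pi, 11)
T_measured = 0.02 + 0.96 * (1.0 - np.cos(phi_true)) / 2.0
phi_recovered = phase_from_transmission(T_measured)
```

Because `arccos` returns values in $[0, \pi]$, the sign of $\phi$ has to be inferred separately, for example from the direction of travel of the micrometer.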

Detectors
Our detectors are superconducting transition edge sensors that operate at a temperature of 85 mK inside a dilution refrigerator. Details on their physical operation can be found in Ref. [34]. An electrical trigger signal from the pump laser begins a 6 µs time window of data acquisition during which the detector outputs are amplified and recorded with an analogue-to-digital converter. We use a matched-filter technique in real time to convert each detector's output trace into a scalar value [47]. The scalar value is then converted into a photon number using bins that are set during an initial calibration run prior to data acquisition.
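The final binning step can be illustrated with a short sketch. It assumes the calibration run yields a sorted list of bin edges separating the matched-filter outputs for 0, 1, 2, ... photons; the edge values and function name below are hypothetical placeholders, not calibration data from the experiment.

```python
import numpy as np

def assign_photon_numbers(scalars, bin_edges):
    """Map matched-filter scalar outputs to photon numbers.

    bin_edges[k] is the (hypothetical) threshold between k and k + 1 photons,
    so values below bin_edges[0] are labelled as 0 photons.
    """
    return np.digitize(np.asarray(scalars, dtype=float), bin_edges)

# Hypothetical calibration: thresholds halfway between unit-spaced peaks
edges = np.array([0.5, 1.5, 2.5])
photon_numbers = assign_photon_numbers([0.1, 1.0, 2.1, 3.7], edges)
```

Values above the last edge are all assigned the largest label, which mirrors the finite number of photons a detector can resolve.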

COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
The data sets generated and/or analyzed during this study are available from the corresponding author on reasonable request. Correspondence and requests for materials should be addressed to G.S.T.

Derivation of the Quantum Fisher information
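We first consider the ideal lossless case. In general, the quantum Fisher information of a pure state $|\Psi(\phi)\rangle$ that depends on some parameter $\phi$ is given by [48]:
$$Q = 4\left[\langle\partial_\phi\Psi(\phi)|\partial_\phi\Psi(\phi)\rangle - \left|\langle\Psi(\phi)|\partial_\phi\Psi(\phi)\rangle\right|^2\right]. \qquad (S7)$$
In our case, $|\Psi\rangle$ is the two-mode state inside the interferometer (before the phase shift) and $|\Psi(\phi)\rangle = e^{i\hat{c}^\dagger\hat{c}\phi}|\Psi\rangle$ is the state after the phase shift $\phi$ is applied in the upper interferometer mode $c$. After some simple algebra, one finds that $Q$ is independent of $\phi$ and is determined by:
$$Q = 4\left[\langle\Psi|(\hat{c}^\dagger\hat{c})^2|\Psi\rangle - \langle\Psi|\hat{c}^\dagger\hat{c}|\Psi\rangle^2\right]. \qquad (S8)$$
We wish to calculate $Q$ for the particular probe $|\Psi\rangle = \hat{U}_{BS}|h_1, h_2\rangle$, where $\hat{U}_{BS}$ is the balanced beam splitter unitary transformation. The second term in Eq. S8 is given by:
$$\begin{aligned}
\langle\Psi|\hat{c}^\dagger\hat{c}|\Psi\rangle &= \langle h_1, h_2|\hat{U}_{BS}^\dagger\,\hat{c}^\dagger\hat{c}\,\hat{U}_{BS}|h_1, h_2\rangle &\quad (S9)\\
&= \langle h_1, h_2|(\hat{U}_{BS}^\dagger\hat{c}^\dagger\hat{U}_{BS})(\hat{U}_{BS}^\dagger\hat{c}\,\hat{U}_{BS})|h_1, h_2\rangle &\quad (S10)\\
&= \tfrac{1}{2}\langle h_1, h_2|(\hat{a}^\dagger + \hat{b}^\dagger)(\hat{a} + \hat{b})|h_1, h_2\rangle &\quad (S11)\\
&= \tfrac{1}{2}(h_1 + h_2), &\quad (S12)
\end{aligned}$$
where in line S10 we used the fact that $\hat{U}_{BS}$ is unitary and in line S11 we transformed mode $c$ to the input modes $a$ and $b$, i.e. $\hat{U}_{BS}^\dagger\hat{c}\,\hat{U}_{BS} = (\hat{a} + \hat{b})/\sqrt{2}$ (up to a phase convention that does not affect $Q$). The first term in Eq. S8 is calculated in a similar manner. Keeping only the operator products that have a non-zero expectation value in a photon-number state, one finds:
$$\begin{aligned}
\langle\Psi|(\hat{c}^\dagger\hat{c})^2|\Psi\rangle &= \tfrac{1}{4}\langle h_1, h_2|\left[(\hat{a}^\dagger + \hat{b}^\dagger)(\hat{a} + \hat{b})\right]^2|h_1, h_2\rangle\\
&= \tfrac{1}{4}\langle h_1, h_2|(\hat{a}^\dagger\hat{a} + \hat{b}^\dagger\hat{b})^2 + \hat{a}^\dagger\hat{a}\,\hat{b}\hat{b}^\dagger + \hat{a}\hat{a}^\dagger\,\hat{b}^\dagger\hat{b}|h_1, h_2\rangle\\
&= \tfrac{1}{4}\left[(h_1 + h_2)^2 + 2h_1h_2 + h_1 + h_2\right]. &\quad (S13\text{-}S19)
\end{aligned}$$
Therefore $Q$ is given by
$$Q = 2h_1h_2 + h_1 + h_2. \qquad (S20)$$
Eq. S20 only applies when there are no losses in the system. In the presence of losses, the probe $|\Psi\rangle$ is transformed to a mixed state $\hat{\rho}$. Then, $Q$ is calculated using
$$Q = \mathrm{Tr}\left\{\hat{\rho}(\phi)\,\hat{\Lambda}[\hat{\rho}(\phi)]^2\right\}, \qquad (S21)$$
where $\hat{\rho}(\phi) = e^{-i\hat{c}^\dagger\hat{c}\phi}\,\hat{\rho}\,e^{i\hat{c}^\dagger\hat{c}\phi}$ is the probe state after the phase shift $\phi$ and $\hat{\Lambda}[\hat{\rho}(\phi)]$ is a Hermitian operator called the "symmetric logarithmic derivative", defined implicitly via
$$\partial_\phi\hat{\rho}(\phi) = \tfrac{1}{2}\left\{\hat{\Lambda}[\hat{\rho}(\phi)]\,\hat{\rho}(\phi) + \hat{\rho}(\phi)\,\hat{\Lambda}[\hat{\rho}(\phi)]\right\}. \qquad (S22)$$
We notice that by combining Eq. S22 with Eq. S21 we obtain an alternative equation for the QFI:
$$Q = \mathrm{Tr}\left\{[\partial_\phi\hat{\rho}(\phi)]\,\hat{\Lambda}[\hat{\rho}(\phi)]\right\}. \qquad (S23)$$
By writing $\hat{\rho}$ in its eigenbasis, $\hat{\rho} = \sum_i p_i|e_i\rangle\langle e_i|$, and writing out the derivative $\partial_\phi\hat{\rho}(\phi) = i e^{-i\hat{c}^\dagger\hat{c}\phi}[\hat{\rho}, \hat{c}^\dagger\hat{c}]e^{i\hat{c}^\dagger\hat{c}\phi}$, it can be shown that $Q$ is given by [1]
$$Q = 2\sum_{i,j}\frac{(p_i - p_j)^2}{p_i + p_j}\left|\langle e_i|\hat{c}^\dagger\hat{c}|e_j\rangle\right|^2, \qquad (S24)$$
which is independent of $\phi$. The sum is taken over all terms with a non-vanishing denominator.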
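The eigenbasis expression for the quantum Fisher information of a mixed state lends itself to a direct numerical evaluation. The sketch below is a minimal illustration, assuming the standard form $Q = 2\sum_{i,j}(p_i - p_j)^2/(p_i + p_j)\,|\langle e_i|\hat{G}|e_j\rangle|^2$ for a phase generated by a Hermitian operator $\hat{G}$ (here the photon number in one mode). States are represented in the $N$-photon basis $|n, N-n\rangle$; the function names and the tolerance `tol` are illustrative choices.

```python
import numpy as np

def qfi_mixed(rho, generator, tol=1e-12):
    """Quantum Fisher information of rho for the phase generated by `generator`."""
    p, vecs = np.linalg.eigh(rho)               # eigenvalues p_i and eigenvectors |e_i>
    g = vecs.conj().T @ generator @ vecs        # matrix elements <e_i|G|e_j>
    q = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            denom = p[i] + p[j]
            if denom > tol:                     # skip terms with vanishing denominator
                q += 2.0 * (p[i] - p[j]) ** 2 / denom * abs(g[i, j]) ** 2
    return q

def noon_density_matrix(N):
    """Density matrix of (|N,0> + |0,N>)/sqrt(2) in the basis |n, N-n>, n = 0..N."""
    psi = np.zeros(N + 1)
    psi[0] = psi[N] = 1.0 / np.sqrt(2.0)
    return np.outer(psi, psi)

N = 4
number_operator = np.diag(np.arange(N + 1, dtype=float))  # photon number in the phase-shifted mode
q_noon = qfi_mixed(noon_density_matrix(N), number_operator)
q_mixed = qfi_mixed(np.eye(N + 1) / (N + 1), number_operator)
```

For a pure state the expression reduces to four times the variance of the generator, so the N00N example should return $N^2$, while the maximally mixed state carries no phase information.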

Optimal states
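In the main text we compare the performance of our probes to the "optimal states" which provide the largest possible quantum Fisher information given some amount of loss [19,49]. A general $N$-photon pure state inside the interferometer can be written in the Fock basis as
$$|\Psi\rangle = \sum_{n=0}^{N}\alpha_n|n, N-n\rangle. \qquad (S25)$$
In the absence of loss, the optimal state is found by optimizing the coefficients $\{\alpha_n\}$ to maximize the quantum Fisher information. In the presence of loss, $|\Psi\rangle$ turns into a mixture $\hat{\rho}$ which can be written in the following form:
$$\hat{\rho} = \sum_j p_j|\Psi_j\rangle\langle\Psi_j|, \qquad (S26)$$
where the $|\Psi_j\rangle$ do not have to be orthogonal. Due to the convexity of quantum Fisher information, $Q$ of $\hat{\rho}$ is upper bounded by
$$Q \leq \tilde{Q} = \sum_j p_j\,Q(|\Psi_j\rangle). \qquad (S27)$$
The bound is attained if the kets $|\Psi_j\rangle$ are orthogonal, which is the case for e.g. N00N states or if photon losses are present in only one interferometer mode. Applying Eq. S27 to Eq. S25, we obtain
$$\tilde{Q} = 4\left[\sum_{n=0}^{N}n^2x_n - \sum_{l=0}^{N}\sum_{m=0}^{N-l}\frac{\left(\sum_{n=l}^{N-m}n\,x_n B^n_{lm}\right)^2}{\sum_{n=l}^{N-m}x_n B^n_{lm}}\right], \qquad (S28)$$
where $x_n = |\alpha_n|^2$, $B^n_{lm} = \binom{n}{l}\binom{N-n}{m}\,\eta_{s1}^{n}(\eta_{s1}^{-1} - 1)^{l}\,\eta_{s2}^{N-n}(\eta_{s2}^{-1} - 1)^{m}$ and $\eta_{s1}$, $\eta_{s2}$ denote the transmittances in the signal modes.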
The optimal states are found by numerically maximizing $\tilde{Q}$ over the probabilities $\{x_n\}$. Since $\tilde{Q}$ is a concave function of $\{x_n\}$ [49], any maximum is global. Although $Q < \tilde{Q}$ [Eq. S27] when losses are present in both modes, the difference between the two quantities is small relative to the difference between the shot-noise limit and the Heisenberg limit [49]. Due to this approximation, the optimized $\tilde{Q}$ is a slight over-estimate of the true quantum Fisher information $Q$ of the optimal states. Fig. 1(b) in the main text shows $Q$ of our probes and $\tilde{Q}$ of the optimal state as a function of equal transmissivity in the signal modes, $\eta_{s1} = \eta_{s2} = \eta_s$, which was varied from 0 to 1 in steps of 0.01. The optimal state was calculated in Mathematica by maximizing over the coefficients $\{x_n\}$ in Eq. (S28), assuming they sum to 1 and are real and positive. We computed $Q$ of our probes in Python using the following method. We started with two copies of the state in Eq. (1) in the main text, inserted a beam splitter in each signal mode, and traced over the reflected port to model signal transmissivities $\eta_s$. The two states were then combined on the first interferometer beam splitter to form a four-mode density matrix, which was then reduced to two modes by projectively measuring the two herald modes. Eigenvalues and eigenvectors were found for the two-mode density matrix inside the interferometer, which were then used to calculate $Q$ via Eq. (S24). To compare the ideal performance of our probes with the optimal state, we exclude the effect of imperfect heralding on the former by using $\eta_{h1} = \eta_{h2} = 1$ in the calculation of $Q$.
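As an illustration of how the upper bound on the quantum Fisher information can be evaluated for a given set of probabilities $\{x_n\}$, the sketch below implements the bound of Ref. [19] in the form $4[\sum_n n^2 x_n - \sum_{l,m}(\sum_n n\,x_n B^n_{lm})^2/\sum_n x_n B^n_{lm}]$, with $B^n_{lm}$ written as a product of two binomial loss probabilities. This is an assumed reading of the bound rather than the authors' Mathematica code, it only evaluates the bound (no maximization is performed), and the function names are illustrative.

```python
from math import comb

def loss_weight(n, l, m, N, eta1, eta2):
    """B^n_{lm}: probability of losing l of n photons in mode 1 and m of N - n photons in mode 2."""
    return (comb(n, l) * eta1 ** (n - l) * (1 - eta1) ** l
            * comb(N - n, m) * eta2 ** (N - n - m) * (1 - eta2) ** m)

def qfi_upper_bound(x, eta1, eta2):
    """Upper bound on Q for the state sum_n sqrt(x[n]) |n, N-n> with transmissivities eta1, eta2."""
    N = len(x) - 1
    total = sum(n * n * x[n] for n in range(N + 1))
    for l in range(N + 1):
        for m in range(N - l + 1):
            num = 0.0
            den = 0.0
            for n in range(l, N - m + 1):
                w = x[n] * loss_weight(n, l, m, N, eta1, eta2)
                num += n * w
                den += w
            if den > 0.0:  # skip loss patterns that cannot occur for this state
                total -= num**2 / den
    return 4.0 * total

# Example: N = 8 N00N state, x_0 = x_N = 1/2
N = 8
x_noon = [0.5] + [0.0] * (N - 1) + [0.5]
q_noon_lossy = qfi_upper_bound(x_noon, 0.9, 0.9)
```

For a N00N state with equal transmissivity $\eta$ in both modes, the bound reduces to $N^2\eta^N$, which provides a convenient check before handing the function to a numerical optimizer.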
From the calculations described above we show that for $\eta_s \in (0, 0.5]$ the best approximation to the optimal state is given by the probe with $\Delta = 8$; for $\eta_s \in (0.5, 0.58]$ by the probe with $\Delta = 6$; for $\eta_s \in (0.58, 0.66]$ by the probe with $\Delta = 4$; for $\eta_s \in (0.66, 0.69]$ by the probe with $\Delta = 2$; and for $\eta_s > 0.69$ by the probe with $\Delta = 0$, as shown by the colored line in Fig. 1(b).

Modelling measured rates
Here we describe the model $\tilde{\mathrm{pr}}(s_1, s_2, h_1, h_2, \phi)$ used to fit the experimentally measured rates. We model optical loss by placing fictitious beam splitters (see Fig. S5) and tracing over the reflected modes. For now, we assume $\eta_{d1} = \eta_{d2} = 1$. We will treat the effect of these detection losses at the end.
The sources produce two-mode squeezed vacuum states:
$$|\chi_i\rangle = \sqrt{1-\lambda_i^2}\sum_{n=0}^{\infty}\lambda_i^n|n, n\rangle, \qquad (S29)$$
where $i = 1, 2$ represent sources 1 and 2, respectively. The joint photon-number distribution of this two-mode squeezed vacuum state after the losses is given by:
$$\tilde{\mathrm{pr}}_i(x, y) = \sum_{n=\max(x,y)}^{\infty}(1-\lambda_i^2)\,\lambda_i^{2n}\binom{n}{x}\eta_{hi}^{x}(1-\eta_{hi})^{n-x}\binom{n}{y}\eta_{si}^{y}(1-\eta_{si})^{n-y}. \qquad (S30)$$
The intuition for the expression above is as follows. Imagine that there are two detectors after the fictitious beam splitters that give the detection outcome $(x, y)$. The source must have produced at least $\max(x, y)$ photon pairs and perhaps some photons were lost (i.e. reflected at the beam splitters). The probability to produce $n$ pairs of photons is $(1-\lambda_i^2)\lambda_i^{2n}$. Having produced $n$ pairs, the probability to reflect $n - x$ [$n - y$] photons and transmit $x$ [$y$] photons in the herald [signal] mode is $\binom{n}{x}\eta_{hi}^{x}(1-\eta_{hi})^{n-x}$ [$\binom{n}{y}\eta_{si}^{y}(1-\eta_{si})^{n-y}$]. In principle, $n$ can range up to $\infty$, but in practice it suffices to truncate this sum at some value where $(1-\lambda_i^2)\lambda_i^{2n}$ becomes small. In our numerics, we truncate the sum at $n = 50$.
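The joint distribution described above translates directly into a short function. The sketch below is an illustration and not the authors' code: it assumes the pair-emission probability $(1-\lambda^2)\lambda^{2n}$ (the form consistent with the mean photon number $\lambda^2/(1-\lambda^2)$ quoted in the main text), binomial loss in each mode, and a truncation of the sum at $n = 50$ as stated in the text. The function and argument names are illustrative.

```python
from math import comb

def joint_distribution(x, y, lam, eta_h, eta_s, n_max=50):
    """Probability of detecting x herald and y signal photons from one lossy TMSV source."""
    total = 0.0
    for n in range(max(x, y), n_max + 1):
        p_pairs = (1 - lam**2) * lam ** (2 * n)                       # n pairs emitted
        p_herald = comb(n, x) * eta_h**x * (1 - eta_h) ** (n - x)     # x of n herald photons survive
        p_signal = comb(n, y) * eta_s**y * (1 - eta_s) ** (n - y)     # y of n signal photons survive
        total += p_pairs * p_herald * p_signal
    return total

# Example: normalization check at low gain with lossy modes
norm = sum(joint_distribution(x, y, 0.25, 0.5, 0.6) for x in range(51) for y in range(51))
```

In the lossless limit the distribution collapses onto the diagonal $x = y$, which reflects the perfect photon-number correlation between the herald and signal beams.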
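If we obtain the herald outcome $(h_1, h_2)$, then the (unnormalized) state that is injected into the interferometer is given by:
$$\hat{\rho} = \sum_{m,n=0}^{\infty}\tilde{\mathrm{pr}}_1(h_1, m)\,\tilde{\mathrm{pr}}_2(h_2, n)\,|m, n\rangle\langle m, n|. \qquad (S31)$$
Losses occurring inside the interferometer can be absorbed into $\eta_s$ or $\eta_d$ if they are equal in both interferometer modes, which was approximately the case in our experiment. Thus, the interferometer transformation can be described by a unitary operator $\hat{U}(\phi)$ which depends on the phase difference $\phi$ between both arms. The probability that we wish to calculate is given by:
$$\tilde{\mathrm{pr}}(s_1, s_2, h_1, h_2, \phi) = \langle s_1, s_2|\hat{U}(\phi)\,\hat{\rho}\,\hat{U}^\dagger(\phi)|s_1, s_2\rangle. \qquad (S32)$$
Knowing that there are a total of $s_1 + s_2$ photons before the interferometer, we can constrain $n = s_1 + s_2 - m$ and truncate the sum at $s_1 + s_2$ in Eq. S31. Thus, we obtain:
$$\tilde{\mathrm{pr}}(s_1, s_2, h_1, h_2, \phi) = \sum_{m=0}^{s_1+s_2}\tilde{\mathrm{pr}}_1(h_1, m)\,\tilde{\mathrm{pr}}_2(h_2, s_1 + s_2 - m)\left|\langle s_1, s_2|\hat{U}(\phi)|m, s_1 + s_2 - m\rangle\right|^2. \qquad (S33)$$
The matrix element $\left|\langle s_1, s_2|\hat{U}(\phi)|m, s_1 + s_2 - m\rangle\right|^2$ is derived in Ref. [50]. Alternatively, the matrix element can also be evaluated using Kravchuk polynomials [38].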
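The interferometer matrix elements can also be obtained numerically, without a closed-form expression. The sketch below is one possible approach and not necessarily the one used by the authors: it assumes $\hat{U}(\phi) = \exp[(\phi/2)(\hat{a}^\dagger\hat{b} - \hat{b}^\dagger\hat{a})]$, a convention chosen so that a single photon crosses the interferometer with probability $[1 - \cos\phi]/2$, and diagonalizes the generator within the subspace of fixed total photon number. The weights `w1` and `w2` stand in for the source distributions at fixed herald outcomes and are hypothetical inputs.

```python
import numpy as np

def interferometer_unitary(N, phi):
    """U(phi) restricted to the basis |m, N - m>, m = 0..N (m photons in the first mode)."""
    K = np.zeros((N + 1, N + 1))
    for m in range(N):
        c = np.sqrt((m + 1) * (N - m))
        K[m + 1, m] = c     # a^dagger b: moves one photon into the first mode
        K[m, m + 1] = -c    # -b^dagger a: moves one photon out of the first mode
    w, V = np.linalg.eigh(1j * K)  # 1j * K is Hermitian because K is real antisymmetric
    return V @ np.diag(np.exp(-1j * (phi / 2) * w)) @ V.conj().T  # equals exp((phi / 2) K)

def output_probability(s1, s2, m, phi):
    """|<s1, s2| U(phi) |m, s1 + s2 - m>|^2."""
    U = interferometer_unitary(s1 + s2, phi)
    return abs(U[s1, m]) ** 2

def heralded_rate(s1, s2, phi, w1, w2):
    """Mixture over input photon numbers: sum_m w1[m] * w2[N - m] * |<s1, s2|U|m, N - m>|^2."""
    N = s1 + s2
    U = interferometer_unitary(N, phi)
    return sum(w1[m] * w2[N - m] * abs(U[s1, m]) ** 2 for m in range(N + 1))

# Example: Hong-Ou-Mandel suppression of the (1, 1) output for the input |1, 1> at phi = pi / 2
hom = output_probability(1, 1, 1, np.pi / 2)
```

With this convention the interferometer acts as the identity at $\phi = 0$ and as a balanced beam splitter at $\phi = \pm\pi/2$, matching the behaviour described in the main text.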
The model for temporally distinguishable photons follows the same approach as above. While the description below focuses on temporal distinguishability, the same equations are valid to describe distinguishability in any other degree of freedom. We adopt a heuristic approach (e.g. as in Ref. [40]) in which the temporal mode of the photons produced in the top source is decomposed into a component completely indistinguishable ($\parallel$) from the temporal mode of the bottom source photons as well as a component completely distinguishable ($\perp$). With this decomposition, Eq. S31 becomes: where $M \in [0, 1]$ is a mode overlap parameter characterizing the distinguishability of the photons. For $M = 0$ ($M = 1$), the photons from top and bottom sources are completely distinguishable (indistinguishable). Since our detectors cannot resolve the time difference between $\perp$ and $\parallel$, they convolve the probabilities for the photons to have originated from either temporal mode. This measurement is described by the following incoherent sum of projectors: Many of the terms in the sum of Eq. S36 can be eliminated due to constraints on the photon numbers. For example, a total of $m - l$ photons are produced in mode $\perp$ and so $x + y = m - l$. Moreover, $s_1 + s_2 = m + n$. After applying these constraints, the final joint probability is given by: Finally, we can now consider the effect of the losses just before the detectors. These losses can be modelled with a transformation analogous to Eq. (S30). Applying this transformation on Eq. (S33), we obtain: The same method is used for the distinguishable photons model, i.e. replace $\tilde{\mathrm{pr}}(j, k, h_1, h_2, \phi)$ with $\tilde{\mathrm{pr}}_{\mathrm{dist}}(j, k, h_1, h_2, \phi)$ in Eq. (S38). In our numerics, we truncate the sums in Eq. (S38) to only include the effect of losing a few photons, which is a good approximation given the high efficiency of our number-resolving detectors.
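The detection-loss step can be sketched as a binomial transformation applied to a two-mode photon-number distribution. The snippet below is an assumed form of that transformation, chosen by analogy with the source-loss model: each of the $j$ ($k$) photons arriving at detector 1 (2) is registered independently with probability $\eta_{d1}$ ($\eta_{d2}$). Unlike the truncated sums mentioned in the text, this illustration sums over all possible numbers of lost photons; the function name and array layout are illustrative.

```python
import numpy as np
from math import comb

def apply_detector_loss(p, eta_d1, eta_d2):
    """Map p[j, k] (photons arriving at the detectors) to the distribution of detected (s1, s2)."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    n_rows, n_cols = p.shape
    for j in range(n_rows):
        for k in range(n_cols):
            if p[j, k] == 0.0:
                continue
            for s1 in range(j + 1):
                b1 = comb(j, s1) * eta_d1**s1 * (1 - eta_d1) ** (j - s1)      # s1 of j detected
                for s2 in range(k + 1):
                    b2 = comb(k, s2) * eta_d2**s2 * (1 - eta_d2) ** (k - s2)  # s2 of k detected
                    out[s1, s2] += b1 * b2 * p[j, k]
    return out

# Example: exactly (2, 1) photons arrive at detectors with 90% and 95% efficiency
p_in = np.zeros((4, 4))
p_in[2, 1] = 1.0
p_det = apply_detector_loss(p_in, 0.90, 0.95)
```

Because the transformation only redistributes probability towards lower photon numbers, the total probability is conserved and ideal detectors leave the distribution unchanged.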
The equations above are evaluated numerically and fitted to the experimentally measured $\mathrm{pr}(s_1, s_2, h_1, h_2, \phi)$ by varying the fit parameters $\eta_{h1}, \eta_{h2}, \eta_{s1}, \eta_{s2}, \eta_{d1}, \eta_{d2}, \lambda_1, \lambda_2$. Fitting is performed using the Python package lmfit with a least squares method. Note that, for the sake of increasing the speed of the fitting, we used $M = 1$ for all data except in Fig. 4 where we used $M = 0$. Thus, the fit parameters generally did not correspond to the measured efficiencies and squeezing parameters (see below). Instead, the fitting procedure converged on larger $\lambda$ values and smaller $\eta$ values to emulate the effect of imperfect interference (i.e. reduced fringe visibility). We tested the full model (i.e. including $M$) by fitting a subset of rates measured in the high-gain regime and found the fit parameters: $\eta_{h1} = 0.50$, $\eta_{h2} = 0.50$, $\eta_{s1} = 0.61$, $\eta_{s2} = 0.50$, $\eta_{d1} = 0.9$, $\eta_{d2} = 0.99$, $\lambda_1 = 0.68$, $\lambda_2 = 0.68$, and $M = 0.73$. These efficiency values are within error of the measured values (see below), and $M = 0.73$ is roughly consistent with the measured $\sim 75\%$ quantum interference visibility of the (1, 1, 1, 1) rate.

Estimating efficiencies
We characterize the efficiency of our setup using a Klyshko-like method that is generalized to photon-number-resolved detection [51]. We set the variable beam splitter to maximize reflection and measure the joint photon-number distribution $\mathrm{pr}_i(x, y)$ while pumping one source at a time. We fit the measured $\mathrm{pr}_i(x, y)$ to $\tilde{\mathrm{pr}}_i(x, y)$ [see Eq. S30] using three parameters: the PDC gain $\lambda_i$ and the total efficiency of the herald mode ($\eta_{hi}$) and the signal mode ($\eta_{si}$). Note that the latter will also include the detection efficiency $\eta_{di}$. By repeating the procedure with five different pump powers, we find that $\eta_{s1}\eta_{d1} = 56 \pm 3\%$ and $\eta_{h1} = 47 \pm 1\%$ for the first source, and $\eta_{s2}\eta_{d2} = 52 \pm 4\%$ and $\eta_{h2} = 51 \pm 1\%$ for the second source.

Recovering shot-noise limited performance
Using the $\Delta = N$ probe, all photons injected into the interferometer originate from one source. As such, imperfections such as spectral purity and mode matching should not affect the performance of the probe. The $\Delta = N$ probe is prepared by considering trials where e.g. $(h_1, h_2) = (N, 0)$. However, even when $h_2 = 0$, this second source can still inject unwanted light into the interferometer due to losses in the herald modes. This generally degrades the performance of the $\Delta = N$ probe. Here we show that shot-noise limited performance is recovered by blocking one of the sources. In Fig. S6, we plot $\tilde{F}_{5,0}(\phi)$ for the $\Delta = N = 5$ probe calculated without post-selection. We performed the measurement with a single source blocked and with both sources unblocked. With one source blocked, we find that $\tilde{F}_{5,0}(\phi)$ reaches $0.991 \pm 0.001$ at its highest point, demonstrating shot-noise limited performance. Ideally, $\tilde{F}_{5,0}(\phi)$ should be flat with $\phi$. However, experimental imperfections such as imbalanced detector efficiency, detector dark counts ($\sim 1\%$), and imperfect interferometer visibility cause the dips in $\tilde{F}_{5,0}(0)$ and $\tilde{F}_{5,0}(\pm\pi)$, where the photons should ideally always exit the interferometer from one port.

Parameters required to surpass shot-noise limit without post-selection
Here we estimate the efficiency and quality of the two-mode squeezed vacuum sources required to surpass the shot-noise limit without post-selection. We focus on the $\Delta = 6$ [i.e. $(h_1, h_2) = (7, 1)$] probe as this is the most loss-tolerant $N = 8$ probe that can surpass the shot-noise limit in our scheme. For simplicity, we assume equal efficiency $\eta$ in all four modes of the experiment ($\eta = \eta_{h1} = \eta_{h2} = \eta_{s1} = \eta_{s2}$) and equal PDC gain parameters $\lambda$. There are three main experimental parameters to consider: (i) the efficiency $\eta$, (ii) the distinguishability $M$ of photons between the top and bottom sources, (iii) the PDC gain $\lambda$.
Given these parameters, we estimate the sensitivity of the probe by calculating the classical Fisher information $\tilde{F}_{7,1}(\phi)$ (per detected signal photon). Since this quantity generally depends on $\phi$, we focus on the phase region with the largest possible sensitivity, i.e. $\max[\tilde{F}_{7,1}(\phi)]$.
We begin by focusing on the effect of the PDC gain $\lambda$ and assume $M = 1$ for now. As shown in Fig. S7(a), a smaller $\lambda$ provides a larger $\max[\tilde{F}_{7,1}(\phi)]$. This is because lowering $\lambda$ increases the photon-number purity of the heralded probe in the presence of loss in the heralding arms, i.e. it reduces the probability that the herald detectors under-counted the true number of photon pairs produced by the sources. As a reference, we include the perfect heralding case, which is shown by the black line. While reducing $\lambda$ minimizes the detrimental effects of imperfect heralding, it also drastically decreases the heralding rate. For example, assuming $\eta = 0.5$ and a 100 kHz laser repetition rate, $\lambda = 0.75$ would produce an $N = 8$ probe roughly once per second whereas $\lambda = 0.35$ would produce such a probe only about once per day.
Next we consider the combined effect of imperfect distinguishability and efficiency. In Fig. S7(b) [(c)], we plot $\max[\tilde{F}_{7,1}(\phi)]$ as a function of $\eta$ and $M$ for $\lambda = 0.75$ [$\lambda = 0.35$]. The approximate region achieved in our experiment is shown in yellow. Improvements in both $\eta$ and $M$ are necessary to unconditionally surpass the shot-noise limit. As a reference point, a distinguishability of $M \sim 0.85$ was achieved in Ref. [38] using the same type of high-gain PDC sources as used in our experiment. With such a distinguishability, the efficiency would need to be improved to $\sim 80\%$ [$\sim 70\%$] when using $\lambda = 0.75$ [$\lambda = 0.35$].