## Introduction

Encoding d levels of quantum information on single photons, known as photonic qudits1, offers crucial advantages for quantum communication and networking applications2, such as higher information capacities3, increased noise tolerance4,5, and stronger violations of Bell’s inequalities6. Generation and manipulation of photonic qudits have been explored in many degrees of freedom, including path7,8, orbital angular momentum9,10, frequency bins11,12,13,14, and time bins15,16. Integrated photonics plays a pivotal role in scaling the complexity of quantum states17,18 and quantum operations19, and the frequency degree of freedom is particularly attractive as on-chip biphoton frequency combs (BFCs) can produce a large number of spectrally entangled bins in a compact fashion.

Joint spectral intensity (JSI) measurements are commonly used to characterize BFCs, but such measurements are insensitive to phase coherence (and hence entanglement) across frequency-bin pairs. Thus, reconstruction of BFC density matrices has been realized through active mixing of frequency bins11,12,20, such that measurements in multiple bases can be realized. In one method, by properly setting the amplitude and phase on a pulse shaper and the modulation voltage on a subsequent electro-optic phase modulator (EOM), one can filter out overlapping sidebands to perform projective frequency-bin measurements11,12. Alternatively, a quantum frequency processor21 can be used to synthesize full quantum gates for tomography 20. Nevertheless, both methods face roadblocks en route to higher dimensions: aggressive amplitude filtering of the input state is inevitable in the first approach, while the number of elements required for arbitrary frequency qudit operations22 limits the maximum dimensionality possible with current technology. Accordingly, these existing methods are ill-suited to single-frequency electro-optic modulation: for them, the infinite Fourier series of Bessel functions produced by sinewave electro-optic modulation—a far cry from standard quantum bases—present a challenge to be overcome.

In this work, we instead leverage the complex mixing behavior of an EOM to our advantage as built-in randomized measurements for BFC characterization. By applying Bayesian tomographic techniques23,24, we obtain complete state estimates for any dataset, including uncertainties commensurate with the data gathered, thus bolstering all results obtained from our measurement technique with a principled foundation. Importantly, these Bayesian features extend beyond the specific nuances of frequency-bin encoding to any quantum system, offering the promise of meaningful inference irrespective of whatever experimental constraints may have limited the measurements performed.

## Results

### Experimental scheme and Bayesian analysis

Figure 1a illustrates the experimental setup and concept behind our proposed scheme. The states of interest are BFCs with mode spacing Δω/2π ~40 GHz and dimension d in both signal and idler photons. The first test source is prepared by pumping a periodically poled lithium niobate (PPLN) waveguide with a continuous-wave laser operating at 780 nm, followed by filtering the broadband spontaneous parametric down-conversion spectrum with a Fabry-Perot etalon20,21. The second source exploits spontaneous four-wave mixing in an on-chip Si3N4 microring resonator (MRR)25, where we pump the ring with a tunable continuous-wave laser operating in the optical C-band at one of the ring resonances11,12. Ideal maximally entangled states are of the form $$\left|{{{\Psi }}}_{d}\right\rangle=\frac{1}{\sqrt{d}}\mathop{\sum }\nolimits_{m=1}^{d}{e}^{{{{{{{{\rm{i}}}}}}}}{\alpha }_{m}}\left|m,m\right\rangle$$, where $$\left|m,m\right\rangle$$ represents the photon pair which is centered at frequency ω0 ± (m + Bω for the signal (idler); αm is the phase of each pair. The integer B here denotes the number of signal (idler) bins at the center of the biphoton spectrum that are blocked by bandstop filters (omitted in Fig. 1a). Details regarding BFC state preparation can be found in the “Methods”.

The generated state is then directed to a pulse shaper and an EOM for the implementation of Rtot randomly chosen operations. For each operation, we apply a set of d random spectral phases θm (ϕm) onto the signal (idler) bins with the pulse shaper. The spectral phases (θm, ϕm) are uniformly sampled between 0 and 2π. The EOM is driven by a sinusoidal voltage with amplitude δ and frequency equal to the mode spacing Δω, imposing a temporal phase $$\exp [-{{{{{{{\rm{i}}}}}}}}\delta \sin {{\Delta }}\omega t]$$ onto each photon—equivalently introducing coupling between distinct frequency bins with weights given by Bessel functions of the first kind Jn(δ). The strength of the imposed phase modulation δ is selected from a set of Rtot values equispaced between $$\delta \in [0,{\delta }_{\max }]$$, with $${\delta }_{\max }$$ set by the maximum radio-frequency (RF) power attainable at Δω. Among these Rtot operations, we designate δ = 0 for the first measurement (i.e., the EOM turned off, making it a conventional JSI), while the modulation indices for the remaining Rtot − 1 operations are chosen in a random order from the equispaced set, without replacement.

The photons are then passed to a wavelength-selective switch and we scan the filters to collect coincidences over the original d × d frequency mode grid for each implemented operation, omitting any photons scattered outside of this computational space. Figure 1d shows examples of such measurements, simulated using multinomial statistics, corresponding to two different random operations for the cases of a classically correlated separable state ($${\rho }_{{{{{{{{\rm{sep}}}}}}}}}=\frac{1}{d}\mathop{\sum }\nolimits_{m=1}^{d}\left|m,m\right\rangle \left\langle m,m\right|$$) and a maximally entangled quantum state ($${\rho }_{{{{{{{{\rm{ent}}}}}}}}}=\frac{1}{d}\mathop{\sum }\nolimits_{m,n=1}^{d}\left|m,m\right\rangle \left\langle n,n\right|$$) for d = 4. In the absence of modulation (δ = 0), their JSIs are identical, yet when the EOM is turned on, the frequency correlations vary strongly. For example, in the extreme case of complete incoherence between energy-matched bins (ρsep here), applying random phases on the initial pulse shaper has no impact on the measured output. Such differences imply that a collection of these measurements can be exploited to infer the full density matrix.

Given our knowledge of the quantum operations applied and a set of d × d coincidence results, we then employ Bayesian tomographic techniques23,24 to estimate the input quantum state. Conceptually simple—though numerically challenging—Bayesian quantum state tomography (QST) assigns a posterior probability distribution to all unknown parameters x, given a set of observations $${{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}$$, according to Bayes’ theorem, $$\Pr ({{{{{{{\bf{x}}}}}}}}|{{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}})=\Pr ({{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}|{{{{{{{\bf{x}}}}}}}})\Pr ({{{{{{{\bf{x}}}}}}}})/\Pr ({{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}})$$, which incorporates both a physical model through $$\Pr ({{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}|{{{{{{{\bf{x}}}}}}}})$$ and any prior information in Pr(x). Significantly, the estimator formed by averaging any quantity of interest over the posterior $$\Pr ({{{{{{{\bf{x}}}}}}}}|{{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}})$$ is guaranteed to offer the lowest squared error on average26, making the Bayesian mean provably optimal for any number of measurements. Using the model and algorithm described in the “Methods”, we obtain estimates of the density matrix ρ, fidelity $${{{{{{{{\mathcal{F}}}}}}}}}_{d}$$, and logarithmic negativity Ed; Ed > 0 is a sufficient condition for nonseparability, and Ed upper bounds distillable entanglement27,28.

### BFCs from a χ(2) source

For PPLN experiments, we consider BFCs of qudit dimension d {3, 4, 5}. We implement a total of Rtot = 21 randomly chosen operations for each dimension, with a maximum modulation index $${\delta }_{\max }=2.5$$ rad. We compute the fidelity of the retrieved density matrices at various stages of Bayesian estimation with respect to the ideal state $$\left|{{{\Psi }}}_{d}\right\rangle$$ with αm = β2LΔω2(m+B)2, corresponding to dispersion accumulated over L = 20 m of single-mode fiber (approximate length of fiber between the PPLN source and EOM). Figure 2a shows the evolution of the Bayesian-estimated fidelity after RRtot random operations are performed. (For these and all results in this paper, no subtraction of accidentals is performed.) In Fig. 2b, we plot the mean density matrices retrieved from Bayesian analysis for d = 5, at specific numbers of random operations. For the case of R = 1, the density matrix resembles a separable state with small off-diagonal elements. As R increases, the off-diagonal elements rise as the phase coherence in the BFC is revealed by operations involving frequency mixing; the fidelity with respect to the ideal state increases accordingly, converging at R ≈ 10 operations for all three dimensions (see “Methods” for discussion of possible explanations of this convergence behavior).

In Fig. 3, we plot both the ideal and the final estimated density matrices (R = Rtot = 21). Both absolute values and phases align well with theory—the only discrepancy being the inconsequential phase values of the near-zero off-diagonal elements. From the Bayesian results, we report fidelities $${{{{{{{{\mathcal{F}}}}}}}}}_{d}$$ of $${{{{{{{{\mathcal{F}}}}}}}}}_{3}=(95.8\pm 0.4)\%$$, $${{{{{{{{\mathcal{F}}}}}}}}}_{4}=(94.0\pm 0.4)\%$$, and $${{{{{{{{\mathcal{F}}}}}}}}}_{5}=(91.7\pm 0.4)\%$$, which are in the neighborhood of the theoretically predicted $${{{{{{{{\mathcal{F}}}}}}}}}_{3}=97.1\%$$, $${{{{{{{{\mathcal{F}}}}}}}}}_{4}=96.0\%$$, and $${{{{{{{{\mathcal{F}}}}}}}}}_{5}=94.9\%$$ for a white noise model $${\rho }_{\lambda }=\lambda \left|{{{\Psi }}}_{d}\right\rangle \left\langle {{{\Psi }}}_{d}\right \vert+\frac{1-\lambda }{{d}^{2}}{I}_{{d}^{2}}$$, with λ chosen such that the coincidences-to-accidentals ratio (CAR) matches the experimentally measured value of 90 (see “Methods”). The Bayesian-estimated log-negativities are E3 = 1.523 ± 0.006 ebits, E4 = 1.911 ± 0.006 ebits, and E5 = 2.198 ± 0.007 ebits, comparable with those of $$\left|{{{\Psi }}}_{d}\right\rangle$$ ($${\log }_{2}\,d$$) given by 1.58, 2, and 2.32 ebits, respectively.

### BFCs from a χ(3) source

We then adopt the same methodology for BFCs generated using an on-chip Si3N4 MRR (Fig. 1b)25. The measured JSI is shown in Fig. 1c, with coincidences recorded over 49 signal-idler bin pairs, eight of which (bins 23–30) are selected for testing. The estimated on-chip pair generation rate varies between 1.3 × 106 and 2.2 × 106 s−1 per frequency-bin pair, and the CAR—here defined as the ratio of a given diagonal element to the average of all off-diagonal elements—lies in the interval [17, 30] for the eight bins we test. We perform tomography for qudit dimension d {2, 3, …, 8} with Rtot = 30 operations for each d and $${\delta }_{\max }=3.4$$ rad (increased from 2.5 rad in the PPLN case due to reoptimization of the RF setup for lower loss). We also apply additional spectral phases that compensate for the residual biphoton phase (see “Methods”), and thus we compute the fidelities of retrieved density matrices with respect to the ideal state $$\left|{{{\Psi }}}_{d}\right\rangle$$ with αm = 0.

In Fig. 4a, we plot the final estimated density matrices. The elements indexed by $$\left|m,m\right\rangle \left\langle n,n\right|$$ have strong nonzero amplitudes, agreeing well with theory for ideal entangled states. The background corresponding to energy-mismatched bins (gray baseline on the diagonal) is consistent with a white noise model and real-valued owing to hermiticity. Figure 4b plots the fidelities with respect to $$\left|{{{\Psi }}}_{d}\right\rangle$$ and the corresponding log-negativities Ed, lying comfortably within the range predicted for our noise model using experimentally observed CARs (shaded region). Significantly, our d = 8 result of E8 = 2.50 ± 0.08 ebits can only be achieved by two-qudit states with d ≥ 6, indicating the genuine high-dimensional nature of the observed entanglement. These results showcase the highest dimension of a fully reconstructed density matrix—Hilbert space dimension of 64—in experimental frequency-bin encoding.

## Discussion

Technologically speaking, our approach aligns closely with recent ideas presented in EOM-based frequency-bin quantum random walks29, where here we precede the walk with random spectral phases and consider varying circuit depths. Yet beyond the confines of frequency-bin encoding, our statistical treatment hints at the much wider value of Bayesian models in quantum information. Neither the choice of measurement settings nor number of datapoints has any bearing on the legitimacy of Bayesian tomography23,24; hence, Bayesian estimation will return a reasonable result for any dataset, with automatic uncertainty quantification indicating the confidence warranted from the data. This feature imparts Bayesian inference with unique flexibility compared to other advanced random measurement approaches, in that it does not assume, e.g., low-rank states30 or rely on unitary operations drawn from specific distributions31,32. Therefore, any quantum system for which an appropriate physical model can be constructed is ripe to potentially benefit from Bayesian models like the one presented here. Although not required conceptually, well-chosen measurements are practically valuable for obtaining final estimates accompanied by low uncertainties. Our experimental results show through example that the datasets from random modulation are more than sufficient to converge from an initially uniform (Bures) prior to density matrices with small error bars and in good agreement with the expected ground truth. Indeed, arguments from a simple theoretical model suggest that random EOM measurements with $${\delta }_{\max } \sim {{{{{{{\mathcal{O}}}}}}}}(d)$$ efficiently cover the entire Hilbert space of a d-dimensional quantum system (see “Methods” for details), so that our measurement approach offers a straightforward path for high-dimensional frequency-bin characterization and should widen opportunities for BFCs in quantum information processing.

## Methods

### PPLN source setup

Our test source is a 2.1-cm-long fiber-pigtailed PPLN ridge waveguide (AdvR), possessing an internal efficiency of 150 %/W for second-harmonic generation and fiber-coupling efficiency of ~75% per facet. We couple a continuous-wave laser (Toptica) operated at ~5 mW and 780 nm into the PPLN waveguide, temperature-controlled to 51.6 °C for SPDC under type-0 phase matching. Broadband spectrally entangled photon pairs spanning >5 THz are generated and subsequently filtered by a tunable fiber-pigtailed Fabry-Perot etalon (Luna Innovations) with 20 GHz mode spacing and a full-width at half-maximum linewidth of 1 GHz. We carefully tune the etalon’s temperature to align its transmission peaks with the generated entangled photons, i.e., to maximize the coincidences between the symmetric, spectrally filtered mode pairs.

To minimize crosstalk in our 20 GHz-resolution demultiplexer (Finisar Waveshaper 4000S/X), we utilize the first pulse shaper (Finisar Waveshaper 1000S) to perform amplitude filtering (on top of the phase masks programmed for the QST) and block every other frequency bin, resulting in a 40 GHz-spaced BFC with a measured CAR ≈ 90. A 300 GHz bandstop filter (corresponding to the case of B = 3) is programmed on the same pulse shaper in the middle of the BFC spectrum, which allows us to apply strong modulation on the EOM without the possibility of the signal photon jumping over into the idler’s modes, and vice versa. For coincidence measurements, we use a window of 128 ps and integration times of 20 s for all dimensions.

### MRR source set up

The Si3N4 MRR used in our experiment is fabricated using the optimized photonic Damascene reflow process33 with a cross-section of 2 μm × 0.95 μm. Such a process has enabled ultralow loss waveguides that have paved the way for dissipative Kerr solitons with free spectral ranges (FSRs) as low as 10 GHz25. In the current work, the radius of the ring is 561 μm, corresponding to an FSR of 40.5 GHz—within the range of commercial EOMs—allowing us to drive the EOM at a frequency equal to the FSR for random operations. The gap between the ring and the bus waveguide is 0.3 μm, resulting in strong overcoupling with an intrinsic Q-factor of 107 and a loaded Q-factor of 106. We pump the ring using a tunable continuous-wave laser (New Focus) operating in the optical C-band, with an on-chip power of 10 mW—well below the classical comb threshold of 80 mW. The pump is amplified using an erbium-doped fiber amplifier (EDFA), which is subsequently filtered using a set of two 100 GHz-wide dense wavelength division multiplexing (DWDM) filters to suppress amplified spontaneous emission from the EDFA. Lensed fibers are used to couple the pump into the MRR, which is positioned on a temperature-controlled stage maintained at 23 °C. The fiber-to-fiber coupling loss of the ring is around 4 dB. Such a low loss was realized using engineered inverse waveguide tapers34. When the pump is tuned into the ring resonance (1550.5 nm) and operated below threshold for classical comb generation, it gives rise to BFC states. Since the pump and newly generated biphotons are in the same wavelength band, it is essential to suppress the residual pump after the ring to reduce accidentals in coincidence measurements. We use a set of three 200 GHz-wide DWDM filters, which when combined with bandstop filters in the pulse shaper and demultiplexer gives a net pump suppression of ~100 dB. We also tap a portion of the pump power to track its wavelength using a wavelength meter, setting up a computer-based feedback loop to ensure that the pump is operated at the intended resonance. For coincidence measurements, we use a window of 2048 ps (roughly equivalent to the inverse resonance linewidth) and integration time of 5 s for d {2, 3, …, 6}. For d {7, 8}, we reduce the integration time to 3 s in the interest of minimizing the total experimental duration.

It is necessary to experimentally characterize the FSR of the MRR to determine the precise modulation frequency needed. In addition to this, we also characterize the residual spectral phase accumulated by the biphotons, likely due to fiber dispersion, to precompensate for it. For characterization of these quantities, the experimental setup remains the same as shown in Fig. 1a. We select two adjacent signal-idler bin pairs in the first pulse shaper and drive the EOM at a frequency ωRF with amplitude δ = 1.43 such that $$\left|{J}_{0}(\delta )\right \vert=\left|{J}_{1}(\delta )\right|$$ for equal mixing. We then pass one energy-matched pair of signal-idler frequency bins through the demultiplexer, now consisting of equal contributions from the adjacent bin due to phase modulation, and measure the biphoton time-correlation function. The theoretical expression for the coincidence rate, assuming both bins have identical Lorentzian lineshape and equal probability amplitude, is given by35

$$R(\tau )\propto {e}^{-\gamma \left|\tau \right|}\left\{1-\cos \left[\phi+{\phi }_{0}-\left({\omega }_{FSR}-{\omega }_{RF}\right)\tau \right]\right\}$$
(1)

where γ represents the Lorentzian linewidth, τ the signal-idler delay, ϕ the joint spectral phase between the bins applied by the pulse shaper, ϕ0 the residual biphoton phase, and ωFSR the FSR. In Fig. 5a, we plot the coincidences at τ = 0 as a function of ϕ for a set of two adjoining signal-idler bin pairs, where τ = 0 is defined as the peak of the histogram in the unmodulated case. The offset of the experimentally obtained cosine function from the origin can be used to deduce the residual spectral phase between the bins. This process is repeated for all sets of adjoining signal-idler bin pairs used in our experiment to deduce the residual spectral phase of the entire biphoton state. For finding ωFSR, we again work with a set of two adjoining signal-idler bin pairs and operate at a spectral phase that minimizes the coincidences at τ = 0. In Fig. 5b, we plot the coincidences integrated over τ as we sweep the applied RF frequency ωRF. When ωRF = ωFSR, the coincidences are minimized, so that Fig. 5b implies ωFSR/2π = 40.5 GHz. The solid lines in Fig. 5 are theoretical estimates, scaled and vertically offset to match the datapoints via least squares. The vertical offset and the scaling account for both the nonzero accidentals and unequal bin probability amplitudes. The width of the trace in Fig. 5b can be expressed as 2γ from Eq. (1), from which γ/2π ≈ 200 MHz is inferred. These obtained resonator linewidth and FSR values are consistent with linear spectroscopy measurements.

### Bayesian inference model

The ultimate goal of the inference process is to estimate the full BFC state, which can be represented as a d2 × d2 density matrix ρ. The frequency bins of the signal (idler) qudit possess annihilation operators $${\hat{a}}_{k}^{(S)}$$ ($${\hat{a}}_{l}^{(I)}$$) where k, l {1, …, d} correspond to center frequencies of ωk = ω0 + (k + Bω and ωl = ω0 − (l + Bω. The density matrix of interest can be formally written as

$$\rho=\mathop{\sum }\limits_{k,l,{k}^{\prime},{l}^{\prime}=1}^{d}{\rho }_{(kl)({k}^{\prime}{l}^{\prime})}{\left[{\hat{a}}_{k}^{(S)}\right]}^{{{{\dagger}}} }{\left[{\hat{a}}_{l}^{(I)}\right]}^{{{{\dagger}}} }\left|{{{{{{{\rm{vac}}}}}}}}\right\rangle \left\langle {{{{{{{\rm{vac}}}}}}}}\right|{\hat{a}}_{{k}^{\prime}}^{(S)}{\hat{a}}_{{l}^{\prime}}^{(I)},$$
(2)

where $$\left|{{{{{{{\rm{vac}}}}}}}}\right\rangle$$ is the vacuum state.

We use the index s to denote a specific measurement, which consists of a particular EOM and pulse shaper setting r(s)  {1, …, Rtot}, the signal frequency bin measured m(s)  {1, …, d}, and the idler bin measured n(s)  {1, …, d}. For notational convenience, the explicit s-dependence is suppressed in many of the formulas below, but remains implied for (r, m, n). For a given r, the frequency bins of the signal photon undergo unitary transformations into the output modes via

$${\hat{b}}_{m}^{(S)}=\mathop{\sum }\limits_{k=1}^{d}{V}_{mk}^{(r)}{\hat{a}}_{k}^{(S)},$$
(3)

and similarly for idler photon we have

$${\hat{b}}_{n}^{(I)}=\mathop{\sum }\limits_{l=1}^{d}{W}_{nl}^{(r)}{\hat{a}}_{l}^{(I)}.$$
(4)

The unitary operations V(r) and W(r) consist of line-by-line phase shifts on each of the input modes, followed by sinewave electro-optic modulation. The index r defines the phase shifts and modulation index for a specific setting: $${\theta }_{n}^{(r)}$$, $${\phi }_{n}^{(r)}$$, δ(r). We can therefore write

$${V}_{mk}^{(r)}={J}_{m-k}({\delta }^{(r)}){e}^{{{{{{{{\rm{i}}}}}}}}{\theta }_{k}^{(r)}},$$
(5)

and

$${W}_{nl}^{(r)}={J}_{l-n}({\delta }^{(r)}){e}^{{{{{{{{\rm{i}}}}}}}}{\phi }_{l}^{(r)}},$$
(6)

where Jn(  ) is the Bessel function of the first kind, and for definiteness we have assumed a modulation function of the form $$\exp [-{{{{{{{\rm{i}}}}}}}}{\delta }^{(r)}\sin {{\Delta }}\omega t]$$. The only major difference between the signal and idler equations is the input/output index reversal in the Bessel function order, which results from our definition of signal frequencies that increase with index and idler frequencies that decrease with index.

After these operations, we look for coincidences between bins m and n, the probability of which can be computed for the input density matrix as

$${p}_{s}= \left\langle m,\; n\right|\rho \left|m,\; n\right\rangle=\left\langle {{{{{{{\rm{vac}}}}}}}}\right|{\hat{b}}_{m}^{(S)}{\hat{b}}_{n}^{(I)}\rho {\left[{\hat{b}}_{m}^{(S)}\right]}^{{{{\dagger}}} }{\left[{\hat{b}}_{n}^{(I)}\right]}^{{{{\dagger}}} }\left|{{{{{{{\rm{vac}}}}}}}}\right\rangle \\= \mathop{\sum }\limits_{k,l,{{k}^{\prime}},{{l}^{\prime}}=1}^{d}{\rho }_{(kl)({k}^{\prime}{l}^{\prime})}{V}_{mk}^{(r)}{W}_{nl}^{(r)}{\left[{V}_{m{k}^{\prime}}^{(r)}\right]}{^*}{\left[{W}_{n{{l}^{\prime}}}^{(r)}\right]}{^*}\\= \mathop{\sum }\limits_{k,l,{{k}^{\prime}},{{l}^{\prime}}=1}^{d}{\rho }_{(kl)({{k}^{\prime}}{{l}^{\prime}})}\,{J}_{m-k}({\delta }^{(r)})\,{J}_{l-n}({\delta }^{(r)})\,{J}_{m-{{k}^{\prime}}}({\delta }^{(r)})\,{J}_{{{l}^{\prime}}-n}({\delta }^{(r)}){e}^{{{{{{{{\rm{i}}}}}}}}\left[{\theta }_{k}^{(r)}+{{\phi }_{l}^{(r)}}-{\theta }_{{k^{\prime}}}^{(r)}-{\phi }_{{l^{\prime}}}^{(r)}\right]},$$
(7)

using Eqs. (2)–(6) to simplify. This expression provides a linear mapping from the density matrix elements $${\rho }_{(kl)({k}^{\prime}{l}^{\prime})}$$ to each output probability ps.

We then need to relate these probabilities to the observed coincidence counts Ns through an appropriate likelihood function. In typical quantum tomographic contexts where the full Hilbert space is detected at each setting, a multinomial model is the most conceptually straightforward23; in our case, this would consist of products of the factors $${p}_{s}^{{N}_{s}}$$. However, such a model does not readily apply to situations in which some of the outcomes are unmonitored. In our particular experiment, many bins outside of the original d2-dimensional computational space can be populated, owing to the nonzero Bessel function weights in Eq. (7). As a rule of thumb, the success probability (i.e., the possibility of a single photon staying in the original d-mode computational space) is roughly 1/2 when the modulation depth $$\delta \sim {{{{{{{\mathcal{O}}}}}}}}(d)$$36. Rather than attempting to measure all output mode combinations—which in principle involves an infinite-dimensional Hilbert space and in practice means the addition of many measurements with few counts—we focus only on the central d × d space here.

On the model side, we can account for unobserved outcomes by introducing an additional flux parameter K, defined as the average number of total coincidences that would be measured if all (m, n) combinations were tested. Since the EOM and pulse shaper operations are unitary when considered over all modes—apart from an overall insertion loss that does not vary with setting r—this scale factor is fixed for all measurement settings. It also automatically accounts for efficiency; explicitly, it can be written as K = ηSηIΦΔT, where ηS (ηI) is the total system efficiency from generation through detection for the signal (idler), Φ the photon pair generation flux, and ΔT the integration time. Although these quantities themselves could be inferred by considering both singles counts as well as coincidences20,37,38, they are not of interest in the present investigation, and so through K we are able to reduce to their combined effect only.

Thus, the mean number of coincidences for a specific setting s becomes Kps, which we can model with a Poissonian distribution, the product of which produces the full likelihood $${L}_{{{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}}(\rho,\;{K})\propto \Pr ({{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}|\rho,\;{K})$$

$${L}_{{{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}}(\rho,\;{K})=\mathop{\prod }\limits_{s=1}^{R{d}^{2}}{e}^{-K{p}_{s}}{(K{p}_{s})}^{{N}_{s}},$$
(8)

where $${{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}=\{{N}_{1},\ldots,{N}_{R{d}^{2}}\}$$ denotes the set of measured coincidences for all Rd2 settings. This likelihood matches the form we adopted previously in the construction of JSIs (not the full quantum state) from random measurements39. And not only does it readily handle probabilities that do not encompass the full Hilbert space, but it also more accurately reflects the physical situation of high-dimensional measurements with single-outcome detectors. For example, our use of raster scanning with two single-photon detectors means that, for any given configuration, we measure events only for a specific (m, n) frequency-bin pair. Although one can pool the results for all pairs (m, n) and view them synthetically as resulting from a single ΔT integration time of true d2-outcome measurements—the situation assumed by a multinomial distribution—this ultimately does not align with the actual measurement procedure. Accordingly, the Poissonian likelihood provides both a practical and conceptually satisfying model for our tomographic scenario.

With the likelihood defining the relationship between a given density matrix and the observed data, Bayesian inference next requires specification of a suitable prior distribution for ρ and K. In the case of QST, uniform priors are generally preferred, as these apply appreciable weights to all possible states and thereby minimally bias the final results. Although a variety of reasonable uninformative priors can be posited for density matrices, we select the Bures distribution, which enjoys a unique position as the single monotone metric which reduces to both the Fisher and Fubini–Study metrics in the classical and pure state limits, respectively40, and in this sense can claim preference as a “definitive” uniform prior for Bayesian inference41.

To work with the Bures ensemble, it is convenient to express any d2 × d2 density matrix ρ in the computational basis as

$$\rho=\frac{({I}_{{d}^{2}}+U)G{G}^{{{{\dagger}}} }({I}_{{d}^{2}}+{U}^{{{{\dagger}}} })}{{{{{{{{\rm{Tr}}}}}}}}\left[({I}_{{d}^{2}}+U)G{G}^{{{{\dagger}}} }({I}_{{d}^{2}}+{U}^{{{{\dagger}}} })\right]},$$
(9)

where the d2 × d2 matrices $${I}_{{d}^{2}}$$, U, and G are the identity, a unitary matrix, and a general complex matrix, respectively. This expression automatically satisfies all physicality conditions (unit-trace, hermiticity, and positive semidefiniteness); by sampling U from the Haar distribution and G from the Ginibre ensemble, the ρ thus formed represents a single draw from the Bures distribution41. We can in turn parameterize ρ by the complex vector $${{{{{{{\bf{y}}}}}}}}=({y}_{1},\ldots,{y}_{2{d}^{4}})$$, with each yk observing a complex standard normal distribution $${y}_{k} \mathop{\sim}\limits ^{{{{{{{\rm{i.i.d.}}}}}}}}{{{{{{{\mathcal{CN}}}}}}}}(0,1)$$. d4 of the components comprise the d2 × d2 elements of the Ginibre matrix G directly, while the remaining d4 parameters form a second Ginibre matrix which is converted to the unitary U through the Mezzadri algorithm42, thereby ensuring Haar randomness.

In addition to the parameters forming ρ, the scale factor K must also be suitably parameterized. Following ref. 39, we find it convenient to write K = K0(1 + σz), where K0 and σ are hyperparameters defined separate of the inference process, and z is taken to follow a standard normal distribution $${{{{{{{\mathcal{N}}}}}}}}(0,1)$$, leading to a normal prior on K of mean K0 and standard deviation K0σ. We take σ = 0.1 and K0 equal to the sum of the counts in all d2 bins for the first JSI measurement (r = 1), where the absence of modulation ensures that all initial photon flux remains in measured bins, i.e., $${K}_{0}=\mathop{\sum }\nolimits_{s=1}^{{d}^{2}}{N}_{s}$$. This provides an effectively uniform prior, since a fractional deviation of 0.1 is much larger than the maximum amount of fractional uncertainty $$1/\sqrt{{K}_{0}}\;\approx\; 0.02$$ expected from statistical noise at our total count numbers; the use of a normal distribution simplifies the sampling process.

The total parameter set can therefore be expressed as the vector $${{{{{{{\bf{x}}}}}}}}=({{{{{{{\bf{y}}}}}}}},\; z)=({y}_{1},\ldots,{y}_{2{d}^{4}},\; z)$$, with the prior distribution

$${\pi }_{0}({{{{{{{\bf{x}}}}}}}})\propto \left(\mathop{\prod }\limits_{k=1}^{2{d}^{4}}{e}^{-\frac{1}{2}|{y}_{k}{|}^{2}}\right){e}^{-\frac{1}{2}{z}^{2}}.$$
(10)

We note that this parameterization entails a total of 4d4 + 1 independent real numbers (2d4 complex parameters for ρ, one real parameter for K)—noticeably higher than the minimum of d4 − 1 required to uniquely describe a density matrix. Nevertheless, this ρ(y) parameterization is to our knowledge the only existing constructive method to produce Bures-distributed states, and is straightforward to implement given its reliance on independent normal parameters only.

Following Bayes’ rule, the posterior distribution becomes

$$\pi ({{{{{{{\bf{x}}}}}}}})=\frac{1}{{{{{{{{\mathcal{Z}}}}}}}}}{L}_{{{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}}({{{{{{{\bf{x}}}}}}}}){\pi }_{0}({{{{{{{\bf{x}}}}}}}})\propto \left(\mathop{\prod }\limits_{s=1}^{R{d}^{2}}{e}^{-K(z){p}_{s}({{{{{{{\bf{y}}}}}}}})}{\left[K(z){p}_{s}({{{{{{{\bf{y}}}}}}}})\right]}^{{N}_{s}}\right)\left(\mathop{\prod }\limits_{k=1}^{2{d}^{4}}{e}^{-\frac{1}{2}{\left|{y}_{k}\right|}^{2}}\right){e}^{-\frac{1}{2}{z}^{2}},$$
(11)

where $${{{{{{{\mathcal{Z}}}}}}}}$$ is a constant such that ∫dxπ(x) = 1. We have adopted this notation for Bayes’ theorem—rather than the more traditional $$\Pr ({{{{{{{\bf{x}}}}}}}}|{{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}})=\Pr ({{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}}|{{{{{{{\bf{x}}}}}}}})\Pr ({{{{{{{\bf{x}}}}}}}})/\Pr ({{{{{{{\boldsymbol{{{{{{{{\mathcal{D}}}}}}}}}}}}}}}})$$—to emphasize the functional dependencies on x, which are all that must be accounted for in the sampling algorithm below. From π(x), the Bayesian mean estimator fB of any quantity (scalar, vector, or matrix) expressible as a function of x can be estimated as

$${f}_{B}=\int d{{{{{{{\bf{x}}}}}}}}\,\pi ({{{{{{{\bf{x}}}}}}}})f({{{{{{{\bf{x}}}}}}}})\;\approx \;\frac{1}{S}\mathop{\sum }\limits_{j=1}^{S}f({{{{{{{{\bf{x}}}}}}}}}^{(j)}),$$
(12)

where, in lieu of direct integration, S samples {x(1), …, x(S)} are obtained from the distribution π(x) through Markov chain Monte Carlo (MCMC) techniques, as described below.

### MCMC sampling

Acquiring the samples necessary for computation of high-dimensional integrals of the form in Eq. (12) forms the primary bottleneck in Bayesian inference. The most common family of solutions to address this challenge fall under the general umbrella of MCMC, in which samples from a well-chosen Markov chain are designed to approach the statistics of π(x) asymptotically26,43. Recently, we applied a particularly efficient MCMC algorithm—known as preconditioned Crank–Nicolson (pCN)44—to the problem of Bayesian QST, finding significant computational improvements over previous implementations24. We utilize the same pCN approach here; the only difference from the algorithm in ref. 24 is a simpler acceptance probability depending exclusively on the likelihood ratio, a consequence of the fact that all parameters here have normally distributed priors44.

For each collection of measurement results considered in the main text, we retain S = 210 samples {(y(j), z(j))} from an MCMC chain of total length ST, where T is a thinning factor that is successively doubled until convergence is obtained. From these samples, we then compute the Bayesian mean estimate of the density matrix

$${\rho }_{B}=\frac{1}{S}\mathop{\sum }\limits_{j=1}^{S}\rho ({{{{{{{{\bf{y}}}}}}}}}^{(j)}),$$
(13)

examples of which are plotted in Figs. 24. The mean and standard deviation of fidelity with respect to some ideal state $$\left|{{{\Psi }}}_{d}\right\rangle$$ can also be computed as

$$\overline{{{{{{{{{\mathcal{F}}}}}}}}}_{d}}=\frac{1}{S}\mathop{\sum }\limits_{j=1}^{S}\left\langle {{{\Psi }}}_{d}\right|\rho ({{{{{{{{\bf{y}}}}}}}}}^{(j)})\left|{{{\Psi }}}_{d}\right\rangle$$
(14)

and

$${{\Delta }}{{{{{{{{\mathcal{F}}}}}}}}}_{d}={\left(\frac{1}{S}\mathop{\sum }\limits_{j=1}^{S}{\left[\left\langle {{{\Psi }}}_{d}\right|\rho ({{{{{{{{\bf{y}}}}}}}}}^{(j)})\left|{{{\Psi }}}_{d}\right\rangle \right]}^{2}-{\overline{{{{{{{{{\mathcal{F}}}}}}}}}_{d}}}^{2}\right)}^{\frac{1}{2}},$$
(15)

respectively. For the PPLN states specifically, which are not dispersion-compensated, we take the state for comparison as

$$\left|{{{\Psi }}}_{d}\right\rangle=\frac{1}{\sqrt{d}}\mathop{\sum }\limits_{m=1}^{d}{e}^{{{{{{{{\rm{i}}}}}}}}{\beta }_{2}L{{\Delta }}{\omega }^{2}{(m+B)}^{2}}\left|m,\; m\right\rangle,$$
(16)

with β2 = 2.06 × 10−2 ps2 m−1 for standard single-mode fiber and L = 20 m. The phases of the MRR states are precompensated, so for computing fidelities for them we use the uniform-phase version

$$\left|{{{\Psi }}}_{d}\right\rangle=\frac{1}{\sqrt{d}}\mathop{\sum }\limits_{m=1}^{d}\left|m,\; m\right\rangle .$$
(17)

In order to monitor convergence with T, we consider the sequential fidelity, defined for a given T = 2n as

$${{{{{{{{\mathcal{F}}}}}}}}}_{n,n-1}={\left({{{{{{{\rm{Tr}}}}}}}}\sqrt{{\sqrt{{\rho }_{B}^{(n-1)}}}{\rho }_{B}^{(n)}\sqrt{{\rho }_{B}^{(n-1)}}}\right)}^{2},$$
(18)

where we use the notation $${\rho }_{B}^{(n)}$$ to denote the Bayesian mean estimate [Eq. (12)] for a chain of length ST = S × 2n. For a sufficiently large thinning value $${{{{{{{{\mathcal{F}}}}}}}}}_{n,n-1}$$ will converge to unity and remain for all subsequent n. We note that this metric contains no reference to the ideal state $$\left|{{{\Psi }}}_{d}\right\rangle$$, but checks for consistency between subsequent Bayesian estimates only.

Figure 6 plots these sequential fidelities as a function of thinning ($$n={\log }_{2}T$$) for all BFCs characterized in the main text. In all cases, we consider the full measurement sets, R = 21 for the PPLN results and R = 30 for the MRR findings, as these cases generally possess the slowest MCMC convergence. As expected, the sequential fidelity converges more rapidly with T for lower d; nonetheless, all examples ultimately reach $${{{{{{{{\mathcal{F}}}}}}}}}_{n,n-1} \; > \; 0.99$$ at their respective maximum of n, indicating high MCMC convergence for all reported ρB matrices. Continuing to even larger values of T would certainly be desirable, particularly for d = 7 and d = 8, yet we are currently limited by computational power. For example, the d = 8 MCMC chain with T = 214 required almost a week to complete; with a total of 16,385 parameters to infer (4d4 + 1), it is no surprise that we are pushing the limits of our desktop computer. Indeed, to our knowledge the present results with d2 = 64 represent a record-high Hilbert-space dimension for complete Bayesian QST with a fully general (mixed state) prior, for any quantum system, simultaneously implying the efficiency of our existing MCMC algorithm as well as the importance of pursuing highly parallelizable MCMC methods45 to reach even larger dimensions in the future.

### Theoretical fidelities and ebits

In our quantum system, several sources of noise, such as multipair emission and dark counts, are expected to be uniform across the BFC bins. Accordingly, as a simple model, we theoretically anticipate a ground truth quantum state of the form

$${\rho }_{\lambda }=\lambda \left|{{{\Psi }}}_{d}\right\rangle \left\langle {{{\Psi }}}_{d}\right \vert+\frac{1-\lambda }{{d}^{2}}{I}_{{d}^{2}},$$
(19)

where $$\left|{{{\Psi }}}_{d}\right\rangle$$ is the ideal maximally entangled state [Eq. (16) or (17)], $${I}_{{d}^{2}}$$ is the d2 × d2 identity operator, and λ [0, 1] determines the noise level. In this section, we provide quantities of interest for ρλ in order to compare against those found in Bayesian estimation, including CAR

$${{{{{{{\rm{CAR}}}}}}}}=\frac{\mathop{\max }\limits_{m}\left\langle m,\; m\right|{\rho }_{\lambda }\left|m,\; m\right\rangle }{\mathop{\min }\limits_{m,\,n}\left\langle m,\; n\right|{\rho }_{\lambda }\left|m,\; n\right\rangle }=1+\frac{d\lambda }{(1-\lambda )},$$
(20)

fidelity $${{{{{{{{\mathcal{F}}}}}}}}}_{d}$$

$${{{{{{{{\mathcal{F}}}}}}}}}_{d}=\left\langle {{{\Psi }}}_{d}\right|{\rho }_{\lambda }\left|{{{\Psi }}}_{d}\right\rangle=\frac{({d}^{2}-1)\lambda+1}{{d}^{2}},$$
(21)

and log-negativity Ed (adapted from ref. 28)

$${E}_{d}=\left\{\begin{array}{ll}0\hfill &;{{{{{{{{\mathcal{F}}}}}}}}}_{d} \; < \; 1/d\\ {\log }_{2}\;d{{{{{{{{\mathcal{F}}}}}}}}}_{d} &;{{{{{{{{\mathcal{F}}}}}}}}}_{d} \; > \; 1/d\end{array}\right..$$
(22)

The fidelities and log-negativities reported for ρλ in the main text are calculated using these equations. In general, both fidelity and Ed deviate more strongly from their respective ideals as d increases for a fixed CAR. The good agreement in the main text between the Bayesian-estimated states and this simple white noise model suggests that our understanding of noise processes in the system is well justified.

### Theoretical analysis of measurement efficiency

In this section, we provide a basic theoretical analysis of the effectiveness of our proposed tomographic method. We first consider a highly simplified model that captures the key aspects of how phase modulation and mode mixing enable tomographic reconstruction, concentrating in this case on a single frequency-bin qudit occupying d modes x {1, …, d}. While the experiments in the main text examine the joint state of two qudits instead, this simpler single-qudit case reveals the basic principles of the tomography method with minimal distractions. The only type of measurement considered is projection onto the photon’s output mode, reflecting the condition of frequency-resolved detection; the probability of obtaining bin x is $${p}_{x}^{(0)}\equiv {\rho }_{xx}$$, and the qudit assumption ensures that ρxx = 0 for x < 1 or x > d.

Consider the mode mixing operator Sk which weakly mixes each mode x with modes x + k and x − k:

$${S}_{k}\left|x\right\rangle=-\epsilon \left|x-k\right\rangle+\left|x\right\rangle+\epsilon \left|x+k\right\rangle .$$
(23)

For k = 1, this approximates the action of an EOM with small modulation depth δ = 2ϵ and modulation frequency equal to the fundamental mode spacing. More generally, for small ϵ, Sk is approximately unitary and describes a weak, translation-invariant mode mixing operation (Fig. 7a).

Suppose we apply Sk to the photon and measure its bin. The probability of observing bin x is

$${p}_{x}^{(k)}\equiv \left\langle x\right|{S}_{k}\rho {S}_{k}^{{{{\dagger}}} }\left|x\right\rangle .$$
(24)

Using $${S}_{k}^{{{{\dagger}}} }\left|x\right\rangle=\epsilon \left|x-k\right\rangle+\left|x\right\rangle -\epsilon \left|x+k\right\rangle$$ and the fact that $${\rho }_{yx}={\rho }_{xy}^{*}$$ we obtain

$${p}_{x}^{(k)} =\left(\epsilon \left\langle x-k\right \vert+\left\langle x\right|-\epsilon \left\langle x+k\right|\right)\rho \left(\epsilon \left|x-k\right\rangle+\left|x\right\rangle -\epsilon \left|x+k\right\rangle \right)\\ ={\rho }_{xx}+\epsilon \left({\rho }_{x-k,x}+{\rho }_{x,x-k}-{\rho }_{x+k,x}-{\rho }_{x,x+k}\right)+{{{{{{{\mathcal{O}}}}}}}}({\epsilon }^{2})\\ ={\rho }_{xx}+2\epsilon (\Re {\rho }_{x-k,x}-\Re {\rho }_{x,x+k})+{{{{{{{\mathcal{O}}}}}}}}({\epsilon }^{2}).$$
(25)

Now, from the fact that ρxy2 ≤ ρxxρyy and the fact that the photon was initially restricted to modes {1, …, d} we have that the support of ρx,x+k is x {1, …, d − k} and the support of ρxk,x is x {k + 1, …, d}. Considering just the outcomes in bins {1, …, d} yields the approximate system of equations

$$-\Re {\rho }_{1,1+k}=\frac{1}{2\epsilon }\left({p}_{1}^{(k)}-{p}_{1}^{(0)}\right)\\ \vdots \\ -\Re {\rho }_{k,2k}=\frac{1}{2\epsilon }\left({p}_{k}^{(k)}-{p}_{k}^{(0)}\right)\\ \Re {\rho }_{1,k+1}-\Re {\rho }_{k+1,2k+1}=\frac{1}{2\epsilon }\left({p}_{k+1}^{(k)}-{p}_{k+1}^{(0)}\right)\\ \vdots \\ \Re {\rho }_{d-2k,d-k}-\Re {\rho }_{d-k,d}=\frac{1}{2\epsilon }\left({p}_{d-k}^{(k)}-{p}_{d-k}^{(0)}\right)\\ \Re {\rho }_{d-2k+1,d-k+1}=\frac{1}{2\epsilon }\left({p}_{d-k+1}^{(k)}-{p}_{d-k+1}^{(0)}\right)\\ \vdots \\ \Re {\rho }_{d-k,d}=\frac{1}{2\epsilon }\left({p}_{d}^{(k)}-{p}_{d}^{(0)}\right)$$
(26)

valid for k {1, …, d − 1} and small ϵ. In principle, this (generally overdetermined) system of d equations enables one to determine the real parts of ρ1,k+1, …, ρdk,d, the nonzero part of the kth diagonal of ρ. (Note that the −kth diagonal of ρ is just the conjugate of the kth diagonal.) The elements on diagonals {−k, …, 0, …, k} will be collectively referred to as the k-band.

To determine the imaginary parts of the kth diagonal, each pair of components ρx,x and ρx+k,x must be mixed with a relative phase of π/2. Let Φk be the operation

$${{{\Phi }}}_{k}\left|x\right\rangle={\varpi }_{k}^{x}\left|x\right\rangle$$
(27)

where ϖk = eiπ/2k is the kth principle root of i. Suppose we apply Φk and then Sk to the photon and measure its mode. The probability of observing bin x in this case is

$${p}_{x}^{(k)^{\prime} } \equiv \;\left\langle x\right|{S}_{k}{\Phi }_{k}\rho {\Phi }_{k}^{{{{\dagger}}} }{S}_{k}^{{{{\dagger}}} }\left|x\right\rangle \\ =\;\left\langle x\right|{S}_{k}{\rho }^{\prime}{S}_{k}^{{{{\dagger}}} }\left|x\right\rangle$$
(28)

where $${\rho }^{\prime}={\Phi }_{k}\rho {\Phi }_{k}^{{{{\dagger}}} }$$. Using $${\rho }_{xy}^{\prime}={\varpi }_{k}^{x-y}{\rho }_{xy}$$ we have

$${p}_{x}^{(k)^{\prime} } = \left(\epsilon \left|x-k\right\rangle+\left\langle x\right|-\epsilon \left\langle x+k\right|\right){\rho }^{\prime}\left(\epsilon \left|x-k\right\rangle+\left|x\right\rangle -\epsilon \left|x+k\right\rangle \right)\\ ={\rho }_{xx}^{\prime} +\epsilon \left({\rho }_{x-k,x}^{\prime}+{\rho }_{x,x-k}^{\prime}-{\rho }_{x+k,k}-{\rho }_{x,x+k}\right)+{{{{{{{\mathcal{O}}}}}}}}({\epsilon }^{2})\\ ={\rho }_{xx}+\epsilon \left({\varpi }_{k}^{-k}{\rho }_{x-k,x}+{\varpi }_{k}^{k}{\rho }_{x,x-k}-{\varpi }_{k}^{k}{\rho }_{x+k,k}-{\varpi }_{k}^{-k}{\rho }_{x,x+k}\right)+{{{{{{{\mathcal{O}}}}}}}}({\epsilon }^{2}).$$
(29)

Since $${\varpi }_{k}^{\pm k}={\pm}{{{{{{{\rm{i}}}}}}}}$$ we have

$${p}_{x}^{(k)^{\prime} }= {\rho }_{xx}^{\prime}+\epsilon \left(-{{{{{{{\rm{i}}}}}}}}{\rho }_{x-k,x}+{{{{{{{\rm{i}}}}}}}}{\rho }_{x,x-k}-{{{{{{{\rm{i}}}}}}}}{\rho }_{x+k,k}+{{{{{{{\rm{i}}}}}}}}{\rho }_{x,x+k}\right)+{{{{{{{\mathcal{O}}}}}}}}({\epsilon }^{2})\\= {\rho }_{xx}+2\epsilon (\Im {\rho }_{x-k,x}-\Im {\rho }_{x,x+k})+{{{{{{{\mathcal{O}}}}}}}}({\epsilon }^{2}).$$
(30)

Using the probabilities of bins 1 through d we obtain a system of d equations which, together with p(0), in principle enables determination of the imaginary parts of the kth diagonal.

To summarize: p(0) determines the diagonal elements of ρ, then p(k) and $${p}^{{(k)}^{\prime}}$$ (obtained by applying Sk and SkΦk, respectively) determine the off-diagonal elements of ρ that lie k positions above and below the diagonal. Thus, all d2 elements of the single-qudit density matrix can be probed by 1 + 2(d − 1) = 2d − 1 experimental settings with d outcomes each.

To relate this to the experimental approach, we note that the EOM effects a mode mixing operation Tδ of the form [cf. Eqs. (5) and (6)]

$${T}_{\delta }\left|x\right\rangle=\mathop{\sum }\limits_{k=-\infty }^{\infty }{J}_{k}(\delta )\left|x+k\right\rangle$$
(31)

As shown in Fig. 7b, Jk(δ) is oscillatory in k with approximate support k { − δ, …, δ}. That is, Tδ is approximately a linear combination of {S1, …, Sδ}, mixing each x with multiple modes ranging from roughly x − δ to x + δ and yielding information about the elements in the δ-band of ρ (Fig. 7c). If the elements in the (δ − 1)-band have already been determined, Tδ provides new information primarily concerning the δ-band. In our specific experiments, we applied a set of random phases with the pulse shaper in addition to the modulation, allowing us to probe linear combinations of both the real and imaginary parts of the k-band similar to the Sk and Φk operators in the discussion above. In this way, a collection of measurements with modulation indices uniformly distributed δ [0, k] along with random phases is sufficient to fully probe the k-band. And since the complete d × d density matrix ρ is encompassed within the (d − 1)-band, this model suggests $${\delta }_{\max } \;\sim\; {{{{{{{\mathcal{O}}}}}}}}(d)$$ as an ideal design choice for efficient state tomography with our method.

This intuitive picture reveals how an EOM-based mode mixer can be designed to respond to all elements of the density matrix, yet it does not quantify how well such measurements span the space of hermitian operators; if the mixing weights are small or produce excessive scattering into modes outside of the computational space, the number of observations required to reach a desired accuracy will be high, even if the measurements are tomographically complete. To address this question explicitly, we next explore the specifics of the measurement operations through singular value decomposition. Suppose we have a set of R different measurements M1, …, MR, each of which is described by a set of positive operator-valued measures (POVM): $${{{{{{{{\bf{M}}}}}}}}}_{i}=\{{M}_{i1},\ldots {M}_{i{m}_{i}}\}$$ where mi is the number of possible outcomes of Mi (d in our case) and each Mij is a positive semidefinite operator. To account for photon scattering outside of the measured computational space, the operators need not conserve probability. When Mi is performed on state ρ, the probability of outcome j is

$${p}_{ij}={{{{{{{\rm{Tr}}}}}}}}({M}_{ij}\rho ).$$
(32)

This may be summarized as the linear system

$$p={O}^{{{{\dagger}}} }\overrightarrow{\rho }$$
(33)

where $$\overrightarrow{\rho }$$ is the vectorization of ρ and

$$O=\left[{\overrightarrow{M}}_{11}\;\cdots \,\;{\overrightarrow{M}}_{1{m}_{1}}\;\cdots \,\;{\overrightarrow{M}}_{R1}\;\cdots \,\;{\overrightarrow{M}}_{R{m}_{R}}\right].$$
(34)

is a d2 × m matrix where m = m1 +  + mR. The measurement set $${{{{{{{\mathcal{M}}}}}}}}=\{{{{{{{{{\bf{M}}}}}}}}}_{1},\ldots,{{{{{{{{\bf{M}}}}}}}}}_{R}\}$$ is called informationally complete if ρ is uniquely determined by measured probabilities p. This occurs iff O is full rank (i.e., rank d2), which requires at least d different measurements (R ≥ d) with d outcomes each. Practically speaking, however, more important than the attainment of informational completeness is the actual distribution of singular values $${s}_{1}\ge \cdots \ge {s}_{{d}^{2}}$$ of O. If the singular values are all of order 1, then $${{{{{{{\mathcal{M}}}}}}}}$$ determines all components of ρ with comparable sensitivity. But any singular values of O that are much smaller than 1 correspond to components of ρ to which $${{{{{{{\mathcal{M}}}}}}}}$$ is only weakly sensitive. The estimates of such components will be susceptible to statistical fluctuations in the experimental data.

As examples, we compute singular values for a variety of settings at d = 8—the maximum single-qudit dimension characterized experimentally. We consider R = 2d = 16 settings, which we found sufficient to obtain robust and repeatable distributions. Each measurement configuration involves a modulation index δ chosen uniformly at random in the interval $$[0,\;{\delta }_{\max }]$$, preceded by a random phase vector $$\overrightarrow{\phi }\in {[0,\;2\pi ]}^{d}$$ applied by the pulse shaper. Figure 7d plots histograms of singular values from 2,000 measurement matrices O each for $${\delta }_{\max }\in \{4,\;8,\;16\}$$, corresponding to d/2, d, and 2d at d = 8. A narrow peak around $${\log }_{10}s \;\approx \; 0$$ indicates high and comparable sensitivity for all elements in the density matrix; indeed, a complete set of mutually unbiased bases—an ideal choice for tomography46—possesses d2 equal singular values at s = 1. By this criterion, $${\delta }_{\max }=d=8$$ is seen to provide the most efficient measurement distribution of the three examples. For the smaller index of $${\delta }_{\max }=4$$, the main peak is accompanied by a strong tail indicating small sensitivity to an appreciable percentage of the matrix elements. And for the larger index $${\delta }_{\max }=16$$, the main peak shifts to lower values, which reflects “over-modulating” of the quantum state; taking $${\delta }_{\max }$$ beyond d increases the probability of scattering the input outside of the computational space (see Fig. 7c) without any improvement to mixing within the space. Our numerical findings therefore join the simple theory above in suggesting a maximum modulation index of $${\delta }_{\max }\approx d$$. Nevertheless, we note that all cases in Fig. 7d sample the Hilbert space comprehensively, so that any $${\delta }_{\max } \;\sim\; {{{{{{{\mathcal{O}}}}}}}}(d)$$ is likely to prove sufficient in a tomographic context. Ultimately, we emphasize that the theory developed here is meant to provide heuristic guidelines for implementing EOM-based frequency-bin tomography, not to imply the optimality of our specific measurements. For example, it is quite possible that alternative distributions for δ—e.g., other than the uniform draw $$\delta \in [0,\;{\delta }_{\max }]$$—may show more favorable properties with further research. Nevertheless, our reliance on Bayesian estimation ensures that these questions need not be answered for useful inference.

Extending to the experimental task of two-qudit characterization, our model thus indicates one should consider joint measurements of the form $${T}_{{\delta }_{1}}\otimes {T}_{{\delta }_{2}}$$ with δ1, δ2 [0, d]. Considering our experimental values of $${\delta }_{\max }=2.5$$ for the PPLN tests and 3.4 for the MRR, $${\delta }_{\max }$$ falls in the range of 0.425d and 1.7d for all qudit dimensions examined, aligning well with the $${{{{{{{\mathcal{O}}}}}}}}(d)$$ desideratum. Since the entire density matrix of a d2-dimensional two-qudit state requires specification of d4 − 1 real parameters and each experimental pulse shaper/EOM setting r provides d2 outcomes, $$R \sim {{{{{{{\mathcal{O}}}}}}}}({d}^{2})$$ would be expected to be required to fully probe the two-qudit Hilbert space. Empirically, however, we were able to attain low-error state reconstruction with fewer measurements: e.g., R = 10 instead of 25 for the d = 5 PPLN BFC (Fig. 2a) and R = 30 instead of 64 for the d = 8 MRR BFC (Fig. 4b). Several aspects are likely responsible for this reduction. From a broad perspective, the positive semidefiniteness of the density matrix imposes additional constraints that are not reflected in a linear system analysis but automatically accounted for in the Bayesian inference procedure. Thus, simply comparing the number of measurements with the number of parameters generally yields an overly pessimistic assessment of the information required for tomography.

Moreover, we suspect that specific features of our quantum state also contribute to more efficient reconstruction. The highly correlated nature revealed in the first JSI measurement implies that only off-diagonal elements of the form ρ(xx)(x+k, x+k)—i.e., those satisfying biphoton energy conservation—can be significantly different than zero. This reduces the number of appreciable elements in our density matrix from $${{{{{{{\mathcal{O}}}}}}}}({d}^{4})$$ to $${{{{{{{\mathcal{O}}}}}}}}({d}^{2})$$. Of course, the other off-diagonal density matrix elements are not strictly zero in practice, and the Bayesian model neither requires nor assumes such a simplification. However, the strong frequency correlations do eliminate a large portion of the Hilbert space from consideration. Consequently, future experiments with more general states may require more experimental settings for low-uncertainty estimation than we have currently used.