Introduction

As carriers of quantum information, optical photons feature a host of valuable attributes, such as immunity to environmentally induced decoherence, availability of precise tools for state control, and room temperature operation, enabling quantum information processing (QIP)1 in a variety of encodings such as space/polarization2,3,4 and temporal modes.5,6,7 Frequency-bin encoding—which offers additional advantages in terms of compatibility with state-of-the-art fiber-optic networks—has advanced rapidly in recent years, facilitated by the development of integrated frequency-bin photon sources8,9,10,11 and quantum gates based on both nonlinear-optical12,13 and electro-optical14,15 mixing approaches. However, two-photon entangling gates for frequency bins have yet to be realized on any platform.

Such entangling gates are required for universal QIP, for an arbitrary quantum operation can be constructed with single-qubit rotations plus a two-qubit entangling gate.1 While photonics excels for single-qubit gates, the inherent difficulty in realizing photon–photon interactions has made the two-qubit gate a persistent obstacle in photonic QIP. In the absence of a sufficient nonlinearity, such gates can still be achieved via quantum interference, ancilla photons, and single-photon detection. While two-qubit gates succeed only probabilistically in this paradigm, linear-optical quantum computation (LOQC)2 is in principle scalable with polynomial auxiliary resource requirements and has laid the foundation for many subsequent advances in photonic QIP.3,16,17,18,19,20,21,22,23,24,25 It is this approach which we invoked in proposing spectral LOQC—a universal QIP scheme tailored to frequency-bin qubits which makes use of electro-optic phase modulators (EOMs) and Fourier-transform pulse shapers (PSs).26 Systems implementing designs from spectral LOQC, termed “quantum frequency processors” (QFPs), have been utilized to demonstrate coherent single-photon operations with near-unity fidelity,14,15 but a two-photon gate has heretofore proven elusive.

Theoretically, we previously discovered EOM/PS configurations capable of realizing ancilla-based two-qubit gates in spectral LOQC.26 Yet if one relaxes the gate requirements slightly, by conditioning on the presence of a photon in each pair of qubit modes, it is well-known in standard LOQC that one can engineer a two-qubit gate with no ancillas and success probability \({\cal P} = 1/9\).18,19 Assuming a quantum nondemolition measurement is unavailable, such gates are destructive (succeeding only when both information-carrying photons are detected). Yet they require only two-fold coincidences for characterization, making them excellent choices for experimental studies of basic quantum computing functionalities.

To explore two-qubit coincidence-basis gates with a QFP, we follow the optimization approach in refs. 14,26, numerically finding phase patterns for an EOM/PS sequence which maximize the success probability \({\cal P}\) subject to the fidelity constraint \({\cal F} \ge 0.9999\). Specifically, with \(U_{{\mathrm{ideal}}}\) defined as the desired two-qubit unitary and W the actual Hilbert space transformation,

$${\cal P} = \frac{{{\mathrm{Tr}}(W^{\dagger} W)}}{d}$$
(1)
$${\cal F} = \frac{{\left| {{\mathrm{Tr}}(U_{{\mathrm{ideal}}}^{\dagger} W)} \right|^{2}}}{{d^{2}{\cal P}}},$$
(2)

where d = 4 is the dimensionality of the subspace spanned by the coincidence basis.26 In order to facilitate experimental implementation, we restrict our simulations to sinewave-only electro-optic modulation. We find that a 3EOM/2PS QFP can realize a frequency-bin CNOT at the optimal success probability of \({\cal P} = 1/9\), while a smaller 2EOM/1PS circuit can do so with reduced success: \({\cal P} = 0.0445\). (See Methods for the specific EOM/PS modulation patterns.) Due to equipment availability and system complexity, we elect to implement this simpler 2EOM/1PS CNOT in the experiments below.
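As a minimal numerical sketch of Eqs. (1) and (2), the following Python snippet evaluates the success probability and fidelity for a 4 × 4 coincidence-basis matrix W. The function names and the test matrix (an ideal CNOT scaled by an amplitude of 1/3) are illustrative assumptions, not part of the optimization code used in this work.

```python
# Sketch of Eqs. (1) and (2). Matrices are nested lists; names are illustrative.

def trace_adjoint_product(a, b):
    """Return Tr(a^dagger b) = sum_ij conj(a_ij) * b_ij."""
    n = len(a)
    return sum(complex(a[i][j]).conjugate() * b[i][j]
               for i in range(n) for j in range(n))


def success_probability(w):
    """Eq. (1): P = Tr(W^dagger W) / d."""
    return trace_adjoint_product(w, w).real / len(w)


def fidelity(u_ideal, w):
    """Eq. (2): F = |Tr(U_ideal^dagger W)|^2 / (d^2 P)."""
    d = len(w)
    overlap = trace_adjoint_product(u_ideal, w)
    return abs(overlap) ** 2 / (d ** 2 * success_probability(w))


CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

# Hypothetical test case: a perfect CNOT that succeeds with amplitude 1/3
w_example = [[x / 3 for x in row] for row in CNOT]

print(success_probability(w_example))  # equals 1/9 up to rounding
print(fidelity(CNOT, w_example))       # equals 1 up to rounding
```

Because the fidelity is normalized by \({\cal P}\), a uniformly attenuated but otherwise perfect gate still scores \({\cal F} = 1\); the two figures of merit therefore separate "how often" from "how well".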

Results

Figure 1 provides a schematic of the setup. The gate itself comprises the central EOM/PS/EOM sequence (the QFP), and the frequency bins for encoding are defined according to ωn = ω0 + nΔω, where ω0 = 2π × 193.45 THz and Δω = 2π × 25 GHz, corresponding to the standard ITU grid and facilitating low-crosstalk, line-by-line shaping by our 10 GHz resolution pulse shapers. The specific bins for encoding follow in Fig. 2a, where {C0, C1} and {T0, T1} denote logical |0〉 and |1〉 for the control and target, respectively. This particular mode placement makes sense conceptually: mode C0 is spectrally isolated from the target’s logical bins, ensuring a photon in mode C0 leaves the target unchanged; on the other hand, bin C1 is close to both target bins, able to be coupled to T0 and T1 with equal strength.

Fig. 1

Experimental setup. PPLN (SRICO Model 2000-1550), Etalon (Optoplex 25 GHz C-Band), BFC Shaper (Finisar WaveShaper 1000A), EOMs (EOSpace 40 Gbps phase modulators), QFP Shaper (Finisar Waveshaper 4000A), WSS wavelength selective switch (Finisar 1 × 9 Flexgrid), SNSPD superconducting nanowire single-photon detector (Quantum Opus model Opus One, >80% detection efficiency), ATT variable radio-frequency (RF) attenuator, AMP RF amplifier

Fig. 2

a Mode definitions for frequency-bin control and target qubits. The labels {Ω00, Ω01, Ω10, Ω11} mark the pump frequency values (divided by two) needed to produce each of the computational basis states. b Experimentally obtained complex mode transformation V. c Inferred two-photon transformation W obtained from permanents of 2 × 2 submatrices of V. For b and c, we use phasor notation to represent the complex elements, with filled color signifying the amplitude (normalized by the matrix’s maximum value, and shown on a logarithmic scale), and the arrow depicting the phase. Dotted circles denote phases we could not retrieve due to weak amplitudes. (See Methods for details.)

Since this gate is based on a linear-optical network, we can estimate its performance using coherent-state-based characterization,14,27 i.e., probing it with an electro-optic frequency comb and measuring the output spectrum for different input frequency superpositions. This technique allows us to estimate the mode transformation matrix V, which controls how input mode operators \(\hat a_n\) at each frequency ωn transform to the output operators \(\hat b_n\): \(\hat b_n = \mathop {\sum}\nolimits_{n\prime } V_{nn\prime }\hat a_{n\prime }\). The mode matrix V, averaged over five independent measurements and projected onto the four computational modes, is shown in Fig. 2b. We utilize phasor notation to represent the complex elements \(V_{nn^{\prime}}\); the filled color reflects the amplitude on a logarithmic scale, normalized to the maximum value in the matrix (0.499), and the arrow marks out the phase. (See Methods for values of all matrix elements including uncertainty.) From this matrix V, we can compute the equivalent two-photon state transformation matrix W,26 plotted for the coincidence basis in Fig. 2c and also normalized to its peak magnitude of 0.222. We note that the implemented mode transformation (Fig. 2b) has the same eight high-amplitude elements as in the original path-encoded CNOT.18 The phases differ, however, as there is a continuum of combinations which can yield the desired two-photon transformation W. The particular set used here was selected numerically as optimal given our experimental constraints.

Because this estimate predicts all four of the large elements of W to be in-phase, the corresponding inferred fidelity is \({\cal F}_{{\mathrm{inf}}} = 0.995 \pm 0.001\); the success probability is \({\cal P}_{{\mathrm{inf}}} = 0.0460 \pm 0.0005\). Both values are with respect to the ideal CNOT and in good agreement with theory. We emphasize that, unlike single-qubit gates which act on photons independently, two-qubit entangling gates rely on quantum interference effects that are inherently absent with high-flux laser fields. Thus this inferred fidelity is only an indirect estimate, based on extrapolating measured one-photon interference results to the two-photon case. Nevertheless, it provides strong initial evidence for the phase coherence and proper operation of our gate.

To test our gate with truly quantum states, however, we prepare a biphoton frequency comb (BFC) by pumping a 35 mm-long periodically poled lithium niobate (PPLN) waveguide with a continuous-wave Ti:sapphire laser (at ~4.5 mW) under type-0 phase matching, followed by filtering with a Fabry–Perot etalon with 25 GHz mode spacing and a full-width at half-maximum linewidth of 1.8 GHz (see Fig. 1). The BFC pulse shaper subsequently selects specific modes as input to the gate. By translating the pump frequency to four different values (as shown in Fig. 2a) and filtering out all but the desired modes using the BFC pulse shaper, we can prepare all inputs from the two-qubit computational basis: \(|C_0T_0\rangle = |1_{\omega _0}1_{\omega _7}\rangle\), \(|C_0T_1\rangle = |1_{\omega _0}1_{\omega _8}\rangle\), \(|C_1T_0\rangle = |1_{\omega _6}1_{\omega _7}\rangle\), and \(|C_1T_1\rangle = |1_{\omega _6}1_{\omega _8}\rangle\). To ensure the photon flux remains constant across the four inputs, we tune the PPLN waveguide temperature to align the peak of the phase-matching spectrum with the pump laser frequency. After the gate, the output photons are frequency-demultiplexed: we send control photon bins to detector A and target photon bins to detector B.
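The bin and pump settings described above can be tabulated with a few lines of Python. This is only an illustrative sketch: it assumes that, by energy conservation in the down-conversion process, half the pump frequency equals the mean of the two selected bin frequencies, and the helper names are hypothetical.

```python
# Frequency-bin bookkeeping for the four computational-basis inputs.
# Numerical values are taken from the text; helper names are illustrative.

OMEGA0_THZ = 193.45    # omega_0 / (2 pi)
SPACING_THZ = 0.025    # Delta omega / (2 pi) = 25 GHz
BINS = {"C0": 0, "C1": 6, "T0": 7, "T1": 8}   # bin index n of each logical mode


def bin_frequency_thz(n):
    """omega_n / (2 pi) = omega_0 / (2 pi) + n * Delta omega / (2 pi)."""
    return OMEGA0_THZ + n * SPACING_THZ


def half_pump_thz(control, target):
    """Half the pump frequency, assumed equal to the mean of the two bin frequencies."""
    return 0.5 * (bin_frequency_thz(BINS[control]) + bin_frequency_thz(BINS[target]))


for control in ("C0", "C1"):
    for target in ("T0", "T1"):
        print(f"|{control}{target}>: pump/2 = {half_pump_thz(control, target):.4f} THz")
```

Under this assumption the |C0T1〉 setting falls at 193.55 THz, the same value quoted in Methods for the laser frequency Ω01 used in the coherent-state measurements.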

Figure 3a is a conceptual example of the interference underpinning the CNOT, where the rails denote particular frequency bins and the lines trace out probability amplitudes of single photons initially in bins C1 and T1; blue follows the control, red the target, and the thickness is proportional to the squared amplitude. Each EOM serves as a multimode interferometer mixing all bins simultaneously; in this particular example, the phases applied by the QFP shaper produce destructive interference of the two amplitudes yielding the output |C1T1〉, leaving only the possibility |C1T0〉 in the coincidence basis (the characteristic CNOT bit flip). This picture highlights that, while the general interference phenomena remain the same between path and frequency encoding, the basic manipulations are significantly different: standard beamsplitters interface two input modes with two outputs, while EOMs couple, in principle, infinitely many. Accordingly, this schematic cannot be taken as fully quantitative, but it does, through line weights, give an idea of the coupling magnitudes in this example. Such a lack of direct correspondence between frequency-bin and path primitives is the reason for our use of numerical optimization of the full transformation, rather than constructing and combining individual frequency-bin beamsplitters.

Fig. 3

a Conceptual frequency-bin interference in the QFP, for the case of input state |C1T1〉. Quantum interference suppresses the result |C1T1〉 at the output, leaving only state |C1T0〉 in the coincidence basis. b Experimentally measured coincidences over 600 s for all input/output state combinations. c Estimated number of accidentals computed from the product of single detector counts.

Figure 3b shows the measured coincidences for all 16 input/output mode combinations, integrated over 600 s for each point. As expected, inputs with a photon in control mode 0 retain their quantum state, whereas a photon in control mode 1 leads to a flip in the output target qubit. In Fig. 3c, we plot the accidentals as determined by the product of the singles counts and our timing resolution.28,29 The nonuniform distribution of accidentals stems from the fact that the singles counts vary significantly across input/output state combinations. Indeed, this is a natural feature of coincidence-basis gates: they are designed to discard cases when one of the qubit spaces is empty or doubly occupied, so that photon detection rates in a specific mode can change without impacting the designed operation.

Such information-bearing features in the accidentals suggest that incorporating knowledge from single-detector events—as well as the coincidences—can add significant value for quantifying the performance of our gate in the presence of noise. To utilize all of our experimental data in a consistent fashion, we make use of Bayesian machine learning techniques to implement a numerical parameter inference approach built on Bayesian mean estimation (BME).30 In the context of quantum state retrieval, BME is a powerful method which returns uncertainties on any quantity directly and makes efficient use of all available information, in the sense that the confidence in any estimate naturally reflects the amount of data gathered.31 BME models for photon pairs including single-detector events have been developed as well, permitting extraction of the quantum pathway efficiencies in conjunction with estimates of the input density matrix.32 In our BME model here, not only do we account for noise effects, but we can also retrieve meaningful estimates of the full complex matrix V, even though we only prepare and measure states in the computational basis. This represents an entirely new capability in two-photon gate analysis, for previously such truth-table measurements as in Fig. 3b have only been used to establish magnitudes in the matrix transformation, with superposition states required to assess the phase.20

In our model, the unknown parameters to retrieve include the mode matrix V, the pair generation probability μ, and the total system efficiencies ηA and ηB preceding detection at the control and target photon detectors, respectively. Obtained from independent measurements, and thus taken as fixed and known, are the dark count probabilities dA and dB. All probabilities {μ, dA, dB} are specified for one resolving time τ (~1.5 ns). For the input photon state |CkTl〉 (k, l ∈ {0, 1}) with detectors A and B set to respond to output modes Cr and Ts (r, s ∈ {0, 1}), respectively, the probability of a coincidence between detectors A and B is

$$p_{AB} = \mu \eta _A\eta _B\left| {V_{C_rC_k}V_{T_sT_l} + V_{C_rT_l}V_{T_sC_k}} \right|^2 + 2p_Ap_B.$$
(3)

Here, pA and pB are the marginal probabilities for clicks on A or B, irrespective of clicks on the other, during a given time τ. This formula thus contains both a correlated term (from photons of the same pair) and an accidental term. The latter, equal to 2pApB,28,29 represents the chance of simultaneous clicks in which at least one detector registers a dark count, or the photons come from different pairs (see Methods for details).

The marginal probabilities pA and pB can be found by summing the contributions from each possible number of photons N being present in the monitored mode, sketched formally as, e.g., \(p_A = \mathop {\sum}\nolimits_N P({\mathrm{click}}|N\,{\mathrm{photons}})P(N\,{\mathrm{photons}})\). Writing out each term for N = 0, 1, 2, and simplifying, we ultimately arrive at the probabilities for a click on either detector within a time τ (see Methods):

$$\begin{array}{*{20}{l}} {p_A} \hfill & = \hfill & {\mu \eta _A\left( {\left| {V_{C_rC_k}} \right|^2 + \left| {V_{C_rT_l}} \right|^2} \right) + d_A} \hfill \\ {p_B} \hfill & = \hfill & {\mu \eta _B\left( {\left| {V_{T_sC_k}} \right|^2 + \left| {V_{T_sT_l}} \right|^2} \right) + d_B,} \hfill \end{array}$$
(4)

valid under the assumptions \(\mu ,\eta _A,\eta _B,d_A,d_B \ll 1\), which are satisfied in our experiment. In words, a detector can click from either of the following: (i) a photon pair is generated (μ), one of the photons is sent to the monitored frequency bin (through V), and the photon reaches the output and is successfully detected (ηA, ηB); or (ii) the detector fires spontaneously (dA, dB). Crucially, the singles probabilities [Eq. (4)] depend only on the moduli of the V-matrix elements, whereas the coincidences also depend on the relative phase [via the permanent term in Eq. (3)]. It is this complementary dependence which underpins our ability to extract the full complex matrix from experimental data.
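The complementary dependence of Eqs. (3) and (4) on the moduli and phases of V can be illustrated with a short Python sketch. The function name, the toy 4 × 4 matrices (which are neither unitary nor the experimental V), and the parameter values below are all hypothetical and chosen only to expose the effect of a single sign flip.

```python
# Sketch of Eqs. (3) and (4). Rows/columns of v are ordered C0, C1, T0, T1.
# All numerical values below are hypothetical, for illustration only.

def click_probabilities(v, k, l, r, s, mu, eta_a, eta_b, d_a, d_b):
    """Return (p_A, p_B, p_AB) for input |C_k T_l>, detector A on C_r, B on T_s."""
    ck, tl = k, 2 + l    # indices of the occupied input modes
    cr, ts = r, 2 + s    # indices of the monitored output modes
    p_a = mu * eta_a * (abs(v[cr][ck]) ** 2 + abs(v[cr][tl]) ** 2) + d_a
    p_b = mu * eta_b * (abs(v[ts][ck]) ** 2 + abs(v[ts][tl]) ** 2) + d_b
    permanent = v[cr][ck] * v[ts][tl] + v[cr][tl] * v[ts][ck]
    p_ab = mu * eta_a * eta_b * abs(permanent) ** 2 + 2 * p_a * p_b
    return p_a, p_b, p_ab


# Two toy matrices with identical moduli; one element (row T0, column C1) differs in sign
v_same = [[0.5] * 4 for _ in range(4)]
v_flip = [row[:] for row in v_same]
v_flip[2][1] = -0.5

params = dict(mu=1e-3, eta_a=0.01, eta_b=0.01, d_a=1e-7, d_b=1e-7)
# Input |C1 T1>, detectors set to C1 and T0
same = click_probabilities(v_same, 1, 1, 1, 0, **params)
flip = click_probabilities(v_flip, 1, 1, 1, 0, **params)
print("singles unchanged:", same[:2] == flip[:2])
print("coincidences:", same[2], "versus", flip[2])
```

In this toy case the singles probabilities are identical for both matrices, while the sign flip cancels the permanent and leaves only the accidental term in the coincidence probability.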

Specifically, for a single preparation/measurement configuration we possess three numbers as data: clicks on A (NA), clicks on B (NB), and coincidences (NAB). This gives us the multinomial likelihood for this specific input/output configuration (|CkTl〉 → |CrTs〉):

$$\begin{array}{*{20}{l}} {P\left( {{\cal D}_{C_kT_l}^{C_rT_s}|\beta } \right)} \hfill & = \hfill & {(p_A - p_{AB})^{N_A - N_{AB}}(p_B - p_{AB})^{N_B - N_{AB}}} \hfill \\ {} \hfill & {} \hfill & { \times p_{AB}^{N_{AB}}\left( {1 - p_A - p_B + p_{AB}} \right)^{M - N_A - N_B + N_{AB}},} \hfill \end{array}$$
(5)

where \({\cal D}_{C_kT_l}^{C_rT_s} = \{ N_A,N_B,N_{AB}\}\) contains all data values for the specific configuration. We have also reexpressed the events to make them mutually exclusive: click on A only, happening NA − NAB times; click on B only, occurring NB − NAB times; coincidence between A and B (NAB times); and no clicks (all remaining frames). The symbol β is shorthand for all model parameters (β = {V, μ, ηA, ηB}), and M equals the total number of τ frames considered in one counting period (~\(4 \times 10^{11}\) in our tests). The complete likelihood comprises 16 factors in the manner of Eq. (5) for all combinations of inputs and outputs. Incidentally, one could retrieve estimates of the system parameters from this likelihood directly, via conventional maximum likelihood estimation (MLE). Computationally simpler than BME, MLE generates only a point estimate, without intrinsic error bars, in contrast to BME which quantifies uncertainty by integrating over the full probability distribution.31,32

We must emphasize that our model relies on explicit enumeration of system noise sources, and thus is restricted to a parameter space smaller than the set of all two-qubit operations. This stands in contrast to the standard procedure for gate characterization, quantum process tomography (QPT),1,21,33,34 which is designed to recover a quantum operation treating the system as a black box. Nevertheless, process tomography is complex, requiring a number of measurements that grows exponentially with system size, and in our case these measurements require more components than we have available. Physically motivated simplifications35 and alternatives36 to QPT are thus of significant value in quantum information, and so, in our particular case, the key question is whether the model encompasses all relevant physical effects. Theoretically, our understanding of photon generation, frequency-bin operations, and detection suggests no additional sources of decoherence. Empirically as well, we previously explored frequency-bin Hong–Ou–Mandel interference in this QFP (the basic effect behind two-photon LOQC gates2,3) measuring ~97% visibility, where the reduction from unity was attributable to accidental coincidences resulting from system inefficiencies and dark counts.15 Accordingly, our total channel model (linear-optical multiport plus loss, dark counts, and accidentals) is strongly justified, as well as validated ex post facto by the agreement with experiment below. Finally, it is important to note that Bayesian machine learning techniques can be applied toward any model, whether custom-tailored or built on more conventional quantum tomography, indicating additional opportunities for data analysis in a variety of quantum information experiments.15,31,32
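For numerical work, the per-configuration likelihood of Eq. (5) is most conveniently handled as a logarithm, since M is of order \(10^{11}\). The following Python sketch shows one possible way to evaluate it; the function name and the example counts are hypothetical and are not taken from our data.

```python
import math

# Sketch of the logarithm of Eq. (5) for a single input/output configuration.
# The counts used in the example are hypothetical placeholders.

def log_likelihood(n_a, n_b, n_ab, m, p_a, p_b, p_ab):
    """Sum of count * log(probability) over the four mutually exclusive events."""
    terms = (
        (n_a - n_ab, math.log(p_a - p_ab)),                # click on A only
        (n_b - n_ab, math.log(p_b - p_ab)),                # click on B only
        (n_ab, math.log(p_ab)),                            # coincidence
        (m - n_a - n_b + n_ab,
         math.log1p(-(p_a + p_b - p_ab))),                 # no click; log1p keeps precision
    )
    return sum(count * log_p for count, log_p in terms)


M = 400_000_000_000   # roughly 600 s divided by a 1.5 ns resolving time
n_a, n_b, n_ab = 2_000_000, 1_500_000, 60   # hypothetical counts

# Probabilities set to the observed frequencies, then with the coincidence term doubled
at_frequencies = log_likelihood(n_a, n_b, n_ab, M, n_a / M, n_b / M, n_ab / M)
perturbed = log_likelihood(n_a, n_b, n_ab, M, n_a / M, n_b / M, 2 * n_ab / M)
print(at_frequencies - perturbed)   # positive: the perturbed probabilities fit worse
```

The total log-likelihood would then be the sum of 16 such terms, one per input/output configuration, with each set of probabilities computed from the shared parameters β.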

Returning to details of the Bayesian analysis, we next assume uniform prior distributions over the interval (0, 1) for the unknown parameters μ, ηA, and ηB. For the complex matrix V, we express each element in terms of amplitude and phase: \(V_{nn\prime } = r_{nn\prime }e^{i\phi _{nn\prime }}\). Since an overall scaling factor on V is indistinguishable from changes to ηA and ηB [Eqs. (3) and (4)], for concreteness we fix the Hilbert–Schmidt norm \({\mathrm{Tr}}(V^\dagger V) = 1.6558\), to match the ideal V matrix obtained from the numerical optimization [see Eq. (7)], thus constraining the sum of the squares of \(r_{nn\prime }\). Otherwise, we let the squared amplitudes vary uniformly over all possible values subject to this condition. Because phase shifts on each of the modes before and after the multiport V are not physically significant, we are free to take some of the \(\phi _{nn\prime }\) as given as well.27 For convenience, we fix \(\left\{ {\phi _{C_0C_0},\phi _{C_1C_1},\phi _{C_1T_0},\phi _{C_1T_1},\phi _{T_0C_1},\phi _{T_1C_1}} \right\}\) to their theoretical predictions, thus leaving 10 phases to be retrieved via BME.

With the likelihood and prior formally defined, in principle we are done: we have the posterior probability distribution from Bayes’ rule, which represents complete knowledge of the parameters given the observed data. However, practically speaking, computing integrals or, equivalently, sampling from this many-parameter multimodal distribution is a formidable challenge. It is here that the techniques of Markov chain Monte Carlo (MCMC) sampling offer a solution, which—with minimal input—enable Bayesian machine learning of complex models. In our case, we employ slice sampling, an MCMC algorithm designed to produce a sequence of samples whose stationary distribution converges to the posterior.37
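To make the idea of slice sampling concrete, the sketch below implements a generic one-dimensional slice-sampling update with the standard "stepping-out" and "shrinkage" procedures, applied to a standard normal target. It is a textbook-style illustration under simplifying assumptions (a single variable, a hypothetical initial width, a fixed random seed) and is not the code used for our analysis; for a many-parameter posterior, one common option is to apply such updates one coordinate at a time.

```python
import math
import random

# Generic 1D slice sampler with stepping-out and shrinkage (illustrative only).

def slice_sample(log_density, x0, n_samples, width=1.0, max_steps=50, rng=None):
    if rng is None:
        rng = random.Random(0)
    samples, x = [], x0
    for _ in range(n_samples):
        # 1. Draw the slice level uniformly below the density at x (in log space)
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # 2. Stepping out: randomly place an interval around x, then expand it
        left = x - width * rng.random()
        right = left + width
        steps_left = int(max_steps * rng.random())
        steps_right = (max_steps - 1) - steps_left
        while steps_left > 0 and log_density(left) > log_y:
            left -= width
            steps_left -= 1
        while steps_right > 0 and log_density(right) > log_y:
            right += width
            steps_right -= 1
        # 3. Shrinkage: propose uniformly, shrinking the interval after each rejection
        while True:
            candidate = rng.uniform(left, right)
            if log_density(candidate) > log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        samples.append(x)
    return samples


def log_standard_normal(x):
    return -0.5 * x * x   # unnormalized log density


samples = slice_sample(log_standard_normal, x0=0.0, n_samples=4000,
                       rng=random.Random(0))
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {variance:.3f}")
```

Only an unnormalized log density is required, which is why the method pairs naturally with a posterior defined as likelihood times prior.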

Using the predicted matrix V as an initial guess for the slice sampler, a procedure which we found important to speed up convergence given the large search space of 28 independent variables, we ultimately converge to the Bayesian fidelity estimate \({\cal F}_{{\mathrm{BME}}} = 0.91 \pm 0.01\), where \({\cal F}\) is defined according to Eq. (2). Our truly quantum measurement does not reach the >0.99 classically inferred \({\cal F}_{{\mathrm{inf}}}\), which is a consequence of the relatively few coincidence counts (<100 in all cases) and additional noise from residual light. Nevertheless, the low uncertainty on \({\cal F}_{{\mathrm{BME}}}\) indicates high confidence in our BME model, especially in light of its ability to retrieve the full complex fidelity with computational basis measurements. To see how \({\cal F}_{{\mathrm{BME}}}\) translates into output state probabilities in the coincidence basis, we plot the Bayesian-estimated pathway probabilities in Fig. 4, where the four outcomes for each input state are normalized to sum to unity. The average probability for obtaining the correct output is 0.92 ± 0.01, computed by taking the mean of the four peaks in Fig. 4. (See Methods for details on all retrieved parameters, including the mode matrix V.)

Fig. 4

Output state probabilities retrieved from BME, for each computational-basis input state

Discussion

Moving forward, it will be valuable to implement this gate with input states beyond just the computational basis, useful for implementing photonic QIP algorithms such as the variational quantum eigensolver38 and Shor factoring.39 Fundamentally, such states would also enable direct demonstration of the gate’s ability to entangle two photons, offering independent verification of the quantum phase coherence which here we have estimated through Bayesian machine learning. Probing with such states is readily attainable in the QFP paradigm; for example, one could precede the CNOT operation with Hadamard gates on one or both input photons.14,15 Yet cascading additional elements at the moment is limited by technical loss; we predict that we could not at present obtain coincidences above the accidental level with the additional equipment required. In order to concentrate on the basic physics in this proof-of-principle experiment, we have constructed our frequency-bin CNOT with off-the-shelf fiber-optic components. While the phase-only EOM/PS operations themselves are unitary, such commercial devices introduce significant additional loss, on the order of 3–4 dB per element. Accordingly, in scaling up to larger QIP systems, improving throughput is a substantial engineering goal.

Fortunately, state-of-the-art fabrication techniques presage significant improvements just over the horizon, via on-chip integration of the fundamental QFP elements. For example, process design kits from photonics foundries40 suggest that the loss through a pulse shaper channel can be less than 1 dB, while recent experiments have demonstrated lithium-niobate EOMs with loss below 0.5 dB41 and foundry-compatible EOMs with losses on the order of 1–2 dB.42 Thus, an on-chip CNOT—identical to the present configuration in terms of functional components, but attaining <3 dB insertion loss—seems feasible with current technology, and we certainly anticipate even better performance as on-chip photonics continues to progress over the coming years.

In conclusion, we have realized an entangling gate on frequency-bin qubits. We confirm high-fidelity operation of the CNOT with two forms of characterization: coherent-state-based matrix retrieval and photon pair measurements in the computational basis. The classically inferred fidelity of \({\cal F}_{{\mathrm{inf}}} = 0.995 \pm 0.001\) and Bayesian estimate \({\cal F}_{{\mathrm{BME}}} = 0.91 \pm 0.01\) both demonstrate high performance in our system. As the sole realization of a two-photon entangling gate in frequency—and only the second CNOT in the entire field of time-frequency quantum information5—our gate significantly expands the potential of single-spatial-mode, fiber-optic-based QIP. More generally, our Bayesian characterization approach provides further evidence of the potential of machine learning in analyzing quantum systems, particularly for extracting information within measurements which traditional methods overlook.

Methods

Gate design

The optimization approach for designing quantum frequency gates using a series of EOMs/PSs was first proposed in ref. 26, and adopted to experimentally demonstrate a single-photon gate in ref. 14. In this work, we follow the same procedures, utilizing the MATLAB Optimization Toolbox to search for an optimal set of phases for a particular EOM/PS sequence, constraining fidelity \({\cal F} \ge 0.9999\) and maximizing the success probability \({\cal P}\) for the two-photon state transformation matrix W. Compared to single-qubit gates, where only one frequency scale appears (the spacing between the two computational bins), two-qubit gates provide a much richer parameter space; namely, the placement of the four computational modes relative to each other can have a profound impact on the EOM/PS complexity needed to realize a specific operation. We have performed a thorough—though non-exhaustive—search over these possible mode placement combinations in each round of optimization. In general, we are guided by the intuition to spectrally isolate control mode 0 (C0) while packing control mode 1 (C1) close to both target modes.

For reference, the ideal CNOT matrix is

$$U_{{\mathrm{ideal}}} = \left[ {\begin{array}{*{20}{c}} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}} \right],$$
(6)

against which we compare the numerically obtained two-photon matrix W (a function of V) via Eq. (2). The optimal solution we found for the CNOT gate using a 2EOM/1PS circuit is presented in Fig. 5, with \({\cal F} = 0.9999\), \({\cal P} = 0.0445\), and modes {C0, C1, T0, T1} at frequency bins {0, 6, 7, 8}, respectively. The temporal phase modulations on the two EOMs are simply π-phase-shifted sinewaves; combined with the spectral phase modulation imparted by the PS, they yield the mode transformation matrix V, numerically calculated as:

$$V = \left[ {r_{nn^{\prime}} \measuredangle \phi _{nn^{\prime}}} \right] = \left[ {\begin{array}{*{20}{c}} {0.4407 \measuredangle - 2.5976} & {0.0022 \measuredangle 0.2103} & {0.0026 \measuredangle 1.2938} & {0.0010 \measuredangle - 2.0353} \\ {0.0022 \measuredangle 0.2104} & {0.4343 \measuredangle - 2.6045} & {0.4596 \measuredangle - 1.5754} & {0.4549 \measuredangle 1.5710} \\ {0.0026 \measuredangle 1.2939} & {0.4596 \measuredangle - 1.5754} & {0.4830 \measuredangle 2.5973} & {0.0030 \measuredangle - 2.8778} \\ {0.0010 \measuredangle - 2.0352} & {0.4549 \measuredangle 1.5710} & {0.0030 \measuredangle - 2.8779} & {0.4783 \measuredangle 2.5979} \end{array}} \right],$$
(7)

using the phasor shorthand \(r_{nn\prime } \measuredangle \phi _{nn\prime } \equiv r_{nn\prime }e^{i\phi _{nn\prime }}\). This provides a reference to compare the experimental mode transformations below, obtained either by coherent state characterization or BME.
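As a consistency check on Eq. (7), the coincidence-basis matrix W can be rebuilt from the permanents of 2 × 2 submatrices of V and then inserted into Eqs. (1) and (2). The Python sketch below does this with the rounded entries printed above; the variable and function names are illustrative, and the row/column ordering {C0, C1, T0, T1} is assumed to match the mode listing in the text.

```python
import cmath

# Eq. (7) as (modulus, phase) pairs; rows and columns ordered C0, C1, T0, T1
V_POLAR = [
    [(0.4407, -2.5976), (0.0022, 0.2103), (0.0026, 1.2938), (0.0010, -2.0353)],
    [(0.0022, 0.2104), (0.4343, -2.6045), (0.4596, -1.5754), (0.4549, 1.5710)],
    [(0.0026, 1.2939), (0.4596, -1.5754), (0.4830, 2.5973), (0.0030, -2.8778)],
    [(0.0010, -2.0352), (0.4549, 1.5710), (0.0030, -2.8779), (0.4783, 2.5979)],
]
V = [[cmath.rect(r, phi) for r, phi in row] for row in V_POLAR]

CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def two_photon_matrix(v):
    """W[out][in]: permanent of the 2x2 submatrix of v linking input and output modes."""
    # Basis order |C0T0>, |C0T1>, |C1T0>, |C1T1> as (control index, target index)
    states = [(c, 2 + t) for c in (0, 1) for t in (0, 1)]
    return [[v[cr][ck] * v[ts][tl] + v[cr][tl] * v[ts][ck]
             for (ck, tl) in states]
            for (cr, ts) in states]


W = two_photon_matrix(V)
d = 4
P = sum(abs(x) ** 2 for row in W for x in row) / d                    # Eq. (1)
overlap = sum(CNOT[i][j] * W[i][j] for i in range(d) for j in range(d))
F = abs(overlap) ** 2 / (d ** 2 * P)                                  # Eq. (2)
print(f"P = {P:.4f}, F = {F:.4f}")
```

With the four-decimal rounding of Eq. (7), the computed values should land close to the quoted \({\cal P} = 0.0445\) and \({\cal F} = 0.9999\), though small deviations in the last digit are to be expected.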

Fig. 5

Numerical solutions for the time-frequency phases required to implement coincidence-basis CNOT gate. a Temporal phase modulation applied to the first EOM (solid red) and second EOM (dotted blue), plotted over one period. b Phases applied to each frequency mode by the pulse shaper, where modes 0 and 6 denote the control bins {C0, C1}, and modes 7 and 8 represent the target bins {T0, T1}

Coherent state measurements

To investigate the performance of a linear-optical multiport, ref. 27 provides an efficient characterization method utilizing only coherent states as sources and power measurements at the output. We follow similar procedures (adopted in ref. 14 for single-qubit frequency gates) by probing our frequency multiport with an electro-optic frequency comb, and measuring the output spectrum for different input frequency superpositions. We first send a continuous-wave laser with center frequency Ω01 = 2π × 193.550 THz (see Fig. 2a) into an additional EOM modulated at 25 GHz to create ~10 comb lines, and we utilize a subsequent pulse shaper to prepare specific input states. To obtain the modulus of every matrix element in the four columns of V, each time we send in only one input mode from the set {C0, C1, T0, T1} and measure the spectrum at the output of the gate, collecting all the output modes with power levels within 60 dB of the maximum. This allows us to retrieve the amplitudes \(r_{nn\prime }\). Then by sending in two lines and scanning their relative input phase, we can map out the V-matrix phases \(\phi _{nn\prime }\), where we compute all unknown values relative to phase values we are free by physical considerations to define a priori.27 We perform five identical measurements of V in order to estimate uncertainty; following are the resulting amplitudes and phases, with each number averaged individually over the five successive, independent matrix acquisitions:

$$\left[ {r_{nn^{\prime}}} \right] = \left[ {\begin{array}{*{20}{c}} {0.428 \pm 0.008} & {0.0030 \pm 0.0003} & {0.0027 \pm 0.0001} & {0.0017 \pm 0.0001} \\ {0.0031 \pm 0.0001} & {0.427 \pm 0.001} & {0.451 \pm 0.002} & {0.451 \pm 0.002} \\ {0.0028 \pm 0.0002} & {0.465 \pm 0.005} & {0.478 \pm 0.003} & {0.041 \pm 0.003} \\ {0.0018 \pm 0.0003} & {0.458 \pm 0.002} & {0.036 \pm 0.004} & {0.499 \pm 0.006} \end{array}} \right]$$
(8)
$$\left[ {\phi _{nn^{\prime}}} \right] = \left[ {\begin{array}{*{20}{c}} { - 2.5976 \pm 0} & \ldots & \ldots & \ldots \\ \ldots & { - 2.6045 \pm 0} & { - 1.5754 \pm 0} & {1.5710 \pm 0} \\ \ldots & { - 1.5754 \pm 0} & {2.621 \pm 0.002} & { - 2.89 \pm 0.05} \\ \ldots & {1.5710 \pm 0} & { - 2.7 \pm 0.1} & {2.631 \pm 0.006} \end{array}} \right].$$
(9)

The phase values with ±0 uncertainty are those we could fix to the theoretical prediction [Eq. (7)], found by the optimizer to yield high CNOT fidelity. Because the coupling between mode C0 and {C1, T0, T1} is too weak, we could not extract meaningful phase estimates of the elements delineated by “…” in the \(\phi _{nn\prime }\) matrix. However, we have confirmed that setting these phases to any set of random values impacts our calculation of the fidelity at only the fifth decimal place, so that it has no influence on our computed \({\cal F}_{{\mathrm{inf}}} = 0.995 \pm 0.001\). From the retrieved amplitudes and phases, we find uncertainties for the eight large elements \(\left( {r_{nn\prime } > 0.4} \right)\) at the third significant digit, an indication of the high precision possible with this high-flux characterization method.

Parameter model

In order to make use of the observed data to estimate the key parameters of our quantum gate, we first derive a realistic model connecting the underlying gate operation to photon counts, encapsulated in a likelihood function \(P({\cal D}|\beta )\), for the model parameters β given data \({\cal D}\) (proportional to the conditional probability of \({\cal D}\) given β). In our case, the set β contains not only the mode transformation matrix V, but also the pair generation probability μ and the system efficiencies ηA and ηB.

Initially, we focus on how the input quantum state propagates through the multiport—for the moment neglecting loss, which will be incorporated later. The total optical network (defined over countably infinite frequency bins) maps inputs \(\hat a_n\) to outputs \(\hat b_n\) according to

$$\hat b_n = \mathop {\sum}\limits_{n^{\prime} = - \infty }^\infty V_{nn^{\prime}}\hat a_{n^{\prime} },$$
(10)

with V unitary when considered over all modes. For a particular counting experiment, we take the prepared input state as

$$\left| \Psi \right\rangle = \left| {1_u1_v} \right\rangle = \hat a_u^\dagger \hat a_v^\dagger \left| {{\mathrm{vac}}} \right\rangle ,$$
(11)

where u ≠ v. Specifying such a state relies on several assumptions. For one, it neglects contributions from other frequency-bin pairs, justified experimentally by the BFC shaper’s ability to suppress adjacent frequency bins by >40 dB. Additionally, this state expression—and the multiport model in general—treats each frequency bin as a pure single mode. Experimentally, as a consequence of the pump laser’s ~kHz linewidth (much narrower than our 1.8 GHz-thick bins), a given photon pair is highly frequency-entangled, containing substructure absent in the separable state of Eq. (11). While such hidden entanglement would markedly reduce, e.g., the purity of heralded frequency-bin photons, it does not degrade the correlations in the two-photon experiments we conduct here. The counts registered for a particular pair of bins do result from a continuum of photon pairs with slightly different frequency offsets, implying that the net result is the incoherent sum of partially distinguishable probability amplitudes. However, as all such frequency pair combinations under the same bin lineshapes undergo matching frequency operations, the net measurement result is identical to the case in which all bins are purely single mode, apart from an overall scaling constant (see discussion of frequency filtering below). Finally, Eq. (11) does not include higher-order pair generation (e.g., four, six, eight, etc., photon terms) explicitly. Incidentally, the ansatz we incorporate for accidental coincidences [see Eq. (17) below] ends up capturing the main effects of multiple photon pairs on our data in a simpler fashion.

We define \(p_\mu(1_m1_n)\) as the probability for one photon to be found in mode m and the other in mode n at the output (again assuming no loss). This is given by

$$p_\mu\left( {1_m1_n} \right) = \left| {V_{mu}V_{nv} + V_{mv}V_{nu}} \right|^2\quad\quad(n\; \ne\; m).$$
(12)

When n = m (two photons in the same mode), the probability is

$$p_\mu\left( {2_m} \right) = 2\left| {V_{mu}V_{mv}} \right|^2,$$
(13)

with the factor of two a consequence of boson statistics. From these results, we can also compute the marginal probability for one-photon occupancy in a particular mode,

$$\begin{array}{*{20}{l}} {p_\mu (1_m)} \hfill & = \hfill & \sum\limits_{\mathop{n=-\infty}\limits_{n\ne m}}^{\infty} \left| {V_{mu}V_{nv} + V_{mv}V_{nu}} \right|^2 \hfill \\{} \hfill & = \hfill & {\sum\limits_{n = - \infty }^\infty} \left( {\left| {V_{mu}V_{nv} + V_{mv}V_{nu}} \right|^2} \right) - 4\left| {V_{mu}V_{mv}} \right|^2 \hfill \\ {} \hfill & = \hfill & {\left| {V_{mu}} \right|^2 + \left| {V_{mv}} \right|^2 - 4\left| {V_{mu}V_{mv}} \right|^2,} \hfill \end{array}$$
(14)

with the last line following from the unitarity of V and the fact that u ≠ v in our input state.
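Eqs. (12)–(14) can be checked numerically with a short sketch. The 4 × 4 discrete-Fourier-transform matrix below is only a hypothetical stand-in for V (the actual multiport is defined over countably many bins), and all function names are ours, chosen for illustration.

```python
import cmath

def dft_unitary(dim):
    """Unitary DFT matrix, used here only as a stand-in for the multiport V."""
    return [[cmath.exp(2j * cmath.pi * r * c / dim) / dim ** 0.5
             for c in range(dim)] for r in range(dim)]

def p_pair(V, u, v, m, n):
    """Eq. (12): one photon in output m and the other in output n (n != m)."""
    return abs(V[m][u] * V[n][v] + V[m][v] * V[n][u]) ** 2

def p_double(V, u, v, m):
    """Eq. (13): both photons in output m."""
    return 2 * abs(V[m][u] * V[m][v]) ** 2

def p_single_sum(V, u, v, m):
    """First line of Eq. (14): explicit sum over all n != m."""
    return sum(p_pair(V, u, v, m, n) for n in range(len(V)) if n != m)

def p_single_closed(V, u, v, m):
    """Last line of Eq. (14): closed form for unitary V and u != v."""
    a, b = abs(V[m][u]) ** 2, abs(V[m][v]) ** 2
    return a + b - 4 * a * b

V = dft_unitary(4)
u, v = 0, 1  # input bins of the photon pair (u != v)
dim = len(V)

# Each unordered output pair {m, n} is counted once, plus the double occupancies.
total = (sum(p_pair(V, u, v, m, n) for m in range(dim) for n in range(m + 1, dim))
         + sum(p_double(V, u, v, m) for m in range(dim)))
```

For this stand-in matrix the explicit sum and the closed form of Eq. (14) agree for every output mode, and the pair and double-occupancy probabilities together sum to one, as expected in the lossless case.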

We then map these fundamental “per-pair” probabilities to expected detection rates. For accounting purposes, we define all detection probabilities within a specific temporal frame τ, the time within which clicks on detector A (tA) and B (tB) are deemed coincident: |tA − tB| < τ. Our stationary (continuous-wave pumped) source ensures that all such probabilities are equal in every length-τ time bin. With μ defined as the pair generation probability within such a frame, the marginal probabilities for single-detector clicks are

$$\begin{array}{*{20}{l}} {p_A} \hfill & = \hfill & {\mu \left[ {\eta _A + (1 - \eta _A)\eta _A} \right]p_\mu (2_m) + \mu \eta _Ap_\mu (1_m) + d_A} \hfill \\ {p_B} \hfill & = \hfill & {\mu \left[ {\eta _B + (1 - \eta _B)\eta _B} \right]p_\mu (2_n) + \mu \eta _Bp_\mu (1_n) + d_B} \hfill \end{array}$$
(15)

for detector A monitoring frequency bin m and B frequency bin n. The probabilities dA and dB represent the dark (or more generally, background) count probabilities; we measure these independently and take them as fixed at \(d_A = 9.60 \times 10^{-7}\) and \(d_B = 7.77 \times 10^{-7}\), corresponding to dark count rates of 640 Hz and 518 Hz, respectively. The efficiencies ηA and ηB include all loss effects through the system, from generation in the crystal to photon detection; we assume them to be mode-independent, which is validated by the relatively small bandwidth comprising all modes of interest (~500 GHz), yet they can differ from each other because of the different efficiencies of our superconducting nanowire detectors. And while spectral filtering per se does not modify these general considerations, the multimode frequency substructure (mentioned above), coupled with the Lorentzian linewidth profile of the etalon, introduces an effective transmission given by the average over all frequency offsets; we believe this contributes to the lower overall ηA and ηB retrieved in BME. Next, we make use of the fact that the system efficiencies \(\eta _A,\eta _B \ll 1\). Plugging in Eqs. (13) and (14), we obtain

$$\begin{array}{*{20}{l}} {p_A} \hfill & = \hfill & {\mu \eta _A\left( {\left| {V_{mu}} \right|^2 + \left| {V_{mv}} \right|^2} \right) + d_A} \hfill \\ {p_B} \hfill & = \hfill & {\mu \eta _B\left( {\left| {V_{nu}} \right|^2 + \left| {V_{nv}} \right|^2} \right) + d_B.} \hfill \end{array}$$
(16)

The simple addition of pair and dark-count contributions is justified in our case by their small values (\(\sim 10^{-6}\)), so that there is no concern for pA or pB approaching or exceeding 1 in the numerical analysis below.
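The size of the approximation made in passing from Eq. (15) to Eq. (16) can be gauged with a few lines of arithmetic. In the sketch below, μ and ηA are the estimates reported in Eqs. (21) and (22), the squared matrix elements are hypothetical round numbers, and the frame length is not an independent input: it is merely inferred from the quoted dark-count probability and rate.

```python
# Illustrative inputs: MU and ETA_A follow Eqs. (21)-(22); the squared
# matrix elements are hypothetical round numbers, not measured values.
MU, ETA_A, DARK_A = 0.024, 3.5e-4, 9.60e-7
V_MU_SQ, V_MV_SQ = 0.45 ** 2, 0.45 ** 2

def singles_exact(mu, eta, a, b, dark):
    """Eq. (15), with a = |V_mu|^2 and b = |V_mv|^2."""
    p2 = 2 * a * b            # Eq. (13): both photons in bin m
    p1 = a + b - 4 * a * b    # Eq. (14): exactly one photon in bin m
    return mu * (eta + (1 - eta) * eta) * p2 + mu * eta * p1 + dark

def singles_approx(mu, eta, a, b, dark):
    """Eq. (16): the eta << 1 limit of Eq. (15)."""
    return mu * eta * (a + b) + dark

p_exact = singles_exact(MU, ETA_A, V_MU_SQ, V_MV_SQ, DARK_A)
p_approx = singles_approx(MU, ETA_A, V_MU_SQ, V_MV_SQ, DARK_A)
relative_gap = (p_approx - p_exact) / p_exact

# Frame length implied by d_A = tau * (640 Hz); an inference, not a quoted value.
TAU_IMPLIED = DARK_A / 640.0  # seconds
```

The two expressions differ only by a term of order μη², so for efficiencies of this size the relative gap is far below the statistical uncertainties of interest. The implied frame length of about 1.5 ns also reproduces the quoted dB from the 518 Hz rate.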

To establish the probability for a coincidence between detectors A and B in our model, we make a sharp distinction between two types of events: (i) correlated coincidences, deriving from two photons of the same pair; and (ii) accidental coincidences, in which two random clicks (from at least one dark count, or photons from two different pairs) overlap within the resolving time τ. We note that, in principle, such a distinction is not necessary: it should be possible to derive a completely ab initio model for coincidences, with an input density matrix including higher-order pair generation effects, and positive-operator valued measures (POVMs) incorporating dark count noise. However, our approach proves much simpler, requiring fewer parameters while still satisfying conceptual demands.

For event (i), the click probability follows from multiplying the per-pair probability \(p_\mu(1_m1_n)\) by μηAηB, so that \(p_{AB}^{({\mathrm{i}})} = \mu \eta _A\eta _B\left| {V_{mu}V_{nv} + V_{mv}V_{nu}} \right|^2\), which assumes that τ is sufficiently large to integrate over the full two-photon correlation time. Regarding event (ii), in general the rate of accidental coincidences between two independent detectors is given by a product of the rates of the two detectors individually: \(R_{AB}^{({\mathrm{ii}})} = 2\tau R_AR_B\),28,29 where the factor of two follows from the fact that, under our definition of τ, all events such that \((t_A - t_B) \in ( - \tau ,\tau )\) register as coincidences. Making the connection pj = τRj (and likewise for the coincidence rate) then allows us to write \(p_{AB}^{({\mathrm{ii}})} = \tau R_{AB}^{({\mathrm{ii}})} = 2p_Ap_B\), so that the total coincidence probability becomes

$$\begin{array}{*{20}{l}} {p_{AB}} \hfill & = \hfill & {p_{AB}^{({\mathrm{i}})} + p_{AB}^{({\mathrm{ii}})}} \hfill \\ {} \hfill & = \hfill & {\mu \eta _A\eta _B\left| {V_{mu}V_{nv} + V_{mv}V_{nu}} \right|^2 + 2p_Ap_B,} \hfill \end{array}$$
(17)

with pA and pB defined as in Eq. (16). Expanding 2pApB, the expected noise sources appear naturally: a \(\mu^2\) term reflects clicks from two different pairs, while μdA and μdB terms give coincidences from a photon and dark count. In this way, we can recover noise effects otherwise absent in the physical model, via what can be called an “accidentals correction” term 2pApB. Finally, we emphasize that the accuracy of Eq. (17) relies again on the relative orders of magnitude of the probabilities involved: \(p_{AB}^{({\mathrm{i}})}\sim 10^{ - 10}\), so that the differences between alternative forms one could conceivably argue for (such as \(p_B \to p_B - p_{AB}^{({\mathrm{i}})}\), to help ensure that singles counts from correlated coincidences do not also count toward accidental probabilities) become numerically inconsequential.
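The expansion of the accidentals term can be made explicit in code. The sketch below splits Eq. (17) into its correlated part and the four pieces of 2pApB obtained by substituting Eq. (16); the function names, dictionary keys, and the numerical inputs for the matrix-element combinations are illustrative assumptions only.

```python
def coincidence_probability(mu, eta_a, eta_b, s_a, s_b, d_a, d_b, interference):
    """Eq. (17), with s_a = |V_mu|^2 + |V_mv|^2, s_b = |V_nu|^2 + |V_nv|^2,
    and interference = |V_mu V_nv + V_mv V_nu|^2."""
    p_a = mu * eta_a * s_a + d_a   # Eq. (16), detector A
    p_b = mu * eta_b * s_b + d_b   # Eq. (16), detector B
    return mu * eta_a * eta_b * interference + 2 * p_a * p_b

def coincidence_terms(mu, eta_a, eta_b, s_a, s_b, d_a, d_b, interference):
    """The same probability, split into physically distinct contributions."""
    return {
        "correlated": mu * eta_a * eta_b * interference,
        "two_pairs": 2 * mu ** 2 * eta_a * eta_b * s_a * s_b,
        "photon_A_dark_B": 2 * mu * eta_a * s_a * d_b,
        "dark_A_photon_B": 2 * mu * eta_b * s_b * d_a,
        "two_darks": 2 * d_a * d_b,
    }

# mu, eta and dark counts as quoted in the text; s_a, s_b and the
# interference factor are hypothetical placeholders.
args = (0.024, 3.5e-4, 4.7e-4, 0.4, 0.4, 9.60e-7, 7.77e-7, 0.1)
terms = coincidence_terms(*args)
p_ab = coincidence_probability(*args)
```

Summing the five entries of `terms` recovers `p_ab`, which makes it easy to see which noise source dominates the accidentals for a given set of parameters.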

Finally, with these probabilities established, we can write the likelihood using a multinomial distribution for all event types. Over the course of a single measurement of duration T, we experience M = T/τ total frames, in which we can register one of the four mutually exclusive outcomes: click on A only, click on B only, coincidence, or no clicks. The likelihood for the specific input/output mode configuration (defined by the mode numbers uv → mn) is

$$\begin{array}{*{20}{l}} {P\left( {{\cal D}_{uv}^{mn}|\beta } \right)} \hfill & = \hfill & {(p_A - p_{AB})^{N_A - N_{AB}}(p_B - p_{AB})^{N_B - N_{AB}}} \hfill \\ {} \hfill & {} \hfill & { \times p_{AB}^{N_{AB}}\left( {1 - p_A - p_B + p_{AB}} \right)^{M - N_A - N_B + N_{AB}},} \hfill \end{array}$$
(18)

where we emphasize that both the dataset \({\cal D}_{uv}^{mn} = \{ N_A,N_B,N_{AB}\}\) and probabilities {pA, pB, pAB} themselves depend on the mode configuration uv → mn. The total likelihood follows by multiplying out all 16 individual combinations

$$P\left( {{\cal D}|\beta } \right) = \mathop {\prod}\limits_{\begin{array}{*{20}{c}} {u,m \in \{ C_0,C_1\} } \\ {v,n \in \{ T_0,T_1\} } \end{array}} {P\left( {{\cal D}_{uv}^{mn}|\beta } \right)} ,$$
(19)

where the modes {C0, C1, T0, T1} are as defined in the main text. (We also neglect unimportant scaling factors which do not depend on the parameters β.) This likelihood forms the basis for estimating the parameters β = {V, μ, ηA, ηB} from the dataset \({\cal D} = \cup {\cal D}_{uv}^{mn}\).
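In practice it is convenient to work with the logarithm of Eqs. (18) and (19), since the exponents are photon counts and the raw product underflows quickly. The following is a minimal sketch of such a log-likelihood; the function names and the example counts are hypothetical, and the parameter-independent multinomial prefactor is dropped as in the text.

```python
import math

def log_likelihood(p_a, p_b, p_ab, n_a, n_b, n_ab, frames):
    """Logarithm of Eq. (18) for a single mode configuration uv -> mn."""
    return ((n_a - n_ab) * math.log(p_a - p_ab)       # click on A only
            + (n_b - n_ab) * math.log(p_b - p_ab)     # click on B only
            + n_ab * math.log(p_ab)                   # coincidence
            # no click; log1p keeps precision when the probabilities are tiny
            + (frames - n_a - n_b + n_ab) * math.log1p(-(p_a + p_b - p_ab)))

def total_log_likelihood(configurations):
    """Logarithm of Eq. (19): sum over all mode configurations.
    Each entry is a tuple (p_a, p_b, p_ab, n_a, n_b, n_ab, frames)."""
    return sum(log_likelihood(*config) for config in configurations)

# Hypothetical counts for one configuration, with M = 1e9 frames.
FRAMES, N_A, N_B, N_AB = 10 ** 9, 4000, 5000, 300
matched = log_likelihood(N_A / FRAMES, N_B / FRAMES, N_AB / FRAMES,
                         N_A, N_B, N_AB, FRAMES)
mismatched = log_likelihood(N_A / FRAMES, N_B / FRAMES, 2 * N_AB / FRAMES,
                            N_A, N_B, N_AB, FRAMES)
```

As expected for a multinomial model, the log-likelihood is largest when the model probabilities equal the observed frequencies, so `matched` exceeds `mismatched` in this example.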

Bayesian machine learning

To estimate these values along with their uncertainties, we make use of Bayes’ rule for the posterior probability distribution

$$P(\beta |{\cal D}) = \frac{1}{{\cal Z}}P\left( {{\cal D}|\beta } \right)P(\beta ),$$
(20)

with \({\cal Z} = {\int} d\beta\, P\left( {{\cal D}|\beta } \right)P(\beta )\) the (undetermined) normalizing factor. P(β) represents the prior probability distribution for the parameters. We take P(β) as uniform over (0, 1) for each of μ, ηA, and ηB; uniform over (0, 2π) for all phases \(\phi _{nn\prime } = {\mathrm{arg}}\,V_{nn\prime }\) which are not taken as fixed \(\left\{ {\phi _{C_0C_0},\phi _{C_1C_1},\phi _{C_1T_0},\phi _{C_1T_1},\phi _{T_0C_1},\phi _{T_1C_1}} \right\}\); and uniform for all squared moduli \(r_{nn\prime }^2\) subject to the constraint \(\mathop {\sum}\nolimits_{nn\prime } r_{nn\prime }^2 = 1.6558\) from Eq. (7). This uninformative prior allows the estimates to be fully determined by the counting data itself.

Due to the complexity of integrating Eq. (20) over our parameter space, we employ slice sampling37 and retrieve 4096 samples of all 28 parameters from the unnormalized \(P\left( {{\cal D}|\beta } \right)P(\beta )\). We use best guesses of all parameters as the starting point to enable convergence, invoking a burn-in period and thinning until stationarity is achieved. At each sample of β, we can compute any quantity of interest, and use the statistics over all samples to produce the mean and standard deviation. Specifically, we find

$$\mu = 0.024 \pm 0.002$$
(21)
$$\eta _A = (3.5 \pm 0.3) \times 10^{ - 4}$$
(22)
$$\eta _B = (4.7 \pm 0.3) \times 10^{ - 4}$$
(23)
$${\cal F}_{{\mathrm{BME}}} = 0.91 \pm 0.01.$$
(24)
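For readers unfamiliar with slice sampling, the following one-parameter sketch shows the stepping-out and shrinkage procedure applied to an unnormalized log-posterior, with a burn-in period and thinning. It is a toy illustration only and not the code used for the 28-parameter retrieval; the target density, step width, and all names are assumptions made for the example.

```python
import math
import random

def slice_sample(log_density, x0, n_samples, width=1.0, burn_in=100, thin=1, seed=0):
    """Univariate slice sampler with stepping-out and shrinkage."""
    rng = random.Random(seed)
    x, samples = x0, []
    for step in range(burn_in + n_samples * thin):
        # Draw the slice height uniformly under the density at the current point.
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Step out until both interval ends lie outside the slice.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_y:
            left -= width
        while log_density(right) > log_y:
            right += width
        # Shrink the interval toward x until a proposal lands inside the slice.
        while True:
            proposal = rng.uniform(left, right)
            if log_density(proposal) > log_y:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append(x)
    return samples

def toy_log_posterior(x, successes=30, trials=100):
    """Binomial likelihood times a uniform prior on (0, 1), unnormalized."""
    if not 0.0 < x < 1.0:
        return -math.inf
    return successes * math.log(x) + (trials - successes) * math.log(1.0 - x)

draws = slice_sample(toy_log_posterior, x0=0.5, n_samples=4096, width=0.1)
mean = sum(draws) / len(draws)
std = math.sqrt(sum((d - mean) ** 2 for d in draws) / (len(draws) - 1))
```

The sample mean and standard deviation play the same role here as the values quoted in Eqs. (21)–(24): they summarize the posterior without ever computing the normalizing factor. For this toy target the exact posterior mean is 31/102, which the sample mean should approach.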

The retrieved pathway efficiencies are ~9 dB lower than expected from our insertion loss alone, which we estimate to be ~25 dB from generation to detection. While we have fully characterized the insertion loss of the gate components themselves (12.9 dB in total: each EOM contributes ~2.8 dB; the pulse shaper, ~4.7 dB; and the remainder comes from polarization controllers and fiber patch cords), uncertainties remain in the state preparation and measurement components, such as the breakdown of loss inside the fiber-pigtailed photon source, as well as questions of how strongly the spectrally varying transmission of the etalon reduces its effective transmission from its peak value. Otherwise, the retrieved μ and fidelity match predictions. Even though \({\cal F}_{{\mathrm{BME}}}\) is smaller and has higher uncertainty than the classically inferred \({\cal F}_{{\mathrm{inf}}}\), the fact that it still exceeds 90% with fairly sparse measurements is strong confirmation of excellent performance, particularly in light of the uninformative prior, which permits high fidelity only based on the strength of the observed data.
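The decibel figures quoted above follow from simple arithmetic, sketched here. The count of two EOMs in the gate is an assumption of this example (it is not stated in this section), and the variable names are ours.

```python
import math

def loss_db(efficiency):
    """Loss in dB corresponding to a power transmission efficiency."""
    return -10 * math.log10(efficiency)

ETA_A, ETA_B = 3.5e-4, 4.7e-4        # Eqs. (22) and (23)
INSERTION_LOSS_DB = 25.0             # estimated generation-to-detection loss

loss_a_db = loss_db(ETA_A)           # about 34.6 dB
loss_b_db = loss_db(ETA_B)           # about 33.3 dB
excess_a_db = loss_a_db - INSERTION_LOSS_DB
excess_b_db = loss_b_db - INSERTION_LOSS_DB

# Gate loss budget, assuming two EOMs and one pulse shaper.
GATE_TOTAL_DB, EOM_DB, SHAPER_DB, NUM_EOMS = 12.9, 2.8, 4.7, 2
remainder_db = GATE_TOTAL_DB - NUM_EOMS * EOM_DB - SHAPER_DB
```

Under these assumptions the two pathways show roughly 9.6 dB and 8.3 dB of loss beyond the insertion-loss estimate, consistent with the ~9 dB quoted, and about 2.6 dB of the gate loss is left for polarization controllers and patch cords.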

We also compute the mean and standard deviation for all elements of the retrieved transformation V, for both the magnitude and phase:

$$\left[ {r_{nn^{\prime} }} \right] = \left[ {\begin{array}{*{20}{c}} {0.452 \pm 0.005} & {0.124 \pm 0.009} & {0.06 \pm 0.01} & {0.02 \pm 0.02} \\ {0.06 \pm 0.03} & {0.465 \pm 0.008} & {0.475 \pm 0.006} & {0.411 \pm 0.006} \\ {0.04 \pm 0.01} & {0.463 \pm 0.005} & {0.470 \pm 0.005} & {0.03 \pm 0.01} \\ {0.028 \pm 0.009} & {0.455 \pm 0.005} & {0.02 \pm 0.01} & {0.413 \pm 0.005} \end{array}} \right]$$
(25)
$$\left[ {\phi _{nn^{\prime} }} \right] = \left[ {\begin{array}{*{20}{c}} { - 2.5976 \pm 0} & { - 2.8 \pm 0.2} & {1.3 \pm 0.1} & { - 2.01 \pm 0.09} \\ {0.30 \pm 0.09} & { - 2.6045 \pm 0} & { - 1.5754 \pm 0} & {1.5710 \pm 0} \\ {1.35 \pm 0.09} & { - 1.5754 \pm 0} & {2.6 \pm 0.1} & {0.7 \pm 0.2} \\ { - 2.0 \pm 0.1} & {1.5710 \pm 0} & {0.3 \pm 0.1} & {2.5 \pm 0.1} \end{array}} \right].$$
(26)

As before, the phases with uncertainties ±0 are those fixed prior to parameter retrieval. Comparing this result to the design [Eq. (7)] and coherent-state-retrieved matrix [Eqs. (8) and (9)], the most significant mismatch occurs for the element in row 1, column 2 (the coupling from mode C1 to C0). At 0.124, this value is significantly larger than designed, and contributes to the higher error for the cases |C1T0〉 → |C0T0〉 and |C1T1〉 → |C0T1〉 in Fig. 4. While the source of this error is still uncertain, experimentally we did observe extraneous counts on detector A during these integration times, beyond the theoretical prediction. Bayesian retrieval succeeds in finding matrix elements to account for this observation, as intended.