A quantum annealer with fully programmable all-to-all coupling via Floquet engineering

Quantum annealing is a promising approach to heuristically solving difficult combinatorial optimization problems. However, the connectivity limitations in current devices lead to an exponential degradation of performance on general problems. We propose an architecture for a quantum annealer that achieves full connectivity and full programmability while using a number of physical resources only linear in the number of spins. We do so by application of carefully engineered periodic modulations of oscillator-based qubits, resulting in a Floquet Hamiltonian in which all the interactions are tunable. This flexibility comes at the cost of the coupling strengths between qubits being smaller than they would be compared with direct coupling, which increases the demand on coherence times with increasing problem size. We analyze a specific hardware proposal of our architecture based on Josephson parametric oscillators. Our results show how the minimum-coherence-time requirements imposed by our scheme scale, and we find that the requirements are not prohibitive for fully connected problems with up to at least 1000 spins. Our approach could also have impact beyond quantum annealing, since it readily extends to bosonic quantum simulators, and would allow the study of models with arbitrary connectivity between lattice sites.

Quantum annealing is a promising approach to heuristically solving difficult combinatorial optimization problems. However, the connectivity limitations in current devices lead to an exponential degradation of performance on general problems. We propose an architecture for a quantum annealer that achieves full connectivity and full programmability while using a number of physical resources only linear in the number of spins. We do so by application of carefully engineered periodic modulations of oscillator-based qubits, resulting in a Floquet Hamiltonian in which all the interactions are tunable; this flexibility comes at a cost of the coupling strengths between spins being smaller than they would be had the spins been directly coupled. Our proposal is well-suited to implementation with superconducting circuits, and we give analytical and numerical evidence that fully connected, fully programmable quantum annealers with 1000 qubits could be constructed with Josephson parametric oscillators having cavity-photon lifetimes of ∼100 µs, and other systemparameter values that are routinely achieved with current technology. Our approach could also have impact beyond quantum annealing, since it readily extends to bosonic quantum simulators and would allow the study of models with arbitrary connectivity between lattice sites.

I. INTRODUCTION
Quantum annealers are computational devices designed for solving combinatorial optimization problems, most typically Ising optimization problems [1][2][3]. An Ising problem is specified by connections among N spins on a graph, as well as local fields on each spin. One of the foremost challenges in the experimental realization of quantum annealers is the requirement that quantum annealers be able to represent densely connected Ising problems with minimal overhead in the number of qubits (and other physical components) used [4][5][6][7]. If a quantum annealer is not able to directly represent a particular problem because the problem graph has higher connectivity than the physical annealer does, then one incurs a penalty in the number of qubits needed to represent the Ising problem. For example, the largest fully connected Ising problem that can be represented in the D-Wave 2000-qubit quantum annealer is one that has N = 64 spins [6]; this limitation arises because the connectivity in this particular quantum annealer is very sparse (the connectivity graph has maximum degree six).
Superconducting circuits are one of the most prominent technologies for realizing quantum information processing devices, including quantum annealers, and form the basis for many of the major projects to construct experimental quantum annealers [8][9][10]. However, when qubit connectivity is achieved via physical pairwise couplers (as is the case for the efforts described in Refs. [8][9][10]), there is a substantial engineering impediment to realizing full connectivity: each qubit would need N − 1 physical couplers, and arranging such couplers spatially has proven to be impractical for large N . On the other hand, bus architectures have been demonstrated for superconducting-circuit qubits in the context of circuit-model quantum computing [11][12][13], and bus-mediated interactions naturally provide all-to-all coupling [11,14], with the use of one physical coupler per qubit. In this paper, we address the challenge of realizing full programmability of these all-to-all couplings, in the context of developing a quantum annealer.
In classical neuromorphic computing, a scheme allowing full programmability has been proposed in an all-to-all coupled system of Kuramoto oscillators via periodically modulated external inputs [15]. In a completely separate community and context, nonlinear-oscillator-based qubits, whose operation relies on the continuous-variable nature of the oscillators, have also been established as a promising building block for the realization of superconducting-circuit quantum annealers, owing partially to their resilience to photon loss [16][17][18]. In this work, we take inspiration from both these lines of research to show that the all-to-all, bus-mediated couplings in a system of quantum nonlinear oscillators can be made fully programmable by careful design of the modulation of each oscillator's instantaneous frequency. We utilize the mathematical tools of Floquet theory [19,20] to engineer the desired interactions between oscillators and establish that the functionality of our dynamically-coupled system is equivalent to that of a staticallycoupled system with pairwise physical couplers.
In summary, the previously known approaches to building a quantum annealer with JPOs are, variously, incompatible with dense connectivity due to engineering limitations; requiring of a large overhead in the number of qubits and/or couplers (i.e., scaling with N 2 ); or lacking full programmability. In contrast, our proposal uses a bus to provide all-to-all connectivity, and a dynamical approach to coupling that enables full programmability while requiring only a number of oscillators and couplers linear in N to solve Ising problems with N spins.

II. RESULTS
In this paper, we study the design of a quantum annealer whose purpose is to solve the Ising optimization problem, defined as finding the N -spin configuration σ i ∈ {−1, +1} (i = 1, . . . , N ) that minimizes the classical spin energy E(σ) := j =i C ij σ i σ j , where C is a symmetric real matrix. A choice of C specifies a problem instance to be solved, and C can be interpreted as the adjacency matrix of a graph whose vertices are spins and whose edges represent spin-spin interactions. In general, C can have O(N 2 ) non-zero entries. It is desirable for a quantum annealer to be fully programmable, such that there are no restrictions on the structure of C, and that the annealer not use more than N oscillators to represent a given N -spin problem, nor use more than ∝ N other physical components. In this paper, we show how this can be achieved using nonlinear oscillators in a bus architecture together with dynamically realized couplings designed via Floquet engineering. Figure 1A shows an overview of our proposed architecture. The N nonlinear oscillators of the quantum annealer are coupled to a common (bus) resonator. If the center frequencies of the oscillators are sufficiently far-detuned from the bus resonance, then the bus mediates a photon-exchange interaction between any pair of oscillators. In particular, denoting the annihilation operator for the ith oscillator asâ i , the bus mediates interactions that contribute terms of the formâ † iâ j +â iâ † j to the system Hamiltonian [11] which generically couples every oscillator to every other oscillator. Thus with N nonlinear oscillators coupled to a bus, we can implement all-to-all coupling. However, the couplings up to this point are not programmable. The principal result in this paper is that we can engineer complete programmability of all the couplings by phase-modulating the oscillators in a specific way.
In our scheme, the oscillators are, to a good approximation, detuned from each other by multiples of a fundamental frequency Λ, such that the kth-nearest neighbor of any given oscillator is detuned from it by kΛ. Despite the presence of the bus, for Λ sufficiently large the oscillators are effectively uncoupled in the absence of modulation (by the rotating-wave approximation). To effect dynamical coupling, each oscillator is controlled with a phase modulation (PM) signal containing harmonics of Λ, such that the resulting sidebands of each oscillator overlap in frequency with the center frequencies of the other oscillators. More precisely, the ith oscillator is phase- N . Intuitively, the strengths of the dynamically induced couplings are determined by the strengths of the sidebands, which in turn are controlled by the elements of F . Thus, the task of programming these couplings reduces to making an appropriate choice for F ; we will present a method for choosing these coefficients shortly. Figure 1B is a cartoon depicting the realization of this PM scheme for a representative 5-spin problem instance; the coefficients F  0 ω i (t ) dt . Each PSD shows a strong peak at its respective oscillator's center frequency, together with sidebands of various amplitudes (controlled by F ) that overlap with the center frequencies of the other oscillators.
While our method is independent of the exact type of nonlinear oscillator used, we specialize our discussion in this paper to a superconducting-circuits realization. For concreteness, we consider Josephson parametric oscillators (JPOs) [17,28,29,31], although other superconducting-circuit-based oscillators are plausible choices as well [33,34]. In the absence of coupling, the eigenstates of the JPO are 0-phase and π-phase coherent states [27,35]; these two eigenstates can be used to encode, respectively, the spin-up and spin-down configurations of an Ising spin [16], and superpositions of these eigenstates (i.e., Schrödinger cat states) have been experimentally demonstrated [31]. Figure 1C shows N JPOs coupled to a superconducting resonator, which acts as the bus in this platform. Energy is supplied to each JPO

FIG. 1.
Achieving fully programmable all-to-all coupling in oscillator networks via phase modulation. (A and B) An implementation-independent overview of the architecture, showing an example with N = 5 oscillators each coupled to a common bus resonator and each driven by a set of phase modulation (PM) signals. As depicted in (A) each oscillator is modulated by up to N − 1 harmonics with frequencies kΛ for integers k < N . The amplitude of each harmonic for each oscillator is given by an element in a phase-modulation matrix F , an example of which is shown on the left side of (B) (times a factor of 100 for clarity). The color of the PM signal on each oscillator indicates with which other oscillator the PM causes an interaction: for example, the k = 1 harmonic PM applied to Oscillator 1 (shown in orange) causes it to couple with Oscillator 2, and the k = 2 harmonic PM applied to Oscillator 1 (shown in green) causes it to couple with Oscillator 3. When a signal potentially mediates coupling to two other oscillators (e.g., k = 2 for Oscillator 3), we color by the oscillator with the lower center frequency (larger oscillator index i); the signals colored gray are not applied as they would not create any useful couplings. As shown in the power spectral density (PSD) of the oscillators' canonical positions on the right side of (B), the oscillators have center frequencies spaced by Λ; thus each PM signal at frequency kΛ creates a primary sideband which overlaps in frequency with the center frequency of the neighbor k oscillators away. Taken together, the effect of these sideband interactions is to generate a desired effective Ising coupling matrix (see the Supplementary Materials for the matrix used in this example). (C) A superconducting-circuits implementation of the dynamical-coupling architecture. A generic oscillator may be realized by a Josephson parametric oscillator (JPO), whose frequency can be modulated by a flux line, while the bus may be realized by a microwave resonator. (D) Quantitative overview of the scheme for a JPO system, using the modulations shown in (B) and depicting the instantaneous angular frequencies ωi(t) of the JPOs, both in the time domain (left) and as a PSD of ω1 in the frequency domain (right). The PM signals manifest as modulations in the instantaneous frequency of each JPO, or, equivalently, in the detuning ∆i(t) between the bus and Oscillator i. In the PSD, the PM signal appears in a low-frequency band (<0.5 GHz), while the parametric drive to the JPO (at twice the JPO's frequency) occurs as usual in a high-frequency band ∼20 GHz [17,28]. The parameters used for this calculation are obtained from the bus-coupled JPO model described in the Supplementary Materials. by a flux line carrying a time-varying current. Conventionally, the dc component of the current determines the center frequency of the JPO, and an ac component at twice that frequency provides the parametric drive. In our scheme, the center frequency of each JPO is additionally modulated in time byδ φ(t), which is achieved by an additional modulation of the flux-line current corresponding toδ φ [36,37]. (In an alternative technological realization using a system of optical oscillators, this PM could potentially be applied by a physical phase modulator.) For this JPO realization of a nonlinear-oscillator-based quantum annealer, Fig. 1D shows a more quantitative picture of our dynamical-coupling scheme, with plots of the instantaneous angular frequencies ω i (t) both in the time domain (for all the oscillators) and in the frequency domain (for the first oscillator only). First, on the time-domain plot, we see that the effect of the PM is to create oscillations in the instantaneous detuning ∆ i (t) between the fixed bus resonance frequency and ω i (t); in the absence of these modulations (or on time-averaging), the oscillator frequencies are approximately evenly spaced. As expected, the PM consists of four harmonics; looking at the frequency-domain plot, we see that these harmonics lie in a "coupling band" with frequency content 1 GHz. These plots also illustrate two more technical features specific to our construction (see the Supplementary Materials for more information). First, the deviations in oscillator frequency from the center (i.e., the modulation depths) are small compared to the mean values of the bus-oscillator detunings; this qualitative feature ensures that the native couplings J ij are approximately time-independent despite the PM. Second, there is a significant gap between the coupling band and the "pumping band" in which the parametric drive is realized; because of this separation of time scales, we can first design the flux-line currents to support the PM needed for dynamical coupling, while the parametric drive simply follows that modulation as needed. Since the ac part of the flux-line current is directly proportional to the ac part of ω i (t) (see the Supplementary Materials), we see that the entire control signal only requires approximately 20 GHz of bandwidth at most, which is realizable with current microwave technology.
We have thus far not explained how to choose the modulation coefficients F for a given problem instance C. An intuitive, but incorrect, choice is to simply set F (k) i ∝ C i,i+k , since generating a sideband at frequency kΛ on oscillator i intuitively causes it to interact with oscillator i + k. However, this is not entirely accurate, as the symmetrically generated sideband at −kΛ (caused by the same F (k) i coefficient) also leads to the same interaction with oscillator i−k; for arbitrary C, these two interactions may need to differ in general. Furthermore, we must also consider higher-order sidebands as well: even if we were to only phase modulate oscillator i at frequency kΛ, weaker sidebands at ±2kΛ, ±3kΛ, and so on are also generated, causing interactions with oscillators i ± 2k, i ± 3k, and so on, respectively. Thus, there is a nontrivial relation between F and the desired couplings C, and F needs to be chosen in a way such that the various contributions of F to the effective couplings among oscillators combine appropriately to give the desired couplings C. To do this formally, we apply the mathematical tools of Floquet theory [19,20].
First, to more explicitly define the problem we are trying to solve, a quantum annealer based on Josephson parametric oscillators can be described by a (rotating-frame) Hamiltonian of the form [16,28] where χ is the Kerr nonlinear rate, δ i is the detuning between oscillator i and the half-harmonic of its parametric drive, and r(t) is the (slowly time-varying) amplitude of the parametric drives; in this work we choose r(t) to be a linear ramp in time for the duration of the computation (see the Supplementary Materials for more details). Here, λ C is a problem-strength parameter dictating the strength of the oscillator-oscillator couplings C ij (which are normalized in this work to max j =i |C ij | = 1). One way to realize such a Hamiltonian is to have O(N 2 ) physical pairwise couplers, which can be programmed given a desired C ij ; these programmed couplings can then be held static throughout the annealing process while the parametric drive is varied. By contrast, our goal is to realize this annealing Hamiltonian via dynamical control of the effective couplings among the oscillators. As a result, we start instead with the (rotating-frame) Hamiltonian (see the Supplementary Materials for a derivation) which is the same asĤ static with the exception of the final coupling term. In this case, J ij are the bus-mediated coupling rates natively present in the system but which are not fully programmable in general. By detuning the oscillators relative to one another by multiples of Λ and applying PM to the oscillators according to δφ i (t), the result is the "native" coupling termĤ native : The time-varying phase factor is due to the PM of the Λ-detuned oscillators, and control over its time dependence forms the core of our dynamical-coupling scheme.
If we denote the coupling term in the static annealing Hamiltonian (1) asĤ target := −λ C j =i C ijâ † iâ j , then the goal is to achieveĤ native ≈Ĥ target , under some appropriate sense of the approximation. As previously mentioned, we choose the modulations to be − sin(kΛt), which means thatĤ native is periodic with frequency Λ. If Λ is much larger than all other system timescales, then the results of Floquet theory allow us to make the Floquet approximationĤ native ≈ λ C j =i C effijâ † iâ j where describes the effective couplings between oscillators i and j due to all the sideband interactions. (More formally, this approximation is the leading-order term in the Floquet-Magnus expansion ofĤ native in 1/Λ.) Thus, all that remains is to choose the coefficients in F such that C effij ≈ C ij . As discussed in the Supplementary Materials, while it is possible to solve for F by direct numerical nonlinear optimization, there are a number of ways to make this precomputation step more tractable and robust. In particular, one can consider a second-order Taylor expansion of C effij (intuitively, by considering effective interactions only up to the second-order sidebands) and obtain a system of quadratic equations that can be numerically solved. Evolution of the success probability of finding a ground state of the inset Ising problem instances, for both a system whose couplings are implemented by static pairwise physical couplers (static coupling) and a system using this work's dynamical-coupling approach (dynamical coupling), for the case of no dissipation. Shown to the bottom of (A) are the joint quantum states of the oscillators (represented as probability distributions |ψ(x1, x2)| 2 and |ψ(p1, p2)| 2 in canonical position and momentum coordinates, respectively) at three different times for the dynamically-coupled system. Initially the system is in a vacuum state (with equal probability on each of the four possible spin configurations); by the end of the evolution, the system is in a coherent superposition 1 √ 2 (|↑↓ + |↓↑ ) (in the qubit subspace), so the probability to measure a ground state is ≈ 100%. (C) The probabilities of obtaining each of the possible spin configurations (i.e., configuration probability) as a function of the evolution time, for the problem instance and simulations shown in (B). Since the problem does not have any external fields, each spin configuration shown is degenerate to flipping every spin; for brevity, we have added together the probabilities corresponding to these degenerate configurations. (D) Correlation matrices of success probabilities Psuccess (evaluated at the end of the computation) between the statically-coupled and dynamically-coupled systems, for 100 instances of the SK7 problem class. (For the trajectory simulation data used to generate these correlation matrices, see the Supplementary Materials.) Here, we also introduce a nonzero field decay rate κ (relative to the timescale χ), to show that the correlations are high even in the presence of photon losses. Projected to the sides in blue and red are the corresponding histograms for the dynamically-coupled and statically-coupled systems, respectively. For simulation parameters and methodology, see the Supplementary Materials.
To demonstrate the effectiveness of our approach, we perform numerical simulations of both the statically-coupled system governed byĤ static as well as the dynamically-coupled system governed byĤ dynamical (with an appropriate choice of F given C), and we show that they achieve nearly indistinguishable results, as one would expect if one has achievedĤ dynamical ≈Ĥ static . Figure 2A shows the results of simulating the quantum evolution of both systems as they perform quantum annealing on a two-spin problem, where the coupling between the spins is antiferromagnetic. We can see that both the statically-coupled system and the dynamically-coupled system obtain the correct final state, which is a superposition 1 √ 2 (|↑↓ + |↓↑ ) in the qubit basis, and would produce one of the correct ground-state spin configurations ↑↓ or ↓↑ upon measurement. Moreover, we see that the evolution of the success probability to obtain a ground state is nearly identical for both architectures at all times, suggesting that the dynamically-coupled system is closely mimicking the behavior of the statically-coupled system, as desired. Figure 2B shows the same evolution of the success probability for a particular N = 4 problem instance, chosen from the class of finite-range, integer-valued Sherrington-Kirkpatrick (SK) instances studied in a foundational quantumannealing benchmark work [38]; in particular, we utilize range-7 graphs (SK7) (see the Supplementary Materials for details about instance generation). We again see that the evolutions of the success probabilities match very closely. Aside from the success probability, Fig. 2C shows the projections of the quantum state onto the spin configurations (of which there are 16 for N = 4), which demonstrates that the two architectures produce nearly indistinguishable evolutions in the projections as well.
In Fig. 2D, we show the results from N = 4 simulations for 100 different randomly generated problem instances (again drawn from the SK7 problem class). We simulate both the statically-coupled and dynamically-coupled quantum annealers and show the correlations between their respective success probabilities. To explore the effects of decoherence due to photon loss, we also examine these correlations for three different values of cavity-field decay rate κ (which is related to the cavity-photon lifetime as T cav := (2κ) −1 ). We see that even in the cases of non-zero loss (κ > 0), both annealers are still able to find ground states with high success probability, which demonstrates the loss-resilience results found in previous work [17,18]. Just as importantly, the correlations remain strong in the presence of photon losses. As a result, the loss resilience of the statically-coupled architecture carries over to the dynamically-coupled Evolution of the success probability of finding a ground state of the inset 10-spin Ising problem instance, for both a system whose couplings are implemented by static pairwise physical couplers (static coupling) and a system using this work's dynamical-coupling approach (dynamical coupling). To estimate the probability, we sample random initial conditions according to a Gaussian distribution (see Supplementary Materials, Ref. [16]). architecture, and even for those instances where the statically-coupled system suffers in success probability due to photon loss events, the dynamically-coupled system follows its behavior as expected.
While it would be desirable to validate our theoretical results with quantum simulations having more than N = 4 spins, full quantum simulations are prohibitively expensive for N > 4. Fortunately, for an oscillator-based quantumoptical system, there is a natural set of classical equations of motion (EOMs) which one can derive from the quantum model; formally, this consists of replacing the annihilation operatorsâ i with coherent-state amplitudes α i ∈ C in the quantum Heisenberg EOMs. Such a set of classical EOMs does not fully capture the dynamics of the system in the quantum regime, but we can nevertheless simulate these classical EOMs for both the statically-coupled and dynamically-coupled annealers and compare their dynamical behavior. If the two systems correspond well, we gain further confidence that our theoretical results can lead to the desired performance on a dynamically-coupled quantum annealer. Figure 3 shows the results of simulating these classical EOMs for both architectures on SK7 problems. Figure 3A shows the evolutions of the success probabilities for a particular N = 10 problem instance, while Fig. 3B shows the evolutions of the field amplitudes α i (t). We see that both the success probabilities and the oscillator amplitudes of the two architectures match well. Figure 3C shows the distribution of success probabilities for different problem sizes N up to N = 50, using 100 problem instances for N ≤ 30, and 30 instances for N = 50; the corresponding correlation plots for N = 8 and N = 30 are also shown to the bottom of the figure (additional correlation plots can be found in the Supplementary Materials). There is good agreement between the simulation results of the statically-and dynamically-coupled systems: the success probability as a function of N scales in the same way, and the correlation plots show similar performance on an instance-by-instance basis.
Having shown that the dynamically-coupled system indeed reproduces the behavior of the desired quantum annealer, we now discuss an important tradeoff inherent in our PM scheme for dynamical coupling. Examining the expression (3) for C eff , we see that a key parameter is the ratio between the native couplings J ij and the desired problem-strength parameter λ C . For simplicity, we take in this work an approach where the bus-resonator interactions are designed in such a way that the native couplings are approximately constant: J ij ≈ λ J > 0 for j = i (see the Supplementary Materials for more details). Under this framework, we can identify the "dynamical-coupling parameter" η := λ C /λ J , so called because intuitively, η captures the "cost" of the dynamical-coupling scheme. More precisely (see the Supplementary Materials for details), for any given problem, there is a fixed, required value for λ C /χ to ensure successful annealing (regardless of whether this is obtained by physical pairwise couplers or through our dynamical-coupling scheme). As a result, a small value for η implies a large required value for λ J /χ. However, as we discuss later, the absolute scale of λ J is generally hardware-limited, either by the physically realizable bus-oscillator coupling rates or the amount of available bandwidth in the system. Assuming we operate at one of these hardware limits on λ J , a small value of η thus implies a correspondingly small value for the nonlinear rate χ, which is unfavorable in the presence of a fixed cavity-photon lifetime. (A full account of the hierarchy of timescales and the chain of parameter requirements are given in the Supplementary Materials.) Therefore, it is desirable to use as high a value of η as possible.
On the other hand, a clear requirement for dynamical coupling to work well is the ability to obtain C eff ≈ C (so thatĤ dynamical ≈Ĥ static ) for any desired C. When η becomes large, however, the prefactor J ij /λ C ≈ 1/η in (3) becomes small. As discussed in the Supplementary Materials, this effect can become detrimental to our ability to obtain C eff ≈ C: intuitively, the maximum magnitude of the elements of C is by construction unity, but for a sufficiently small prefactor for the integral (i.e., sufficiently small 1/η), it becomes impossible to find a suitable set of coefficients F (k) i that could allow the integral over the cosine to compensate. This phenomenon is demonstrated in Fig. 4. Given a particular target coupling matrix C, Fig. 4A shows, over a range of values for η, the C eff matrices corresponding to the optimal F matrix found by our (second-order-Taylor-expansion-based) numerical routine (see the Supplementary Materials). We see that when η is chosen to be too large, the correspondence between C and C eff is rather poor. However, by decreasing η, one can achieve improved accuracy. More quantitatively, Fig. 4B shows the maximum element-wise error in the C eff matrix, as a function of η and problem size N (averaged over an ensemble of 100 instances for each N ). To show how the resulting error depends on the structure of C, we consider three problem classes: the SK7 class discussed above, the class of unweighted MAX-CUT problems with 50% edge density, and the class of unweighted MAX-CUT problems on cubic graphs (see the Supplementary Materials for details about instance generation). As expected from our intuition about the role of η, the particular scaling of the error with η depends on the type of problems considered: for SK7, the required η goes approximately as η ∝ 1/N , while for Dense MAX-CUT, η ∝ 1/(N log N ); in contrast with these two dense problem classes, cubic MAX-CUT problems require η ∝ 1/ log N . From this figure, we see that, assuming we can tolerate an upper limit for the error in C eff (say, of 3 %), there is a maximum value for η that we are able to use for any given N . By staying below this maximal value, we can ensure feasible solutions for the modulation coefficients F (k) i which generate effective couplings to within the acceptable error. In the Supplementary Materials, we derive for each problem class an explicit, functional form of η with respect to problem size N such that we are guaranteed an error of at most 3 %, but which is also not so conservative that the resulting nonlinear rate χ is unnecessarily low.
Finally, to study the technological feasibility of our dynamical-coupling scheme, it is important to understand Comparison of the effective coupling matrices C eff (target Ising problem matrix C shown to the left) for different settings of the dynamical-coupling parameter η, which sets the ratio between the effective coupling strength and the strength of the native coupling provided via the bus. As η is decreased, the accuracy of the C eff increases. (B) The maximum entry-wise error in the effective coupling matrix C eff , averaged over 100 problem instances for each size N , as a function of the dynamical-coupling parameter η. We show the scaling for three different classes of Ising problems, with illustrative examples of instances shown as insets. For all panels, the modulation matrix F used to produce C eff is found by a numerical optimizer that attempts to maximize the accuracy by using a second-order Taylor approximation to (3), with native couplings Jij constructed to be approximately uniform (see the Supplementary Materials).
how this tradeoff between the accuracy of the effective couplings and the nonlinear rate-in combination with the requirements for the Floquet approximation-translate into concrete scaling requirements for hardware figures of merit as a function of problem size. As previously mentioned, two relevant hardware limitations are the maximum coupling rate g max between each oscillator and the bus, as well as the maximum realizable detuning ∆ max between the bus and each oscillator. In the presence of these two limitations, we find (see the Supplementary Materials for a full analysis) that there is a maximum allowable nonlinear rate χ max for dynamical coupling to work.
In Fig. 5A, we show χ max as a function of problem size N , upon varying the hardware limits g max and ∆ max (for the SK7 problem class; see the Supplementary Materials for MAX-CUT). We observe that at fixed values of g max and ∆ max , χ max decreases as N increases. To more concretely interpret the implications of the requirement χ ≤ χ max , we note that κ χ is a necessary condition for the annealer to operate in the quantum, rather than the dissipative, regime. Thus, the requirement of χ ≤ χ max translates directly to a minimum required cavity-photon lifetime T cav = (2κ) −1 . Taking κ = χ/10 for concreteness, we plot the minimum required cavity-photon lifetime T cav on the right axis of Fig. 5B as a function of N , for g max /2π = 20 MHz and ∆ max /2π = 5 GHz. This plot indicates that T cav ≈ 100 µs is required to achieve N = 1000. The values for g max and ∆ max are readily achievable experimentally; regarding the T cav requirement, transmon qubits with T 1 times beyond 500 µs have been measured [39]. We also mention that, in the spirit of Refs. [3,8], one could also ignore this latter requirement on T cav and build a system with the best T cav one can achieve-even if it is smaller than the required T cav to satisfy κ = χ/10-and experimentally explore the performance of a highly dissipative quantum annealer. For a small experimental demonstration that still obeys the κ = χ/10 requirement, T cav ∼ 10 µs should be sufficient to realize an N = 10 version of the system; this is a very realistic parameter regime for current experiments.

III. DISCUSSION
There are two fundamental tradeoffs that one is making by adopting our dynamical-coupling architecture versus a static-coupling architecture: firstly, it is necessary to perform some classical precomputation to obtain the phase modulation coefficients (i.e., the matrix F ), and secondly, each oscillator's cavity-photon lifetime in a dynamicalcoupling architecture will likely need to be longer than it would in a static-coupling architecture (this is a consequence of the required native coupling strength λ J /χ, and hence the required cavity-photon lifetime, increasing with the desired accuracy of the achieved couplings C effij ). In exchange for accepting these two downsides, one is able to build a fully programmable, fully connected N -spin quantum annealer with a number of qubits and couplers that only scales as N , as opposed to current proposals for statically-coupled quantum annealers, which require a number of qubits and/or couplers that scales as N 2 .
Our method for computing F -by minimizing a second-order Taylor approximation of effective coupling errorrequires only O(N 3 ) time, which is efficient, and remarkably so given that the Ising problem matrix contains O(N 2 ) entries in general, hence the runtime could at best be O(N 2 ). Moreover, in our numerical experiments, the wallclock times when running our implementation of the method on a single-core processor were approximately 2 min for N = 1000, so the precomputation time should not be a significant practical concern, given the difficulty of solving hard instances of Ising problems. With regards to the required cavity-photon lifetime, we have provided (in the Supplementary Materials) a prescription for the parameters in the dynamical-coupling scheme such that the error in the realized couplings will be at most ∼3 %, and we have computed the resulting lifetime requirement as a function of N . Our scaling analysis shows that our dynamical-coupling approach could be realized for N = 1000 with existing superconducting-circuits technology, provided that JPOs with cavity-photon lifetimes of ∼100 µs can be engineered.
In the current approach from D-Wave Systems [6,8,40], as well as the proposed approaches in Refs. [18,32], realizing a 1000-spin quantum annealer for fully connected problems would require on the order of 10 6 qubits. This stands in contrast to our proposed architecture, which would require only 1000 oscillators to achieve the same number of spins. Beyond the engineering expense of implementing a relatively larger number of qubits, quantum annealers using problem embedding can also suffer from an additional exponential degradation in success probability [6]. The dynamical-coupling approach uses the minimal number of physical qubits possible, and hence avoids this additional penalty in performance.
Outside the context of quantum annealing for solving optimization problems, our work has a strong connection with Floquet engineering of quantum simulators and quantum spin chains [41][42][43][44][45]. A quantum annealer, in addition to being an optimization machine, is also a realization of a quantum simulator of the transverse-field Ising model [46]. We anticipate that our techniques for dynamical control of the couplings in a spin system will also enable the realization of novel simulation capabilities for more general spin Hamiltonians. Furthermore, our technique and derivations are not restricted to simulators of spin systems, but apply directly to generic bosonic simulators, and may allow more complex Hamiltonians to be engineered than are currently realized with pairwise physical couplers in Bose-Hubbard simulators built with superconducting circuits [47][48][49][50][51]. engineering: Supplementary Materials This document is structured as follows: • In Section S1, we describe our design principles, based on quantum Floquet theory, for programming arbitrary oscillator-oscillator couplings via dynamical control of a set of oscillators with a fixed set of all-to-all native couplings. We cast the problem as a generic, hardware-independent design problem of specifying phase-modulation indices for each oscillator. When the modulations required are small, we provide approximate solutions to the design problem in the form of analytic solutions at first order as well as an efficient numerical routine at second order. We also perform an analysis of the accuracy of the effective couplings and identify the dynamical-coupling parameter as a pivotal quantity in controlling the errors. Finally, we provide some guidelines for choosing the Floquet frequency in order to ensure that the Floquet approximation holds.
• In Section S2, we review the use of nonlinear Kerr parametric oscillators (KPOs) in quantum annealing to solve Ising spin problems. We present an archetypal model of a statically-coupled KPO network as introduced by Ref. [16], whose performance and solution mechanism we aim to approximate with our dynamical-coupling scheme. We also elaborate on some of the technical operational requirements of the statically-coupled annealer with respect to cavity field decay rate and problem-instance edge density, which we incorporate into our dynamically-coupled annealer.
• In Section S3, we present a hardware model for the realization of the KPO in the form of a Josephson parametric oscillator (JPO). We derive the relationship between the flux control signals applied to the JPO and resulting instantaneous frequency of the oscillator; the flux control signals are the control variables for achieving dynamical coupling. We also derive a model for an all-to-all-coupled network of JPOs using a common bus resonator, which provides the fixed native couplings that our dynamical-coupling scheme transforms into programmable effective couplings. We formulate and address the design problem that arises in performing dynamical coupling: what control signal should one apply to the flux lines in order to realize the dynamical modulations prescribed by our dynamical-coupling scheme given in Section S1?
• In Section S4, we analyze the structure of the native coupling matrix and how it is affected by both the dynamical modulations (which can lead to temporal variation in the couplings) as well as the bus-oscillator detunings of the oscillators (which affect the uniformity of the native couplings). Using this analysis, we give a prescription for choosing the bus-oscillator couplings and nominal detunings of each oscillator, in a way that ensures that the native coupling matrix possesses desirable properties such as approximate time-independence and uniformity while obeying all other constraints imposed by the analyses of previous sections. In particular, these properties dramatically simplify the procedure for solving the control design problem formulated in Section S3.
• In Section S5, we bring together the results of the above sections to give, for any desired Ising problem matrix, an explicit prescription for choosing the system parameters and modulations in order to realize dynamical coupling in a bus-mediated JPO network, while still respecting the constraints and approximations made in the above sections. With this prescription in hand, we then analyze the implications for scalability in the context of hardware limitations on the oscillator-bus detuning (i.e., system bandwidth) and the oscillator-bus coupling. We find that the main implication is that a maximum allowable nonlinear rate is imposed on the system, which translates into a minimum required cavity photon lifetime that generally increases with problem size.
• In Section S6, we describe the open-quantum system formalism we use to study the dynamics of the quantum annealer in our simulations. We also detail how we compute the metrics we use in our main manuscript figures, such as the success probability and spin configuration probabilities. We derive a classical model which is amenable to simulations of larger problem sizes in order to gain confidence that dynamical coupling can be achieved for more than a handful of oscillators.
• In Section S7, we define more formally the problem classes (SK7, Dense MAX-CUT, and Cubic MAX-CUT) we use in this work to study the performance of the dynamical-coupling scheme. We also briefly describe how we generate instances drawn from these problem classes.
• Finally, in Section S8, we provide additional data from our simulations which may be relevant or useful to interpreting our main results. In particular, we present the trajectories that resulted from simulating all 100 problem instances we used in our quantum simulations, as well as the correlation matrices for every problem size we considered in our classical simulations.

S2
Finally, some notes on our conventions: 1. Throughout this work, we set = 1, so that energy (i.e., our Hamiltonians) are in units of frequency.
2. Furthermore, whenever the limits of a summation are not explicitly stated, it is implied that the index runs over the set {1, . . . , N }, where N is the problem size.

S1. DESIGN PRINCIPLES FOR DYNAMICAL COUPLINGS
In the context of engineering dynamical couplings, we are primarily concerned with the problem of designing a set of control variables to cause a Hamiltonian possessing native couplings to approximate some other desired target Hamiltonian exhibiting static couplings of a particular form. More concretely, we suppose we start with a native Hamiltonian having the oscillatory form is the magnitude of the native coupling between modesâ i andâ j , and φ i (t) are scalar functions of time. This Hamiltonian can be interpreted as the excitation-exchange energy between oscillators with nominal native couplings J ij (t) and where φ i (t) is the total accrued phase of oscillator i in the Hamiltonian's frame. We show in Section S3 one realization of such a Hamiltonian in a system of nonlinear oscillators, in which time-dependent control of the functions φ i (t) can be designed. Our goal of engineering dynamical couplings is to achievê where C ij (satisfying C ij = C ji and max ij |C ij | = 1) is the desired target coupling between modesâ i andâ j , while the problem-strength parameter λ C ∈ R + controls the overall strength of the target couplings. In general, if we only have limited control over J ij (t) and φ i (t), it is impossible to satisfy such a relationship (i.e., obtainĤ native =Ĥ target ) for all times. To solve this problem, we utilize quantum Floquet theory [19]: ifĤ native is a periodic Hamiltonian with period 2π/Λ, where Λ ∈ R + is the Floquet frequency, then it can be approximated by its time average over one period: provided that Λ is sufficiently large (i.e., larger than any other energy scale in the system Hamiltonian) [19]. The evolution of the system underĤ eff matches the evolution underĤ native at periodic times 2πk/T , where k ∈ Z. For this reason, the dynamics under the "Floquet approximation"Ĥ native ≈Ĥ eff is a stroboscopic approximation to the original dynamics. (Formally this is a leading-order approximation in the Magnus expansion ofĤ native [19].) We discuss in this section how we use the Floquet approximation can be used to solve the design problem (S1.2).

A. Floquet design problem
In principle, we can use any form of φ i (t) that causesĤ native to satisfy the Floquet approximation while achieving (S1.2). For concreteness, however, we choose the following ansatz for the form of φ i (t): This ansatz allows us to reduce the design problem for φ i (t) into a design problem for the (scalar) Floquet design parameters F is alternative named phase modulation coefficients.) Note that we also assume that J ij (t) can be made periodic with period 2π/Λ. (Preferably, J ij (t) is approximately constant on the timescale of Λ.) Interpretationally, if the energy shifts due to other terms of the Hamiltonian andδ φ i (t) are small, then the mean frequency of the ith oscillator is approximately ω 0 −(i−1)Λ, corresponding to evenly-spaced detunings by the Floquet frequency Λ. The functions δφ i (t) then represent additional dynamical phase modulation applied to each oscillator, which are used to engineer the effective interactions, with parametric controls represented by the scalars F (k) j . Provided that the φ i (t) have the form prescribed in (S1.4), then the Floquet approximation tells us that where the Floquet integral is Thus, instead of trying to solve (S1.2) directly for all times, we can instead try to satisfy This is the Floquet design equation, which results in (S1.2) holding true in a time-averaged sense. We note that the time averaging does not preclude the possibility of designing a slow or adiabatic variation in, for example, λ C (i.e., that it be a function λ C (t) of time), since any variation on timescales much longer than 2π/Λ is approximately constant in the Floquet integral; furthermore, such adiabatic variations need not be periodic in Λ. The design parameters F can be interpreted in a signal-processing sense as phase modulation indices since they cause the phase of the cosine function in (S1.5b) to modulate in time. Thus, our scheme for achieving dynamical coupling can be equivalently viewed in a signal-processing framework as a phase-modulated coupling scheme.

B. Small-modulation approximation
One way to solve the Floquet design equation in general involves inserting the appropriate expression for J ij and performing an optimization or root-finding search for F (k) j by iterated evaluation of the integral. However, under certain special assumptions and/or restrictions, it is possible to simplify the problem further.
A number of assumptions can be made to simplify the Floquet equation. Here, we consider one such special case where 1) the time dependence of J ij (t) is negligible; and 2) the magnitude of the phase modulation terms satisfy where ζ 1. Then we can expand (S1.5b) to first order in ζ (i.e., around ϕ ij = 0) and (S1.6) becomes, to first order, for i > j. Thus, the problem reduces to a simple set of linear equations in F (k) i , which is readily solvable. We call this case the time-independent, small-modulation approximation. This equation likewise has an interpretation in terms of signal processing. In the rotating frame of the coupling interaction, the ith oscillator has carrier frequency iΛ, and so phase modulation at frequency (i − j)Λ produces sidebands on oscillator i that overlap with oscillator j at carrier frequency jΛ.
There are a number of general solutions to (S1.8), due to the symmetries of the equations. One solution prescribes that the ith oscillator be driven by i − 1 different frequencies, according to the recursive relation with a base case of F Here and elsewhere, C Another solution, which emphasizes symmetry in the design parameters, is given by As discussed later in Section S1 D, this choice of solution centers the phase modulation strength on the middle oscillator (i = N/2), which makes the dynamical-coupling scheme more robust to noise. (See Section S1 D for further analysis of this solution.) Unless otherwise noted, this is the symmetry choice we will henceforth utilize for the first-order analytic solution.
While the first-order Taylor approximation provides an analytic form for the design parameters, we can further improve the approximation by expanding the Floquet integral to second order in ζ. At second order, (S1.6) becomes In contrast to the first-order expansion (S1.8), it is not possible in general to analytically solve this set of equations. However, as an algebraic system of second-order equations in F (k) j , it is still substantially easier to solve numerically than the full Floquet problem (S1.6), and the residual errors are almost negligible even for realistic modulation depths (which stands in contrast to the situation for the first-order expansion). Further expansions in ζ are also possible, at the cost of algebraic complexity.

C. Numerical optimization for Floquet design parameters
As previously mentioned, the Floquet equation can be solved either by direct root finding or numerical optimization (specifically, minimization) over F (k) i of the objective function (S1.12) given two matrices λ C C and J assumed to be time-independent here for simplicity, which allows us to perform a change of variables t → t/Λ to scale out the Floquet frequency Λ. While direct root finding is generally prohibitively expensive, optimization of (S1.12) usually works well for small N . However, this optimization occurs over O(N 2 ) variables (the number of F (k) i entries), which degrades the quality of solutions for larger N (at least, for local methods and/or fixed optimization time).
To get around this issue, our preferred method for solving the Floquet equation is a "column-by-column" approach. This is inspired by the empirical observation that, given any particular set of design parameters F (k) i , modifying the values of the kth "column" F primarily (though not exclusively) produces changes in the kth diagonal of the Floquet integral I 1,1+k , I 2,2+k , . . . , I N −k,N (and, symmetrically, in the −kth diagonal). That is, instead of doing a single optimization over O(N 2 ) variables, we can opt to perform O(N ) different optimizations, each over O(N ) variables. Of course, because the various values of k are in fact coupled, this approach is unlikely to produce the global optimum. Nevertheless, we have found that error values of |C ij − I ij /λ C | ∼ 10 −4 are readily achievable for problem sizes N 100, and it is reasonable to expect this behavior to continue for larger N as well.
To be more precise, while optimizing column k, we utilize the more restricted objective function where the "cached" functions are which contain all the values of F (l) i that are not currently being optimized over and are thus static from the perspective of the kth-column optimization. (To see the connection between (S1.13) and (S1.12), consider k := i − j, so that i = j + k.) Thus, we need only update u (k) j right before starting optimization over the kth column. In this approach, we see that the objective function (S1.13a) costs O(N 2 ) to evaluate: a factor of N for the summation and a factor of N for the integral, which needs O(N ) samples in time. The updating of the cache S6 functions u for each fixed k. Comparing (S1.12) and (S1.13), we improve the cost of the objective function by O(N ), which may be significant depending on how the cost of the optimizer scales with the cost of the objective function, compared to the cost of the cache update.
As a technical point, we also note that in (S1.13a), there are only N − k elements in the summation, but as written, there are N different values of the Floquet design parameters F (k) i ; this redundancy can be eliminated by a choice for the symmetry of the design parameters. For example, in the form chosen for (S1.10), only F Similarly, this column-by-column approach can also be adapted to perform numerical optimization in solving the second-order equations (S1.11). In this case, the objective function is where the binary-valued step function Again, these cached variables only involve values of F (l) i that are not currently being optimized over and are thus static from the perspective of the kth-column optimization.
In this second-order approach, the objective function (S1.14a) costs only O(N ) to evaluate, while updating the cached variables costs O(N 2 ). Thus, the second-order approach is more efficient than the full-order approach (S1.13) by O(N ), at the cost of having some residual error (which itself is already better than the residual error in the first-order analytic solution).
If the optimization is performed using a gradient-based algorithm, the second-order approach also has the advantage of having a simple analytic gradient. The gradient of (S1.14) in the direction of F (k) i is given by The partial derivatives in (S1.15a) are not explicitly evaluated because their values depend on the choice of symmetries in F (k) j ; in an upper-triangular form, for example, the first term would be zero and the second a Kronecker delta; for a symmetry choice like in (S1.10), the matrix of partial derivatives would be a sparse matrix. Thus, despite the sum in (S1.15a), the evaluation of a single element of the gradient has O(1) cost, as long as the matrix of partial derivatives has O(1) nonzero entries per column (i.e., per fixed k).
In summary, provided that we choose a parametrization resulting in a sparse matrix of partial derivatives in F To validate this analysis, we implemented and ran a program using the above optimization method for solving the second-order Floquet design problem. To test the scaling of the optimization procedure, we run it on random instances of SK7 across different problem sizes. The results of this numerical experiment are summarized in Figure S1, from which it is clear that 1) the design problem can be solved with only moderate processing time, even for a problem size of N = 10 3 ; and 2) the scaling at large N goes as O(N 3 ), as was argued in the discussion above. FIG. S1. Time taken to solve Floquet design problem. Time taken to solve the Floquet design problem using a secondorder, "column-by-column" approach as described in Section S1 C and implemented in Julia. The optimization time is the time required to compute the F (k) j Floquet design parameters and is shown, for each problem size N , as an average over ten SK7 problem instances. For the procedure used to select the structure of the native couplings Jij, see Section S5 A, and for details on the optimization routine, see Section S1 C 1. This calculation was performed with a single Julia process (i.e., using a single core) running on an Intel i7-5600U CPU at a nominal clock speed of 2.60 GHz. Note that error bars (standard error on the mean) are shown but are not visible on all points.

Additional details on numerical optimization routine
For the purposes of performing simulations and other numerical studies in this work, we implemented a straightforward version of the optimization routines described above. Our program is written in the Julia language (version 0.6.4), and the numerical optimization is performed by the Julia package Optim.jl (version 0.14.1) [52]. We mention here some additional details particular to our methodology, which may be relevant to the interpretation of our results.
We utilize the column-by-column approach as described in (S1.13) or (S1.14) (for a second-order Taylor approximation to the former), since it is more scalable than the joint approach described by (S1.12). In this approach, there are N − 1 objective functions, one for each value of k in (S1.13) and (S1.14). In each of these objective functions, the optimization variables are F Thus, if the Floquet design parameters F (k) j were collected as a matrix, then the kth objective function involves optimizing over the kth "column" of that matrix. The goal is to iteratively optimize each of these columns, using the first-order analytic solution (S1.10) as the starting point.
For each column being optimized, we make a call to the optimization routine provided by Optim.jl, using the conjugate gradient method with at most 250 iterations (i.e., steps) [52]. The value of the attained objective function minimum is then analyzed. If this minimum is below a certain tolerance level, then we deem that column to be "converged" and remove it from the set of columns that require optimizing further. This tolerance level is chosen for this work to be (10 −3 η) 2 (N − k), where η is the dynamical-coupling parameter (see Sections S1 D, S1 E, and S5 A). Interpretationally, this choice of the tolerance implies that the average element-wise residual error in solving the Floquet design equation (S1.6) is approximately 10 −3 , at least for the case of approximately uniform native couplings J ij ≈ λ J . (However, we note that for the second-order optimizer, there is also a residual error due to the Taylor approximation as well, which increases with λ C /λ J = η and can in fact dominate the residual error.) Once every column to be optimized has been visited, the program then recomputes the objective function values for each column that has converged. Since the various columns are coupled, it is possible that the objective function value for, say, k = 1, previously below the tolerance, now exceeds the tolerance due to the optimization over, say, k = 5. Thus, if any column that has converged now exceeds the tolerance, it is added back into the set of columns requiring additional optimization. This new set of columns to optimize then becomes the starting point for another "pass", and this whole procedure repeats until either 1) the number of such passes exceeds some maximum (set to 25 in this work), or if there are no longer any columns left to optimize. Furthermore, a column which fails to improve its objective function value by more than a fixed amount (set to 1 % in this work) on two consecutive passes is also removed (permanently) from the set of columns to optimize. Finally, at the end of the routine, if any of the objective function values are worse than those of the first-order analytic solution (S1.10) (specifically, exceeding the latter by more than 1 % of the tolerance level), then the optimization results are rejected and the latter is returned.

D. Structure of the Floquet design parameters
Since the Floquet design parameters dictate the physical behavior of the oscillators undergoing phase modulation, it is useful to understand the general structure of F (k) j . Here, we provide some qualitative analysis which yields relationships that will be useful later on in the analysis of errors and scaling in hardware-specific contexts. More specifically, we analyze the small-modulation, first-order solution (S1.10): in most of the situations we are concerned with, the numerical solutions do not drastically change the qualitative structure at first order.
The small-modulation solution (S1.10) depends on both the target couplings C ij as well as the native couplings J ij . Let us denote by f (k) j the first-order small-modulation solution to the design problem with J ij taken to be a uniform and time-independent (i.e., J ij (t) = λ J for i = j and λ J ∈ R + ). This case anticipates approximations we later apply in which the hardware-provided native couplings are, in fact, quite close to this form (see Section S4).
We see that the dynamical-coupling ratio naturally appears in the expression for (S1.10), and it fully captures the dependence of f (k) j on λ C and λ J . However, the utility of the ratio η extends beyond just analyzing f (k) j ; as we will see, η plays a critical role in determining both the performance (see Section S1 E) as well as the scaling (see Section S5) of the dynamical-coupling scheme. Intuitively, it captures how much of the native coupling strength (represented by λ J ) is utilized to generate the effective target coupling strength (represented by λ C ).
We present in Figure S2 the average magnitude of f (k) j for each of the representative problem classes. In Figure S2 A, we see that, quite generally, the oscillators at the two extremes of large and small detunings (i.e., nearer to the first and last rows) experience greater overall phase modulation than the oscillators closer to the middle. We also see that the magnitude of the phase modulation generally decreases as the harmonic of the modulation (i.e., the row index) increases. In addition, Figure S2 B, highlights the fact that the modulation strength for SK7 problems decays more slowly compared to the MAX-CUT problems.

S9
The latter behavior can be understood by observing that (S1.10) in this case simplifies to (S1.17) Assuming that (N − 1)/k ≈ (N − 1)/k and making use of the statistical properties of each problem class, we can derive that (S1. 18) We see that different problem classes have different scaling with respect to N and k. In the case of SK7, the target couplings C ij have random alternations in sign, which causes cancellations in the sum of (S1.17), so that the mean magnitude decays more slowly with modulation frequency. On the other hand, for sparse problem classes, there are asymptotically fewer terms in (S1.17), leading to a different scaling. This functional form is shown in Figure S2 B, which indicates good agreement with empirical results, on average.

E. Effective coupling error and the dynamical-coupling parameter
For the dynamical-coupling scheme to accurately approximate the target Hamiltonian, it is important for the effective couplings under time-averaging to approximate the target couplings accurately. Given a set of desired target couplings λ C C ij and a (time-independent) set of native couplings J ij , suppose we have solved the Floquet design equation (S1.6) for the design parameters F (k) j (using, for example, the methods of Section S1 C). We then define to be the effective couplings produced by that set of F (k) j . Of course, if the Floquet design equation is solved well, C eff should be close to C. Nevertheless, it is clear that C eff can be different from C, especially if, for example, we use a "column-by-column" optimization approach or derive our objective functions from a small-modulation approximation.
For this work, we define the error of the effective couplings to be This error metric of taking the maximum over all entries is conservative (one could choose to compute an elementwise average error instead, for example) and characterizes the largest element-wise deviation of the effective couplings from the target ones. We aim here to characterize the error E C and to connect the scaling of the error with the dynamical-coupling parameter η.
In Section S1 B, we showed how the Floquet design equation can be simplified by an expansion in the modulation depth ζ. Intuitively, we expect the numerical algorithms presented in Section S1 C to perform better and produce lower error when the expansion parameter ζ is small. By examining (S1.7), ζ is limited by the largest ϕ ij . Furthermore, we have observed in Section S1 D that oscillators with the smallest and largest detunings (i = 1, N ) are driven most strongly (so that ϕ 1N is largest), so we can reason that on the equality. We now further assume, as argued in Section S1 D, that the structure of F (k) j is similar to the structure of f (k) j , derived in the first-order, constant-J ij approximation. Then using (S1.18), we arrive at the conclusion that ζ is, to a good approximation, linearly proportional to the dynamical-coupling parameter η, at least on average over problem instances. This in turn suggests that the error of the effective couplings reduces as η is made smaller. This is physically intuitive, as a small imposed ratio between the target coupling strength and native coupling strength implies more flexibility in engineering the system to approximate the desired target couplings.
Taken together, this discussion motivates the assertion that η is, in addition to Λ, one of the key dynamical-coupling parameters that determines the performance of the scheme. At the same time, given a fixed desired λ C /χ in the target Hamiltonian, a small value of η implies a large value of λ J /χ = (λ C /χ)/η, which in turn implies a small value of the nonlinear rate χ; this latter condition imposes a more stringent requirement on the field decay rate (see Section S5 C 3 for more details). Thus, a natural tradeoff is to choose the largest η that achieves an acceptable error.
It is reasonable to expect that, for any given problem instance, there is a precise value of η such that the Floquet design equation (S1.6) fails to have a solution. While this "threshold" η varies across problem instances, we have found empirically (e.g., through numerical optimization) that, even on ensemble average over a problem class, there tends to be a sharp boundary between those values of η with acceptable errors after optimization and those for which the optimizer cannot improve the error relative to the first-order analytic solution (i.e., there is a dramatic change in the error once η exceeds a certain threshold value). This boundary value of η is a function of N , and we can understand its functional form by the following argument. We first substitute the analytic formulas for the mean magnitude of the Floquet design parameters f (k) j from (S1.18) into F (k) j in (S1.21). After some simplifying approximations (such as N − 1 ≈ N ), this results in (S1.22) Heuristically, when ζ exceeds a certain modulation depth, the Floquet design equation tends to have no solution. If we denote this nominal value by ζ max , then we can solve for η in the above to conclude that if we want a modulation depth smaller than ζ max , we should take η ≤ η bound , given by (S1.23) Figure S3 shows (in green) the range of values of η for which our numerical optimizer finds acceptable errors. Alongside this data (in pink) are the functional forms for η bound derived above. We see that the functional forms are reproduced well across the different problem classes, while the offset can also be made to agree by setting ζ max to reasonable values (of order unity).
The upshot of this discussion is that the highest possible value of η as a function of N (and hence the best-case scaling of the dynamical-coupling scheme as a whole) is constrained by η bound . In practice, however, finding the true solution of the Floquet design equation-or even being able to use the full-order numerical solver-is only possible for small problem sizes. As mentioned in Section S1 C, when the problem size is large, we generally turn to a second-order Taylor approximation approach for the optimizer. In this case, the error of the resulting solution is no longer sharp, as the quality of the second-order approximation (S1.11) (which is used to construct the objective function) also depends on η (through ζ). In this case, we need to qualify our discussion by taking into account the choice of acceptable error. For simplicity, we consider E C ≤ 3 % as an acceptable error in this work.
In Figure S4, we show the errors which are empirically attained by the second-order numerical optimizer. The green curve shows the values of η that achieve the acceptable error of 3 %, below which the error is generally smaller. As expected, the 3 % error boundary is qualitatively similar in shape to η bound (cf., Figure S3), except for an overall offset due to the Taylor expansion. Thus, even though the second-order approach produces larger error overall, the acceptable-error cutoff for η scales in much the same way as the fundamental upper-bound η bound discussed above.
Motivated by these observations, we choose η in this work by taking our theoretically derived fundamental bounds of η (i.e., η bound ) and multiplying them by an empirically determined multiplicative factor: (S1.24) This choice of η is plotted in blue in both Figures S3 and S4. Because of the scaling depicted in Figure S4, we have reasonable confidence that even if a second-order approach is used for solving the Floquet design problem, we are within a constant factor of the optimal scaling possible, as implied by Figure S3. As such, we use this prescription for η in all our simulations in this work as well as for the scaling analysis performed in Section S5 C 3. As a technical point, we note that in Figure S4, there is a crossover between the blue and green lines, indicating that at smaller N , (S1.24) is not quite enough to ensure E C ≤ 3 %. However, we can circumvent this by simply using the full-order solver for N ≤ 100 (which is efficient at small N ), and only switch to the second-order approach for N > 100. Of course, this technicality does not affect any of our scaling arguments. "column-by-column" approach as described in Section S1 C-achieves low error EC, as a function of problem size N . Specifically, we choose EC ≤ 3 %, although the qualitative features are insensitive to this choice due to the sharpness of the error transition. The error is computed as a mean over ten problem instances, and a mesh grid of 8 points (in N ) and 31 points (in η) is used to generate the boundary curve. The pink lines show the heuristically derived functional forms for η bound from (S1.23). The blue lines indicate the choice of η as a function of N used in this work, as given by (S1.24). For the choice of the native couplings Jij, refer to Section S5 A; for details on the numerical optimization routine, see Section S1 C. Heat map of the error EC resulting from the numerical solution to the Floquet design equation using a second-order, "column-by-column" approach as described in Section S1 C, plotted as a function of the dynamical-coupling parameter η and problem size N . The level curve for an acceptable error of EC = 3 % is shown in green. The error is computed as a mean over 100 problem instances for each point. The blue lines indicate the choice of η as a function of N used in this work, as given by (S1.24). For the choice of the native couplings Jij, refer to Section S5 A; for details on the numerical optimization routine, see Section S1 C. Note that to save on computation, we omit the region where η has safely exceeded η bound (namely, the upper-right triangular region in the left and center plots, around where the colormap saturates); by extrapolation, we take the error in this region to exceed 1.

F. Requirements on Floquet frequency
Finally, we discuss the Floquet frequency Λ, which sets the periodicity of used in the Floquet approximation to be 2π/Λ. As mentioned above, the natively-coupling Hamiltonian (S1.1) is approximated by the time-average effective Hamiltonian (S1.3) only if Λ is much greater than all other system rates. If this assumption is not met, then it is possible that the physics of the dynamically-coupled system fails to approximate that of the target statically-coupled system, despite the Floquet equation (S1.6) being met (i.e., despite C effij ≈ C ij ). More precisely, we require Λ = A × max (all system rates) , (S1. 25) where A 1 is a large multiplicative factor. As will be discussed in later sections-see especially Sections S2, S4,and S5 C 3 (and as presented in the main manuscript), the most relevant rates for our system are χ, δ i , r max , and λ J . Comparison between the behavior of the staticallycoupled target system (red, in back), vs. that of the corresponding dynamically-coupled system using a range value for the Floquet frequency Λ (parameterized by the factor A, as defined in (S1.25)). The metric plotted here is the success probability (see Section S6 A) on a particular problem instance (specifically, the instance used for Figure 2 B in the main manuscript).
Here, the physical model for the oscillators is a bus-coupled system of JPOs (see Section S3), under appropriate approximations (see Section S4) and with parameters chosen according to Section S5 A. For quantum simulation methods, see Section S6 A.
In order to determine the value of A that suffices, we simulate both the target statically-coupled system and the corresponding dynamically-coupled system using different values of A and examine the respective evolution of a key performance metric, the success probability (see Section S6 A). In Figure S5, we show one example of how the dynamically-coupled system progressively matches the evolution of the target statically-coupled system (at least, on timescales greater than 2π/Λ) as A is increased. For this particular problem instance and system parameters, we see that the value of A should be larger than 40 for good correspondence.
From such empirical observations, we conservatively use in this work a value of A = 100 for quantum simulations. For simulations of the classical EOMs, similar empirical investigations show that a larger value of A = 200 is needed for comparable convergence. S13

S2. QUANTUM ANNEALING WITH KERR PARAMETRIC OSCILLATORS
In this section, we review some operational principles for solving Ising problems with networks of Kerr parametric oscillators (KPOs) via quantum annealing. A number of approaches towards this end have been proposed [16][17][18], but for simplicity, we focus on the directly-and statically-coupled model used in Ref. [16], as this is the model that our dynamical-coupling scheme aims to approximate.
A generic KPO network is represented by a Hamiltonian of the general form where χ > 0 is the Kerr nonlinear rate, r is the amplitude of the parametric pump drive, δ i is the detuning of the ith oscillator from the half-harmonic of the pump, λ C C ij is the strength of the coupling between oscillator i and j. Generally, the symmetric coupling matrix C ij is chosen to be that of a (bias-free) Ising problem whose ground state is desired. Finally, although implicit in (S2.1), all these parameters can potentially be time-dependent, varying as needed by the annealing procedure.
In the absence of coupling and detuning, the eigenstates of each KPO are the 0-phase and π-phase coherent states |+r/χ and |−r/χ , which can be utilized in a quantum annealer to encode, respectively, the spin-up and spin-down configurations of an Ising spin system. In a quantum annealer based on this encoding, the state of the KPO network after the annealing procedure is ideally the multimode Schrödinger cat state where σ is the ground-state configuration of the Ising problem of interest; and it comprises σ = (σ 1 , . . . , σ N ) with σ i ∈ {−1, 1}. Given this state, measurement of the sign of the in-phase field amplitudes of the KPOs produces solutions to the Ising problem. We can consider two approaches in which such a KPO network can be used for performing quantum annealing and obtaining the final state (S2.2). One way is to take a purely qubit-based approach [2], as the KPO has been shown to exhibit a qubit subspace (spanned by |±r/χ ) in which (S2.1) can be used to realize universal control [27,28]. In this approach, the individual KPOs are each first prepared in the Pauli-X +1-eigenstate, which corresponds to the cat state |r/χ + |−r/χ ; as described in [28], this can be done by engineering the evolution of the pump parameter r appropriately. After the preparation step, the annealing procedure then requires realizing and controlling two Hamiltonian terms in the qubit basis. The first is the so-called driver Hamiltonian iX i , which can be realized in the qubit subspace by proper choice of the detunings δ i [27,28]. The second is the problem Hamiltonian i =j C ijẐiẐj , which is realized in the qubit subspace by the coupling term in (S2.1). The key point here is that the state (S2.2)when interpreted in the qubit subspace-is the ground state of this latter Hamiltonian, so that the end result is the state (S2.2), provided the assumptions of the quantum adiabatic theorem are met (i.e., in the context of standard adiabatic quantum computation). In this approach, the pump is held fixed to stabilize the two coherent states |±r/χ and ensure that the system remains in the qubit subspace, while the coupling term is gradually turned on by adiabatically time-varying λ C .
For our work, however, we focus on a more recent continuous-variable approach, as exemplified by Ref. [16] but also utilized in Refs. [17,18]. This approach has the advantage of possessing a straightforward classical limit, allowing us to numerically study scaling to larger problem sizes. In contrast to the purely qubit-based approach, the initial state of the quantum annealer in this continuous-variable approach is the vacuum state. Then, the quantum annealing procedure is realized by time-varying the parametric pump drive according to some function r(t), where r(0) = 0. A useful choice for r(t) takes the form of a linear ramp with slope r max /T ramp clamped at r max : The detunings δ i are static but independently tuned for each oscillator such that the highest-eigenvalue eigenstate of (S2.1) at r = 0 is the vacuum, and one satisfactory choice is [16] This choice ensures that the initial vacuum state of the oscillator network is adiabatically connected to the multimode cat state (S2.2) at the end of the computation, thus implementing quantum annealing [16]. In this approach, the S14 coupling term λ C i,j C ijâ † iâ j is also held static throughout the procedure, with some suitable choice of the problemdependent strength parameter λ C ; we discuss this tuning parameter further in Section S2 A below.
Of course, these networks of KPOs are not only described by the Hamiltonian (S2.1), as they are also coupled to the environment. Following the approach in [17,18], we model this dissipation due to the environment with Lindblad operators [53]. In particular, we treat the photon loss for the ith oscillator with the Lindblad operator where κ is the field decay rate associated with this dissipation process.

A. Parameter considerations for annealing
The performance of a continuous-variable quantum annealer can be quite sensitive to the choice of parameters and annealing schedules used in the scheme. Focusing on the continuous-variable annealing scheme with a clamped-linear pump schedule as in Ref. [16], we discuss in this subsection the prescriptions we follow for selecting the ramp time T ramp , the maximum pump rate r max , and the problem-strength parameter λ C . These prescriptions are then directly mapped onto the corresponding parameters in our dynamical-coupling scheme.
Among the three parameters, λ C is the most subtle, as its optimal choice depends on the problem size N as well as the problem structure. To first order, Ref. [16] shows that the KPO system is an effective quantum annealer under the condition that Thus, in order for the annealer to be effective, we should scale the value of λ C to cancel out the scaling in the sum over C ij , to ensure the right-hand side remains bounded as required. We present in Figure S6 the success probability, computed via simulations of the classical EOMs that are described in Section S6 B, of various problem classes as we vary λ C and N . For simplicity the loss is set to zero κ = 0 for these simulations. The result shows that dense problem classes (SK7, Dense MAX-CUT) require λ C /χ ∝ 1/N , while sparse problem classes (Cubic MAX-CUT) require a constant λ C . This numerical investigation is consistent with (S2.6) in that the optimal choice for the problem-strength parameter is λ C /χ ∼ r max ij |C ij | −1 . Based on these results, unless stated otherwise, we recommend the following forms for λ C : In this work, we also analyze the performance of the statically-coupled system with dissipation to ensure that the dynamical-coupling scheme is (at least comparably) robust to effects of linear loss. The presence of dissipation affects the choice for the parameters T ramp , r max and λ C that optimize success probability. To investigate these effects, we perform quantum simulations of the statically-coupled system and present them in Figure S7, which shows how the success probability of solving N = 4 SK7 problems varies for different settings of the parameters λ C , r max and T ramp , across the three different loss rates κ/χ ∈ 0, 10 −2 , 10 −1 , which are presented along each row respectively.
First, we see that the optimal choice of T ramp varies strongly with κ. In the absence of loss, success probability monotonically improves as T ramp is increased, since the annealing procedure becomes more adiabatic. However, with finite loss, larger T ramp causes a decrease in the success probability, as the number of photon emission events accumulate, causing decoherence. Based on the results of Figure S7, the ramp time should be set to T ramp = min (1/κ, 100/χ) to maximize success probability, corresponding to order unity number of photon emission events per oscillator in the duration of the computation [17] (and where 100/χ gives acceptable adiabaticity in the lossless case).
Next, r max determines the amplitude of the coherent states that constitute the multimode cat at the end of the computation. Thus, it should be sufficiently large to ensure that the two constituent coherent states are distinguishable (i.e., nearly orthogonal). On the other hand, when there is loss, r max should also not be too large, as the number of photon emission events increases with the cat-state amplitude, reducing success probability. The numerical simulations suggest that r max = 5 achieves reasonable success probabilities even in the presence of dissipation. Finally, we note that there is not very strong dependence of λ C with dissipation, as we find that the value given by (S2.7) results in reasonably good success probabilities even in the presence of photon loss, at least for N = 4.
Of course, in addition to these qualitative observations about the operating point of the statically-coupled system, one can perform additional fine-tuning. For the quantum simulations in this work, we have chosen specific values of the parameters using Figure S7 to maximize success probabilities; see Table S6 A. Success probability of the staticallycoupled KPO quantum annealer at solving SK7 problems with problem size N = 4, as functions of field decay rate κ/χ (rows) and the linear ramp time Trampχ (columns), as well as the maximum pump rate rmax/χ and problem-strength parameter λC/χ. The success probability is computed as a mean over ten problem instances. For quantum simulation approach, see Section S6 A.

A. Josephson parametric oscillators
A Josephson parametric oscillator (JPO) is a Kerr resonator with an additional two-photon drive. JPOs can be implemented in different ways, such as with a SNAIL parametric amplifier [30] or by parametric modulation of a transmon oscillator's resonance frequency [17,28]. For this work, we focus on the latter implementation, where the magnetic flux Φ (henceforth normalized to the flux quantum Φ 0 := h/2e) applied to the DC SQUID is modulated instead of being held static. The system is described by the standard transmon Hamiltonian [54] 4E Cq 2 − 2E J cos(Φ/2) cosθ, (S3.1) whereq,θ are canonically conjugate variables representing the charge on the capacitor and the phase difference of the two junctions, respectively, and E C and E J are the charging and Josephson energies, respectively. We posit that the flux Φ(t) has a time-dependence which can be separated into DC and AC components, as Operationally, Φ ac is used for both parametric driving (at high frequencies) and for detuning modulation (at lower frequencies) to achieve dynamical coupling while Φ dc is used to set the operating frequency of each JPO in the absence of the modulation. Assuming the modulation amplitude is small (i.e., |Φ ac (t)| Φ dc ), we can expand [17] cos In the transmon limit of E J E C [55], we can also expand (S3.1) inθ. Dropping the zeroth-order (constant) term and inserting (S3.2) after applying the approximation (S3.3), we have, to fourth order inθ, the Hamiltonian We next introduce bosonic operatorsâ andâ † (satisfying â,â † = 1) via the canonical transformations [17] θ Inserting (S3.5) into (S3.4), we arrive at the Hamiltonian Finally, after normal ordering and applying the rotating wave approximation [56] to cut off frequencies much higher than twice the fundamental (ω dc in (S3.8)), we have the dynamically-controlled JPO Hamiltonian where we have defined the nonlinear Kerr coefficient χ := E C and the instantaneous frequency ω(t) := ω dc + ω ac (t), which has DC and AC components given by These relations indicate that, for appropriate choices of E C and E J , both the DC and AC components of the instantaneous frequency can be fully controlled in a JPO given appropriate control over Φ dc and Φ ac (t).

B. Bus-mediated native couplings
To couple together JPOs that are individually described by (S3.7), we utilize an architecture involving a common bus resonator. We consider an excitation-exchange interaction between the JPOs and the bus, and the linear part of system Hamiltonian (i.e., omitting the individual JPO's parametric and Kerr terms) is [14] H coupling : where ω bus is the frequency of the bus resonator and g i is the coupling rate of the ith JPO to the bus. As in (S3.7), the instantaneous frequency ω i is in general a time-dependent parameter. We start by introducing the bus-oscillator detunings Next, we make the assumption that, for ε 1 for all i and at all times. Provided these conditions are met, we can perform a Schrieffer-Wolff expansion [55] in the rotating frame of the bus, to second order in ε. The result of that expansion produces the coupling Hamiltonian where we have dropped the bus mode operatorsb in (S3.12) since the bus will no longer be populated in this limit. Note that due to the Schrieffer-Wolff expansion, this Hamiltonian is only approximately in lab frame, as there is an O(ε) "polariton angle" between the JPO and bus modes: theâ i operators used in (S3.12) are hybridized withb to O(ε). We make the standard assumption in circuit QED that this has negligible effect [55].

C. JPO control problem
Taking the results of the two above subsections together, the (approximately) lab-frame Hamiltonian of a system of JPOs with bus-mediated couplings is Our goal is to choose the (time-dependent) parameters such that this Hamiltonian approximates the generic KPO quantum annealing Hamiltonian (S2.1). We first go into a rotating frame in which the ith JPO oscillates at a constant frequency of −δ i . This produces the Hamiltonian where the pump and coupling terms are, respectively, Comparing this Hamiltonian with (S2.1), we see that the design problem is to obtain We see thatĤ c is already written in the form of (S1.1), so in principle, we can use our approach to dynamical coupling from Section S1 to parameterize ω i (t) in terms of the Floquet design parameters F (k) i and thus satisfy (S3.15b) by solving the Floquet design problem for F (k) j . Unfortunately, for the specific case of JPOs, there is an additional complication because that same control signal ω i is already used for performing the parametric pumping of the system: this is encapsulated by (S3.15a) and (S3.14a), which directly depends on the AC part of ω i (t). We emphasize that this additional complication is due to the hardware constraints of JPOs and is not inherent to the design of dynamical couplings. (For example, in optical implementations of KPOs, a separate pump field is responsible for producing the parametric drive, opening a new degree of freedom for solving the design problem.) To help get around this JPO-specific complication, we assume that the AC part of the instantaneous frequency has two "bands" of frequency content: ω ac (t) = ω slow (t) + ω fast (t), (S3. 16) where, roughly speaking, ω slow consists of frequency content near DC (the "coupling band"), while ω fast consists of frequency content near 2ω dc (the "pumping band"). We now proceed to solve the problem by assuming that the following two replacements can be made in (S3.14), at least in the context of trying to achieve the approximate equalities (S3.15): This trick allows us to effectively decouple the pumping and coupling problems, at the cost of potential "crosstalk" between the two Hamiltonian terms once we insert our solution for ω i (t). We first use the approach of dynamical coupling to solve the design problem of (S3.15b). To match the abstract form of (S1.1), we define the native coupling matrix and phase functions These definitions identify for us, in the two-band approximation, the manifestation of the native-coupling matrix J ij (t) and oscillator phases φ i (t) in the context of the bus-coupled JPO network. Thus, following (S1.4), we want to parameterize ω dci and ω slowi (t) (on the left) in terms of the Floquet design parameters (on the right) according to where the scalars F (k) i are the solution to the Floquet design problem (S1.6). Provided that we know the values of F (k) j required, (S3. 19) is a (hardware) design problem for the control variables ω dci and ω slowi (t). Differentiating both sides of (S3. 19), We then split this equation into time-independent (DC) and time-dependent (slow) parts: The first equation can be converted into a quadratic equation for ω dci and solved, as can the second equation for ω slowi (t) after given ω dci . (See subsection below for more details.)

S19
As an aside, if δ i Λ and g 2 i /∆ i (t) Λ (which is necessary to satisfy the Schrieffer-Wolff conditions anyway), these equations (S3.21) can be approximated by which shows that this hardware manifestation supports our claim in Section S1 that the mean oscillator frequencies are approximately evenly spaced. It also shows that the magnitudes of the slow component of the instantaneous frequencies directly correlate with the magnitude of F (k) i and is intuitively given by the derivative of the instantaneous phase modulationδ φ i (t).
It is also worth noting that technically, the determination of the Floquet design parameters F (k) i requires knowledge of the matrix J ij (t), and in this system, J ij (t) also depends on the control variables ω i (t), as shown by (S3.18a). Thus, the whole problem is, in principle, highly coupled, and we cannot assume we have the values of F (k) j when trying to solve (S3.21)! A full numerical technique could iteratively solve the Floquet design problem (S1.6) by proposing values for F (k) i , solving (S3.21) for those proposed values, inserting the resulting value for J ij (t) into (S1.5b), and repeating until converged. Alternatively, it is possible to approximate J ij using an approximate solution to (S3.21) that is independent of F (k) i , and this latter approach will be discussed in Section S4. Finally, turning to the pumping design problem in (S3.15a), the two-band approach says that we want We note that the first term of this equation (proportional to ω slowi (t)) produces fast oscillations in the pumping band, whereas the right-hand side of the equation contains no high-frequency components. Thus, we ignore the first term via the rotating-wave approximation. Then all that remains is to choose ω fasti (t) to cancel out the oscillations of the exponential term, producing a slow bias of the form r(t). One way to obtain this would be to take using ω dci and ω slowi (t) from (S3.21). There is also a remaining oscillation of the phase at frequencies around 4ω dc , but those can also be neglected with a rotating wave approximation. Equations (S3.21) and (S3.24), together with the Floquet design parameters F (k) i , constitute a proposed solution to the bus-coupled JPO control problem for achieving dynamical coupling by designing the instantaneous frequency in the JPOs.

Solving for the JPO control signals
Here we summarize the procedure for obtaining the time-dependent instantaneous frequencies ω i (t), as depicted, for example, in Figure 1 of the main manuscript. The JPO control problem described by (S3.21) and (S3.24) can be explicitly solved as follows.
We assume that for a given problem size N , problem matrix C, bus frequency ω bus , maximum detuning ∆ max , and maximum bus-oscillator coupling g max , we have computed the values for r max = r max χ, δ i = δ i χ, Λ = Λχ, g i = g i χ, ∆ nomi = ∆ nomi χ, and F (k) j using the prescription described in Section S5 A. This makes use of various approximations and analytics that are described in later sections below.
1. The first step is to choose (S3. 25) and solve (S3.21a) for ω dc,i for each i using the quadratic formula.
2. Then, with the solution to ω dc,i , we can solve (S3.21b) for ω slow,i (t), again using the quadratic formula, for each i and every t. 3. Third, we can insert the above solutions for ω dc,i and ω slow,i into (S3.24), and numerically evaluate the integral to find ω fast,i (t) for each i and every t.
4. Finally, the control signal for the ith JPO ω i (t) is, using the two-band approximation (S3. 16), (S3. 26) In Figure 1, we use N = 5, ω bus /2π = 10 GHz, ∆ max /2π = 5 GHz, and g max /2π = 20 MHz. We choose an SK7 problem instance C that results in prominent sidebands (for the purposes of illustration), which happens to be To visualize the signals in Figure 1B (right), we compute the power spectral density of which can be interpreted as the in-phase component of an oscillator with instantaneous frequency ω i (t). Finally, we note that for the value of η prescribed by (S1.24), the modulation depth is strong enough to produce asymmetry in the sidebands, which often occurs for phase-modulated signals; for illustrative purposes, however, we reduced this asymmetry by arbitrarily using η = 0.2/N instead; all other variables follow the usual procedure.

S4. NATIVE COUPLING MATRIX
Here, we discuss the structure of the native coupling matrix J ij . As alluded to in the previous section, the exact form of J ij (t) is unknown until we have chosen the instantaneous frequencies ω i (t); in our dynamical-coupling approach, these are parameterized by the Floquet design parameters F (k) j , which in turn require knowledge of J ij to obtain. One way we can get around this set of coupled dependencies is to constrain ourselves to an approximate form for J ij (t) that is independent of F (k) i . We first address the time dependence of J ij (t). Using (S3.22b), ω slowi (t) can be heuristically bounded by where f i (k) is the first-order small-modulation solution (S1.10) to the design problem with J ij assumed uniform and time-independent, as was done in Section S1 B for example. This bound is a good heuristic provided that the native couplings are designed to be close to uniform, a condition that we will discuss shortly. Since f 1 (k) depends only on the elements C ij (as indicated by (S1.10)), δω bound1 can be readily computed. In fact, if we assume (S1.18) as an approximate form for f (k) 1 on average over random problem instances, then In any case, regardless of how we compute δω bound1 , we see that if we impose that δω bound1 ω bus − ω dc1 , then we can neglect the effects of (S3.22b), upon which (S3.22a) implies that the oscillator detunings are nominally (i.e., approximately) equally spaced and given by where ∆ nom1 := ω bus − ω dc1 = ω bus − ω 0 is the nominal detuning of the first oscillator. Under this approximation, the native coupling matrix can be approximated using the nominal detunings via which is a time-independent expression. Thus, so long as J ij is approximately uniform, we can ensure timeindependence by choosing δω bound1 /∆ nom1 1. We now turn to the question of how to ensure the matrix J ij is uniform, in order to justify (S4.1). The uniformity is primarily dictated by the distribution of the bus-resonator couplings g i . Looking at the form of (S4.5), one reasonable choice of the resonator-bus couplings is where λ J is a scalar chosen to represent the approximate value of the elements in the first diagonal of J ij . (If we succeed in obtaining a small non-uniformity, λ J is also the approximate value of every nondiagonal entry of J.) We show in Figure S8A the approximation to the native coupling matrix (S4.5) using the choice of resonator-bus coupling (S4.6). We observe that the overall matrix is close in magnitude to λ J , but there is some degree of non-uniformity. A good measure for the non-uniformity under this choice of couplings is to look at the ratio between the most off-diagonal element J 1N and λ J , which can be expressed in terms of the nominal detunings ∆ nomi by is the filling ratio, a dimensionless quantity that characterizes how much of the bandwidth between ω bus and ∆ nomN is utilized by the oscillators. In Figure S8B, we plot (S4.7) and see that the non-uniformity scales with d. More concretely, if we allow for a maximum non-uniformity of, say, 10 %, the filling ratio should be at most 0.6. Thus, so long as J ij is approximately time-independent, we can ensure uniformity by choosing d sufficiently small.
Having considered the above, we want to work in a regime in which J ij is both approximately time-independent and uniform. Together with the Schrieffer-Wolff requirements, these considerations can be summarized by a set of constraints involving d. We introduce parameters 1. sw := g 1 /∆ nom1 , which specifies the tolerance for the Schrieffer-Wolff condition g 1 /(ω bus − ω i ) 1 in the uniform, time-independent approximation; 2. ti := δω bound1 /∆ nom1 , which specifies the tolerance for time-independence in the uniform approximation; and 3. uni := (2−d)/(2 √ 1 − d)−1, which specifies the tolerance for uniformity in the time-independent approximation.
If we recast all these parameters in terms of d, we see that in order to satisfy all three constraints, d should satisfy In the above, we only technically need d to be bounded by the right-hand side (i.e., a smaller d still satisfies all three constraints). However, that would result in a smaller nonlinear rate χ (as will be discussed in Section S5 C 3), which in turn implies more stringent requirements on the field decay rate. Thus, we choose d to saturate its bound above. Having chosen d, the nominal detunings are then given by

S5. SCALING CONSIDERATIONS FOR DYNAMICAL COUPLING
Having discussed now our principles for achieving dynamical coupling as well as the key features of the hardware model we would like to demonstrate our principles on, we address in this section how the various parameters, approximations, and constraints come together to specify a concrete system and dynamical-control scheme in a bus-coupled JPO network. First, we summarize our approach in a prescription for choosing dimensionless parameters given any problem instance. Then we analyze how the nonlinear timescale χ should be chosen, which sets the dimensionful realization of a JPO network for consideration. Finally, we analyze how our approach scales with the problem size, particularly in terms of requirements on the field decay rate.

A. Summary and prescription of system parameters
Up to the nonlinear timescale χ, we are able to supply values for the dimensionless system parameters in order to satisfy all the conditions needed to apply our dynamical-coupling scheme to a bus-coupled system of JPOs. Given the desired target problem couplings C ij with problem size N , we use the following prescription throughout this work: 1. Compute the problem-strength parameter λ C := λ C /χ via (S2.7) in Section S2 A: This choice ensures the coupling term in the target Hamiltonian scales appropriately with problem size.
2. Set the maximum pump rate r max := r max /χ to its optimal value r max = 5, as discussed in Section S2 A. This choice is empirically found to work well for the target quantum annealer.
3. Compute the oscillator frame-detunings δ i := δ i /χ via (S2.4) in Section S2 A: This choice ensures that the target Hamiltonian operates correctly as a continous-variable quantum annealer.
4. Compute the dynamical-coupling parameter η via (S1.24) in Section S1 E: This choice ensures that the Floquet design equation can be numerically solved with acceptable error E C 3 %.

5.
Compute the dimensionless native-coupling strength parameter via λ J = λ C /η. This parameter controls the nominal strength of the native couplings J ij and follows from the definition of η.
6. Compute the dimensionless Floquet frequency Λ := Λ/χ via (S1.25) in Section S1 F (specialized here for the JPO system): where A 100; this value of A is empirically found to ensure that the Floquet approximation holds.
7. Compute the dimensionless approximate bound on the slow component of the instantaneous frequency δω bound1 := δω bound1 /χ via (S4.3) in Section S4: (S5.5) This quantity estimates the time dependence of the native coupling using a first-order, uniform approximation for the modulations.

S24
8. Compute the filling ratio d via (S4.9) in Section S4: (S5. 6) where sw , ti , and uni must be small. (In this work, we use sw = ti = uni = 1/10.) This parameter controls the relative magnitudes of the bus-oscillator detunings in order to ensure that the resulting native couplings can be approximated as both uniform and time-independent (in addition to obeying the Schrieffer-Wolff condition).
9. Compute the dimensionless nominal detunings ∆ nomi := ∆ nomi /χ via (S4.10) in Section S4: These nominal detunings approximate the bus-oscillator detunings as evenly spaced after ignoring small shifts due to coupling non-uniformity and time-dependence. 10. Compute the dimensionless bus-resonator coupling strengths g i = g i /χ via (S4.6) in Section S4: This form for the coupling produces a set of native couplings which approximately have the uniform strength λ J , provided that the ∆ nomi and d have been chosen accordingly.
11. Approximate the dimensionless native couplings J ij (t) := J ij (t)/χ by (S4.5) in Section S4: Although the true native couplings J ij (t) are functions of ∆ i (t) and are thus functions of time, we approximate them by this time-independent expression, which is valid provided that ∆ nomi are chosen according to the steps above; in this latter case, this approximation also has the property that J ij ≈ λ J .

Solve for the Floquet design parameters
with the procedure(s) outlined in Section S1. To do so, make the formal replacement λ C /J ij → λ C / J ij in the Floquet integral (S1.5b), where (S5.9) is used to approximate J ij . This latter step is a good approximation provided that d has been chosen accordingly, and it crucially allows us to sidestep the problem of F (k) j being dependent on ∆ i (t), which is in turn dependent on F (k) j and so on. Note that for scalability and sufficiently small error E C 3 % as discussed in Section S1 E, the full-order objective function (S1.13) should be optimized when N < 100, while for N > 100, second-order approach (S1.14) should be used instead.
By following this above prescription for any given problem instance, we arrive at a concrete set of dimensionless parameters (namely, δ i , r max , g i , J ij , and F (k) j , the others being secondary to these) which allow us to construct a busmediated, dynamically-coupled JPO quantum annealing Hamiltonian (2), at least up to an overall timescale χ. This prescription is thus sufficient for performing simulations (e.g., as discussed in Section S6 A) to study the performance of the scheme. Furthermore, after selecting the timescale χ as described below, we also arrive at a concrete hardware realization, complete with control signals (as parameterized by F (k) j )-additionally see Section S3 C for more details on this latter point.
It is worth emphasizing again that this prescription is characterized by our particular approach to obtaining the Floquet design parameters F (k) j : we solve the Floquet design problem using a nominal, time-independent approximation for the native couplings J ij rather than their "true" values J ij (t). To enable this, the other system parameters (e.g., g i and so on) have been chosen in very particular ways (e.g., through d and its constrained values) such that this approximation holds well. As such, for the purposes of running simulations, we take the approximation (S5.9) to hold exactly, and while the design parameters F (k) j resulting from this procedure could, in principle, be further refined by iterated time-dependent optimization, we forgo doing so for both our simulations and our analysis of hardware realizations/scaling.

S25
Finally, we note that the above prescription is technically constructed for a lossless system, in that the field decay rate κ is not considered to affect any of the dimensionless parameters mentioned. This goes hand-in-hand with the fact that the parameter T ramp -which governs how adiabatically the pump is ramped up-also has not been discussed, as it should ideally be set to an arbitrarily high value to ensure adiabaticity. However, when the timescale κ := κ/χ is also brought into consideration, we prescribe that one should pick T ramp := T ramp /χ = 1/ κ (S5. 10) to obtain order unity number of photon emission events per oscillator, in accordance with the observations made in Section S2 A. In addition, when photon loss is encountered, one can also empirically tweak the above-prescribed values r max and λ J (among others) to further enhance success probability. Although we do not include such hyperoptimization here as part of any general procedure, we note that we do incorporate such an approach in some of our quantum simulations (see Section S6 A for more details).

B. Nonlinear (Kerr) rate
Up to this point, all our discussions have been implicitly in units of the nonlinear rate χ. Thus, selecting a value for χ produces an instantiation of all the other parameters in a dimensionful hardware system. Provided a fixed field decay rate, it is clear that we want to maximize the value of χ in order to obtain fast system rates relative to the field decay rate. However, we cannot pick the rate arbitrarily high, as it is limited by two experimental constraints: 1. g i ≤ g max , the maximal bus-resonator coupling achievable, and 2. ∆ N ≤ ∆ max , the maximal detuning possible due to bandwidth constraints.
To treat the first constraint, we use our form for the bus-oscillator couplings (S4.6) to write it as Since ∆ nomi ≤ ∆ nomN by the construction in (S4.10), the oscillator with the largest detuning (i.e., i = N ) has the largest bus-resonator coupling, and it is thus the most relevant oscillator for this constraint. We substitute the expression for ∆ nomN from (S5.7) into the constraint above and express the nonlinear rate in terms of dimensionless variables, namely χ = Λ/ Λ = λ J / λ J . Doing this, we arrive at the constraint To treat the second constraint, we approximate the detunings ∆ i by the nominal (i.e., evenly spaced) detunings ∆ nomi . Then using ∆ N ≈ ∆ nomN in the second constraint and using again (S5.7), we arrive at Thus, to satisfy both constraints (S5.12) and (S5.13) simultaneously, the nonlinear rate should be given by where χ max is the maximum allowable nonlinear rate (absolute-scale) before the physical constraints on the busresonator couplings and the maximal detuning are violated. As previously mentioned, given a fixed field decay rate, it is preferable to set the nonlinear rate to the maximum, i.e., χ = χ max . However, this is not technically necessary; for example, if, in a particular hardware implementation, κ is linked to χ in such a way that decreasing χ yields an improvement in χ/κ, such an approach would be compatible with all the constraints we have considered. As we will see in the following subsection, χ max generally decreases with the problem size N at sufficiently large N . This can be contrasted with the typical scenario for gate-based transmon qubit architectures, where it is desirable to use the maximum nonlinear rate allowed by the hardware (while maintaining acceptable field decay rates), more or less independently of N . Instead, in our dynamical-coupling scheme at sufficiently large N , the nonlinear rate χ is forced to be generally much smaller than the hardware-limited maximum, essentially because of the need to fit N oscillators, all distinct in frequency, into a limited bandwidth while still obeying the Floquet approximation, etc. Thus, at least at large N , the key experimental challenge is to obtain field decay rates low enough to support such a small nonlinear rate (i.e., at most χ max ). In this subsection, we explore quantitatively the implications of the above results for the scaling of the dynamical coupling scheme with problem size (at least for the JPO networks we consider in this work). To do so, we analyze the scaling of the maximum allowable nonlinear rate χ max of the system since this parameter ultimately sets the requirements on the field decay rate. In consideration of (S5.14), we first discuss the scaling of the three dimensionless parameters on which χ max depends: λ J , Λ, and d.
(S5. 15) In particular, because λ J shows up in the denominator of (S5.14), this result implies that SK7 problems have the smallest native coupling strength requirement, while Cubic MAX-CUT problems have the largest. Next, due to the Floquet timescale requirements, Λ = A max(1, δ i , r max , λ J ) by (S5.4). In the case where λ J dominates, λ J and Λ are linearly related, in which case λ J and Λ will have the same functional form. For our prescription, this is the most relevant situation, since r max is constant with N in our prescription, so unless Λ is a constant, it will be limited by λ J at sufficiently large N .
In Figure S9, we show the values of λ J following (S5.15) and Λ following (S5.4) as a function of problem size. From the figure, it turns out λ J ≥ 5 in all cases, and since r max = 5 in our prescription (see also Section S2), λ J limits the value of Λ in all cases. (Note also that δ i does not limit Λ provided we follow (S5.1) and (S5.2).)

Scaling of filling ratio with problem size
In (S5.14), the value of χ increases for larger filling ratio d, which implies that it is important for d to be large to maximize the performance of the dynamical-coupling scheme. In Section S4, we showed that the filling ratio is determined by three different constraints, which is summarized by (S5.6) in our prescription. We can rewrite this as In this form, we see that the uniformity constraint (dictated by d uni ) does not depend on the problem size N . As discussed in Section S4, this uniformity constraint fundamentally limits the bandwidth utilization to about 60% (i.e., d ≤ d uni = 0.6) for a non-uniformity in the native coupling matrix of 10% (i.e., uni = 0.1). On the other hand, both the Schrieffer-Wolff and time-independence conditions vary with N and are both described by a similar functional form, where d increases for larger D sw or D ti . Thus, for sufficiently large D sw and D ti , we have the simple picture that the filling ratio d is limited by the uniformity constraint to give d = d uni . To understand when this is the case, we study D sw and D ti below. The simpler of the two is the Schrieffer-Wolff condition. Assuming, as in the previous subsection, that Λ = A λ J , we can first rewrite Thus, D sw is linearly proportional to the problem size and does not depend on the problem class. For sufficiently large N , this condition ceases to limit d.
Turning to the time-independence constraint, we can substitute the form for δω bound from (S5.5) in our prescription into D ti , upon which we see that D ti depends on N and η. Then using the form for η from (S5.3) in our prescription, we arrive at for SK7 (5/3) ti · (1 + log N ) · N/(N − 1) for Dense MAX-CUT (25/9) ti · (1 + log N ) · N/(N − 1) for Cubic MAX-CUT .
(S5. 18) We can see that D ti increases with the problem size in all cases, so the time-independence constraint also ceases to limit d at sufficiently large N . Moreover, for the SK7 problems, D ti increases as N 1/2 while for the Dense and Cubic MAX-CUT problems, D ti increases more slowly. This implies that for the SK7 problems, d varies more drastically with respect to N than for Dense and Cubic MAX-CUT problems.
In Figure S10, we show the filling ratio d as a function of problem size following the above analysis. For the parameters we consider, it turns out that d is never limited by the Schrieffer-Wolff condition. Rather, it is dominated by the time-independence condition at small N before being clamped by the uniformity condition at approximately 60%, as argued above. Finally, these results also verify that d depends more weakly on N for the MAX-CUT problems.

Scaling of nonlinear rate with problem size
Putting the above results together via (S5.14) produces the maximum allowable nonlinear rate χ max for each desired problem size N . Since the effects of λ J and Λ are to decrease χ max with N , while the effects of d eventually saturates with N , we ultimately expect that χ max decreases with N .
This trend can be quantitatively seen in Figures S11, S12 and S13. In addition to the general trend of decreasing χ max with N , we also see that the particular scaling behavior of χ max with N changes dramatically when the system switches from being limited by the maximum coupling strength g max (where it is more gentle) to being limited by the maximum available bandwidth ∆ max (where it is sharper).
While Figures S11, S12 and S13 use the exact formula for χ max given by (S5.14), we can also qualitatively understand much of the scaling behavior using the arguments already presented. First, based on what we found for the scaling of d, we can make the simplifying assumption that d is constant (this holds at large N , and even when d increases at small N , it is a slow function). Then we can insert the expressions for λ J from (S5.15) along with Λ ∝ λ J as argued above. In doing so, we find that when χ max is constrained by g max , we have (with N − 1 ≈ N ) On the other hand, when the system is constrained by ∆ max , we can apply the same simplifying assumptions to get Thus, at large N , the bandwidth constraint due to ∆ max is the one that limits the scaling of χ max , while the milder coupling constraint due to g max is potentially relevant for intermediate values of N . The latter result is especially interesting for near-term demonstrations of dynamical coupling as (S5.19) implies that the scaling requirements for χ max (and thus ultimately of the field decay rate) are less stringent than the 1/N scaling that one might naively expect, and which indeed dominates at large N . Finally, we note that both (S5. 19) and (S5.20) show that the scaling requirements are more favorable for the SK7 problem class. This observation is supported quantitatively by Figures S11, S12 and S13. This advantage stems from the much smaller Floquet timescale requirement for Λ for SK7 problems, as discussed in Section S1 F.  In this section, we specify and review the mathematical framework we use to model the quantum states and dynamics of the KPO system, under both static coupling and dynamical coupling. We describe the quantum simulations we perform and the procedure we use to interpret the resulting state trajectories as success probabilities. We also describe the classical EOMs we use to obtain intuition about how the observations made in the quantum simulations scale for larger problem sizes.

A. Quantum Simulations
We perform quantum simulations in this work to analyze the performance of the dynamical-coupling scheme relative to the statically-coupled system, especially in the presence of dissipation. Our dissipation model is described by the standard Lindblad master equatioṅ whereĤ is the Hamiltonian of the system, andL m are the Lindblad operators that describe the effect of dissipation due to coupling to the environment; the time dependence of all operators have been omitted for brevity. For the statically-coupled scheme, the Hamiltonian is given by (S2.1) while the Lindblad operators are given by (S2.5). For the dynamical-coupling scheme, the Hamiltonian is given by (3) in the main manuscript, while the Lindblad operators, due to being in the rotating frame, are given byL However, by direct substitution of this Lindblad operator into (S6.1), we see that the overall phase is always canceled with its complex conjugate; this implies that the overall phase in the Lindblad operator does not have any observable consequences for the internal state of the open quantum system. Thus, without loss of generality, we take the Lindblad operator of the dynamical-coupling scheme to beL m = √ 2κâ m , as for the statically-coupled system. Directly simulating (S6.1) is difficult due to the large Hilbert-space dimension of the quantum state. To circumvent the memory requirement of storing a full density matrix, we utilize the Monte-Carlo wavefunction method (i.e., quantum jump method) to approximately solve (S6.1) via stochastic sampling [53]. Given a finite number of trajectories N traj , this method approximates the density matrixρ(t) using the ensemble mean of the trajectories |ψ ( ) (t) viâ It should be noted that for the idealized lossless case of κ = 0, the number of trajectories should be taken to be N traj = 1 as there is no stochasticity due to the lack of dissipation to the environment; this is equivalent to solving the Schrödinger equation. Given a spin configuration σ = (σ 1 , . . . , σ N ) where σ i = ±1, we can define, for the th trajectory, the probability of the quantum state |ψ ( ) (t) being in the specific spin configuration σ as [16] is the wavefunction in the position basis. (Note: We use the conventionx i := (â i +â † i )/2.) For our numerical approach to computing (S6.3), see Section S6 A 1 below. This definition of the success probability is motivated by the fact that the final state of the KPO system is approximately a superposition of coherent states |±α i for each KPO, and we interpret these coherent states to encode the Ising spins σ i = ±1. In practice, we obtain the result of the computation by an in-phase homodyne measurement ofx, and the sign of the result on the ith KPO is interpreted as σ i . Under this definition of the spin configuration probability, the vacuum state at the beginning of the computation has equal probability on all possible spin configurations, and as the computation progresses, the displacements of the KPOs evolve-in a correlated way-to remove the ambiguity, pushing the probability distribution into ideally the ground-state quadrants of phase space.
Using (S6.3), we define the success probability of the th trajectory as where S gs is set of spin configurations that are ground states of the specified Ising problem. These trajectory-wise definitions for the configuration and success probabilities are distinct from the ensembleaveraged configuration and success probabilities which would be computed from (S6.1). We estimate these ensembleaveraged quantities, P σ and P success , respectively, by taking the sample mean of the trajectory-wise quantities: Of course, since N traj is finite, there is statistical sampling uncertainty on this estimation, which we henceforth take to be the standard error of the mean. We show the standard error of the mean as a colored fill around the mean on Figures S16-S19,S22-S25, and S28-S31.
Our quantum simulation procedure is as follows. Given a problem matrix C ij , we chose the parameters r max , T ramp , λ C to maximize the success probabilities of the static annealer, summarized in Table S6 A for the simulations in this work. Then we use Steps 3-12 of the prescription outlined in Section S5 A to calculate the system parameters δ i , J, and Λ and F (k) j . Since we take the unit of time to be 1/χ, the simulations are independent of dimensionful constraints associated with particular physical realizations (i.e., due to g max and ∆ max ). Having obtained these dimensionless system parameters, we simulate both the statically-and dynamically-coupled systems using the Monte-Carlo wavefunction method with the QuTiP library (version 4.3.1) in Python [57]. To reliably compare the stochastic results from the statically-and dynamically-coupled systems, we use the same noise process (i.e., random number generator) to ensure that both systems physically experience the same instantiation of quantum noise.
As is the standard approach, we run our simulations with various truncation settings for the Fock basis of each mode, and we choose a setting which does not result in significant changes in the results upon increase. (Specifically, for all the quantum simulations in this work, the truncation was set to 12 basis states per mode, corresponding to 12 4 -dimensional states for N = 4 simulations.) As is also standard, we decrease the time step taken in our solver until no significant changes in the result occur upon further decreasing the step size.
This approach was taken to generate Figure 2 in the main manuscript. For the simulations with no cavity photon loss (κ = 0), N traj = 1; while for the low-loss (κ = 0.01χ) and high-loss (κ = 0.1χ) cases, N traj = 10. For the 2-mode wavefunctions shown in Figure 2A, we use the approach described below in Section S6 A 1. The success probability used in the histogram is defined as the success probability at the end of the computation P success := P success (T ). Note that even though the histogram in Figure 2D does not show the uncertainty of the success probabilities, these uncertainties can be obtained by examining the trajectory plots of each problem instances in Figures S16-S19.

Continuous-variable representations of state
One simple way to visualize the correlations among the oscillators is to look at the quantum state in a continuousvariable basis. For example, in position space, the population (i.e., diagonal) elements of the density operatorρ can be represented as a function over x ∈ R N as p x (x) := x|ρ |x . (For pure statesρ = |ψ ψ|, p x (x) = | x|ψ | 2 .) Of course, to obtain non-classical information about the state, we also need to look at a conjugate variable; for example, we can also consider the momentum-basis population p p (p), computed in the same way but with |x → |p .
In practice, sinceρ is represented (e.g., in QuTiP) in the Fock basis, a basis transformation is needed to compute such matrix elements. The state |x can be obtained in the Fock basis via a standard basis transformation: which can be readily computed since is the position-basis wavefunction of the nth eigenstate of the quantum harmonic oscillator (i.e., the nth Fock state).
Similarly, if we are interested in p|n , we can perform the same calculation using the Fock states in the momentum basis instead, using φ n (p). Explicitly, one can in practice (e.g., in QuTiP) compute p x (x 1 , . . . , x N ) by evaluating the "matrix element" of the "density matrix" P n n := n 1 , . . . , n N |ρ |n 1 , . . . , n N against the "state vector" c n (x) := n 1 , . . . , n N |x 1 , . . . , x N . Furthermore, we note that, in practice, the summations (e.g., in (S6.7)) are limited by truncation in the Fock basis, with upper limits M n such that the support of n 1 , . . . , n N |ρ |n 1 , . . . , n N is negligible for n i > M i .

Evaluation of success probabilities
A similar consideration is needed to evaluate the success probabilities defined in (S6.3) when the quantum state is represented in the Fock basis. To do so, we note that (focusing on the case of N = 1 so that x ∈ R) Explicitly, one can compute this in practice (e.g. in QuTiP) as an "expectation value" of the "operator matrix" P xmax xmin n n := n |Π xmax xmin |n on the "state vector" c n := n|ψ . For N > 1, the system composition rule of quantum mechanics implies that (S6.12) where ⊗ indicates the Kronecker product of matrices. Again, as indicated by the braced quantities above the expression, this quantity can be evaluated in practice (e.g., in QuTiP) as an expectation value of an operator matrix on a state vector (both over an N -mode Fock space).
In practice, there are two more issues we need to address: discretization and truncation in position basis. (Truncation in the Fock basis has already been discussed above and simply consists of restricting the upper limit of all the summations.) In order to evaluate the integral in (S6.10), we can discretize the interval x min < x < x max into L sections, and we approximate the matrix element by Note that if our Fock basis is truncated to M basis elements, (S6.13a) is simply a multiplication of an M × L matrix with an L × M matrix (the latter representing the change of basis from truncated-Fock to discretized-position, and the former vice-versa). Finally, the position basis is also truncated in practice, so to represent the improper integral R + |ψ(x)| 2 dx, we need to pick x min = 0 and x max sufficiently large as to cover the non-negligible support of ψ(x > 0), while to represent R − |ψ(x)| 2 dx, we pick x min sufficiently negative as to cover the non-negligible support of ψ(x < 0) and x max = 0.

B. Classical Simulations
We also perform classical simulations to analyze how the performance of the dynamical-coupling scheme scales with respect to problem size N . The idea here is to derive equations of motion (EOMs) with system dimension linear in N , in order to make numerical analysis tractable. We start with the quantum Heisenberg-Langevin EOMs describing the quantum system and then perform the formal substitutionâ i → α i , where α i ∈ C; the noise terms are then neglected. Although these classicalized EOMs are not a physically correct model for the KPO network in the quantum regime, such an approach has been used in, e.g., Ref. [17] to understand the operational mechanism of such quantum devices, and models of this kind have been shown to be effective heuristic Ising solvers in and of themselves [58]. We utilize the same approach to help inform the scaling of the Floquet requirements at larger problem sizes (of the quantum model).
For the statically-coupled system, this classicalization procedure produces the EOMs Using the same procedure, the dynamically-coupled system has the following EOMs: For all the classical simulations, we use κ = 0. From empirical investigations, the value of κ does not significantly affect the qualitative distribution of the success probabilities in this classicalized model. Since α i = 0 is an unstable equilibrium, we follow the approach of Reference [16] and sample random initial-field amplitudes α i (t = 0), according to where z i are iid random variables with the standard normal distribution while u i are iid random variables with the uniform distribution on the interval [0, 1]. For all classical simulations, we choose n 0 = 10 −2 .
As with the quantum simulations, we simulate the classical EOMs on a finite number of trajectories N traj . The th trajectory α ( ) (t) is an N -dimensional vector at each time point t, and each corresponds with a random sample from (S6.17) for the initial condition of the trajectory. Note that in contrast to the quantum case, we always need to take N traj > 1 in all cases, even though κ = 0.
In this approach, we identify the spin configuration of a trajectory as a function of time using the relation σ ( ) The success of a given trajectory at time t is then given by the indicator variable where S gs is the set of ground-state spin configurations. Then, as in the quantum case, we define the success probability of the ensemble of trajectories to be the sample mean Again, there is a statistical sampling uncertainty associated with the sample mean, which we take to be the usual standard error on the mean. We show the standard error of the mean as a colored fill around the mean on Figure 3A.
Our classical simulation procedure is as follows. Given a problem matrix C, we use step 1-12 in the prescription outlined in S5 A to calculate the system parameters r max , δ i , λ C , J, T and Λ and F (k) i . Since we take the unit of time to be 1/χ, the simulations are independent of dimensionful constraints associated with particular physical realizations (i.e., due to g max and ∆ max ). Finally, for simplicity, we useκ = 0 for these simulations.
For each trajectory index, both the statically-and dynamically-coupled systems are simulated with the same (random) initial field amplitude. We then simulate both the statically-and dynamically-coupled systems using the ODE solver library DifferentialEquations.jl (version 4.5.0) in Julia [59]. As is the standard approach, we decrease the time step taken in our solver until no significant changes in the results occur upon further decrease of the step size. Note that since the timescale of the dynamics (i.e., 1/Λ) become smaller with increasing N , the time step required for convergence decreases as ∼ 1/N .
In Figure 3A, we simulate the classical EOMs (S6.15) and (S6.16) with N traj = 5000. This large number of trajectory was chosen to ensure that the error on the sample mean is negligible small. In Figure 3B, we show one of the simulations in the ensemble used to generate Figure 3A (i.e., showing a specific value of k). In order to show more fine-grained information, we plotted the in-phase quadrature Re α  i (t) (which is just the sign of the former). To generate Figure 3C, the classical EOMs are simulated for problem sizes N up to N = 50. Here, we use for each problem instance N traj = 100 trajectories; this was done as the larger problem sizes required intensive computation time (on the order of one day of computation per trajectory for N = 50). As in the quantum simulations, the success probability used in the histogram is defined as the success probability at the end of the computation P success = P success (T ).

S7. ISING PROBLEM CLASSES AND PROBLEM INSTANCE GENERATION
In this work, we consider instances drawn from three specific classes of Ising problems, characterized by the statistical distribution of the C ij couplings specifying the Ising problem. Note that we choose the convention that the C ij couplings are normalized such that max ij |C ij | = 1.
• Dense MAX-CUT graphs: Every upper-triangular element of C is independently chosen with 50 % probability to be either 0 or 1.
• Cubic MAX-CUT graphs: We sample uniformly from the set of 3-regular graphs, where every vertex has degree 3. These graphs were generated with the LightGraphs.jl package [60] in Julia).

S8. ADDITIONAL DATA FIGURES
In this section, we provide additional figures which depict in greater detail the data generated from the quantum and classical simulations performed in this work. The more comprehensive statistics provided by these figures support the conclusion made in the main manuscript that the dynamical-coupling scheme results in time-evolution of success probabilities in a way which closely matches that of the corresponding statically-coupled system. They also provide a check that the figures presented in the main manuscript are statistically representative of the entire data set generated from our simulations.
In Figures S14 through S19, we compare the success probability P success (t) of the statically-and dynamicallycoupled systems for each of the 100 problem instances we used in our quantum simulations. These figures investigate, respectively, the zero-loss, low-loss, and high-loss cases, indicating that, in most cases, the dynamical-coupling scheme produces the desired evolution, at least in distribution. Corresponding to each of the plots, Figures S20 through S25 show, for the statically-coupled system, the configuration probabilities P σ (t) for each of the spin configurations σ, while Figures S26 through S31 show the corresponding configuration probabilities for the dynamically-coupled system. Again, we see that the trajectories match well, at least in distribution.
Finally, in Figure S32, we show the same correlation/histogram plots used in the bottom of Figure 3C of the main manuscript, for all the problem sizes that we have numerically simulated. Note that the projected histograms correspond to the histogram shown in Figure 3B of the main manuscript. The data shows good correspondence in the success probability between the dynamically-and statically-coupled systems for all problem sizes considered. Configuration probabilities of statically-coupled system, lossless case (part 1/2). Evolution of the probabilities Pσ(t) of obtaining each of the possible spin configurations in the statically-coupled system as a function of the evolution time, for the problem instances depicted inset in the corresponding sub-axis (see upper-left corner) of Figure S14. The configuration probability is defined according to S6.3 in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Configuration probabilities of statically-coupled system, lossless case (part 2/2). Evolution of the probabilities Pσ(t) of obtaining each of the possible spin configurations in the statically-coupled system as a function of the evolution time, for the problem instances depicted inset in the corresponding sub-axis (see upper-left corner) of Figure S15. The configuration probability is defined according to S6.3 in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology.  Figure S14. The configuration probability is defined according to S6.3 in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A.  Figure S15. The configuration probability is defined according to S6.3 in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A. . Configuration probabilities of statically-coupled system, high-loss case (part 1/2). Evolution of the probabilities (Pσ(t)) of obtaining each of the possible spin configurations in the statically-coupled system as a function of the evolution time, for the problem instances depicted inset in the corresponding sub-axis (see upper-left corner) of Figure S14. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A. . Configuration probabilities of statically-coupled system, high-loss case (part 2/2). Evolution of the probabilities (Pσ(t)) of obtaining each of the possible spin configurations in the statically-coupled system as a function of the evolution time, for the problem instances depicted inset in the corresponding sub-axis (see upper-left corner) of Figure S15. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A.  Figure S14. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology.  Figure S15. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology.  Figure S14. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A.  Figure S15. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A.  Figure S14. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A.  Figure S15. The configuration probability is defined according to (S6.3) in Section S6 A. Note that due to the symmetries of the system, the configuration probability is exactly the same upon flipping every spin, so we take the convention that each curve represents the sum of the probabilities for the two degenerate configurations. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 A for additional details of our quantum simulation methodology. Due to the stochastic nature of the simulation, Ntraj = 10 trajectories are run for each problem instance; the solid line indicates the mean over the trajectories, while the fill indicates the standard error of the mean, as described in Section S6 A. For various problem sizes N , we show the correlation matrix of success probabilities Psuccess (evaluated at the end of the trajectory) between the statically-coupled and dynamically-coupled systems, using the classical EOMs for each respective system. Projected to the sides in blue and red are the corresponding histograms for the dynamically-coupled and staticallycoupled systems, respectively. The system parameters are chosen according to the prescription described in Section S5 A for each Ising problem instance. See Section S6 B for additional details on our classical simulation methodology.