Integrating quantum processor device and control optimization in a gradient-based framework

In a quantum processor, the device design and the external controls together determine the quality of the target quantum operations. As we continuously seek better alternative qubit platforms, we explore an increasingly large device and control design space, and optimization becomes more and more challenging. In this work, we demonstrate that the figure of merit reflecting a design goal can be made differentiable with respect to the device and control parameters. In addition, we can compute the gradient of the design objective efficiently, in a manner similar to the back-propagation algorithm, and then use the gradient to optimize the device and control parameters jointly and efficiently. Our work therefore extends the scope of quantum optimal control to device design and provides an efficient optimization method. We also demonstrate the viability of gradient-based joint optimization over the device and control parameters through a few examples based on superconducting qubits.

Quantum mechanics provides an entirely new way to think about information processing. Information is stored in quantum systems, and the dynamics are described by the Schrödinger equation or the master equation. To achieve desired operations, we need to design systems with suitable Hamiltonians and dissipation as well as time-dependent external controls. Concrete examples of such quantum information processing tasks include quantum computing, quantum simulation, quantum error correction, and quantum metrology. At this level of abstraction, it is clear that we are dealing with optimization problems that consist of both Hamiltonian design and control. Traditionally, the optimization of the design [2,3] and the optimization of the control [4,5] have been studied separately. In this work, we treat them as equal components of a single optimization problem and use gradient information to speed up the optimization.
We consider superconducting circuits, which are one of the most promising hardware platforms due to their device design versatility and fabrication scalability. While the design versatility allows for many qubit and coupling types [6-15], it also makes the quest to find the best design harder due to the larger parameter space. As mentioned earlier, the optimization of the device design is further complicated by the fact that we need to consider the control schemes simultaneously, which has been noted in previous work [16,17]. For example, complex control schemes are needed for bosonic code qubits, and numerical optimization is often required to obtain the best gate performances [18-20]. In scenarios where we cannot estimate the control performance with analytical formulas, the optimization of the qubit design inevitably becomes a joint optimization problem of design and control with an even larger parameter space. A common strategy for handling large optimization problems is to use efficiently computed gradients to help us traverse the optimization landscape. One milestone in this direction is the GRadient Ascent Pulse Engineering (GRAPE) algorithm [4]. In this work, we show that the figure of merit of a target goal, such as the gate fidelity, can be made differentiable with respect to the device and control parameters together. The gradients can be computed in a single computation, similar to the GRAPE algorithm and the back-propagation algorithm [1]. More precisely, the ratio of the time needed to compute the figure of merit and its gradient is independent of the number of control parameters. This makes our method feasible not only for optimizing complex pulse shapes but also for optimizing large processor device designs, and, more interestingly, for jointly optimizing the design and control.

Control pulses
Integrating design and control aspects. In this work, we mainly focus on the two stages of quantum processor design that involve finding optimal values of the device parameters in the system Hamiltonian and deriving optimal control pulses to maximize the performance of the target quantum processor. When the device parameters are determined and we only need to optimize over the control parameters for better performance or robustness, this is precisely what quantum optimal control and robust control do. In contrast, we may sometimes need to optimize the spectral or other properties of the device Hamiltonian itself, e.g., for determining tunable ZZ-interaction on-off ratios [21,22] or for designing 4-local interaction couplers [2]. We now introduce our framework, which unifies the above two scenarios by optimizing a system's performance over the control parameters and the device parameters jointly, as illustrated in Fig. 1. The Hamiltonian of a quantum system with control fields takes the following form:

H(t) = H_dev(h) + Σ_k f_k(c, t) C_k(h), (1)

where H_dev, C_k, and f_k are the device Hamiltonian, control operators, and control fields, respectively, and h and c denote the device and control parameters, respectively. The goal is to optimize h and c for certain objectives, such as implementing a specific unitary operator within a given time τ. This extended setup formulates a device-control co-design problem. In Appendix A, we explain why we expect that optimizing h and c simultaneously will be more efficient than artificially breaking the optimization problem into two and optimizing h and c alternately. Compared to the optimal control problem where h is fixed, the extended setup opens more possibilities. For example, we can optimize the device parameters h for a gate implementation that is robust against control fluctuations and deviations.
We will later showcase this by using our method to obtain a more robust iSWAP gate between capacitively coupled fluxonium qubits.In comparison, with conventional quantum control, we only optimize over control parameters for robust schemes.
Efficient gradient computation for both device and control parameters. Here, we discuss an efficient method for computing the gradient with respect to both the device and control parameters. More explanation of why we consider this method efficient can be found in Appendix B. For a superconducting processor using qubits as its basic components, the total Hilbert space can be decomposed correspondingly as H = ⊗_i H_i, where each H_i is typically an infinite-dimensional Hilbert space. The time-independent Hamiltonian H_dev in Eqn. 1 can typically be written as

H_dev(h) = Σ_i H_i(h) + Σ_⟨i,j⟩ S_i(h) ⊗ S_j(h), (2)

where H_i and S_i are the single-component Hamiltonians and coupling operators defined on H_i, respectively. The concrete form of S_i depends on the type of coupling between qubits. For notational simplicity, we assume that there is a single type of coupling, but our computation method naturally generalizes to the situation where there are multiple types. We will also assume that there is a single control operator C_i(h) (see Eqn. 1) for each subsystem H_i.
Even though each component H_i has infinite dimensions, typically only the lowest few energy eigenstates participate in the computation. This is indeed the case for the examples presented in this work. Therefore, an efficient method for performing numerical simulations is to first truncate the total Hilbert space to the relevant subspace ⊗_i H̃_i, where H̃_i is the low-energy subspace under the above assumption. This can be done by diagonalizing H_i(h). We can then use the operators obtained by projecting H_i, S_i, and C_i into the subspace H̃_i as an approximation and obtain an efficient finite-dimensional Hamiltonian for numerical computation. Both the Hamiltonian diagonalization and the Hilbert space truncation can be made differentiable; see Appendix B for technical details. To the best of our knowledge, this is the first attempt to extend the GRAPE algorithm [4] to superconducting circuit device parameters.
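The truncation step just described can be sketched in a few lines. Below is a minimal numpy illustration, with a made-up oscillator-like subsystem standing in for a real circuit Hamiltonian:

```python
import numpy as np

def truncated_ops(H_i, ops, n_keep):
    """Diagonalize a subsystem Hamiltonian and project operators onto
    the n_keep lowest-energy eigenstates (the low-energy subspace)."""
    evals, evecs = np.linalg.eigh(H_i)          # ascending eigenvalues
    V = evecs[:, :n_keep]                       # low-energy eigenbasis
    H_trunc = np.diag(evals[:n_keep])           # diagonal in that basis
    ops_trunc = [V.conj().T @ A @ V for A in ops]
    return H_trunc, ops_trunc

# toy subsystem: a 20-level oscillator-like Hamiltonian, keep 3 levels
d = 20
a = np.diag(np.sqrt(np.arange(1.0, d)), k=1)    # lowering operator
H = a.conj().T @ a                              # number operator
H_t, (x_t,) = truncated_ops(H, [a + a.conj().T], n_keep=3)
```

Because the truncation is built from `eigh` and matrix products, an automatic differentiation framework can propagate gradients through it.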
The evolved unitary operator at time τ can then be derived by solving the ordinary differential equation (ODE)

dU(t)/dt = −i H(h, c, t) U(t), U(0) = I. (3)

After we compute U(τ), we can use objective functions such as the average gate fidelity [23] to quantify how close U(τ) is to the desired operator U_target. In general, we need to add more terms to the objective functions to ensure the devices are practical. For now, we assume that it is easy to perform reverse-mode differentiation through the computation steps leading from U(τ) to the objective function O(h, c). To make the whole design workflow differentiable, we still need to perform reverse-mode differentiation through the ODE solver. The adjoint sensitivity method [24] makes this possible. This method computes gradients by solving a second, augmented ODE backwards in time, with a cost independent of the number of parameters. With the gradients computed in reverse mode, we can now optimize over the device and control parameters jointly and efficiently for an objective O(h, c).
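As a minimal sketch of this workflow (not the paper's implementation): the JAX snippet below propagates a piecewise-constant Hamiltonian with an eigh-based matrix exponential, so every step is differentiable, and then obtains gradients of an infidelity with respect to a device parameter h and a control parameter c in one reverse pass. The single-qubit Hamiltonian h Z + f(t; c) X and the sine pulse are invented for illustration.

```python
import jax
import jax.numpy as jnp

def propagator(h, c, n_steps=100, tau=1.0):
    """U(tau) for H(t) = h Z + f(t; c) X via piecewise-constant steps,
    using an eigh-based matrix exponential so each step is differentiable."""
    X = jnp.array([[0.0, 1.0], [1.0, 0.0]], dtype=jnp.complex64)
    Z = jnp.array([[1.0, 0.0], [0.0, -1.0]], dtype=jnp.complex64)
    dt = tau / n_steps
    ts = (jnp.arange(n_steps) + 0.5) * dt
    fs = c[0] * jnp.sin(jnp.pi * ts / tau)      # toy one-parameter pulse

    def step(U, f):
        H = h * Z + f * X
        w, V = jnp.linalg.eigh(H)               # differentiable diagonalization
        expH = (V * jnp.exp(-1j * w * dt)) @ V.conj().T
        return expH @ U, None

    U, _ = jax.lax.scan(step, jnp.eye(2, dtype=jnp.complex64), fs)
    return U

def infidelity(params):
    """1 - average gate fidelity to a target X gate."""
    U = propagator(params["h"], params["c"])
    U_target = jnp.array([[0.0, 1.0], [1.0, 0.0]], dtype=jnp.complex64)
    d = 2
    tr = jnp.trace(U_target.conj().T @ U)
    return 1.0 - (jnp.abs(tr) ** 2 + d) / (d * (d + 1))

params = {"h": 0.1, "c": jnp.array([3.0])}
grads = jax.grad(infidelity)(params)   # device and control gradients together
```

The cost of the backward pass does not grow with the number of entries in `params`, which is the property exploited in the main text.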
The proposed framework may also be used to find solutions that are robust against control noise. Robust optimization over control parameters has been studied extensively [25-27]. Here, we take a sample-based approach [25,26] and define

O_avg(h, c) = (1/N) Σ_{i=1}^{N} O(h, c + ∆c_i), (4)

where O(h, c) is the original design goal, e.g., the fidelity or another figure of merit, N is the number of samples, and ∆c_i is the deviation of c caused by noise in the i-th sample. Optimizing O_avg will make the parameters (h, c) robust against the selected noise. Since we can compute the gradient of O(h, c), we can compute the gradient of O_avg(h, c) by summing over the gradients of O(h, c + ∆c_i). In this way, we can also use the gradient information to speed up the robustness optimization.
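Eqn. 4 and its gradient are straightforward to express with automatic differentiation. A toy sketch (the quadratic objective O below is made up for illustration):

```python
import jax
import jax.numpy as jnp

def robust_objective(objective, h, c, deltas):
    """Sample-based robust objective of Eqn. 4:
    O_avg(h, c) = (1/N) sum_i O(h, c + delta_i)."""
    return jnp.mean(jax.vmap(lambda d: objective(h, c + d))(deltas))

# toy design goal, quadratic in the control parameters
def O(h, c):
    return h * jnp.sum((c - 1.0) ** 2)

h = 2.0
c = jnp.array([0.5, 0.5])
deltas = jnp.array([[0.1, 0.0], [-0.1, 0.0]])   # +/- amplitude-noise samples

g = jax.grad(robust_objective, argnums=2)(O, h, c, deltas)
```

For this quadratic toy, g equals the average of the per-sample gradients, here (−2, −2), matching the statement that the gradient of O_avg is the (averaged) sum of the gradients of O.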
As mentioned above, there are scenarios where we want to optimize the properties of H_dev(h) alone. In Appendix F, we show how we can choose an objective function for a large ZZ-interaction on-off ratio between two transmon qubits and how to compute its gradient.
iSWAP gate with fluxonium qubits. We will demonstrate our method on an iSWAP gate between two capacitively coupled fluxonium qubits, the circuit diagram of which is depicted in Fig. 2(a). The Hamiltonian of the system is

H = H_f,1 + H_f,2 + J_C n_1 n_2, (5)

where J_C is the coupling strength between the two qubits, and H_f,i is the Hamiltonian of the i-th fluxonium qubit [13], defined as follows:

H_f,i = 4 E_C,i n_i^2 + (1/2) E_L,i ϕ_i^2 − E_J,i cos(ϕ_i − ϕ_ext,i), (6)

where E_C, E_J, and E_L are the charging energy, the Josephson energy, and the inductive energy, respectively, ϕ_i is the phase operator, and n_i is the conjugate charge operator. We set the external flux of the second qubit to be a time-dependent control field initially at π, while for the first qubit the external flux is fixed at π; that is, ϕ_ext,1 = π and ϕ_ext,2(0) = π. The waveform of ϕ_ext,2(t) is chosen to be a trapezoid pulse, as shown in Fig. 2(b):

ϕ_ext,2(t) = π + (ϕ_p / t_ramp) [ReLU(t) − ReLU(t − t_ramp) − ReLU(t − t_ramp − t_plateau) + ReLU(t − 2 t_ramp − t_plateau)], (7)

where ϕ_p and t_plateau are the external flux bias and holding time of the plateau of the waveform, t_ramp is the time for the rising and falling edges, and ReLU(x) = max(x, 0) is a commonly used activation function in neural networks. We call E_C, E_J, E_L, and J_C the device parameters, and the parameters in ϕ_ext,2(t) the control parameters [28]. These are all the parameters involved in the optimization. Some details of the construction of the finite-dimensional Hamiltonian follow. We first write the Hamiltonian of a single fluxonium qubit in the discretized phase basis over a finite range of phase values. To obtain sufficiently accurate low-energy eigenstates for the optimization task, we select the range ϕ ∈ [−5π, 5π] and the number of basis points n_basis = 400.
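A minimal numpy sketch of this construction, assuming a simple 3-point finite-difference stencil for n^2 = −d^2/dϕ^2 (the actual discretization used in the paper may differ), is:

```python
import numpy as np

def fluxonium_levels(EC, EJ, EL, phi_ext, n_basis=400, phi_max=5 * np.pi, n_levels=4):
    """Lowest eigenenergies of H = 4 E_C n^2 + (1/2) E_L phi^2
    - E_J cos(phi - phi_ext) in a discretized phase basis, where
    n = -i d/dphi is approximated with a 3-point stencil."""
    phi = np.linspace(-phi_max, phi_max, n_basis)
    d = phi[1] - phi[0]
    lap = (np.diag(np.ones(n_basis - 1), 1) + np.diag(np.ones(n_basis - 1), -1)
           - 2.0 * np.eye(n_basis)) / d ** 2          # d^2/dphi^2
    H = -4.0 * EC * lap + np.diag(0.5 * EL * phi ** 2 - EJ * np.cos(phi - phi_ext))
    return np.sort(np.linalg.eigvalsh(H))[:n_levels]

# representative parameters (in GHz; made up for the example), sweet-spot bias
levels = fluxonium_levels(EC=1.0, EJ=4.0, EL=1.0, phi_ext=np.pi)
f01 = levels[1] - levels[0]   # 0 <-> 1 transition frequency
```

Since the Hamiltonian matrix is an explicit function of (E_C, E_J, E_L, ϕ_ext), the same construction can be written in JAX and differentiated through the diagonalization.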
We select the average fidelity as the objective function. The leakage is small for the iSWAP gate scheme, and the optimized results have a leakage smaller than 10^-4. Therefore, it is a good approximation to compute the fidelity based on the truncated 4 × 4 unitary matrix. However, single qubit phases are usually accumulated during the implementation of the gate, and we can compensate for these phases at very low cost [29]. Therefore, we use a modified average fidelity F_m(U) with phase compensation included (see Appendix D). We include other terms in the total objective function. One is a penalty P_decoh, which roughly estimates the infidelity caused by T_1 and T_2 decoherence. Another is P_fDiff, which discourages the 0 ↔ 1 frequencies of the two fluxonium qubits from being too close. This is useful for having addressable single qubit gates, and it mitigates ZZ-crosstalk when both fluxonium qubits are at their sweet spots. The last one is P_fm, which forces the fluxonium parameters to be realistic for fabrication. The exact forms of these terms are given in Appendix E.
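The modified fidelity F_m with phase compensation (detailed in Appendix D) can be sketched as follows. The coarse grid search over the two compensation phases is a brute-force stand-in for the arg-based formula of Appendix D, and the iSWAP example with its error phases is invented for illustration:

```python
import numpy as np

def avg_gate_fidelity(U, U_target):
    """F = (|Tr(U_target^dag U)|^2 + d) / (d (d + 1))."""
    d = U.shape[0]
    tr = np.trace(U_target.conj().T @ U)
    return (abs(tr) ** 2 + d) / (d * (d + 1))

def compensated_fidelity(U, U_target, n_grid=200):
    """Maximize the fidelity over two single-qubit Z phases, applied as
    the diagonal D = diag(1, e^{i th2}, e^{i th1}, e^{i (th1 + th2)})."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    best = 0.0
    for th1 in thetas:
        for th2 in thetas:
            D = np.diag(np.exp(1j * np.array([0.0, th2, th1, th1 + th2])))
            best = max(best, avg_gate_fidelity(D @ U, U_target))
    return best

# iSWAP with spurious single-qubit phases accumulated during the gate
U_iswap = np.array([[1, 0, 0, 0],
                    [0, 0, 1j, 0],
                    [0, 1j, 0, 0],
                    [0, 0, 0, 1]], dtype=complex)
P_err = np.diag(np.exp(-1j * np.array([0.0, 0.7, 0.3, 1.0])))
U_sim = P_err @ U_iswap

F_raw = avg_gate_fidelity(U_sim, U_iswap)
F_comp = compensated_fidelity(U_sim, U_iswap)
```

Here F_comp recovers a near-unity fidelity even though the raw fidelity F_raw is substantially lower, which is why F_m is the appropriate figure of merit when phase compensation is cheap.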
In total, we set the objective function for minimization to be

O = 1 − F_m(U) + P_decoh + P_fDiff + P_fm. (8)

Another desired feature of the iSWAP gate scheme is robustness against control errors. Here, we demonstrate how we can improve the robustness of the gate fidelity F_m(U) with respect to the amplitude parameter ϕ_p in Eqn. 7. To do this, we set ∆c_1,2 = (±∆ϕ_p, 0, . . ., 0), i.e., all the control parameters in ∆c_1,2 other than ϕ_p are set to 0. Then, we can construct the robustness objective function O_avg from O according to Eqn. 4. We use the Adam optimizer [30] to minimize the objective functions O and O_avg. The initial conditions and optimizer hyperparameters are given in Appendix E. With this example, we show that joint gradient optimization is viable for a practical problem. Because both capacitively coupled fluxonium qubits and the control pulse of Eqn. 7 have been realized in experiment [31], and their behaviours coincide with the numerical simulation, we expect the gain from the optimization to transfer to experiments as long as we have the correct decoherence model. For the optimization of the objective O, we reach a local minimum in both the device and control parameters in a single run of the gradient optimization. To understand the benefits of using reverse-mode gradient computation in the optimization, we can compare the speed of the reverse-mode gradient computation with that of finite differences (see Appendix B for an explanation of finite differences). This comparison is relevant because finite differences are widely used to estimate gradients in the implementations of popular optimizers. For example, this is how the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [32,33] in SciPy [34] estimates gradients when they are not provided by the user. The speedup is about three times for this small optimization problem with seven device parameters and three control parameters (see Appendix E for the running times). In Appendix C, we use the task of diagonalizing fluxonium Hamiltonians to demonstrate that the speedup of the gradient optimization grows when the system size and the number of parameters grow.

Discussion

In this work, we extend the optimal control framework to superconducting circuit device parameters. We show that we can optimize over the device and control parameters jointly for a figure of merit reflecting a design goal through reverse-mode gradient computation. This suggests that there is almost no downside, in terms of efficiency, to adding device parameters to the optimization variables.
We can further extend the proposed framework to other stages of quantum processor design. The Hamiltonian representation with device parameters such as E_C, E_J, and E_L discussed in this paper is an intermediate representation when designing a quantum processor. This representation is derived from the underlying lumped-element circuit representation, which is in turn derived from the processor's two-dimensional (2D) or three-dimensional (3D) layout representation. There could be more freedom or design choices in the lumped-element circuit or layout representations. Moreover, the layout representation allows us to estimate important losses, such as the dielectric loss, based on the electric field energy distribution on the interfaces with contamination. However, the electromagnetic simulations used to evaluate the layout design are computationally expensive. This means that evaluating the gradients of the Hamiltonian parameters with respect to the layout parameters may be of substantial importance, as they would significantly reduce the number of iterations. We leave this for future study.
Most optimal control use cases focus on optimizing control solutions to realize a given quantum operation. However, for NISQ (noisy intermediate-scale quantum) applications or quantum error correction, the overall performance depends on the native gates available on the device, the compilation scheme used, and other device aspects such as readout and reset.
In Appendix F, we illustrate how we can simultaneously optimize the ZZ-interaction on-off ratios of different pairs of transmon qubits on a square lattice amenable to fault-tolerant quantum error correction architectures such as the surface code. This can be viewed as a first step in this direction.

Appendix A: Simultaneous versus alternating optimization

When using gradient optimization near a local minimum, it is generally better to optimize all the variables simultaneously instead of dividing them into groups and optimizing one group at a time. We can illustrate this point by considering a quadratic objective function f(v) = v^T A v, which provides a good approximation for general objective functions near their minima. If we start the optimization at a point v_0 that is an eigenvector of A, then the natural descent direction is along the vector −v_0. However, optimizing the elements of v separately will force the optimization to move in less efficient directions. Fig. 5 shows a numerical simulation with the objective function f(x, y) = 100(x − y)^2 + (x + y)^2, which confirms the above intuition.
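A numerical illustration of this point (our own toy sketch, using exact line searches rather than BFGS as in Fig. 5):

```python
import numpy as np

# f(v) = v^T A v  is  f(x, y) = 100 (x - y)^2 + (x + y)^2
A = np.array([[101.0, -99.0], [-99.0, 101.0]])
f = lambda v: v @ A @ v

def joint_step(v):
    """Steepest descent on both coordinates with an exact line search."""
    g = 2.0 * A @ v
    denom = g @ A @ g
    if denom < 1e-30:          # already at a stationary point
        return v
    return v - (g @ A @ v) / denom * g

def alternating_sweep(v):
    """Exact minimization over x with y fixed, then over y with x fixed."""
    x, y = v
    x = (99.0 / 101.0) * y     # argmin_x f(x, y)
    y = (99.0 / 101.0) * x     # argmin_y f(x, y)
    return np.array([x, y])

v_joint = v_alt = np.array([1.0, 1.0])   # start along an eigenvector of A
for _ in range(10):
    v_joint = joint_step(v_joint)
    v_alt = alternating_sweep(v_alt)
```

From the eigenvector start, the joint step descends straight along −v_0 and reaches the minimum immediately, while the alternating sweeps zigzag and converge only geometrically.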
It is not hard to construct toy gate optimization examples with similar objective-function landscapes near local minima. For example, consider the problem of implementing a controlled-Z (CZ) gate using the mechanism in Appendix F. For simplicity, we assume that we can only use a (nearly) square pulse that turns on a ZZ-interaction of strength g for a time t, and that the only penalty is decoherence. The objective function can then be chosen to combine the conditional-phase error, which vanishes when gt ≈ π/2, with a decoherence penalty proportional to t/T_1, where T_1 is the relaxation time. The descent direction is the direction of decreasing t while maintaining gt ≈ π/2, which we cannot follow if we optimize g and t separately.

Appendix B: Efficient gradient computation
To have a baseline to compare against, we examine the following naive method of computing gradients. For a function f(x), we can always use the finite difference

∂f/∂x_i ≈ [f(x + ε e_i) − f(x)] / ε, (B1)

to estimate the i-th element of the gradient, where e_i is the unit vector corresponding to the i-th direction and ε is a small step size. The drawbacks of this method are as follows: 1. We need to evaluate the function d + 1 times to compute the gradient, where d is the dimension of x.
2. We need to have prior knowledge about max_x |f(x)| for x in a certain neighborhood to know how to choose ε.
3. The smaller ε is, the higher the precision of f needed to achieve the same level of precision in the final estimate. In theory, this means we need longer computation times for small ε. In practice, this can be more complicated, because commonly used numerical computation packages such as NumPy do not provide an option for arbitrary precision.
In this work, we use a more efficient and accurate method to compute the gradients, usually called reverse-mode differentiation in the literature [35]. The method utilizes a decomposition of the function into elementary steps,

f = f_k ∘ f_{k−1} ∘ · · · ∘ f_1, (B2)

and then the chain rule can be used to compute the gradient of f. If all the f_i's are scalar valued, reverse-mode differentiation can be performed in a single run, i.e., we need to compute the derivative of each f_i once. For the more general case of vector-valued functions, consider the example y = f(x), z = g(y). The partial derivatives of f and g form the Jacobians J^f_lm = ∂y_l/∂x_m and J^g_nl = ∂z_n/∂y_l. We know that ∂z_n/∂x_m = Σ_l (∂z_n/∂y_l)(∂y_l/∂x_m); in other words, J^{g∘f}_nm = Σ_l J^g_nl J^f_lm. For most operations used in the examples of this paper, the computational complexities of f_i and its derivative are of the same order. Theoretically, this is related to the fact that in the function decomposition of Eqn. B2, we can require the functions {f_i} to represent elemental computation steps, e.g., addition and multiplication. For this finite set of elemental steps, the costs of computing the derivatives are bounded by a constant times the costs of performing the steps. The above argument can be viewed as a sketch of the Baur-Strassen theorem [36]. Therefore, we can typically expect a d-fold speedup of reverse-mode differentiation over the finite difference approach. In Appendix C, we perform benchmarking for a common task, which also confirms this conclusion.
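The contrast between the two approaches can be seen in a few lines of JAX (the scalar function below is invented for illustration):

```python
import jax
import jax.numpy as jnp

def f(x):
    # toy scalar function of d parameters
    return jnp.sum(jnp.sin(x) * jnp.exp(-x ** 2))

x = jnp.linspace(-1.0, 1.0, 50)

# finite differences: d + 1 = 51 evaluations of f
eps = 1e-4
f0 = f(x)
fd_grad = jnp.array([(f(x.at[i].add(eps)) - f0) / eps for i in range(x.size)])

# reverse mode: one forward pass and one backward pass, independent of d
rm_grad = jax.grad(f)(x)
```

Both estimates agree up to the finite-difference truncation and round-off error, but the reverse-mode cost does not grow with d.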
Many matrix derivative formulas can be found in a previous publication [35]. For demonstration, we consider the example of computing derivatives for matrix diagonalization, which is a key step in computing the spectral properties of Hamiltonians. Suppose that we have performed the forward computation D = U^{-1} A U, where D is a diagonal matrix. Then dD = diag(U^{-1} dA U), where the diag(·) operator sets all the off-diagonal elements to 0. For U, we have dU = U (F ∘ (U^{-1} dA U)), where F_ij = (D_jj − D_ii)^{-1} for i ≠ j (and F_ii = 0), and ∘ denotes element-wise multiplication. These derivatives are essentially first-order perturbation theory. The Jacobian can be extracted from these expressions.
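These perturbation-theory formulas are easy to verify numerically. A sketch, using a made-up well-gapped symmetric matrix so that U^{-1} = U^T:

```python
import numpy as np

def eig_derivatives(A, dA):
    """First-order derivatives of the eigendecomposition A = U D U^{-1}
    of a symmetric matrix with distinct eigenvalues:
    dD = diag(U^{-1} dA U),  dU = U (F o (U^{-1} dA U)),
    with F_ij = (D_jj - D_ii)^{-1} for i != j and F_ii = 0."""
    evals, U = np.linalg.eigh(A)
    M = U.T @ dA @ U                        # U^{-1} = U^T for symmetric A
    dD = np.diag(np.diag(M))
    gaps = evals[None, :] - evals[:, None]  # gaps[i, j] = D_jj - D_ii
    F = np.where(np.eye(len(A), dtype=bool), 0.0,
                 1.0 / np.where(gaps == 0.0, 1.0, gaps))
    dU = U @ (F * M)
    return evals, U, dD, dU

# well-gapped symmetric test matrix and a symmetric perturbation direction
A = np.diag([0.0, 1.0, 2.0, 3.0]) + 0.05 * (np.ones((4, 4)) - np.eye(4))
v = np.arange(1.0, 5.0)
dA = np.outer(v, v) / 10.0
evals, U, dD, dU = eig_derivatives(A, dA)
```

Comparing against a small explicit perturbation of A confirms the first-order expressions.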
We also want to discuss the compatibility of differentiation with the control flow constructs of programming languages, as there is a misconception that using control flow makes a program non-differentiable. Consider the popular activation function ReLU(x) = max(x, 0) that appeared in Eqn. 7. Its derivative is ReLU′(x) = 1 for x > 0 and ReLU′(x) = 0 for x < 0. The derivative does not exist at x = 0, but this fact does not stop a reverse-mode differentiation algorithm from converting ReLU to ReLU′ by defining ReLU′(0) to be some arbitrary number. A typical numerical optimization process is unlikely to reach the non-differentiable point x = 0, so how we define ReLU′(0) may not matter in practice. The same argument also holds for computing derivatives of the matrix diagonalization when the matrix has degenerate eigenvalues. For this work, we used the Python library JAX [37] to perform the reverse-mode differentiation.
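A quick JAX illustration of this point:

```python
import jax
import jax.numpy as jnp

relu = lambda x: jnp.maximum(x, 0.0)
drelu = jax.grad(relu)

g_pos = drelu(2.0)    # 1.0 on the x > 0 branch
g_neg = drelu(-2.0)   # 0.0 on the x < 0 branch
g_kink = drelu(0.0)   # some fixed convention at the non-differentiable point
```

The value returned at the kink is an implementation convention; as argued above, it rarely matters for optimization in practice.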
TABLE I. Ratios of the time needed for computing both the value and the gradient to the time needed for computing only the value. The ratio stays roughly constant as the number of fluxonium qubits n_fm increases.

n_fm  | 3    | 4    | 5    | 6
ratio | 2.77 | 3.06 | 2.39 | 2.37
Appendix D: Average fidelity with phase compensation

During the gate, each qubit accumulates a dynamical phase determined by the energy difference between |0⟩ and |1⟩. This results in single qubit phases that need to be compensated. For example, we can perform the compensation through virtual Z gates [29] or real Z gates [38]. When using virtual Z gates, the fidelity is usually much higher than that of other operations, and thus we can treat them as perfect. Naturally, we want to compute the fidelity after the single qubit phase compensation is done. One way to perform the compensation, albeit not necessarily the best way, is as follows. Assume the unitary operator we obtained from simulating the time evolution is U. We set the compensation phases θ_1 and θ_2 from the matrix elements of U, where arg(·) computes the phase of a complex number.
We then compensate U with a diagonal matrix D equal to the tensor product of two single qubit phase gates,

D = diag(1, e^{iθ_1}) ⊗ diag(1, e^{iθ_2}). (D2)

Then, we can compute the average gate fidelity of the compensated operator DU.

Appendix E: iSWAP optimization details

In this section, we provide more details on the iSWAP gate optimization. First, we examine the terms of the objective function in Eqn. 8. For P_decoh, we consider two sources of decoherence mentioned previously [39]. The first is the dielectric loss; by Fermi's golden rule, we can obtain the corresponding relaxation rate, which depends on the qubit angular frequency ω. We set the effective dielectric loss tangent tan δ_C = 2 × 10^-6 and the device temperature T = 50 mK. The second is white flux noise, with the corresponding dephasing rate determined by a coefficient c_f, where we set c_f = 3.95 × 10^-15 s. Here, we do not pursue a precise and accurate decoherence model, as it can vary drastically between experimental setups. Rather, we note that all the losses listed in [40] have closed-form expressions that allow us to easily compute their gradients. Together, we use a penalty that is linear in time to approximate the infidelity caused by these two decoherence sources. To obtain functioning single qubit gates, we consider a penalty P_fDiff that stops the frequency difference between the two qubits from being too small, with a frequency threshold ∆. For the optimization performed in this work, only the bound for E_J will be violated; thus, we set the corresponding penalty P_fm with coefficients c_fm,1 = 0.2 GHz^-2 and c_fm,2 = 2.1 GHz.
Here, we list the hyperparameters used in the Adam optimizer. We set the initial learning rate to r_init = 0.003 for optimizing the objective O and r_init = 0.004 for optimizing O_avg. In addition, we let the learning rate r decay exponentially over the course of the optimization. The reasoning is to increase the rate of the O_avg optimization while reducing the fluctuations in the later steps. The other hyperparameters of the Adam optimizer are b_1 = 0.9, b_2 = 0.999, and ϵ = 10^-8. There are 4 × 10^4 gradient update steps for optimizing O and 10^5 steps for O_avg. The values of the parameters before and after optimization are listed in Table III and Table IV, respectively. The initial values of the device parameters are chosen arbitrarily. For the control parameters, we compute ϕ_p such that the two fluxonium qubits are brought into resonance. It is chosen this way because, if the initial point is too far from resonance, the optimization landscape might be a barren plateau. Determining the best way to perform the optimization without this kind of prior knowledge is an interesting subject, which we will examine in future work. The computational times needed to compute the objective function and the gradient are listed in Table VI. Since JAX provides just-in-time (jit) compilation functionality that is applicable to our computation of O and grad(O), we also list the computation times after jit compilation. The speedup compared to the finite difference approach when not using jit is T(loss O) · (n_param + 1)/T(grad(O)) = 3.28, and the corresponding speedup when using jit follows from the timings in Table VI in the same way. We see that we already obtain speedups with a very small number of parameters. In Appendix C, we use the common task of Hamiltonian diagonalization as a first attempt to benchmark the speedup of the reverse-mode gradient computation with respect to the number of parameters.

Appendix F: Adiabatic flux-tuning gates
As mentioned earlier, several popular cQED two-qubit gate schemes are adiabatic [14,22,41,42]. For these adiabatic gate schemes, we can obtain a good estimate of the gate performance solely from the spectral properties of the Hamiltonian. For example, we can use the on-off ratio of the ZZ-interactions to benchmark the design of a tunable coupler. More generally, there is a class of "optimal design" problems where we only optimize the design parameters. Unsurprisingly, our gradient optimization techniques can also be applied to this class of problems, which we demonstrate in the following example. Moreover, we use the example to show how our method can be applied to transmons and to a chip-level optimization problem. The example is based on adiabatic flux-tuning CPhase gates obtained via ZZ-interactions, which arise from bringing one of the computational basis states near an avoided crossing with a non-computational eigenstate.
For two capacitively coupled tunable transmons, the Hamiltonian has the form

H = H_t,1 + H_t,2 + J_C n_1 n_2, (F1)

where H_t,i is the transmon Hamiltonian

H_t,i = 4 E_C,i (n_i − n_g,i)^2 − E_J,i(ϕ_ext,i) cos ϕ_i. (F2)

For simplicity, we have set all effective offset charges to n_g = 0.
The desired properties that we will try to optimize in this example are:
• Large ZZ-interactions at the gate operating point;
• Sufficiently small ZZ-interactions at the idle point, so that single qubit gates, including the identity gate, have good fidelities;
• Long coherence times.
As we approach an avoided crossing between a computational basis state and an eigenstate outside the computational subspace, the ZZ-interactions become larger, and we need to change the Hamiltonian more slowly to avoid leakage. For typical Hamiltonian parameters, we cannot reach the avoided crossing point adiabatically in a reasonable time compared to the coherence time. Therefore, as a first attempt, we use the following routine to estimate the gate operating points: 1. Initialize the flux ϕ_ext ← 0 at the idle point.
where we set M = 0.8 in this example. Then, we can use the procedure specified in Appendix G to compute the gradient of the above algorithm with a conditional loop. The magnitudes of the ZZ-interactions at the idle point and at the gate operating point are computed from the energies of the four computational basis states (Eqn. F4). In the objective function, P_decoh is a penalty determined by the coherence time, P_ZZ,idle punishes large ZZ-interactions when the qubits are at their sweet spots, P_tm,i ensures that the E_J/E_C ratio is large, P_ah,i requires the anharmonicity to be large enough for single qubit gates, and P_fDiff forces the 0 ↔ 1 frequencies of the two transmons to be different. The exact formulas for these terms can be found in Appendix H. In the objective function, ϕ_ext is determined by the stop condition above. We can extend this optimization problem to a chip-level setting. We consider the surface code scheme presented previously [43]. In short, the plan is to have three types of transmon qubits with different parameters arranged in a certain way on a 2D lattice. Sorting by their 0 ↔ 1 transition frequencies, we label the three types of transmon qubits as H(igh), M(iddle), and L(ow). We want to have a CPhase gate between the H-M and M-L qubit pairs, while we still want the crosstalk to be small when idling. Since H and L transmons are spatially separated, we assume their interactions are small and ignore them in this example. Therefore, we add the objective functions for the H-M and M-L qubit pairs together, as given in Eqn. F6. We need to subtract the last two terms, P_tm,M and P_ah,M, because they are counted twice in O_H−M and O_M−L. We can then use Adam [30] to perform the optimization. The loss during optimization is shown in Fig. 6, and details about the optimization can be found in Appendix H.

Appendix G: Computing gradients for conditional loops
Here, we explain how to perform the differentiation in the presence of the conditional loop that is used in the adiabatic flux-tuning gate optimization. The forward computation, i.e., the computation of the objective O, is carried out as usual. When doing the reverse computation for the gradient, we utilize the fact that the gate operating point ϕ_ext^stop is a function of the parameters p in the optimization.
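A toy sketch of this idea follows. The two-level avoided-crossing overlap function and the threshold are invented for illustration; the point is that the derivative of the sweep's stopping point follows from the implicit function theorem applied to the stop condition:

```python
import numpy as np

M = 0.8   # overlap threshold in the stop condition

def overlap(phi, p):
    """Toy 'largest overlap with the bare state' for a two-level avoided
    crossing H = [[p - phi, g], [g, phi - p]]; it decreases as phi -> p."""
    g = 0.1
    delta = p - phi
    return 0.5 * (1.0 + abs(delta) / np.hypot(delta, g))

def phi_stop(p, dphi=1e-4):
    """Forward pass: sweep phi until the stop condition overlap < M fires."""
    phi = 0.0
    while overlap(phi, p) >= M:
        phi += dphi
    return phi

# implicit function theorem: at the stop point, overlap(phi_stop(p), p) = M,
# hence d(phi_stop)/dp = -(d overlap/dp) / (d overlap/dphi)
p0, eps = 0.5, 1e-3
ps = phi_stop(p0)
dc_dphi = (overlap(ps + eps, p0) - overlap(ps - eps, p0)) / (2.0 * eps)
dc_dp = (overlap(ps, p0 + eps) - overlap(ps, p0 - eps)) / (2.0 * eps)
dphi_dp = -dc_dp / dc_dphi
```

No differentiation through the loop itself is needed: only the stop condition at the final point enters the gradient.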

FIG. 1 .
FIG. 1. Illustration of joint optimization in circuit quantum electrodynamics systems. Both device designs and control pulses contain many parameters, and together they determine the fidelity of the quantum operations. In this work, we consider the problem of optimizing them together.

FIG. 2 .
FIG. 2. Device and control parameters optimized for the iSWAP gate. (a) Circuit diagram of two fluxonium qubits coupled by a capacitor. Each circuit component corresponds to a term in the Hamiltonian, and we label the components with their energy parameters. The capacitive coupling strength J_C is determined by the capacitance of each qubit and the capacitance between the two qubits. (b) Time-dependent part of the control pulse shape. It corresponds to the second term on the right-hand side of Eqn. 7.

FIG. 3 .
FIG. 3. iSWAP gate optimization loss during training. The blue curve corresponds to the optimization of O in Eqn. 8, and the orange curve corresponds to O_avg in Eqn. 4.

FIG. 4 .
FIG. 4. Robustness of the iSWAP gate with respect to the control parameter ϕ_p. For different ∆ϕ_p, we compute the objective O with ϕ_p shifted by ∆ϕ_p. The blue and orange dots mark the ∆ϕ_p values used to perform the optimization. The orange dotted curve is a translation of the orange curve. The curvature of the orange dotted curve is smaller than that of the blue curve; therefore, the parameters obtained from the robustness optimization are more robust.

FIG. 5 .
FIG. 5. Comparison of simultaneous optimization and one-at-a-time optimization. The objective function is f(x, y) = 100(x − y)^2 + (x + y)^2. The points corresponding to simultaneous optimization are from a single run of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization. The points corresponding to the separate optimizations are from a total of 60 BFGS optimizations (30 for x and 30 for y).

O_chip = O_H−M + O_M−L − P_tm,M − P_ah,M. (F6)
1 GHz, and c_fDiff = 1 GHz^-1. The last penalty is P_fm. Considering design feasibility, fabrication stability, and noise sensitivity, we set bounds on the variations of the fluxonium qubit parameters, as listed in Table II.

TABLE VI .
Elapsed real time for computing the loss function O and its gradient. We also include the times for the jit-compiled functions.
2. Change the flux: ϕ_ext ← ϕ_ext + ∆ϕ.
3. Use the following stop condition to check whether we are close to the avoided crossing; if not, go back to step 2.
We choose the stop condition to be based on the overlap between the instantaneous eigenstates {ψ_i(ϕ_ext)} and the bare states {Ψ_jk(ϕ_ext)}, where 1 ≤ i ≤ 4 and 1 ≤ j, k ≤ 2. The instantaneous eigenstates {ψ_i(ϕ_ext)} are the eigenstates of the total Hamiltonian H(ϕ_ext), and the bare states {Ψ_jk(ϕ_ext)} are the tensor products of the j-th and k-th single transmon eigenstates.
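A minimal sketch of steps 1-3, with a made-up two-level avoided crossing in place of the coupled-transmon Hamiltonian and the bare states taken as the fixed computational basis:

```python
import numpy as np

def sweep_to_avoided_crossing(H_of_phi, dphi=0.01, M=0.8, phi_max=np.pi):
    """Steps 1-3: increase phi_ext until the smallest 'largest overlap'
    between a bare state and any instantaneous eigenstate drops below M."""
    dim = H_of_phi(0.0).shape[0]
    bare = np.eye(dim)              # toy: phi-independent bare states
    phi = 0.0
    while phi < phi_max:
        _, V = np.linalg.eigh(H_of_phi(phi))
        overlaps = np.max(np.abs(bare.T @ V) ** 2, axis=1)
        if np.min(overlaps) < M:    # stop condition: near the crossing
            return phi
        phi += dphi
    return phi

# two-level avoided crossing between a computational and a leakage state
H = lambda phi: np.array([[1.0 - phi, 0.05], [0.05, phi - 1.0]])
phi_gate = sweep_to_avoided_crossing(H)
```

The sweep stops just before the crossing at phi = 1, where the eigenstates begin to hybridize strongly with the leakage state.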

E_ZZ(ϕ_ext) = |E_00 + E_11 − E_01 − E_10|, (F4)

where the E_jk are the eigenvalues corresponding to the instantaneous eigenstates ψ_i(ϕ_ext) that have the largest overlap with the bare states Ψ_jk(ϕ_ext). Because we never reach the middle of the avoided crossing, the E_jk's can be defined unambiguously. Again, we include a few additional terms in the objective function to estimate the infidelity caused by decoherence and to ensure the functionality of single qubit operations. The objective function for minimization is

O = −E_ZZ(ϕ_ext^stop) · c_scale + P_decoh + P_ZZ,idle + Σ_i (P_tm,i + P_ah,i) + P_fDiff. (F5)

We can then use the implicit function theorem to compute the gradient of ϕ_ext^stop(p) with respect to p, since the stop condition implicitly defines the function ϕ_ext^stop(p) [44].

TABLE VII .
Values before and after optimization for adiabatic CPhase gate.