Observation of Josephson harmonics in tunnel junctions

Approaches to developing large-scale superconducting quantum processors must cope with the numerous microscopic degrees of freedom that are ubiquitous in solid-state devices. State-of-the-art superconducting qubits employ aluminium oxide (AlOx) tunnel Josephson junctions as the sources of nonlinearity necessary to perform quantum operations. Analyses of these junctions typically assume an idealized, purely sinusoidal current–phase relation. However, this relation is expected to hold only in the limit of vanishingly low-transparency channels in the AlOx barrier. Here we show that the standard current–phase relation fails to accurately describe the energy spectra of transmon artificial atoms across various samples and laboratories. Instead, a mesoscopic model of tunnelling through an inhomogeneous AlOx barrier predicts percent-level contributions from higher Josephson harmonics. By including these in the transmon Hamiltonian, we obtain orders of magnitude better agreement between the computed and measured energy spectra. The presence and impact of Josephson harmonics has important implications for developing AlOx-based quantum technologies including quantum computers and parametric amplifiers. As an example, we show that engineered Josephson harmonics can reduce the charge dispersion and associated errors in transmon qubits by an order of magnitude while preserving their anharmonicity.

Superconducting quantum processors have a long road ahead to reach fault-tolerant quantum computing [1][2][3]. One of the most daunting challenges is taming the numerous microscopic degrees of freedom ubiquitous in solid-state devices. State-of-the-art technologies, including the world's largest quantum processors [4][5][6], employ aluminum oxide (AlOx) tunnel Josephson junctions (JJs) as sources of nonlinearity, assuming an idealized pure sin φ current-phase relation (CφR). However, this celebrated sin φ CφR is only expected to occur in the limit of vanishingly low-transparency channels in the AlOx barrier. Here we show that the standard CφR fails to accurately describe the energy spectra of transmon artificial atoms across various samples and laboratories. Instead, a mesoscopic model of tunneling through an inhomogeneous AlOx barrier [7,8] predicts %-level contributions from higher Josephson harmonics. By including these in the transmon Hamiltonian, we obtain orders of magnitude better agreement between the computed and measured energy spectra. The reality of Josephson harmonics transforms qubit design and prompts a reevaluation of models for quantum gates and readout [9][10][11][12][13], parametric amplification and mixing [14,15], Floquet qubits [16,17], protected Josephson qubits [18][19][20], etc. As an example, we show that engineered Josephson harmonics can reduce the charge dispersion and the associated errors in transmon qubits by an order of magnitude, while preserving anharmonicity.
The Josephson effect [21,22] is the keystone of quantum information processing with superconducting hardware: It constitutes a unique source of lossless nonlinearity, which is essential for the implementation of superconducting quantum bits, and it plays a similarly fundamental role as the non-linear current-voltage relation of diodes in semiconductor circuitry. In particular, tunnel Josephson junctions (JJs), formed by two overlapping superconducting films separated by a thin insulating barrier, have enabled superconducting hardware to For tunnel JJs, the CφR has been considered to be purely sinusoidal (dashed gray line, cf. Eq. (1)) with the maximum given by the critical current Ic. However, as we show in this work, even in tunnel JJs, the underlying microscopic complexity of the charge transport can manifest in the contribution of higher harmonics to the CφR. As an example, the red line shows a CφR consistent with measured data (cooldown 1 of the KIT sample; cf. main text), which includes the harmonics expected from a mesoscopic model assuming an inhomogeneous AlOx barrier. The shaded red area shows the difference to the purely sinusoidal CφR. We provide CφRs for all other measured samples in the Supplementary Information. effective parameter, the critical current I c , in the wellknown Josephson CφR (see gray line in Fig. 1), where φ is the superconducting phase difference across the junction. This simplification is remarkable given the fact that other types of junctions, such as weak links, point contacts and ferromagnetic JJs, generally exhibit non-sinusoidal CφRs containing higher Josephson harmonics sin(2φ), sin(3φ), etc. [29][30][31][32][33][34][35]. Here we show that Josephson harmonics are also relevant for tunnel JJs (see Fig. 1).
To understand the limits of the approximation Eq. (1) for tunnel junctions, we have to take a closer look at commonly used Al-AlO x -Al JJs, fabricated by shadow evaporation [39] and schematized in Fig. 2a-c, which reveals a complex microscopic reality. The CφR of the junction is obtained by summing the supercurrents of N conduction channels, I(φ) = N n=1 I n (φ). Each channel (cf. Fig. 2b) has a transparency-dependent CφR [30,40] which can be expressed as a Fourier series: The conduction channel transparency T n is defined as the tunnel probability for an electron impinging on the insulating barrier of channel n and c m (T n ) are the order m Fourier coefficients for I n (φ). These coefficients alternate in sign and decay in magnitude with increasing order m (cf. Fig. 2d). The ratio |c m+1 /c m | of successive coefficients increases with T n (cf. Supp. I A): the more transparent a channel, the more relevant is the contribution of higher harmonics. To put it simply, in highertransparency channels it is more likely for Cooper pairs to tunnel together in groups of m, which corresponds to the sin(mφ) terms in the CφR.
In the limit T n → 0, only the sin φ term of Eq. (2) survives. If all channels in a JJ are in this limit, we recover the purely sinusoidal CφR of Eq. (1), with the critical current of the junction I c proportional to the sum of transparencies. Assuming a perfectly homogeneous barrier, for a typical junction with ∼ µm 2 area and resistance comparable to the resistance quantum, one expects N ∼ 10 6 and T n ∼ 10 −6 [41,42], leading to insignificant (below 10 −6 ) corrections to the purely sinusoidal CφR.
But is this the reality? Here we argue that in the presence of contaminants, atomic scale defects [43] and random crystalline orientations of the grains in contact, evidenced by TEM images and molecular dynamics simulations (see Fig. 2c and Supp. IV), we have reasons to doubt it. In fact, about two decades ago, AlO x barrier inhomogeneity motivated the transition in magnetic junctions to more uniform oxides such as MgO [44][45][46]. Consequently, we expect a distribution of transparencies in AlO x [47,48] with possibly a few relatively hightransparency channels [7,8] introducing measurable corrections to the CφR (see Fig. 1). The microscopic structure of each barrier is therefore imprinted on the CφR of the JJ, and the challenge is how to experimentally access this information.
For our study of tunnel JJs, we use transmon devices [38], in which a JJ is only shunted by a large capacitor to form a non-linear oscillator with the potential energy defined by the CφR of the junction (cf. Fig. 2e). The resulting individually addressable transition frequencies in the microwave regime can be measured using circuit quantum electrodynamics techniques [49]. We compare the spectra of multiple samples to the prediction of the standard transmon Hamiltonian based on a sinusoidal CφR (Eq. (1)) and find increasing deviations for the higher energy levels of all samples, as sketched in Fig. 2e,f. Only by accounting for higher harmonics in the CφR are we able to accurately describe the entire energy spectrum. A similar methodology was used in [32] to reconstruct the CφR of a semiconductor nanowire Josephson element. While our study focuses on transmon qubits, the conclusions we draw regarding the CφR of tunnel junctions should trigger a reevaluation of the current models for tunnel-JJ-based devices used in quantum technology and metrology [49][50][51][52][53].
Since transmons are widely available in the community, we are able to measure and model the spectra of multiple samples from laboratories around the globe: fixed frequency transmons fabricated and measured at the Karlsruhe Institute of Technology (KIT) in three cooldowns (CDs) and Ecole Normale Supérieure (ENS) Paris (same  2)). We sketch a distribution of multiple low and a few high transparencies T1, . . . , TN in green and red, respectively. c False-colored high-angle annular dark field scanning transmission electron microscope (HAADF-STEM) image centered on the AlOx tunnel barrier of a typical JJ fabricated at KIT, with average thickness d ≈ 2 nm as indicated by the white arrow. Individual columns of atoms of the Al grain in the top electrode are visible due to zone axis alignment, which is not the case for the bottom Al electrode (additional STEM images with thickness variations and structural defects such as grain boundaries are shown in Supp. IV C). d Normalized Fourier coefficients cm(Tn) of the JJ current-phase relation (cf. Eq. (2)) for a low (10 −6 , green) and high (10 −2 , red) transparency channel. Note the alternating sign for even and odd order m and the fact that high transparency channel coefficients (in red) remain relevant to higher order. e Sketch of how the higher-order terms in the JJ Hamiltonian modulate the potential and shift the energy levels (red) of superconducting artificial atoms compared to a purely cos φ potential (gray). In our manuscript we focus on transmon devices, which consist of a large capacitor in parallel to the JJ (cf. circuit schematic). The discrepancy between the models generally increases at higher levels. f The higher-order Josephson harmonics also influence the charge dispersion of the transmon levels vs. offset charge ng. The two branches per energy level correspond to a change between even and odd charge parity (i.e. quasiparticle tunneling [36,37]; cf. device as in Ref. [54]), a tunable transmon subject to an in-plane magnetic field at the University of Cologne (Köln) (identical setup and similar device as in Ref. [55]), and 20 qubits from the IBM Hanoi processor (IBM). All transmons are based on standard Al-AlO x -Al tunnel junctions (cf. Fig. 2) and are measured either in a 3D architecture or a 2D coplanar waveguide geometry (for detailed descriptions of each sample see Supp. III). The spectroscopy data consists of (i) transition frequencies f 0j into transmon states j = 1, 2, . . . up to j = 6, each measured as j-photon transitions at frequencies f 0j /j, and (ii) the resonator frequencies f res,j depending on the transmon state j = 0, 1 (see Methods).
In Fig. 3, we compare the measured transition frequencies to predictions f model 0j , obtained by exact diagonalization of two different model Hamiltonians. The first model is the standard transmon model, which has served the community for over 15 years [38], where E C is the charging energy, E J is the Josephson energy, n g is the offset charge, and the operators n and φ represent the charge normalized by twice the electron charge and the phase difference across the junction, respectively. All models include the readout resonator Hamiltonian given by H res = Ωa † a + Gn(a + a † ), where Ω is the bare resonator frequency, G is the electrostatic coupling strength, and a † (a) is the bosonic creation (annihilation) operator. Including H res ensures that dressing of the states due to transmon-resonator hybridization is taken into account [38,49,56,57].
We obtain the parameter set (E C , E J , Ω, G) of the standard transmon model Eq. (3) by solving the Inverse Eigenvalue Problem (IEP) [58][59][60][61] for the measured spectroscopy data (see Methods). For the Köln sample this data includes the offset charge dispersion (additional data for different magnetic fields is given in Supp. II D). We note that the IEP is the very same science problem that was historically solved to model the energy spectra natural atoms and molecules (see e.g. [62][63][64]), which led to the discovery of the fine structure.
In Fig. 3a we show that the standard transmon model Eq. (3) fails to describe the measured frequency spectra for all samples. The observed deviations are much larger than the measurement imprecision, for which we can set a conservative upper bound on the order of 1 MHz. While the standard transmon model with two parameters can trivially match the f 01 and f 02 transitions, the measured f 03 can already deviate by more than 10 MHz. The deviations are positive for the KIT, ENS and Köln samples, while the IBM transmons mostly show negative deviations (cf. Supp. I C 5). It is important to remark that other corrections, such as the stray inductance in the JJ leads, hidden modes coupled to the qubit, the coupling between qubits as present on the IBM multi-qubit device, or an asymmetry in the superconducting energy gaps, while being relevant, cannot fully account for the measured discrepancy (see Supp. I D). Notably, similar deviations can be found in previously published transmon spectra [55,[65][66][67], as we detail in Supp. I C 2 and I C 4.
In Fig. 3b, we demonstrate that orders of magnitude better agreement with our measured spectra can be achieved by using the Josephson harmonics model : In general the values E Jm are a fingerprint of each junction's channel-transparency distribution ρ(T ) with many degrees of freedom. Here we consider two simplified models (further models are discussed in Supp. I C): (i) a phenomenological model truncated at E J4 (top panel) and (ii) a mesoscopic model of tunneling through a non-  Influence of Josephson harmonics on the charge dispersion. a Measured charge dispersion δf0j (blue diamonds) of states j = 1, 2, 3 for the experiment in Köln, plotted as a function of the f01 frequency. All transition frequencies are tuned as the Josephson energy is suppressed by up to 35 % by means of an in-plane magnetic field B ∥ swept to 0.4 T. The standard model Eq. (3) shown in dashed gray underestimates the charge dispersion by a factor of 2 to 7 (gray arrows), while the Josephson harmonics model Eq. (4) plotted in solid blue overlaps with the measured data. Note that both are computed with the same parameters used for Fig. 3; the Josephson energy is reduced with increasing magnetic field and the other parameters such as the EJm/EJ1 ratios are kept constant. The blue arrow indicates f01 = 5.079 GHz, corresponding to the dataset shown in Fig. 3a and b. b Evidence that Josephson harmonics can reduce the charge dispersion by an order of magnitude (gray arrows). The dashed gray lines represent the standard model predictions. In contrast, the green bars show results from all Josephson harmonics models. The data corresponds to IBM qubits 0-2 (cf. green bars in Fig. 3c) for the levels j = 1, 2, 3, 4; results for all other samples are shown in Fig. S6 of the Supplementary Information. c The values of EJ1/EC change compared to the standard model EJ/EC, which constitutes the main correction to the predicted charge dispersions in panels a and b. The bars represent the range of suitable ratios EJ1/EC (cf. Fig. 3c) for the successive cooldowns of the KIT sample (red bars), the ENS sample (yellow bar), the Köln sample (blue diamonds, using the same color coding as in panel a), and the IBM Hanoi device (green bars). The dashed diagonal indicates the case in which the ratios EJ1/EC of the harmonics model and EJ/EC of the standard model are equal. The inset shows the correction (ε har − ε std )/ε std to the relative charge dispersion ε = δf0j/f01 for the Köln sample, where ε std is given by the standard charge dispersion [38] and ε har is computed using the Josephson harmonics model. uniform oxide barrier (bottom panel). We note the phenomenological E J4 model guarantees agreement for the lowest 4 transitions (see Methods), and while many samples have physically reasonable E Jm coefficients when truncating at E J4 , a few JJs require terms up to E J6 (see Supp. I C 3).
The mesoscopic model allows us to derive ρ(T ;d, σ) based on a Gaussian thickness distribution with average thicknessd and standard deviation σ (see Supp. I B 4). As a consequence, all Josephson harmonics for m ≥ 2 are parameterized in terms of the two parametersd and σ according to where the Fourier coefficients c m (T ) (cf. Eq. (2) and Fig. 2d) Fig. 3b) are comparable to results from molecular dynamics simulation and STEM pictures of the oxide barrier (see Supp. IV). In Fig. 3c, we indicate the ranges of E Jm coefficients consistent with the measured spectra. The bars represent the lower and upper limits of Josephson harmonics ratios |E Jm /E J1 |. The corresponding sin(mφ) contribution to the CφR is given by m|E Jm /E J1 | (cf. Fig. 1 for the KIT sample). The ratios lie between two limiting cases spanning the physical regime (shaded gray area): (i) the upper limit, |E Jm /E J1 | = 3/(4m 2 − 1), corresponds to an open quantum point contact, i.e. one channel with T = 1, and (ii) the lower limit, |E Jm /E J1 | ∼ (T /4) m−1 /m 3/2 , corresponds to a perfectly homogeneous low-transparency barrier (T n = T = 10 −6 for all n). For the scanning routine, we include harmonics up to E J10 to obtain results within the physical regime and to see when truncation is possible (see Methods). Remarkably, for all samples the E J2 contribution is in the few % range even after considering additional corrections such as series inductance or gap asymmetry in the superconducting electrodes (see Supp. I D).
The Josephson harmonics ratios computed from the mesoscopic model Eq. (5) are shown with turquoise markers. Notice that the barrier evolved between cooldowns of the KIT sample due to aging (CD1 to CD2) and thermal annealing (CD2 to CD3) (cf. Supp. III A). Even for the most homogeneous barrier (CD3), the second-harmonic contribution is E J2 /E J1 ≈ −2.4 %, implying that there would be at least one conduction channel with a transparency T ≥ 0.29 (see Supp. I A). The methodology presented in Fig. 3 can serve as a tool to characterize Josephson harmonics and tunnel barrier homogeneity, independent of circuit design.
Since the charge dispersion increases for higher transmon levels (even for the standard transmon Hamiltonian [38], cf. Fig. 2f) and is exponentially sensitive to the shape of the JJ potential (cf. Fig. 2e), a natural question arises: What are the consequences of the Josephson harmonics on the transmon's susceptibility to offset charges? In Fig. 4a we show the measured charge dispersion δf 0j of the Köln device for states j = 1, 2, 3 vs. the first transition frequency f 01 , which is tuned by in-plane magnetic field B ∥ of up to 0.4 T (see Supp. III C for details). The charge dispersion predicted by the standard model (dashed gray) falls short of the measurements by a factor of 2 to 7 for the three measured transitions. In contrast, when using the Josephson harmonics model, the computed charge dispersion matches the data (blue lines). We emphasize that for both models we use the same parameters as in the Fig. 3 analysis (i.e., the standard model and the E J4 model), and vary the first Josephson energy to match the qubit frequency f 01 for different magnetic fields while keeping the E Jm /E J1 ratios constant.
Interestingly, the presence of large Josephson harmonics, as in the case of the IBM qubits (see Fig. 3c), can also reduce the charge dispersion, which directly decreases charge noise decoherence. We show evidence for this in Fig. 4b, on the first three IBM qubits, for which the charge dispersion of the qubit transition can be a factor of 4 lower than expected from the standard transmon model. This observation indicates a possible optimization route in which Josephson harmonics are engineered (e.g. by shaping the channel transparencies or by adding inductive elements in series) and the spectrum is steered towards regions of reduced charge dispersion and increased anharmonicity (see Supp. I C 7). A recent work [68] proposes a similar approach to engineer arbitrary shaped CφRs using networks of high transparency JJs.
The main reason for the failure of the standard transmon model in describing the charge dispersion (when fitted to f 01 and f 02 ) is that it misjudges the value of E J /E C . To quantify this effect, in Fig. 4c we plot the values of E J1 /E C from the Josephson harmonics model against the value of E J /E C from the standard model. Indeed, the E J1 /E C ranges for many of our measurements are not compatible with the standard model E J /E C ratio (dashed diagonal). We note that when evaluated for the same E J /E C , the Josephson harmonics correction to the charge dispersion is relatively small (see inset of Fig. 4c).
In summary, we have shown that for ubiquitous AlO x tunnel junctions, the microscopic structure, currently underappreciated in its complexity, causes level shifts and modifies the charge dispersion in superconducting artificial atoms. In order to fully describe the measured transmon energy spectra, we amend the standard sin φ Josephson current-phase relation for tunnel junctions to include higher order sin(mφ) harmonics, with the relative amplitude of the m = 2 term in the few % range. We confirm this finding in various sample geometries from four different laboratories, and we argue that the source of the Josephson harmonics is the presence of relatively higher transparency channels with T ≫ 10 −6 in the AlO x tunnel barrier. The methodology shown here can reveal percent-level deviations from a sinusoidal CφR, which are hard to detect in more standard measurements based on asymmetric DC SQUIDs [69].
The observation of Josephson harmonics in tunnel junctions highlights the need to revisit established models for superconducting circuits. Our work directly impacts the design and measurement of transmon qubits and processors: As an illustration, we show that by engineering Josephson harmonics, the dephasing due to charge noise can be reduced by an order of magnitude without sacrificing anharmonicity. These results ask for future research studying the implications of Josephson harmonics and associated Andreev bound states in other tunnel-JJ-based circuits, e.g. fluxonium or generalized flux qubits [70].
In general, we expect the inclusion of the harmonics will refine the understanding of superconducting artificial atoms and will directly benefit, among others, quantum gate and computation schemes relying on higher levels [9,13,71,72], quantum-non-demolition readout fidelities [73][74][75], and frequency crowding mitigation in quantum processors [76]. Josephson harmonics will probably also have to be accounted for in topological JJ circuits [19,20,77], in parametric pumping schemes employed in microwave amplifiers and bosonic codes [78,79], in JJ metrological devices [24][25][26][27], and can be harnessed to realize Josephson diodes [80]. As devices become increasingly sophisticated with progressively smaller error margins, higher-order Josephson harmonics will need to be either suppressed via the development of highly uniform and low transparency barriers, or engineered and included as an integral part of the device physics.

METHODS
Diagonalizing the Hamiltonians to obtain model predictions: We construct the matrices of H std in Eq. (3) and H har in Eq. (4) by first diagonalizing the bare transmon matrix (excluding H res ) in the charge basis {|n⟩}, where 4E C (n − n g ) 2 = n 4E C (n−n g ) 2 |n⟩⟨n| is diagonal and −E Jm cos(mφ) = − n E Jm /2 (|n⟩⟨n + m| + |n + m⟩⟨n|) has constant entries −E Jm /2 on the m th subdiagonal (we ensure enough terms by generally verifying that the predictions do not change if more terms are included). This yields the transmon eigenenergies E j and eigenstates |j⟩. Then we diagonalize the joint transmonresonator Hamiltonian To each resulting eigenenergy E l and eigenstate |l⟩, we assign a photon label k and a transmon label j based on the largest overlap max k,j | ⟨kj|l⟩ | (this only works for small k, cf. Supp. II C), which yields the dressed energies E kj and states |kj⟩. This procedure is done for both n g = 0 and n g = 1/2. From the resulting dressed energies E kj (n g ), we compute the transmon transition frequencies f model 0j (n g ) = (E 0j (n g ) − E 00 (n g ))/2π and the resonator frequencies f model res,j (n g ) = (E 1j (n g ) − E 0j (n g ))/2π (setting ℏ = 1). The predicted frequencies are then given by f model Solving the IEP to obtain model parameters: The inverse problem [61,81]  . . , f model 0N f , f model res,0 , f model res,1 ) agree with the measured data, is an instance of the Hamiltonian parameterized IEP (HamPIEP, see Supp. II A 2). We solve the HamPIEP using the globally convergent Newton method [82] with cubic line search and backtracking [83] and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [84] as implemented in TensorFlow Probability [85]. The Jacobian ∂f /∂x is obtained by performing automatic differentiation through the diagonalization with TensorFlow. For the E J4 model shown in | using the BFGS algorithm. The initial values for the minimization are given bȳ d = 1.64 nm (taken from the molecular dynamics result in Supp. IV), σ =d/4 (see also Table S2), and (E C , E J , Ω, G) from the standard transmon model. For the Köln data, where 288 data points have to be described by the same model parameters x (cf. Fig. 4a) and only the Josephson energy is varied, we use cubic interpolation as a function of f model 01 and include only a few central points for the available frequencies in the solution of the IEP (the residuals are given in Supp. II D). All model parameters are provided in the repository [86] accompanying this manuscript.
Scanning the Josephson energies: To obtain the range of suitable Josephson energies {E Jm } (shown in Fig. 3c) that are consistent with a measured spectrum, we use an exhaustive scanning procedure. A spectroscopy dataset of N f measured transition frequencies f 0j , j = 1, . . . , N f , and two resonator frequencies f res,0 and f res,1 uniquely determines-via the HamPIEP-the values x = (E J1 , . . . , E JN f , Ω, G). We then scan the values of four additional ratios y = , namely for each of these four E Jm /E J1 over 16 geometrically spaced values between the point contact limit 3(−1) m+1 /(4m 2 − 1) and (−1) m+1 min{10 −7 , |E Jm+1 /E J1 |} (always skipping the first to ensure |E Jm /E J1 | > |E Jm+1 /E J1 |). Additionally we include y = (0, 0, 0, 0) to see if truncation at E JN f is allowed. For each combination y, we solve the HamPIEP for the spectroscopy data to obtain the unique solution x. We call the ratios e = (1, E J2 /E J1 , . . . , E JN f +4 /E J1 ) a trajectory that can reproduce the spectrum. However, the trajectory e may not be physical, since (i) some of the leading ratios E Jm /E J for m ≤ N f might be beyond the quantum point-contact limit, (ii) the Josephson energies might not be strictly decreasing in absolute value for increasing order m, or (iii) the signs might not be alternating. Note that this can also happen when the Josephson harmonics model Eq.

COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
Supplementary information is available for this paper.
Correspondence and requests for materials should be addressed to I.M.P. We review the theory of superconducting tunnel junctions, starting from the general current-phase relation as a sum over conduction channels with certain transparencies. Then we derive a closed-form expression for the higherorder Josephson energies and study several transparency distributions. We also introduce a mesoscopic model of tunneling through an inhomogeneous (i.e. non-uniform) barrier, which can describe the non-negligible contributions from higher harmonics to the Josephson effect in regular SIS junctions. Based on these results, we give a motivation for some phenomenological models for the relative size of E Jm /E J and show results for these alternative models as well as for other transmon data extracted from the literature. Additionally, we perform an exhaustive scan over the higher-order Josephson contributions, yielding the full range of suitable E Jm /E J for each sample, independent of the particular models discussed. Finally, we study potential alternative corrections, both on a circuit level-such as a stray inductance, additional hidden modes, or the coupling to other qubits-and on the junction level-such as an asymmetry in the superconducting electrodes. We explain why we have come to the conclusion that these corrections cannot account for the mismatch between the datasets and the standard transmon model.

A. Theoretical background
The general formula for the current-phase relation in Josephson tunnel junctions can be written as a sum over conduction channels [40], where ∆ is the superconducting energy gap, N is the number of conduction channels, T n ∈ [0, 1] are their transparencies (the transmission amplitudes t n through the channels determine both their transparencies, T n = |t n | 2 , as well as the reflection coefficients for Andreev processes, see e.g. Section 4.4 in [88] and Fig. 4.7 there), φ is the phase difference across the Josephson junction, e is the electron charge, and ℏ is the reduced Planck quantum. Equation (S1) is Beenakker's multichannel generalization of a formula found by Haberkorn et al. [89] (see also Ref. [90] and [30,Eq. (17)]). We neglect here temperature-dependent corrections as they are exponentially small in T /T c ≪ 1. For tunnel junctions, it is generally assumed that T n ≪ 1 for all n [41,42], so keeping only the lowest order contribution would give where G = R −1 N = e 2 /(πℏ) n T n is the normal-state conductance of the junction. The corresponding contribution H J to the effective Hamiltonian is found using the relation and we recover the Josephson term of the standard transmon model, with g = G/(e 2 /h) the dimensionless conductance. However, in general, Eq. (S1) contains higher harmonics, and the condition T n ≪ 1 is not always satisfied for all transmission channels n. To obtain an expression for higher harmonic corrections, we integrate Eq. (S1) and use Eq. (S3) to get the Josephson part of the Hamiltonian Since H J is symmetric and 2π periodic in φ, it can be written as a Fourier cosine series  Figure S1. Bound for the largest transparency Tmax contributing to the conduction channels. For a given ratio |EJm/EJ| for a m = 2, b m = 3, c m = 4, d m = 5, the line indicates a lower bound for the largest transparency, of which at least one conduction channel must contribute. The area indicates all values |EJm/EJ| that are possible given Tmax. Note the change between linear and logarithmic scale at Tmax = 0.1 for the left two panels. An upper limit on the bound is given by the quantum point contact limit (see Section I B 1) which results in the flat part of the curve for Tmax close to 1.
To derive an analytic expression for the E Jm , we apply the relations for m ≥ 1 to Eq. (S5). We obtain with (a) k = a(a + 1)(a + 2) · · · (a + k − 1) denoting the Pochhammer symbol. Thus we find an expression for the higher-order contributions to the effective Josephson Hamiltonian, (S11) Note that, for small transparencies T n ≪ 1 where it is sufficient to consider only the leading-order term of 2 F 1 , we recover E J1 = E J = ∆g/8 and the sinusoidal current-phase relation (in this section, we identify E J1 ≡ E J ). We can derive a bound for the largest possible ratio |E Jm /E J | when all conduction channels have a transparency T n ≤ T max , independent of the particular distribution of transparencies {T n }. Using the monotonicity of 2 F 1 , we have in the numerator 2 F 1 (· · · ; T n ) ≤ 2 F 1 (· · · ; T max ) for all n, and in the denominator 2 F 1 (· · · ; T n ) ≥ 1 for all n. For the remaining quotient of sums over n, we use n T m n / n T n = T m−1 The bound is shown in Fig. S1 for the leading ratios. It yields, for instance, the following statement: Given some observation for |E J2 /E J | ≈ 2.40 %, there must at least be one conduction channel with a transparency T max ≥ 0.298. We remark that the estimation of T max in this way is based on the general current-phase relation hypothesis Eq. (S1).
be T max ≥ 0.096. We note that high-transparency conduction channels with energy as in Eq. (S5) have been observed experimentally, for example in break junctions (see [92], in which the authors identify a channel with T = 0.99217).
In general, the current-phase relation as a function of the Josephson energies E Jm reads The analysis presented in Fig. 3c of the main text shows that the corrections to the sinusoidal current-phase relation are on the level of several percent for all experiments under consideration. For given E Jm , the critical current I c can be obtained by finding the maximum of Eq. (S13) as a function of φ (we show three limiting cases of I S (φ)/I c for each sample in the insets of Fig. S7 below; values for I c are given in Table S3). We can use the result for E Jm in Eq. (S11) to obtain a closed-form expression for the Fourier coefficients c m (T n ) of the Josephson supercurrent discussed in the main text, yielding The first eight Fourier coefficients are shown in Fig. S2a. For the first three of them, we give here their series expansion for small transparencies, We note that successive coefficients are always alternating in sign, and for the leading terms we have so the contribution of higher harmonics becomes more relevant for larger transparencies. Furthermore, the prefactor in this ratio is of order unity and converges to −1/4 for large m. The properties of c m (T ) propagate to the Josephson energies, for which we have Thus we expect that the Josephson energies E Jm for a physical model alternate in sign and decrease in magnitude for increasing order m.

B. Transparency distributions
For a large number of channels N , summing over the channels corresponds to averaging over a distribution of transparencies ρ(T ), The ratio E Jm /E J can then be expressed as If the transparency distribution ρ(T ) is not narrowly peaked around a small average value or contains significant weight for higher transparencies, it is typically not justified to neglect higher harmonics. In this section, we look at several example distributions for which this is the case.

Single channel
A very elementary case is a single conduction channel with arbitrary transparency T 1 . In this case, the formal distribution would be ρ(T ) = δ(T − T 1 ), but the solution can directly be obtained from Eq. (S11) for N = 1, For the ratio of Josephson energies, we find accordingly We consider two important limits for |E Jm /E J |, namely the fully open quantum point contact with T 1 = 1 as an upper limit, and a homogeneous barrier of very small transparency T 1 ≪ 1 as a lower limit. For the point contact, setting This means that asymptotically, the ratios |E Jm /E J | should not scale weaker than 1/m 2 . Furthermore, in the case of a point contact, Eq. (S21) yields an upper limit for the first Josephson energy: For m = 1, we have E J ≈ 0.4244∆. Assuming an energy gap of ∆/h = 50 GHz (see [94]), we obtain E J /h ≈ 21.22 GHz, which is of the same order as the values of E J given in Table S3 below. We remark that the point contact T 1 = 1 also corresponds to the KO-2 model by Kulik and Omelyanchuk [95], which is the clean, fully ballistic counterpart [30] of the KO-1 model [96] discussed below.
For the lower limit with T 1 ≪ 1, taking the leading-order term in Eq. (S22) yields The binomial coefficient is bounded by and Stirling's approximation yields for large m Applying this approximation to the expression in Eq. (S24) yields

Small transparencies
Another simple but probably only pedagogical case is a uniform distribution of transparencies below some cutoff value T 0 . We model this case by ρ(T ) = ν Θ(T 0 − T ), where Θ denotes the step function and ν is the normalization. We then find and for the ratio If the cutoff value is very small, i.e., all transparencies T ≤ T 0 ≪ 1 are very low and it is sufficient to consider only the lowest-order term, we find Note that this expression is similar to the case of a homogeneous barrier in Eq. (S24), but the exponent of T 0 is larger by one.

Universal distributions from the literature
There are regimes in which the transparency distribution ρ(T ) is expected to take a universal form, independent of the details of the junction (see e.g. [30,93,97,98] and references therein): where n p is a normalization factor, and p = 1/2, 1, 3/2 enumerates three distinct cases which we discuss in the following (the corresponding distributions are shown in Fig. S2b). As we show in this work, typical qubit tunnel junctions are not in any of the universal regimes, so their microscopic and mesoscopic properties matter. The case p = 1/2 represents a chaotic dot connected via identical ballistic point contacts to two superconducting leads [99]. The case p = 1, also known as the KO-1 model by Kulik and Omelyanchuk [96], represents a diffusive quasi-1D wire of length L longer than the Fermi wavelength, L ≫ λ F , but still in the short-junction regime L ≪ √ ξ 0 l,   where ξ 0 is the superconducting coherence length and l the electron mean free path with l ≪ ξ 0 ; the wire transverse size is ≪ L [30]. It has been implemented with aluminum nanobridges [100][101][102] and has been proposed to describe transmons based on superconductor-constriction-superconductor (ScS) junctions [103]. The case p = 3/2 represents a disordered interface (L ≪ λ F ) [93] and it could be relevant to semiconductor quantum well junctions [104].
Evaluating Eq. (S19) with the distribution Eq. (S31) yields the higher-order Josephson energies . (S33) The absolute ratios |E Jm /E J | up to m = 25 are shown in Fig. S3a. Interestingly, in all cases the contribution of the second Josephson energy is approximately 10 % (we have E J2 /E J ≈ −11 %, −10 %, −7.5 % for p = 1/2, 1, 3/2). The increase in absolute value of the ratio with decreasing p is due to the relatively lower weight given to low transparencies in the respective transparency distributions (see upper limits in Fig. S2b). Table S1. Parameters for AlOx barriers taken from the indicated references. mi/m is the effective electron-mass ratio, ϕ is the average height of the potential barrier, and a and d0 are parameters for the tunnel-transmission probability in Eq. (S35).

Mesoscopic model of barrier inhomogeneity
In this section, we derive a transparency distribution ρ(T ) from a model of an inhomogeneous tunnel barrier. To this end, we describe the charge transport through the insulating barrier of a JJ in terms of tunneling processes through a potential barrier with a non-uniform thickness distribution ρ(d). We expect the result to be applicable to "regular" barriers, in the sense that the thickness d of the barrier between the leads can be described by a Gaussian distribution with average thicknessd and standard deviation σ (cf. Fig. 3c of the main text and the STEM images and the molecular dynamics simulations in Section IV).
Tunneling through AlO x barriers has previously been studied using a trapezoidal barrier model [7,105]. Here we use the simpler rectangular barrier model with height given by the average height of the trapezoidal barrier. This is a good approximation up to corrections quadratic in the ratio of the difference over the average height ϕ (as measured from the Fermi energy E F ). For a rectangular barrier of thickness d, the textbook result for the transmission probability T is [106] with and where we have assumed tunneling electrons to have the Fermi energy. For aluminum, within the free electron model, m is the electron mass and E F = 11.67 eV. The parameter m i denotes the effective mass in the band of the insulator closest to the energy of the tunneling electrons; m i enters Eq. (S36) as a prefactor for ϕ, so only the product of the two quantities is relevant. From the literature, see Table S1, we estimate a 2 = 2.87 and d 0 = 0.21 nm (we caution the reader that the parameters can depend on how the oxide is interfaced with the aluminum layer [107]). Since measurements of the barrier thickness d have been well described by a Gaussian distribution [8], we consider the distribution where Θ is the step function and Erf is the error function. The prefactor of the Gaussian ensures normalization and that d ≥ 0. Inverting Eq. (S35) to get the thickness d as a function of the transparency T , one can then find the probability density for the transparency, whered =d/d 0 ,σ = σ/d 0 , and For given (d,σ), the Josephson harmonics E Jm can be obtained by evaluating Eq. (S20) with the distribution ρ(T ) of Eq. (S38). Examples for the absolute ratios |E Jm /E J | for the samples listed in Table S2 are shown in Fig. S3a. Results for the KIT, ENS, and Köln experiments are shown in Fig. S4 and Table S3 below. Note that in this model a non-negligible ratio σ/d indicates that regions of smaller thickness and thus high-transparency conduction channels are likely, which is not completely unexpected given the inhomogeneity of AlO x barriers (see the STEM images in Section IV). For a ≫ 1 in Eq. (S35), at leading order the distribution in Eq. (S38) reduces for T ≫ 1/(1 + a 2 ) to that in Eq. (S31) with p = 3/2. However, it takes log-normal form for T ≪ 1/(1 + a 2 ), indicating that the channels with low transparencies are responsible for a deviation from universality; for Al/AlO x /Al, a 2 is not very large, so one should not expect to observe any universal behavior.
We can now investigate when a junction can be considered a "good" tunnel junction, meaning not only ⟨T ⟩ ≪ 1 but also ⟨T 2 ⟩/⟨T ⟩ ≪ 1. The latter is a necessary condition to be able to neglect higher harmonics, although one should also check what happens for higher moments; the second condition is not satisfied for the universal distributions. For the transparency distribution ρ(T ) in Eq. (S38), we have approximately where Erfc = 1 − Erf is the complementary error function. For a sharp barrier, σ ≪ d d 0 , this reduces to ⟨T n ⟩ ≃ (4/a 2 ) n e −2nd/d0 (assuming n not too large, so that the Erfc factors can be negligible), and a junction withd ≫ d 0 would indeed behave as a "good" tunnel junction. For AlO x barriers, since 4/a 2 is of order unity, the condition ⟨T ⟩ ≪ 1 note that since these ratios are in the argument of the exponential, the large inequality sign "≫" can mean a factor of 2 or 3, not orders of magnitude). Based on direct observations on three samples [8], the latter condition is not met, which implies that E J2 /E J can be non-negligible, see Table S2. It should be noted that barriers of similar thickness and smaller standard deviation can be fabricated if proper care is taken [43]; those junctions haved > 4σ 2 , see Table S2. They also have ⟨T ⟩ ≪ 1 and ⟨T 2 ⟩/⟨T ⟩ ≪ 1, but ⟨T 2 ⟩/⟨T ⟩ > ⟨T ⟩, meaning that significant fluctuations in transparency are nonetheless present.
As an additional check of our approach, we note that the distribution of transparencies could explain the observed magnitude of the subgap current in SIS and SIN junctions, see e.g. [110] and references therein. For a narrow distribution of transparencies, the ratio G sg /G N between subgap conductance and normal-state conductance should be of the order of the average transparency, since G sg ∝ T 2 and G N ∝ T . For the data in Fig. 2 of [110], we can convert the ratio G N /A into average transparencies by multiplying it by λ 2 F /G Q ≈ 1.6 × 10 −9 Ω mm 2 ; this means that for the Al-AlO x -Al junctions, ⟨T ⟩ varies roughly between 7 × 10 −7 and 10 −4 . Over that range, the ratio G sg /G N goes from about 3 × 10 −5 to 5 × 10 −3 . At the lower end of this range, the pair of numbers is similar to that of sample Fritz3 in Table S2. Assuming for simplicity G sg /G N = ⟨T 2 ⟩/⟨T ⟩, we can match the observation by takingd/d 0 = 8.19 and σ/d 0 = 0.97. Then decreasingd while keeping constant σ as to get ⟨T ⟩ ∼ 10 −4 , we get ⟨T 2 ⟩/⟨T ⟩ ∼ 4 × 10 −3 , in fair agreement with the upper end of the observed range (keeping the standard deviation σ constant is consistent with the finding of Ref. [43] that it is correlated with variations in thickness of the bottom aluminum layer, whose fabrication was presumably not changed from junction to junction in Ref. [110]).
Our mesoscopic model can describe the harmonics in the Josephson junctions of KIT, ENS and Köln very well (see Fig. 3b in the main text). For the IBM junctions, the Josephson harmonics are much larger and stay relevant until high order (cf. Fig. S5 below). They are close to the point contact limit, which asymptotically scales as 1/m 2 . The mesoscopic model, however, shows a stronger decay (see Fig. S3a) that cannot reproduce the large contributions required up to E J6 and thus cannot describe the observed spectra very well. A physical reason for this may lie in the fact that the assumption of a Gaussian thickness distribution with only two parameters is too simple to model the corresponding barriers. We note that for the samples for which the model works well, the fit mostly constrains the ratiod/σ rather than their absolute values.

C. Josephson harmonics
In this section, we test several models motivated by the discussion of transparency distributions in the previous section. Additionally, we apply these models to previously published transmon spectra that can be found in the Table S2. Values for the average thicknessd and the standard deviation σ of AlOx barriers in tunnel junctions found in the literature. Shown are the first three samples reported in Ref. [8] (Zeng) and Ref. [43] (Fritz), respectively. Columns 4-7 contain the corresponding unitless quantities,d =d/d0 andσ = σ/d0, in terms of d0 = 0.21 nm [107]. The quantities in columns 8-10 contain the moments ⟨T ⟩ and ⟨T 2 ⟩/⟨T ⟩ as well as the ratio EJ2/EJ, which have been computed using Eq. (S38). literature. Subsequently, we report the scan of the full range of Josephson harmonics for all samples considered in the main text, to distill the maximum information about the Josephson harmonics from the available experimental data.
Finally we discuss properties of the Josephson harmonics model such as the charge dispersion, the Hamiltonian, the Josephson potential and current-phase relations, a perturbative expansion, and how engineering Josephson harmonics can both reduce the charge dispersion and increase the anharmonicity.

Effective parameterizations
The results for E Jm for the transparency distributions ρ(T ) considered in the previous section suggest that a suitable parameterization of E Jm /E J should be (i) alternating in sign, (ii) decreasing in absolute magnitude, and (iii) should allow for both exponential and power-law decay. A suitable parameterization for m ≥ 2 that also includes the limits given by Eqs. (S23) and (S27) is where (a, b, c, d) are the model parameters to be determined. The point contact given by Eq. (S23) corresponds to and the lower bound of a homogeneous barrier with small transparency given by Eq. (S27) corresponds to Evidently, this choice captures the most prominent features, namely the power-law scaling of the free quantum point contact as well as an asymptotic exponential scaling towards zero as present in the lower limit of a homogeneous barrier with small transparency. In terms of fitting, we note that a data set might be describable by various combinations of a and b that differ by orders of magnitude. The reason is that, while the model is inspired by the asymptotic scaling as a function of m, for an experiment only the first E Jm coefficients may be relevant, and e.g. the same E J2 can be obtained by various combinations of a and b.
We emphasize in this context that, although one might want to ascribe some meaning to the model parameters (e.g. a and b), they are internal, unobservable variables (in the sense of the inverse problem, cf. [61,81]). There can be Table S3. Model parameters of all models presented in Fig. S4. The ratios EJm/EJ and the parameters (a, b, c, d) of the models discussed in Section I C 1 are unitless,d and σ of the mesoscopic model discussed in Section I B 4 are given in nanometers, Ic is given in nanoamperes, and all other parameters are given in gigahertz. Gray entries denote derived model properties that are computed from the model parameters. Note that for the KIT results, the bare resonator frequencies Ω agree very well with the high-power cavity frequency given in Figs. S19a and S21a below, see also [54]). In the second-last row, we show results for the Köln sample (note that EJ is not fixed but tuned according to the magnetic field bias). In the last row, we show representative results for qubit 0 of the IBM Hanoi device (additional data is available in the repository [86]). Note that for the (a, b) model for IBM Q0, although the spectrum is reproduced by definition, the value of EC comes out much larger than the reference value of 300 MHz. The last line of the last row contains the parameters of an engineered model with reduced charge dispersion and increased anharmonicity (see Fig. S8 below). All parameters have been obtained by solving the HamPIEP (see Section II A 2).

Sample
Model   Table S3.
well as for the mesoscopic (d, σ) model described in Section I B 4. The values of all model parameters are given in Table S3. Note that all parameterized models have physical ratios E Jm /E J (i.e., alternating, decaying, and within the two limits). This is not automatically the case for the truncated E J2 and E J4 models for which the ratios are fitted independently (e.g. the KIT-CD2 case in Table S3).
As these results show, none of the models should be considered as the ultimate answer for the effects caused by the higher harmonics. It is an important task for future research to increase the accuracy in measuring fine-structure effects on the experimental side, and to find and study other alternative models and transparency distributions on the theoretical side, which could shed further light on understanding the complex structure of the Josephson effect in tunnel junctions.

Additional evidence
We find further support for our Josephson harmonics hypothesis by analyzing experiments with published spectroscopy results available in the literature, namely the experiments by Peterer et. al. [65] (MIT), Schneider et. al. [111] (KIT2; additional information is available in [112]), and Xie et. al. [66] (the spectroscopy is shown in Fig. 2.20 of [67]). The results are shown in Fig. S4. In this figure, we also show evidence from the differences between predicted and measured resonator frequencies f res,j when the transmon is in state j (computed from the dispersive shifts for j ≥ 1), as an addition to the evidence from the transmon transition frequencies shown in the main text (see Fig. 3). We see that a model including the higher-order Josephson harmonics can describe the data better than the standard transmon model. The only exception is KIT2, which is either an example for a highly uniform tunnel junction with very low transparencies or the data reported may not be accurate enough to reveal percent-level corrections in the Josephson harmonics.

Exhaustive scan
In this section, we give some additional details about the scanning procedure described in the Methods section. Furthermore, we show in Fig. S5 the detailed scanning results for all experiments considered in the main text, including the remaining transmons of the IBM Hanoi chip (cf. Fig. 3c in the main text). Additionally, we show three specific trajectories corresponding to the minimum, maximum, and geometric mean of |E J2 /E J |.
Interestingly, when constraining E J5 and E J6 of some transmons to zero (see. e.g. IBM Q2), fitting only E J2 to E J4 yields unphysical results above the point-contact limit. These transmons have a very narrow range of allowed values for E Jm , even for high orders such as m = 6. Furthermore, we see that when the bars for |E JN f +1 /E J | do not extend down towards zero, truncation at E JN f is not possible to describe the spectrum. This is the case for the first two cooldowns of the KIT system (note however that it is indeed possible for KIT-CD3).
We note that by design, this procedure does not include non-perfect agreement with the experimental data. Taking into account experimental imprecision etc. would be a complementary study and it is to be expected that the allowed ranges for the E Jm /E J become slightly larger. Note, however, that the ranges can also be reduced by including additional information such as charge-dispersion measurements, as we discuss in the following section.  Köln IBM0  IBM1 IBM2  IBM3 IBM4  IBM5 IBM6  IBM7 IBM8  IBM9  IBM11  IBM13  IBM14  IBM16  IBM17  IBM18  IBM20  IBM22  IBM24  IBM26  1 IBM0 IBM1  IBM2 IBM3  IBM4 IBM5  IBM6 IBM7  IBM8 IBM9  IBM11  IBM13  IBM14  IBM16  IBM17  IBM18  IBM20  IBM22 IBM24 IBM26 If additional measurements δf experiment 0j of the charge dispersion are available, this information can also be used to reduce the uncertainty of suitable Josephson harmonics models. One would then filter out, from all suitable trajectories e = (1, E J2 /E J1 , . . . , E JN f +4 /E J1 ), only those trajectories that agree with the measured charge dispersions. For the ENS sample, for instance, one can extract from Fig. 6b(6) of [54] that the charge dispersion of level j = 6 is on the order of a few MHz. This is incompatible with the standard model (see the yellow "6" in Fig. S6) and would restrict the parameter range in the harmonics model. The charge dispersion is thus a sensitive probe for deviations from the standard model (cf. also Fig. 4 in the main text for the Köln experiment).
We note that both the standard model and the harmonics model give an exponential scaling of the charge dispersion with E J /E C [38]. For this reason, it is always possible to use the two parameters (E C , E J ) of the standard model to fit either (i) f 01 and f 02 or (ii) f 01 and δf 01 . The latter case has been successfully demonstrated for qubit 2 in [113]. For the Köln sample, one could analogously increase E C (and decrease E J ) such that the gray dashed lines in Fig. 4a rise towards the measured data. However, in this case, the predictions for the higher level transitions f 02 , f 03 , . . . would be even further away from the measured frequencies.

Hamiltonian
In this section, we state the transmon-resonator Hamiltonian including the higher-order Josephson harmonics and discuss some properties of this Hamiltonian. The Hamiltonian of the transmon-resonator system is given by where, in the charge basis {|n⟩}, the operator n = n n |n⟩⟨n| is diagonal and the operator cos(mφ) = n 1/2 (|n⟩⟨n + m| + |n + m⟩⟨n|) has constant entries on the m th subdiagonal (which represent correlated m-Cooper pair tunneling), a † a is the number operator of the harmonic oscillator with a = k √ k + 1 |k⟩⟨k + 1| the bosonic annihilation operator given in the harmonic oscillator's eigenbasis. E C denotes the transmon's charging energy, E Jm the m th -harmonic Josephson energy, Ω is the resonator frequency and G denotes the transmon-resonator coupling strength.
The energy levels of H corresponding to the dressed transmon states |0j⟩ for j = 0, 1, 2, . . . together with the φ-dependent Josephson potentials and the current-phase relations I S (φ) are shown in Fig. S7 for all samples. Note that the energy levels of all harmonics models are equal to all measured transitions and only differ for even higher levels. Depending on the height of the Josephson potentials, they can be either above or below the standard model. We can compute the height of the potential (a.k.a. the depth of the potential well) as Here we see that, compared to the height of the standard model 2E std J , a large contribution of odd higher harmonics (which are all positive) can increase the height of the potential significantly. This helps to understand why in Fig. 3a, the IBM transmons are the only devices for which most standard model predictions are lower than the measured transitions: Since the IBM transmons have the largest contributions from odd higher harmonics (cf. Fig. S5), the Josephson potentials are much deeper than expected from the standard model. Therefore, the energy levels of j = 4 (and often also j = 3) lie much higher than the standard model suggests (see Fig. S7.) Furthermore, the increase of the height of the IBM potentials due to strong odd harmonics in all trajectories explains the decrease of the charge dispersions, while for all other devices the dominating E J2 term (cf. Fig. S5) effectively flattens the potentials such that the charge dispersion increases (cf. Fig. S6).

Perturbative expansion
To obtain the leading-order Josephson harmonics corrections to the qubit frequency ω = 2πf 01 and the anharmonicity α = 2π(f 12 − f 01 ), we expand the cos(mφ) terms in φ. We emphasize that this step, which is frequently done to obtain an anharmonic "Duffing" oscillator approximation of the transmon (see e.g. [38,49,116,117]), is not suitable for an accurate study of higher transmon states (and neither the charge dispersion). We also note that this approximation is not applicable to all of the expressions for E Jm obtained for certain transparency distributions in Section I B, as the double series  Table S3). We note that the more extreme deviations from the sinusoidal current-phase relations (darker lines) might be easily detectable in DC measurements [69,114,115]. The results for the Köln transmon correspond to the same dataset considered in Fig. 3 of the main text.
may not converge absolutely (as in the case of e.g. the point contact Eq. (S23)) and rearranging the terms as well as neglecting higher orders may lead to inconsistent results. The second-order expression of the Josephson contribution H J = − m E Jm cos(mφ) to H is given by Equation (S50) implies that the second-order correction to the standard model corresponds to the substitution rule Considering terms up to fourth order in φ (see [116]), we find the analogous correction to the often-used approximation α ≈ −E C (using units with ℏ = 1), Similarly, we obtain for the qubit frequency, Considering the expansion up to E J2 only yields for which we can make the following observation: Since E J2 < 0 < E J (see Eq. (S18)), it follows that 1 + 16E J2 /E J < 1 + 4E J2 /E J < 1 and thus the fraction in Eq. (S55) is smaller than 1. This implies that, for the same measured values of α and ω, the value of E C resulting from Eq. (S55) has to be larger than the one obtained from the analogous standard model relation α ≈ −E C . Similarly, the value of E J resulting from Eq. (S54) has to be smaller than the one obtained from the analogous standard model relation ω ≈ √ 8E C E J − α. Hence, it is reasonable to expect that the ratio E J /E C is too large when using the standard model. This expectation is confirmed by several of the observed shifts of E J /E C presented in Fig. 4c in the main text.
We caution the reader, however, that this argument relies on crude approximations and does not apply in general. For instance, if the ratios of Josephson energies do not decay fast enough and are relevant to high order, the value of E J /E C computed from the standard model can actually be too small rather than too large (and the charge dispersion may be overestimated by the standard model). This is supported by the IBM data shown in Fig. 4c in the main text.

Engineering EJm coefficients
As discussed in the main text (see Fig. 4b), the IBM transmons give evidence that one can use strong contribution from higher harmonics E J2 , E J3 , E J4 (cf. Fig. S5) to engineer devices with a reduced charge dispersion δf while keeping the anharmonicity |α| = 2π|f 12 − f 01 | constant. A natural question is now whether one can also increase the anharmonicity (e.g. to mitigate leakage [118][119][120]) by further engineering the E Jm contributions. We note that in practice it is not straightforward to tune the values of coefficients (E J2 , E J3 , E J4 ) to arbitrary values. To achieve this, one probably requires a combination of various strategies, such as shaping the channels' transparencies, adding inductive elements in series, flux bias, etc. (see also [68]).
In Fig. S8a, we show that there exist sets of E Jm coefficients which give a reduced charge dispersion and increased anharmonicity (yellow arrow). The corresponding Josephson energy ratios are given in Fig. S8b. We see that the main reason for the increased anharmonicity is a reduction of E J4 , while E J2 and E J3 stay roughly the same. This makes sense as E J3 must stay large to maintain the height of the potential for the reduced charge dispersion (see Eq. (S48)). The reduced E J4 then makes the potential slightly wider around level |2⟩, which draws the level towards the bottom of the potential well and thus increases the anharmonicity (see Fig. S8c). We emphasize that whether it is possible or not to engineer such a set of E Jm coefficients in a real device is currently an open question. , the EJ4 harmonics model (green cross), and an engineered EJ4 model with increased anharmonicity (yellow diamond). The model parameters for the latter were chosen such that the qubit frequency f01, the resonator frequency fres,0, and the charge dispersion δf01 stay constant (they are listed in the last row of Table S3). b The corresponding Josephson energy ratios |EJm/EJ1|, using the same coloring. c The corresponding potentials and current-phase relations (cf. Fig. S7), using the same coloring.

D. Alternative corrections
After observing that the standard transmon model cannot describe the measured transition frequencies, it might seem natural to consider alternative possible modifications to the model, such as an additional stray inductance present in the circuit, hidden electromagnetic modes, the coupling to other qubits as present on the IBM device, or an asymmetry in the superconducting electrodes. Although the theoretical discussion given above suggests that higher harmonics are the expected correction to describe the measurements, from a phenomenological perspective, such alternative modifications should not be ruled out a priori.
For this reason, we here discuss alternative corrections to the model, and why we have come to the conclusion that these alternatives do not provide the same universal quality to solve the disagreement between standard model and experiment for the considered samples. It would be an interesting direction of research to study how the inclusion of Josephson harmonics would influence advanced circuit quantization and measurement techniques [49][50][51][52][53][121][122][123][124]].

Series inductance
In this section, we show that a small linear stray inductance L in series with the JJ also induces higher harmonic corrections. In a transmon qubit, such a linear stray inductance can arise from the leads that connect the Josephson junction to the shunt capacitance. In an equivalent circuit diagram, this inductance would appear in series with the Josephson junction as shown in Fig. S9a.
To get a feeling for the size of L, we consider the KIT system as an example (cf . Table S3 and Section III A). From electromagnetic simulation (see Fig. S18e), we obtain a value for a linear stray inductance of L ≈ 0.380 nH. In terms of energies, using the relation E L = (Φ 0 /2π) 2 /L with the magnetic flux quantum Φ 0 , we have E L /h ≈ 430 GHz, so E J /E L ≈ 0.058. The characteristic energy ratio E J /E L is also called the screening parameter in SQUID terminology [125].
The circuit in Fig. S9a is described by the Lagrangian where Φ J (Φ L ) is the flux across the junction (inductance). Furthermore, we define φ J ≡ 2πΦ J /Φ 0 , and we implicitly assume this relation between all lower-case (φ) and upper-case (Φ) flux variables in what follows. We perform a change of variables φ = φ J + φ L and φ ∆ = φ J − φ L . Then, asφ ∆ does not occur in the Lagrangian, we have d/dt (∂L/∂φ ∆ ) = 0 and thus ∂L/∂φ ∆ = 0. This yields an equation that we can use to eliminate φ ∆ , which is justified if E L /E C ≫ 1 and the stray capacitance in parallel to the JJ is much smaller than C (roughly 1 fF vs. 80 fF for the a b  from |0⟩ into a higher transmon state |j⟩ for the KIT experiment. Different colors represent different values of EJ/EL, corresponding to a different series inductance L (see legend). In particular, the result for the KIT inductance extracted from electromagnetic simulation, L = 0.380 nH (cf. Fig. S18) corresponding to EJ/EL ≈ 0.058, would fall between the blue stars and the red crosses; including the contribution from kinetic inductance would increase L to about 0.5 nH (cf. Section III A), corresponding approximately to the upward pointing green triangles. Gray circles denote the standard transmon model, corresponding to the limit L = 0. For each case, the given EJ/EL only fixes the ratios EJm/EL (cf . Table S4), and the four standard model parameters (EC, EJ, Ω, G) are adjusted slightly such that the first two transmon transition frequencies and the first two resonator frequencies match the observed values, in order to make the cases easily comparable. Dashed lines are guides to the eye.
KIT sample) such that the Born-Oppenheimer approximation is applicable (see [123,126] for more information). The equation reads and is related to Kepler's transcendental equation [126][127][128]. For 0 < E J < E L , Eq. (S57) has a unique solution for φ ∆ , defined by the intersection of a sine function and a straight line (see Fig. S9b). Although we cannot solve for φ ∆ analytically, Eq. (S57) uniquely determines φ ∆ as a function of φ, and we write φ ∆ (φ) to denote this solution. It is shown in Fig. S9c for several values of E J /E L . The function φ ∆ (φ) has two symmetry properties that can be extracted from Eq. (S57). The first is that it is odd, which can be shown by replacing φ → −φ in Eq. (S57). Using the symmetry of the sine function, we obtain the same equation with φ ∆ → −φ ∆ , and since Eq. (S57) is the uniquely defining equation for φ ∆ (φ), we have φ ∆ (−φ) = −φ ∆ (φ). The second symmetry property of φ ∆ (φ) is a 2π translation invariance, which can be shown in the same manner, yielding φ ∆ (φ + 2π) = φ ∆ (φ) + 2π. Given φ ∆ (φ), we perform the Legendre transformation of Eq. (S56) to obtain the Hamiltonian Note that the series inductance does not break the 2π periodicity of the Hamiltonian, despite the occurrence of φ 2 . This is in contrast to a shunt inductance, for which the flux φ and the charge n would need to be treated as non-compact operators with continuous spectrum R, see [129][130][131][132]. For a series inductance, though, due to the two symmetry properties of φ ∆ (φ), the two φ-dependent terms of Eq. (S58) are even and 2π-periodic. Therefore, we can write the two φ-dependent terms of Eq. (S58) as Fourier cosine series, The coefficients c m and s m can be obtained numerically by (i) solving Eq. (S57) for φ ∆ (φ k ), where φ k = π(k +1/2)/K with k = 0, . . . , K − 1 and K ≫ 1 controls the accuracy, and (ii) using the discrete cosine transform (DCT). To see this, we note that any even 2π-periodic function g(φ) can be written as a Fourier cosine series g(φ) = m g m cos(mφ), where for m ≥ 1 and DCT m (g) = 2 k g(φ k ) cos(mφ k ) denotes the type-II DCT, and g 0 = 1/π π 0 g(φ) dφ ≈ K−1 k=0 g(φ k ). Neglecting the constants c 0 and s 0 , we obtain the Hamiltonian with the higher harmonic contributions given by The ratios E Jm /E J are shown in Table S4 for several values of E J /E L . Note that here, we typically have E J1 ̸ = E J , since E J characterizes the JJ while E J1 includes the series inductance as a separate circuit element. We remark that the ratios alternate in sign and decay in magnitude with increasing order m, similar to the Josephson harmonics arising from the conduction channel transparencies (cf. Section I B). It is possible to obtain closed-form approximations for the leading-order ratios E Jm /E J by expanding the small quantity in Eq. (S57), (φ − φ ∆ (φ))/2, in powers of E J /E L . To this end, we rewrite Eq. (S57) as As Table S4 shows, these expressions are a good approximation to the numerically exact result for the leading-order harmonics if the series inductance and thus E J /E L is not too large. Finally, to see whether the higher harmonics from the linear stray inductance can explain the deviations between computed and measured spectra (see Fig. 2a in the main text), we perform the same test when using H in Eq. (S62) as a model. The result is shown in Fig. S10 for the KIT system. However, the correction corresponding to L = 0.380 nH (L = 0.5 nH) as extracted from electromagnetic simulation, between the blue stars and the red crosses (including kinetic inductance effects, green triangles) shows a similar deviation from the experiment as the standard transmon model. Also, when the inductance L is increased so much that the next transmon transition 0 → 3 matches the experiment (blue diamonds), the deviation for higher transmon transitions systematically bends into the other direction. Thus we conclude that, although the linear stray inductance does result in an appreciable correction (and it is probably part of the ranges shown in Fig. 3c in the main text and Fig. S5), it is not the main solution to revise the standard transmon model. We note that it would be interesting to work out the consequences of a series inductance on circuits with multiple JJs.

Additional hidden modes
The Hamiltonian in Eq. (S46) includes only a single-mode resonator contribution, namely the readout resonator with bare frequency Ω and coupling strength G. However, the electromagnetic environment of the device may contain additional modes. A valid hypothesis for a correction to the Hamiltonian is therefore the existence of spurious, hidden electromagnetic modes (so-called "dark" modes) that couple weakly to the transmon.
For this reason, we consider the addition of a second hidden mode to the Hamiltonian, where b † (b) are the hidden mode's bosonic creation (annihilation) operators, Ω dark is the frequency, G dark is the coupling strength to the transmon, and H is given by Eq. (3). We consider a coupling G dark /h = 0.009 GHz that is ten times weaker than the coupling G between the transmon and the resonator in the standard model (see Table S3).
We study the KIT system as an example. The effect of additional modes is shown in Fig. S11 as a function of Ω dark . Around certain frequencies of Ω dark , the transmon transition frequencies can deviate a lot from the predicted frequency when no hidden modes are considered. Thus we see that additional modes can indeed have a strong effect on the spectrum, despite the weak coupling strength. However, Fig. S11 also shows that there is no single frequency Ω dark that can bring all model frequencies (solid lines) to the experimental values (dotted lines).
In Fig. S12, we show the difference between model and experiment in the same way as in Fig. 3 of the main text, for the particular dark mode frequencies indicated in Fig. S11. These results show that additional hidden modes do not correct for the systematic deviation of the higher states, i.e., the curvature in the differences between the standard model prediction and the experimental values is still there if hidden modes are included.
Finally, we remark that no evidence for spurious hidden modes has been found in a 2D coplanar waveguide system experiment [133]. Also for the KIT sample, we have not seen any evidence for such dark modes from T 2 measurements (data not shown). Therefore, we do not consider the addition of spurious modes as the solution to the mismatch between the standard model and the experimental data.  from |0⟩ into a higher transmon state |j⟩ for the KIT experiment. Gray circles denote the standard model from the main text without any dark modes. Different colors represent different values of Ω dark (see Fig. S11 and legend). For each case, the four standard model parameters (EC, EJ, Ω, G) are adjusted slightly such that the first two transmon transition frequencies and the first two resonator frequencies match the observed values, in order to make the cases easily comparable. Dashed lines are guides to the eye.

Multi-qubit coupling
Since the IBM Hanoi device is a multi-qubit device, an obvious question is whether the coupling between the qubits on the chip has an effect on the transmon transition frequencies, and whether this effect may provide an alternative path to rescue the standard transmon model. Here we show that for the IBM Hanoi device, this effect is much too small to significantly shift the transmon frequencies.
The transmons on the IBM Hanoi device are coupled by very short coplanar waveguide resonators (cf. [134]). Due to the small size of the resonators, their frequencies are much larger than the transmon frequencies, so the coupling can be treated as capacitive transmon-transmon coupling (cf. [135]). The coupled Hamiltonian takes the form where the first term represents the standard model Hamiltonians of each transmon i including its readout resonator, and J is the capacitive coupling strength between all connected transmon pairs ⟨i, j⟩. The particular connectivity of the IBM Hanoi device is shown in Fig. S13a.
To examine whether the coupling can explain the deviations in the spectra shown in Fig. 3a of the main text, we diagonalize H for the joint system of up to three neighboring transmons and their respective readout resonators. In Fig. S13b-e, we show the absolute difference |f coupled 0j − f 0j | between the coupled and the uncoupled systems as a function of the transmon-transmon coupling strength J/h. Shown are only the qubits that could be measured up to level |4⟩ and that are also coupled to a qubit that could be measured. We scan the coupling strength J/h from 0 to 50 MHz. Most experiments have J/h ≲ 5 MHz although experiments beyond J/h ≈ 50 MHz are possible (see [135]). At this end of the scale, the coupling can have a significant impact on the spectrum, in the sense that deviations go up to 100 MHz (see Fig. S13b). However, the transmon qubits of the IBM Hanoi device are around J/h ≈ 2 MHz by design (see the arrows in Fig. S13b-e). For values of J/h around 2 MHz, the effect of the coupling on the spectrum is very small and cannot correct for the observed deviations from the experimental data. For this reason, we conclude that also the coupling to the other qubits cannot explain the failure of the standard transmon model shown in Fig. 3a of the main text.

Asymmetry in the superconducting gaps
In tunnel junctions with aluminum leads, the dependence of the superconducting gap on film thickness can cause a slight difference in the gaps of the two electrodes, ∆ 1 ̸ = ∆ 2 , because in general the top layer has to be thicker than the bottom layer for fabrication reasons. This difference has recently been shown to play a role in the context of quasiparticle tunneling [94,136]. In this section, we estimate the influence of gap asymmetry on the ratio between second and first Josephson harmonics. We show that the effect is negligible at typical gap asymmetries of around 10 %, and even the presence of larger gap asymmetries would only slightly suppress the effect of higher harmonics.
We start from Zaitsev's treatment of asymmetric S 1 cS 2 junctions [137], in which the single-channel current-phase relation is expressed as where the angular brackets average over an angle-dependent transparency T (α), the sum is over Matsubara frequencies ω > 0, and f i = ∆ i / ω 2 + ∆ 2 i and g i = ω/ ω 2 + ∆ 2 i depend on the two gaps. We consider the zero-temperature limit in which the sum over ω becomes an integral and assume the transparency to be independent of α; then for equal gaps ∆ i ≡ ∆ we recover Eq. (2).
For generic values of the gaps, keeping the first two terms in the Fourier expansion of I S (φ) we find with a = 1 + T 2 (g 1 g 2 − 1) and b = T 2 f 1 f 2 . Explicit expressions for the Fourier coefficients can be found in the low transparency limit T → 0: where K(k) = π/2 0 dθ/ 1 − k 2 sin 2 θ is the complete elliptic integral of the first kind and we used Eq. (19.8.12) from [91]. Note thatc 1 correctly displays the known dependence of the tunnel junction critical current on the two gaps, see [138,Eq. (3.2.5)]. Since K(0) = π/2, for small gap asymmetry, ∆ 1 ≈ ∆ 2 , we have E J2 /E J1 =c 2 /2c 1 ≈ −T /16, cf. Eqs. (S16a) and (S16b). In the opposite limit in which one of the two gaps goes to zero, the ratio would vanish; this shows that the gap asymmetry suppresses the impact of the higher harmonics. However, this suppression can in practice be neglected: for the realistic value ∆ 1 /∆ 2 = 0.9 we find E J2 /E J1 ≈ −T /16(1 − 0.0007), a value extremely close to that for the symmetric case. Even for ∆ 1 /∆ 2 = 0.5 the deviation from the symmetric value is small, E J2 /E J1 ≈ −T /16(1 − 0.029). Furthermore, numerical integration of the formulas for the Fourier coefficients in Eq. (S68) shows that the suppression is largest at small transparency. Hence we conclude that the effect of gap asymmetry is negligible.

II. NUMERICAL METHODS
In this section, we detail the numerical methods used to obtain the parameters of the Hamiltonian Eq. (S46) to describe the experimental data. This problem belongs to the class of inverse eigenvalue problems (IEPs) that have been covered in a large body of scientific literature [58][59][60][61]. We formalize the particular IEP under consideration, i.e. the Hamiltonian parameterized IEP (HamPIEP), and outline the numerical methods that we have used to find suitable solutions. Finally, we discuss several details specific to solving the HamPIEP for transmon systems.

A. Inverse eigenvalue problem
The IEP is an instance of the famous inverse problem [61], which is one of the most important problems that a scientist may face: Given some data observed in an experiment, find a model that describes the data. The model is usually characterized by a set of parameters that are themselves not observable.
In quantum physics, the model is typically a Hamiltonian, and the experimental data are frequencies measured in spectroscopy experiments. An early application of the IEP in that form was the study and the description of atomic or molecular spectra, which goes back to the year 1955 [62-64].

LiPIEP
One of the simplest types of IEPs is the linear parameterized IEP (LiPIEP). Here the task is to find a set of parameters x = (x 1 , . . . , x n ) such that an n × n matrix of the form where A 0 , A 1 , · · · , A n are fixed n×n matrices, has eigenvalues f (x) = (λ 1 (x), . . . , λ n (x)) equal to a given set of numbers f * = (λ * 1 , . . . , λ * n ). In theory, there exists a solution for almost all A i that is unique (up to the n! permutations of the eigenvalues) [61]. In practice, this solution can be found by applying Newton's root-finding method to the function and the fact that the derivative of an eigenvalue λ i (x) with respect to a parameter x k is given by where q i is the eigenvector corresponding to λ i . This means that one can iteratively find a solution by (i) diagonalizing A(x) for a given set of parameters x and (ii) computing the Jacobian J in Eq. (S72) to obtain an update ∆x for the next iteration (see below).

HamPIEP
The goal of the IEP solved in this work is to find a parameterized Hamiltonian that describes certain measured transition frequencies. This problem, which we call the HamPIEP, extends the simple LiPIEP in several ways: 5. The number of parameters #x and the number of eigenvalue combinations #f may be much smaller than the size of the Hamiltonians (i.e. #x, #f ≪ dim(H)).
6. The number of parameters is smaller than or equal to the number of eigenvalue combinations (i.e. #x ≤ #f ).
To solve the HamPIEP, we need the Jacobian J = ∂f /∂x. However, points 1-4 usually make it impossible to find a closed-form expression for J such as Eq. (S72) for the LiPIEP. Therefore, we obtain J by using automatic differentiation with TensorFlow [139] (only for the (d, σ) model, which involves integrals over the distribution ρ(T ) in Eq. (S38), we pre-compute a dense grid of Eq. (S20) using Mathematica [140]). This circumvents the need to approximate the gradients numerically using finite differences. We first consider the case #x = #f , in which one can use a root-finding algorithm to find the solution to the HamPIEP. Here we use the globally convergent Newton root-finding method with line search and backtracking [83] (Newton-LB). If #x = #f , the Jacobian J = ∂f /∂x is a square matrix. This matrix is usually invertible, so we can use LU-decomposition of J to compute the Newton step, to iteratively update x ← x + ∆x. Note that the Newton step automatically points in a direction that decreases the squared sum of differences, because ∇F · ∆x = −(f (x) − f * ) 2 < 0. The strategy to solve the HamPIEP is thus to follow the Newton step x ← x + ∆x as long as F decreases, to obtain quadratic convergence near the minimum. If the Newton step does not significantly reduce F , we backtrack by performing a line search along ∆x to find a step µ∆x with µ ∈ (0, 1) that reduces F . For this, we evaluate the function g(µ) = F (x + µ∆x) that we successively model to cubic order in µ (see [83,Section 9.7.1] for more information). This procedure ensures a globally convergent method [82].
In the case where #x < #f and an exact solution to the HamPIEP may not exist, we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm [84] to minimize the weighted sum of absolute differences where | · | denotes the L1-norm and w represents the weights (see the following section for how the weights are chosen). In practice, we use the implementation of the BFGS algorithm from TensorFlow Probability [85] running on the NVIDIA A100 GPUs of JUWELS Booster [87]. If necessary, a suitable initial value for the BFGS algorithm is obtained from Newton-LB applied to the case in which only some elements of #f are used to ensure #x = #f .

B. Choosing appropriate weights
For the additional physical models for Josephson harmonics considered in Fig. S4 and Table S3, the number of model parameters #x is often smaller than the number of measured transition frequencies #f . To obtain the model parameters x by solving the HamPIEP, we then minimize the weighted sum of absolute differences Eq. (S75). Note that this does not apply to the results presented in the main text, where e.g. the HamPIEP for the E J2 model was solved unambiguously (see Methods). However, for the additional models, the explicit form of the objective function W (x) in Eq. (S75) is given by  where w 0j are weights for the transmon frequencies for the transition 0 → j, and w res,j are weights for the resonator frequencies when the transmon is in state j. Note that fitting is a complicated procedure that always requires one to make some potentially subjective choices, i.e. choosing initial values and weights. In all cases where a certain model cannot describe the data, this will influence the final result. It is the purpose of this section to argue that the choices that we have made in this regard are justified.
When fitting the models to the measured spectra, we always put larger weights on the first two transmon frequencies and the first two resonator frequencies, to ensure that where the symbol ! ≈ expresses that the fits shall try to match these values with priority. The argument for this is that the transition frequencies f 0j /j can be measured to roughly the same accuracy (say within 1 MHz), so lower-index transitions like the qubit frequency f 01 can be measured to better accuracy than the higher transition frequencies f 03 , f 04 , . . .. Furthermore, the offset charge dispersion of higher energy levels j is much larger (cf. Fig. 4 in the main text), which additionally increases the measurement imprecision.
However, one may argue that the choice given by Eq. (S77) results in a "whiplash effect", in the sense that ±1 MHz variations in f 01 and f 02 /2 might cause drastically different model predictions for the higher levels. In Fig. S14, we show that this is not the case (shaded gray areas). Hence, a model that is based on the standard transmon Hamiltonian Eq. (3) and that reproduces-within the experimental imprecision-the first two transition frequencies and the resonator frequencies can also not match the higher transmon transition frequencies.
Additionally, we test four different cases for choices of weights (markers) when fitting to the whole spectrum (see Table S5): Case (1) with weights w 0j = 1/j is the most obvious choice from a theoretical point of view, since equal weights are put on f 0j /j which are all of similar magnitude and measurable to about 1 MHz accuracy. Indeed, it works very well if the model that is fitted can describe the data (see bottom panel of Fig. 3b in the main text). However, if the model being fitted cannot describe the data, such as the standard model (see the blue squares in Fig. S14) Figure S15. Evolution of the energy spectrum from bare states |k, j⟩ to dressed states |k, j⟩. We show the evolution of the eigenenergies E l (λ) of H(λ) in Eq. (S78) as a function of the normalized coupling strength λ for the standard model of the KIT sample (cf. Table S3). Note that the vertical axes between the cuts do not use the same scale; the eigenenergies E l (λ) differ by several GHz, whereas the variation between E l (λ = 0) and E l (λ = 1) is less than 25 MHz. In this part of the spectrum, there are no avoided crossings.
the fit tries to match all frequencies on average (with a hit at f 02 and f 05 ) but the systematic error in the curvature stays. This causes the qubit frequency f 01 , which is the most accurately measurable transmon property, to be off way beyond the experimental imprecision. Case (2) resembles similar weights that still enforce the constraint Eq. (S77), even if the model being fitted cannot describe the data. This is shown by the yellow circles in Fig. S14.
Case (3) corresponds to equal weights on all transmon transition frequencies and larger weights on the resonator frequencies (which are measurable more accurately). However, here the fit similarly tries to match the transmon frequencies on average, at the cost of greatly overestimating the qubit frequency and the anharmonicity.
At first glance, the model corresponding to case (4) might look promising (red crosses in Fig. S14). However, the price for matching the transmon transition frequencies is a huge error in the resonator frequency f res,1 when the transmon is in state j = 1. This frequency is determined from the dispersive shift χ = f res,1 − f res,0 , and is measurable to an accuracy of less than 1 MHz. The model prediction in this case, however, is off by more than 30 MHz (see Fig. S14b). Furthermore, the bare resonator frequency Ω and the coupling strength G (see Table S5) are very different for this model. Therefore, we have to reject this model as a good alternative to describe the experiment. It might be an interesting idea, though, to analyze whether alternative forms of the transmon-resonator coupling Gn(a + a † ) in the Hamiltonian Eq. (S46) could remedy this shortcoming.

C. Identification of dressed states
In both the standard and the harmonics model, the joint transmon-resonator Hamiltonian couples the bare states |k, j⟩, given by the product of the transmon state |j⟩ and the bare Fock state |k⟩ of the resonator, through the transmon-resonator coupling Gn(a + a † ) (see Methods). The eigenstates of this Hamiltonian are the dressed states |k, j⟩, named because the hybridization of the two systems dresses each eigenenergy and eigenstate by a contribution from the other system. However, after numerical diagonalization, the eigenstates |l⟩ are typically ordered by increasing eigenenergies E l for l = 0, 1, . . ., and it is not immediately obvious how to uniquely assign labels k and j to every eigenstate, i.e., how to identify |k, j⟩ ↔ |l⟩, especially for large photon numbers k.
In this case, one has to make a potentially subjective choice. In the literature, one can find assignments based on iteratively constructing transmon ladders or branches using, for instance, the projection onto the bare transmon manifolds [65] or the overlap with the state a † |k − 1, j⟩ obtained from applying the bare raising operator of the resonator to the previously assigned state [74,75].
An alternative approach is to trace the evolution from bare to dressed states through the hybridization, by considering the Hamiltonian which interpolates between bare and dressed states using the parameter 0 ≤ λ ≤ 1. For λ = 0, the assignment of labels (k, j) to the energies E l (λ = 0) is clear by definition of the bare states. Then for increasing λ, whenever two 0.0 0.5  eigenenergies E l (λ) and E l ′ (λ) are about to cross, one needs to swap the labels (k, j) and (k ′ , j ′ ). After the crossing, the labels in the ordered list of eigenenergies have thus changed.
We remark that all such crossings are avoided crossings, since H(λ) in Eq. (S78) is a one-parameter matrix flow, and by the Hund-von Neumann-Wigner theorem [141,142] (see also [143]), the eigenvalue curves do not intersect. Still, the eigenstates hybridize (such that they form a superposition of the previous eigenstates), and by swapping the labels we can ensure that the eigenenergies leaving the crossing continue to carry the energy dependence that would cause them to be characterized as one or the other state in an experiment.
The relevant low-energy spectrum of H(λ) in Eq. (S78) is shown in Fig. S15 as a function of λ for the KIT sample. Here the eigenenergies are still well separated and no avoided crossings occur. We can thus use the overlap of the dressed states with the bare states to assign the proper labels (k, j), as shown in Fig. S16. However, it is important to realize that this procedure does not always yield the correct assignment (as one could test by investigating higher parts of the spectra for k > 1 or by further increasing λ).

D. Residuals for the Köln data
In contrast to the other transmon experiments considered in this work, the Köln data is special in the sense that it consists of 288 data points for the same frequency-tunable transmon sample, comprised of spectroscopy, Ramsey, and photon-number-splitting experiments. Furthermore, the data points correspond to many different qubit frequencies f 01 tuned by an in-plane magnetic field B ∥ . We cover this dependence by varying only the Josephson energy E J while keeping all other model parameters x std = (E C , Ω, G) of the standard model and  Table S3). The residuals of this procedure are shown in Fig. S17. As expected from the results shown in the main text, the Josephson harmonics model can describe the full dataset much better than the standard model. There are a few out of the 288 data points that still show deviations in f 02 and f 03 (left column for f 01 < 5 GHz). These data points correspond to very strong magnetic fields (the strongest B || is about 0.4 T). We assume that for these outliers, other higher-order, strong-field effects, not taken into account in the current . The frequency f 01,k=1 is shifted from f01 by the total dispersive shift χ (i.e. f 01,k=1 = f01 − χ), and f 01,k=2 is approximately shifted by 2χ from f01. Data in the left and middle columns is obtained from spectroscopy (yellow squares and blue circles) and Ramsey (yellow plusses and blue crosses) experiments. Data in the right column is obtained from photon-number-splitting experiments (triangles). models, start to play a role in this regime. However, the bulk of all data points is well described by the Josephson harmonics model.

A. KIT
The KIT transmon qubit contains a single JJ shunted by an in-plane plate capacitor with rectangular pads, and is capacitively coupled to a lumped-element readout resonator (Fig. S18). We determine the value of the charging energy E C /h = (242 ± 1) MHz from finite-element method (FEM) simulations. The simulated geometry includes the 3D-waveguide sample holder hosting three qubit samples and their readout resonators, as shown in Fig. S18 a) together with the electric field distribution of the eigenmode associated with the central qubit. While the transmons have identical capacitor electrodes (see Fig. S18b) but different JJ overlap areas, the three readout resonators differ in the number of meanders (Fig. S18c) to ensure a frequency detuning of around 800 MHz. The samples are located on a single c-plane sapphire substrate (t = 330 µm). The dielectric permittivity of the sapphire substrate is anisotropic and defined via a tensor, where the value in parallel to the c-axis (perpendicular to the surface) is ϵ r,∥ = 11.5 and the value perpendicular to the c-axis is ϵ r,⊥ = 9.3. The shunt capacitance C s , from which we calculate the charging energy E C = e 2 /(2C s ), and the linear stray inductance L s arising from the leads are extracted from the simulated eigenfrequencies by varying the lumped-element inductance L J (see Fig. S18d), and using the fit function The obtained stray inductance is L s ≈ 380 pH for all samples (see Fig. S18e). In addition to the geometric contribution to the stray inductance extracted from the FEM simulations, we expect a small contribution arising from the kinetic inductance of the pure Al thin film. We can estimate this contribution from the normal-state resistivity of the Al film ρ n = 4.3 µΩ cm [144], which we have measured for a sample fabricated in the same evaporator using the same recipe. For a film thickness t = 70 nm, we arrive at a kinetic sheet inductance L k,□ = 0.6 pH/□ from which we can estimate the contribution of the kinetic inductance L k ≈ 60 − 120 pH to the stray inductance in the sample. Hence, we expect a total stray inductance L s,tot = L s + L k ≈ 500 pH.

Fabrication
The sample is fabricated on c-plane, double side polished sapphire substrate. A bi-layer resist stack of 700 nm MMA EL-13 and 300 nm PMMA A4 and a 10 nm gold conduction layer is used for writing with a 50 keV e-beam writer. The structures are developed in a beaker with an isopropanol-water mixture with volume ratio 3:1 at 6 • C. Before the metal deposition in a Plassys evaporation system, the substrate is cleaned with a Kaufmann ion source in an Ar/O2 descum process and the vacuum is improved using titanium gettering. The target film thicknesses of the first and second aluminum layer are 30 nm and 40 nm, respectively, and the evaporation angles are 0 • and 20 • , respectively. The (100 nm) 2 sized JJ is fabricated in Dolan-style [145] and the insulating aluminum oxide barrier is grown under static oxidation at an oxygen pressure of 10 mbar for 2.5 min.

Cooldown 1
The sample is measured in reflection in a 3D copper waveguide sample holder similar to Refs. [146] and [147]. The amplitude of the reflection coefficient is shown in Fig. S19 as a function of the signal frequency f s and signal power P s around the resonance frequency f r = 7.4613 GHz of the readout resonator. With increasing signal power, the readout resonator shifts in frequency due to the non-linearity inherited from the linear coupling to the transmon, until it becomes independent of power when the transmon is in a highly excited state [54]. From the difference in the resonator frequency we extract the Lamb shift ∆ω r = −2π × 7 MHz, which is the difference between the bare and the dressed resonator frequency. In the simple transmon Hamiltonian and using a Schrieffer-Wolff transformation, the Lamb shift is identified with the partial dispersive shiftχ 12 /2 [38], whereχ ij = g 2 ij /(ω ij − ω r ), from which we estimate the coupling rate g 12 = 2π × 151 MHz. This value is in agreement with our FEM simulations for g 01 = 2π × 127 MHz when accounting for the numerical factor arising from the transition matrix element. For the fit to our extended transmon model including the higher harmonics in the Josephson effect, we do not rely on this indirect determination of the total dispersive shift χ 01 , which under the Schrieffer-Wolff transformation is given by χ 01 = 2χ 01 −χ 12 . Instead we use the value χ 01 = −2π × 2.6 MHz measured in CD2 from the pointer states in the IQ-plane associated to the transmon ground and first excited state (see Fig. S20a). The frequencies of the individual transitions in the transmon spectrum are extracted from two-tone spectroscopy, as shown in Fig. S19b and c, and are listed in Tab. S6 together with the readout frequencies estimated from the Lamb shift. The linewidth of the fundamental transition Fig. S19b is extracted from Lorentzian fits to the response in the reflection coefficient. As can be seen from the spectroscopic data in Fig. S19c, there are many additional features visible in the spectrum, especially at high drive powers, since we are probing the dressed energy spectrum of the transmon and the readout resonator. We argue that the frequencies associated with transitions in the transmon spectrum are independent of drive power, and only show power broadening. The transition frequencies of the multi-photon transitions indicated by the white arrows are extracted from individual measurements in the frequency vicinity of each transition, similar to Fig. S19b. Cooldown 2 In comparison to cooldown 1, a Dimer-Josephson-Junction-Array-Amplifier (DJJAA) [147] was added to the readout line to enable fast qubit readout and the measurement of quantum jump traces by monitoring the reflection coefficient in the vicinity of the readout resonator. At finite temperature, a histogram of such a trace reveals the pointer states associated to different transmon states, from which we can extract the dispersive shift using their phase difference. For the calculation of the dispersive shift between the ground state and the j-th transmon state, we use where φ 0 and φ j are the phase of the pointer state associated with the ground and j-th excited state, respectively, κ and γ are the external and internal decay rates of the readout resonator, ω r is the readout resonator frequency, and ω s is the frequency of the readout signal. From the pointer states shown in Fig. S20a, which are associated to the ground and first excited state, we extract a dispersive shift χ 01 = −2π × 2.6 ± 0.1 MHz. The internal and external decay rates of the readout resonator, γ = 2π × 190 kHz and κ = 2π × 3.90 MHz, respectively, are extracted from circle fits to the complex reflection coefficient (not shown) similar to Fig. S19a and Fig. S21a. The frequencies of transitions into higher excited states of the transmon spectrum are extracted from two-tone spectroscopy, exemplified for the fundamental transition in Fig. S20b, and on a larger scale in Fig. S20c.

Cooldown 3
At the end of cooldown 2, the sample was accidentally annealed during the warm-up of the measurement setup, which brought the fridge to approx. 100 • C. While the readout resonator changed only insignificantly in frequency (f r = 7.4564 GHz), the fundamental transition of the transmon sample shifted down in frequency by more than 1 GHz. As a consequence of the increased detuning between transmon and readout, the dispersive shift decreased noticeably, resulting in a smaller Lamb shift ∆ω r = −2π × 2.51 MHz (see Fig. S21a). The extracted shift in the resonator frequency corresponds to a coupling rate g 12 = 2π × 122 MHz, when using the simple transmon Hamiltonian and the Schrieffer-Wolff transformation to map the transmon onto an effective two-level system. The decrease in the coupling rate compared to CD1 is not surprising, since the transition matrix elements from which the coupling rates are calculated depend on the ratio E J /E C [38], and even change for the simple transmon Hamiltonian. The decrease in the dispersive shifts enabled the discrimination of multiple pointer states in the histogram of quantum jump traces measured at the same readout frequency (see Fig. S21b), from which we extracted the multi-photon dispersive shifts χ 0j from the relative angle of the pointer states using Eq. (S80) (see Fig. S21c). The internal and external decay rates of the readout resonator are γ = 2π × 63 kHz and κ = 2π × 4.13 MHz, respectively. We observe a weak power dependence of the extracted dispersive shifts, which is increasingly more pronounced for transitions into higher levels. We compare the measurement outcome to the prediction of the simple transmon Hamiltonian under the Schrieffer-Wolff transformation, which predicts these dispersive shifts from the partial dispersive shifts χ 0j =χ 01 +χ j−1,j −χ j,j+1 [38,148]. In our calculation, we use g 01 = 2π × 89 MHz for the fundamental transition, E J /h = 13.82 GHz and E C /h = 223 MHz (see Tab. S3). The numerical results are shown in Fig. S21c as horizontal solid black lines. While the agreement is good for the two lowest dispersive shifts, we observe increasing deviations for the higher levels j ≥ 2.
The frequencies of transitions into higher excited states of the transmon spectrum are obtained from a combination of time-domain measurements and two-tone spectroscopy. The fundamental transition is extracted from a Ramseyfringes measurement, which is particularly sensitive to the detuning with respect to a reference drive (see Fig. S22a). Moreover, we measure an average Ramsey coherence time of T * 2 = (12.0 ± 0.5) µs (see Fig. S22b) and an energy relaxation time of T 1 = (8.7 ± 0.3) µs (see Fig. S22c). The other transitions are obtained from two-tone spectroscopy (see Fig. S22d).

B. ENS
The ENS 3D transmon sample is the same as documented in refs. [54,149]. The transmon contains a single JJ shunted by an in-plane plate capacitor with rectangular pads, which are capacitively coupled to a copper cavity. The cavity is read out in transmission. It was fabricated in a single double-shadow evaporation step on a sapphire substrate. A MMA/PMMA resist stack was used. Before the metal deposition in a Plassys evaporation system, the sample was cleaned in-situ using an Ar/O 2 descum process. The two aluminum layers were grown with a target thickness of 35 nm and 100 nm at ±35 • angles. Between the two evaporations, the oxide barrier is grown under static oxidation at a pressure of 20 mbar with a 4:1 Ar:O 2 mixture for 7 min. The JJ size is 260 nm x 200 nm. The measured qubit transition and readout resonator frequencies are listed in Tab. S6. The qubit has an average energy relaxation time of T 1 = 15 µs and Ramsey coherence time of T * 2 = 11 µs. Figure S19. Spectroscopic measurements of the KIT sample (Cooldown 1). a) Amplitude of the reflection coefficient S11 in log-scale as a function of signal frequency fs and signal power Ps at room temperature (left panel). At low readout power, the readout resonator is found at fr = 7.4613 GHz. With increasing power the resonator shifts in frequency due to the inherited non-linearity until it eventually reaches a constant frequency at fr = 7.4543 GHz when the transmon is in a highly excited state. The right-hand panel shows two traces at low (red) and high power (violet), respectively, as indicated by the vertical lines in the 2D sweep. From the difference in transition frequency we extract the Lamb shift ∆ωr = −2π × 7 MHz. The dashed black lines indicate the results of a circle fit to the complex reflection coefficient, from which we determine the internal and external decay rates. b) Two-tone spectroscopy of the fundamental transition frequency of the transmon qubit. Due to the dispersive interaction, the readout resonator shifts in frequency when the additional drive tone excites the transmon, resulting in a change in the reflected signal amplitude from low (dark) to high (yellow). With increasing drive power P d , the transition broadens, as indicated in the bottom panel. c) Two-tone spectroscopy of the transmon spectrum measured by sweeping the drive frequency and drive power. The white arrows indicate the features identified as transitions in the transmon spectrum. Many other features are visible in the spectrum which are drive power dependent. The feature above the fundamental transition frequency around f01 = 6.0392 GHz is the transition frequency of another qubit, which is orders of magnitude more weakly coupled to the readout resonator. Figure S20. Spectroscopic measurements of the KIT sample (Cooldown 2). a) Reflection coefficient S11 of the readout resonator measured at the constant readout frequency fs = 7.4607 GHz and shown in the complex plane. The two pointer states visible are associated to the ground and first excited state of the transmon. From their relative phase, we extract the dispersive shift χ01 = −2π × 2.6 ± 0.1 MHz. b) Two-tone spectroscopy around the fundamental transition f01 = 5.8885 GHz. In comparison to the first cooldown, the frequency of the transmon changed by around 160 MHz. c) Transmon spectrum extracted from two-tone spectroscopy by changing the drive frequency f d and drive power P d . The features associated to transitions in the transmon are indicated by white arrows.  . a) Ramsey-fringes experiment with varying relative detuning between the microwave pulse to prepare the transmon in a superposition state and the fundamental transition 0 → 1. Since a finite detuning induces a deterministic time evolution during the idling time τ between the π/2-pulses in the Ramsey sequence, we can determine the fundamental transition frequency f01 = 4.72193 GHz with high accuracy. b) By repeating the Ramsey fringes measurement, we extract the coherence time T ⋆ 2 = 12 ± 0.5 µs from the statistics of the exponentially decaying envelope. c) The energy relaxation time of the transmon T1 = 8.7 ± 0.3 µs is measured in a separate experiment. d) Transmon spectrum measured in a two-tone experiment by applying a drive to the transmon which induces (multi-photon) transitions into higher excited states, resulting in a shift of the resonator frequency. The white arrows indicate the transitions we can unambiguously identify from their pointer state distribution in the complex plane (cf. Fig. S21b).

C. Köln
The Köln setup and sample geometry is shown in Fig. S23a-e. The sample is a SQUID transmon with rectangular capacitor pads, which are capacitively coupled to a copper cavity. It was fabricated with the same capacitor geometry and in the same batch as the device documented in [55]. The cavity is measured in reflection, but two-tone signals as well as time-domain pulses at the qubit frequency are applied to the other port. The energy relaxation time of the first excited state is on the order of T 1 = 10 µs, the Ramsey coherence time on the order of T * 2 = 3 µs and T echo 2 is usually similar to T 1 .

Fabrication
The sample is fabricated with a single electron-beam lithography step on a sapphire substrate. The Dolan bridge [145] mask is made using an MMA-PMMA resist stack. After developement, a weak oxygen plasma is applied to remove resist residues. The JJs are fabricated using double-shadow evaporation (angles are ±20 • ) in a Plassys MEB 550S system without substrate cooling or heating. The target film thicknesses of the first and second aluminum layer are 10 nm and 18 nm, respectively. The aluminum oxide barriers are grown under static oxidation at an oxygen pressure of 1 mbar for 6 min. To measure the film thicknesses and JJ geometry, we took an atomic force microscope (AFM) image of the SQUID region and close-up pictures of the JJs (Fig. S23c-e). The AFM measurements for the first and second layer thickness yield 15 nm and 21 nm respectively (including the oxide layer), while the combined thickness gives 32 nm (this should include two oxide layers). The Dolan-style JJs in the SQUID have approximate areas (160 nm) 2 and (260 nm) 2 and the JJ-widths are compatible with the Fraunhofer dependence of E J that we measure. The granularity of the thin-film aluminum is clearly visible in the AFM picture and the JJ area comprises many grains.
Tuning the EJ with an in-plane magnetic field A three-axis vector magnet is used to measure the field dependence of the transmon (cf. Ref. [55]). It changes the transmon E J (and harmonics) due to a combination of a geometric Fraunhofer contribution and the suppression of the superconducting gap. The SQUID can be tuned with a small out-of-plane field B ⊥ . Based on the SQUID oscillations, we align the magnetic field in the in-plane direction using the vector magnet.
The sample is fabricated with comparatively thin aluminum films and narrow leads to the JJ, in order to make it more magnetic field resilient. Moreover, we chose a relatively small SQUID area (cf. Fig. S23a) to be less sensitive to noise from the magnet. While we estimate a critical field above 0.86 T, detailed measurements for this sample were only possible up to 0.4 T, because for higher fields the SQUID starts to be unstable with flux jumps on a timescale of minutes.
Following the approach in Ref. [55], we measure the field dependence of E J . The E J s of the individual junctions can be extracted from the SQUID dependence with out-of-plane field which is measured at different in-plane fields. We estimate the Fraunhofer effect to dominate the in-plane field dependence of E J at low fields, thus the geometric asymmetry of the two JJs comprising the SQUID matters (cf. Fig. S23b-c). While E J of the larger JJ changes from ≈ 19.4 GHz at zero field to ≈ 12.3 GHz at 0.4 T, the small JJ changes from 6.0 GHz to 4.9 GHz. At 0.4 T, the expected change in the superconducting gap is on the order of 10 %. The data presented here is predominantly measured at the bottom sweetspot of the asymmetric SQUID, because the top sweetspot is close to the cavity at low magnetic field.
The fact that the Köln device is a SQUID transmon affects the Josephson harmonics. At the bottom sweetspot of a SQUID, even harmonics would be enhanced, while odd harmonics would be suppressed. However, likely due to the fact that we only use data close to the bottom sweetspot at different in-plane field, our simplified model (cf. Section II D) that considers only an effective E J and fixed ratios for the harmonics describes the data well.

Measuring the charge dispersion
Note that the charge dispersion grows for the higher levels and, if it cannot be explicitly resolved, it adds to the linewidth and can become a source of systematic errors in measurements. If the offset charge cannot be explicitly controlled, it will slowly drift, particularly between measurements of different transitions [113], which could for example lead to systematic errors in estimating the anharmonicity. For the Köln sample, the offset voltage can be explicitly controlled and for many transitions both f ij and δf ij are measured (see Section III C). Then the frequency f ij can be extracted as the mean frequency of the two parity branches reducing systematic errors and δf ij is a separately measured quantity that can be used to better fit the Hamiltonian.
The gate-voltage V g is applied to one of the cavity pins through a bias-tee to control the offset charge n g , such that spectroscopy as a function of frequency and n g can be measured [113,150]. This voltage is applied relative to the ground of the dilution refrigerator, which is connected to the 3D cavity. The voltage bias works because the pin is closer to one of the two floating transmon islands. The transitions f 01 , f 02 /2 and f 03 were measured in two-tone continuous-wave spectroscopy as a function of V g to extract f 0j and the corresponding charge dispersion δf 0j (example data shown in Fig. S23f,h). Both parity branches are visible in our data, as the spectroscopy measurements are slow compared to the parity switching timescale, which we measured to be on the order of 1 ms. The f 0j and δf 0j were extracted from the 2-D plots by peak finding and fitting the two branches as combined sin-functions to the data.
In addition to the spectroscopy data, Ramsey measurements on the 0-1, 1-2 and 0-2 transitions were performed as a function of V g , showing the characteristic beating for the two parities. Fits to the Ramsey fringe data can be used to extract f ij and δf ij . Alternatively, a Fourier transform of the Ramsey data for f 01 as a function of V g closely resembles the spectroscopy data and is plotted in Fig. S23g. The frequency resolution of the Ramsey measurements should be limited only by T * 2 and there is no power broadening, Stark shifts and Photon-number broadening or splitting to consider. We confirm that continuous-wave spectroscopy and time-domain datasets are consistent with each other. For the magnetic field setting B = 0.2 T used in Fig. 3 in the main text, the measured qubit transition and readout resonator frequencies are listed in Tab. S6. The full spectroscopy and charge dispersion data at different in-plane magnetic field is documented in the repository [86] accompanying this manuscript.

D. IBM
The IBM data was measured on the Hanoi, Falcon r5.11 processor for 20 out of the 27 qubits. The transition frequencies of the IBM transmons were obtained by multi-mode spectroscopy (at a single probe frequency) enabled by Qiskit Pulse [151,152] to measure f 0j /j for j = 1, 2, 3, 4. Since the j = 4 transition frequency was often near the bandwidth limit imposed by Qiskit (±500 MHz from the j = 1 transition), additional sideband modulation was  Table S6. Spectroscopy data for the measured samples. The qubit frequencies are listed as multi-photon transitions f0j/j between ground state and state j. The resonator frequencies fres,j are given for the transmon in state j. For Köln and IBM, we only show two selected field settings and qubits, respectively. The complete set of frequencies is available in the repository [86] accompanying this manuscript.
applied at the pulse level to probe those frequencies. The measured qubit transition and readout resonator frequencies for qubits 0 and 13 are listed in Tab. S6. Spectroscopy data for all qubits is documented in the repository [86] accompanying this manuscript.
All molecular dynamics (MD) calculations are based on ReaxFF force fields [153] developed by Hong and van Duin [154], as implemented in Gulp [155]. ReaxFF allows for both the description of oxygen molecules dissociation on the surface as well as the formation of new bonds (i.e. Al-O). The visual representation of the structures is made with xcrysden [156] and gdis [157].

A. System construction
We consider two distinct cases for the geometric structure of the surfaces, namely Al(100) and Al (111). Both represent opposite extremes that we average over because the orientation of the Al crystals in the sample is not known. The spacing between layers in terms of the lattice constant a in Al(100) (Al (111)) is a (a/ √ 3). A similar approach using Al (100) and Al (111) to investigate junction properties was recently proposed in [48].
Al (111) and Al(100) are generated with a bulk parameter of a = 4.027Å in both ideal (flat) configuration as well as with several types of defects (steps or islands) as schematically represented in Fig. S24. They were allowed to interact with gaseous oxygen atoms placed on top. The initial ratio between oxygen and aluminum is approximately 1:2, i.e. the initial oxygen represents a third of the total number of atoms in the system (12x12x20=2880 Al atoms and 1500 O atoms). For each model the system was propagated at 300 K using a Verlet-type algorithm for 3 ps, with a time step of 2 fs, as illustrated in Fig. S25 for model 5 (cf. Fig. S24). During propagation, the aluminum layer incorporates oxygen atoms (Fig. S25c). Since we are only interested in the resulting solid structure, the oxygen atoms that remain unbound are removed after the Verlet propagation, as shown in Fig. S25d.

Model 2 Model 1
Model 3 Model 4 Model 5 Figure S24. Schematic representation of the five surface models considered. We illustrate the starting conditions for the MD. The aluminum volume is schematized in blue and the oxygen volume in red. Model 1 describes an ideal smooth surface while Models 2 to 5 simulate closer to realistic geometries of rough aluminum surfaces, with step-like and respectively island-like defects.
The single layer configuration assumes periodic boundary conditions of the unit cell in the x and y directions; in this case z=infinite (2D type simulation), which means that there is no interaction between the last Al atoms from bulk and the first AlO x layer. The multi-layer configuration used to construct Josephson junctions assumes periodic boundary conditions of the unit cell in all three directions: x, y, z. In this case z = 40Å, which implies dynamic interaction between the last Al layer and the AlO x layer in the neighboring unit cell. All models were further propagated for 4 ps in Verlet molecular dynamics at 300 K with a time step of 2 fs.

B. Molecular dynamics results
After performing the MD relaxation, we observe differences between Al(100) and Al (111). The data is tabulated in Table S7. On average, Al(100) accepts up to 2 % more oxygen, where the percentage refers to the total number of atoms. We also see that surfaces with defects generally accept more oxygen.
The thickness of the AlO x layer was calculated for open surfaces and junctions in the case of all five surfaces models for both Al(100) and Al (111). The barrier thickness is defined as the difference between the coordinates of the first and the last atom that forms it. On average, the AlO x layer formed on Al(111) is 2Å thicker than the one formed on Al(100), as documented in table S8 and illustrated in Fig. S26. This is surprising considering that Al(111) accepts generally fewer O atoms than the Al in symmetry (100). If we compare the Al-O bond lengths, we obtain only a slightly larger length for Al (111), i.e. (1.96 ± 0.16)Å compared to (1.95 ± 0.17)Å for Al (100). Therefore the difference Figure S25. The stages of creating the molecular dynamics models (aluminum in violet and oxygen in red). The image shows an Al(100) model. a The initial construction of a Model 5 (double island) from Al atoms in a face-centered cubic crystal structure. b Oxygen atoms are randomly added to the model; the minimum distance between Al and O atoms is 1Å and the maximum is 10Å. c The system is propagated for 1500 steps of Verlet molecular dynamics at 300 K, with a time step length of 2 fs. c The remaining free oxygen is removed from the system, resulting in aluminum structures wrapped in a layer of AlOx that can be further used in surface-like or junction-like configurations for the analysis of geometric parameters.
Model Al (100) Table S8. AlOx layer thickness for all models. It is calculated as the difference between the position on the z axis of the first and last atom that forms the oxide layer. All values are expressed inÅ.
However, the analysis of the structure and composition of the systems shows that different chemical bonds are formed between aluminum and oxygen in the AlO x layer, depending on the surface orientation. In Al (111), even though it incorporates less oxygen compared to Al(100), a higher percentage of oxide and sub-oxide is formed, resulting in a thicker barrier. This fact, correlated with the bond lengths and the penetration depth of oxygen, indicates that the thickness of the AlO x layer depends on the type and quality of the surface. The rougher the surface, the more oxygen it absorbs and consequently the larger the barrier thickness. Since the typical Josephson junction barrier is obtained by oxidizing a poly-crystalline film with grains much smaller than the junction and with random orientations, we expect the barrier to be inhomogeneous: conduction channels have different transparencies, corresponding to the crystalline orientation and oxide thickness at their respective positions.

C. Additional STEM images of JJ barriers
The Al-AlO x -Al barriers resulting from MD models are in qualitative agreement with images obtained from STEM on JJ barriers fabricated by e-beam deposition of aluminum and thermal oxidation, as illustrated in Fig. S27.