Coherent acoustic control of a single silicon vacancy spin in diamond

Phonons are considered to be universal quantum transducers due to their ability to couple to a wide variety of quantum systems. Among these systems, solid-state point defect spins are known for being long-lived optically accessible quantum memories. Recently, it has been shown that inversion-symmetric defects in diamond, such as the negatively charged silicon vacancy center (SiV), feature spin qubits that are highly susceptible to strain. Here, we leverage this strain response to achieve coherent and low-power acoustic control of a single SiV spin, and perform acoustically driven Ramsey interferometry of a single spin. Our results demonstrate an efficient method of spin control for these systems, offering a path towards strong spin-phonon coupling and phonon-mediated hybrid quantum systems.


Supplementary Note 1: Coordinate systems
We use x, y, z to refer to the coordinate system of the devices. The axis normal to the diamond surface is denoted by z, which is the [001] lattice direction. The surface acoustic waves (SAWs) are launched along x, which is the [110] lattice direction. The y direction is given by the cross product of z and x, and is along $[\bar{1}10]$. The local coordinate system of the SiVs is referred to as X, Y, Z. The $C_3$ symmetry axis of the SiV, which is the line joining the two missing carbon atoms, is denoted by Z. This can be along any of the 4 equivalent $\langle 111 \rangle$ lattice directions. Because the SAWs are launched along [110], the 4 SiV axis directions are divided into two classes based on their orientation with respect to the [110] direction. "Longitudinal" SiVs have Z along $[111]$ or $[\bar{1}\bar{1}1]$, and "transverse" SiVs have Z along $[\bar{1}11]$ or $[1\bar{1}1]$. The two SiV orientations within each class are expected to respond identically because of the mirror symmetry of the SAWs about the (110) plane. The X and Y directions are indicated in Supplementary Fig. 1.
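The geometry described above can be checked with a few lines of Python. This is a minimal sketch: the helper names (`cross`, `dot`, `classify`) and the representative vectors chosen for the four $\langle 111 \rangle$ axes are illustrative assumptions, not part of the original analysis. An axis is labelled longitudinal if it has a component along the SAW propagation direction and transverse if it is perpendicular to it.

```python
# Device axes expressed in the cubic lattice basis of diamond.
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

z = (0, 0, 1)      # surface normal, [001]
x = (1, 1, 0)      # SAW propagation direction, [110]
y = cross(z, x)    # third device axis

# One representative vector per <111> axis (the overall sign is irrelevant).
SIV_AXES = [(1, 1, 1), (-1, -1, 1), (-1, 1, 1), (1, -1, 1)]

def classify(axis):
    """Longitudinal if Z has a component along x, transverse otherwise."""
    return "longitudinal" if dot(axis, x) != 0 else "transverse"

classes = {axis: classify(axis) for axis in SIV_AXES}

print("y =", y)
for axis, label in classes.items():
    print(axis, label)
```

The two axes in each class are related by lattice mirror operations, which is why only two distinct strain responses are expected.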

Supplementary Note 2: Time domain characterization of SAW devices
To perform time-domain characterization, half of the electrical input to the device is sent to a real-time oscilloscope (Tektronix MSO71604C) through a 50-50 RF power splitter. One IDT of a pair is excited with the arbitrary waveform generator (AWG), and the signal from the other IDT is also connected to the oscilloscope. Short electrical pulses are generated with the AWG, and both the AWG pulse and the electrical pulse transmitted through the SAW device are recorded (Supplementary Fig. 2). The ringing of the input signal is caused by the RF amplifier. The transmitted output signal has two main features:
1. The earliest feature is due to direct electrical crosstalk between the input and output ports. It has approximately the same length as the input pulse. The crosstalk likely occurs between the wire bonds to the input and output transducers, because these GHz-frequency electrical signals have a relatively large wavelength in air. The delay between the input signal and the crosstalk signal is due to the longer path via RF cables to the device inside the cryostat and back.
2. Next, there is the feature from the transduced SAW pulse. This ramps up and down, indicating the build-up and decay of the acoustic pulse that is expected from the non-zero length of the transducer.
The delay between the electrical crosstalk and the first acoustic response indicates a time of about 21 ns for the SAW pulse to travel between the transducers. Given that the separation between the transducers is approximately 175 µm, the SAW velocity is approximately 8.3 km s$^{-1}$, which is within about 20% of the simulated value of 10.3 km s$^{-1}$. The non-zero length of the SAW transducer imposes a lower bound of about 13 ns on the duration of the acoustic pulses it can generate. For microwave pulses shorter than this, increasing the length of the microwave pulse from the AWG leaves the length of the acoustic pulse approximately constant while its amplitude increases. For pulses longer than 13 ns, the acoustic pulse reaches its maximum amplitude and its length starts to increase, as expected.
Because of the minimum duration of the SAW pulses, we use a minimum delay of 10 ns between π/2 pulses for the Ramsey interferometry measurement. At this delay, the acoustic pulses overlap negligibly (Supplementary Fig. 3).

Supplementary Note 3: SiV fluorescence in the presence of an acoustic pulse
The fluorescence intensity from an SiV under optical excitation can vary in the presence of an acoustic pulse. This is observed when the laser frequency is resonant with a transition to a virtual state generated by the acoustic pulse, as explained in Supplementary Note 6. We explore this interaction by performing pulsed measurements of an SiV under a magnetic field, using different frequencies of optical and acoustic pulses, as shown in Supplementary Fig. 4. The SiV spin is initialized and read out using laser pulses, similar to the experiments in the main text, and an acoustic pulse is applied during the initialization pulse.
In (a), the laser is detuned by f = 3.34 GHz from the C3 transition, and the SAW frequency is set equal to the detuning f. The laser frequency is not resonant with any of the other transitions. Hence we observe very low fluorescence intensity during the optical pulse, but there is a large increase in counts when the acoustic pulse is applied, because the photon-phonon transition is resonant with C3.
In (b), the laser is resonant with the C1 transition, and the SAW frequency is set to the difference between the C1 and C3 transitions, which is equal to the splitting between the lower ground states $|e_{g+}\downarrow\rangle$ and $|e_{g-}\uparrow\rangle$ of the SiV. There is an initial increase in the fluorescence due to the optical pulse, followed by an exponential decay as the spin population is optically pumped to $|e_{g-}\uparrow\rangle$. When the acoustic pulse is applied, the photon-phonon transition is resonant with C3, leading to a large increase in fluorescence. This response has practical significance for our experiment, as it allows us to identify the time of arrival of the acoustic pulse at the SiV.
This arrival-time signature is used to measure the velocity of the acoustic pulse on the diamond surface, and to establish that it is indeed the acoustic pulse that controls the SiV spin. To do this, we choose a different SiV, located close (within a few µm) to one of the transducers (the "right" transducer). We repeat the measurement from Supplementary Fig. 4(b), first generating the acoustic pulse by exciting the right transducer. Then we swap the RF cables connected to the IDTs in order to switch the roles of the left and right transducers, and repeat the measurement, this time exciting the left transducer. The resulting time-resolved fluorescence (Supplementary Fig. 5) indicates a time difference of about 13.3 ns between the arrival of the acoustic pulses from the right and left transducers at the SiV. Because we swap the RF cables for the second measurement, the time difference arising from different lengths of RF cables to the left and right ports of the cryostat is eliminated. There is still a small difference in the lengths of the wires inside the cryostat, but this difference is on the order of a few cm and is thus negligible. Therefore the measured time difference is close to the time taken for the SAW pulse to travel between the left and right transducers. Given that the separation between transducers is 175 µm, we estimate a velocity of 13.1 km s$^{-1}$ for the SAW. Although the estimates of the SAW velocity from the SiV fluorescence measurement (13.1 km s$^{-1}$) and the electrical measurement (8.3 km s$^{-1}$) differ, both are reasonably close to the simulated value of 10.3 km s$^{-1}$ (within about 27% and 20%, respectively). This indicates that the SiV is indeed driven by acoustic pulses, and not by microwave radiation that might originate from the patterned electrodes (e.g. the IDTs).
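The two velocity estimates follow from simple time-of-flight arithmetic, sketched below in Python. The function and variable names are illustrative. The last line adds a comparison that is our own assumption rather than a statement from the measurements: it uses the free-space speed of light as an upper bound on how quickly an electromagnetic signal would cross the same 175 µm gap.

```python
SEPARATION_UM = 175.0      # transducer separation quoted in the text
V_SIM_KM_S = 10.3          # simulated SAW velocity quoted in the text
C_UM_PER_NS = 299_792.458  # speed of light in vacuum, in um per ns

def saw_velocity_km_s(separation_um, transit_ns):
    """Time-of-flight velocity; 1 um/ns is numerically equal to 1 km/s."""
    return separation_um / transit_ns

def relative_deviation(v_km_s, v_ref_km_s=V_SIM_KM_S):
    """Fractional deviation of a velocity from the simulated value."""
    return abs(v_km_s - v_ref_km_s) / v_ref_km_s

v_electrical = saw_velocity_km_s(SEPARATION_UM, 21.0)    # electrical transmission
v_fluorescence = saw_velocity_km_s(SEPARATION_UM, 13.3)  # SiV fluorescence timing

# Transit time of an electromagnetic signal over the same distance, in ps.
em_transit_ps = SEPARATION_UM / C_UM_PER_NS * 1e3

for name, v in [("electrical", v_electrical), ("fluorescence", v_fluorescence)]:
    print(f"{name}: {v:.2f} km/s, deviation {100 * relative_deviation(v):.0f}%")
print(f"EM transit over the gap: {em_transit_ps:.2f} ps")
```

Under this assumption, an electromagnetic signal would cross the gap in well under a picosecond, so a delay of order 10 ns can only be explained by a wave travelling at acoustic speeds.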

Supplementary Note 4: Fitting of Rabi oscillation data
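As mentioned in the main text, we use a simple two-level model (an exponentially damped sinusoid) to fit the Rabi oscillation data. This model fits the data fairly well at low acoustic powers, but begins to show deviations at higher powers (Supplementary Fig. 6). Fitting the data with a master equation model that considers all 4 levels in the SiV ground state, similar to the approach in [1], does not provide any significant improvement over the simple model. We believe that the distortions in the oscillations may be due to reflections of the acoustic pulses from the opposing IDTs. Fabricating IDTs individually, instead of in pairs, may help to test this hypothesis.

Supplementary Note 5: SiV Hamiltonian
The SiV has four states in both the ground and excited state manifolds, given by the direct product of the orbital $\{|e_X\rangle, |e_Y\rangle\}$ and spin $\{|\uparrow\rangle, |\downarrow\rangle\}$ bases. For the purposes of our experiments, the SiV Hamiltonian can be expressed as a sum of spin-orbit, Zeeman, and strain terms [2,3]. The constants appearing in these expressions have two sets of values, one for the ground states and one for the excited states. The spin-orbit term is given by

$$H_{\mathrm{SO}} = -\lambda_{\mathrm{SO}} L_Z S_Z, \qquad L_Z = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}, \qquad S_Z = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

in which $\lambda_{\mathrm{SO}}$ is the spin-orbit coupling, $L_Z$ acts on the orbital basis and $S_Z$ acts on the spin basis. The Zeeman term is given by

$$H_{\mathrm{Z}} = q \gamma_L B_Z L_Z + \gamma_S \left( B_X S_X + B_Y S_Y + B_Z S_Z \right),$$

in which $\gamma_L = \mu_B/h$ and $\gamma_S = 2\mu_B/h$ are the orbital and spin gyromagnetic ratios, respectively, and $q$ is the orbital quenching factor. The strain Hamiltonian is given by

$$H_{\mathrm{strain}} = \begin{pmatrix} \alpha - \beta & \gamma \\ \gamma & \alpha + \beta \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

in which $\alpha$, $\beta$, $\gamma$ are symmetry-adapted linear combinations of strain tensor components and are given by

$$\alpha = t_\perp (\epsilon_{XX} + \epsilon_{YY}) + t_\parallel \epsilon_{ZZ}, \qquad \beta = d (\epsilon_{XX} - \epsilon_{YY}) + f \epsilon_{XZ}, \qquad \gamma = -2 d \epsilon_{XY} + f \epsilon_{YZ},$$

with $t_\perp$, $t_\parallel$, $d$, $f$ the strain susceptibilities [3]. Hence $\alpha$ consists of strain components belonging to the $A_{1g}$ representation of the $D_{3d}$ symmetry group, and $\beta$, $\gamma$ consist of the $E_{gX}$, $E_{gY}$ representation components. The measured values of the constants associated with each of the terms in the Hamiltonian have been reported previously and are summarized in Supplementary Table 1.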
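The structure of the zero-field Hamiltonian can be checked numerically with a short Python sketch. It assumes the standard literature form $H = -\lambda_{\mathrm{SO}} L_Z S_Z + H_{\mathrm{strain}}$ with the orbital operator $L_Z$ written in the $\{|e_X\rangle, |e_Y\rangle\}$ basis; this matrix convention is an assumption, and the numerical values are illustrative rather than taken from Supplementary Table 1. Instead of diagonalizing, the sketch verifies that $(H - \alpha)^2$ is proportional to the identity, which implies the doubly degenerate energies $\alpha \pm \frac{1}{2}\sqrt{\lambda_{\mathrm{SO}}^2 + 4(\beta^2 + \gamma^2)}$.

```python
import math

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def add(*ms):
    """Element-wise sum of equally sized square matrices."""
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]

def scale(c, m):
    """Multiply every matrix element by the scalar c."""
    return [[c * v for v in row] for row in m]

def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I2 = [[1, 0], [0, 1]]
L_Z = [[0, 1j], [-1j, 0]]    # orbital operator in {|e_X>, |e_Y>} (assumed convention)
S_Z = [[0.5, 0], [0, -0.5]]  # spin-1/2 operator in {|up>, |down>}

def hamiltonian(lam, alpha, beta, gamma):
    """Zero-field Hamiltonian: spin-orbit term plus strain term (orbital x spin)."""
    h_so = scale(-lam, kron(L_Z, S_Z))
    h_strain = kron([[alpha - beta, gamma], [gamma, alpha + beta]], I2)
    return add(h_so, h_strain)

def zero_field_energies(lam, alpha, beta, gamma):
    """Closed-form energies of the two doubly degenerate branches."""
    half = 0.5 * math.sqrt(lam ** 2 + 4 * (beta ** 2 + gamma ** 2))
    return alpha - half, alpha + half

# Illustrative values in GHz (hypothetical, not fitted to any data).
lam, alpha, beta, gamma = 46.0, 1.0, 3.0, 4.0
H = hamiltonian(lam, alpha, beta, gamma)
shifted = add(H, scale(-alpha, kron(I2, I2)))
squared = matmul(shifted, shifted)
expected = lam ** 2 / 4 + beta ** 2 + gamma ** 2

print("(H - alpha)^2 diagonal:", [squared[i][i] for i in range(4)])
print("expected value:", expected)
print("branch energies:", zero_field_energies(lam, alpha, beta, gamma))
```

The check works because the three terms multiplying $\lambda_{\mathrm{SO}}/2$, $\beta$ and $\gamma$ anticommute pairwise in the orbital space, so their squares simply add.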

Supplementary Note 6: Strain response of the SiV
In the coordinate system of the devices (Fig. 2(a) of the main text), the SAW mostly has components in the x and z directions, with x being the direction of propagation. Hence the components of the strain involving y can be neglected. From our simulations, we find that the components $\epsilon_{xx}$ and $\epsilon_{zz}$ are in phase, while $\epsilon_{xz}$ is π/2 out of phase. Hence the strain tensor of the SAW in device coordinates may be expressed as

$$\epsilon(t) = \begin{pmatrix} \epsilon_{xx} \sin(\omega t) & 0 & \epsilon_{xz} \cos(\omega t) \\ 0 & 0 & 0 \\ \epsilon_{xz} \cos(\omega t) & 0 & \epsilon_{zz} \sin(\omega t) \end{pmatrix}.$$

The SiV responds differently to strain components belonging to the $A_{1g}$ and $E_g$ representations. The $A_{1g}$ strain components $\epsilon_{ZZ}$ and $\epsilon_{XX} + \epsilon_{YY}$ (in SiV coordinates) shift the energies of the SiV electronic states, while the $E_g$ strain components $\epsilon_{XX} - \epsilon_{YY}$, $\epsilon_{XZ}$, $\epsilon_{XY}$ and $\epsilon_{YZ}$ mix the electronic orbitals [3]. While the $E_g$ strain response is responsible for the acoustic spin control, the $A_{1g}$ response is also of practical significance to this experiment, as mentioned in Supplementary Note 3.

A. A1g strain response
When the SiV is subjected to an oscillating $A_{1g}$ strain field, the oscillating energy shift of the electronic states causes frequency modulation of the energy levels, generating virtual states from interaction with an integer number of phonons. These virtual states are revealed as sidebands in the resonant excitation spectrum under continuous-wave acoustic driving (Supplementary Fig. 7). As the acoustic power is increased, more sidebands are observed. We observe two sidebands using 6 dBm of microwave input power applied to the transducer, demonstrating photon-phonon transitions with up to two phonons. Since these measurements are performed at zero magnetic field, the SiV can be described by just the spin-orbit and strain Hamiltonian terms:

$$H = H_{\mathrm{SO}} + H_{\mathrm{strain}},$$

resulting in the energies

$$E_\pm = \alpha \pm \frac{1}{2}\sqrt{\lambda_{\mathrm{SO}}^2 + 4\left(\beta^2 + \gamma^2\right)}.$$

The instantaneous quantities $\alpha$, $\beta$, $\gamma$ are each given by the sum of a static term, such as $\alpha_0$, arising from pre-strain, and a time-dependent term, such as $\tilde{\alpha} \sin(\omega t)$, from the oscillating SAW drive. For the SiV used in our measurements, for which $\beta, \gamma \ll \lambda_{\mathrm{SO}}$, the instantaneous energy shift of the C transition is given by

$$\Delta E_C(t) \approx \alpha_e(t) - \alpha_g(t),$$

in which the subscripts $e$ and $g$ refer to the excited and ground states. As this is linear in strain, the time-dependent part of the energy shift is independent of any existing pre-strain and is given by

$$\Delta_\alpha \sin(\omega t), \qquad \Delta_\alpha = \tilde{\alpha}_e - \tilde{\alpha}_g.$$

This energy shift induces virtual states that couple to the lower excited state, with relative quasi-energies given by integer multiples of the drive frequency,

$$E_k = k \omega, \qquad k = 0, \pm 1, \pm 2, \ldots,$$

and populations

$$P_k = J_k^2\!\left(\frac{\Delta_\alpha}{\omega}\right),$$

in which $J_k$ are Bessel functions of the first kind, and $\Delta_\alpha$ is expressed in the same units as $\omega$.

SAW power estimation. Assuming that the SiV is located at the focus of the Gaussian SAW beam, finite element simulation allows us to determine the value of $\Delta_\alpha = \tilde{\alpha}_e - \tilde{\alpha}_g$ for a given SAW power $P$ from the strain tensor amplitude $\epsilon$. However, in our experiments, the power at the location of the SiV is lower than the input microwave power by an efficiency factor $\eta$. Therefore, the quantity $\Delta_\alpha$ for an input power $P_{\mathrm{in}}$ is given by

$$\Delta_\alpha = \Delta_\alpha^{\mathrm{sim}} \sqrt{\frac{\eta P_{\mathrm{in}}}{P_{\mathrm{sim}}}},$$

where $\Delta_\alpha^{\mathrm{sim}}$ and $P_{\mathrm{sim}}$ are the values obtained from simulation.
By using this relation, we fit the observed relative fluorescence intensities to the populations $P_k$ for different input powers, while varying the parameter $\eta$. The measured and fitted relative fluorescence intensities are shown in Supplementary Fig. 8. We determine that the SiV is oriented longitudinally with respect to the SAW propagation direction, and estimate $\eta = 3.4 \times 10^{-4}$. The various factors contributing to $\eta$ are discussed further in Supplementary Note 7.
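The sideband populations of a sinusoidally frequency-modulated level can be sketched in pure Python. The sketch assumes the standard frequency-modulation result that the k-th sideband carries a weight $J_k^2(m)$, where the modulation index $m$ is the energy-shift amplitude divided by the drive frequency (both in the same units). The function names are illustrative, and the Bessel functions are evaluated from Bessel's integral with the trapezoidal rule rather than from a library routine.

```python
import math

def bessel_j(k, x, steps=2000):
    """Integer-order Bessel function J_k(x) from Bessel's integral,
    J_k(x) = (1/pi) * int_0^pi cos(k*t - x*sin(t)) dt, using the trapezoidal rule."""
    h = math.pi / steps
    f = lambda t: math.cos(k * t - x * math.sin(t))
    total = 0.5 * (f(0.0) + f(math.pi))
    total += sum(f(i * h) for i in range(1, steps))
    return total * h / math.pi

def sideband_populations(mod_index, k_max):
    """Relative weights J_k(m)^2 of sidebands k = -k_max .. k_max."""
    return {k: bessel_j(k, mod_index) ** 2 for k in range(-k_max, k_max + 1)}

def dbm_to_watts(p_dbm):
    """Convert a power in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)

def modulation_index(delta_sim, p_sim, p_in, eta, drive):
    """Index from a simulated shift amplitude delta_sim at power p_sim,
    scaled as sqrt(eta * p_in / p_sim); delta_sim and drive share units."""
    return delta_sim * math.sqrt(eta * p_in / p_sim) / drive

# Illustrative modulation index (hypothetical, not a fitted value).
populations = sideband_populations(1.5, 3)
for k in sorted(populations):
    print(f"k = {k:+d}: {populations[k]:.4f}")
print(f"6 dBm = {1e3 * dbm_to_watts(6.0):.2f} mW")
```

Because the weights sum to one over all k, the relative sideband heights at each input power depend on a single parameter, the modulation index, which is what makes a one-parameter fit of $\eta$ possible.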

B. Eg strain response
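Our calculations are performed in the basis of the eigenstates of the sum of the spin-orbit and axial Zeeman Hamiltonians, as we work in a non-zero magnetic field, and we treat the strain and the transverse magnetic field as perturbations. The initial Hamiltonian is given by

$$H_0 = H_{\mathrm{SO}} + \gamma_S B_Z S_Z,$$

in which we have neglected the effect of the magnetic field on the orbitals, due to the small orbital quenching factor $q = 0.1$ [2]. The eigenstates are given by $\{|e_{g+}\uparrow\rangle, |e_{g-}\downarrow\rangle, |e_{g-}\uparrow\rangle, |e_{g+}\downarrow\rangle\}$, in which $|e_{g\pm}\rangle = |e_{gX}\rangle \pm i |e_{gY}\rangle$ (up to normalization), with energies given by $(\pm\lambda_{\mathrm{SO}} \pm \gamma_S B_Z)/2$. The off-axis magnetic field term is

$$H_{\mathrm{Z},\perp} = \gamma_S \left( B_X S_X + B_Y S_Y \right),$$

which in the eigenstate basis is

$$H_{\mathrm{Z},\perp} = \frac{\gamma_S}{2} \begin{pmatrix} 0 & 0 & 0 & B_- \\ 0 & 0 & B_+ & 0 \\ 0 & B_- & 0 & 0 \\ B_+ & 0 & 0 & 0 \end{pmatrix},$$

in which $B_\pm = B_X \pm i B_Y$. To first order, this perturbs the two lower eigenstates as

$$|e_{g+}\downarrow\rangle' \approx |e_{g+}\downarrow\rangle - \frac{\gamma_S B_-}{2(\lambda_{\mathrm{SO}} + \gamma_S B_Z)} |e_{g+}\uparrow\rangle, \qquad |e_{g-}\uparrow\rangle' \approx |e_{g-}\uparrow\rangle - \frac{\gamma_S B_+}{2(\lambda_{\mathrm{SO}} - \gamma_S B_Z)} |e_{g-}\downarrow\rangle.$$

The strain term in the eigenstate basis is

$$H_{\mathrm{strain}} = \begin{pmatrix} \alpha & 0 & -(\beta + i\gamma) & 0 \\ 0 & \alpha & 0 & -(\beta - i\gamma) \\ -(\beta - i\gamma) & 0 & \alpha & 0 \\ 0 & -(\beta + i\gamma) & 0 & \alpha \end{pmatrix},$$

which introduces a coupling between the perturbed states $|e_{g+}\downarrow\rangle'$ and $|e_{g-}\uparrow\rangle'$, given by

$$\Omega = \frac{\gamma_S \lambda_{\mathrm{SO}} B_+ \left( \tilde{\beta} + i\tilde{\gamma} \right)}{\lambda_{\mathrm{SO}}^2 - \gamma_S^2 B_Z^2} \approx \frac{\gamma_S B_+ \left( \tilde{\beta} + i\tilde{\gamma} \right)}{\lambda_{\mathrm{SO}}}.$$

This is the generalized Rabi frequency when the oscillating strain field is resonant with the $|e_{g+}\downarrow\rangle \leftrightarrow |e_{g-}\uparrow\rangle$ transition, with the instantaneous quantities $\beta$, $\gamma$ replaced by their oscillation amplitudes $\tilde{\beta}$, $\tilde{\gamma}$. The magnitude of $\Omega$ is equal to the measured Rabi frequency.

SAW power estimation. Similar to the case of the $A_{1g}$ strain response, the Rabi frequency for a given SAW power is estimated from simulation, and the measured Rabi frequencies are given by

$$\Omega = \Omega_{\mathrm{sim}} \sqrt{\frac{\eta P_{\mathrm{in}}}{P_{\mathrm{sim}}}}.$$

The linear fit of $\Omega$ against $\sqrt{P_{\mathrm{in}}}$ (Fig. 4(a) in the main text) gives $\eta = 3.7 \times 10^{-5}$. This differs from the efficiency factor estimated from the $A_{1g}$ strain response by a factor of 9.2, and potential reasons for this discrepancy are discussed in Supplementary Note 7.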
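The extraction of an efficiency factor from the power dependence of the Rabi frequency can be sketched as follows. The sketch assumes the scaling $\Omega = \Omega_{\mathrm{sim}} \sqrt{\eta P_{\mathrm{in}} / P_{\mathrm{sim}}}$, so that a straight-line fit of $\Omega$ against $\sqrt{P_{\mathrm{in}}}$ through the origin has slope $\Omega_{\mathrm{sim}} \sqrt{\eta / P_{\mathrm{sim}}}$. The simulated reference values and the data points are hypothetical and synthetic; they are generated from the quoted $\eta$ only to show that the procedure recovers it.

```python
import math

def slope_through_origin(xs, ys):
    """Least-squares slope of y = s * x with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def efficiency_from_slope(slope, omega_sim, p_sim):
    """Invert slope = omega_sim * sqrt(eta / p_sim) for eta."""
    return p_sim * (slope / omega_sim) ** 2

# Hypothetical simulation reference: Rabi frequency omega_sim at SAW power p_sim.
OMEGA_SIM_MHZ = 100.0
P_SIM_W = 1.0e-3
ETA_TRUE = 3.7e-5  # value quoted in the text, used to synthesize the data

# Synthetic, noise-free data points (not measured values).
powers_w = [1e-3, 2e-3, 4e-3, 8e-3]
rabi_mhz = [OMEGA_SIM_MHZ * math.sqrt(ETA_TRUE * p / P_SIM_W) for p in powers_w]

slope = slope_through_origin([math.sqrt(p) for p in powers_w], rabi_mhz)
eta_fit = efficiency_from_slope(slope, OMEGA_SIM_MHZ, P_SIM_W)
print(f"fitted eta = {eta_fit:.3e}")
```

Fitting through the origin encodes the assumption that the Rabi frequency vanishes at zero acoustic power; with real data, a non-zero intercept would be a sign of an additional driving mechanism or an offset in the power calibration.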

Supplementary Note 7: SAW power efficiency
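The power efficiency factors estimated from the $A_{1g}$ strain response and the $E_g$ strain response (Rabi oscillations) are given by

$$\eta_{A_{1g}} = 3.4 \times 10^{-4}, \qquad \eta_{E_g} = 3.7 \times 10^{-5},$$

as mentioned in Supplementary Note 6. These factors can be expressed as a product of several individual efficiencies:

$$\eta_{A_{1g}} = \eta_{\mathrm{device}} = \eta_{\mathrm{input}} \, \eta_{\mathrm{propagation}} \, \eta_{\mathrm{alignment}}, \qquad \eta_{E_g} = \eta_{\mathrm{device}} \, \eta_{\mathrm{magnetic}} \, \eta_{\mathrm{pre\text{-}strain}}.$$

The interpretation of each of these factors is explained below.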
1. $\eta_{\mathrm{input}}$ is the conversion efficiency of the input microwaves to SAWs by the transducer. It is estimated to be about $8.8 \times 10^{-2}$ from the electrical $S_{11}$ measurement (Fig. 2(b) in the main text).
2. $\eta_{\mathrm{propagation}}$ accounts for the propagation losses of the SAWs travelling from the IDT to the SiV. Assuming that the conversion efficiencies for microwaves to SAWs and SAWs to microwaves are the same, this factor is estimated to be about $3.2 \times 10^{-1}$, which we infer from the electrical $S_{11}$ and $S_{21}$ measurements.
3. $\eta_{\mathrm{alignment}}$ accounts for the spatial misalignment between the SiV and the focus of the Gaussian SAW beam. Due to the large separation between the two IDTs in our device, it is not possible to determine the exact location of the focus during experiments, as both IDTs cannot be imaged simultaneously within one field of view of our confocal microscope. To identify the focal point, we first locate one IDT and then translate the field of view to where we expect the focus to be, after which we search for SiVs. It is possible to be misaligned by up to a few µm. The effect of this misalignment on the efficiency factor is shown in Supplementary Fig. 9. Assuming a worst-case misalignment of 5 µm, this factor can be as low as $3.9 \times 10^{-3}$. In addition, misalignment along y can also introduce $\epsilon_{yy}$, $\epsilon_{xy}$, and $\epsilon_{zy}$ components in the strain tensor, which changes the relations between $\Delta_\alpha$, $\Omega$, and $P$. We do not attempt to evaluate these effects in detail.
4. $\eta_{\mathrm{magnetic}}$ accounts for the uncertainty in the magnetic field direction. In our simulations, we assume that the magnetic field is oriented along the [001] direction. However, the direction of the magnetic field lines of a permanent magnet varies significantly with spatial location. This introduces some uncertainty in the magnetic field components, which can change the Rabi frequency.
5. $\eta_{\mathrm{pre\text{-}strain}}$ accounts for pre-strain at the SiV. The Rabi frequency from the $E_g$ strain response is significantly reduced by any existing pre-strain [3], whereas the $A_{1g}$ strain response is robust against pre-strain. This factor accounts for the difference.
With the estimates for the efficiency factors mentioned above, $\eta_{\mathrm{device}}$ varies between $2.8 \times 10^{-2}$ and $1.1 \times 10^{-4}$ depending on the misalignment. Therefore, the measured efficiency factor $\eta_{\mathrm{device}} = 3.4 \times 10^{-4}$ is within reasonable limits. Additionally, $\eta_{E_g}/\eta_{A_{1g}} = \eta_{\mathrm{magnetic}} \times \eta_{\mathrm{pre\text{-}strain}} = 0.11$ indicates that $\eta_{A_{1g}}$ and $\eta_{E_g}$ are reasonably consistent, suggesting good agreement of our experimental results with the theory of the SiV strain response [3], up to certain effects such as the precision of the magnetic field direction and uncontrollable pre-strain.
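The efficiency budget above is a product of the quoted factors, and the arithmetic can be reproduced with the short Python sketch below. The variable names are illustrative. The "implied alignment" value is our own derived quantity, obtained by assuming that the measured $A_{1g}$ efficiency equals the product of the input, propagation and alignment factors.

```python
# Individual efficiency estimates quoted in the text.
eta_input = 8.8e-2            # microwave-to-SAW conversion
eta_propagation = 3.2e-1      # SAW propagation from the IDT to the SiV
eta_alignment_worst = 3.9e-3  # worst-case 5 um misalignment

# Measured efficiency factors quoted in the text.
eta_a1g = 3.4e-4
eta_eg = 3.7e-5

# Range of the device efficiency: perfect alignment vs worst-case misalignment.
eta_device_best = eta_input * eta_propagation
eta_device_worst = eta_device_best * eta_alignment_worst

# Ratio attributed to the magnetic-field and pre-strain factors.
ratio_eg_to_a1g = eta_eg / eta_a1g

# Alignment factor implied by the measured A1g efficiency (derived, assumption-based).
eta_alignment_implied = eta_a1g / eta_device_best

print(f"device efficiency range: {eta_device_worst:.2e} to {eta_device_best:.2e}")
print(f"eta_Eg / eta_A1g = {ratio_eg_to_a1g:.2f}")
print(f"implied alignment factor = {eta_alignment_implied:.2e}")
```

The measured $A_{1g}$ efficiency falls between the two bounds, and the implied alignment factor is larger than the worst-case value, consistent with a misalignment smaller than 5 µm.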