First-principles calculations of hyperfine interaction, binding energy, and quadrupole coupling for shallow donors in silicon

Spin qubits based on shallow donors in silicon are a promising quantum information technology with enormous potential scalability due to the existence of robust silicon-processing infrastructure. However, the most accurate theories of donor electronic structure lack predictive power because of their reliance on empirical fitting parameters, while predictive ab initio methods have so far been lacking in accuracy due to size of the donor wavefunction compared to typical simulation cells. We show that density functional theory with hybrid and traditional functionals working in tandem can bridge this gap. Our first-principles approach allows remarkable accuracy in binding energies (67 meV for bismuth and 54 meV for arsenic) without the use of empirical fitting. We also obtain reasonable hyperfine parameters (1263 MHz for Bi and 133 MHz for As) and superhyperfine parameters. We demonstrate the importance of a predictive model by showing that hydrostatic strain has much larger effect on the hyperfine structure than predicted by effective mass theory, and by elucidating the underlying mechanisms through symmetry analysis of the shallow donor charge density.


INTRODUCTION
The advent of quantum computers capable of completing tasks beyond the capabilities of the most powerful classical supercomputers represents a new era in computation 1 . In the emerging field of quantum information technology, researchers are exploring a vast array of qubit platforms in a race to build scalable, faulttolerant quantum computers [1][2][3][4][5][6] . Spin-based qubits leveraging atomic clock transitions of shallow donors in silicon are one such promising technology, exhibiting long coherence times, singlequbit control, and high fidelity [7][8][9][10][11] . Though it still faces many obstacles, an important advantage of this qubit platform is its use of doped silicon, allowing the potential for enormous scalability using existing infrastructure, that has been built up by the semiconductor industry over decades.
Silicon in modern electronics is doped using the same kinds of shallow impurities that make up proposed silicon-based qubits. In the case of n-type doping, shallow donors contribute electrons to the conduction-band minimum (CBM), while in p-type doping holes are contributed to the valence-band maximum (VBM). In conventional electronic device theory, it is often sufficient to treat the introduced electrons or holes as free carriers in the unperturbed band structure of the host. However, spin-based qubits rely on the detailed electronic structure of the donor state. According to "effective-mass theory" 12 a donor modifies the band occupied by the donated electron much like a point charge modifies the vacuum states. This creates "hydrogenic" states in which the electron is loosely bound to the donor, analogous to the bonding of an electron to a proton in a free hydrogen atom. Originally studied by Kohn and Luttinger in the 1950s [13][14][15][16] , this model has been extended and improved, but still faces limitations due to its intrinsic approximations. Within hydrogenic effective mass theory, screening of the impurity potential is approximated by using the macroscopic dielectric constant, but actual screening by the valence electrons proceeds in complex material-dependent and impurity-dependent ways. These are known as "central-cell" effects. Further complications arise from the fact that the conduction-band minimum of silicon is a set of six degenerate valleys 12 . These six valleys are not independent, but mix and split through a process known as "valley-orbit coupling". Group theoretic arguments show that the CBM splits into states corresponding to irreducible representations of the T d symmetry group: singlet A 1 (the ground state), doublet E, and triplet T 2 17,18 . Developments in the treatment of shallow impurities have largely focused on corrections for these effects and attempts to include them in models [17][18][19][20][21] . Within effective mass theory, this effort has met with success largely through the use of empirical fitting parameters. We take a different tack, seeking greater predictive power by calculating properties of shallow impurities from first principles.
The ability to accurately model shallow dopants has taken on particular relevance in the context of quantum information science, as exemplified by the recent experiments of Pla et al. 22 . They used electron spin resonance (ESR) to measure transitions between total-spin states of the coupled electron-nucleus spin degree of freedom of shallow bismuth donors in silicon, a system with applications in atomic clock transitions for silicon-based spin qubits 23,24 . They observed a split Hahn echo peak corresponding to the 4; À4 j i$ 5; À5 j i transition. Orientation-dependent measurements suggest that this results from strain caused by the thermal contraction of an aluminum resonator patterned on the surface of the sample. This result raises the question: how does strain cause such a dramatic splitting in the ESR line? Traditional approaches, such as the "valley repopulation model" 25 , fail to provide an explanation. Another model invokes the coupling of the quadrupole moment of the donor nucleus with the electric field gradient (EFG) induced by the electron wavefunction, but accurate information about the strength of this interaction is lacking. Ab initio calculations have already been shown to successfully determine EFGs at nuclei 26 . An accurate calculation of the donor wavefunction will thus allow the determination of the EFG at the nucleus and the strength of the quadrupole coupling. Additional motivation was provided by subsequent experiments 7 that identified an unexpectedly large dependency of the isotropic hyperfine parameter A on the hydrostatic component of strain. We will therefore also explicitly explore the strain dependence of A.
Ab initio approaches to donors have met with some success in the past. Density functional theory (DFT) has been shown to correctly predict hyperfine parameters for highly localized defect states ("deep" impurities) 27,28 , but explicit calculation of the relatively delocalized shallow states within the supercell approach 29 is extremely challenging due to the large spatial extent of the wavefunction. Real-space pseudopotential methods have been successful in nanostructures 30,31 . Calculations based on an impurity Green's function approach are in principle more suited, and have been successfully used to study the isotropic hyperfine parameter of shallow donors 32 , to explore their evolution for phosphorus donors at very high strain 33 , and to calculate superhyperfine parameters 34 . However, the Green's function technique is far less intuitive than the supercell approach, and is difficult to generalize.
Calculations of shallow dopant binding energies have also been attempted through the supercell approach. Wang 35 used a potential patching method to calculate supercells up to 64,000 atoms, achieving modest success in the calculation of binding energies of some acceptors, and Zhang et al. 36 improved on this method by incorporating GW corrections based on 64-atom cells, accurately predicting binding energies of various acceptors in Si, GaAs, and GaP. Yamamoto et al. 37 modeled an As donor in Si by performing DFT calculations using the generalized gradient approximation (GGA) for supercells up to 10,648 atoms. They calculated binding energies based on wavefunctions extracted from the supercells and found a good description of the Bohr radius and the binding energy, but their use of an empirical model for the screened impurity potential sacrifices predictive power. More recently, Smith et al. 38 used a similar approach for P in Si. These studies show that, while it is extremely difficult to capture the exponential tail of shallow impurity wavefunctions with firstprinciples methods, DFT may be able to describe the wavefunction correctly in the vicinity of the impurity. This is encouraging because the isotropic hyperfine coupling and quadrupole coupling both depend on the properties of the wavefunction at the donor nucleus. This also suggests that it may be possible to introduce a systematic correction for errors induced by the overlap of the exponential tail of the wavefunction into neighboring supercells.
Even if systematic corrections are applied, the required supercell sizes can only be tackled within DFT by using traditional functionals such as GGA. However, such functionals underestimate electron localization, thus limiting the accuracy of the wavefunction amplitudes in the vicinity of the donor. In this work, we overcome these obstacles by using a tandem approach, judiciously combining GGA results for very large supercells with results obtained with a hybrid functional. which provides a correct description of localization. We are thus able to extrapolate the hybrid functional results to the dilute limit. This technique retains predictive power because it does not use empirical fitting parameters. We will demonstrate the approach using the example of arsenic and bismuth shallow donors in Si, obtaining results that are in excellent agreement with experiment. We also show that the experimentally observed variation of the isotropic hyperfine parameter as a function of strain can be captured in feasible supercell calculations, with central-cell and valley-orbit effects automatically included in the ab initio method.
The present work is on silicon, but the same techniques should work for donors in other multivalley semiconductors such as germanium. Donors in single-valley semiconductors such as gallium arsenide are simpler due to the lack of multivalley effects, and are much better described by effective mass theory 17 . We also expect that our techniques can be generalized to shallow acceptors, including the treatment of the degeneracy at the valence-band maximum.

Binding energies
The binding energy of a shallow donor may be obtained from the Kohn-Sham eigenvalues: where ϵ donor Γ is the Kohn-Sham eigenvalue of the occupied donor state (evaluated at Γ, as discussed in the "Methods" section) and ϵ CB Γ is the Kohn-Sham eigenvalue of the CB in a bulk calculation of a supercell of the same size, also evaluated at Γ for consistency. Because the Kohn-Sham eigenvalues are referenced to the average electrostatic potential, we must adjust for any shifts in the potential due to the presence of the impurity in order to meaningfully compare eigenvalues between bulk and impurity cells. This is accomplished by introducing the correction term ΔV = 〈V bulk − V donor 〉, where 〈 ⋅ 〉 indicates a macroscopic average over a bulk-like region of the cell far from the donor. This term (identical to the potential alignment term used for defect formation energies in the Freysoldt method 39,40 ) aligns the Kohn-Sham levels between the bulk and donor cells, allowing for meaningful comparison. Use of the Kohn-Sham eigenvalues rather than DFT total energies is standard for ab initio calculations of donor binding energies [35][36][37][38] . Methods based on comparing total energies significantly underestimate E b 35 . The quantities used to calculate the binding energy (Eq. (1)) for bismuth and arsenic shallow donors in silicon are given in Table 1, and the resulting binding energies are plotted in Fig. 1. For sufficiently large supercells, the error in the calculated binding energy comes from the exponential tail of the donor wavefunction, which extends into neighboring supercells. This error scales inversely with the volume of the supercell (see the "Methods" section), and therefore we plot our results as a function of 1/N where N is the number of atoms in the supercell. Our results for bismuth donor supercells with n ≥ 4 (N ≥ 512) show a clear linear trend with 1/N, allowing extrapolation to the N → ∞ limit. Extrapolating results obtained with the GGA of Perdew, Burke, and Ernzerhof 41 (PBE) gives a value of 28.3 meV for bismuth donors, significantly underestimating the experimental result of 70.9 meV 17 . For the arsenic donors, the PBE extrapolated value is 22.5 meV, compared to an experimental value of 53.8 meV 17 .
This underestimation of the binding energy can be attributed to the excess delocalization of the wavefunctions inherent in PBE, which results in a lower spin density close to the impurity site and thus a lower Coulomb attraction between the electron and the donor nucleus. This effect has been found in previous work on shallow impurities using DFT [35][36][37][38] . A better description of localization can be achieved with more sophisticated functionals, such as the hybrid functional of Heyd, Scuseria, and Ernzerhof (HSE) 42,43 . Due to the vastly greater computational cost of HSE, we had to restrict ourselves to calculations up to n = 5 (N = 1000). We also note that computational limitations based on the size of the supercell prevented the calculation of the ΔV term for n = 7, so PBE data is only reported up to n = 6.
As we can see from the PBE results, results obtained with the N = 1000 (n = 5) supercells are not quite converged; however, the PBE results also show that N ≥ 512 (n ≥ 4) is within the regime in which the value of the hyperfine parameter A scales as 1/N. We find that the PBE slope, suitably modified to take into account the PBE underestimation of the exchange splitting (as discussed in the Supplementary Methods and shown in Supplementary Fig. 1), can be used to extrapolate the HSE data to the limit N → ∞. Extrapolating the n = 5 HSE data based on this slope gives the final prediction. This procedure gives 66.7 meV for Bi donors and 53.9 meV for As donors, in excellent agreement with the experimental values (70.9 meV for Bi and 53.8 meV for As) 17 . These results show that HSE is able to correct the PBE underestimation of shallow donor binding energies in silicon, while the limitation of HSE supercell size may be corrected by the PBE scaling, achieving remarkable agreement with experiment.
Hyperfine and superhyperfine parameters The isotropic hyperfine parameter (in SI units) is given by: where g e is the electron g-factor, g I is the g-factor of the nucleus in question, μ B is the Bohr magneton, μ N is the nuclear magneton, R is the position of the nucleus, and σ(r) is the spin density. In the non-relativistic case discussed in ref. 27 , δ T is a Dirac delta function, but here a more extended function is used to take relativistic effects into account 28,44 .
Results for the hyperfine parameter of bismuth and arsenic shallow donors are shown in Fig. 2. The error is expected to scale inversely with the volume of the supercell much as in the case of binding energies, so we again plot our results as a function of 1/N. The linear trend in hyperfine parameter for PBE calculations starts at even lower cell size of n ≥ 3 (N ≥ 216) than the binding energies, allowing extrapolation to the N → ∞ limit. Extrapolating  This underestimation of the hyperfine parameter is again due to a lack of localization in the PBE wavefunction, and can be improved using HSE in a similar way as for the binding energies. HSE results show linear scaling in the same regime as PBE and with the same slope, supporting our assertion that PBE scaling extends to HSE results for binding energies. This procedure gives 1262 MHz for Bi donors and 132.5 MHz for As donors (compared with experimental values of 1475 MHz for Bi 22 and 198.3 MHz for As 17 ), providing a vast improvement over PBE.
In addition to the Fermi contact interaction with the donor nucleus, the spin density induced by the donor also leads to socalled "superhyperfine" (shf) interactions with any 29 Si nuclei in the host 32,33 . We calculate shf parameters for various shells of Si atoms using a similar procedure as for the hyperfine parameter A [Eq.
(2); see Supplementary Notes and Supplementary Fig. 2]. The results are displayed in Fig. 3. Because the spin density does not decrease monotonically with distance from the donor ("Kohn-Luttinger oscillations" 14,33 ), the shf parameter peaks at the 4th-nearest-neighbor [A(004)] shell. This is reflected in our calculations (Fig. 3), which are in reasonable agreement with experiment and have comparable accuracy to earlier work using the Green's function method 32 . This provides further evidence (1)), including half the exchange splitting δ ex , which converges differently as a function of supercell size in PBE and HSE (see Supplementary Fig. 1). that our approach is able to accurately describe the shallow-donor spin density.
Electric field gradients and quadrupole couplings Our ability to model shallow donors using first-principles methods allows us to address unexpected strain-induced spin resonance splittings first observed in the experiments by Pla et al. 22 described in the Introduction. We want to identify the mechanism by which strain causes a dramatic splitting in the ESR lines for Bi donors in Si. Traditionally, strain effects have been explained through the "valley repopulation model" 25 , which describes changes in the valley-orbit ground state caused by shifts of the conduction-band valleys. This model provides no explanation for the observed ESR splittings. Pla et al. 22 therefore explored other potential mechanisms for the strain dependence of the ESR splittings. One of these mechanisms is based on a term in the ESR Hamiltonian, which couples the quadrupole moment of the donor nucleus with the electric field gradient (EFG) induced by the electron wavefunction. This model includes a parameter γ, which enters as a scaling term in the quadrupole Hamiltonian. The main contribution to this term should be the Sternheimer antishielding factor, a measure of how the core electrons amplify an externally applied EFG (in this case coming from the shallow donor wavefunction). Pla et al. 22 found that γ = −900 would be required in this model to fit their data, close to the value for isolated Bi + ions (γ = −925.6). However, it is not clear whether this value of γ is realistic for a Bi donor in Si.
Other observations, such as the variation in strain splittings seen for different ESR transitions, are also not well explained by the quadrupole mechanism 22 .
Being able to calculate the EFG from first principles would clearly shed light on this issue. Ab initio calculations using the projector-augmented-wave method have successfully determined EFGs at nuclei without using empirical corrections such as the Sternheimer antishielding factor 26 . An accurate calculation of the donor wavefunction will thus allow the determination of the EFG at the nucleus and the strength of the quadrupole coupling.
We have calculated the EFG at the nucleus of the Bi shallow donor in unstrained silicon and find it to be zero to within the accuracy of our calculations. We estimate this error bar to be approximately 1 V/Å 2 , corresponding to a quadrupole interaction strength of approximately 1 MHz. For the unstrained case, the zero value is as expected, since symmetry arguments show that the EFG should vanish 22 . However, the calculated EFGs remain zero (within the error bar) when strain is applied. This runs counter to the quadrupole mechanism considered, amongst others, in Pla et al. 22 , as this required a quadrupole interaction on the order of 100 MHz. This result would appear to rule out the quadrupole splitting as one of the potential mechanisms for the strain-induced splittings observed in ESR.
These results highlight the need for a different mechanism to explain the observed splitting. In the next section, we will explore the shift of the isotropic hyperfine parameter with the hydrostatic component of strain.

Strain
We now address how the splittings of the ESR lines observed in the refs. 7,22 can be explained by the strain dependence of the isotropic hyperfine parameter A. Symmetry restricts the form of the strain dependence of the isotropic hyperfine parameter 7 .   29 Si in various shells surrounding the Bi donor are shown in blue squares, and shells surrounding the As donor are plotted as orange circles. We plot the negative of the shf parameter because the 29 Si g factor is negative.
The predicted values are extrapolated to the N → ∞ limit in a similar way to the hyperfine parameter, though larger supercells are required to see 1/N scaling (see Supplementary Fig. 2). Top axis ticks identify the neighbor shells, including the alphabetic labels from refs. 32,57 . Shown for comparison are experimental values from ref. 57  Expanded to second order in strain, this dependence is given by where A 0 is the value in the absence of strain. Shear terms can be shown to be negligible 7 and are not discussed here. The parameters K and L can be obtained by fitting to experimental results 7,22 . To pinpoint the underlying mechanisms of the ESR line splittings, the K and L values calculated for these mechanisms need to agree with experiment. The common assumption is that the variation in A would be attributed to the non-hydrostatic component of strain; this is referred to as the "valley repopulation model" 25 and requires an assessment of the parameter L. L can be estimated as L ¼ À2Ξ 2 u =ð9Δ 2 Þ, where Ξ u ¼ 8:6 eV is the uniaxial deformation potential of the CBM and Δ is the splitting between the A 1 and the E valley-orbit states 7,25 . This estimate results in L = −9720, in good agreement with L = −9800 ± 2100 fit to experimental data and L = −9064 extracted from tight-binding theory 7,22 .
However, the quadratic dependence of A on nonhydrostatic strain implies that it is only ever reduced from the unstrained value, while the experimental results require both positive and negative contributions; i.e., the hydrostatic component, proportional to the parameter K, is playing an important role. Since hydrostatic strain acts on all the valleys equally, it does not lead to a repopulation effect. However, the magnitude of the wavefunction at the donor nucleus-which determines A-will be affected by hydrostatic strain, leading to a nonzero value of K.
The traditional assessment of this dependence on hydrostatic strain is based on effective mass theory, in which the dependence occurs through variations in the dielectric constant ε and effective mass m*. The donor Bohr radius a D is proportional to ε/m* 12 . Literature shows that, for small strains, these properties vary linearly as a function of hydrostatic pressure. According to ref. 45 , Δε/ε = − 2.8 × 10 −3 for P = 1 GPa, and according to ref. 46 , Δm*/ m* = −1.4 × 10 −3 for P = 1800 kg/cm 2 . Using the bulk modulus of silicon (97.88 GPa) 47 , these results give the dependence of ε and m* on hydrostatic strain ϵ xx = ϵ yy = ϵ zz = ϵ: Δε ε ¼ 0:819ϵ and Δm Ã m Ã ¼ 2:322ϵ: The donor Bohr radius is therefore given by The effective-mass-theory wavefunction is proportional to a À3=2 D and the hyperfine parameter is proportional to the square of the wavefunction at the nucleus (Eq. (2)) 12 . Therefore the hyperfine parameter is In the notation of Eq. (3), this corresponds to K = 4.524. The experimentally observed value is K = 19.1 22 . Based on effective mass theory, we would thus conclude that impact of hydrostatic strain on the hyperfine parameter is much too small to explain the experimental splittings. But is effective mass theory sufficiently accurate? We have already seen that central cell effects are key to the accuracy of our approach. The issue became particularly relevant in light of subsequent experiments 7 , that confirmed the unexpectedly large dependence of the isotropic hyperfine parameter A on the hydrostatic component of strain. A very informative treatment of the shortcomings of effective mass theory and how they can be overcome in first-principles calculations was provided by Huebl et al. in ref. 33 . They showed that, for P donors in strained silicon, valley repopulation theory is not adequate to describe the change in hyperfine parameter. They were able to achieve good agreement with experiment by performing ab initio calculations within the Green's function method.
We will therefore use our first-principles approach to explicitly explore the strain dependence for Bi donors. The calculated dependence of the isotropic hyperfine parameter is shown in Fig.  4 for a n = 6 (N = 1728) supercell using PBE; the behavior as a function of supercell size and functional will be discussed below. Figure 4a shows the data for hydrostatic strain. In each case the atomic structure was fully relaxed. The bismuth-silicon bond lengths in the unstrained case are 2.651 Å, compared to 2.367 Å in bulk silicon. When hydrostatic strain is applied, all Bi-Si and Si-Si bond lengths simply scale with the hydrostatic strain to within better than 0.001 Å. The values of A/A 0 − 1 in Fig. 4 are well described by a linear fit up to the highest strains tested (2 × 10 −3 ), with a coefficient that corresponds to K = 17.5 (Eq. (3)). This value is in good agreement with the experimental value K = 19.1 7 . Figure 4b shows the results for uniaxial strain. We varied ϵ zz up to 2 × 10 −3 , keeping ϵ xx = ϵ yy = 0. The calculated points are fitted to Eq. (3) using the K value obtained from the hydrostatic strain calculations. Deviations from the quadratic behavior are observed for higher strains, consistent with the higher-order repopulation effects. The fit results in L = −11,700, in reasonable agreement with the values L = −9720 predicted by the valley repopulation model and L = −9064 calculated from tight-binding theory 7,22 . The reason for the decrease in L with uniaxial strain will be discussed in the subsection on "Irreducible representations".
We now discuss the sensitivity of the strain results to supercell size and choice of functional. The calculation of the hyperfine parameter as a function of hydrostatic strain [ Fig. 4a for n = 6, N = 1728] was repeated at different supercell sizes and subjected to similar scaling analysis as binding energies (Fig. 1) and hyperfine parameters (Fig. 2). The results of this analysis are shown in Fig. 5. K also shows linear scaling with 1/N, but it begins at a higher N value (corresponding to n = 4) than observed for the hyperfine parameter itself. Extrapolation to the dilute limit gives a coefficient of K = 20.2, in good agreement with the experimental value K = 19.1 7 .
As to the influence of the functional, as shown in Fig. 5 the HSE values for n = 2, 3, and 4 are very close to the PBE results. We therefore feel that the PBE results can be considered accurate to within an error bar of ±1. We thus find that, while the A parameter itself requires a calculation using an advanced functional such as HSE, its variation with strain (which presumably is determined by a redistribution of the wavefunction as a function of volume) is adequately described with PBE.
Our results clearly show that the isotropic hyperfine parameter depends linearly on the hydrostatic component of strain with a coefficient that significantly differs from that predicted by effective mass theory, highlighting the importance of valleyorbit coupling and central-cell corrections.

Irreducible representations
The symmetry and group-theoretic properties of the ground-state wavefunction are key to all of our results. As discussed in the "Introduction" section, the 6-fold degenerate CBM splits into states corresponding to irreducible representations (irreps) of the T d symmetry group: singlet A 1 , doublet E, and triplet T 2 17,18 . The A 1 wavefunction (the fully symmetric state) has a peak at the center while the others have nodes. This allows the A 1 state to maximize the Coulomb binding energy, making it the ground state, with an energy that is significantly lowered compared to effective mass theory. This also means that the hyperfine parameter is directly related to the amount of A 1 character in the ground state; as more of the other irreps are mixed in, the hyperfine parameter decreases. In the case of uniaxial strain (Fig. 4b), the symmetry is lowered, at the expense of the weight of the fully symmetric A 1 state, and A decreases for both compressive and expansive uniaxial strain.
To demonstrate that our calculations correctly capture these symmetry effects, we project out the portions of the donor wavefunctions which transform according to the various irreps of T d . We use the projection operator where j indexes the various irreps, l j is the dimension of the irrep, h is the order of the T d group, R is summed over all the symmetry operations in the group, and χ(R) j is the character of the operation in the jth representation 48 . With cell volume Ω, we define the fraction of the spin density ρ which transforms under the jth irrep The x A1 values for bismuth donors as a function of strain ϵ zz with ϵ xx = ϵ yy = 0 are shown in Fig. 6. At zero strain, ρ transforms almost entirely under the A 1 representation, and as uniaxial strain is applied, the fraction of ρ which transforms under A 1 decreases, following a trend similar to the hyperfine parameter (Fig. 4b).
Visualizations of the projected spin density are shown in Fig. 7. The "Kohn-Luttinger oscillations" evident in the shf parameters (Fig. 3) may also be seen in the shape of the visualized spin density, which differs significantly from the effective-mass-theory prediction.

DISCUSSION
The study of shallow impurity wavefunctions has a long history, beginning with Kohn-Luttinger effective mass theory in the 1950s. We have overcome some of the challenges to modeling these wavefunctions from first principles, allowing us to employ DFT calculations which fully include central-cell and valley-orbit effects. This allows us to study the evolution of the hyperfine parameter and quadrupole coupling of shallow donors as a function of strain without the use of empirical fitting parameters.
We have calculated accurate hyperfine parameters, superhyperfine parameters, and binding energies of bismuth and arsenic shallow donors by using PBE, identifying a 1/N scaling with supercell size and extrapolating to the N → ∞ limit, and correcting the results for PBE self-interaction error using the HSE hybrid functional. This represents a significant step forward in the ab initio simulation of shallow donors in silicon, and we expect it to be generalizable to other shallow impurities in silicon and other materials.
The predictive nature of our model allows us to explore other properties based on the calculated electron density. We find that the electric field gradients at the site of the nucleus are negligibly low, even when strain is applied. This suggests that the splittings observed by Pla et al. 22 in ESR experiments on Bi donors in Si do not in fact arise from quadrupole interactions. Instead, our calculations lend strong support to the alternative mechanism explored in the ref. 7 , which suggests a large linear dependence of the isotropic hyperfine parameter A on the hydrostatic component of strain. Our calculations find just such a dependence, with a  coefficient K = 20.2 that is in good agreement with the experimental value K = 19.1. The linear dependence on hydrostatic strain is significantly larger than the dependence predicted by effective mass theory.
This accurate, predictive, and generalizable model represents an important step in improving the ab initio description of shallow donors and their hyperfine structure. These theoretical tools go hand-in-hand with experiments, and can be instrumental in the ongoing effort to develop silicon-based spin qubits.

Density functional theory
We use spin-polarized DFT with the projector-augmented wave method 49 in the Vienna Ab-initio Simulation Package (VASP) 50 . Calculations employed the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional 41 as well as the hybrid functional of Heyd, Scuseria, and Ernzerhof (HSE) 42,43 with the standard parameters. This predicts a band gap of 1.16 eV, in very good agreement with the experimental zero-temperature gap of 1.17 eV 51 , suggesting that this functional provides a good description of the electronic structure of silicon in agreement with previous work 52 . The PBE value is 0.63 eV, The HSE-predicted dielectric constant of 11.1 is in good agreement with the experimental value 11.7 53 , and the predicted lattice constant 5.433 Å is in excellent agreement with the experimental 5.431 Å 54 .
Hyperfine parameters were calculated following the methodology described in the refs. 28,44,55,56 . The core spin polarization is taken into account through the frozen valence approximation developed in the ref. 56 and validated in the context of HSE in the ref. 28 . We use the VASP standard PAW pseudopotentials for the PBE functional, with [Xe]4f 14 in the bismuth core and [Ne]3s 2 3p 6 in the arsenic core. Electric field gradients and quadrupole couplings were calculated using the method of ref. 26 . Calculations were performed in supercells that are n × n × n multiples of the conventional 8-atom cubic cell. The largest PBE supercell corresponds to n = 7, containing N = 2744 atoms. The largest HSE supercell corresponds to n = 5, containing N = 1000 atoms. A plane-wave basis with an energy cutoff of 250 eV was used and the supercells were relaxed so that all forces were smaller than 1 meV/Å. Details on the computational cost of the hybrid functional calculations may be found in the Supplementary Discussion. Eigenvalue differences are converged to within 0.8 meV with respect to the cutoff, and hyperfine parameters are converged to within 2.5%.

Finite size effects
The Bohr radius of a shallow donor in silicon within effective mass theory is approximately 23.8 Å. This is only slightly smaller than the side lengths of the largest (n = 7, N = 2744) supercells we use: 38.3 Å. The Coulomb envelope only drops to 1% of its central value 54.7 Å from the center. Therefore, a significant part of this exponential tail necessarily overlaps into neighboring supercells. We will therefore need to extrapolate to the N → ∞ limit, where N is the number of atoms in the supercell, in order to calculate physical results. For large enough supercells, the error scales inversely with the volume of the supercell, or equivalently with 1/N. This involves fitting lines to the data versus 1/N. All fits are based on the linear least-squares method.

Occupation of states
Occupancies of the Kohn-Sham states in DFT are typically set through a "smearing" procedure. This is done to avoid various convergence issues that can arise when bands are partially occupied, and a scheme for extrapolating to zero smearing is typically included. This procedure is benign for most situations, but it leads to severe problems in this particular case. The energy spacing between valley-orbit split states of the donor wavefunction is approximately 10 meV, comparable in size to the smallest smearing parameters typically used; smearing will mix states other than the ground state into the final charge density. This is a problem because the A 1 state has a peak at the nucleus, while the other states have nodes. We are comparing to experiments performed at low temperature, so the measured properties are determined entirely by the ground state. Mixing with excited states of the donor will defeat any chance of calculating the hyperfine parameter correctly. To overcome this issue, we do not use smearing in our final calculations, fixing the spin-up occupancy of the lowest-energy conduction band to one. This requires care to ensure proper convergence, but has proven feasible through the use of intermediate calculations which include small amounts of smearing.

Brillouin zone sampling
The A 1 valley-orbit ground state is made up of a combination of all six conduction-band valley states. Therefore a correct calculation of the donor wavefunction must include contributions from each valley. For the Fig. 7 Visualization of symmetry in the electron density. Results of projecting ρ onto the A 1 representation for a bismuth donor calculation with ϵ xx = 10 −3 uniaxial strain, in a N = 1728 (n = 6) supercell. a The portion of ρ which transforms as A 1 . Isosurface value 2.7 × 10 −4 e/Å 3 . b The remainder after projection. Isosurface value 5.4 × 10 −5 e/Å 3 . equilibrium structure, and under hydrostatic strain, the six valleys are equivalent and equally occupied. Under uniaxial deformation, splitting of the valleys occurs, and ensuring correct sampling of all of the valleys (which is necessary to obtain the correct valley-orbit splitting) could be tricky.
Most of the supercells used in the present study are large enough to ensure that, in reciprocal space, the conduction-band valleys are folded back to a point very close to the zone center. Therefore sampling the Brillouin zone at a single k point automatically allows for all of the proper interactions and mixing between the valleys in the self-consistent calculation. Our calculations use the Γ point to maintain an unbiased sampling of the valleys and to reduce the computational demand. A detailed analysis of Brillouin-zone sampling was included in the ref. 38 , and also reached the conclusion that sampling at the Γ point provided reliable results.
We verified that our calculations capture the correct valley-orbit state through symmetry analysis of the wavefunction.