replying to V. Soni et al. Nature Communications https://doi.org/10.1038/s41467-024-46261-4 (2024)

In comments on our paper “Within-host genetic diversity of SARS-CoV-2 lineages in unvaccinated and vaccinated individuals,” Soni et al. argue that the methods we employed for detecting natural selection are unreliable. Our study examined nucleotide diversity (π)1, the mean number of pairwise differences per nucleotide site, which is a common metric for quantifying within-host viral polymorphism2. Comparison of π at nonsynonymous (πN) and synonymous (πS) sites is thought to provide evidence for positive (πN > πS or πN/πS > 1) or purifying (πN < πS or πN/πS < 1) selection acting on amino acid changes3,4. This method has been used to study the intrahost evolution of viruses like influenza, often with evidence of positive selection in regions encoding immune epitopes5. Intrahost πN and πS have also been examined in SARS-CoV-2 (refs. 6,7,8,9,10), and our study11 compared πN − πS across distinct COVID-19 patient subsets. We found that breakthrough infections in 2- or 3-dose Comirnaty and CoronaVac vaccinated individuals do not show elevated viral πN and may not change the direction of selection. These negative conclusions inherently control for viral demographic factors like bottlenecks that operate similarly in each patient, allowing straightforward interpretation of πN − πS differences.
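As an illustration of the definition of π given above, the following minimal sketch computes the mean number of pairwise differences at a single nucleotide site from read counts of each allele. The function name and input layout are hypothetical; the sketch does not partition differences into nonsynonymous and synonymous classes, which requires codon context, and it is not the pipeline used in the original study.

```python
def site_pi(allele_counts):
    """Mean pairwise differences at one site among n sequenced reads.

    allele_counts: read counts for each allele at the site, e.g. [A, C, G, T].
    Returns the fraction of read pairs that differ at the site.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0  # fewer than two reads: no pairs to compare
    # Ordered pairs of reads carrying the same allele
    same_pairs = sum(c * (c - 1) for c in allele_counts)
    return 1.0 - same_pairs / (n * (n - 1))


# A site that is fixed for one allele has no diversity;
# a site with a minor variant has a small positive value.
fixed = site_pi([1000, 0, 0, 0])
variant = site_pi([950, 50, 0, 0])
```

Averaging such per-site values over the sites of interest gives a diversity estimate per nucleotide site.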

Soni et al.12 challenge our null hypothesis of πN − πS = 0 (i.e., πN = πS), instead proposing that simulation is necessary for defining a precise expectation under neutrality. Indeed, πN − πS has widely recognized limitations13; for detecting positive selection, it is both overly conservative (may fail to detect positive selection when it has occurred) and susceptible to false positives (may spuriously detect positive selection when it has not occurred). Value is therefore placed on complementing the metric with other approaches. While recognizing these points, we believe the criticisms of Soni et al. may not be entirely valid. In fact, their own simulations demonstrate that selection is often readily detectable using a simple πN versus πS method.

First, Soni et al. employ analytical methods that do not reflect our study11. In our approach, the codon is treated as the observational unit, such that πN and πS values for each codon are averaged across all 2,820 intrahost samples or subsets thereof. Selection is then evaluated with a Z-test of the null hypothesis πN − πS = 0 by bootstrapping codons. This detects codon-specific patterns that are consistent across samples; takes advantage of the independent diversity generated in each sample; and compensates for the typically small number of intrahost single nucleotide variants (iSNVs) that pass quality control for any one sample. In contrast, Soni et al.12 use the sample as the observational unit and report values of πN and πS for 200 replicates, analogous to only 200 samples. Their simulations also fail to recapitulate key aspects of the observed biological data, including πN − πS values and numbers of iSNVs per sample (Supplementary Fig. 1).
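The codon-based bootstrap Z-test described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code of the original study: each codon is summarised by a hypothetical tuple of sample-averaged nonsynonymous differences, nonsynonymous sites, synonymous differences, and synonymous sites; πN and πS are taken as ratios of sums over codons; and the standard error is the standard deviation of the bootstrap replicates. Function names and the toy data are illustrative only.

```python
import math
import random
import statistics


def pi_diff(codons):
    """piN - piS from per-codon (n_diffs, n_sites, s_diffs, s_sites) tuples."""
    n_diffs = sum(c[0] for c in codons)
    n_sites = sum(c[1] for c in codons)
    s_diffs = sum(c[2] for c in codons)
    s_sites = sum(c[3] for c in codons)
    return n_diffs / n_sites - s_diffs / s_sites


def bootstrap_z_test(codons, n_boot=1000, seed=0):
    """Two-sided Z-test of piN - piS = 0, resampling codons with replacement."""
    rng = random.Random(seed)
    observed = pi_diff(codons)
    replicates = []
    for _ in range(n_boot):
        resampled = rng.choices(codons, k=len(codons))
        replicates.append(pi_diff(resampled))
    se = statistics.stdev(replicates)       # bootstrap standard error
    z = observed / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal P value
    return observed, se, z, p


# Toy data (hypothetical): 300 codons with lower diversity at
# nonsynonymous than synonymous sites, as under purifying selection.
toy_rng = random.Random(1)
toy_codons = [
    (toy_rng.uniform(0, 0.002), 2.2, toy_rng.uniform(0, 0.006), 0.8)
    for _ in range(300)
]
observed, se, z, p = bootstrap_z_test(toy_codons)
```

Because the codon is the resampling unit, the standard error reflects variation among codons in values that have already been averaged across samples.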

Next, Soni et al. report no statistical tests. However, based on data simulated with SLiM14, they suggest that large variances make πN > πS probable even under purifying selection alone. This claim relies on the visual inspection of standard deviations in their Figs. 1–3. To assess it, we used the models of Soni et al. to simulate intrahost data for 100 samples, estimating standard errors of mean πN and πS as in our study. Purifying selection is highly significant for all models (P ≤ 5.0 × 10⁻⁷, Z-tests) (Supplementary Fig. 1). Purifying selection is detected even using their own sample-based approach (P ≤ 1.6 × 10⁻⁶, Wilcoxon Signed Rank tests). Thus, in contrast to their conclusions, a relatively small number of samples has sufficient statistical power to detect widespread selection using both methods.

Soni et al. then offer several simulations of positive selection. First, directional selection is modelled by introducing a single highly beneficial mutation (i.e., a selective sweep) in the context of a neutral/deleterious distribution of mutational fitness effects (DFE). Because the fraction of nonsynonymous mutations that are beneficial (fb) in this scenario is ~0.00007%, it is not surprising that πN − πS fails to detect positive selection. Specifically, πN − πS is tailored to detecting pervasive (multi-site), incomplete positive selection that is ‘caught in the act’. Population genetics theory suggests that the substitution of beneficial mutations takes an average of approximately \(2\ln(2N_{e}s)/s\) generations15. For selection coefficients (s) of 0.01–0.1 and intrahost effective population sizes (Ne) of 10³–10⁵, this implies an average of 45–644 days for SARS-CoV-2 (i.e., 106–1,520 replication cycles of 610 minutes16). A selective sweep is therefore not expected to complete over the course of a typical acute infection within a host. Furthermore, within-host viral evolution is likely to involve trade-offs, compensatory mutations, shifting fitness landscapes, and potentially balancing selection as a result of intrahost heterogeneity and frequency dependence17. In all cases, segregating nonsynonymous mutations will elevate πN.
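The sweep-time arithmetic above can be checked with a short calculation. The sketch below evaluates 2 ln(2Ne·s)/s at the two extreme parameter combinations and converts replication cycles to days using the 610-minute cycle quoted in the text; the function names are illustrative.

```python
import math

CYCLE_MINUTES = 610  # replication cycle length quoted in the text


def sweep_cycles(ne, s):
    """Approximate mean substitution time of a beneficial mutation, in generations."""
    return 2 * math.log(2 * ne * s) / s


def cycles_to_days(cycles, cycle_minutes=CYCLE_MINUTES):
    """Convert replication cycles to days."""
    return cycles * cycle_minutes / (60 * 24)


# Fastest case: strong selection in a small population (s = 0.1, Ne = 10^3)
fast = sweep_cycles(1e3, 0.1)      # about 106 cycles
# Slowest case: weak selection in a large population (s = 0.01, Ne = 10^5)
slow = sweep_cycles(1e5, 0.01)     # about 1,520 cycles

print(round(fast), round(cycles_to_days(fast)))   # 106 45
print(round(slow), round(cycles_to_days(slow)))   # 1520 644
```

The shortest expected sweep (about 45 days) pairs the largest s with the smallest Ne, and the longest (about 644 days) pairs the smallest s with the largest Ne.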

In a second scenario of positive selection, Soni et al. set fb to 1.0% or 9.7% (s = 0.05–0.13) in the context of a DFE derived from Flynn et al. for Mpro (nsp5)18. We again used their models to simulate 100 samples (Fig. 1). Although they claim that πN − πS cannot detect selection, positive selection was highly significant at the whole-genome level for fb = 9.7% (πN/πS = 4.43, P < 2.2 × 10⁻¹⁶), whereas purifying selection was detected for fb = 1.0% (πN/πS = 0.90, P = 0.0033; Z-tests). Thus, under the simulation parameters of Soni et al., positive selection becomes highly significant for fb somewhere in the range 1–10%, due to multiple beneficial mutations segregating at intermediate frequencies.

Fig. 1: Characterization of simulated data generated using models that allow multiple beneficial mutations.

The SLiM14 simulations of Soni et al.12 were modified to generate 100 whole-genome (30 kbp) samples for each of three distributions of mutational fitness effects (DFEs) based on Flynn et al.18 and Bloom & Neher19. Flynn et al.18 refers to a DFE background estimated for Mpro (nsp5), with either 1.0% (blue text and arrow) or 9.7% (green text and arrow) of mutations beneficial (selection coefficients [s] = 0.05–0.13). Bloom & Neher19 (grey arrow) refers to a DFE estimated from publicly available viral consensus sequence data, where the fractions of each mutation effect type were set to the whole-genome values given in Table 1 (bottom row). For the latter, s values were approximated by dividing fitness effects (range −7.14 to 6.17) by 7.14 (maximum absolute value), yielding a range of −1.0 to 0.86. These values were simulated as lethal = −1.0; deleterious = gamma (mean −0.32, shape 1.70); neutral = 0.0; and beneficial = exponential (mean 0.087). For the gamma distribution shape parameter, a maximum likelihood estimate was obtained from the absolute values of all negative s using the MASS::fitdistr() function in R. All other parameters were retained from the scripts of Soni et al.: mutation rate = 2.135 × 10⁻⁶ per site per cycle; recombination rate = 5.5 × 10⁻⁵ per site per cycle; infection bottleneck size = 1; carrying capacity = 100,000; runtime = 168 cycles (https://github.com/vivaksoni/Gu_etal_2023_response, accessed 2023/09/26). Simulated data were analyzed using the method of our original study11, i.e., eliminating iSNVs with frequency <2.5% and estimating πN − πS with codon-based bootstrapping. a, DFEs for nonsynonymous mutations. Violin plots show the emergent s distributions of the three DFE models, each determined by simulating 10,000 mutations. b, Nucleotide diversity under each DFE. Error bars show standard errors of mean πN (red) and πS (blue), each determined using 1,000 bootstrap replicates (codon unit, with codon values calculated as means across all 100 samples). P values refer to two-sided Z-tests of πN = πS (three tests; no adjustment for multiple tests). πN/πS ratios are displayed in grey text; for comparison, the mean empirical πN/πS value observed across all biological samples in our original study11 was 0.62. Scripts, analysis code, input data, and intermediate files are available at https://doi.org/10.5281/zenodo.10552831. Source data are provided as a Source Data file.

To estimate fb for SARS-CoV-2, we utilized the fitness effect calculations of Bloom and Neher19. The central 95% of synonymous mutational effects was considered a null (neutral) distribution, such that nonsynonymous mutations were classified as beneficial if their effects fell above the 97.5th percentile of synonymous mutations. Results are summarized in Table 1. For the whole genome, fb is 1.5%. For individual ORFs, fb ranges from 0.8% (ORF1ab) to 6.6% (ORF7a). For sliding windows of 30 codons such as used in our study11, fb ranges from 0% to 13.7%. Maximum regional fb values occur near Spike codons ~127–175 and ~461–512, overlapping the antigenically important amino-terminal (NTD) and receptor-binding (RBD) domains20. Thus, at the levels of whole ORFs and functional domains, fb for SARS-CoV-2 often falls in a range that allows detection of positive selection by πN − πS.
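The percentile rule described above can be sketched as follows. This is an illustration only: the function name and inputs are hypothetical, the percentile is computed with linear interpolation (an assumption, since the interpolation rule is not specified in the text), and the example values are toy numbers rather than the fitness effect estimates of Bloom and Neher.

```python
import statistics


def fraction_beneficial(syn_effects, nonsyn_effects):
    """Fraction of nonsynonymous effects above the 97.5th percentile
    of the synonymous (null) effect distribution."""
    # n=40 yields cut points every 2.5%; the last one is the 97.5th percentile
    cuts = statistics.quantiles(syn_effects, n=40, method="inclusive")
    upper = cuts[-1]
    n_beneficial = sum(1 for effect in nonsyn_effects if effect > upper)
    return n_beneficial / len(nonsyn_effects)


# Toy example (hypothetical values): the null distribution spans -20..20,
# so only nonsynonymous effects above its 97.5th percentile count as beneficial.
toy_syn = list(range(-20, 21))
toy_nonsyn = [-5, 0, 19, 19.5, 25]
toy_fb = fraction_beneficial(toy_syn, toy_nonsyn)
```

Applying the same function to the mutations of a single ORF or a 30-codon window would give a regional fb, assuming the synonymous null distribution is held fixed.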

Table 1: Estimated fractions of SARS-CoV-2 nonsynonymous mutations that are lethal, deleterious, neutral, and beneficial

Last, we modified the simulations of Soni et al. by introducing a DFE based on the nonsynonymous fitness effect estimates of Bloom and Neher19. Whole-genome mutation effect fractions (bottom row of Table 1) were used as a background. Deleterious and beneficial selection coefficients (s) were modelled using gamma (mean = −0.32, shape = 1.70) and exponential (mean = 0.087) distributions, respectively. Under these parameters, at the whole-genome level, selection was not significant (πN/πS = 1.03, P = 0.51) (Fig. 1b bottom). At the level of 30-codon sliding windows, we considered regions with πN > πS to be candidates for positive selection at various P value cut-offs, detecting 131 true positives (windows with at least one beneficial mutation) and 0 false positives for P < 0.0124. Thus, even under a nonideal scenario where the precise genomic targets of selection (codons with beneficial mutations) differ stochastically across samples, sliding windows are a reasonable candidate generator for regions undergoing positive selection.
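The DFE parameterization above can be illustrated by drawing selection coefficients directly. The sketch below assumes the gamma distribution is specified by mean and shape (so that scale = |mean|/shape) and uses the rescaling by 7.14 described in the Fig. 1 legend. The class fractions are hypothetical placeholders, because the Table 1 values are not reproduced here; only the beneficial fraction of 0.015 follows the whole-genome fb given in the text. This is not the SLiM model itself.

```python
import random

MAX_ABS_EFFECT = 7.14          # maximum absolute fitness effect (Fig. 1 legend)
DELETERIOUS_MEAN = 0.32        # magnitude of the gamma mean
DELETERIOUS_SHAPE = 1.70
BENEFICIAL_MEAN = 0.087

# HYPOTHETICAL placeholder fractions, not the Table 1 values;
# only "beneficial" follows the whole-genome fb of 1.5% given in the text.
PLACEHOLDER_FRACTIONS = {
    "lethal": 0.10,
    "deleterious": 0.40,
    "neutral": 0.485,
    "beneficial": 0.015,
}


def scale_effect(effect):
    """Rescale a fitness effect to an approximate selection coefficient."""
    return effect / MAX_ABS_EFFECT


def draw_deleterious(rng):
    """Negative s from a gamma with mean -0.32 and shape 1.70."""
    scale = DELETERIOUS_MEAN / DELETERIOUS_SHAPE
    return -rng.gammavariate(DELETERIOUS_SHAPE, scale)


def draw_beneficial(rng):
    """Positive s from an exponential with mean 0.087."""
    return rng.expovariate(1.0 / BENEFICIAL_MEAN)


def draw_s(rng, fractions=PLACEHOLDER_FRACTIONS):
    """Draw one selection coefficient from the four-class mixture."""
    classes = list(fractions)
    weights = [fractions[c] for c in classes]
    cls = rng.choices(classes, weights=weights)[0]
    if cls == "lethal":
        return -1.0
    if cls == "deleterious":
        return draw_deleterious(rng)
    if cls == "neutral":
        return 0.0
    return draw_beneficial(rng)


rng = random.Random(42)
emergent_s = [draw_s(rng) for _ in range(10_000)]
```

Plotting the values in `emergent_s` would give an emergent s distribution comparable in spirit to Fig. 1a, subject to the placeholder fractions.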

All simulation results reported by Soni et al. and herein are subject to many limitations and likely do not reflect biological reality. First, DFEs were derived from functional assays18 or clinical isolates19 and therefore describe between-host evolution, but it is known that purifying selection is weaker within hosts6,21. Second, the models may contain important misspecifications, including (1) sequencing coverage of only 100 effective reads (median coverage in our study was 20,782 reads); (2) 2/3 of sites nonsynonymous (compared to ~3/4 in most real ORFs); (3) s > 1.0 in a SLiM non-Wright-Fisher context (Soni et al. Figure 2); (4) intrahost dynamics that may deviate from expected viral population sizes; and (5) no tendency for the same site to be under similar selection pressures across multiple samples (e.g., no convergent selected changes). Model complexity potentiates increased misspecification bias, and it is important for both biological parameters and analytical methods to match between simulated and empirical data.

To summarize, πN − πS has limitations. Care must be exercised, as factors other than positive selection can yield πN > πS, especially in short genome regions where πS is subject to stochastic fluctuation. The expected value of πN/πS depends on fb and DFE properties. More work is needed to determine the precise values of fb necessary for detecting positive selection, intrahost DFEs, and additional criteria for lowering the false-discovery rate (e.g., a minimum πN cutoff). All parameters are likely to vary by host, virus, lineage, and many other contexts. SLiM offers unprecedented opportunities for simulating complex evolutionary scenarios in order to test specific hypotheses14. Nevertheless, we maintain that simple methods like πN − πS have value. In the same way, simple dN/dS analyses continue to yield highly informative results22 even though viral consensus sequences do not incorporate real-world complexity, and each site in a genome may in reality follow its own ‘model’ of evolution which changes over time23. As the aphorism suggests, the question is not whether models are realistic, but rather whether they are useful24. While more advanced methods are always welcome, there is no one ‘right’ way to analyze evolutionary genomics data23.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.