Reply to: Population genetic considerations regarding the interpretation of within-patient SARS-CoV-2 polymorphism data

Nelson, Chase W.; Poon, Leo L. M.; Gu, Haogao

doi:10.1038/s41467-024-46262-3

Matters Arising
Open access
Published: 16 April 2024

Nature Communications volume 15, Article number: 3239 (2024)

The Original Article was published on 16 April 2024

replying to V. Soni et al. Nature Communications https://doi.org/10.1038/s41467-024-46261-4 (2024)

In comments on our paper “Within-host genetic diversity of SARS-CoV-2 lineages in unvaccinated and vaccinated individuals,” Soni et al. argue that the methods we employed for detecting natural selection are unreliable. Our study examined nucleotide diversity (π)¹, the mean number of pairwise differences per nucleotide site, which is a common metric for quantifying within-host viral polymorphism². Comparison of π at nonsynonymous (π_N) and synonymous (π_S) sites is thought to provide evidence for positive (π_N > π_S, or π_N/π_S > 1) or purifying (π_N < π_S, or π_N/π_S < 1) selection acting on amino acid changes^3,4. This method has been used to study the intrahost evolution of viruses such as influenza, often yielding evidence of positive selection in regions encoding immune epitopes⁵. Intrahost π_N and π_S have also been examined in SARS-CoV-2^6–10, and our study¹¹ compared π_N – π_S across distinct subsets of COVID-19 patients. We found that breakthrough infections in individuals vaccinated with two or three doses of Comirnaty or CoronaVac do not show elevated viral π_N and may not change the direction of selection. Because these conclusions are negative (i.e., they report no difference between patient groups), they inherently control for viral demographic factors, such as bottlenecks, that operate similarly in each patient, allowing straightforward interpretation of differences in π_N – π_S.
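The per-site quantity underlying π can be illustrated with a short Python sketch. It assumes that allele counts from mapped reads are available for each site and computes, for each site, the proportion of read pairs (drawn without replacement) that carry different alleles, then averages over sites. The function names are hypothetical, and the partitioning of sites and differences into nonsynonymous and synonymous classes (as done by tools such as SNPGenie⁴) is not shown.

```python
def site_pi(allele_counts):
    """Nucleotide diversity at one site from read counts per allele.

    Equals the fraction of read pairs (drawn without replacement) that
    carry different alleles: (n^2 - sum(c_i^2)) / (n * (n - 1)).
    """
    counts = [c for c in allele_counts.values() if c > 0]
    n = sum(counts)
    if n < 2:
        return 0.0  # undefined with fewer than two reads; treated as 0 here
    return (n * n - sum(c * c for c in counts)) / (n * (n - 1))


def mean_pi(sites):
    """Mean pairwise differences per site over a list of per-site count dicts."""
    if not sites:
        raise ValueError("no sites supplied")
    return sum(site_pi(s) for s in sites) / len(sites)


# Hypothetical example: one polymorphic site (90 A, 10 G reads) and one monomorphic site
example = [{"A": 90, "G": 10}, {"C": 100}]
print(round(site_pi(example[0]), 4))  # 0.1818
print(round(mean_pi(example), 4))     # 0.0909
```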

Soni et al.¹² challenge our null hypothesis of π_N – π_S = 0 (i.e., π_N = π_S), proposing instead that simulation is necessary to define a precise expectation under neutrality. Indeed, π_N – π_S has widely recognized limitations¹³: for detecting positive selection, it is both overly conservative (it may fail to detect positive selection that has occurred) and susceptible to false positives (it may spuriously detect positive selection that has not occurred). It is therefore valuable to complement the metric with other approaches. While recognizing these points, we believe that the criticisms of Soni et al. are not entirely valid. In fact, their own simulations demonstrate that selection is often readily detectable by a simple comparison of π_N and π_S.

First, Soni et al. employ analytical methods that do not reflect those of our study¹¹. In our approach, the codon is treated as the observational unit: π_N and π_S values for each codon are averaged across all 2,820 intrahost samples or subsets thereof. Selection is then evaluated with a Z-test of the null hypothesis π_N – π_S = 0, with the standard error estimated by bootstrapping codons. This approach detects codon-specific patterns that are consistent across samples, takes advantage of the independent diversity generated in each sample, and compensates for the typically small number of intrahost single nucleotide variants (iSNVs) that pass quality control in any one sample. In contrast, Soni et al.¹² use the sample as the observational unit and report values of π_N and π_S for 200 replicates, analogous to only 200 samples. Their simulations also fail to recapitulate key aspects of the observed biological data, including π_N – π_S values and numbers of iSNVs per sample (Supplementary Fig. 1).
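A minimal Python sketch of a codon-level test of this kind follows. It assumes that per-codon π_N and π_S values have already been averaged across samples, summarizes the region by the unweighted mean over codons (an assumption; an implementation may instead weight codons by their numbers of sites), and estimates the standard error of the difference by resampling codons with replacement. The function name and its defaults are hypothetical.

```python
import math
import random
import statistics


def codon_bootstrap_z(pi_n, pi_s, n_boot=1000, seed=1):
    """Z-test of H0: piN - piS = 0 with a bootstrap-over-codons standard error.

    pi_n, pi_s: per-codon values, already averaged across samples.
    Returns (observed difference, bootstrap SE, Z, two-sided P).
    """
    if len(pi_n) != len(pi_s) or len(pi_n) < 2:
        raise ValueError("need two equal-length lists with at least 2 codons")
    rng = random.Random(seed)
    k = len(pi_n)
    observed = statistics.fmean(pi_n) - statistics.fmean(pi_s)
    boot_diffs = []
    for _ in range(n_boot):
        # Resample codons (not samples) with replacement, keeping piN/piS paired
        idx = [rng.randrange(k) for _ in range(k)]
        boot_diffs.append(statistics.fmean(pi_n[i] for i in idx)
                          - statistics.fmean(pi_s[i] for i in idx))
    se = statistics.stdev(boot_diffs)
    if se == 0:
        raise ValueError("bootstrap standard error is zero")
    z = observed / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return observed, se, z, p
```

A negative Z indicates π_N < π_S (purifying selection) and a positive Z indicates π_N > π_S; fixing the seed makes the bootstrap reproducible.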

Next, Soni et al. report no statistical tests. Instead, based on data simulated with SLiM¹⁴, they suggest that large variances make π_N > π_S probable even under purifying selection alone. This claim relies on visual inspection of the standard deviations in their Figs. 1–3. To assess it, we used the models of Soni et al. to simulate intrahost data for 100 samples, estimating standard errors of mean π_N and π_S as in our study. Purifying selection is highly significant for all models (P ≤ 5.0 × 10⁻⁷, Z-tests) (Supplementary Fig. 1). Purifying selection is detected even using their own sample-based approach (P ≤ 1.6 × 10⁻⁶, Wilcoxon signed-rank tests). Thus, contrary to their conclusions, a relatively small number of samples provides sufficient statistical power to detect widespread selection using either method.
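For the sample-based comparison, a paired test on per-sample π_N and π_S values can be sketched in standard-library Python as below. This is a generic Wilcoxon signed-rank test using the normal approximation, average ranks for tied absolute differences, and no continuity correction; it is not necessarily the implementation behind the P values above, and statistical packages may report exact P values for small samples.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired values (normal approximation).

    Zero differences are dropped, tied |differences| receive average ranks, and
    the variance includes the standard tie correction; no continuity correction.
    Returns (W+, Z, P).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p
```

Called as `wilcoxon_signed_rank(pi_n_per_sample, pi_s_per_sample)`, a negative Z indicates that π_N tends to be lower than π_S across samples.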

Soni et al. then offer several simulations of positive selection. First, directional selection is modelled by introducing a single highly beneficial mutation (i.e., a selective sweep) in the context of a distribution of mutational fitness effects (DFE) comprising only neutral and deleterious mutations. Because the fraction of nonsynonymous mutations that are beneficial (f_b) in this scenario is only ~0.00007%, it is not surprising that π_N – π_S fails to detect positive selection. Specifically, π_N – π_S is suited to detecting pervasive (multi-site), incomplete positive selection that is ‘caught in the act’. Population genetics theory suggests that the substitution of a beneficial mutation takes an average of approximately \(2\ln(2N_e s)/s\) generations¹⁵. For selection coefficients (s) of 0.01–0.1 and intrahost effective population sizes (N_e) of 10³–10⁵, this implies an average of 45–644 days for SARS-CoV-2 (i.e., 106–1,520 generations, assuming a replication cycle of 610 minutes¹⁶). A selective sweep is therefore not expected to complete over the course of a typical acute infection. Furthermore, within-host viral evolution is likely to involve trade-offs, compensatory mutations, shifting fitness landscapes, and potentially balancing selection resulting from intrahost heterogeneity and frequency dependence¹⁷. In all of these cases, segregating nonsynonymous mutations will elevate π_N.
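The arithmetic behind the 45–644 day range can be reproduced with the short Python sketch below. It evaluates \(2\ln(2N_e s)/s\) at the two extremes of the stated parameter ranges (strong selection in a small population gives the shortest time; weak selection in a large population gives the longest) and converts generations to days using the 610-minute replication cycle quoted above. The approximation applies only when \(2N_e s \gg 1\); the function names are hypothetical.

```python
import math

MINUTES_PER_GENERATION = 610  # SARS-CoV-2 replication cycle length quoted in the text


def sweep_generations(s, ne):
    """Approximate mean time to fixation of a beneficial mutation, in generations."""
    if s <= 0 or 2 * ne * s <= 1:
        raise ValueError("approximation requires s > 0 and 2*Ne*s > 1")
    return 2 * math.log(2 * ne * s) / s


def sweep_days(s, ne, minutes_per_generation=MINUTES_PER_GENERATION):
    """Convert the sweep time from generations to days."""
    return sweep_generations(s, ne) * minutes_per_generation / (60 * 24)


for s, ne in [(0.1, 1e3), (0.01, 1e5)]:
    print(f"s={s}, Ne={ne:.0e}: {sweep_generations(s, ne):.0f} generations, "
          f"{sweep_days(s, ne):.0f} days")
# s=0.1, Ne=1e+03: 106 generations, 45 days
# s=0.01, Ne=1e+05: 1520 generations, 644 days
```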

In a second scenario of positive selection, Soni et al. set f_b to 1.0% or 9.7% (s = 0.05–0.13) in the context of a DFE derived from Flynn et al. for Mpro (nsp5)¹⁸. We again used their models to simulate 100 samples (Fig. 1). Although they claim that π_N – π_S cannot detect selection, positive selection was highly significant at the whole-genome level for f_b = 9.7% (π_N/π_S = 4.43, P < 2.2 × 10⁻¹⁶), whereas purifying selection was detected for f_b = 1.0% (π_N/π_S = 0.90, P = 0.0033; Z-tests). Thus, under the simulation parameters of Soni et al., positive selection becomes highly significant at some f_b between 1% and 10%, because multiple beneficial mutations segregate at intermediate frequencies.

**Fig. 1: Characterization of simulated data generated using models that allow multiple beneficial mutations.**

To estimate f_b for SARS-CoV-2, we used the fitness effect estimates of Bloom and Neher¹⁹. The central 95% of synonymous mutational effects was treated as a null (neutral) distribution, such that nonsynonymous mutations were classified as beneficial if their effects exceeded the 97.5th percentile of synonymous effects. Results are summarized in Table 1. For the whole genome, f_b is 1.5%. For individual ORFs, f_b ranges from 0.8% (ORF1ab) to 6.6% (ORF7a). For sliding windows of 30 codons, such as those used in our study¹¹, f_b ranges from 0% to 13.7%. The highest regional f_b values occur near Spike codons ~127–175 and ~461–512, overlapping the antigenically important N-terminal domain (NTD) and receptor-binding domain (RBD)²⁰. Thus, at the levels of whole ORFs and functional domains, f_b for SARS-CoV-2 often falls in a range that allows detection of positive selection by π_N – π_S.
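The classification rule and the sliding-window summary can be sketched in Python as follows. The sketch is illustrative only: the percentile interpolation method, the 1-codon window step, and the input layout (parallel lists of codon positions and effect labels) are assumptions, and the lethal class of Table 1 is not reproduced because its threshold is not stated in the text, so every effect below the 2.5th percentile is simply labelled deleterious. The function names are hypothetical.

```python
import statistics


def classify_nonsynonymous(syn_effects, nonsyn_effects):
    """Label nonsynonymous effects relative to the central 95% of synonymous effects."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(syn_effects, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    labels = []
    for effect in nonsyn_effects:
        if effect > upper:
            labels.append("beneficial")
        elif effect < lower:
            labels.append("deleterious")
        else:
            labels.append("neutral")
    return lower, upper, labels


def fraction_beneficial(labels):
    """f_b: share of nonsynonymous mutations labelled beneficial."""
    return labels.count("beneficial") / len(labels) if labels else 0.0


def sliding_window_fb(codon_positions, labels, window=30, step=1):
    """Map each window start codon to f_b among mutations in [start, start + window)."""
    first, last = min(codon_positions), max(codon_positions)
    result = {}
    for start in range(first, max(first, last - window + 1) + 1, step):
        in_window = [lab for pos, lab in zip(codon_positions, labels)
                     if start <= pos < start + window]
        if in_window:  # skip windows containing no nonsynonymous mutations
            result[start] = fraction_beneficial(in_window)
    return result
```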

**Table 1: Estimated fractions of SARS-CoV-2 nonsynonymous mutations that are lethal, deleterious, neutral, and beneficial**

Finally, we modified the simulations of Soni et al. by introducing a DFE based on the nonsynonymous fitness effect estimates of Bloom and Neher¹⁹. Whole-genome mutation effect fractions (bottom row of Table 1) were used as a background. Deleterious and beneficial selection coefficients (s) were modelled using gamma (mean = −0.32, shape = 1.70) and exponential (mean = 0.087) distributions, respectively. Under these parameters, selection was not significant at the whole-genome level (π_N/π_S = 1.03, P = 0.51; Fig. 1b, bottom). At the level of 30-codon sliding windows, we considered regions with π_N > π_S to be candidates for positive selection at various P value cut-offs, detecting 131 true positives (windows with at least one beneficial mutation) and 0 false positives for P < 0.0124. Thus, even under a non-ideal scenario in which the precise genomic targets of selection (codons with beneficial mutations) differ stochastically across samples, sliding windows are a reasonable candidate generator for regions undergoing positive selection.
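The mixture DFE described above can be sketched in Python as a sampler of selection coefficients. The gamma and exponential parameters are those stated in the text; following the way SLiM parameterizes its gamma ("g") and exponential ("e") DFEs by mean (and shape), the gamma scale is taken as |mean|/shape. The class fractions are left as arguments because the Table 1 values are not reproduced here, and assigning lethal mutations s = −1 is an assumption of this sketch. The function name is hypothetical.

```python
import random


def sample_s(rng, frac_lethal, frac_deleterious, frac_beneficial,
             del_mean=-0.32, del_shape=1.70, ben_mean=0.087):
    """Draw one selection coefficient from a lethal/deleterious/neutral/beneficial mixture."""
    if frac_lethal + frac_deleterious + frac_beneficial > 1:
        raise ValueError("class fractions exceed 1")
    u = rng.random()
    if u < frac_lethal:
        return -1.0  # assumption: lethal mutations modelled as s = -1
    u -= frac_lethal
    if u < frac_deleterious:
        # Gamma with shape k and scale |mean|/k has mean |mean|; sign flipped
        return -rng.gammavariate(del_shape, abs(del_mean) / del_shape)
    u -= frac_deleterious
    if u < frac_beneficial:
        return rng.expovariate(1.0 / ben_mean)  # exponential with mean ben_mean
    return 0.0  # remaining mutations are neutral
```

Passing a seeded `random.Random` instance keeps draws reproducible across simulation replicates.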

All simulation results reported by Soni et al. and herein are subject to many limitations and likely do not reflect biological reality. First, the DFEs were derived from functional assays¹⁸ or clinical isolates¹⁹ and therefore describe between-host evolution, whereas purifying selection is known to be weaker within hosts^6,21. Second, the models may contain important misspecifications, including (1) sequencing coverage of only 100 effective reads (median coverage in our study was 20,782 reads); (2) 2/3 of sites being nonsynonymous (compared to ~3/4 in most real ORFs); (3) s > 1.0 in a SLiM non-Wright–Fisher context (Fig. 2 of Soni et al.); (4) intrahost dynamics that may deviate from expected viral population sizes; and (5) no tendency for the same site to experience similar selection pressures across multiple samples (e.g., no convergent selected changes). Greater model complexity increases the potential for misspecification bias, and it is important for both biological parameters and analytical methods to match between simulated and empirical data.
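Point (2) can be checked with a short Python sketch that counts every single-nucleotide change from the 61 sense codons of the standard genetic code. Weighting codons uniformly and counting stop-gain (nonsense) changes as nonsynonymous are simplifying assumptions: real ORFs have non-uniform codon usage, and conventions for nonsense changes differ. Under these assumptions, roughly three-quarters of sites are nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_point_changes():
    """Count synonymous and nonsynonymous single-nucleotide changes from sense codons."""
    syn = nonsyn = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    syn += 1
                else:
                    nonsyn += 1  # includes changes to stop codons
    return syn, nonsyn


syn, nonsyn = count_point_changes()
# With uniform codon weights, each codon has 3 sites and 9 possible changes,
# so the nonsynonymous share of changes equals the nonsynonymous share of sites.
print(f"{syn} synonymous, {nonsyn} nonsynonymous; "
      f"fraction nonsynonymous = {nonsyn / (syn + nonsyn):.3f}")
# 134 synonymous, 415 nonsynonymous; fraction nonsynonymous = 0.756
```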

To summarize, π_N – π_S has limitations. Care must be exercised, as factors other than positive selection can yield π_N > π_S, especially in short genome regions where π_S is subject to stochastic fluctuation. The expected value of π_N/π_S depends on f_b and DFE properties. More work is needed to determine the precise values of f_b necessary for detecting positive selection, intrahost DFEs, and additional criteria for lowering the false-discovery rate (e.g., a minimum π_N cutoff). All parameters are likely to vary by host, virus, lineage, and many other contexts. SLiM offers unprecedented opportunities for simulating complex evolutionary scenarios in order to test specific hypotheses¹⁴. Nevertheless, we maintain that simple methods like π_N – π_S have value. In the same way, simple d_N/d_S analyses continue to yield highly informative results²² even though viral consensus sequences do not incorporate real-world complexity, and each site in a genome may in reality follow its own ‘model’ of evolution which changes over time²³. As the aphorism suggests, the question is not whether models are realistic, but rather whether they are useful²⁴. While more advanced methods are always welcome, there is no one ‘right’ way to analyze evolutionary genomics data²³.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All input data, intermediate files, and simulated data have been deposited at Zenodo under accession code https://doi.org/10.5281/zenodo.10552831. Data for estimating f_b were obtained from the aamut_fitness_all.csv file of Bloom and Neher¹⁹ (public_2023-10-01 dataset; accessed 5 October 2023). Source data for the figures are provided with this paper as a Source Data file.

Code availability

Simulation and analysis scripts have been deposited at Zenodo under accession code https://doi.org/10.5281/zenodo.10552831.

References

1. Nei, M. & Li, W.-H. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl Acad. Sci. USA 76, 5269–5273 (1979).
2. Lauring, A. S. Within-host viral diversity: a window into viral evolution. Annu. Rev. Virol. 7, 63–81 (2020).
3. Nelson, C. W. & Hughes, A. L. Within-host nucleotide diversity of virus populations: insights from next-generation sequencing. Infect. Genet. Evol. 30, 1–7 (2015).
4. Nelson, C. W., Moncla, L. H. & Hughes, A. L. SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data. Bioinformatics 31, 3709–3711 (2015).
5. Moncla, L. H. et al. Selective bottlenecks shape evolutionary pathways taken during mammalian adaptation of a 1918-like avian influenza virus. Cell Host Microbe 19, 169–180 (2016).
6. Nelson, C. W. et al. Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic. eLife 9, e59633 (2020).
7. Lythgoe, K. A. et al. SARS-CoV-2 within-host diversity and transmission. Science 372, eabg0821 (2021).
8. Bashor, L. et al. SARS-CoV-2 evolution in animals suggests mechanisms for rapid variant selection. Proc. Natl Acad. Sci. USA 118, e2105253118 (2021).
9. Tonkin-Hill, G. et al. Patterns of within-host genetic diversity in SARS-CoV-2. eLife 10, e66857 (2021).
10. San, J. E. et al. Transmission dynamics of SARS-CoV-2 within-host diversity in two major hospital outbreaks in South Africa. Virus Evol. 7, veab041 (2021).
11. Gu, H. et al. Within-host genetic diversity of SARS-CoV-2 lineages in unvaccinated and vaccinated individuals. Nat. Commun. 14, 1793 (2023).
12. Soni, V., Terbot II, J. W. & Jensen, J. D. Population genetic considerations regarding the interpretation of within-patient SARS-CoV-2 polymorphism data. Nat. Commun. https://doi.org/10.1038/s41467-024-46261-4 (2024).
13. Kryazhimskiy, S. & Plotkin, J. B. The population genetics of dN/dS. PLoS Genet. 4, e1000304 (2008).
14. Haller, B. C. & Messer, P. W. SLiM 3: forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 36, 632–637 (2019).
15. Walsh, B. & Lynch, M. Evolution and Selection of Quantitative Traits (Oxford University Press, 2018).
16. Terbot, J. W. et al. Developing an appropriate evolutionary baseline model for the study of SARS-CoV-2 patient samples. PLoS Pathog. 19, e1011265 (2023).
17. Daugherty, M. D. & Malik, H. S. Rules of engagement: molecular insights from host-virus arms races. Annu. Rev. Genet. 46, 677–700 (2012).
18. Flynn, J. M. et al. Comprehensive fitness landscape of SARS-CoV-2 Mpro reveals insights into viral resistance mechanisms. eLife 11, e77433 (2022).
19. Bloom, J. D. & Neher, R. A. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evol. 9, vead055 (2023).
20. Carabelli, A. M. et al. SARS-CoV-2 variant biology: immune escape, transmission and fitness. Nat. Rev. Microbiol. https://doi.org/10.1038/s41579-022-00841-7 (2023).
21. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford University Press, 2009).
22. Lucaci, A. G. et al. RASCL: rapid assessment of selection in CLades through molecular sequence analysis. PLoS ONE 17, e0275623 (2022).
23. Hughes, A. L., Friedman, R. & Glenn, N. L. The future of data analysis in evolutionary genomics. Curr. Genomics 7, 227–234 (2006).
24. Box, G. E. P. Science and statistics. J. Am. Stat. Assoc. 71, 791–799 (1976).
25. Jungreis, I. et al. Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: a homology-based resolution. Virology 558, 145–151 (2021).

Acknowledgements

The authors acknowledge the Research Grants Council of Hong Kong theme-based research scheme (T11-705/21-N (L.L.M.P.)), the Health and Medical Research Fund (COVID190205 (L.L.M.P.)), and an InnoHK grant (L.L.M.P.) for the Centre for Immunology and Infection. H.G. was supported by the RGC Postdoctoral Fellowship Scheme (PDFS2324-7S03 (H.G.)) of the University Grants Committee of Hong Kong. C.W.N. was supported by the NCI Research Participation Program administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH). ORISE is managed by ORAU under DOE contract number DE-SC0014664. All opinions expressed are the authors’ and do not necessarily reflect the policies and views of their organizations. The authors thank Jesse Bloom, Ben Haller, Sarah P. Otto, Helen Piontkivska, April (Xinzhu) Wei, Zachary Ardern, Louise H. Moncla, Ming-Hsueh Lin, Lisa Mirabello, and Meredith Yeager for feedback.

Author information

Authors and Affiliations

Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, 20850, USA
Chase W. Nelson
Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024, USA
Chase W. Nelson
School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Leo L. M. Poon & Haogao Gu
Centre for Immunology & Infection, Hong Kong Science and Technology Park, Hong Kong SAR, China
Leo L. M. Poon
HKU-Pasteur Research Pole, School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Leo L. M. Poon

Contributions

C.W.N., L.L.M.P., and H.G. conceived of the project and wrote the manuscript; C.W.N. performed simulations and analyses; L.L.M.P. provided funding for the project.

Corresponding author

Correspondence to Leo L. M. Poon.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nelson, C.W., Poon, L.L.M. & Gu, H. Reply to: Population genetic considerations regarding the interpretation of within-patient SARS-CoV-2 polymorphism data. Nat Commun 15, 3239 (2024). https://doi.org/10.1038/s41467-024-46262-3

Received: 30 June 2023
Accepted: 21 February 2024
Published: 16 April 2024
DOI: https://doi.org/10.1038/s41467-024-46262-3
