Characterizing the underlying demographic histories and selective pressures shaping the evolutionary trajectories of both extant and extinct species is a focal point of population genetics. When the organism in question is a virus, this demographic history will include the infection (and, when applicable, compartmentalization) dynamics of within-host populations (Jensen 2021). With regard to selection, a major focus of the human pathogen literature is on positive selection, be it identifying mutations conferring drug resistance or underlying immune-evasion (see review of Irwin et al. 2016a). However, despite receiving less attention in this literature, it is well-understood that regardless of the organism in question, most new fitness-impacting mutations have deleterious effects (e.g., Crow 1993; Lynch et al. 1999; Bank et al. 2014b; and see reviews of Eyre-Walker and Keightley 2007; Bank et al. 2014a). The removal of these deleterious mutations via purifying selection is expected to reduce the effective population size to an extent largely dictated both by recombination rates and the strength of selection (Charlesworth et al. 1993; Charlesworth 2013; and see review of Charlesworth and Jensen 2021). Furthermore, linkage to this abundant input of deleterious mutations may impact the fixation probabilities of alleles at other genomic sites, including reducing the likelihood of adaptive fixations (Hill and Robertson 1966; Pénisson et al. 2017).

Yet, in order to quantify how deleterious mutations impact the evolutionary trajectory of a population, it is necessary to understand the shape of the distribution of fitness effects (DFE) of new mutations entering the population (i.e., the selective effects characterizing newly arising mutants). Herein lies a challenge, as the accurate estimation of the DFE is a difficult task. There are three general approaches for such inference. The most direct is achieved via site-directed mutagenesis - measuring the fitness effect, one at a time or in combination, of an artificially created mutation(s) on an otherwise wildtype background under set environmental conditions (e.g., Fowler et al. 2010; Hietpas et al. 2011, 2012; Bank et al. 2014b). As this method is experimentally demanding, the DFE can generally only be obtained for a highly localized genomic region, and the approach is only feasible in certain experimentally tractable organisms. In order to characterize genome-wide effects, mutation accumulation (MA) studies are also used to experimentally infer the DFE. By allowing mutations to accumulate over time - generally under minimal selection - the fitness of the resulting lines can be compared allowing for inference of underlying selective effects (e.g., Lynch et al. 2016; Long et al. 2018). As opposed to the directed-mutagenesis approach, MA lines may obtain poor inference of the most strongly deleterious classes of mutations, as they would not be expected to segregate in the population. Relatedly, by directly tracking individual allele frequencies over time in these experimental lines, selective effects may be inferred based on observed per-generation allele frequency changes; namely, by first determining if the extent of change is consistent with genetic drift alone, and, if not, fitting a selection coefficient to match the degree of change (e.g., Foll et al. 2014; Ferrer-Admetlla et al. 2016).

While these experimental approaches have provided major insights into the general shape of the DFE, they are naturally restricted to laboratory settings. As such, a final class of methods has been developed to estimate the DFE from natural population polymorphism data - both for common single time-point (e.g., Keightley and Eyre-Walker 2007; Schneider et al. 2011; Tataru et al. 2017) as well as for time-sampled data (e.g., Malaspinas et al. 2012; Mathieson and McVean 2013; Acevedo et al. 2014; Foll et al. 2015; Sohail et al. 2021). The earliest class of approach relies on synonymous sites in coding regions to estimate the demographic effects, and conditional on that inferred history, a DFE is fit to the observed data from non-synonymous sites. As such, these methods rely on the neutrality of the former, which has been called into question in many organisms (e.g., Chamary and Hurst 2005; Zeng and Charlesworth 2010; Lawrie et al. 2013; Choi and Aquadro 2016; Jackson et al. 2017), including in viruses (e.g., Canale et al. 2018). Equally importantly, these methods neglect the fact that levels and patterns of variation at synonymous sites may be shaped by their linkage to directly selected non-synonymous sites - neglecting these background selection effects may result in serious mis-inference of the demographic history (Zeng 2013; Ewing and Jensen 2016). In order to account for these factors, more recent methods have been developed to jointly infer the demographic history simultaneously with the underlying DFE, without making assumptions about the selective effects of any particular class of sites. For example, the approximate Bayesian computation (ABC) approach recently proposed by Johri et al. (2020) was shown to obtain accurate DFE inference using single time-point datasets, while uniquely accounting for the effects of background selection and the potential non-neutrality of synonymous sites (and see Johri et al. 2021). Importantly, background selection expectations are themselves incorporated into the inference procedure; as such, the full DFE of newly arising mutations is estimated, even if, for example, the strongly deleterious mutations comprising the most deleterious class are not themselves sampled as polymorphic sites.

With such estimators now at hand - which have previously only been applied to model organisms (e.g., Drosophila melanogaster) - we here examine the utility of these DFE inference procedures for the study of viruses. In order to do so, we have modified the framework of Johri et al. (2020) to account for the violation of common Wright-Fisher assumptions inherent to viruses - namely, highly skewed progeny distributions (Irwin et al. 2016b; Matuszewski et al. 2018; Sackman et al. 2019; and see Vahey and Fletcher 2019). Owing both to the reduction in effective population size (increasing the effects of genetic drift / decreasing the efficacy of selection) as well as changing expectations in the shape of the site frequency spectrum, such progeny skew is known to result in potential mis-inference of both the DFE and demography. For the sake of illustration, Fig. 1 presents the results of a commonly used estimator of the DFE together with population size change (DFE-alpha; Keightley and Eyre-Walker 2007) - an approach designed neither to account for background selection effects nor skewed progeny distributions - when applied to populations characterized by differing levels of progeny skew. As shown, in the absence of a violation of progeny distribution assumptions, DFE inference is correct, but the inference of population size is incorrect. Namely, and consistent with previous results, population growth is inferred for these constant-sized populations, owing to the left-skewed frequency spectra being generated by the unaccounted for background selection effects (Ewing and Jensen 2016; Johri et al. 2021). With the addition of progeny skew, DFE inference also becomes incorrect for the weakly and moderately deleterious classes, with a bias towards under-estimation owing to the increased effects of genetic drift under these models. For the strongly deleterious class, inference remains largely accurate, as purifying selection is sufficiently strong relative to genetic drift.

Fig. 1: Effects of progeny-skew and background selection on the inference of demography and the DFE, using the DFE-alpha approach (Keightley and Eyre-Walker 2007).

The left panels show the inference of the DFE while the right panels show the inference of fold-change in population size. Inference is shown when 30% of new mutations are neutral, and the remainder are: (A) weakly deleterious, (B) moderately deleterious, and (C) strongly deleterious. Estimates are shown only for the selected classes. Black bars depict true values, gray bars show inference in the absence of progeny skew (thus no violation of the assumption), and the blue bars correspond to populations with levels of progeny skew characterized by ψ = 0.075 (light blue) and ψ = 0.15 (dark blue).

To provide examples of how DFE inference may be more accurately performed in organisms characterized by progeny skew - as well as to highlight the generality of the approach - we present estimates, together with the necessary power analyses, for two viruses of critical public health concern: a patient-sampled DNA virus (human cytomegalovirus (HCMV)), and an experimentally-passaged RNA virus (influenza A virus (IAV)). Finally, we discuss how inferred DFEs may be utilized to generate appropriate null expectations, against which patterns of positive selection (e.g., resistance evolution) may be better characterized, as well as how these estimates may be leveraged in the design of novel therapeutics.



Influenza A virus (IAV)

Published control line data was taken from Renzette et al. (2014) and Foll et al. (2014). In brief, viruses were serially passaged in MDCK cells, with a multiplicity of infection (MOI) of 0.01 for the initial infection. Viral titers were determined by plaque assay per passage in order to maintain a constant MOI. Here we analyzed 13 passages in total in which high-throughput Illumina sequencing was performed. For each control replicate analyzed, we down-sampled all sites to a coverage of 1000 for the calculation of per-site allele frequencies, only bi-allelic SNPs were retained, and only SNPs >2% frequency were considered in order to eliminate false-inference owing to sequencing errors. For the PB2 segment chosen, a mean of 518 SNPs was observed across control line replicates, with a minimum of 48 and maximum of 980 SNPs after filtering.
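The filtering steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline used for the published data: the function name, the `(n_sites, 4)` count layout, and the choice to apply the bi-allelic check after down-sampling are all assumptions made here for clarity.

```python
import numpy as np

def downsample_and_filter(counts, depth=1000, min_freq=0.02, seed=0):
    """Down-sample per-site read counts and keep bi-allelic SNPs above min_freq.

    counts: integer array of shape (n_sites, 4) holding read counts for
    A, C, G and T (a hypothetical input layout).
    Returns {site_index: minor_allele_frequency} for retained sites.
    """
    rng = np.random.default_rng(seed)
    kept = {}
    for site, site_counts in enumerate(np.asarray(counts, dtype=np.int64)):
        if site_counts.sum() < depth:
            continue  # too little coverage to down-sample to `depth`
        # Draw `depth` reads without replacement from the observed reads.
        sub = rng.multivariate_hypergeometric(site_counts, depth)
        if np.count_nonzero(sub) != 2:
            continue  # monomorphic, or more than two alleles observed
        minor_freq = np.sort(sub)[-2] / depth  # second-largest count
        if minor_freq > min_freq:  # assumed strict threshold (">2%")
            kept[site] = float(minor_freq)
    return kept
```

Sampling without replacement mirrors the idea of discarding reads down to a common depth, so that allele frequencies at every site are estimated from the same number of observations.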

Human cytomegalovirus (HCMV)

Published patient data (patient_B103) was taken from Renzette et al. (2013). In brief, serial specimens were collected from HCMV-infected patients at the University of Minnesota Medical Center or the University of Massachusetts Memorial Health Center, amplification was performed and quantified, and whole genomes were sequenced on Illumina technology. For this study, we chose the urine sample collected at 6 months post-birth from a congenitally infected infant. We aligned the sequence reads from the sample to the Merlin strain reference genome (RefSeq ID: NC_006273). Whole-genome alignments were generated using the Burrows-Wheeler Aligner v.0.7.17 mem algorithm (Li and Durbin 2009). Reads were sorted using SAMtools v.1.9 (Li et al. 2009), and duplicates were marked using sambamba v.0.7.1 (Tarasov et al. 2015). Based on the genomic scan of Renzette et al. (2013), we chose a genomic region with little evidence of positive selection. Namely, we considered nucleotide positions 62,500–86,000 (23,500 nucleotides in length), representing approximately one tenth of the total genome size.

We used Freebayes v.1.3.2-dirty (Garrison and Marth 2012) for variant calling. SNP calls were only made if the mapping quality of the read was ≥30 and the Phred score for base quality ≥20. Further, we required that each SNP was supported with no fewer than one read for each sequencing direction. We discarded SNPs for which we detected evidence for strand bias in the supporting reads, down-sampled all sites to 100 in order to estimate allele frequencies from allele counts, and only SNPs with frequencies ≥2% were considered. This filtering resulted in 449 SNPs for the region analyzed.


We conducted forward-in-time simulations using the SLiM version 3 software package (Haller and Messer 2019). DFE estimation was based on Johri et al. (2020), with an extension to account for skewed progeny distributions. Functional genomic elements of 2314 bp and 23500 bp, for IAV and HCMV respectively, were simulated under a variety of discrete DFEs comprising four fixed bins representing effectively neutral (0 ≤ |2N_e s| < 1; referred to as f0), weakly deleterious (1 ≤ |2N_e s| < 10; referred to as f1), moderately deleterious (10 ≤ |2N_e s| < 100; referred to as f2) and strongly deleterious/lethal mutations (100 ≤ |2N_e s| ≤ 2N_e; referred to as f3). N_e was assumed to be 176 for the experimentally passaged IAV data as previously estimated by Foll et al. (2014), based on time-sampled neutral allele frequency change. For HCMV, we used the inferred size of 10^4 haploid individuals for the purpose of scaling selection coefficients (Renzette et al. 2013). Selection coefficients were sampled uniformly within each bin of the DFE, and the shape of the DFE was varied through the proportion of mutations in each class (f0, f1, f2, f3), such that Σ_i f_i = 1. In order to perform ABC, the f_i were sampled uniformly such that f0 ∈ [0, 1], f1 ∈ [0, 1 − f0], and f2 ∈ [0, 1 − f0 − f1], with f3 given by the remainder.
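The prior sampling scheme and the within-bin draws of selection coefficients described above might be sketched as follows. The function names are hypothetical, and the sign convention (negative s for deleterious mutations) and the conversion s = −|2N_e s| / (2N_e) are assumptions of this illustration; the simulations themselves were performed in SLiM.

```python
import numpy as np

# Bin edges on the scaled |2*Ne*s| axis; the upper edge of the last bin
# (2*Ne, i.e. |s| = 1) depends on Ne and is filled in at call time.
LOWER_EDGES = np.array([0.0, 1.0, 10.0, 100.0])

def draw_dfe(rng):
    """Draw (f0, f1, f2, f3) with f0~U[0,1], f1~U[0,1-f0], f2~U[0,1-f0-f1]."""
    f0 = rng.uniform(0.0, 1.0)
    f1 = rng.uniform(0.0, 1.0 - f0)
    f2 = rng.uniform(0.0, 1.0 - f0 - f1)
    f3 = 1.0 - f0 - f1 - f2  # remainder, so that the fractions sum to 1
    return np.array([f0, f1, f2, f3])

def draw_selection_coefficients(fractions, ne, n_mut, rng):
    """Assign each mutation to a DFE class and draw s uniformly in its bin."""
    upper_edges = np.array([1.0, 10.0, 100.0, 2.0 * ne])
    classes = rng.choice(4, size=n_mut, p=fractions)
    scaled = rng.uniform(LOWER_EDGES[classes], upper_edges[classes])
    s = -scaled / (2.0 * ne)  # assumed convention: deleterious s is negative
    return s, classes

rng = np.random.default_rng(42)
fractions = draw_dfe(rng)
s, classes = draw_selection_coefficients(fractions, ne=176, n_mut=1000, rng=rng)
```

Because each fraction is drawn conditional on those before it, this prior is not symmetric across classes: f0 is uniform on [0, 1], whereas the later classes are constrained by whatever probability mass remains.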

IAV simulations

We simulated a population that grew from a single virion to N = 1 × 10^6 and then experienced twelve population bottlenecks that varied in intensity, corresponding to the experimental passaging as described in Table 1 of Foll et al. (2014). After each bottleneck, the population grew exponentially, reaching a size of 10^6 in 13 generations. Given that this was an experimental population, these demographic details are well-characterized. In total, the full 183 generations of the experiment were simulated (Fig. 2A). A constant progeny skew (ψ) of 6.7% per generation was imposed, as previously estimated by Sackman et al. (2019).
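The per-passage growth phase can be sketched with a short calculation. This is an illustrative sketch only: the bottleneck size used in the example call is hypothetical (the actual per-passage values are given in Table 1 of Foll et al. 2014 and are not reproduced here), and rounding sizes to integers each generation is an assumption.

```python
import math

def growth_rate(n_bottleneck, n_final=1e6, generations=13):
    """Per-generation exponential rate r such that N(generations) = n_final."""
    return math.log(n_final / n_bottleneck) / generations

def passage_sizes(n_bottleneck, n_final=1e6, generations=13):
    """Population size at each generation of one passage, rounded to integers."""
    r = growth_rate(n_bottleneck, n_final, generations)
    return [round(n_bottleneck * math.exp(r * t)) for t in range(generations + 1)]

# Hypothetical bottleneck of 100 virions recovering to 10^6 in 13 generations.
sizes = passage_sizes(100)
```

Under this parameterization, a more severe bottleneck implies a larger growth rate, since the time to reach 10^6 is held fixed at 13 generations.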

Fig. 2: Graphical representation of the demographic models of IAV and HCMV.

A IAV: in which the population size changes correspond to the experimental passaging. B HCMV: in which the size changes correspond to the initial infection and subsequent compartmentalization.

As previous experiments measured the neutral mutation rate (e.g., Sanjuán et al. 2010), rather than the total mutation rate (that is, including the range of newly arising deleterious mutations), we first simulated 200 different parameter combinations of f0, f1, f2, and f3, for mutation rates of 1 × 10^-5, 1 × 10^-4 and 1 × 10^-3. As 1 × 10^-3 was most consistent with the observed statistics, this total rate was fixed for subsequent analysis (Supplementary Table 1). Drawing from these prior distributions, 500 points (i.e., parameter combinations) were sampled. For each parameter combination, we conducted 100 replicates in order to characterize both the mean and variance of summary statistics. In order to match the empirical data, 1000 individuals were sampled, and alleles >2% frequency were considered in the simulated data.

HCMV simulations

Following an initial burn-in period of 10N generations, we considered a four-stage demographic model characterizing a single patient infection, following Renzette et al. (2013) and Pokalyuk et al. (2017): (1) a neutral equilibrium ancestral population of size N, (2) an initial infection bottleneck (B1) from the mother to the plasma of the fetus, leading to an instantaneous population reduction to size N2, (3) a subsequent population size recovery to size N and, (4) a final bottleneck (B2) representing infection from the plasma into the urine compartment, leading to a second reduction to size N3, followed by population size recovery to the initial N (Fig. 2B). As previously inferred, N = 1 × 10^4, N2 = 433, and N3 = 100. A recombination rate of 1 × 10^-7/site/generation (Renzette et al. 2015) and ψ = 6.7% (Sackman et al. 2019) were assumed. Based on the previous estimate of the neutral mutation rate from segregating variation of 2 × 10^-7/site/generation (Renzette et al. 2015), simulations were conducted spanning total mutation rates of 1 × 10^-7, 1 × 10^-6, and 1 × 10^-5/site/generation, in order to infer the total rate that matched the observed data (Supplementary Table 2). Using the best matching 1 × 10^-5, 500 replicates were simulated for each DFE draw. In order to mirror the empirical data, 100 individuals were sampled, and alleles >2% frequency were considered for downstream power and performance analyses. Importantly, by simulating data that matches the empirical sampling, we can directly account for the differences in inference power attained in the down-sampled read-depth of 100 in HCMV, relative to that of 1000 in IAV.

DFE-alpha, and an evaluation of the effects of unaccounted for background selection and progeny skew

Inference was performed using DFE-alpha 2.16 (Keightley and Eyre-Walker 2007) using the folded and pooled site frequency spectrum (SFS). Under this approach, the SFS of segregating neutral mutations was used to infer the demographic history (2-epoch size change), and the SFS at directly selected sites was used to infer the DFE conditional on that inferred history. For the first step, both the current population size and the time of change (with an initial value of 50 generations) were allowed to vary. When inferring the DFE, the initial values of mean s and β were assumed to be −0.1 and 0.5 respectively, where s is the selection coefficient and β is the shape parameter of the assumed gamma distribution of s. The inferred DFE was scaled with respect to the current population size (N2 in DFE-alpha).
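For readers unfamiliar with the folded SFS used as input here, the folding operation can be sketched as below. This is a generic illustration with a hypothetical function name; it does not reproduce the DFE-alpha input file format, and it covers only folding, not pooling.

```python
import numpy as np

def fold_sfs(unfolded):
    """Fold an unfolded SFS into minor-allele-count classes.

    unfolded[i] is the number of sites at which the derived allele is
    observed i times among n sampled chromosomes (length n + 1).
    The returned array has length n // 2 + 1, indexed by minor-allele count.
    """
    unfolded = np.asarray(unfolded, dtype=float)
    n = len(unfolded) - 1
    folded = np.zeros(n // 2 + 1)
    for i in range(n + 1):
        # A derived count of i and of n - i give the same minor-allele count.
        folded[min(i, n - i)] += unfolded[i]
    return folded
```

Folding removes the need to know which allele is ancestral, at the cost of discarding the distinction between rare and high-frequency derived variants.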

The approach was evaluated by simulating N = 10,000 diploid individuals in SLiM 3.1 (Haller and Messer 2019) under constant population size, with a mutation rate of 1 × 10^-5 per site/generation and a recombination rate of 1 × 10^-8 per site/generation. A 10 kb region was simulated with 30% of all mutations being neutral (i.e., s = 0), and the remainder of the sites experiencing three different deleterious DFEs: (a) a uniform distribution between 1 ≤ Ns < 10 (weakly deleterious); (b) a uniform distribution between 10 ≤ Ns < 100 (moderately deleterious); and (c) a uniform distribution between 100 ≤ Ns < N (strongly deleterious). Simulations were run for 10N generations and 50 diploid individuals were sampled with 10 replicates for each evolutionary scenario. In order to test the effect of progeny-skew on DFE inference, populations with ψ = 0.075 and ψ = 0.15 were also simulated.
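To give some intuition for what a progeny skew of ψ implies, the following is a minimal sketch of one generation of reproduction under a simple skewed-offspring scheme. It assumes, in the spirit of the Eldon and Wakeley (2006) model, that a single randomly chosen parent contributes a fraction ψ of the next generation while the rest are drawn as in a standard Wright-Fisher step. This is an assumption made for illustration on haploid allele copies, not a description of how skew was implemented in the SLiM simulations.

```python
import numpy as np

def skewed_generation(parents, psi, rng):
    """Produce the next generation of allele states under progeny skew psi.

    parents: 1-D array of allele states (e.g. 0/1), one per haploid individual.
    A single 'lucky' parent fills round(psi * N) offspring slots; the
    remaining slots are filled by uniform sampling with replacement from
    all parents (standard Wright-Fisher sampling).
    """
    parents = np.asarray(parents)
    n = len(parents)
    n_skew = int(round(psi * n))
    lucky = rng.integers(n)  # index of the highly fecund parent
    offspring = np.empty(n, dtype=parents.dtype)
    offspring[:n_skew] = parents[lucky]
    offspring[n_skew:] = parents[rng.integers(n, size=n - n_skew)]
    return offspring

rng = np.random.default_rng(7)
population = np.array([0] * 50 + [1] * 50)
next_generation = skewed_generation(population, psi=0.15, rng=rng)
```

With ψ = 0 the function reduces to a Wright-Fisher step; larger values of ψ produce larger single-generation jumps in allele frequency, which is one way to see why skew increases the effects of genetic drift.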

Calculation of summary statistics and ABC

For the entire genomic elements simulated, the mean and variance of the following statistics were calculated: number of segregating sites (S), nucleotide site diversity (π), Watterson’s θ, Tajima’s D, and Fay and Wu’s H (both absolute and normalized), using the Python package pylibseq 0.2.3 (Thornton 2003). While all summary statistics were used for inference in IAV, HCMV inference was only based on a subset of statistics (S, π, Watterson’s θ, Tajima’s D), owing to the lack of information needed to unfold the SFS. ABC inference was performed using the “abc” package in R (Csillery et al. 2012). When a large number of summary statistics are used, it can be difficult to find a sufficient number of simulations that match the observed data, and thus ABC methods suffer from the curse of dimensionality. In order to deal with this issue, one can use larger acceptance rates and then perform linear local adjustment to correct for the discrepancy between the simulated and observed summary statistics by weighting the accepted simulations accordingly. In the “abc” package, this discrepancy can be accounted for by using ridge regression (that assumes a linear relationship between the parameters and statistics) or by using neural nets (that can account for non-linear relationships between the parameters and statistics) which can also reduce the dimensionality, and thus deal with multicollinearity (i.e., highly correlated statistics). A neural net was used to perform inference in IAV, owing to superior performance over ridge regression in this parameter space. Conversely, for the HCMV parameter space, ridge regression performed better (e.g., absolute error in performance of ABC using ridge regression vs neural net was 0.137 vs 0.157 for f0; 0.155 vs 0.176 for f1; 0.047 vs 0.044 for f2; and 0.031 vs 0.026 for f3 respectively). A 50-fold cross-validation procedure was employed to choose the appropriate tolerance levels, such that 1 randomly chosen simulation was excluded and its parameters were inferred using n−1 simulations, where n is the total number of simulations. A tolerance of 0.05, and weighted medians of the posterior distribution, were used to determine point estimates of the inferred parameters.
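The folded-SFS statistics listed above can be computed directly from a sample of haplotypes. The sketch below uses plain NumPy and the standard formulas for π, Watterson’s θ, and Tajima’s D; it is not the pylibseq implementation used in the analysis, and the function name and the 0/1 haplotype-matrix input are assumptions of this illustration. Values are totals for the region rather than per-site.

```python
import numpy as np

def summary_statistics(haplotypes):
    """Return (S, pi, theta_W, Tajima's D) for a 0/1 haplotype matrix.

    haplotypes: array of shape (n_samples, n_sites), one row per sampled
    chromosome and one column per site.
    """
    h = np.asarray(haplotypes)
    n = h.shape[0]
    k = h.sum(axis=0)                       # allele count at each site
    k = k[(k > 0) & (k < n)]                # keep segregating sites only
    S = int(len(k))
    pi = float(np.sum(2.0 * k * (n - k)) / (n * (n - 1)))
    i = np.arange(1, n)
    a1, a2 = np.sum(1.0 / i), np.sum(1.0 / i**2)
    theta_w = S / a1
    if S == 0:
        return S, pi, theta_w, float("nan")  # D is undefined without SNPs
    # Constants for the variance of (pi - theta_W), following Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    d = (pi - theta_w) / np.sqrt(e1 * S + e2 * S * (S - 1))
    return S, pi, theta_w, float(d)
```

An excess of rare alleles, as expected under purifying selection, recent growth, or progeny skew, pushes π below θ_W and therefore drives Tajima’s D negative.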


The DFE of newly arising mutations was inferred from experimental populations of a reassorting RNA virus (IAV), as well as from a patient-population of a recombining DNA virus (HCMV). The DFE was modeled as a discrete distribution with four fixed bins - the effectively neutral, mildly deleterious, moderately deleterious, and strongly deleterious classes of mutation (see Methods). By varying the proportion of each class of mutations referred to as f0, f1, f2, and f3, respectively, all possible DFE shapes could be considered for their respective fit to the observed data. In addition, the history of population size change and progeny skew were directly and uniquely accounted for in this inference scheme. As both of these neutral processes additionally act to shape levels and patterns of variation, their frequent neglect in such analyses has been shown to lead to a serious mis-inference of the contribution of selection (e.g., Teshima et al. 2006; Thornton and Jensen 2007; Mathew and Jensen 2015; Harris et al. 2018; Sackman et al. 2019, and see Jensen et al. 2019).

Inference of the DFE in IAV

The genome of IAV is composed of eight segments (for a total length of ~13 kb) that can be exchanged by reassortment (Palese and Young 1982; Dadonaite et al. 2019). From an infected cell, IAV progeny release occurs every ~6 h and previous estimates suggest a mutation rate of 2.3 × 10^-5 per site/cell infection cycle (Parvin et al. 1986; Sanjuán et al. 2010; Abdoli et al. 2013). Though the IAV population in question has a large census size (N), the effective population size (Ne) has been inferred to be only on the order of 10^3 (Foll et al. 2014, 2015; and see Poon et al. 2016) - a disparity that likely owes to a combination of factors including strong purifying and positive selection, severe bottlenecks, as well as progeny skew. This observation is consistent with the general notion that Ne is much more strongly constrained than N across organisms, suggesting an upper-bound to the efficacy of natural selection (Lynch 2007; Lynch and Trickovic 2020).

In order to directly compare results with the time-sampled approaches of Foll et al. (2014) and Ferrer-Admetlla et al. (2016), we utilized the same experimentally passaged dataset (see Methods). As we here focused on characterizing the neutral and deleterious DFE distribution, we sought to minimize the effects of positive selection. As such, we used the control lines from each experiment (that is, passaged in the absence of drug treatment), as well as the PB2 segment (2,314nt in length) as scant evidence of positive selection has been observed in this region (Renzette et al. 2014; Foll et al. 2014). Given that the populations under consideration in IAV were of oscillating size owing to experimental passaging (Fig. 2A), it was first necessary to assess the performance of the statistical inference scheme under this rather unique demographic history. Helpfully, the census population sizes at each passage are known experimentally, allowing many aspects of this model to be fixed when performing inference.

In addition, the degree of progeny skew (ψ) has been previously inferred to be ~7% (Sackman et al. 2019), and was here treated as a fixed parameter. This skew is expected to result in an excess of both rare as well as high-frequency alleles relative to the standard Wright-Fisher expectation, resulting in a U-shaped frequency spectrum (Eldon and Wakeley 2006; Eldon et al. 2015; Blath et al. 2016; Matuszewski et al. 2018). As purifying selection is also expected to result in a higher proportion of rare alleles, inferring an accurate DFE could therefore be challenging under this model. However, the combination of summary statistics used (see Methods) was found to enable reasonably accurate inference of all four parameters of the DFE (f0, f1, f2, and f3; Fig. 3A). Moderately and strongly deleterious mutations were estimated with the highest accuracy, with mean absolute errors of 0.037 and 0.028, respectively, while the proportions of mildly deleterious and neutral mutations were associated with the largest variance (absolute errors of 0.107 and 0.093, respectively).
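The mean absolute errors reported here come from cross-validation on simulated data. The logic of such a performance analysis can be sketched as follows, with an important simplification: the sketch uses plain rejection ABC with an unweighted median and standard-deviation scaling, rather than the neural-net or ridge-regression adjustment and weighted medians of the R “abc” package used in the analysis. Function names and array layouts are hypothetical.

```python
import numpy as np

def rejection_estimate(target, params, stats, tol=0.05):
    """Median of the parameters whose statistics lie closest to `target`."""
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0                 # guard against constant statistics
    dist = np.sqrt((((stats - target) / scale) ** 2).sum(axis=1))
    n_keep = max(1, int(np.ceil(tol * len(stats))))
    keep = np.argsort(dist)[:n_keep]        # accept the closest fraction `tol`
    return np.median(params[keep], axis=0)

def cross_validation_error(params, stats, n_test=50, tol=0.05, seed=0):
    """Mean absolute error per parameter over `n_test` held-out simulations.

    params: (n_sims, n_params) array of simulated parameter values.
    stats:  (n_sims, n_stats) array of matching summary statistics.
    """
    rng = np.random.default_rng(seed)
    test_idx = rng.choice(len(params), size=n_test, replace=False)
    errors = []
    for i in test_idx:
        mask = np.arange(len(params)) != i  # leave simulation i out
        est = rejection_estimate(stats[i], params[mask], stats[mask], tol)
        errors.append(np.abs(est - params[i]))
    return np.mean(errors, axis=0)
```

Each held-out simulation plays the role of pseudo-observed data with known parameters, so the resulting error reflects how well the chosen statistics and tolerance recover each DFE class under the fixed demographic model.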

Fig. 3: Inference of the DFE in the IAV population.

A Power and performance analyses concerning the inference of the DFE in an oscillating population size model mirroring that of the experimental IAV populations in question. The black diagonal lines represent the points at which estimated parameters match their true values, where f0 is the proportion of new mutations in the neutral class, f1 is the proportion of the weakly deleterious class, f2 is the proportion of the moderately deleterious class, and f3 is the proportion of the strongly deleterious/lethal class. B Posterior estimates of the parameters of the DFE in an experimental population of IAV. Dashed lines indicate the distribution of sampled priors, the histograms present the posterior distributions, and the red vertical lines show the point estimates calculated as the weighted median.

With this statistical performance quantified under the appropriate demographic model, we evaluated the experimental data. Thirteen experimental replicate lines were used to obtain means and standard deviations of all statistics utilized in the ABC method. In order to first fit an accurate total mutation rate (that is, including strongly deleterious mutations not observed in MA lines, and not expected to have contributed to divergence as measured by phylogenetic estimates), we simulated a range of mutation rates. We found the summary statistics obtained from simulations with mutation rates on the order of 10^-3 per site/generation to be highly consistent with the observed data (see Supplementary Table 1).

Using this inferred mutation rate, the estimated DFE is characterized by a strong skew towards the neutral class of mutations, with f0 = 0.78, f1 = 0.16, f2 ~ 0, and f3 = 0.06 (Fig. 3B). These estimates are similar to the DFE inferred using an alternative Markov model-based approach (Ferrer-Admetlla et al. 2016), and the general bi-modal shape of the DFE is also consistent with the mutational effects estimated by Visher et al. (2016) using genome-wide data. Though it may at first seem peculiar that more than 90% of mutations are inferred to be neutral or weakly deleterious, and that only ~6% of mutations are strongly deleterious, it is worth reiterating that this analysis is based on control lines without any selective challenge. Further, the segment analyzed (PB2), which is thought to play a primary role in the genome packing process (Muramoto et al. 2006), was chosen as it appeared to be evolving in a relatively neutral manner. As such, this inferred DFE likely reflects largely biophysical constraints on potential amino acid changes (see Discussion). For example, the whole-genome DFE inferred by Visher et al. (2016) referenced above estimated greater density in the strongly deleterious class, likely owing in part to the inclusion of more strongly constrained segments across the genome, as well as their directed mutagenesis approach (as opposed to the mutation-accumulation data considered here).

Inference of the DFE in HCMV

HCMV is a linear DNA virus with a genome size nearly 20x that of IAV (~236 kb; Dolan et al. 2004). HCMV presents a high level of genetic diversity relative to other DNA viruses (Renzette et al. 2011, 2015; and see Sackman et al. 2018; Jensen and Kowalik 2020), despite a slowly replicating process of ~24 h that includes encoding a polymerase with proofreading capacity (Nishiyama et al. 1983). Additionally, HCMV compartmentalizes within a host, and diversifies strongly between compartments (Renzette et al. 2013). Furthermore, previous studies have suggested important roles of both purifying and background selection, as well as episodic positive selection, in shaping HCMV genetic diversity (Renzette et al. 2016, 2017; Hage et al. 2017).

The genetic data available from HCMV represent a within-patient population sample collected from the urine of a congenitally infected infant (see Methods). The population has previously been inferred to have experienced multiple bottlenecks - corresponding to the initial infection and subsequent compartmentalization (Renzette et al. 2011, 2013). As such, it was again necessary to evaluate whether the DFE could be accurately inferred under such a complex demographic history. The specific demographic history (Fig. 2B) inferred by Renzette et al. (2011) was fixed for HCMV, while the four parameters of the DFE were varied in order to perform power and performance simulations within the ABC framework. Under this demographic model, HCMV populations experience extremely rapid recent growth post-infection, which is expected to result in a strong excess of rare alleles, potentially mimicking the effects of purifying selection. As observed in our cross-validation (Fig. 4A), our estimates of the proportions of strongly (absolute error: 0.030) and moderately (absolute error: 0.047) deleterious mutations are quite accurate; however, the method performs less accurately when distinguishing between neutral (absolute error: 0.138) and mildly deleterious mutations (absolute error: 0.151), likely owing to this conflation of processes contributing to rare alleles.

Fig. 4: Inference of the DFE in the HCMV population.

A Power and performance analyses concerning the inference of the DFE under an infection scenario mirroring that previously inferred for the HCMV population in question. The black diagonal lines represent the points at which estimated parameters match their true values, in which f0 is the proportion of new mutations in the neutral class, f1 is the proportion of the weakly deleterious class, f2 is the proportion of the moderately deleterious class, and f3 is the proportion of the strongly deleterious/lethal class. B Posterior estimates of the parameters of the DFE in a patient population of HCMV. Dashed lines indicate the distribution of sampled priors, the histograms present the posterior distributions, and the red vertical lines show the point estimates defined as the weighted median.

Turning to the empirical data, a total mutation rate of 1 × 10^-5 provided a good match of the simulated summary statistics to the observed data (Supplementary Table 2), and was thus fixed in subsequent inference. We inferred the four parameters corresponding to the DFE of new mutations in HCMV to be f0 = 0.51, f1 = 0.24, f2 = 0.12, and f3 = 0.13 (Fig. 4B). It should be noted that because our method has less accuracy under this demographic model for estimating f0 and f1, the underlying posterior distributions are fairly broad. However, inference confidently suggests that ~70% of all new mutations are neutral or weakly deleterious, with the remainder being moderately and strongly deleterious.


We here provide an example of estimating the DFE from both patient-sampled as well as passaged viral populations, using a commonly studied RNA (IAV) and DNA (HCMV) virus for the purposes of illustration. To do so, we suggest a modification of the joint approximate Bayesian estimator of Johri et al. (2020) in order to account for the non-Wright-Fisher replication dynamics of viruses. It should be noted that there are currently no standard methods for inferring the DFE for such populations; thus, although our proposed method assumed a fixed measure of progeny skew, it is nonetheless a first step towards incorporating such life history traits of microbial organisms. As the inference of selection may be strongly confounded by demography, we conducted performance analyses under the strongly non-equilibrium experimental (IAV) and infection (HCMV) histories known for these samples, thereby quantifying uncertainty in the resulting DFE estimates. These two examples indicate the broader applicability of this framework for the study of pathogen evolution.

The DFE estimates provided by this analysis speak to a number of points of evolutionary interest. Firstly, estimates suggest that ~90% of new mutations in the PB2 segment of the experimental IAV population are neutral or weakly deleterious. This large fraction probably owes in part to the fact that inference was performed on control populations evolved in the absence of any experimental challenge, and that the segment itself was chosen for its lack of evidence of strongly selected sites in earlier studies. As such, the fraction of strongly deleterious sites inferred likely represents a biophysical/biochemical constraint on potential amino acid changes (e.g., Shakhnovich 2006; Zeldovich et al. 2007). For comparison, in the patient population of HCMV - in which the population is, at a minimum, exposed to immune pressures - the fraction of neutral and weakly deleterious sites was estimated at 70%, with nearly a third of all mutations in the genomic region under study being inferred to experience strong purifying selection. Accounting for this full DFE, as well as the diversity-reducing effects of progeny skew, also suggests somewhat faster mutation rates than have been estimated using phylogenetic approaches or MA lines - both of which will disproportionately measure neutral mutation rates while neglecting strongly deleterious mutations.

Secondly, inter-virus comparisons of this sort will allow for a consideration of the effects, and evolution, of recombination and mutation rates themselves. For example, as HCMV frequently recombines (Renzette et al. 2016), whereas IAV instead reassorts between segments, Hill-Robertson effects (Hill and Robertson 1966; and see Muller 1964; Felsenstein 1974) may be expected to differ substantially between these two example viruses. Specifically, as recombination breaks up linkage effects, allowing natural selection to more efficiently purge deleterious variants, HCMV might be expected to better tolerate a higher deleterious input (as here inferred, and see Lynch et al. 1995; Charlesworth and Charlesworth 1998).

Thirdly, these comparisons also allow for an examination of the determinants of genome size, which itself tends to scale inversely with mutation rate (e.g., Drake 1991; Lynch 2010; and see Gago et al. 2009; Bradwell et al. 2013). In our examples, the per-site mutation rate in the smaller IAV genome (~13 kb) is expected to be larger than that in the HCMV genome (~236 kb), as has long been known. Specifically, given that selection acts on the genome-wide input of deleterious variants (Kimura 1967), the smaller IAV genome would be expected to tolerate a higher per-site input than the larger HCMV genome. Notably, given the DFE and mutation rates estimated here, as well as the genome sizes, the expected per-replication input of moderately and strongly deleterious variants is roughly similar between the two viruses (on the order of 1-2 deleterious mutations per genome/replication). However, owing to multiple differences in polymerase and proof-reading activity amongst viruses, accumulating larger numbers of examples will be important in order to determine the generality of this genome-wide input.
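
The genome-wide input discussed above is, to a first approximation, the product of the per-site mutation rate, the genome length, and the fraction of new mutations falling in the deleterious classes of interest. The sketch below illustrates that arithmetic for HCMV only, under assumptions that should be read as such: the fixed rate of 1 × 10⁻⁵ is treated as a per-site, per-replication rate, the DFE inferred for the genomic region under study is applied to the whole ~236 kb genome, and the function name is hypothetical. It is not intended to reproduce the calculation underlying the figures quoted in the text.

```python
def deleterious_input_per_genome(per_site_rate, genome_length_bp, deleterious_fraction):
    """Expected number of new deleterious mutations per genome per replication."""
    if per_site_rate < 0 or genome_length_bp < 0:
        raise ValueError("rate and genome length must be non-negative")
    if not 0.0 <= deleterious_fraction <= 1.0:
        raise ValueError("deleterious_fraction must lie in [0, 1]")
    return per_site_rate * genome_length_bp * deleterious_fraction


# HCMV values taken from the text; their units and genome-wide applicability
# are assumptions made for illustration.
f2, f3 = 0.12, 0.13   # moderately and strongly deleterious classes
hcmv_u = deleterious_input_per_genome(1e-5, 236_000, f2 + f3)
print(f"Illustrative HCMV deleterious input per genome per replication: {hcmv_u:.2f}")
```

The same function could be applied to IAV given its estimated per-site rate and class proportions, which makes the trade-off between genome length and per-site rate explicit.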

Finally, the underlying shape of the DFE has important clinical implications as well. This particularly relates to the concept of mutational meltdown (Lynch and Gabriel 1990; Gabriel et al. 1993; Lynch et al. 1993), the ability to induce it in viral populations (e.g., Lynch et al. 1995; Bank et al. 2016; Jensen et al. 2020), and the specific outcome of lethal mutagenesis (e.g., Crotty et al. 2001; Bull et al. 2007; Wylie and Shakhnovich 2012; and see review of Matuszewski et al. 2017). In short, high mutation rates can overwhelm the ability of natural selection to purge deleterious variants, even in large recombining populations, and this increasing deleterious load can result in a snowball effect leading towards ultimate extinction. This transition to mutation-driven rather than genetic drift-driven meltdown occurs when the deleterious mutation rate is ~1 per individual per generation (Lynch et al. 1993) - similar to the numbers for IAV and HCMV reported here. However, the shape of the underlying DFE is critical in understanding the likelihood of meltdown, the therapeutic increase in mutation rates necessary to induce it, and the expected time to within-host viral extinction. Promisingly, and further supporting the result that IAV and HCMV may reside near a mutational boundary - hence making them particularly susceptible to therapeutic meltdown - recent results have demonstrated that experimentally increasing mutation rates in IAV using the mutation-inducing drug favipiravir can indeed induce meltdown dynamics and ultimate extinction (Baranovich et al. 2013; Bank et al. 2016; Ormond et al. 2017; Goldhill et al. 2018). Similar investigations have been suggested, and are currently underway, as a novel treatment strategy for SARS-CoV-2 (Sheahan et al. 2020; Jensen and Lynch 2020; Santiago and Caballero 2020; Jensen et al. 2020). Future characterizations of the DFE across viruses currently posing critical public health threats, of the variety presented here, will be essential for further exploring meltdown as a generally applicable therapeutic.
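
To give a rough sense of why a deleterious input near 1 per genome per replication is a meaningful scale, consider the simplifying assumption that the number of new deleterious mutations per progeny genome is Poisson distributed with mean U. The fraction of progeny carrying no new deleterious mutation is then exp(-U). The sketch below computes this fraction, together with the fold increase in U needed to push it down to a chosen target. Both function names are hypothetical, and the Poisson assumption ignores selection, epistasis, progeny skew, and the shape of the DFE, so this is an illustration of scale rather than a meltdown threshold.

```python
import math


def mutation_free_fraction(u_del):
    """Poisson probability that a progeny genome has no new deleterious mutation."""
    if u_del < 0:
        raise ValueError("u_del must be non-negative")
    return math.exp(-u_del)


def fold_increase_needed(u_del, target_fraction):
    """Fold increase k in u_del such that exp(-k * u_del) equals target_fraction."""
    if u_del <= 0:
        raise ValueError("u_del must be positive")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must lie in (0, 1]")
    return -math.log(target_fraction) / u_del


# With U = 1, a bit over a third of progeny are free of new deleterious mutations.
print(f"{mutation_free_fraction(1.0):.3f}")
# Fold increase in U needed to reduce that fraction to 5% (illustrative target).
print(f"{fold_increase_needed(1.0, 0.05):.2f}")
```

Under this simplification, the response to a mutagenic drug depends on the baseline U: the higher the starting input, the smaller the fold increase needed to reach a given mutation-free fraction.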

Data accessibility

Full simulation and summary statistic results are available on GitHub: