Introduction

Solid knowledge of the rate (U) at which deleterious mutations arise per genome per generation and of their average effects (dominance and selection coefficients h̄ and s̄) is important in many subdisciplines of biology (see Deng, 1998 for references). However, despite extensive efforts over the past few decades (Morton et al, 1956; Mukai et al, 1972; Charlesworth et al, 1990a; Deng and Lynch, 1996, 1997; Drake et al, 1998; Nachman and Crowell, 2000), such data are still scarce, and the few available estimates are inconsistent with one another. Even the order of magnitude of the parameters of deleterious genomic mutations (DGM) is still under debate (Crow and Simmons, 1983; Kondrashov, 1988; Crow, 1993a, 1993b; Peck and Eyre-Walker, 1997; Eyre-Walker and Keightley, 1999; Garcia-Dorado et al, 1999; Lynch et al, 1999; Caballero et al, 2002; Charlesworth et al, 2004; Fry, 2004; Houle and Nuzhdin, 2004).

Several methods have been proposed to estimate DGM parameters, including the mutation accumulation approach, the inbreeding depression approach, the Deng–Lynch method, and a molecular-based method (Morton et al, 1956; Bateman, 1959; Mukai et al, 1972; Charlesworth et al, 1990a; Kondrashov and Crow, 1993; Keightley, 1996; Deng and Lynch, 1996). By minimizing selection, the mutation accumulation approach (Bateman, 1959; Mukai et al, 1972) uses the per-generation changes in the mean and variance of fitness (ΔM and ΔV) to estimate bounds on U (U ≥ ΔM²/ΔV) and s (s ≤ ΔV/ΔM). When mutational effects are assumed to follow certain distributions and some parameters are known, Keightley (1994) used a maximum-likelihood method to estimate the remaining mutation parameters. The inbreeding depression approach (Morton et al, 1956; Charlesworth et al, 1990a) estimates U when h is known. Another indirect method, the Deng–Lynch method (Deng and Lynch, 1996), estimates U, h, and s simultaneously, without assuming any of the mutation parameters to be known. The molecular-based method developed by Kondrashov and Crow (1993) uses the difference between the substitution rates of synonymous and nonsynonymous mutations.
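The Bateman–Mukai bounds quoted above follow from the expected per-generation changes ΔM = U·E[s] and ΔV = U·E[s²] when new mutations with effects s accumulate at rate U in lines shielded from selection. Because E[s²] ≥ (E[s])², the ratio ΔM²/ΔV can only underestimate U and ΔV/ΔM can only overestimate the mean s. The minimal Python sketch below (function names are ours and hypothetical) computes the bounds and shows that they are exact for constant effects but return U/2 and 2s̄ when effects are exponentially distributed, for which E[s²] = 2s̄².

```python
"""Bateman-Mukai bounds from mutation-accumulation data (illustrative sketch)."""


def bateman_mukai_bounds(delta_mean: float, delta_var: float) -> tuple[float, float]:
    """Return (lower bound on U, upper bound on mean s).

    delta_mean: per-generation decline in mean fitness (a positive number).
    delta_var:  per-generation increase in the genetic variance of fitness.
    """
    if delta_mean <= 0 or delta_var <= 0:
        raise ValueError("both changes must be positive")
    u_min = delta_mean ** 2 / delta_var   # U >= (dM)^2 / dV
    s_max = delta_var / delta_mean        # mean s <= dV / dM
    return u_min, s_max


def expected_changes(u: float, mean_s: float, mean_s2: float) -> tuple[float, float]:
    """Expected changes if Poisson(U) new mutations per generation add effects s:
    dM = U * E[s] and dV = U * E[s^2]."""
    return u * mean_s, u * mean_s2


if __name__ == "__main__":
    U, s_bar = 1.0, 0.03
    # Constant effects: E[s^2] = s_bar^2, so the bounds are attained.
    print(bateman_mukai_bounds(*expected_changes(U, s_bar, s_bar ** 2)))
    # Exponential effects: E[s^2] = 2 s_bar^2, so U is halved and s doubled.
    print(bateman_mukai_bounds(*expected_changes(U, s_bar, 2 * s_bar ** 2)))
```

The gap between the two cases illustrates why distributional assumptions matter when these bounds are read as estimates.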

Each of the above approaches has both advantages and disadvantages. The mutation accumulation method is straightforward but labor intensive. The inbreeding depression approach estimates only U and has to assume that h is known. The molecular approach has some pitfalls (Kondrashov and Crow, 1993), and its statistical properties (the direction and magnitude of bias and the sampling variance) are generally unknown and difficult to ascertain. The Deng–Lynch method (an extension of the Morton–Charlesworth method) estimates all three parameters, U, h̄, and s̄, simultaneously in outcrossing or selfing populations and has desirable statistical properties (Deng and Fu, 1998; Deng et al, 1999), although it has its own limitations. With reasonable time and labor, an application of the Deng–Lynch method in Daphnia (Deng and Lynch, 1997), a freshwater microcrustacean, yielded estimates of DGM parameters similar to those obtained in Drosophila via arduous mutation accumulation experiments (Drake et al, 1998).

All of the available approaches were developed under idealized assumptions that may be violated in biological situations, in which case biased estimators may be obtained (Deng and Fu, 1998; Deng et al, 1999; Deng and Li, 2001). These assumptions are often necessary to derive estimators of the mutational parameters. However, for these approaches to be applicable in practice, the statistical properties of the estimators should be investigated under various biologically plausible scenarios. Only by quantifying the direction and magnitude of the bias, and the magnitude of the sampling variance, of an estimation method under practical situations can its applicability and robustness be correctly assessed. Such an investigation is also necessary to infer the magnitude of the true mutation parameters correctly, by stating the correct bounds on them (upper or lower) when unbiased estimates are almost impossible to obtain.

The Deng–Lynch method has been investigated in a number of situations, such as with variable and epistatic mutation effects (Deng and Lynch, 1996, 1997), with overdominant mutations maintained by balancing selection (Li et al, 1999), with partially selfing or outcrossing populations, and with nonequilibrium large populations approaching mutation–selection (M–S) balance (Li and Deng, 2000). All of these investigations assumed infinite population size and unlinked loci in linkage equilibrium. However, no natural population is infinite in size, and linkage disequilibrium (LD) among fitness-related loci may exist even in large populations (Lynch and Deng, 1994; Deng and Lynch, 1996).

In this study, we investigate the applicability of the Deng–Lynch method for estimating the parameters (U, h̄, and s̄) of DGM in finite populations in the presence of LD. Using extensive computer simulations of finite populations of various sizes, we quantified the direction and magnitude of bias and the sampling variation of the estimates. The simulations covered constant and variable mutation effects and a range of DGM parameter values, population sizes, and recombination rates between adjacent loci. Our results may provide a basis for assessing the applicability, feasibility, and robustness of characterizing DGM in finite natural populations with the Deng–Lynch method.

Theory

Under the assumptions of infinite population size, unlinked loci in linkage equilibrium at M–S balance, and multiplicative fitness, the Deng–Lynch method (Deng and Lynch, 1996) estimates U, h̄, and s̄ simultaneously in completely outcrossing or completely selfing populations. Letting a circumflex (^) denote an estimate throughout, in outcrossing populations we have

In selfing populations,

where x, y, and z are defined, respectively, as

where W̄(O) and σ(O)² are, respectively, the mean and genetic variance of fitness for outcrossed generations (parents in outcrossing populations or progeny in selfing populations); W̄(S) and σ(S)² are, respectively, the mean and genetic variance of fitness for selfed generations (selfed progeny families in outcrossing populations or parents in selfing populations).
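To illustrate why the means and genetic variances of the two generations suffice to identify U, h, and s in an outcrossing population, the following sketch uses a first-order approximation of our own, assuming unlinked loci in linkage equilibrium at M–S balance, rare mutant alleles at frequency q ≈ u/(hs) per locus, and multiplicative fitness. With A = ln[W̄(O)/W̄(S)], B = ln[1 + σ(O)²/W̄(O)²], and C = ln[1 + σ(S)²/W̄(S)²], where σ(S)² is the variance among selfed-family means, the approximation gives A ≈ U(1 − 2h)/(4h), B ≈ Uhs, and C ≈ Us(1 + 2h)²/(16h), which can be solved for U, h, and s. The labels A, B, and C and the function names are hypothetical; the sketch is not a transcription of equations (1)–(3) and need not coincide with their exact form, and selfing populations would require different expressions.

```python
"""First-order moment relations for an outcrossing population (illustrative sketch)."""
import math


def expected_statistics(u: float, h: float, s: float) -> tuple[float, float, float]:
    """Approximate expected (A, B, C) at mutation-selection balance.

    A = ln(mean_O / mean_S)        ~ U (1 - 2h) / (4h)        (inbreeding depression)
    B = ln(1 + var_O / mean_O**2)  ~ U h s                    (outcrossed parents)
    C = ln(1 + var_S / mean_S**2)  ~ U s (1 + 2h)**2 / (16h)  (selfed-family means)
    """
    a = u * (1.0 - 2.0 * h) / (4.0 * h)
    b = u * h * s
    c = u * s * (1.0 + 2.0 * h) ** 2 / (16.0 * h)
    return a, b, c


def statistics_from_moments(mean_o: float, var_o: float,
                            mean_s: float, var_s: float) -> tuple[float, float, float]:
    """Convert estimated means and genetic variances into (A, B, C)."""
    return (
        math.log(mean_o / mean_s),
        math.log1p(var_o / mean_o ** 2),
        math.log1p(var_s / mean_s ** 2),
    )


def invert_statistics(a: float, b: float, c: float) -> tuple[float, float, float]:
    """Solve the three approximate relations for (U, h, s); requires 0 < h < 0.5."""
    ratio = math.sqrt(c / b)               # equals (1 + 2h) / (4h)
    h = 1.0 / (2.0 * (2.0 * ratio - 1.0))
    u = 4.0 * h * a / (1.0 - 2.0 * h)
    s = b / (u * h)
    return u, h, s


if __name__ == "__main__":
    a, b, c = expected_statistics(1.0, 0.36, 0.03)
    print(invert_statistics(a, b, c))  # recovers U, h, s up to rounding
```

The round trip is exact only because the inputs come from the same approximate relations; with simulated or real data, drift, LD, and variable effects move A, B, and C away from these expectations, which is the source of the biases examined in this study.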

Note that formula (5) in Deng and Fu (1998) contains a typographical error: in the estimation formula for h, the ‘2’ within the parentheses under the square-root sign should be outside the square-root sign, and the ‘z’ in the numerator of the estimation formula for s should be ‘x’. In addition, in Appendix A1 of Deng and Lynch (1997), a factor of ‘ŷ’ is missing from the numerator on the right-hand side of formula (A4b).

The experimental procedures for outcrossing and selfing populations have been detailed previously (Deng and Lynch, 1996, 1997; Deng and Fu, 1998; Deng, 1998) and are only outlined here. In outcrossing populations: (1) A sample of genotypes is selfed to obtain a number of selfed progeny from each parent. (2) Parental genotypes are cloned, and genotypes from both generations are assayed together in a common environment to estimate W̄(O) and W̄(S). (3) One-way analyses of variance (ANOVA) are performed. In the outcrossed parental generation, parental genotypes are treated as between-group effects and clonal replicates as within-group effects, which yields the estimate of the genetic variance σ(O)². In the selfed offspring generation, selfed families are treated as between-group effects and selfed progeny genotypes within each family as within-group effects, which yields the genetic variance among selfed-family means, σ(S)². In these ANOVAs, between-group effects are treated as random, because the goal is to estimate population genetic variances.

In selfing populations: (1) Random pairs of genotypes are sampled and outcrossed. (2) The selfed parents and their outcrossed progeny are cloned and assayed in a common environment. (3) One-way ANOVAs are performed, with genotypes as between-group effects and clonal replicates as within-group effects, to estimate the genetic variances in the outcrossed progeny (σ(O)²) and selfed parental (σ(S)²) generations, together with W̄(O) and W̄(S). The estimated means and genetic variances are then used to estimate U, h̄, and s̄ (equations (1), (2) and (3)).

Because selfing and cloning are not feasible in all species, the Deng–Lynch method has been extended to outcrossing species in which sib mating or half-sib mating is possible instead (Deng, 1998). However, since this investigation focuses on the effects of finite population size with LD on estimation, we consider only experimental designs for species in which individuals can be selfed and cloned.
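The variance components in the procedures above come from a one-way random-effects ANOVA. For a balanced design with k groups of n observations each, E[MS_between] = σ²_within + nσ²_between, so σ²_between can be estimated as (MS_between − MS_within)/n. The sketch below is a generic method-of-moments implementation under that balanced-design assumption, not code from the study, and the function name is hypothetical. It applies to either grouping: clonal replicates within parental genotypes, which gives σ(O)², or progeny within selfed families, which gives the variance among selfed-family means, σ(S)².

```python
"""Balanced one-way random-effects ANOVA (method-of-moments sketch)."""
from statistics import fmean


def one_way_anova(groups: list[list[float]]) -> dict[str, float]:
    """Estimate the grand mean and the between- and within-group variance
    components for a balanced design (every group has the same size n)."""
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean(group_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (k * (n - 1))
    return {
        "mean": grand_mean,
        # E[MS_between] = sigma2_within + n * sigma2_between
        "var_between": (ms_between - ms_within) / n,
        "var_within": ms_within,
    }


if __name__ == "__main__":
    # Three parental genotypes, each measured on three clonal replicates (toy numbers).
    clones = [[0.42, 0.40, 0.44], [0.35, 0.37, 0.36], [0.50, 0.47, 0.49]]
    print(one_way_anova(clones))
```

With small samples the between-group estimate can be negative; a common convention is to truncate it at zero before it is used in further calculations.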

Simulations

Following the method of Fraser and Burnell (1970), which was also employed by Charlesworth et al (1992, 1993), we used a stochastic approach to simulate finite populations with constant or variable mutation effects. With a multiplicative fitness function across loci and starting from homogeneous individuals free of DGM, populations were simulated until they reached a quasi-equilibrium, at which deleterious mutations have built up in the genome to a steady but fluctuating level (Charlesworth et al, 1992). (A quasi-equilibrium refers to a stage at which the observed mean and variance of fitness no longer change appreciably; the time needed to reach it depends on factors such as the DGM parameters and the population size.) Then, following the respective experimental designs described above, the Deng–Lynch method was applied to the simulated outcrossing or selfing populations. As in our previous studies (Li et al, 1999; Li and Deng, 2000), we focused on the robustness of estimation by the Deng–Lynch method (the direction and, particularly, the magnitude of bias) in finite populations with LD. Our previous investigations (Deng and Lynch, 1996; Deng and Fu, 1998; Deng et al, 1999) indicated that ignoring measurement error in genotypic values is unlikely to change the degree of estimation bias, although it may reduce the sampling error of the estimates. Therefore, as in those investigations (Li et al, 1999; Li and Deng, 2000), we assumed that genotypic values were measured without error in the simulations. Hereafter, the variance of fitness refers to the genetic variance unless otherwise specified.

Constructing finite populations

Owing to random drift, only a quasi-equilibrium (Charlesworth et al, 1992, 1993) can be reached in finite populations, and the alleles at the different loci of every individual have to be tracked. Binary sequences of 0s and 1s were used to represent haplotypes and to store the allelic state at each locus, with 0 denoting a normal allele and 1 a mutant allele. Because the populations are diploid, two binary sequences were used to record the genotype of each individual. The number of loci was fixed at 5120, which is probably large enough to represent the genomic loci prone to deleterious mutations. The population size was 1600 unless otherwise specified. The simulated life cycle was similar to that used for infinite populations (Li and Deng, 2000) and is outlined below. Each simulation was stopped after 1500 cycles (generations), by which time quasi-equilibrium had generally been reached (Charlesworth et al, 1992, 1993).
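A minimal sketch of the binary representation described above, assuming each haplotype is packed into a single Python integer whose bit k holds the allele at locus k (1 = mutant). With this layout, the numbers of heterozygous (i) and homozygous (j) mutant loci that enter the fitness functions given later in this section are simply bit counts of the XOR and the AND of the two haplotypes. The helper names are hypothetical.

```python
"""Diploid genotypes stored as two integer bitsets, one per haplotype (sketch)."""
import random

N_LOCI = 5120  # number of loci used in the simulations


def random_haplotype(rng: random.Random, freq: float, n_loci: int = N_LOCI) -> int:
    """Hypothetical helper: each locus carries a mutant allele (bit = 1) with probability freq."""
    bits = 0
    for locus in range(n_loci):
        if rng.random() < freq:
            bits |= 1 << locus
    return bits


def count_mutant_loci(hap1: int, hap2: int) -> tuple[int, int]:
    """Return (i, j): the numbers of heterozygous and homozygous mutant loci."""
    heterozygous = bin(hap1 ^ hap2).count("1")  # exactly one copy is mutant
    homozygous = bin(hap1 & hap2).count("1")    # both copies are mutant
    return heterozygous, homozygous


if __name__ == "__main__":
    rng = random.Random(0)
    genotype = (random_haplotype(rng, 0.01), random_haplotype(rng, 0.01))
    print(count_mutant_loci(*genotype))
```

Packing alleles into machine words keeps memory small for 1600 individuals × 2 × 5120 loci and turns genotype classification into a few bitwise operations.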

Starting from a population free of DGM, mutation, mating, and selection were simulated sequentially. First, mutations occurred randomly in the genomes of the adult generation, changing the state of a normal allele at the mutated locus from 0 to 1. As in Charlesworth et al (1992), back mutations were ignored. After mutation, mating was simulated. A parent and one of its haplotypes were chosen at random, and the allele at the first locus of this haplotype was copied into a new gamete. To determine the allele at the next locus, a random number t uniformly distributed between 0 and 1 was generated and compared with the recombination rate θ between the two loci: if t<θ, a crossover occurred and the allele was taken from the other haplotype of the parent; otherwise, it was taken from the same haplotype as at the previous locus. This procedure was repeated along the remaining loci until all alleles of the gamete had been determined. A second gamete was produced in the same way; its parent was the same individual in a selfing event and a different individual in an outcrossing event. To maintain viable populations, we simulated highly selfing populations with a selfing rate of 0.95 instead of completely selfing populations (see Discussion for further details). As in Charlesworth et al (1992), no crossover interference was assumed. The two gametes were combined into a zygote for the next generation, whose survival was determined by selection. The procedure continued until the number of surviving zygotes reached the preset population size. This simulation cycle was repeated for 1500 generations.
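The mutation and mating steps above can be sketched for bitset haplotypes as follows. This is an illustration under stated assumptions, not the authors' program: the number of new mutations per diploid genome per generation is assumed to be Poisson with mean U, each new mutation is placed on a random locus of a random haplotype, and a crossover between adjacent loci occurs with probability θ. All function names are hypothetical.

```python
"""Mutation, parent choice and gamete formation for bitset haplotypes (sketch)."""
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for small means such as U <= 1.5."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p >= limit:
        k += 1
        p *= rng.random()
    return k


def mutate(rng: random.Random, genotype: list[int], u: float, n_loci: int) -> None:
    """Add Poisson(u) new mutations to a diploid genotype in place.

    Assumption: each mutation hits a uniformly random locus on a uniformly
    random haplotype; only 0 -> 1 changes occur (no back mutation).
    """
    for _ in range(poisson(rng, u)):
        copy = rng.randrange(2)
        genotype[copy] |= 1 << rng.randrange(n_loci)


def choose_parents(rng: random.Random, pop_size: int, selfing_rate: float) -> tuple[int, int]:
    """Return indices of the two parents; identical with probability selfing_rate."""
    first = rng.randrange(pop_size)
    if rng.random() < selfing_rate:
        return first, first
    other = rng.randrange(pop_size - 1)
    return first, other + (other >= first)  # skip `first` so the parents differ


def make_gamete(rng: random.Random, genotype: list[int], theta: float, n_loci: int) -> int:
    """Copy one haplotype locus by locus, switching haplotype with probability
    theta between adjacent loci (no crossover interference)."""
    current = rng.randrange(2)  # haplotype supplying the first locus
    gamete = 0
    for locus in range(n_loci):
        if locus > 0 and rng.random() < theta:
            current = 1 - current  # crossover between locus - 1 and locus
        if (genotype[current] >> locus) & 1:
            gamete |= 1 << locus
    return gamete


if __name__ == "__main__":
    rng = random.Random(42)
    n_loci = 5120
    parent = [0, 0]
    mutate(rng, parent, 1.0, n_loci)
    zygote = [make_gamete(rng, parent, 0.1, n_loci), make_gamete(rng, parent, 0.1, n_loci)]
    print(bin(zygote[0]).count("1"), bin(zygote[1]).count("1"))
```

With θ = 0.5 the haplotype switches at each locus with probability one half, which corresponds to free recombination (unlinked loci).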

With constant mutational effects, the dominance coefficient h and the selection coefficient s are the same for all mutations in the genome. The fitness of an individual is given by

$$W = (1 - hs)^{i}\,(1 - s)^{j},$$
where i and j are the numbers of heterozygous and homozygous mutant loci, respectively, in the genome. For each population, samples were assayed every 300 generations during the simulation, and the Deng–Lynch method was applied to these samples. For both outcrossing and selfing populations, 200 parents were assayed. In outcrossing populations, each parent had 40 selfed progeny; in selfing populations, pairs of random individuals were mated to produce one offspring each. Each population was resampled 100 times (with replacement) at each assay generation. Because of the extremely high computational demand, ‘only’ 10 populations were simulated for each parameter set. The recombination rate between adjacent loci was 0.1 unless otherwise specified. Because the estimates after generation 600 were very similar to one another, we used the results at generation 1500 to compare estimates across parameter sets.
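The selection step of the life cycle can be sketched with the constant-effect fitness function above. The sketch assumes, as is common in such simulations although the rule is not spelled out here, that a zygote survives with probability equal to its fitness W; because the mutation-free genotype has W = 1, W can be used directly as a survival probability. The function names are hypothetical, and the genotype type is left generic so that the selection loop can be combined with any zygote representation.

```python
"""Constant-effect fitness and viability selection (illustrative sketch)."""
import random
from typing import Callable, TypeVar

Z = TypeVar("Z")


def fitness_constant(i: int, j: int, h: float, s: float) -> float:
    """W = (1 - h s)^i (1 - s)^j for i heterozygous and j homozygous mutant loci."""
    return (1.0 - h * s) ** i * (1.0 - s) ** j


def next_generation(
    rng: random.Random,
    make_zygote: Callable[[], Z],
    fitness_of: Callable[[Z], float],
    pop_size: int,
) -> list[Z]:
    """Create zygotes until pop_size of them have survived.

    Assumption: a zygote survives with probability equal to its fitness W (0 <= W <= 1).
    """
    survivors: list[Z] = []
    while len(survivors) < pop_size:
        zygote = make_zygote()
        if rng.random() < fitness_of(zygote):
            survivors.append(zygote)
    return survivors


if __name__ == "__main__":
    # Genotypes summarized as (i, j) counts purely for this toy example.
    rng = random.Random(7)
    pop = next_generation(
        rng,
        make_zygote=lambda: (rng.randrange(10), rng.randrange(3)),
        fitness_of=lambda z: fitness_constant(z[0], z[1], h=0.36, s=0.03),
        pop_size=5,
    )
    print(pop)
```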

For variable mutation effects, an exponential distribution for the homozygous effects s and an inverse relationship between h and s for mutations at different loci are plausible (Gregory, 1965; Crow and Simmons, 1983; Mackay et al, 1992; Keightley, 1994; Kacser and Burns, 1981) and were employed in Deng and Lynch (1996). The distribution of s and the relationship between h and s were as follows,

The fitness of an individual was given by

$$W = \prod_{l=1}^{i}\left(1 - h_{l}s_{l}\right)\prod_{m=1}^{j}\left(1 - s_{m}\right),$$

where i and j are the numbers of heterozygous and homozygous mutant loci, respectively; s_l and s_m are the selection coefficients of the lth heterozygous and mth homozygous locus, respectively, both following an exponential distribution with mean s̄; and h_l is the dominance coefficient of the lth heterozygous locus, determined from s_l by the relationship between h and s given above. Note that selection coefficients vary across loci but, once generated, are fixed over time. Mutation effects on fitness are multiplicative across loci, and the assay procedure was the same as for constant mutation effects.
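The variable-effect model can be sketched in the same bitset representation: s is drawn once per locus from an exponential distribution with mean s̄, h is derived from s, and fitness is the product over heterozygous and homozygous mutant loci. Because the exact h–s relationship is specified in the equations above, the sketch takes it as a user-supplied function; the exponential form in `hypothetical_h_of_s` is a placeholder for illustration only, not the relationship used in this study. Truncating rare draws of s above 1 is likewise our own choice. All names are hypothetical.

```python
"""Variable mutation effects: locus-specific s and h, fitness multiplied across loci (sketch)."""
import math
import random
from typing import Callable, Iterator


def draw_locus_effects(
    rng: random.Random, n_loci: int, mean_s: float, h_of_s: Callable[[float], float]
) -> tuple[list[float], list[float]]:
    """Draw s for every locus once (exponential with mean mean_s) and derive h from s.

    Draws above 1 are truncated to 1 so that 1 - s stays non-negative (our choice;
    with mean_s = 0.03 such draws are vanishingly rare).
    """
    s_values = [min(rng.expovariate(1.0 / mean_s), 1.0) for _ in range(n_loci)]
    return s_values, [h_of_s(s) for s in s_values]


def set_bits(x: int) -> Iterator[int]:
    """Yield the indices of the 1 bits in x (mutant loci), lowest first."""
    while x:
        low = x & -x
        yield low.bit_length() - 1
        x ^= low


def fitness_variable(hap1: int, hap2: int,
                     s_values: list[float], h_values: list[float]) -> float:
    """Product of (1 - h_l s_l) over heterozygous loci and (1 - s_m) over homozygous loci."""
    w = 1.0
    for locus in set_bits(hap1 ^ hap2):
        w *= 1.0 - h_values[locus] * s_values[locus]
    for locus in set_bits(hap1 & hap2):
        w *= 1.0 - s_values[locus]
    return w


def hypothetical_h_of_s(s: float, k: float = 10.0) -> float:
    """Placeholder inverse h-s relationship, NOT the one used in this study: h = exp(-k s) / 2."""
    return 0.5 * math.exp(-k * s)


if __name__ == "__main__":
    rng = random.Random(0)
    s_vals, h_vals = draw_locus_effects(rng, 5120, 0.03, hypothetical_h_of_s)
    print(fitness_variable(0b1011, 0b0110, s_vals, h_vals))
```

Because effects are drawn once and then held fixed, the same lists can be reused for every individual and generation of a run.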

Results

Estimation with different population sizes (Table 1)

Table 1 Estimation of mutation parameters in finite populations with different population sizes

Under constant mutation effects, estimation improved (less bias and smaller sampling variance) as population size increased. With variable mutation effects, however, the sampling variance decreased while the estimation bias increased as population size increased. For example, with U=1.0, h̄=0.36, and s̄=0.03 and population sizes ranging from 400 to 3200, under constant mutation effects Û is upwardly biased and ranges from 1.19 (±0.09) to 1.04 (±0.02) in outcrossing populations and from 1.50 (±1.18) to 1.23 (±0.06) in highly selfing populations. (Throughout, numbers in parentheses give one standard deviation, SD, of the point estimates.) With variable mutation effects, Û is downwardly biased and ranges from 0.67 (±0.07) to 0.60 (±0.03) in outcrossing populations and from 0.82 (±0.26) to 0.65 (±0.05) in highly selfing populations. Although the estimates differ somewhat with population size, the method yields relatively reliable results on the magnitude of the mutation parameters for populations of 400 or more individuals.

The larger sampling variances in smaller populations can be explained qualitatively by comparing the mean and variance of the mean fitness of populations of different sizes over generations. Figure 1a shows little difference between the means of the mean fitness for the two population sizes (400 and 1600 individuals). However, the variance of the mean fitness differs considerably between the two sizes (Figure 1b), being higher in the smaller populations (400 individuals). Figure 1a also indicates that, even when starting from a population free of DGM, finite populations reach a quasi-equilibrium, at which mean fitness is roughly steady, in about 300 generations. Because the estimates differ little between populations of 1600 and 3200 individuals (Table 1) and larger populations require much more computation time, we used a population size of 1600 in later simulations unless otherwise specified. Figure 1 is for outcrossing populations with constant effects; our data (not shown) for the other situations (selfing populations with constant mutation effects, and selfing and outcrossing populations with variable mutation effects) showed a similar pattern.

Figure 1

Comparison of the mean and variance of the mean fitness of outcrossing populations of different sizes. The thick line is for populations of 400 individuals and the thin line for populations of 1600 individuals. The dashed line in plot a is the expected mean fitness (0.368) of an infinite population at M–S equilibrium. Simulations were run with constant mutational effects and U=1.0, h=0.36, s=0.03, and θ=0.5. In Figures 1, 2 and 3, the x-axis is the number of generations since the start of the simulation from a population free of DGM; the y-axis is the mean or variance of the mean fitness of 10 simulated populations with the same parameter set.

Populations of 100–200 individuals were also simulated, and the corresponding estimates are given in Appendix A1. The results indicate that Deng–Lynch estimates of DGM parameters for nearly neutral mutations are not as good as those for clearly deleterious mutations; in populations this small, mutations of small effect behave as nearly neutral because drift, rather than selection, largely governs their frequencies. This is somewhat expected, since the method is designed for DGM, not for nearly neutral mutations. In addition, estimates under variable mutation effects are relatively better than those under constant effects. Therefore, we do not recommend using the Deng–Lynch method when population sizes are very small (much less than 400 individuals) and mutations have very small effects.

Estimation under different recombination rates θ (Table 2)

Table 2 Estimation of mutation parameters in finite populations with different recombination rates (θ)

θ has relatively little effect on the estimation, partly because the mean and variance of the mean fitness are stable under different θ (Figure 2). With U=1.0, h̄=0.36, and s̄=0.03, for outcrossing populations with constant effects, Û ranges from 1.05 (±0.03) to 1.07 (±0.01) as θ ranges from 0.001 to 0.4. These results are similar to the Û obtained for populations with unlinked loci (Û±SD=1.06±0.02, Table 1).

Figure 2

Comparison of the mean and variance of the mean fitness of outcrossing populations with different recombination rates θ. Data for the different θ are indicated. Simulations were run under constant mutation effects with U=1.0, h=0.36, and s=0.03, and a population size of 1600. Note that panel b contains two subfigures with a common y-axis: the left one is for θ=0.001, 0.005, and 0.01, and the right one for θ=0.05, 0.2, and 0.5.

Estimation with different values of DGM parameters (Tables 3, 4 and 5)

Table 3 Estimation of mutation parameters in finite populations with different mutation rates
Table 4 Estimation of mutation parameters in finite populations under different dominance coefficients
Table 5 Estimation of mutation parameters in finite populations under different selection coefficients

Changes in the values of the DGM parameters have only minor effects on estimation. With h̄=0.36, s̄=0.03, θ=0.1, and constant mutation effects, as U changes from 0.5 to 1.5 in outcrossing populations, Û is quite accurate with little bias, and ĥ and ŝ are almost unchanged, at about 0.368 (±0.001) and 0.028 (±0.001), respectively (Table 3). In highly selfing populations, Û is upwardly biased, whereas ĥ and ŝ are relatively stable, at 0.379–0.395 and 0.023–0.027, respectively. With variable mutation effects, as U ranges from 0.5 to 1.5, Û is downwardly biased, ranging from 0.30 to 0.97 in outcrossing populations and from 0.33 to 1.01 in selfing populations; s̄̂ is upwardly biased, ranging from 1.47s̄ to 1.83s̄, and h̄̂ is downwardly biased, ranging from 0.70h̄ to 0.84h̄ (Table 3).

With variable mutational effects, h varies across loci as a function of s and cannot be set independently; hence, we tested the effect of changing h on estimation only under constant mutation effects (Table 4). With U=1.0 and s=0.03, in outcrossing populations Û is 1.24 (±0.03) and 1.07 (±0.02) for h=0.2 and 0.36, respectively (Table 4). ĥ is only slightly larger than the true value and ŝ only slightly smaller than the true value. Hence, higher h may lead to more accurate estimates of U in finite populations but does not affect the estimation of h and s much. For highly selfing populations, the results are similar, except that ŝ is larger than the true value when h=0.2 (Table 4).

With U=1.0 and constant mutational effects in outcrossing populations (Table 5), as s increases from 0.01 to 0.05, Û changes from 1.23 (±0.03) to 1.02 (±0.03), ĥ varies between 0.384 (±0.001) and 0.365 (±0.001), and ŝ is nearly unbiased. With variable mutational effects, as s̄ increases from 0.01 to 0.05, Û increases and becomes less biased, h̄̂ changes from 1.10h̄ to 0.51h̄, and s̄̂ is about 1.70s̄–1.87s̄. In highly selfing populations with U=1.0 and s increasing from 0.01 to 0.05, under constant mutational effects Û ranges from 1.27U to 1.41U, ĥ from 1.04h to 1.08h, and ŝ is unbiased or slightly downwardly biased; under variable mutation effects, Û is about 0.67U–0.77U, h̄̂ ranges from 1.15h̄ to 0.68h̄, and s̄̂ is about 1.42s̄–1.60s̄ (Table 5).

Discussion

In this paper, we studied the estimation of DGM parameters by the Deng–Lynch method in finite populations with linkage. The results demonstrate that the estimation is fairly robust in finite populations of 400 individuals or more. With constant mutational effects, Û and ĥ in outcrossing populations are unbiased or slightly upwardly biased, and ŝ is unbiased in most cases; in highly selfing populations, Û and ĥ are upwardly biased (for the parameter space simulated, Û is generally no more than 1.5U and ĥ less than 1.1h), and ŝ is unbiased or slightly downwardly biased. With variable mutational effects, Û is downwardly biased and s̄̂ upwardly biased in both outcrossing and highly selfing populations, with Û ranging from 0.56U to 0.72U and s̄̂ from 1.4s̄ to 1.8s̄. In general, estimation in outcrossing populations is better than in highly selfing populations.

The pattern of bias in this study differs from that reported by Bataillon and Kirkpatrick (2000), who found that U is always underestimated in finite populations because inbreeding depression is smaller than in infinite populations with the same DGM parameters. The estimation method used by Bataillon and Kirkpatrick (2000) is not clearly stated; most likely it was the Morton–Charlesworth method, which uses only the change in mean fitness due to inbreeding or outcrossing. The Deng–Lynch method investigated here uses both the mean and the genetic variance of fitness across inbred and outcrossed generations. As is evident from equations (1), (2) and (3), there is no simple relationship between the DGM parameters and inbreeding depression as reflected in mean fitness. Hence, it is not surprising that the bias patterns in this study and in Bataillon and Kirkpatrick (2000) differ.

Our results demonstrate that population size, recombination rate, and the values of the DGM parameters usually have some (although relatively minor) effects on estimation, and that these effects are smaller in outcrossing than in selfing populations. This is partly because the Deng–Lynch method uses the mean and variance of fitness to obtain the estimates, and the above factors (except U) have relatively weak effects on the mean and variance of fitness. The results of Charlesworth et al (1992, 1993) and our own results (Figure 3) support this point. For example, in Figure 3 the six curves for different recombination rates θ are difficult to distinguish from one another and, after about 300 generations, fall within a small region around the dashed lines. This indicates that populations with linked loci have dynamics of the mean and variance of fitness similar to those of an infinite population with unlinked loci in linkage equilibrium, thus leading to similar estimates of the mutation parameters.

Figure 3

Effects of recombination rate θ on the mean and variance of fitness. Simulations were run under constant mutation effects with U=1, h=0.36, and s=0.03, and a population size of 1600. The recombination rates of the curves in plots a and b are 0.001, 0.01, 0.05, 0.2, 0.3, and 0.4. The dashed lines are the expected mean and variance of fitness (0.368 and 1.47 × 10⁻³ in plots a and b, respectively) for an infinite population with unlinked loci at mutation–selection balance under the same mutation parameters.
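The expected values marked by the dashed lines can be checked with a short calculation. At M–S balance with multiplicative fitness, the mean fitness of an infinite outcrossing population is approximately e^(−U), and a first-order approximation of our own (assuming rare mutant alleles at unlinked loci in linkage equilibrium) gives ln[1 + σ²/W̄²] ≈ Uhs. The hypothetical helper below applies these two relations; for U = 1, h = 0.36, and s = 0.03 it reproduces the values in the caption.

```python
"""Expected equilibrium mean and genetic variance of fitness, infinite outcrossing population (sketch)."""
import math


def equilibrium_mean_and_variance(u: float, h: float, s: float) -> tuple[float, float]:
    """Mean W = exp(-U); variance from ln(1 + V / W**2) = U h s.

    Approximations assumed: unlinked loci in linkage equilibrium, multiplicative
    fitness, rare mutant alleles, mutation-selection balance.
    """
    mean = math.exp(-u)
    variance = mean ** 2 * math.expm1(u * h * s)
    return mean, variance


if __name__ == "__main__":
    mean, var = equilibrium_mean_and_variance(1.0, 0.36, 0.03)
    print(f"{mean:.3f} {var:.2e}")  # 0.368 1.47e-03
```

Only U enters the equilibrium mean, which is consistent with the observation in the Discussion that factors other than U have relatively weak effects on mean fitness.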

In our study, a selfing rate of 0.95 was used instead of complete selfing (selfing rate 1.0). According to previous studies (Charlesworth et al, 1993) and our own simulations, when finite populations are completely selfing under certain parameter ranges (such as those in our simulations), their mean fitness declines to biologically unrealistic values. This phenomenon is caused by Muller's ratchet, as explained by Charlesworth et al (1993). To avoid spending much time simulating completely selfing populations whose very low (even unrealistic) mean fitness would render them inviable, we used highly selfing populations instead. Our previous study (Li and Deng, 2000) showed that estimates of DGM parameters in completely and highly selfing large populations are similar.

In outcrossing populations, the effects of lethal and semilethal mutations on estimation by the Deng–Lynch method were investigated by Deng and Lynch (1996). They showed that, in practice, when inviable selfed progeny (largely resulting from the exposure of lethal and semilethal mutations) are dropped from the fitness assay, the Deng–Lynch method can recover DGM parameters accurately.

Natural populations usually do not completely meet the idealized assumptions underlying the estimation methods of Morton et al (1956), Charlesworth et al (1990a), and Deng and Lynch (1996), although some assumptions (such as M–S balance) have been critically reviewed and partially supported in some populations (Charlesworth and Charlesworth, 1999; Charlesworth and Hughes, 2000). It is intuitive that estimating DGM from data on natural populations may yield biased estimates (Drake et al, 1998; Bataillon and Kirkpatrick, 2000). The critical questions, however, concern the direction and magnitude of the bias. More importantly, given that none of the current estimation approaches and experimental designs can yield unbiased estimates under biologically plausible situations, is a given method useful when its ideal assumptions do not hold? Our results show that the Deng–Lynch method should be useful, in that the estimation bias is usually quite small in finite populations of 400 or more individuals with LD. This bias is modest in light of the fact that even the order of magnitude of U is still unknown and current estimates of DGM parameters differ considerably among studies. Therefore, the investigation conducted here should be useful in establishing the feasibility and robustness of estimating DGM parameters in finite natural populations with the Deng–Lynch method. This work, together with our earlier investigations of epistasis (Deng and Lynch, 1996, 1997), overdominance (Li et al, 1999), partial selfing or outcrossing, and nonequilibrium large populations (Li and Deng, 2000), supports the usefulness of the Deng–Lynch method for characterizing DGM. Employing multiple, different approaches should help us obtain solid knowledge about DGM and their parameters by resolving differences and consolidating consistencies.