To the Editor: Cassa et al. (ref. 1) have recently presented an interesting analysis of the selection coefficients against heterozygous carriers of protein-truncating variants (PTVs) in several human populations, concluding that the mean selection coefficient against such a mutation when heterozygous is approximately 0.05, with a wide distribution around the mean (p. 809 and Fig. 1 in ref. 1). With random mating in a large population, selection against the heterozygous carriers of strongly deleterious mutations is the predominant selective force, because homozygotes are very rare. The equilibrium frequency of mutant alleles at a locus, q*, is then equal to u/shet, where u is the mutation rate to deleterious alleles, and shet is the decrease in fitness experienced by heterozygous carriers, measured relative to the fitness of normal individuals (ref. 2).
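The equilibrium can be recovered from a one-line balance argument, sketched here under the same assumptions (a rare allele, so that homozygotes and back mutation are ignored). The symbol Δq, the per-generation change in allele frequency, is introduced only for this illustration:

$$\Delta q \approx \underbrace{u}_{\text{new mutations}} - \underbrace{s_{\mathrm{het}}\,q}_{\text{loss by selection}} = 0 \quad\Longrightarrow\quad q^{*} = \frac{u}{s_{\mathrm{het}}}$$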

For this purpose, it is reasonable to assume that, for a given gene, u is the net rate of mutation to all possible PTVs that can be generated for the gene in question and that shet is the same for all the PTV mutations in the gene. If the mutations are sufficiently severe in their fitness effects that they are destined for rapid elimination from the population, the mean frequency of mutant alleles over the probability distribution of q generated by random genetic drift is approximately equal to q* (ref. 3), thus apparently justifying the assumption of mutation-selection equilibrium. In their analysis, Cassa et al. assumed that the observed number of copies of a mutant allele for a given gene in a set of N alleles sampled from a population is drawn from a Poisson distribution with mean Nq*. For this assumption to be valid, the fluctuations in q around q* produced by drift must be negligible. Cassa et al. justified this assumption through a heuristic argument (first section in Methods in ref. 1).

We believe that this assumption is questionable, as can be seen by considering the probability density of q, ϕ(q), at the stationary state among mutation, selection and drift in a randomly mating population with effective size Ne, first studied by Wright (ref. 4). Formally, the existence of the stationary state requires a small amount of back mutation from mutant to wild type, but this has a trivial effect and can be ignored. Nei (ref. 3) has shown that ϕ(q) for a strongly selected mutation with a heterozygous selection coefficient shet is well approximated by a gamma distribution, with a mean of q* and shape parameter θ = 4Neu. Poisson sampling from a gamma distribution generates a negative binomial distribution (ref. 5) for the number of copies i of a mutant allele in a sample of N alleles:

$$P(i) = \binom{i + \theta - 1}{i}\left(\frac{z}{z + 1}\right)^{\theta}\left(\frac{1}{z + 1}\right)^{i}$$

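where z = (4Ne shet)/N. Because θ is generally not an integer, the binomial coefficient is evaluated with gamma functions, as Γ(i + θ)/[i! Γ(θ)].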
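The negative binomial form follows from a standard gamma-Poisson mixture. As a sketch, write the gamma density with shape θ and rate β = 4Ne shet (the rate implied by a mean of q* = θ/β; the symbol β is introduced here only for illustration), and let the count i be Poisson with mean Nq. Integrating over q, and treating the upper limit as infinite because the density is concentrated near zero, gives:

$$\begin{aligned}
P(i) &= \int_0^\infty \frac{e^{-Nq}(Nq)^{i}}{i!}\,\frac{\beta^{\theta} q^{\theta-1}e^{-\beta q}}{\Gamma(\theta)}\,dq \\
&= \frac{N^{i}\beta^{\theta}}{i!\,\Gamma(\theta)}\cdot\frac{\Gamma(i+\theta)}{(N+\beta)^{i+\theta}} \\
&= \frac{\Gamma(i+\theta)}{i!\,\Gamma(\theta)}\left(\frac{z}{z+1}\right)^{\theta}\left(\frac{1}{z+1}\right)^{i},\qquad z=\frac{\beta}{N}.
\end{aligned}$$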

The mean and variance of the distribution are Nq* and θ(1 + z)/z², respectively. The ratio of the coefficient of variation of this distribution to that for a Poisson distribution with the same mean is √(1 + 1/z). It follows that, if z << 1, there is a much wider spread in the sampling distribution of the observed numbers of copies of mutant alleles across different genes than was assumed by Cassa et al. For example, with N = 60,000, shet = 0.05 and Ne = 10,000 (a frequently used estimate of the species effective population size of humans; ref. 6), z = 1/30 and the ratio is √31 ≈ 5.57.
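These moments and the quoted ratio can be checked numerically. The following is a minimal sketch, not the method of any cited study: the function names are hypothetical, and the mutation rate u = 10⁻⁶ per gene per generation is an assumed value chosen only so that θ can be computed (the ratio itself does not depend on u).

```python
import math


def neg_binomial_pmf(i, theta, z):
    """P(i) from the text, using log-gamma so that theta may be non-integer."""
    log_coeff = math.lgamma(i + theta) - math.lgamma(i + 1) - math.lgamma(theta)
    log_p = log_coeff + theta * math.log(z / (z + 1)) - i * math.log(z + 1)
    return math.exp(log_p)


def cv_ratio(n_sample, s_het, n_e):
    """Ratio of negative binomial to Poisson coefficient of variation."""
    z = 4 * n_e * s_het / n_sample
    return math.sqrt(1 + 1 / z)


# Values quoted in the text.
N, S_HET = 60_000, 0.05
ratio_small_ne = cv_ratio(N, S_HET, 10_000)
ratio_large_ne = cv_ratio(N, S_HET, 100_000)

# Moment check by direct summation of the pmf.
U = 1e-6       # ASSUMPTION: illustrative mutation rate, not from the text
NE = 10_000
theta = 4 * NE * U
z = 4 * NE * S_HET / N

# The tail decays geometrically, so 5,000 terms are ample here.
probs = [neg_binomial_pmf(i, theta, z) for i in range(5000)]
total = sum(probs)
mean = sum(i * p for i, p in enumerate(probs))
variance = sum((i - mean) ** 2 * p for i, p in enumerate(probs))

print(f"sum of P(i)     = {total:.6f}")
print(f"mean            = {mean:.4f}  (N q* = {N * U / S_HET:.4f})")
print(f"variance        = {variance:.4f}  (theta(1+z)/z^2 = {theta * (1 + z) / z**2:.4f})")
print(f"CV ratio, Ne=1e4: {ratio_small_ne:.2f}")
print(f"CV ratio, Ne=1e5: {ratio_large_ne:.2f}")
```

Under the assumed u, the mean count per gene is close to 1, yet the variance is about 31 times the mean, which is the overdispersion that a Poisson model ignores.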

This result implies that there may be a substantial upward bias in the spread of the distribution of shet values estimated by the method of Cassa et al. We recognize that it is probably not appropriate to use a value of Ne in the range 10,000–20,000 for humans, which is obtained from putatively neutral DNA sequence diversity and reflects the harmonic mean of the species effective population size over the past several hundred thousand years (ref. 6). Mutations destined for loss persist in a population for only a few tens of generations at most (ref. 7), and so the Ne relevant for PTVs is likely to reflect the much larger population sizes characteristic of the last few hundred years, thus decreasing the size of the bias. For example, with Ne = 100,000 and N and shet as before, z = 1/3 and the ratio of the coefficients of variation becomes 2.
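The dependence on Ne can also be read in the other direction: solving √(1 + 1/z) = r for Ne gives the effective size at which the overdispersion falls to a chosen ratio r. The sketch below assumes the same N and shet as the example in the text; the function name and the threshold r = 1.1 are hypothetical choices for illustration only.

```python
def ne_for_cv_ratio(n_sample, s_het, target_ratio):
    """Effective size Ne at which sqrt(1 + 1/z) equals target_ratio,
    where z = 4 * Ne * s_het / n_sample."""
    if target_ratio <= 1:
        raise ValueError("the ratio exceeds 1 for any finite Ne")
    z = 1.0 / (target_ratio ** 2 - 1.0)
    return z * n_sample / (4.0 * s_het)


# Ratio of 2, as in the text's second example.
ne_ratio_2 = ne_for_cv_ratio(60_000, 0.05, 2.0)

# ASSUMPTION: 1.1 is an arbitrary "nearly Poisson" threshold.
ne_ratio_1_1 = ne_for_cv_ratio(60_000, 0.05, 1.1)

print(f"Ne for ratio 2.0: {ne_ratio_2:,.0f}")
print(f"Ne for ratio 1.1: {ne_ratio_1_1:,.0f}")
```

With these inputs, a ratio of 2 corresponds to Ne of about 100,000, whereas bringing the ratio down to 1.1 would require Ne of roughly 1.4 million.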

In addition, the elimination of strongly deleterious alleles is affected by population subdivision, and their fate is then strongly determined by the local effective population size, as shown by the classic studies of Dobzhansky and Wright (ref. 8) on the allelism of lethal mutations in populations of Drosophila pseudoobscura. Even for the simple case of an island model with an infinite number of demes, the expression for ϕ(q) becomes more complex than a gamma distribution, and the mean allele frequency can depart substantially from q* (ref. 9). A detailed analysis of the effects of drift on the frequency distribution of the numbers of deleterious mutations with the demographies characteristic of the populations used in their study would be needed to determine whether the conclusions reached by Cassa et al. concerning the width of the distribution of the heterozygous selection coefficient are valid. Furthermore, their estimates of the selection coefficients for individual genes were based on the inferred distribution of shet, thus also prompting questions about their accuracy.

In response to these comments, the authors (ref. 10) have conducted an analysis that includes a model of recent population-size change for Europeans. The results appear to substantiate their previous conclusions, notwithstanding the approximations made in their original study.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.