Earlier this year, we discovered that an extreme age estimate for a Y-chromosomal haplotype (237 000–581 000 years ago) by Mendez et al1 was based on analytical choices that consistently inflated its value.2

As stated in our original criticism,2 estimating divergence time is, in principle, no different from estimating the time it takes two cars traveling in opposite directions at known speeds to reach a certain distance from each other. The inferred time will be overestimated if the distance between the two cars is overestimated, or if the speed of either car is underestimated. Similarly, a divergence time estimate will exceed the actual divergence time if the genetic distances between sequences are overestimated and/or the rates of substitution are underestimated.

Let us consider a very simple estimation model for the time of divergence,

$$t = \frac{d}{r}$$

where t is the divergence time, d is the genetic distance, and r is the substitution rate per unit time. To overestimate t, one needs to overestimate d and/or underestimate r. d is usually estimated by dividing the number of differences between two sequences, n, by the length of the aligned sequences, l, and then correcting for multiple hits and the like. In its simplest form, before any such correction,

$$d = \frac{n}{l}$$

d can, thus, be overestimated by either overestimating n or underestimating l. The time unit for r is years. However, r is often derived from data on the number of substitutions per generation and converted to a per-year rate by dividing by the generation time, tg. r can, thus, be underestimated by assuming that tg is longer than it really is.
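
The direction of each effect can be made concrete with a short sketch. The Python below is illustrative only: it assumes the simple model t = d/r with the uncorrected distance d = n/l and r = μ/tg, and the helper names `per_year_rate` and `divergence_time`, the per-generation rate `MU`, and the generation times of 25 and 40 years are hypothetical values chosen to show the mechanics, not figures from Mendez et al or from our own analysis.

```python
def per_year_rate(rate_per_generation: float, generation_time_years: float) -> float:
    """Convert a per-generation substitution rate to a per-year rate (r = mu / tg)."""
    return rate_per_generation / generation_time_years


def divergence_time(n: int, l: int, r: float) -> float:
    """Simple model t = d / r, with the uncorrected distance d = n / l."""
    d = n / l  # proportion of differing sites; no multiple-hit correction applied
    return d / r


# Hypothetical per-generation rate, used only to show the direction of each effect.
MU = 2.5e-8

baseline = divergence_time(45, 180_000, per_year_rate(MU, 25))
longer_tg = divergence_time(45, 180_000, per_year_rate(MU, 40))  # larger tg -> smaller r -> larger t
shorter_l = divergence_time(45, 150_000, per_year_rate(MU, 25))  # smaller l -> larger d -> larger t
more_n = divergence_time(50, 180_000, per_year_rate(MU, 25))     # larger n -> larger d -> larger t

for label, t in [("baseline", baseline), ("tg = 40", longer_tg),
                 ("l underestimated", shorter_l), ("n overestimated", more_n)]:
    print(f"{label:>18}: {t:,.0f} years")
```

In this model t = n·tg/(l·μ), so t scales linearly with tg: raising tg from 25 to 40 years multiplies the estimate by 40/25 = 1.6, whatever the other inputs are.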

In selecting values for d, r, n, l, and tg, Mendez et al1 consistently and without exception chose values that led to overestimating the time of divergence.

In Elhaik et al,2 we discussed many such choices. In the following, we focus on two choices left unexplained by Mendez et al.3 The first concerns the substitution rate used to calculate the TMRCA. Using an estimate of the Y-chromosome substitution rate (1 × 10⁻⁹ substitutions per nucleotide per year),4 we calculate divergence times of (43/240 000)/10⁻⁹ ≈ 179 000 years and (45/180 000)/10⁻⁹ = 250 000 years, for an average of 214 500 years, very similar to the TMRCA obtained using a likelihood-based method: 209 500 (95% CI: 168 000–257 400) years.2 Not surprisingly, by employing an autosomally derived value of 0.617 × 10⁻⁹ as the mutation rate constant, which is 1.6 times smaller, Mendez et al1 obtained divergence times 1.6 times larger than ours, 290 000–404 000 years, with an average of 347 000 years; because t is inversely proportional to r, a 1.6-fold smaller rate necessarily yields a 1.6-fold larger t. More appropriate choices would have resulted in a much lower estimate. Mendez et al's1 other choices, such as an unprecedented human generation time of 40 years, resulted in overestimating the time of divergence by 20–130%.
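
The arithmetic in the preceding paragraph can be reproduced in a few lines. This sketch uses only the mutation counts, segment lengths, and rates quoted above, with the same simple model t = (n/l)/r; the function name `tmrca` is our own hypothetical label, and no multiple-hit correction is applied.

```python
def tmrca(n: int, l: int, r: float) -> float:
    """Simple divergence-time estimate t = (n / l) / r, in years (no multiple-hit correction)."""
    return (n / l) / r


# Mutation counts and segment lengths quoted in the text: (n, l).
segments = {"A00": (43, 240_000), "A0": (45, 180_000)}

Y_RATE = 1e-9              # Y-chromosome rate, substitutions per nucleotide per year
AUTOSOMAL_RATE = 0.617e-9  # autosomally derived rate used by Mendez et al

y_estimates = [tmrca(n, l, Y_RATE) for n, l in segments.values()]
auto_estimates = [tmrca(n, l, AUTOSOMAL_RATE) for n, l in segments.values()]

print("Y-chromosome rate:", [f"{t:,.0f}" for t in y_estimates])
print("  average:", f"{sum(y_estimates) / len(y_estimates):,.0f}")
print("autosomal rate:   ", [f"{t:,.0f}" for t in auto_estimates])
# Every estimate is scaled by the same factor, Y_RATE / AUTOSOMAL_RATE.
print("inflation factor:", f"{Y_RATE / AUTOSOMAL_RATE:.2f}")
```

With the Y-chromosome rate the two estimates come out at about 179 000 and 250 000 years; switching to 0.617 × 10⁻⁹ multiplies each of them by 1/0.617 ≈ 1.62.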

The second choice concerns the irregular and questionable comparison of mutation counts obtained from sequences of unequal lengths. Mendez et al3 compared 240 000 bases of the A00 Y chromosome, which contained 43 mutations, with 180 000 bases of the A0 Y chromosome, which contained 45 mutations. In other words, they used data from two segments, one of which was about 25% shorter than the other. In response to Mendez et al's3 allegation of a 'misunderstanding of population genetic theory,' we challenge the authors to cite a single example in the evolutionary literature in which the branches of a phylogenetic tree were estimated from pairwise distances based on alignments of different lengths. We note that textbooks in molecular evolution (for example, Graur and Li5) specifically caution against such practices.
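
A quick arithmetic check of the segment lengths quoted above, using only the numbers reported by Mendez et al;3 the variable names are hypothetical. It confirms the roughly 25% length difference and shows that, although the raw counts are nearly equal (45 vs 43, a ratio of about 1.05), the uncorrected per-site distances differ by a factor of about 1.4, so the unequal alignment lengths enter directly into any comparison of the two branches.

```python
# Segment lengths and mutation counts quoted in the text.
A00_N, A00_L = 43, 240_000
A0_N, A0_L = 45, 180_000

# Relative shortfall of the A0 segment compared with the A00 segment.
shortfall = 1 - A0_L / A00_L

# Uncorrected per-site distances, d = n / l, for each segment.
d_a00 = A00_N / A00_L
d_a0 = A0_N / A0_L

print(f"A0 segment is {shortfall:.0%} shorter than A00")
print(f"raw counts differ by a factor of {A0_N / A00_N:.3f}")
print(f"per-site distances differ by a factor of {d_a0 / d_a00:.3f}")
```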