Evolution of the germline mutation rate across vertebrates

The germline mutation rate determines the pace of genome evolution and is an evolving parameter itself [1]. However, little is known about what determines its evolution, as most studies of mutation rates have focused on single species with different methodologies [2]. Here we quantify germline mutation rates across vertebrates by sequencing and comparing the high-coverage genomes of 151 parent–offspring trios from 68 species of mammals, fishes, birds and reptiles. We show that the per-generation mutation rate varies among species by a factor of 40, with mutation rates being higher for males than for females in mammals and birds, but not in reptiles and fishes. The generation time, age at maturity and species-level fecundity are the key life-history traits affecting this variation among species. Furthermore, species with higher long-term effective population sizes tend to have lower mutation rates per generation, providing support for the drift barrier hypothesis [3]. The exceptionally high yearly mutation rates of domesticated animals, which have been continually selected on fecundity traits including shorter generation times, further support the importance of generation time in the evolution of mutation rates. Overall, our comparative analysis of pedigree-based mutation rates provides ecological insights into the evolution of the mutation rate in vertebrates.


Model of the effect of parental age
To account for the effect of parental age, we create a Poisson model describing the effect of parental age on the number of mutations. For most species the age of the father matters most, but for some the age of the mother is more important. To take this into account, we use a weighted average of the parental ages, weighted by our estimate of the fraction of mutations originating from the father in that species:
$$ \mathrm{age\_mix}_i = p_{s_i} \cdot \mathrm{age\_father}_i + (1 - p_{s_i}) \cdot \mathrm{age\_mother}_i $$
where p_s is the paternal fraction observed using read-backed phasing in species s, s_i is the species of trio i, and age_father_i and age_mother_i are the ages of the father and mother of the i'th trio when the child is born. Model comparisons show that using this age_mix variable rather than age_father alone results in a slightly better model fit.
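As a small illustration of the weighted parental age described above, the following Python sketch computes age_mix for a single trio. The function name and the example values are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
def age_mix(paternal_fraction, age_father, age_mother):
    """Average the parental ages, weighting the father by the paternal fraction."""
    if not 0.0 <= paternal_fraction <= 1.0:
        raise ValueError("paternal_fraction must lie between 0 and 1")
    return paternal_fraction * age_father + (1.0 - paternal_fraction) * age_mother


# Hypothetical trio (values invented for illustration): 80% of phased
# mutations are paternal, father aged 10 and mother aged 5 at birth.
example_age_mix = age_mix(paternal_fraction=0.8, age_father=10.0, age_mother=5.0)
print(example_age_mix)
```

With a paternal fraction of 1 the value reduces to the father's age, which is the simpler age_father variable the text compares against.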
Where n_i is the number of mutations in the i'th trio and the other trio-specific term is the denominator of the i'th trio. Using an exponential distribution as the prior means that both the intercept and the slope are required to be positive. That is a fair assumption, since it is impossible to remove mutations that have already occurred.
            2.510e-10  2.20e-11  2.080e-10  2.960e-10
The model fits most of the trios well.
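To make the structure of such a model concrete, here is a minimal Python sketch of an unnormalised log posterior for a Poisson model with an intercept a, a slope b and exponential priors. It assumes that the expected count is (a + b * age_mix_i) multiplied by the denominator of the trio; that functional form, the prior means and the example data are assumptions made for illustration, not details confirmed by the text.

```python
import math


def log_posterior(a, b, trios, prior_mean_a, prior_mean_b):
    """Unnormalised log posterior, assuming n_i ~ Poisson((a + b * age_mix_i) * denominator_i).

    trios is a list of (n_mutations, age_mix, denominator) tuples.
    """
    if a < 0.0 or b < 0.0:
        return -math.inf  # exponential priors put no mass on negative values
    # Log density of an exponential prior with the given mean: -log(mean) - x / mean.
    total = -math.log(prior_mean_a) - a / prior_mean_a
    total += -math.log(prior_mean_b) - b / prior_mean_b
    for n_mutations, age, denominator in trios:
        lam = (a + b * age) * denominator
        if lam <= 0.0:
            return -math.inf
        # Poisson log probability mass of the observed count.
        total += n_mutations * math.log(lam) - lam - math.lgamma(n_mutations + 1)
    return total


# Hypothetical data: (mutation count, age_mix in years, denominator).
trios = [(30, 10.0, 4e9), (22, 2.0, 4e9)]
near = log_posterior(5e-9, 2.5e-10, trios, prior_mean_a=1e-8, prior_mean_b=1e-9)
far = log_posterior(5e-8, 2.5e-10, trios, prior_mean_a=1e-8, prior_mean_b=1e-9)
print(near > far)
```

A sampler or optimiser would explore this function over a and b; the early return for negative values is how the positivity constraint discussed in the text shows up in code.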

Differences between species
To model deviations from the model and to see which species tend to have a higher (or lower) rate than expected given this model, we created a model with a species-specific scaling factor x for each species.
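The idea of a species-specific scaling factor can be illustrated with a much simpler, non-Bayesian stand-in: if a and b are held fixed and the expected count of a trio is multiplied by x, the maximum-likelihood value of x for a species is the ratio of its observed to its expected mutation counts. The sketch below computes that ratio. It is not the fitted model from the text; the species names, the values of a and b, and the assumed form of the expected count are all hypothetical.

```python
from collections import defaultdict


def scaling_factors(trios, a, b):
    """Ratio of observed to expected mutation counts per species.

    trios is a list of (species, n_mutations, age_mix, denominator) tuples.
    The expected count is assumed to be (a + b * age_mix) * denominator.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for species, n_mutations, age, denominator in trios:
        observed[species] += n_mutations
        expected[species] += (a + b * age) * denominator
    return {species: observed[species] / expected[species] for species in observed}


# Hypothetical trios for two invented species.
example_trios = [
    ("species_A", 30, 10.0, 4e9),
    ("species_B", 60, 10.0, 4e9),
    ("species_B", 60, 10.0, 4e9),
]
factors = scaling_factors(example_trios, a=5e-9, b=2.5e-10)
print(factors)
```

A factor near 1 means the species behaves as the shared model predicts; values above or below 1 indicate a higher or lower rate than expected.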
The table below shows the fitted parameters for this model. For most species the 95% credible intervals (highest posterior density intervals) overlap 1, but for a few species the lower bound is higher than 1, and for a few others the upper bound is lower than 1.
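For readers unfamiliar with highest posterior density intervals, the following sketch shows one common way to compute one from posterior samples (the narrowest window containing the requested share of the sorted samples) and how an interval for x would be read against 1. The function names and the sample values are hypothetical; the text does not say which software computed the intervals.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("samples must not be empty")
    # Number of consecutive sorted samples the interval has to cover.
    window = min(n, max(1, math.ceil(mass * n)))
    best_start = min(range(n - window + 1),
                     key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + window - 1]


def classify(lower, upper):
    """Read an interval for the scaling factor x against the value 1."""
    if lower > 1.0:
        return "higher than expected"
    if upper < 1.0:
        return "lower than expected"
    return "consistent with the shared model"


# Hypothetical posterior samples of x for one species.
samples = [0.9, 1.0, 1.05, 1.1, 1.2, 1.3, 2.5]
lower, upper = hpd_interval(samples, mass=0.95)
print(lower, upper, classify(lower, upper))
```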

Checking assumptions on parameters
In the model we use exponential priors for both the intercept (a) and the slope (b). This means that we assume that neither the intercept nor the slope can be negative. That is a fair assumption for the slope: since mutations cannot be removed once they have occurred, there cannot be a negative correlation between the number of mutations and generation time. For the intercept, we cannot be as sure that the assumption is valid. One could imagine a scenario in which the parents do not start accumulating mutations in their germ cells until they have reached a certain age, which would correspond to a model with a negative intercept. In these data, however, we see a clearly positive intercept, which we can also show by fitting a model without this assumption. To make a model that does not assume a positive intercept, we have to use a log link function to ensure that λ is non-negative. The estimated parameters show that both a and b are clearly positive.
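The link between a delayed onset of mutation accumulation and a negative intercept can be written out in one line. The symbol t_0, standing for a hypothetical age at which accumulation would begin, is introduced here only for illustration and does not appear in the original model; the relation is meant for parental ages at or above t_0.

```latex
\lambda(\mathrm{age}) \propto b\,(\mathrm{age} - t_0) = -b\,t_0 + b\cdot\mathrm{age}
\quad\Rightarrow\quad a = -b\,t_0 < 0 \quad \text{for } b > 0,\ t_0 > 0 .
```

A positive fitted intercept therefore argues against this delayed-onset scenario, which is the point the paragraph above makes.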

Model selection
The chosen model uses the same intercept for all species. While we do not have sufficient data to learn a separate intercept for each species, it is possible that we could improve the model fit by having different intercepts for different groups of species. To test this, we fitted a model with a different intercept for each of the four major classes of vertebrates. The results show that we end up with very similar estimates in each group:
            5.381e-09  9.81e-10  3.592e-09  7.429e-09
b           2.520e-10  2.20e-11  2.080e-10  2.960e-10
We can also estimate a model with a different slope for each group. The results show that the estimate in fish is a bit higher than in the other groups, but the credible intervals (highest posterior density intervals) overlap:
            2.340e-10  3.70e-11  1.630e-10  3.080e-10
b[Fish]     3.880e-10  2.30e-10  3.100e-11  9.030e-10
b[Mammal]   2.610e-10  2.50e-11  2.130e-10  3.100e-10
b[Reptile]  2.640e-10  1.99e-10  1.100e-11  7.430e-10
If we compare the model with a single intercept and a single slope to the models with group-specific intercepts or group-specific slopes, we see that the slightly better fit of the latter models does not warrant the added number of parameters.
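The statement that the group-specific slope intervals overlap can be checked directly from the numbers reported above. The short sketch below does so, assuming that the last two values of each row are the lower and upper bounds of the credible interval; that reading of the columns is an assumption, and the first row is kept under a placeholder key because its label did not survive.

```python
from itertools import combinations

# Last two values of each slope row, read as interval bounds (an assumption).
slope_intervals = {
    "unlabelled row": (1.630e-10, 3.080e-10),
    "Fish": (3.100e-11, 9.030e-10),
    "Mammal": (2.130e-10, 3.100e-10),
    "Reptile": (1.100e-11, 7.430e-10),
}


def intervals_overlap(first, second):
    """True if two closed intervals (lower, upper) share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


all_overlap = all(
    intervals_overlap(slope_intervals[x], slope_intervals[y])
    for x, y in combinations(slope_intervals, 2)
)
print(all_overlap)  # prints True for the values listed above
```

Overlapping intervals are only an informal indication that the slopes are similar; the comparison of model fit against the number of parameters described in the text is the actual basis for preferring the simpler model.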