A direct characterization of human mutation based on microsatellites

Journal name: Nature Genetics

Mutations are the raw material of evolution but have been difficult to study directly. We report the largest study of new mutations to date, comprising 2,058 germline changes discovered by analyzing 85,289 Icelanders at 2,477 microsatellites. The paternal-to-maternal mutation rate ratio is 3.3, and the rate in fathers doubles from age 20 to 58, whereas there is no association with age in mothers. Longer microsatellite alleles are more mutagenic and tend to decrease in length, whereas the opposite is seen for shorter alleles. We use these empirical observations to build a model that we apply to individuals for whom we have both genome sequence and microsatellite data, allowing us to estimate key parameters of evolution without calibration to the fossil record. We infer that the sequence mutation rate is 1.4–2.3 × 10⁻⁸ mutations per base pair per generation (90% credible interval) and that human-chimpanzee speciation occurred 3.7–6.6 million years ago.
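The paternal-to-maternal ratio translates directly into the expected share of new mutations transmitted by fathers: for a ratio α, each child receives one paternal and one maternal copy, so the paternal share is α/(1 + α), or about 77% for α = 3.3. A minimal Python sketch of that arithmetic (the function name `paternal_share` is ours, for illustration only):

```python
def paternal_share(ratio: float) -> float:
    """Expected fraction of de novo mutations that arose in the father.

    Each child receives one paternal and one maternal copy, so expected
    counts scale as ratio : 1 when the paternal per-generation rate is
    `ratio` times the maternal rate.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1.0 + ratio)


print(f"{paternal_share(3.3):.3f}")  # 0.767
```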

Figures


    Figure 1: Examples of mutations in a trio and in a family.

    The proband is the individual inheriting a mutation, and all other individuals are named relative to the proband. All alleles are given in repeat units and are shifted so that the ancestral allele has a length of 0. The mutating allele is underlined. (a) Mutation detected using the trio approach. The mutation was confirmed by multiple genotyping of the trio: the father, mother and proband were genotyped 3×, 3× and 4×, respectively. (b) Mutation detected using the family approach. One ancestral allele was verified by its presence in the proband's sibling, and one mutant allele was verified by its presence in the proband's child. Phasing of the alleles at the mutant locus together with other loci on the same chromosome shows that the sibling with the (0, −2) alleles did not inherit the ancestral 0 allele but rather the other 0 allele from the father.
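The trio approach in panel (a) rests on Mendelian inconsistency: a mutation is suspected when the proband's two alleles cannot be split into one paternal and one maternal allele. A minimal Python sketch of that check, assuming genotypes are pairs of allele lengths in repeat units and choosing the single mutation with the smallest step (a parsimony rule picked for illustration, not necessarily the study's criterion). The function `explain_trio` and the example genotypes are hypothetical:

```python
from typing import Optional, Tuple

Genotype = Tuple[int, int]  # two allele lengths, in repeat units


def explain_trio(father: Genotype, mother: Genotype,
                 child: Genotype) -> Optional[Tuple[str, int, int]]:
    """Return None if the child's genotype is Mendelian-consistent.

    Otherwise return (parent, parental_allele, step) for the single-mutation
    explanation with the smallest step, where step is the child allele minus
    the parental allele it is assumed to derive from.
    """
    candidates = []
    # Try both assignments of the child's alleles to father and mother.
    for from_dad, from_mom in ((child[0], child[1]), (child[1], child[0])):
        dad_ok = from_dad in father
        mom_ok = from_mom in mother
        if dad_ok and mom_ok:
            return None  # plain Mendelian transmission, no mutation needed
        if dad_ok:  # the maternal transmission must carry the mutation
            candidates += [("mother", a, from_mom - a) for a in mother]
        if mom_ok:  # the paternal transmission must carry the mutation
            candidates += [("father", a, from_dad - a) for a in father]
    if not candidates:
        raise ValueError("genotype would need more than one mutation")
    # Parsimony: prefer the smallest absolute step (ties keep the first found).
    return min(candidates, key=lambda c: abs(c[2]))


print(explain_trio((0, 1), (2, 3), (1, 4)))  # ('mother', 3, 1)
```

When both parents carry the allele from which the mutant could derive, the trio alone cannot resolve the parent of origin and the tie-break above is arbitrary; information from other relatives and phasing, as in panel (b), is what can settle such cases.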

    Figure 2: Characteristics of the microsatellite mutation process.

    (a) Paternal (blue) and maternal (red) mutation rates. The x axis shows the parental age at childbirth. Data points are grouped into ten bins (vertical bars show one standard error). The paternal rate shows a positive correlation with age (logistic regression of raw data: P = 9.3 × 10⁻⁵; slope = 1.1 × 10⁻⁵ mutations per year), with an estimated doubling of the rate from age 20 to 58. The maternal rate shows no evidence of increasing with age (P = 0.47). (b) Mutation length distributions differ for dinucleotide (top) and tetranucleotide (bottom) microsatellites. Whereas the dinucleotide loci experience multistep mutations in 32% of instances, tetranucleotide loci mutate almost exclusively by a single step of 4 bases. (c) Mutation rate increases with allele length. Dinucleotide loci (blue) have a slope of 1.65 × 10⁻⁵ mutations per repeat unit (P = 1.3 × 10⁻³), and tetranucleotide loci (red) have a slope of 6.73 × 10⁻⁵ mutations per repeat unit (P = 1.8 × 10⁻³). (d) Constraints on allele lengths. When the parental allele is relatively short, mutations tend to increase in length, and, when the parental allele is relatively long, mutations tend to decrease in length. Di- and tetranucleotide loci are shown as blue crosses and red circles, respectively. Probit regression of the combined di- and tetranucleotide data shows highly significant evidence of an effect (P = 2.8 × 10⁻¹⁸).
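Panels (b) to (d) together describe a one-generation mutation step: whether an allele mutates depends on its length, the direction of change is biased toward a central length, and dinucleotide steps are sometimes larger than one repeat. A toy Python sketch combining these three features follows. Only the dinucleotide slope (1.65 × 10⁻⁵ per repeat unit) and the 32% multistep share come from this caption; `base_rate`, `center`, `bias` and the geometric step-size distribution are illustrative assumptions, not the fitted model:

```python
import math
import random


def expansion_probability(length: int, center: float, bias: float) -> float:
    """Probit-style chance that a mutation lengthens the allele.

    Equals 0.5 at `center` and falls below 0.5 for longer alleles,
    mirroring the direction bias in panel (d).
    """
    z = -bias * (length - center)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def mutate_once(length: int, rng: random.Random,
                base_rate: float = 1e-4,   # hypothetical rate at `center`
                slope: float = 1.65e-5,    # dinucleotide slope, panel (c)
                center: float = 20.0,      # hypothetical reference length
                bias: float = 0.1,         # hypothetical probit coefficient
                p_single: float = 0.68) -> int:  # 32% multistep, panel (b)
    """Return the allele length (in repeat units) passed to a child."""
    # Panel (c): mutation rate grows linearly with allele length.
    rate = min(1.0, max(0.0, base_rate + slope * (length - center)))
    if rng.random() >= rate:
        return length  # no mutation in this transmission
    # Panel (d): long alleles tend to contract, short ones to expand.
    up = rng.random() < expansion_probability(length, center, bias)
    # Panel (b): one step with probability p_single, otherwise more steps
    # (a geometric tail, chosen here for simplicity).
    step = 1
    while rng.random() >= p_single:
        step += 1
    return length + step if up else length - step


# Force a mutation every time to inspect the direction bias at 40 repeats.
rng = random.Random(1)
changes = [mutate_once(40, rng, base_rate=1.0, slope=0.0) - 40
           for _ in range(10_000)]
print(sum(c < 0 for c in changes) / len(changes))
```

With these placeholder values a 40-repeat allele contracts with probability 1 − Φ(−2), roughly 0.98, so the printed fraction should sit near that value.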

    Figure 3: Empirical validation of our model with sequence-based estimates of TMRCA.

    Shown in red is the simulation of ASD as a function of TMRCA for the standard random walk (GSMM) model. In blue is the simulation of our model, in which the nonlinearity relative to GSMM is due primarily to the length constraint that we observed empirically in microsatellites. In black is the empirically observed ASD at microsatellites in 23 HapMap individuals as a function of sequence-based estimates of TMRCA, computed as θseq/(2μseq), where θseq is the local sequence diversity surrounding each microsatellite locus and μseq is 1.82 × 10⁻⁸ mutations per base pair per generation (obtained from Table 2). The close match of the empirical curve to our model simulations indicates that our model is consistent with the data and motivates the analysis in which we use the sequence substitution rate in small windows around the microsatellites to make inferences about evolutionary parameters such as the sequence mutation rate.
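The sequence-based time scale here is TMRCA = θseq/(2μseq): two lineages separated by TMRCA generations accumulate sequence differences at rate 2μseq per base pair. For comparison, under an unconstrained symmetric stepwise random walk the expected ASD grows linearly, E[ASD] = 2 μms σ² TMRCA, where μms is the microsatellite mutation rate and σ² the variance of the step size; this is the standard stepwise-mutation result, reading ASD as the mean squared difference in repeat length between two alleles (our assumption about the abbreviation). The length constraint is what bends the blue curve away from this line. A minimal Python sketch; the function names, the example diversity and μms are hypothetical, and only μseq = 1.82 × 10⁻⁸ comes from the caption:

```python
MU_SEQ = 1.82e-8  # mutations per bp per generation, as quoted in the caption


def tmrca_from_diversity(theta_seq: float, mu_seq: float = MU_SEQ) -> float:
    """TMRCA in generations from local pairwise sequence diversity per bp.

    Two lineages separated by TMRCA generations accumulate differences at
    rate 2 * mu_seq per bp, so theta_seq ~= 2 * mu_seq * TMRCA.
    """
    return theta_seq / (2.0 * mu_seq)


def gsmm_expected_asd(tmrca: float, mu_ms: float, step_var: float = 1.0) -> float:
    """Expected squared allele-length difference (repeat units squared)
    under an unconstrained symmetric stepwise random walk: each of the two
    lineages adds mu_ms * step_var of variance per generation."""
    return 2.0 * mu_ms * step_var * tmrca


t = tmrca_from_diversity(1e-3)            # hypothetical diversity of 0.1%
print(round(t))                           # 27473 (generations)
print(gsmm_expected_asd(t, mu_ms=5e-4))   # hypothetical microsatellite rate
```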

    Figure 4: Human-chimpanzee speciation date inferred without calibration with the fossil record.

    The 90% Bayesian credible interval for human-chimpanzee speciation time (gray) for a range of values of the ratio of speciation time to divergence time (τHC/tHC). The blue histogram shows our Bayesian prior distribution for τHC/tHC, justified in the Supplementary Note. The red horizontal lines are the dates of fossils that are candidates for being on the hominin lineage after the speciation of humans and chimpanzees. Australopithecus anamensis, Orrorin tugenensis and Ardipithecus kadabba are within our plausible speciation times, whereas Sahelanthropus tchadensis predates the inferred speciation time for all plausible values of τHC/tHC. Bottom histogram, our Bayesian prior distribution for τHC/tHC; left histogram, our posterior distribution of human-chimpanzee speciation time. MYA, million years ago.
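The speciation time is the divergence time scaled by the ratio τHC/tHC, for which a prior is used. One generic way to turn such draws into a 90% interval is Monte Carlo propagation: multiply paired draws of the ratio and of tHC, then read off the 5th and 95th percentiles. The Python sketch below illustrates only that propagation step; the uniform and normal inputs are placeholders we chose, not the prior or posterior described in the Supplementary Note, and `speciation_interval_90` is a hypothetical name:

```python
import random
import statistics
from typing import Sequence, Tuple


def speciation_interval_90(ratio_draws: Sequence[float],
                           divergence_draws: Sequence[float]) -> Tuple[float, float]:
    """90% central interval for tau_HC = (tau_HC / t_HC) * t_HC from paired draws."""
    if len(ratio_draws) != len(divergence_draws):
        raise ValueError("need paired draws")
    taus = [r * t for r, t in zip(ratio_draws, divergence_draws)]
    cuts = statistics.quantiles(taus, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


rng = random.Random(0)
n = 100_000
# Hypothetical inputs for illustration only (not the study's distributions).
ratios = [rng.uniform(0.6, 0.95) for _ in range(n)]
divergence_myr = [rng.gauss(7.5, 1.0) for _ in range(n)]
lo, hi = speciation_interval_90(ratios, divergence_myr)
print(f"90% interval: {lo:.1f} to {hi:.1f} million years")
```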



Author information


  1. Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • James X Sun
  2. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.

    • James X Sun,
    • Heng Li,
    • Swapan Mallick &
    • David Reich
  3. deCODE Genetics, Reykjavik, Iceland.

    • Agnar Helgason,
    • Gisli Masson,
    • Sigríður Sunna Ebenesersdóttir,
    • Augustine Kong &
    • Kari Stefansson
  4. Department of Anthropology, University of Iceland, Reykjavik, Iceland.

    • Agnar Helgason
  5. Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

    • Heng Li,
    • Sante Gnerre,
    • Nick Patterson &
    • David Reich
  6. Faculty of Medicine, University of Iceland, Reykjavik, Iceland.

    • Kari Stefansson


J.X.S., A.H., G.M. and D.R. conceived and performed the research. A.H., G.M., A.K., D.R. and K.S. jointly supervised the study, with A.H. acting as the coordinator at deCODE Genetics and D.R. at Harvard Medical School. A.H. and G.M. prepared the raw microsatellite data. J.X.S., A.H. and S.S.E. designed and analyzed the regenotyping, resequencing and electropherogram re-examination experiments; and A.H. analyzed next-generation sequencing data to independently validate mutations. J.X.S., A.H., N.P., A.K. and D.R. designed and analyzed the microsatellite modeling and the statistics. S.M., H.L. and J.X.S. processed and extracted sequence data for the 23 HapMap individuals. S.M., S.G. and D.R. performed the analyses of human-chimpanzee genetic divergence and developed the Bayesian prior distributions relevant to human-chimpanzee speciation. The manuscript was written primarily by J.X.S., A.H. and D.R. The supplementary information was prepared by J.X.S. and D.R.

Competing financial interests

The authors at deCODE Genetics (A.H., G.M., S.S.E., A.K. and K.S.) work for a for-profit company carrying out genetic research and thus declare competing financial interests.

Corresponding authors

Correspondence to:


Supplementary information

Supplementary Text and Figures (PDF, 7.7 MB): Supplementary Figures 1–17, Supplementary Tables 1–9 and Supplementary Note.
