Genome-wide patterns and properties of de novo mutations in humans

Abstract

Mutations create variation in the population, fuel evolution and cause genetic diseases. Current knowledge about de novo mutations is incomplete and mostly indirect (refs. 1–10). Here we analyze 11,020 de novo mutations from the whole genomes of 250 families. We show that de novo mutations in the offspring of older fathers are not only more numerous (refs. 11–13) but also occur more frequently in early-replicating, genic regions. Functional regions exhibit higher mutation rates due to CpG dinucleotides and show signatures of transcription-coupled repair, whereas mutation clusters with a unique signature point to a new mutational mechanism. Mutation and recombination rates independently associate with nucleotide diversity, and regional variation in human-chimpanzee divergence is only partly explained by heterogeneity in mutation rate. Finally, we provide a genome-wide mutation rate map for medical and population genetics applications. Our results provide new insights and refine long-standing hypotheses about human mutagenesis.

Figure 1: Mutations in the offspring of younger fathers are biased toward later-replicating regions.
Figure 2: The offspring of older fathers harbor a higher percentage of de novo mutations in genes.
Figure 3: Mutation clusters exhibit a unique mutational spectrum.
Figure 4: Influence of mutation and recombination rates on human-chimpanzee divergence.

References

  1. Sawyer, S.A. & Hartl, D.L. Population genetics of polymorphism and divergence. Genetics 132, 1161–1176 (1992).

  2. Felsenstein, J. & Churchill, G.A. A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13, 93–104 (1996).

  3. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

  4. Cooper, G.M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005).

  5. Veltman, J.A. & Brunner, H.G. De novo mutations in human genetic disease. Nat. Rev. Genet. 13, 565–575 (2012).

  6. Friedberg, E.C., Walker, G.C. & Siede, W. DNA Repair and Mutagenesis (ASM Press, 1995).

  7. Kondrashov, A.S. Direct estimates of human per nucleotide mutation rates at 20 loci causing mendelian diseases. Hum. Mutat. 21, 12–27 (2003).

  8. Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA 107, 961–968 (2010).

  9. Hodgkinson, A. & Eyre-Walker, A. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011).

  10. Schaibley, V.M. et al. The influence of genomic context on mutation patterns in the human genome inferred from rare variants. Genome Res. 23, 1974–1984 (2013).

  11. Michaelson, J.J. et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151, 1431–1442 (2012).

  12. Kong, A. et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature 488, 471–475 (2012).

  13. Genomes of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat. Genet. 46, 818–825 (2014).

  14. Jenkins, T.G., Aston, K.I., Pflueger, C., Cairns, B.R. & Carrell, D.T. Age-associated sperm DNA methylation alterations: possible implications in offspring disease susceptibility. PLoS Genet. 10, e1004458 (2014).

  15. Koren, A. DNA replication timing: coordinating genome stability with genome regulation on the X chromosome and beyond. Bioessays 36, 997–1004 (2014).

  16. Arndt, P.F., Petrov, D.A. & Hwa, T. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol. Biol. Evol. 20, 1887–1896 (2003).

  17. Schmidt, S. et al. Hypermutable non-synonymous sites are under stronger negative selection. PLoS Genet. 4, e1000281 (2008).

  18. Subramanian, S. & Kumar, S. Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res. 13, 838–844 (2003).

  19. Polak, P. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nat. Biotechnol. 32, 71–75 (2014).

  20. Pleasance, E.D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010).

  21. Campbell, C.D. et al. Estimating the human mutation rate using autozygosity in a founder population. Nat. Genet. 44, 1277–1281 (2012).

  22. Roberts, S.A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).

  23. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

  24. Chan, K., Resnick, M.A. & Gordenin, D.A. The choice of nucleotide inserted opposite abasic sites formed within chromosomal DNA reveals the polymerase activities participating in translesion DNA synthesis. DNA Repair (Amst.) 12, 878–889 (2013).

  25. Arndt, P.F., Hwa, T. & Petrov, D.A. Substantial regional variation in substitution rates in the human genome: importance of GC content, gene density, and telomere-specific effects. J. Mol. Evol. 60, 748–763 (2005).

  26. Hellmann, I., Ebersberger, I., Ptak, S.E., Pääbo, S. & Przeworski, M. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72, 1527–1535 (2003).

  27. Begun, D.J. & Aquadro, C.F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519–520 (1992).

  28. Lercher, M.J. & Hurst, L.D. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18, 337–340 (2002).

  29. Kong, A. et al. A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247 (2002).

  30. Duret, L. & Arndt, P.F. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 4, e1000071 (2008).

  31. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010).

  32. McVicker, G., Gordon, D., Davis, C. & Green, P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471 (2009).

  33. Asthana, S., Roytberg, M., Stamatoyannopoulos, J. & Sunyaev, S. Analysis of sequence conservation at nucleotide resolution. PLoS Comput. Biol. 3, e254 (2007).

  34. Gratten, J., Visscher, P.M., Mowry, B.J. & Wray, N.R. Interpreting the role of de novo protein-coding mutations in neuropsychiatric disease. Nat. Genet. 45, 234–238 (2013).

  35. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  36. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  37. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).

  38. Koren, A. et al. Differential relationship of DNA replication timing to different forms of human mutation and variation. Am. J. Hum. Genet. 91, 1033–1040 (2012).

  39. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

  40. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  41. Ryba, T. et al. Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770 (2010).

  42. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

  43. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  44. Paten, B., Herrero, J., Beal, K., Fitzgerald, S. & Birney, E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 18, 1814–1828 (2008).

  45. Flicek, P. et al. Ensembl 2013. Nucleic Acids Res. 41, D48–D55 (2013).

  46. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).

  47. Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004).

  48. Murphy, W.J. et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294, 2348–2351 (2001).

Acknowledgements

We thank D. Gordenin for very helpful comments. The Genome of the Netherlands (GoNL) Project is funded by the Biobanking and Biomolecular Research Infrastructure (BBMRI-NL), which is financed by the Netherlands Organization for Scientific Research (NWO project 184.021.007). S.R.S., P.P.P. and S.C. are funded by US National Institutes of Health grants 1 R01 MH101244 and 1 R01 GM078598.

Author information

Contributions

S.R.S. and P.I.W.d.B. planned and directed the research. L.C.F. called and filtered the mutations. W.P.K. and I.R. validated candidate mutations. L.C.F. designed and executed the simulations. A.K., L.C.F. and A.M. performed replication timing analyses. P.P.P., L.C.F. and A.M. analyzed factors influencing regional mutation rates and spectra. L.C.F. and P.P.P. analyzed mutation clusters. P.P.P. and P.F.A. computed the comparative genomics model and compared it against observed mutation rates. S.C. and P.P.P. created the mutation rate map. L.C.F., P.P.P., A.K., P.I.W.d.B. and S.R.S. wrote the manuscript. A.M., S.C., C.M.v.D., M.S., C.W., G.v.O., P.E.S., D.I.B., K.Y., V.G., P.F.A. and W.P.K. provided critical feedback on the manuscript.

Corresponding authors

Correspondence to Paul I W de Bakker or Shamil R Sunyaev.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

A full list of members and affiliations appears in the Supplementary Note.

Integrated supplementary information

Supplementary Figure 1 Association of paternal age with epigenetic variables at de novo mutation locations.

Using a linear regression model, we tested the association of paternal age with six epigenetic variables (DNA replication timing, expression levels, recombination rates, and the H3K27ac, H3K4me1 and H3K4me3 histone modifications), while correcting for GC content, CpG status and sequence coverage. The plot shows the significance of each association, along with the significance threshold after Bonferroni correction for the six tests performed (gray dashed line).

Supplementary Figure 2 Separation of replication timing profiles between the offspring of younger and older fathers.

We separated our de novo mutation data into two groups based on paternal age. To select a threshold that maximizes the difference between the groups, we considered every integer age in our study as a possible threshold and applied a Kolmogorov-Smirnov test to compare the distribution of replication timing of de novo mutations between the groups. This plot shows the P values obtained for each of the 27 tests as well as the significance threshold after Bonferroni correction (gray dashed line).
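The threshold scan described in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(paternal_age, replication_timing)` input layout, the asymptotic P-value approximation and the choice to skip thresholds that leave one group empty are all assumptions.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate P value) for a two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs over all observed values.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    # Asymptotic Kolmogorov distribution with a common small-sample correction.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The alternating series converges poorly here; the P value is essentially 1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def scan_age_thresholds(records, alpha=0.05):
    """Test every candidate age threshold.

    records: list of (paternal_age, replication_timing) pairs, one per mutation
    (hypothetical layout). Returns the per-threshold results and the
    Bonferroni-corrected significance level.
    """
    ages = sorted({age for age, _ in records})
    results = []
    for t in ages[1:]:  # skip the minimum age, which would leave the younger group empty
        younger = [rt for age, rt in records if age < t]
        older = [rt for age, rt in records if age >= t]
        d, p = ks_two_sample(younger, older)
        results.append((t, d, p))
    return results, alpha / len(results)
```

Taking `min(results, key=lambda r: r[2])` gives the threshold with the smallest P value, which can then be compared against the corrected level returned alongside the results.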

Supplementary Figure 3 Paternal age effect on de novo mutation replication timing measured in four cell lines.

The distribution of replication timing around de novo mutations in the offspring of younger fathers (orange curves; <28 years old), older fathers (blue curves; ≥28 years old) and simulations (gray areas; 200 simulation sets of 11,020 mutations) in 3 cell types from 4 cell lines (ref. 41): embryonic stem cells (BG01 ESC, H7 ESC), induced pluripotent stem cells (iPS4) and neural precursor cells (BG01 NPC).

Supplementary Figure 4 Paternal and maternal age effect estimates on replication timing based on de novo mutations with known parental origin.

All effect estimates were computed using a linear regression model. The red line represents the effect estimate from the 630 maternal mutations. The black curve shows the effect estimate for 10,000 samplings of 630 paternal mutations out of the 1,991 available. The effect estimate is significantly larger for paternal mutations (P = 0.0019) when considering the same number of mutations.
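The size-matched resampling described in this legend can be illustrated with a short sketch. It assumes a single-predictor least-squares slope as the effect estimate and an empirical P value defined as the fraction of paternal subsample slopes that do not exceed the maternal slope. Both are illustrative assumptions, as are all function names; the legend does not state how P = 0.0019 was derived.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of y on x (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def subsample_slopes(paternal, n_sub, n_draws=10_000, seed=0):
    """Slopes from repeated random subsets of paternal (age, timing) pairs.

    n_sub would be the number of maternal mutations (630 in the legend), so that
    every subsample has the same size as the maternal set.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_draws):
        sample = rng.sample(paternal, n_sub)  # sampling without replacement
        slopes.append(slope([a for a, _ in sample], [t for _, t in sample]))
    return slopes


def empirical_p(sub_slopes, maternal_slope):
    """Fraction of paternal subsample slopes that do not exceed the maternal slope."""
    return sum(s <= maternal_slope for s in sub_slopes) / len(sub_slopes)
```

Matching the subsample size to the maternal set keeps the comparison from being driven by the larger number of paternal mutations.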

Supplementary Figure 5 Power to detect non-CpG mutation depletion in regulatory regions marked by DNase I–hypersensitive sites (DHSs).

The graph shows the power to detect non-CpG mutation depletion in regulatory regions marked by DNase I–hypersensitive sites (DHSs) using the 9,048 observed non-CpG de novo mutations in our data and 177,347 uniformly simulated mutations. The y axis illustrates the power for detecting an effect at the 0.05 significance level using a χ² test for different effect sizes (here expressed as the relative depletion of mutations in DHSs when compared to other regions).
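A power curve of this kind can be approximated by simulation, as in the rough sketch below. The fraction of the genome covered by DHSs (`dhs_fraction`) is a hypothetical input that the legend does not give, the simulated mutation set is held at its expected count rather than redrawn, and the function names are illustrative.

```python
import random

CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of the chi-square distribution with 1 d.f.


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def power(depletion, dhs_fraction, n_obs=9048, n_sim=177_347, reps=100, seed=0):
    """Estimate the power to detect a relative depletion of mutations in DHSs.

    depletion: relative reduction of the mutation rate inside DHSs (0 = none).
    dhs_fraction: assumed share of uniformly placed mutations falling in DHSs.
    """
    rng = random.Random(seed)
    sim_in = round(n_sim * dhs_fraction)  # simulated set fixed at its expectation
    # Probability that an observed mutation lands in a DHS when the rate there
    # is (1 - depletion) times the rate elsewhere.
    inside = dhs_fraction * (1 - depletion)
    p_obs = inside / (inside + 1 - dhs_fraction)
    hits = 0
    for _ in range(reps):
        obs_in = sum(rng.random() < p_obs for _ in range(n_obs))
        stat = chi2_2x2(obs_in, n_obs - obs_in, sim_in, n_sim - sim_in)
        hits += stat > CHI2_CRIT_1DF_05
    return hits / reps
```

Calling `power` over a grid of `depletion` values would trace out a curve comparable in spirit to the one plotted, for whatever `dhs_fraction` is assumed.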

Supplementary Figure 6 Comparison of nucleotide context–specific mutation rates based on comparative genomics and observed de novo mutations.

Each point represents one of the 96 substitutions in a specific trinucleotide context. The black line shows the best fit (r² = 0.993). The rate of transition mutations is 2.15 times greater than the transversion rate (Ti/Tv ratio). The highest mutation rates are observed for cytosine bases in a CpG context.
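The count of 96 follows from folding each substitution onto a pyrimidine reference base: 6 substitution types times 16 flanking-base combinations. The sketch below enumerates these classes and labels transitions and CpG contexts; the function names and the folding convention are illustrative assumptions, not taken from the paper.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TRANSITIONS = {("C", "T"), ("T", "C")}  # transitions after folding onto C or T


def fold_to_pyrimidine(context, alt):
    """Express a substitution with a C or T reference base.

    context is a trinucleotide whose middle base is the reference allele.
    Purine references are reverse-complemented; alt is complemented with them.
    """
    if context[1] in "AG":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return context, alt


def all_classes():
    """Enumerate the 96 (trinucleotide context, alternate base) classes."""
    return [
        (five + ref + three, alt)
        for ref in "CT"
        for alt in BASES
        if alt != ref
        for five, three in product(BASES, repeat=2)
    ]


def is_transition(context, alt):
    context, alt = fold_to_pyrimidine(context, alt)
    return (context[1], alt) in TRANSITIONS


def is_cpg(context, alt):
    """True when the mutated base is the C of a CpG (on either strand)."""
    context, _ = fold_to_pyrimidine(context, alt)
    return context[1:] == "CG"
```

Of the 96 classes, 32 are transitions (C>T and T>C in 16 contexts each) and 12 have a cytosine reference in a CpG context.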

Supplementary Figure 7 Mutational spectrum in transcribed regions.

The proportion of de novo mutations of each substitution type in transcribed regions classified based on their corresponding strand (transcribed or non-transcribed). There is a strong asymmetry of mutations between the two strands, with significantly elevated A>G substitutions on the transcribed strand, consistent with the action of transcription-coupled nucleotide excision repair.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Tables 1–4 and Supplementary Note. (PDF 1185 kb)

About this article

Cite this article

Francioli, L., Polak, P., Koren, A. et al. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47, 822–826 (2015). https://doi.org/10.1038/ng.3292
