  • Article
  • Published:

A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription

Abstract

De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.

Fig. 1: Roulette accounts for extended nucleotide context, strand asymmetries and local variation in mutation rate.
Fig. 2: Roulette outperforms existing mutational models, under both per-gene and per-site metrics.
Fig. 3: Accurate per-site mutation rate estimates improve population genetic inference.
Fig. 4: Pol III transcripts are mutational hotspots.
Fig. 5: TFBS are prone to high mutation rate.

Data availability

Polymorphism data used in the study are freely available at https://gnomad.broadinstitute.org/.

De novo mutations were aggregated from the supplementary materials of refs. 13,18.

Mutation rate estimates for autosomes are available at http://genetics.bwh.harvard.edu/downloads/Vova/Roulette/.

Shet values, which measure gene constraint, recalculated using Roulette can be found at http://genetics.bwh.harvard.edu/genescores/selection.html.

Code availability

All the code used to perform the analysis is available at https://github.com/vseplyarskiy/Roulette.

References

  1. Hodgkinson, A. & Eyre-Walker, A. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011).

  2. Terekhanova, N. V., Seplyarskiy, V. B., Soldatov, R. A. & Bazykin, G. A. Evolution of local mutation rate and its determinants. Mol. Biol. Evol. 34, 1100–1109 (2017).

  3. Seplyarskiy, V. B. & Sunyaev, S. The origin of human mutation in light of genomic data. Nat. Rev. Genet. 22, 672–686 (2021).

  4. Agarwal, I. & Przeworski, M. Signatures of replication timing, recombination, and sex in the spectrum of rare variants on the human X chromosome and autosomes. Proc. Natl Acad. Sci. USA 116, 17916–17924 (2019).

  5. Seplyarskiy, V. B. et al. Population sequencing data reveal a compendium of mutational processes in the human germ line. Science 373, 1030–1035 (2021).

  6. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  7. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  8. Ehrlich, M. et al. DNA cytosine methylation and heat-induced deamination. Biosci. Rep. 6, 387–393 (1986).

  9. Aggarwala, V. & Voight, B. F. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat. Genet. 48, 349–355 (2016).

  10. Carlson, J. et al. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat. Commun. 9, 3753 (2018).

  11. Bethune, J., Kleppe, A. & Besenbacher, S. A method to build extended sequence context models of point mutations and indels. Nat. Commun. 13, 7884 (2022).

  12. Fang, Y., Deng, S. & Li, C. A generalizable deep learning framework for inferring fine-scale germline mutation rate maps. Nat. Mach. Intell. 4, 1209–1223 (2022).

  13. Halldorsson, B. V. et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043 (2019).

  14. Goldmann, J. M. et al. Germline de novo mutation clusters arise during oocyte aging in genomic regions with high double-strand-break incidence. Nat. Genet. 50, 487–492 (2018).

  15. Marteijn, J. A., Lans, H., Vermeulen, W. & Hoeijmakers, J. H. J. Understanding nucleotide excision repair and its roles in cancer and ageing. Nat. Rev. Mol. Cell Biol. 15, 465–481 (2014).

  16. Seplyarskiy, V. B. et al. Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. Nat. Genet. 51, 36 (2019).

  17. Kaplanis, J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).

  18. An, J.-Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018).

  19. Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584 (2020).

  20. Weghorn, D. et al. Applicability of the mutation-selection balance model to population genetics of heterozygous protein-truncating variants in humans. Mol. Biol. Evol. 36, 1701–1710 (2019).

  21. Dukler, N. et al. Extreme purifying selection against point mutations in the human genome. Nat. Commun. 13, 4312 (2022).

  22. Lee, S. Y. et al. The shaping of cancer genomes with the regional impact of mutation processes. Exp. Mol. Med. 54, 1049–1060 (2022).

  23. Xia, B. et al. Widespread transcriptional scanning in the testis modulates gene evolution rates. Cell 180, 248–262 (2020).

  24. Mao, P. et al. ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat. Commun. 9, 2626 (2018).

  25. Perera, D. et al. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 532, 259–263 (2016).

  26. Sabarinathan, R. et al. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532, 264–267 (2016).

  27. Wakeley, J., Fan, W. L., Koch, E. & Sunyaev, S. Recurrent mutation in the ancestry of a rare variant. Genetics 224, iyad049 (2023).

  28. Hodgkinson, A., Ladoukakis, E. & Eyre-Walker, A. Cryptic variation in the human mutation rate. PLoS Biol. 7, e1000027 (2009).

  29. Seplyarskiy, V. B., Kharchenko, P., Kondrashov, A. S. & Bazykin, G. A. Heterogeneity of the transition/transversion ratio in Drosophila and Hominidae genomes. Mol. Biol. Evol. 29, 1943–1955 (2012).

  30. Johnson, P. L. F. & Hellmann, I. Mutation rate distribution inferred from coincident SNPs and coincident substitutions. Genome Biol. Evol. 3, 842–850 (2011).

  31. Nagelkerke, N. J. D. A note on a general definition of the coefficient of determination. Biometrika 78, 691–692 (1991).

  32. Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).

  33. Gao, F. & Keinan, A. Explosive genetic evidence for explosive human population growth. Curr. Opin. Genet. Dev. 41, 130–139 (2016).

  34. Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).

  35. Crow, J. F. & Kimura, M. An Introduction to Population Genetics Theory (The Blackburn Press, 2009).

  36. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  37. Harpak, A., Bhaskar, A. & Pritchard, J. K. Mutation rate variation is a primary determinant of the distribution of allele frequencies in humans. PLoS Genet. 12, e1006489 (2016).

  38. Agarwal, I. & Przeworski, M. Mutation saturation for fitness effects at human CpG sites. eLife 10, e71513 (2021).

  39. Thornlow, B. P. et al. Transfer RNA genes experience exceptionally elevated mutation rates. Proc. Natl Acad. Sci. USA 115, 8996–9001 (2018).

  40. Zhang, X.-O., Gingeras, T. R. & Weng, Z. Genome-wide analysis of polymerase III–transcribed Alu elements suggests cell-type-specific enhancer function. Genome Res. 29, 1402–1414 (2019).

  41. Jinks-Robertson, S. & Bhagwat, A. S. Transcription-associated mutagenesis. Annu. Rev. Genet. 48, 341–359 (2014).

  42. Abascal-Palacios, G. et al. Structural basis of RNA polymerase III transcription initiation. Nature 553, 301–306 (2018).

  43. Reijns, M. A. M. et al. Lagging strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).

  44. Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).

  45. Sasani, T. A. et al. A natural mutator allele shapes mutation spectrum variation in mice. Nature 605, 497–502 (2022).

  46. Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).

  47. Chen, Y.-H. et al. Transcription shapes DNA replication initiation and termination in human cells. Nat. Struct. Mol. Biol. 26, 67–77 (2019).

  48. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

Acknowledgements

We thank J. Wakeley and L. Fan for helpful suggestions on population genetics theory. We thank D. J. Balick for providing a forward Wright–Fisher simulator. This research was supported by the National Institutes of Health under grants R35-GM127131, R01-MH101244, U01-HG012009 and R01-HG010372, along with funding from NGM Biopharmaceuticals. D.J.L. was supported by NLM T15LM007092.

Author information

Contributions

V.S., E.M.K. and D.J.L. analyzed the data. V.S., E.M.K., D.J.L. and S.S. wrote the paper. All authors designed the study and read and corrected the paper. V.S., E.M.K. and D.J.L. have agreed to alternate the order of their names for respective individual citations.

Corresponding author

Correspondence to Shamil R. Sunyaev.

Ethics declarations

Competing interests

J.S.L. and H.H.L. are employed by NGM Biopharmaceuticals Inc. V.S., E.M.K. and S.R.S. are partially funded by NGM Biopharmaceuticals Inc. D.J.L. declares no competing interests.

Peer review

Peer review information

Nature Genetics thanks Martin Taylor and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Effect of replication fork direction on the rate of rare synonymous SNVs.

Four contexts with strongest replication asymmetry. Mutation rate calculated for the regions with the strongest replication fork polarity (top quartile). Mutation rate is relative to the least mutable strand. Error bars show 95% confidence intervals for the ratio of two Poisson variables.
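
One common way to obtain an interval for the ratio of two Poisson variables is a Wald interval on the log of the ratio, sketched below in Python. The caption does not say which method was used for the figure, so the function name `poisson_ratio_ci`, the normal approximation and the optional exposure arguments are assumptions made for illustration only.

```python
import math


def poisson_ratio_ci(count_a, count_b, exposure_a=1.0, exposure_b=1.0, z=1.959964):
    """Approximate confidence interval for the ratio of two Poisson rates (a over b).

    Assumption: a normal approximation on the log scale. This is a common
    textbook choice, not necessarily the procedure used for the figure.
    """
    if count_a <= 0 or count_b <= 0:
        raise ValueError("both counts must be positive for the log-scale interval")
    ratio = (count_a / exposure_a) / (count_b / exposure_b)
    # Standard error of log(ratio) for two independent Poisson counts.
    se_log = math.sqrt(1.0 / count_a + 1.0 / count_b)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts of rare SNVs on the two strands of one context.
    print(poisson_ratio_ci(130, 100))
```

With small counts an exact (conditional binomial) interval would be a safer choice than this approximation.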

Extended Data Fig. 2 Roulette captures mutation rate variation associated with epigenetic features.

Ten pairs of mutation type and epigenetic feature with the strongest effects on mutation rate. To generate bins, we subdivided the genome into five equal-sized bins by the value of each genomic feature and then calculated observed and expected mutation rates for each trinucleotide context among synonymous sites. This test was performed on synonymous SNVs, and mutation rates were normalized to the rate observed in the first epigenetic bin. RT stands for replication timing. Overall, we analyzed the effects of replication timing, H3K27me3, H3K27me1 and recombination.
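
The binning described in this caption can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input layout (one tuple per site holding a feature value, a 0/1 observed SNV indicator and a model-predicted rate) and the function name `binned_relative_rates` are hypothetical, and the per-trinucleotide stratification mentioned in the caption is left out for brevity.

```python
def binned_relative_rates(sites, n_bins=5):
    """Split sites into equal-sized bins by feature value and return the
    observed and expected rates per bin, each relative to the first bin.

    sites: list of (feature_value, observed, expected_rate) tuples, where
    `observed` is 1 if the site carries an SNV and 0 otherwise (assumed layout).
    """
    if len(sites) < n_bins:
        raise ValueError("need at least one site per bin")
    ordered = sorted(sites, key=lambda site: site[0])
    n = len(ordered)
    # Integer arithmetic keeps bin sizes within one site of each other.
    bins = [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]
    observed = [sum(site[1] for site in b) / len(b) for b in bins]
    expected = [sum(site[2] for site in b) / len(b) for b in bins]
    if observed[0] == 0 or expected[0] == 0:
        raise ValueError("first bin has zero rate; cannot normalize")
    return ([o / observed[0] for o in observed],
            [e / expected[0] for e in expected])
```

If the model captures the effect of the feature, the two returned lists should follow the same trend across bins.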

Extended Data Fig. 3 Roulette captures accelerated mutation rate in ‘maternal’ regions.

De novo mutation rate inside and outside of maternal regions. Maternal regions are defined as in ref. 5.

Extended Data Fig. 4 Roulette predicts the rate of triallelic SNVs.

Multiple derived alleles can co-occur at the same genomic site. Using Roulette, we predicted the probability of a site containing two derived variants simultaneously (a triallelic site) by multiplying the probabilities of each derived allele (this is the correct procedure if derived alleles accumulate independently). In contrast to early studies of multiallelic variants, we do not find a deviation from independence.
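
The independence calculation in this caption amounts to a product of per-allele probabilities. The Python sketch below assumes that a probability of observing each derived allele is already available for every site; how those probabilities are derived from Roulette rates is not described here, and the function names and input layout are hypothetical.

```python
from itertools import combinations


def pair_probability(p_first, p_second):
    """Probability that two specific derived alleles are both observed at a
    site, assuming the alleles arise independently."""
    return p_first * p_second


def expected_coincident_pairs(sites):
    """Expected number of co-occurring derived-allele pairs over many sites.

    sites: iterable of dicts mapping derived allele -> probability that the
    allele is observed at that site (assumed input layout).
    """
    total = 0.0
    for allele_probs in sites:
        for (_, p_a), (_, p_b) in combinations(sorted(allele_probs.items()), 2):
            total += pair_probability(p_a, p_b)
    return total


if __name__ == "__main__":
    example_sites = [{"A": 0.1, "G": 0.2}, {"C": 0.05, "T": 0.5}]
    print(expected_coincident_pairs(example_sites))
```

Comparing this expectation with the observed count of triallelic sites is one way to check for a deviation from independence.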

Extended Data Fig. 5 Pseudo-R2 for noncoding regions.

Pseudo-R2 is calculated for noncoding regions for two datasets: gnomAD v3 and UK Biobank. Because Roulette was trained on noncoding variants from gnomAD v3, it is expected to perform better for noncoding variants than for synonymous variants. The de novo sequencing and UK Biobank population sequencing data are independent of the training data.
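
For readers unfamiliar with pseudo-R2, the Python sketch below computes the Nagelkerke form (ref. 31) for a per-site presence/absence outcome. It assumes a Bernoulli likelihood per site and a null model with one constant probability; the exact likelihood and null model used in the paper are not stated in this caption, so treat the details as illustrative.

```python
import math


def bernoulli_loglik(outcomes, probs):
    """Log-likelihood of 0/1 outcomes given per-site probabilities in (0, 1)."""
    return sum(math.log(p) if y else math.log1p(-p)
               for y, p in zip(outcomes, probs))


def nagelkerke_r2(outcomes, probs):
    """Nagelkerke pseudo-R2 of a model against a constant-probability null.

    outcomes: list of 0/1 flags (for example, SNV present at the site or not).
    probs: model-predicted probabilities for the same sites (assumed input).
    """
    n = len(outcomes)
    mean = sum(outcomes) / n
    if mean in (0.0, 1.0):
        raise ValueError("outcomes must contain both 0 and 1")
    ll_null = bernoulli_loglik(outcomes, [mean] * n)
    ll_model = bernoulli_loglik(outcomes, probs)
    # Cox-Snell R2, then rescaled by its maximum attainable value.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2
```

A model that predicts the overall mean at every site scores 0, and a model that assigns probabilities close to the observed outcomes approaches 1.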

Extended Data Fig. 6 An elevated number of de novo mutations at sites with observed SNVs.

Sites were divided into mutation rate bins for the three different models. De novo mutation rates were calculated from whole-genome family sequencing data. Horizontal bars represent 95% Poisson confidence intervals for the de novo mutation rate within each bin. Vertical bars represent 95% confidence intervals for the ratio of Poisson rates between SNV and non-SNV sites within each bin.

Extended Data Fig. 7 Recurrence affects site frequency spectra (SFS).

Proportion of sites in five different classes: monomorphic sites, singletons, doubletons, tripletons and other SNVs with higher allele counts. X-axis shows the per-generation mutation rate, as estimated by Roulette. The dotted line is the expected trend under the infinite sites model.

Extended Data Fig. 8 Roulette performance at different DNA regions, as annotated by ENCODE.

Observed-to-expected ratio of rare SNVs at different ENCODE annotations. PLS stands for promoters and ELS for enhancers; pPLS and pELS are proximal promoters/enhancers (less than 2 kb from the transcription start site), and dPLS and dELS are distal promoters/enhancers (more than 2 kb from the transcription start site). DNase-H3K4me3 are sites that are both hypersensitive to DNase and have an H3K4me3 signal. CTCF stands for CTCF binding sites. Multiple labels correspond to overlapping annotations.

Extended Data Fig. 9 Mutation rate around RNU genes.

The shaded area shows the 95% Poisson confidence interval.

Extended Data Fig. 10

Deviation from Roulette’s predictions for three hypermutable classes of genes (RNU, tRNA and Immunoglobulins) and for other sites in the genome (Remaining genome).

Supplementary information

Supplementary Information

Supplementary methods and Supplementary Figs. 1–12.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–3.

Supplementary Data

Data to draw figures.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Seplyarskiy, V., Koch, E.M., Lee, D.J. et al. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat Genet 55, 2235–2242 (2023). https://doi.org/10.1038/s41588-023-01562-0

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41588-023-01562-0
