A natural mutator allele shapes mutation spectrum variation in mice


Although germline mutation rates and spectra can vary within and between species, common genetic modifiers of the mutation rate have not been identified in nature (ref. 1). Here we searched for loci that influence germline mutagenesis using a uniquely powerful resource: a panel of recombinant inbred mouse lines known as the BXD, descended from the laboratory strains C57BL/6J (B haplotype) and DBA/2J (D haplotype). Each BXD lineage has been maintained by brother–sister mating in the near absence of natural selection, accumulating de novo mutations for up to 50 years on a known genetic background that is a unique linear mosaic of B and D haplotypes (ref. 2). We show that mice inheriting D haplotypes at a quantitative trait locus on chromosome 4 accumulate C>A germline mutations at a 50% higher rate than those inheriting B haplotypes, primarily owing to the activity of a C>A-dominated mutational signature known as SBS18. The B and D quantitative trait locus haplotypes encode different alleles of Mutyh, a DNA repair gene that underlies the heritable cancer predisposition syndrome that causes colorectal tumors with a high SBS18 mutation load (refs. 3,4). Both B and D Mutyh alleles are present in wild populations of Mus musculus domesticus, providing evidence that common genetic variation modulates germline mutagenesis in a model mammalian species.

Fig. 1: Accumulation of homozygous singletons over many generations of laboratory inbreeding.
Fig. 2: A QTL on chromosome 4 for the germline C>A mutation rate.
Fig. 3: Context-dependent C>A mutation enrichment in lines with the D haplotype at the C>A QTL.
Fig. 4: Nonsynonymous differences between B and D Mutyh alleles segregate in both wild and inbred mouse strains and appear to be ancestral in DBA/2J.

Data availability

BXD mutations and other data files necessary to reproduce the manuscript are available online and archived at Zenodo. A VCF file containing variant calls from the sequenced BXDs is available in the European Nucleotide Archive under project accession PRJEB45429. The germline mutation calls from TOY-KO triple knockout mice (ref. 24) are available as supplementary data file 1 of that publication. The SBS18 COSMIC mutation signature data are available at the COSMIC web page. The strain-private mutation data from Dumont (ref. 17) are available as supplementary data to that publication. The wild mouse data from Harr et al. (ref. 35) are available as described in that manuscript. The mm10/GRCm38 reference genome used for these analyses is version GCA_000001635.2.

Code availability

All code used for data analysis and figure generation is deposited online and archived at Zenodo.

References


  1. Lynch, M. et al. Genetic drift, selection and the evolution of the mutation rate. Nat. Rev. Genet. 17, 704–714 (2016).

  2. Ashbrook, D. G. et al. A platform for experimental precision medicine: the extended BXD mouse family. Cell Syst. 12, 235–247.e9 (2021).

  3. Viel, A. et al. A specific mutational signature associated with DNA 8-oxoguanine persistence in MUTYH-defective colorectal cancer. eBioMedicine 20, 39–49 (2017).

  4. Pilati, C. et al. Mutational signature analysis identifies MUTYH deficiency in colorectal cancers and adrenocortical carcinomas. J. Pathol. 242, 10–15 (2017).

  5. Sniegowski, P. D., Gerrish, P. J. & Lenski, R. E. Evolution of high mutation rates in experimental populations of E. coli. Nature 387, 703–705 (1997).

  6. Dawson, K. J. Evolutionarily stable mutation rates. J. Theor. Biol. 194, 143–157 (1998).

  7. Sasani, T. A. et al. Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation. eLife 8, e46922 (2019).

  8. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).

  9. Kessler, M. D. et al. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc. Natl Acad. Sci. USA 117, 2560–2569 (2020).

  10. Robinson, P. S. et al. Increased somatic mutation burdens in normal human cells due to defective DNA polymerases. Nat. Genet. 53, 1434–1442 (2021).

  11. Harris, K. Evidence for recent, population-specific evolution of the human mutation rate. Proc. Natl Acad. Sci. USA 112, 3439–3444 (2015).

  12. Harris, K. & Pritchard, J. K. Rapid evolution of the human mutation spectrum. eLife 6, 415 (2017).

  13. Mathieson, I. & Reich, D. Differences in the rare variant spectrum among human populations. PLoS Genet. 13, e1006581 (2017).

  14. Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).

  15. Ségurel, L., Wyman, M. J. & Przeworski, M. Determinants of mutation rate variation in the human germline. Annu. Rev. Genomics Hum. Genet. 15, 47–70 (2014).

  16. Halligan, D. L. & Keightley, P. D. Spontaneous mutation accumulation studies in evolutionary genetics. Annu. Rev. Ecol. 40, 151–172 (2009).

  17. Dumont, B. L. Significant strain variation in the mutation spectra of inbred laboratory mice. Mol. Biol. Evol. 36, 865–874 (2019).

  18. Lindsay, S. J., Rahbari, R., Kaplanis, J., Keane, T. & Hurles, M. E. Similarities and differences in patterns of germline mutation between mice and humans. Nat. Commun. 10, 4053 (2019).

  19. Broman, K. W. et al. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics 211, 495–502 (2019).

  20. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

  21. Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).

  22. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).

  23. David, S. S., O’Shea, V. L. & Kundu, S. Base-excision repair of oxidative DNA damage. Nature 447, 941–950 (2007).

  24. Ohno, M. et al. 8-oxoguanine causes spontaneous de novo germline mutations in mice. Sci. Rep. 4, 4689 (2014).

  25. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  26. Georgeson, P. et al. Evaluating the utility of tumour mutational signatures for identifying hereditary colorectal cancer and polyposis syndrome carriers. Gut 70, 2138–2149 (2021).

  27. Islam, S. M. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Preprint at (2021).

  28. Mulligan, M. K., Mozhui, K., Prins, P. & Williams, R. W. GeneNetwork: a toolbox for systems genetics. Methods Mol. Biol. 1488, 75–120 (2017).

  29. Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).

  30. Segovia, R., Shen, Y., Lujan, S. A., Jones, S. J. M. & Stirling, P. C. Hypermutation signature reveals a slippage and realignment model of translesion synthesis by Rev3 polymerase in cisplatin-treated yeast. Proc. Natl Acad. Sci. USA 114, 2663–2668 (2017).

  31. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).

  32. Choi, Y. & Chan, A. P. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747 (2015).

  33. Ng, P. C. & Henikoff, S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).

  34. Yang, H. et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat. Genet. 43, 648–655 (2011).

  35. Harr, B. et al. Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus. Sci. Data 3, 160075 (2016).

  36. Huber, C. D., Kim, B. Y., Marsden, C. D. & Lohmueller, K. E. Determining the factors driving selective effects of new nonsynonymous mutations. Proc. Natl Acad. Sci. USA 114, 4465–4470 (2017).

  37. Geraldes, A. et al. Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol. Ecol. 17, 5349–5363 (2008).

  38. Phifer-Rixey, M. et al. Adaptive evolution and effective population size in wild house mice. Mol. Biol. Evol. 29, 2949–2955 (2012).

  39. Gou, L., Bloom, J. S. & Kruglyak, L. The genetic basis of mutation rate variation in yeast. Genetics 211, 731–740 (2019).

  40. Jiang, P. et al. A modified fluctuation assay reveals a natural mutator phenotype that drives mutation spectrum variation within Saccharomyces cerevisiae. Evol. Biol. Genet. Genomics 10, e68285 (2021).

  41. Robinson, P. S. et al. Inherited MUTYH mutations cause elevated somatic mutation rates and distinctive mutational signatures in normal human cells. Preprint at (2021).

  42. Goldberg, M. E. & Harris, K. Mutational signatures of replication timing and epigenetic modification persist through the global divergence of mutation spectra across the great ape phylogeny. Genome Biol. Evol. 14, evab104 (2021).

  43. Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).

  44. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  45. Wang, X. et al. High-throughput sequencing of the DBA/2J mouse genome. BMC Bioinf. 11, O7 (2010).

  46. Pedersen, B. S. & Quinlan, A. R. cyvcf2: fast, flexible variant analysis with Python. Bioinformatics 33, 1867–1869 (2017).

  47. DeWitt, W. S. mutyper: assigning and summarizing mutation types for analyzing germline mutation spectra. Preprint at (2020).

  48. Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).

  49. Papadopoulos, J. S. & Agarwala, R. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23, 1073–1079 (2007).

  50. Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).

  51. Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).

  52. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

Acknowledgements


We thank B. Dumont, M. Gymrek, M. Mortazavi, A. Clark, D. Rothschild, U. Arora, M. Maksimov, H. Chen and J. Sebat for contributing helpful feedback as part of the BXD Genome Sequencing Consortium; M. Przeworski for providing comments on a manuscript draft; Y.-Y. Ren for assistance with preliminary genotype calling; members of the Harris and Pritchard laboratories for additional helpful discussions; the staff of HudsonAlpha—S. Levy and team—for DNA library preparation and sequencing, and for providing us with great support on data transfer; staff at the UT ISAAC facility for storage and processing of all sequence-associated data files; A. Centeno for assisting with the upload of phenotype data to GeneNetwork; C. Lutz and A. Valenzuela and J. Ingels at UTHSC for DNA sample acquisition, handling and assistance. We acknowledge support from NIH T32 Postdoctoral Training Grant 5T32HG000035-25 (to T.A.S.), a University of Tennessee Center for Integrative and Translational Genomics grant (to R.W.W., D.G.A. and L.L.), NIH Biological Mechanisms of Healthy Aging Training Grant T32AG066574 (to A.C.B.), NIH NIDA grant P50DA037844 (to A.A.P.), NIH NIDA grant U01DA051234 (to A.A.P.), NIH NIGMS grant 1R01GM123489 (to R.W.W.), NIH NIDA grant P30 DA044223 (to R.W.W.), NIH R01 HG008140 (to J.K.P.), NIH NIGMS grant 1R35GM133428-01 (to K.H.), a Searle Scholarship (to K.H.), a Sloan Research Fellowship (to K.H.), a Burroughs Wellcome Fund Career Award at the Scientific Interface (to K.H.) and a Pew Biomedical Scholarship (to K.H.).

Author information

Contributions

T.A.S., D.G.A., A.A.P., R.W.W., J.K.P. and K.H. contributed to study conceptualization and design. D.G.A., L.L., J.K.P., A.A.P. and R.W.W. contributed to data curation, including maintenance of the BXD family, DNA sequencing and data warehousing. T.A.S., D.G.A., K.H. and A.C.B. contributed to formal data analysis. T.A.S. and K.H. wrote the original draft of the manuscript. T.A.S., D.G.A., A.C.B., A.A.P., R.W.W., J.K.P. and K.H. contributed to review and editing of the manuscript.

Corresponding author

Correspondence to Kelley Harris.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Hákon Jónsson and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Cross design for BXD RIL construction.

a) BXDs derived from F2 crosses were subjected to many generations of brother-sister mating in order to generate inbred RILs. The genomes of the parents of the BXD crosses (DBA/2J and C57BL/6J) are largely derived from Mus musculus domesticus. M. m. d. is Mus musculus domesticus, M. s. is Mus spretus, M. m. m. is Mus musculus musculus, and M. m. c. is Mus musculus castaneus. b) To generate advanced intercross lines (AILs), pseudo-random pairs of F2 animals were crossed for N generations, and then subjected to many generations of brother-sister mating to generate inbred RILs.

Extended Data Fig. 2 Generation times in the BXD lines.

a) The elapsed time since the founding of each BXD line (n = 130 biologically independent animals) was calculated by subtracting its initial breeding date from 2017. The elapsed number of years was then divided by the cumulative number of generations of inbreeding undergone by the line, to obtain an estimate of the line’s generation time in years. Boxplots are centered at the median of each distribution, with lower and upper hinges corresponding to the 25th and 75th percentiles (i.e., first and third quartiles), and whiskers extending to no further than 1.5 times the interquartile range from either hinge; data points outside of the range defined by the whiskers are displayed as individual points. b) A linear model predicting the C>A singleton fraction of each line as a function of both generation time and the line’s epoch of origin was fitted to the BXD singleton mutations. C>A fraction is not significantly correlated with generation time (F = 0.055, DoF = 1, p = 0.815).
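The generation-time estimate in panel a is simple arithmetic, sketched here in Python. The function name and the example founding year and generation count are hypothetical and chosen only for illustration; the reference year 2017 is the one value taken from the caption.

```python
def generation_time_years(founding_year: int, n_generations: int,
                          reference_year: int = 2017) -> float:
    """Estimate generation time as elapsed years per generation of inbreeding."""
    if n_generations <= 0:
        raise ValueError("n_generations must be positive")
    elapsed_years = reference_year - founding_year
    return elapsed_years / n_generations


# Hypothetical line: founded in 1971 and inbred for 150 generations.
example_generation_time = generation_time_years(1971, 150)
```

Under these made-up inputs the estimate is 46 years divided by 150 generations, or roughly 0.31 years per generation.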

Extended Data Fig. 3 Singletons are enriched in highly conserved regions of the genome.

Cumulative distributions of phastCons conservation probabilities of either singletons (n = 47,659 mutations) or “fixed” variants (n = 81,186 mutations) that were randomly sampled from non-overlapping 50-kbp windows across the genome. The latter were present in a founder genome and inherited by all BXDs with the founder’s haplotype at that site. P-value of one-sided Kolmogorov-Smirnov test comparing distributions of phastCons scores is 3.8 × 10^-53. Shaded area around each line indicates the bootstrap 95% confidence interval.
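As a rough illustration of the statistic behind a one-sided two-sample Kolmogorov-Smirnov test, the sketch below computes the largest amount by which one empirical CDF lies above another. The function name and toy scores are hypothetical; the caption does not state which direction the authors tested or which software computed the P-value, so this is not a reproduction of their analysis.

```python
from bisect import bisect_right


def ks_one_sided_statistic(sample_a, sample_b):
    """Return D+ = max over x of [F_a(x) - F_b(x)] for two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_plus = 0.0
    # The maximum can only occur at an observed value, so check each one.
    for x in set(a) | set(b):
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d_plus = max(d_plus, f_a - f_b)
    return d_plus


# Toy conservation scores (made up): the second sample is shifted upward.
low_scores = [0.1, 0.2, 0.3]
high_scores = [0.4, 0.5, 0.6]
d_low_above_high = ks_one_sided_statistic(low_scores, high_scores)
d_high_above_low = ks_one_sided_statistic(high_scores, low_scores)
```

When the second sample is shifted toward higher scores, its CDF lies below the first, so the statistic is large in one direction and zero in the other.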

Extended Data Fig. 4 Results of QTL scans for other mutation rate phenotypes.

a) Using the same BXD lines and covariates as described in the Online Methods, a QTL scan was performed for the overall mutation rate of each line. The green dashed line indicates the genome-wide significance threshold using 1,000 permutations (Bonferroni-corrected alpha = 0.05/15). b) Using the same BXD lines and covariates as described in the Online Methods, QTL scans were performed for the rates and fractions of all mutation types other than C>A. Green and blue dashed lines indicate the genome-wide significance thresholds for the rate and fraction scans, respectively, using 1,000 permutations (Bonferroni-corrected alpha = 0.05/15).
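A permutation-based genome-wide threshold can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: it uses the absolute Pearson correlation between phenotype and marker genotype as the per-marker statistic instead of a LOD score, ignores covariates, and all function names and toy data are hypothetical. Only the 1,000 permutations and alpha = 0.05/15 come from the caption.

```python
import math
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(phenotype, genotypes, n_perm=1000,
                          alpha=0.05 / 15, seed=0):
    """Return the (1 - alpha) quantile of the per-permutation maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any phenotype-genotype association
        maxima.append(max(abs_corr(shuffled, marker) for marker in genotypes))
    maxima.sort()
    index = max(0, min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1))
    return maxima[index]


# Toy data: 20 lines, 5 markers coded 0 (B) or 1 (D); values are made up.
data_rng = random.Random(1)
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
genotypes = [[(j // (i + 1)) % 2 for j in range(20)] for i in range(5)]
threshold = permutation_threshold(phenotype, genotypes)
```

Taking the maximum across markers within each permutation is what makes the threshold genome-wide: an observed marker statistic above `threshold` is larger than the best marker in all but a fraction alpha of shuffled data sets.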

Extended Data Fig. 5 3-mer mutation sequence contexts enriched in DBA/2J-like Mouse Genomes Project strains.

For each mutation type defined by its 3-mer sequence context, we can compute its log-2 compositional enrichment in BXD strains with D vs. B haplotypes at the QTL on chromosome 4, as well as its log-2 compositional enrichment in Sanger Mouse Genomes Project strains that are D-like vs. B-like. These two odds ratios are correlated, indicating that the same mutational signature is enriched in the BXD D strains and the D-like Sanger MGP strains. Mutation types significantly enriched in BXDs with D haplotypes are colored red, outlined in black and labeled.
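One plausible reading of a log-2 compositional enrichment is the log-2 ratio of a mutation type's share of all mutations in one group to its share in the other. The sketch below assumes that definition; the caption also calls these quantities odds ratios, so the authors' exact formula may differ. The function name, the 3-mer label and the counts are hypothetical.

```python
from math import log2


def log2_enrichment(counts_d, counts_b):
    """Log-2 ratio of each mutation type's fraction in group D vs. group B."""
    total_d = sum(counts_d.values())
    total_b = sum(counts_b.values())
    enrichment = {}
    for mutation_type in counts_d.keys() & counts_b.keys():
        n_d, n_b = counts_d[mutation_type], counts_b[mutation_type]
        if n_d == 0 or n_b == 0:
            continue  # the ratio is undefined when either count is zero
        enrichment[mutation_type] = log2((n_d / total_d) / (n_b / total_b))
    return enrichment


# Made-up counts: the 3-mer type is twice as common, proportionally, in D.
toy_d = {"TCA>TAA": 40, "other": 60}
toy_b = {"TCA>TAA": 20, "other": 80}
toy_enrichment = log2_enrichment(toy_d, toy_b)
```

A value of 1 means the type makes up twice as large a share of mutations in the D group as in the B group; negative values mean depletion.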

Extended Data Fig. 6 Site frequency spectra of C>A mutations in the four Mus species/subspecies on chromosome 4.

The site frequency spectra of M. m. domesticus, M. m. castaneus, M. m. musculus, and M. spretus were computed using a dataset of publicly available wild mouse genomes and a polarized version of the GRCm38/mm10 reference genome. Mmd is Mus musculus domesticus, Ms is Mus spretus, Mmm is Mus musculus musculus, and Mmc is Mus musculus castaneus.
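For readers unfamiliar with the term, an unfolded site frequency spectrum simply counts how many variable sites carry the derived allele on 1, 2, ..., n - 1 of the n sampled chromosomes. The sketch below assumes derived-allele counts have already been obtained by polarizing against an ancestral reference; the function name and toy counts are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return [sites with 1 derived copy, 2 copies, ..., n - 1 copies]."""
    # Sites with 0 or n derived copies are not polymorphic in the sample.
    tally = Counter(c for c in derived_counts if 0 < c < n_chromosomes)
    return [tally.get(k, 0) for k in range(1, n_chromosomes)]


# Made-up derived-allele counts at seven sites in a sample of 4 chromosomes.
toy_sfs = site_frequency_spectrum([1, 1, 2, 3, 1, 0, 4], n_chromosomes=4)
```

The first entry of the result is the number of singletons, the class of variants examined in the next two figures.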

Extended Data Fig. 7 Comparisons of singleton spectra between wild M. m. domesticus and other wild species.

Log-2 ratios of singleton fractions of each 3-mer mutation type in Mus musculus domesticus, compared to three other wild subspecies or species of Mus. Comparisons with chi-square test of independence p-values < 0.05/96 are annotated with white circles.
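The significance rule in this caption can be illustrated with a chi-square test of independence on a 2×2 table (one mutation type versus all others, in two populations), compared against the Bonferroni threshold of 0.05/96. This is a minimal sketch with no continuity correction, which may not match the authors' implementation; the function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Chi-square statistic and P-value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


BONFERRONI_ALPHA = 0.05 / 96  # 96 3-mer mutation types, as in the caption

# Made-up counts: focal type vs. all other singletons in two populations.
toy_statistic, toy_p = chi2_2x2(30, 70, 10, 90)
toy_is_significant = toy_p < BONFERRONI_ALPHA
```

Dividing 0.05 by 96 keeps the chance of any false positive across all 96 mutation types at roughly 5%.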

Extended Data Fig. 8 Comparisons of singleton spectra between wild mouse populations in the genomic neighborhood of Mutyh.

Singleton fractions of each mutation type in each wild species or subspecies were computed in 50-kilobase pair windows in the QTL interval surrounding Mutyh (114.8 Mbp to 118.3 Mbp). The median absolute deviations of C>A fractions in the species or subspecies were: 0.00985 (Mmc), 0.0195 (Mmd), 0.0155 (Mmm), and 0.0214 (Ms).
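The median absolute deviation reported above measures how much the per-window C>A fractions vary around their median. The sketch below computes the unscaled version, with no normal-consistency factor; whether the authors applied such a factor is not stated, and the function name and toy values are hypothetical.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Made-up C>A fractions in five 50-kbp windows; one window is an outlier.
toy_fractions = [0.10, 0.12, 0.15, 0.11, 0.30]
toy_mad = median_absolute_deviation(toy_fractions)
```

Because it is based on medians, the statistic is barely affected by the single outlying window, which is why it suits noisy per-window fractions.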

Extended Data Table 1 Numbers and provenance of BXD lines analyzed in this manuscript
Extended Data Table 2 Mutyh missense mutations in the BXD family

Supplementary information

Supplementary Information

This file contains Supplementary text, tables and references.

Reporting Summary

Peer Review File

Cite this article

Sasani, T.A., Ashbrook, D.G., Beichman, A.C. et al. A natural mutator allele shapes mutation spectrum variation in mice. Nature 605, 497–502 (2022).
