Although germline mutation rates and spectra can vary within and between species, common genetic modifiers of the mutation rate have not been identified in nature¹. Here we searched for loci that influence germline mutagenesis using a uniquely powerful resource: a panel of recombinant inbred mouse lines known as the BXD, descended from the laboratory strains C57BL/6J (B haplotype) and DBA/2J (D haplotype). Each BXD lineage has been maintained by brother–sister mating in the near absence of natural selection, accumulating de novo mutations for up to 50 years on a known genetic background that is a unique linear mosaic of B and D haplotypes². We show that mice inheriting D haplotypes at a quantitative trait locus on chromosome 4 accumulate C>A germline mutations at a 50% higher rate than those inheriting B haplotypes, primarily owing to the activity of a C>A-dominated mutational signature known as SBS18. The B and D quantitative trait locus haplotypes encode different alleles of Mutyh, a DNA repair gene that underlies the heritable cancer predisposition syndrome that causes colorectal tumors with a high SBS18 mutation load³,⁴. Both B and D Mutyh alleles are present in wild populations of Mus musculus domesticus, providing evidence that common genetic variation modulates germline mutagenesis in a model mammalian species.
BXD mutations and other data files necessary to reproduce the manuscript are available at https://github.com/tomsasani/bxd_mutator_manuscript (archived at Zenodo, https://doi.org/10.5281/zenodo.5941048). A VCF file containing variant calls from the sequenced BXDs is available in the European Nucleotide Archive with project accession PRJEB45429. The germline mutation calls from TOY-KO triple knockout mice²⁴ are available as supplementary data file 1 from https://doi.org/10.1038/srep04689. The SBS18 COSMIC mutation signature data are available at the COSMIC web page: https://cancer.sanger.ac.uk/cosmic/signatures/SBS/SBS18.tt. The strain-private mutation data from Dumont¹⁷ are available as supplementary data at https://doi.org/10.1093/molbev/msz026. The wild mouse data from Harr et al.³⁵ are available at https://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/, as described in the manuscript at https://doi.org/10.1038/sdata.2016.75. The mm10/GRCm38 reference genome used for these analyses is version GCA_000001635.2, and can be obtained at https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz.
All code used for data analysis and figure generation is deposited at https://github.com/tomsasani/bxd_mutator_manuscript (archived at Zenodo, https://doi.org/10.5281/zenodo.5941048).
Lynch, M. et al. Genetic drift, selection and the evolution of the mutation rate. Nat. Rev. Genet. 17, 704–714 (2016).
Ashbrook, D. G. et al. A platform for experimental precision medicine: the extended BXD mouse family. Cell Syst. 12, 235–247.e9 (2021).
Viel, A. et al. A specific mutational signature associated with DNA 8-oxoguanine persistence in MUTYH-defective colorectal cancer. eBioMedicine 20, 39–49 (2017).
Pilati, C. et al. Mutational signature analysis identifies MUTYH deficiency in colorectal cancers and adrenocortical carcinomas. J. Pathol. 242, 10–15 (2017).
Sniegowski, P. D., Gerrish, P. J. & Lenski, R. E. Evolution of high mutation rates in experimental populations of E. coli. Nature 387, 703–705 (1997).
Dawson, K. J. Evolutionarily stable mutation rates. J. Theor. Biol. 194, 143–157 (1998).
Sasani, T. A. et al. Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation. eLife 8, e46922 (2019).
Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
Kessler, M. D. et al. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc. Natl Acad. Sci. USA 117, 2560–2569 (2020).
Robinson, P. S. et al. Increased somatic mutation burdens in normal human cells due to defective DNA polymerases. Nat. Genet. 53, 1434–1442 (2021).
Harris, K. Evidence for recent, population-specific evolution of the human mutation rate. Proc. Natl Acad. Sci. USA 112, 3439–3444 (2015).
Harris, K. & Pritchard, J. K. Rapid evolution of the human mutation spectrum. eLife 6, e24284 (2017).
Mathieson, I. & Reich, D. Differences in the rare variant spectrum among human populations. PLoS Genet. 13, e1006581 (2017).
Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).
Ségurel, L., Wyman, M. J. & Przeworski, M. Determinants of mutation rate variation in the human germline. Annu. Rev. Genomics Hum. Genet. 15, 47–70 (2014).
Halligan, D. L. & Keightley, P. D. Spontaneous mutation accumulation studies in evolutionary genetics. Annu. Rev. Ecol. Evol. Syst. 40, 151–172 (2009).
Dumont, B. L. Significant strain variation in the mutation spectra of inbred laboratory mice. Mol. Biol. Evol. 36, 865–874 (2019).
Lindsay, S. J., Rahbari, R., Kaplanis, J., Keane, T. & Hurles, M. E. Similarities and differences in patterns of germline mutation between mice and humans. Nat. Commun. 10, 4053 (2019).
Broman, K. W. et al. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics 211, 495–502 (2019).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
David, S. S., O’Shea, V. L. & Kundu, S. Base-excision repair of oxidative DNA damage. Nature 447, 941–950 (2007).
Ohno, M. et al. 8-oxoguanine causes spontaneous de novo germline mutations in mice. Sci. Rep. 4, 4689 (2014).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Georgeson, P. et al. Evaluating the utility of tumour mutational signatures for identifying hereditary colorectal cancer and polyposis syndrome carriers. Gut 70, 2138–2149 (2021).
Islam, S. M. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Preprint at https://doi.org/10.1101/2020.12.13.422570 (2021).
Mulligan, M. K., Mozhui, K., Prins, P. & Williams, R. W. GeneNetwork: a toolbox for systems genetics. Methods Mol. Biol. 1488, 75–120 (2017).
Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
Segovia, R., Shen, Y., Lujan, S. A., Jones, S. J. M. & Stirling, P. C. Hypermutation signature reveals a slippage and realignment model of translesion synthesis by Rev3 polymerase in cisplatin-treated yeast. Proc. Natl Acad. Sci. USA 114, 2663–2668 (2017).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Choi, Y. & Chan, A. P. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747 (2015).
Ng, P. C. & Henikoff, S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).
Yang, H. et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat. Genet. 43, 648–655 (2011).
Harr, B. et al. Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus. Sci. Data 3, 160075 (2016).
Huber, C. D., Kim, B. Y., Marsden, C. D. & Lohmueller, K. E. Determining the factors driving selective effects of new nonsynonymous mutations. Proc. Natl Acad. Sci. USA 114, 4465–4470 (2017).
Geraldes, A. et al. Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol. Ecol. 17, 5349–5363 (2008).
Phifer-Rixey, M. et al. Adaptive evolution and effective population size in wild house mice. Mol. Biol. Evol. 29, 2949–2955 (2012).
Gou, L., Bloom, J. S. & Kruglyak, L. The genetic basis of mutation rate variation in yeast. Genetics 211, 731–740 (2019).
Jiang, P. et al. A modified fluctuation assay reveals a natural mutator phenotype that drives mutation spectrum variation within Saccharomyces cerevisiae. eLife 10, e68285 (2021).
Robinson, P. S. et al. Inherited MUTYH mutations cause elevated somatic mutation rates and distinctive mutational signatures in normal human cells. Preprint at https://doi.org/10.1101/2021.10.20.465093 (2021).
Goldberg, M. E. & Harris, K. Mutational signatures of replication timing and epigenetic modification persist through the global divergence of mutation spectra across the great ape phylogeny. Genome Biol. Evol. 14, evab104 (2021).
Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Wang, X. et al. High-throughput sequencing of the DBA/2J mouse genome. BMC Bioinf. 11, O7 (2010).
Pedersen, B. S. & Quinlan, A. R. cyvcf2: fast, flexible variant analysis with Python. Bioinformatics 33, 1867–1869 (2017).
DeWitt, W. S. mutyper: assigning and summarizing mutation types for analyzing germline mutation spectra. Preprint at https://doi.org/10.1101/2020.07.01.183392 (2020).
Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).
Papadopoulos, J. S. & Agarwala, R. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23, 1073–1079 (2007).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
We thank B. Dumont, M. Gymrek, M. Mortazavi, A. Clark, D. Rothschild, U. Arora, M. Maksimov, H. Chen and J. Sebat for contributing helpful feedback as part of the BXD Genome Sequencing Consortium; M. Przeworski for providing comments on a manuscript draft; Y.-Y. Ren for assistance with preliminary genotype calling; members of the Harris and Pritchard laboratories for additional helpful discussions; the staff of HudsonAlpha (S. Levy and team) for DNA library preparation and sequencing, and for their support with data transfer; staff at the UT ISAAC facility for storage and processing of all sequence-associated data files; A. Centeno for assisting with the upload of phenotype data to GeneNetwork; and C. Lutz, A. Valenzuela and J. Ingels at UTHSC for DNA sample acquisition, handling and assistance. We acknowledge support from NIH T32 Postdoctoral Training Grant 5T32HG000035-25 (to T.A.S.), a University of Tennessee Center for Integrative and Translational Genomics grant (to R.W.W., D.G.A. and L.L.), NIH Biological Mechanisms of Healthy Aging Training Grant T32AG066574 (to A.C.B.), NIH NIDA grant P50DA037844 (to A.A.P.), NIH NIDA grant U01DA051234 (to A.A.P.), NIH NIGMS grant 1R01GM123489 (to R.W.W.), NIH NIDA grant P30 DA044223 (to R.W.W.), NIH R01 HG008140 (to J.K.P.), NIH NIGMS grant 1R35GM133428-01 (to K.H.), a Searle Scholarship (to K.H.), a Sloan Research Fellowship (to K.H.), a Burroughs Wellcome Fund Career Award at the Scientific Interface (to K.H.) and a Pew Biomedical Scholarship (to K.H.).
The authors declare no competing interests.
Peer review information
Nature thanks Hákon Jónsson and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a) BXDs derived from F2 crosses were subjected to many generations of brother-sister mating in order to generate inbred RILs. The genomes of the parents of the BXD crosses (DBA/2J and C57BL/6J) are largely derived from Mus musculus domesticus. M. m. d. is Mus musculus domesticus, M. s. is Mus spretus, M. m. m. is Mus musculus musculus, and M. m. c. is Mus musculus castaneus. b) To generate advanced intercross lines (AILs), pseudo-random pairs of F2 animals were crossed for N generations, and then subjected to many generations of brother-sister mating to generate inbred RILs.
a) The elapsed time since the founding of each BXD line (n = 130 biologically independent animals) was calculated by subtracting its initial breeding date from 2017. The elapsed number of years was then divided by the cumulative number of generations of inbreeding undergone by the line, to obtain an estimate of the line’s generation time in years. Boxplots are centered at the median of each distribution, with lower and upper hinges corresponding to the 25th to 75th percentiles (i.e., first and third quartiles), and whiskers extending to no further than 1.5 times the interquartile range from either hinge; data points outside of the range defined by the whiskers are displayed as individual points. b) A linear model predicting the C>A singleton fraction of each line as a function of both generation time and the line’s epoch of origin was trained using the BXD singleton mutations. C>A fraction is not significantly correlated with generation time (F = 0.055, DoF = 1, p = 0.815).
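The generation-time estimate described in panel a can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name, its arguments and the example line (founded in 1975, 140 generations of inbreeding) are hypothetical, and only the reference year 2017 comes from the caption.

```python
def generation_time_years(founding_year, generations_inbred, reference_year=2017):
    """Elapsed years since founding divided by generations of inbreeding."""
    if generations_inbred <= 0:
        raise ValueError("generations_inbred must be positive")
    elapsed_years = reference_year - founding_year
    return elapsed_years / generations_inbred


# Hypothetical line: 42 elapsed years over 140 generations of inbreeding.
example_generation_time = generation_time_years(1975, 140)
print(example_generation_time)  # 0.3
```

An older line with fewer generations of inbreeding yields a longer estimated generation time, which is the quantity tested against the C>A singleton fraction in panel b.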
Cumulative distributions of phastCons conservation probabilities of either singletons (n = 47,659 mutations) or “fixed” variants (n = 81,186 mutations) that were randomly sampled from non-overlapping 50-kbp windows across the genome. The latter were present in a founder genome and inherited by all BXDs with the founder’s haplotype at that site. P-value of one-sided Kolmogorov-Smirnov test comparing distributions of phastCons scores is 3.8 × 10⁻⁵³. Shaded area around each line indicates the bootstrap 95% confidence interval.
a) Using the same BXD lines and covariates as described in the Online Methods, a QTL scan was performed for the overall mutation rate of each line. The green dashed line indicates the genome-wide significance threshold using 1,000 permutations (Bonferroni-corrected alpha = 0.05/15). b) Using the same BXD lines and covariates as described in the Online Methods, QTL scans were performed for the rates and fractions of all mutation types other than C>A. Green and blue dashed lines indicate the genome-wide significance thresholds for the rate and fraction scans, respectively, using 1,000 permutations (Bonferroni-corrected alpha = 0.05/15).
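The idea of a permutation-based genome-wide threshold can be illustrated with a toy scan. The sketch below is not the R/qtl2 procedure used in the study: it assumes a simplified scan statistic (absolute difference in mean phenotype between D and B carriers, with no covariates or LOD scores), and the function names and simulated data are hypothetical. Only the 1,000 permutations and the alpha of 0.05/15 are taken from the caption.

```python
import math
import random


def scan_statistic(phenotypes, marker):
    """Absolute difference in mean phenotype between D (1) and B (0) carriers."""
    d = [p for p, g in zip(phenotypes, marker) if g == 1]
    b = [p for p, g in zip(phenotypes, marker) if g == 0]
    if not d or not b:
        return 0.0
    return abs(sum(d) / len(d) - sum(b) / len(b))


def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05 / 15, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any genotype-phenotype association.
        rng.shuffle(shuffled)
        null_maxima.append(max(scan_statistic(shuffled, m) for m in markers))
    null_maxima.sort()
    # Zero-based index of the (1 - alpha) quantile, clamped to the last element.
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_maxima[idx]


# Simulated toy data: 40 lines, 25 markers, no true association.
toy_rng = random.Random(1)
phenotypes = [toy_rng.gauss(0, 1) for _ in range(40)]
markers = [[toy_rng.randint(0, 1) for _ in range(40)] for _ in range(25)]
threshold = permutation_threshold(phenotypes, markers)
print(threshold)
```

Taking the maximum across markers in each permutation is what makes the threshold genome-wide; dividing alpha by the number of scans then accounts for testing several phenotypes.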
Extended Data Fig. 5 3-mer mutation sequence contexts enriched in DBA/2J-like Mouse Genomes Project strains.
For each mutation type defined by its 3-mer sequence context, we can compute its log-2 compositional enrichment in BXD strains with D vs. B haplotypes at the QTL on chromosome 4, as well as its log-2 compositional enrichment in Sanger Mouse Genomes Project strains that are D-like vs. B-like. These two odds ratios are correlated, indicating that the same mutational signature is enriched in the BXD D strains and the D-like Sanger MGP strains. Mutation types significantly enriched in BXDs with D haplotypes are colored red, outlined in black and labeled.
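One way to read the enrichment described above is as a log-2 odds ratio per mutation type. The sketch below assumes that definition (odds of a mutation being of a given type in D lines, divided by the same odds in B lines); the exact formula used in the study may differ, and the function name and counts are hypothetical.

```python
import math


def log2_odds_enrichment(counts_d, counts_b):
    """Log-2 odds ratio of each mutation type in D versus B haplotype lines."""
    total_d = sum(counts_d.values())
    total_b = sum(counts_b.values())
    enrichment = {}
    for mut_type, n_d in counts_d.items():
        n_b = counts_b[mut_type]
        other_d = total_d - n_d
        other_b = total_b - n_b
        if min(n_d, n_b, other_d, other_b) == 0:
            raise ValueError(f"zero count makes the odds ratio undefined: {mut_type}")
        odds_ratio = (n_d / other_d) / (n_b / other_b)
        enrichment[mut_type] = math.log2(odds_ratio)
    return enrichment


# Hypothetical counts for three 3-mer mutation types.
counts_d = {"TCA>TAA": 30, "TCT>TAT": 20, "ACG>ATG": 50}
counts_b = {"TCA>TAA": 15, "TCT>TAT": 15, "ACG>ATG": 70}
enrichment = log2_odds_enrichment(counts_d, counts_b)
for mut_type, value in enrichment.items():
    print(mut_type, round(value, 3))
```

Positive values mark types that are over-represented in D lines; computing the same quantity in a second panel of strains and correlating the two sets of values gives the comparison shown in the figure.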
Extended Data Fig. 6 Site frequency spectra of C>A mutations in the four Mus species/subspecies on chromosome 4.
The site frequency spectra of M. m. domesticus, M. m. castaneus, M. m. musculus, and M. spretus were computed using a dataset of publicly available wild mouse genomes and a polarized version of the GRCm38/mm10 reference genome. Mmd is Mus musculus domesticus, Ms is Mus spretus, Mmm is Mus musculus musculus, and Mmc is Mus musculus castaneus.
Extended Data Fig. 7 Comparisons of singleton spectra between wild M. m. domesticus and other wild species.
Log-2 ratios of singleton fractions of each 3-mer mutation type in Mus musculus domesticus, compared to three other wild subspecies or species of Mus. Comparisons with Chi-square test of independence p-values < 0.05/96 are annotated with white circles.
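The per-type test with a Bonferroni threshold of 0.05/96 can be sketched as a 2 × 2 chi-square test of independence. This is an illustrative implementation using only the standard library, assuming no continuity correction; the function name and counts are hypothetical. For one degree of freedom, the chi-square survival function equals erfc(sqrt(x/2)).

```python
import math


def chi2_independence_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 d.f., no continuity correction).

    Table layout: [[a, b], [c, d]], e.g. rows are populations and columns are
    'singletons of this 3-mer type' versus 'all other singletons'.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    n = row1 + row2
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for one mutation type in two populations.
chi2, p_value = chi2_independence_2x2(200, 800, 100, 900)
bonferroni_alpha = 0.05 / 96  # 96 3-mer mutation types tested
print(chi2, p_value, p_value < bonferroni_alpha)
```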
Extended Data Fig. 8 Comparisons of singleton spectra between wild mouse populations in the genomic neighborhood of Mutyh.
Singleton fractions of each mutation type in each wild species or subspecies were computed in 50-kilobase pair windows in the QTL interval surrounding Mutyh (114.8 Mbp to 118.3 Mbp). The median absolute deviations of C>A fractions in the species or subspecies were: 0.00985 (Mmc), 0.0195 (Mmd), 0.0155 (Mmm), and 0.0214 (Ms).
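The median absolute deviation reported above can be computed as follows. This sketch assumes the unscaled definition (the median of absolute deviations from the median, without the 1.4826 normal-consistency constant); the function name and the per-window C>A fractions are hypothetical.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    if not values:
        raise ValueError("values must not be empty")
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical C>A singleton fractions in five 50-kbp windows.
window_fractions = [0.20, 0.22, 0.25, 0.21, 0.30]
mad = median_absolute_deviation(window_fractions)
print(mad)
```

Because it is based on medians, this measure of spread is not dominated by a single window with an unusually high or low C>A fraction.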
About this article
Cite this article
Sasani, T.A., Ashbrook, D.G., Beichman, A.C. et al. A natural mutator allele shapes mutation spectrum variation in mice. Nature 605, 497–502 (2022). https://doi.org/10.1038/s41586-022-04701-5