Reply to: Re-evaluating evidence for adaptive mutation rate variation

Monroe, J. Grey; Murray, Kevin D.; Xian, Wenfei; Srikant, Thanvi; Carbonell-Bejerano, Pablo; Becker, Claude; Lensink, Mariele; Exposito-Alonso, Moises; Klein, Marie; Hildebrandt, Julia; Neumann, Manuela; Kliebenstein, Daniel; Weng, Mao-Lun; Imbert, Eric; Ågren, Jon; Rutter, Matthew T.; Fenster, Charles B.; Weigel, Detlef

doi:10.1038/s41586-023-06315-x

Matters Arising
Open access
Published: 26 July 2023

Nature volume 619, pages E57–E60 (2023)

The Original Article was published on 26 July 2023

replying to L. Wang et al. Nature https://doi.org/10.1038/s41586-023-06314-y (2023)

Wang and colleagues¹ argue that our report² of lower mutation rates in gene bodies, essential genes and regions marked by H3K4me1 must result from DNA sequencing errors. We appreciate the issues raised by them and by other colleagues³. Although we overlooked some sources of errors, these are insufficient to invalidate our conclusions, which are confirmed by more stringent reanalyses of our original data, new analyses restricted to high-confidence germline mutations⁴, and direct demonstration of plant DNA repair proteins being recruited to gene bodies, essential genes and H3K4me1, where they reduce local mutation rates⁵,⁶.

Wang and colleagues¹ identify issues with somatic mutation calling, suggesting that homopolymer bleed-through errors in Illumina sequencing are responsible for patterns observed in somatic mutations, and that elevated cytosine deamination in transposable elements is responsible for the patterns in germline mutations. Here we address these concerns.

Consecutive runs of identical nucleotides, or homopolymers, pose challenges to discovering rare mutations because they can lead to Illumina sequencing errors at immediately neighbouring nucleotides through homopolymer bleed-through⁷. At the same time, homopolymer regions have higher true mutation rates even at local but non-adjacent sites (for example, ref. ⁸). Wang and colleagues¹ found that the distribution of simulated homopolymer errors mirrors the overall distribution of mutations we reported around genes (their Fig. 1a). However, there are several reasons why such homopolymer errors cannot be the source of inferred mutation bias.
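To make "immediately neighbouring" concrete, the following Python sketch flags positions that sit directly next to a homopolymer run. It is an illustration only: the function names and the minimum run length of five bases are assumptions chosen for the example, not parameters taken from the analyses described here or from ref. ⁷.

```python
def homopolymer_runs(seq, min_len=5):
    """Return half-open (start, end) intervals of single-base runs of length >= min_len.

    min_len=5 is an illustrative threshold, not a value from the article.
    """
    runs = []
    i, n = 0, len(seq)
    while i < n:
        j = i + 1
        # Extend the run while the base stays the same.
        while j < n and seq[j] == seq[i]:
            j += 1
        if j - i >= min_len:
            runs.append((i, j))
        i = j
    return runs


def is_homopolymer_adjacent(seq, pos, min_len=5):
    """True if 0-based position pos directly flanks (but is not inside) a qualifying run."""
    return any(pos == start - 1 or pos == end
               for start, end in homopolymer_runs(seq, min_len))


# Hypothetical sequence: a run of six A's occupies positions 4-9.
example = "ACGTAAAAAAGCT"
print(is_homopolymer_adjacent(example, 3))   # base just before the run
print(is_homopolymer_adjacent(example, 10))  # base just after the run
print(is_homopolymer_adjacent(example, 11))  # one base further away
```

Only the two flanking bases are flagged here; a wider window, such as the five nucleotides discussed later in this reply, would be a different and more permissive definition.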

**Fig. 1: Potential homopolymer bleed-through sequencing errors cannot explain differences in mutation rate.**

Wang and colleagues¹ assume that homopolymer bleed-through errors affect sequences up to five nucleotides away from homopolymers, although these errors occur on modern Illumina platforms at positions immediately adjacent to a run of identical bases⁷. Moreover, their simulation of sequencing errors apparently assumes that 100% of sequencing errors occur as a product of homopolymer bleed-through. By contrast, empirical estimates of sequencing errors report only 0.7 to 5.2% to be the result of homopolymer bleed-through⁷. Across all data in our study, only 12.0% of total single-nucleotide variant calls (10.2% for high-quality germline calls) could be potential homopolymer-adjacent bleed-through errors, and thus on their own cannot explain the approximately 50% mutation rate reduction we observed in gene bodies relative to intergenic regions².
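The size of this gap can be illustrated with a deliberately extreme back-of-envelope bound. The Python sketch below assumes that every potential bleed-through call (12.0% of all calls) is a false positive and that all of them fall in intergenic regions, then asks how much of the observed twofold difference would remain. The equal split of the genome into genic and intergenic sequence is a hypothetical simplification made for this example, not a measured value, and the function name is likewise illustrative.

```python
def worst_case_rate_ratio(rate_genic, rate_intergenic, genic_fraction, error_fraction):
    """Genic/intergenic rate ratio after removing all suspected errors from intergenic calls.

    Rates are in arbitrary per-site units; genic_fraction is the (assumed) share
    of sites that are genic; error_fraction is the share of all calls treated as errors.
    """
    genic_calls = rate_genic * genic_fraction
    intergenic_calls = rate_intergenic * (1.0 - genic_fraction)
    errors = error_fraction * (genic_calls + intergenic_calls)
    corrected_intergenic = intergenic_calls - errors
    if corrected_intergenic <= 0:
        raise ValueError("error_fraction removes all intergenic calls")
    corrected_rate = corrected_intergenic / (1.0 - genic_fraction)
    return rate_genic / corrected_rate


# Observed ratio of about 0.5, 12% suspect calls, hypothetical 50/50 genome split.
ratio = worst_case_rate_ratio(0.5, 1.0, genic_fraction=0.5, error_fraction=0.12)
print(round(ratio, 2))  # about 0.61
```

Even under these worst-case assumptions the genic rate stays at roughly 61% of the intergenic rate, so a reduction of about 39% would remain unexplained.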

More importantly, Wang and colleagues’ own analysis¹ reports that the proportion of potential homopolymer bleed-through errors in our data is actually higher in gene bodies (exons plus introns), which should lead to gene body mutation rates being overestimated, not underestimated. We confirm across our datasets that the proportion of potential homopolymer bleed-through errors is not lower in gene bodies (Fig. 1a, left), and differs from the pattern of mutation calls (Fig. 1a, right). Similarly, the proportion of potential homopolymer bleed-through errors is not reduced in essential genes (Fig. 1b). The distribution of potential homopolymer bleed-through errors, therefore, disagrees with the hypothesis of Wang and colleagues¹. By contrast, the observed pattern is expected if gene bodies and essential genes experienced a reduction in true mutation rates, as noise introduced by sequencing errors should have a proportionally larger effect on regions with truly low mutation rates.
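The last point follows from simple arithmetic, sketched here in Python. If sequencing errors arise at the same per-site rate everywhere, they make up a larger share of calls wherever the true mutation rate is lower. The rates used are arbitrary illustrative numbers, not estimates from any dataset.

```python
def error_share(true_rate, error_rate):
    """Fraction of all calls that are errors, given per-site true and error rates."""
    return error_rate / (true_rate + error_rate)


uniform_error = 0.1  # same hypothetical error rate in both region types

# Hypothetical true rates: intergenic 1.0, gene bodies 0.5 (arbitrary units).
print(round(error_share(1.0, uniform_error), 3))  # 0.091
print(round(error_share(0.5, uniform_error), 3))  # 0.167
```

A higher proportion of potential errors in gene bodies is therefore what one would expect if their true mutation rate is lower, without any difference in the error process itself.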

Homopolymeric sequences (but not potential homopolymer bleed-through errors) are enriched outside gene bodies, as reported by Wang and colleagues¹. The observed mutation rate heterogeneity is therefore consistent with previous evidence that homopolymer-rich regions have higher true mutation rates⁸. The enrichment of homopolymers outside gene bodies is in turn the expected long-term evolutionary consequence of lower DNA repair activity there, as the expansion of homopolymers is a signature of lower mismatch repair activity (Supplementary Note 3). Moreover, both preferential repair of exons by mismatch repair and higher intronic mutation rates in somatic tissues have been widely documented (Supplementary Note 3). Likewise, considerable differences in mutation rate and spectra between somatic and germline cells are well known, with somatic cells having orders of magnitude higher mutation rates. Indeed, differences between mitotic and meiotic cells have been previously proposed for Arabidopsis thaliana by Wang and colleagues⁹ (Supplementary Note 3).

Wang and colleagues¹ further suggest that the patterns we observed in germline mutations might result largely from elevated deamination of methylated cytosines (GC-to-AT mutations) in transposable elements. Several findings are inconsistent with this hypothesis: cytosine methylation was included as a covariate in our original models, mutation accumulation experiments consistently indicate that mutation rates are lower in gene bodies relative to non-transposable element intergenic regions in A. thaliana (Fig. 2a,b; see below), and removing all GC-to-AT mutations from our original germline dataset does not alter the observed pattern, with H3K4me1 remaining the strongest epigenomic predictor of lower mutation (described in detail recently⁴). The same has been demonstrated for mutation rate variation in rice, in which mutation rates are lower in gene bodies relative to both intergenic regions and transposable elements⁶.
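The GC-to-AT exclusion mentioned above amounts to dropping C-to-T and G-to-A substitutions before re-testing the pattern. A minimal Python sketch of such a filter follows; the tuple layout `(position, ref, alt)` and the function names are assumptions for illustration and do not reproduce the scripts in the repositories listed under Code availability.

```python
# GC-to-AT transitions on either strand: C>T and G>A.
GC_TO_AT = {("C", "T"), ("G", "A")}


def is_gc_to_at(ref, alt):
    """True if a single-nucleotide change converts a G:C pair into an A:T pair."""
    return (ref.upper(), alt.upper()) in GC_TO_AT


def drop_gc_to_at(mutations):
    """Keep only mutations that are not GC-to-AT; each item is (position, ref, alt)."""
    return [m for m in mutations if not is_gc_to_at(m[1], m[2])]


# Hypothetical calls, not real data.
calls = [(101, "C", "T"), (250, "A", "G"), (377, "G", "A"), (512, "T", "A")]
print(drop_gc_to_at(calls))  # [(250, 'A', 'G'), (512, 'T', 'A')]
```

If the genic reduction persists in the filtered set, deamination of methylated cytosines cannot be its sole cause, which is the logic of the test described above.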

**Fig. 2: Joint analyses of germline mutations in several published *A. thaliana* mutation accumulation studies align with mechanistic models of mutation bias.**

To further address concerns with somatic mutation calls in general, we re-called putative somatic mutations in the original 107 lines¹⁰ by mapping reads to an improved reference genome¹¹ and applying more stringent filtering (Supplementary Note 1). This led to more complete and higher-quality read mapping (Supplementary Fig. 1) and resolved several issues described by Wang and colleagues¹ (for example, high intron-versus-exon mutation ratio and the proportion of potential homopolymer bleed-through errors; Supplementary Fig. 2). These data confirm that gene bodies experience lower mutation rates, including when manually removing potential homopolymer bleed-through errors (Supplementary Note 1). Many of the analyses by Wang and colleagues are affected by unreliable centromeric mutations, which constituted 41% of questioned somatic mutations¹. These sites, however, could not have affected our conclusions because they were excluded from all of our original analyses (Supplementary Note 2 and Supplementary Fig. 3).

Wang and colleagues¹ examined essential genes with approaches that were not in our original study. They used subsets of our initial datasets, focusing on either about 2,000 germline or about 4,000 somatic single-nucleotide variants, finding that neither dataset directly revealed a statistically significantly lower mutation rate in essential genes. This approach seems underpowered, yielding near-zero values for mutation counts in entire gene classes, an indication that the data are poorly suited for χ² approximation (Supplementary Note 5).
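Whether a contingency table is suitable for the χ² approximation can be checked from its expected counts. The Python sketch below applies the common rule of thumb that every expected cell count should be at least 5. The tables are hypothetical and the threshold is a convention, not a criterion taken from Supplementary Note 5.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi2_approximation_ok(table, min_expected=5.0):
    """True if every expected count reaches min_expected (rule-of-thumb check)."""
    return all(e >= min_expected for row in expected_counts(table) for e in row)


# Hypothetical tables: rows are gene classes, columns are (mutated, unmutated) sites.
sparse = [[2, 998], [40, 8960]]
dense = [[20, 980], [400, 8600]]
print(chi2_approximation_ok(sparse))  # False: smallest expected count is 4.2
print(chi2_approximation_ok(dense))   # True
```

When the check fails, an exact test or a larger mutation set is generally preferable to relying on the χ² statistic.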

In our study², we had instead modelled genome-wide mutation rates, and using these models, identified a connection between gene essentiality and mutation rate corresponding to epigenome differences—essential genes are enriched for H3K4me1, for example, which we found to be associated with lower mutation rates. We subsequently tested whether this expectation is met in a very large set of several hundred thousand loosely filtered putative somatic mutations with ample power to compare gene classes. We agree that somatic mutation calling is very difficult, as most real somatic mutations and unrepaired damaged sites (with DNA damage occurring 10,000 to 100,000 times per day per cell; Supplementary Note 3) are expected to be present in only one cell and thus detectable only by a single read. In Supplementary Note 4 and an accompanying Correction¹², we discuss why singletons were called as putative mutations in one of our reanalyses, from 64 leaves¹³, owing to inadvertently mapping forward reads twice. However, analyses of variant quality in these data do not support the hypothesis that our results are simply due to higher rates of poor-quality calls in non-genic regions or non-essential genes (Supplementary Note 4 and Supplementary Fig. 4).

Finally, to directly address the possibility that our conclusions reflect unknown sources of bias in inherently uncertain somatic calls, we reanalysed germline mutations from our study² along with mutation accumulation experiment data generated in several independent studies (Supplementary Table 1). This meta-analysis of >10,000 germline mutations confirmed the previously reported, nearly universal reduction in single-nucleotide mutation rates in gene bodies, essential genes and regions marked by H3K4me1 (Fig. 2a–c; ref. ⁴). The notable exception comes from plants lacking the mismatch repair protein MSH2 (Fig. 2a; ref. ⁵). A similar pattern is seen when somatic mutations were called with very stringent criteria in plants deficient for the MSH2 partner MSH6, using a tool specifically designed for rare somatic mutations¹⁴ (Fig. 2d). This was as predicted from H3K4me1 directly attracting MSH6 to gene bodies⁶, confirming that DNA repair in A. thaliana is targeted to gene bodies, as is well known in humans (Supplementary Note 3). Finally, analyses of >43,000 experimentally induced de novo germline mutations in rice (previously validated with 99% accuracy) also show that gene bodies, conserved genes, and H3K4me1-marked regions experience lower mutation rates, even when considering only silent (synonymous) mutations⁶.

Relationships between histone modifications, DNA repair, and mutation rate are widely known (Supplementary Note 3). Our work² considered the evolutionary implication of these relationships. We had leveraged models of the drift-barrier hypothesis to discover that natural selection could favour mechanisms linking DNA repair to widely distributed epigenomic features, such as H3K4me1, which is not only enriched in gene bodies and essential genes in A. thaliana, but is also the histone modification most strongly associated with lower mutation rates in our data². An important higher-order test of our conclusions is therefore whether they are mechanistically supported. Since publication of our article², it has been demonstrated that plant DNA repair proteins are recruited by H3K4me1 to gene bodies and essential genes. These repair proteins, which contain Tudor ‘reader’ domains that bind H3K4me1, include PDS5C, involved in homology-directed repair, and MSH6, which functions as a dimer with MSH2 in the mismatch repair pathway and recruits MutY of the base-excision repair pathway¹⁵. The genome-wide distribution of PDS5C, as measured by chromatin immunoprecipitation followed by sequencing⁴,⁶,¹⁶, confirms that regions subject to elevated repair protein activity coincide with features at which we detected lower spontaneous mutation rates⁴,⁶,¹⁶.

We conclude that the reported relationships between epigenomic features and mutation rates² are well supported mechanistically (Fig. 2e). We agree that there are issues and inherent uncertainties with somatic mutation calling, which make it difficult to know the accuracy of individual calls in the very large set of loosely filtered somatic variants². However, the proposal that the observed patterns result only from sequencing errors is inconsistent with multiple lines of evidence from the original study, independent analyses and emerging parallel work.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The TAIR10 A. thaliana reference genome can be found at https://arabidopsis.org/download. The more recent, improved A. thaliana reference genome can be found at https://github.com/schatzlab/Col-CEN. Sequencing reads for 107 A. thaliana mutation accumulation lines are stored in the National Center for Biotechnology Information Short Read Archive, accession number SRP133100. Additional mutation datasets were downloaded from publications cited in Supplementary Table 1.

Code availability

Code for this work uses functions maintained in https://github.com/greymonroe/polymorphology, with additional scripts and data for analyses and figures in https://github.com/greymonroe/mutation_bias_analysis2.

References

1. Wang, L., Ho, A. T., Hurst, L. D. & Yang, S. Re-evaluating evidence for adaptive mutation rate variation. Nature https://doi.org/10.1038/s41586-023-06314-y (2023).
2. Monroe, J. G. et al. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 602, 101–105 (2022).
3. Liu, H. & Zhang, J. Is the mutation rate lower in genomic regions of stronger selective constraints? Mol. Biol. Evol. 39, msac169 (2022).
4. Monroe, J. G. et al. Report of mutation biases mirroring selection in Arabidopsis thaliana unlikely to be entirely due to variant calling errors. Preprint at bioRxiv https://doi.org/10.1101/2022.08.21.504682 (2022).
5. Belfield, E. J. et al. DNA mismatch repair preferentially protects genes from mutation. Genome Res. 28, 66–74 (2018).
6. Quiroz, D. et al. The H3K4me1 histone mark recruits DNA repair to functionally constrained genomic regions in plants. Preprint at bioRxiv https://doi.org/10.1101/2022.05.28.493846 (2022).
7. Stoler, N. & Nekrutenko, A. Sequencing error profiles of Illumina sequencing instruments. NAR Genom. Bioinform. 3, lqab019 (2021).
8. Tran, H. T., Keen, J. D., Kricker, M., Resnick, M. A. & Gordenin, D. A. Hypermutability of homonucleotide runs in mismatch repair and DNA polymerase proofreading yeast mutants. Mol. Cell. Biol. 17, 2859–2865 (1997).
9. Yang, S. et al. Parent–progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523, 463–467 (2015).
10. Weng, M.-L. et al. Fine-grained analysis of spontaneous mutation spectrum and frequency in Arabidopsis thaliana. Genetics 211, 703–714 (2019).
11. Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
12. Monroe, J. G. et al. Author Correction: Mutation bias reflects natural selection in Arabidopsis thaliana. Nature https://doi.org/10.1038/s41586-023-06387-9 (2023).
13. Wang, L. et al. The architecture of intra-organism mutation rate variation in plants. PLoS Biol. 17, e3000191 (2019).
14. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
15. Gu, Y. et al. Human MutY homolog, a DNA glycosylase involved in base excision repair, physically and functionally interacts with mismatch repair proteins human MutS homolog 2/human MutS homolog 6. J. Biol. Chem. 277, 11135–11142 (2002).
16. Niu, Q. et al. A histone H3K4me1-specific binding protein is required for siRNA accumulation and DNA methylation at a subset of loci targeted by RNA-directed DNA methylation. Nat. Commun. 12, 3367 (2021).
17. Zhu, X. et al. Non-CG DNA methylation-deficiency mutations enhance mutagenesis rates during salt adaptation in cultured Arabidopsis cells. Stress Biol. 1, 12 (2021).
18. Willing, E.-M. et al. UVR2 ensures transgenerational genome stability under simulated natural UV-B in Arabidopsis thaliana. Nat. Commun. 7, 13522 (2016).
19. Ossowski, S. et al. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94 (2010).
20. Lu, Z. et al. Genome-wide DNA mutations in Arabidopsis plants after multigenerational exposure to high temperatures. Genome Biol. 22, 160 (2021).
21. Jiang, C. et al. Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res. 24, 1821–1829 (2014).
22. Belfield, E. J. et al. Thermal stress accelerates Arabidopsis thaliana mutation rate. Genome Res. 31, 40–50 (2021).

Acknowledgements

Research was conducted at the University of California, Davis, which is located on land that was the home of the Patwin people for thousands of years.

Author information

Authors and Affiliations

University of California Davis, Davis, CA, USA
J. Grey Monroe, Mariele Lensink, Marie Klein & Daniel Kliebenstein
Max Planck Institute for Biology Tübingen, Tübingen, Germany
Kevin D. Murray, Wenfei Xian, Thanvi Srikant, Pablo Carbonell-Bejerano, Claude Becker, Julia Hildebrandt, Manuela Neumann & Detlef Weigel
Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
Moises Exposito-Alonso
Department of Biology, Stanford University, Stanford, CA, USA
Moises Exposito-Alonso
Department of Biology, Westfield State University, Westfield, MA, USA
Mao-Lun Weng
ISEM, University of Montpellier, Montpellier, France
Eric Imbert
Department of Ecology and Genetics, EBC, Uppsala University, Uppsala, Sweden
Jon Ågren
Department of Biology, College of Charleston, Charleston, SC, USA
Matthew T. Rutter
Oak Lake Field Station, South Dakota State University, Brookings, SD, USA
Charles B. Fenster

Contributions

J.G.M., K.D.M., W.X., T.S., P.C.-B. and D.W. contributed to data analysis. J.G.M., K.D.M., W.X., T.S., P.C.-B. and D.W. contributed to the writing. J.G.M., K.D.M., W.X., T.S., P.C.-B., C.B., M.L., M.E.-A., M.K., J.H., M.N., D.K., M.-L.W., E.I., J.Å., M.T.R., C.B.F. and D.W. contributed to the interpretation of the results. K.D.M. and W.X., who were not part of the study by J.G.M. et al.², carried out analyses to validate the impact of an improved genome reference sequence on reducing sequencing errors.

Corresponding authors

Correspondence to J. Grey Monroe or Detlef Weigel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

This file contains Supplementary Table 1, Notes 1–5 (with Figs 1–4) and References.

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Monroe, J.G., Murray, K.D., Xian, W. et al. Reply to: Re-evaluating evidence for adaptive mutation rate variation. Nature 619, E57–E60 (2023). https://doi.org/10.1038/s41586-023-06315-x

Published: 26 July 2023
Issue Date: 27 July 2023
DOI: https://doi.org/10.1038/s41586-023-06315-x
