  • Analysis

Revisiting mutagenesis at non-B DNA motifs in the human genome

Abstract

Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.

Fig. 1: Defining repeat motifs and their flanking positions.
Fig. 2: Mutagenesis flanking non-B motifs.
Fig. 3: Polymerase slippage generates mutations, indels and sequencing errors within STRs.
Fig. 4: Duplications at direct repeat motifs.
Fig. 5: G4 motifs are prone to recurrent sequencing errors.

Data availability

The datasets analyzed during this study are freely available from the gnomAD Consortium (https://gnomad.broadinstitute.org/downloads), the UCSC Genome Browser (https://genome.ucsc.edu), the Non-B Database (https://nonb-abcc.ncifcrf.gov/apps/nBMST/default/) and other studies as cited. Instructions for accessing specific datasets are detailed in the code repository (see Code availability).

Code availability

The code to perform the analysis in this study is available in a GitHub repository (https://github.com/ryanmcggg/nonb_motifs). Software packages used, with their version numbers, are listed in the GitHub repository.

Acknowledgements

We thank V. Seplyarskiy and E. Koch for their important contributions. This project was funded by National Institutes of Health grants R35-GM127131, R01-MH101244, U01-HG012009 and R01-HG010372 (awarded to S.S.).

Author information

Contributions

R.M. conceived the study, performed the analysis and wrote the manuscript. S.S. supervised the project, discussed results and consulted on the manuscript.

Corresponding author

Correspondence to S. R. Sunyaev.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Nature Structural & Molecular Biology thanks Martin Taylor and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Carolina Perdigoto and Dimitris Typas were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Extended data

Extended Data Fig. 1 Caveats of the ‘Non-B Database’.

a, Overlapping motifs in Non-B DB. X-axis indicates motif category. Y-axis indicates the proportion of motifs that do not overlap motifs of other categories ('unique') and the proportion that overlap additional categories (indicated by color and abbreviation). b, Overlapping flanking regions in Non-B DB. For each motif in Non-B DB, the distance to the nearest repeat on either side (including transposable elements from RepeatMasker where indicated) was measured, and the smaller of the upstream and downstream values was taken. Overlapping motifs have a distance of 0. X-axis represents the percentile of the distance distribution, and Y-axis indicates the distance in nucleotides.
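
The panel b distance metric can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that motifs (and, optionally, RepeatMasker elements) from one chromosome have already been pooled into a single list of half-open [start, end) intervals, and the function name `nearest_repeat_distance` is hypothetical. Because intervals can be nested, the upstream neighbor is taken as the furthest-reaching end among all earlier-starting intervals rather than simply the previous interval in sorted order.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome (assumption)


def nearest_repeat_distance(intervals: List[Interval]) -> List[Optional[int]]:
    """Distance (nt) from each interval to its closest neighbor on either side.

    The smaller of the upstream and downstream gaps is reported; overlapping or
    touching intervals get 0. An interval with no neighbor at all gets None.
    Results are returned in the same order as the input list.
    """
    n = len(intervals)
    order = sorted(range(n), key=lambda i: intervals[i])
    result: List[Optional[int]] = [None] * n
    max_end_so_far: Optional[int] = None  # largest end among earlier-starting intervals

    for rank, i in enumerate(order):
        start, end = intervals[i]
        candidates = []
        if max_end_so_far is not None:
            # Upstream: the earlier-starting interval reaching furthest right is closest.
            candidates.append(max(0, start - max_end_so_far))
        if rank + 1 < n:
            # Downstream: the next interval in start order begins closest to this one.
            next_start = intervals[order[rank + 1]][0]
            candidates.append(max(0, next_start - end))
        if candidates:
            result[i] = min(candidates)
        max_end_so_far = end if max_end_so_far is None else max(max_end_so_far, end)
    return result


if __name__ == "__main__":
    # A long motif containing a short one, plus an isolated motif 50 nt away.
    print(nearest_repeat_distance([(0, 100), (10, 20), (150, 160)]))  # [0, 0, 50]
```

Sorting once and sweeping keeps this O(n log n), which matters at genome scale; percentiles of the returned distances would then give the curve plotted in panel b.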

Extended Data Fig. 2 Mutagenesis flanking non-B motifs.

X-axes are coordinates relative to the central motif (position 0 encompasses the entire repeat). Y-axes are relative mutation frequency (compared with the gnomAD average and normalized by trinucleotide mutation type). Data are presented as mean values, with 95% binomial confidence intervals shown as semi-transparent bands. Confidence intervals were derived from n = 76,156 individuals and the dynamic loci count. Blue lines: no sequencing quality filters; green and yellow lines: increasingly stringent filters ('pass' indicates gnomAD's passing quality filter, based on a VQSLOD score of −2.774). Left: STR motifs (each combined with its reverse complement). Right: other non-B motifs. For STRs and symmetrical motifs, the shortest 80% by motif length are excluded. G4 motifs (strand-specific) were detected under K+ conditions. Z-DNA motifs were detected using the standard definition. See Supplementary Fig. 2a–c and Methods for more details.
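
To make the axes concrete, the sketch below shows one way to compute motif-relative coordinates (the whole repeat collapsed to position 0) and a trinucleotide-normalized relative mutation frequency with a 95% binomial interval. It is an illustration under stated assumptions, not the published pipeline: per-site mutation calls and a table of genome-wide trinucleotide mutation rates are assumed to be available already, each site is treated as one binomial trial (mutated or not), and a Wilson score interval is used because the caption does not say which binomial interval was applied. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def relative_coordinate(pos: int, motif_start: int, motif_end: int) -> int:
    """Collapse a half-open motif [start, end) to 0; flanks count outward from it."""
    if pos < motif_start:
        return pos - motif_start      # -1 is the base immediately upstream
    if pos >= motif_end:
        return pos - motif_end + 1    # +1 is the base immediately downstream
    return 0


def wilson_interval(k: int, n: int, z: float = Z95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (assumed interval type)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def relative_frequency_by_bin(
    sites: Iterable[Tuple[int, str, bool]],
    context_rate: Dict[str, float],
) -> Dict[int, Tuple[float, float, float]]:
    """sites: (relative coordinate, trinucleotide context, mutated?) for each site.
    context_rate: hypothetical genome-wide mutation rate per trinucleotide context.
    Returns {coordinate: (observed / expected, lower 95%, upper 95%)}."""
    observed: Dict[int, int] = defaultdict(int)
    expected: Dict[int, float] = defaultdict(float)
    total: Dict[int, int] = defaultdict(int)
    for coord, context, mutated in sites:
        observed[coord] += int(mutated)
        expected[coord] += context_rate[context]  # context-matched expectation
        total[coord] += 1

    out: Dict[int, Tuple[float, float, float]] = {}
    for coord in total:
        n, k, e = total[coord], observed[coord], expected[coord]
        lo, hi = wilson_interval(k, n)
        scale = n / e  # rescales a per-site proportion into observed/expected units
        out[coord] = (k / e, lo * scale, hi * scale)
    return out
```

A value of 1.0 at a coordinate means the mutation frequency there matches what the trinucleotide composition alone would predict; the VQSLOD-based filtering shown by the colored lines would be applied to the per-site calls before this step.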

Extended Data Fig. 3 Absolute mutation frequency contributing to STR interruption reversions.

Mutation frequencies within STRs, shown as absolute frequencies (mutations per base). Frequency of interruption-perfecting SNVs (left), insertions within imperfect motifs (middle) and deletions of imperfections (right). X-axes are motif lengths. Y-axes are absolute mutation frequency (mutations per base). Data are presented as mean values, with error bars indicating 95% binomial confidence intervals. Confidence intervals were derived from n = 76,156 individuals and the dynamic loci count. Blue: no sequencing quality filters; green and yellow: increasingly stringent quality filters. The motif sequence (combined with its reverse complement) is indicated at the left of each row.
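
An "interruption-perfecting" SNV, as used in the left column, can be illustrated with a short sketch: an imperfect STR is aligned to a perfect tandem array of its repeat unit, positions that disagree are interruptions, and an SNV at such a position whose alternate allele restores the expected unit base is counted as perfecting. This is a hypothetical helper for illustration only; it assumes the repeat unit is known and chooses the unit's phase by maximum agreement, which may differ from how the study assigned motifs.

```python
from typing import List


def best_phase(seq: str, unit: str) -> int:
    """Offset of the repeat unit that agrees with the most bases of seq."""
    k = len(unit)

    def matches(phase: int) -> int:
        return sum(seq[i] == unit[(i + phase) % k] for i in range(len(seq)))

    return max(range(k), key=matches)


def interruptions(seq: str, unit: str) -> List[int]:
    """0-based positions where seq departs from a perfect tandem array of unit."""
    k, phase = len(unit), best_phase(seq, unit)
    return [i for i, base in enumerate(seq) if base != unit[(i + phase) % k]]


def is_interruption_perfecting(seq: str, unit: str, pos: int, alt: str) -> bool:
    """True if the SNV (pos, alt) turns an interruption back into the repeat unit."""
    k, phase = len(unit), best_phase(seq, unit)
    expected = unit[(pos + phase) % k]
    return seq[pos] != expected and alt == expected


if __name__ == "__main__":
    # (CA)n array with one T interruption at position 5.
    seq = "CACACTCACA"
    print(interruptions(seq, "CA"))                       # [5]
    print(is_interruption_perfecting(seq, "CA", 5, "A"))  # True
```

Absolute frequencies in this panel would then be the number of such SNVs divided by the number of bases examined.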

Extended Data Fig. 4 Duplications within direct repeats.

a, Examples of duplications within direct repeats. Reference sequence shown. Blue text indicates the location of repeat motifs. Blue highlight shows the location of motif mismatches. Orange highlight indicates the region duplicated in gnomAD. Orange text with blue highlight indicates that the duplication includes the interruption. b, Duplication position bias. Duplication start/end positions (blue) or positions flanking duplications (orange), categorized by their location (e.g., start of left motif, mismatch position in left motif, end of spacer). The frequency of duplication at each position (observed) is divided by the proportion of the motif that the position accounts for (expected). Data represent n = 3,170 DR loci with spacer length <10 nt containing a duplication >5 nt. c, Gap repair model explaining direct repeat duplications. The initial A-B-A pattern forms a slipped-strand structure. Post-replicative gap filling (Polζ) fills in the single-stranded loop regions, resulting in 4-way pseudo-Holliday junctions. Cleavage of the top strand allows the filled-in sequence to be ligated into the top strand. This can either be repeated for the other loop, or replication of the top strand will produce a daughter cell with the A-B-A-B-A pattern.
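
Two pieces of this caption lend themselves to a small sketch: the sequence outcome of the panel c model (A-B-A becoming A-B-A-B-A, with A the repeated motif and B the spacer) and the observed/expected ratio of panel b. The function names are hypothetical, and the ratio reflects one reading of the caption: the share of duplication breakpoints falling in a position category, divided by the share of motif positions that category represents.

```python
from typing import Dict


def gap_repair_duplication(a: str, b: str) -> str:
    """Outcome predicted by the gap repair model: A-B-A becomes A-B-A-B-A."""
    return a + b + a + b + a


def position_bias(observed_counts: Dict[str, int],
                  category_lengths: Dict[str, int]) -> Dict[str, float]:
    """Observed share of breakpoints per category divided by the share of motif
    positions the category covers (expected). Values above 1 mean enrichment."""
    total_obs = sum(observed_counts.values())
    total_len = sum(category_lengths.values())
    return {
        cat: (observed_counts.get(cat, 0) / total_obs) / (length / total_len)
        for cat, length in category_lengths.items()
    }


if __name__ == "__main__":
    print(gap_repair_duplication("GCTGA", "TT"))  # GCTGATTGCTGATTGCTGA
    # Hypothetical counts: a 1-nt mismatch category holding 30% of breakpoints.
    print(position_bias({"mismatch": 30, "other": 70}, {"mismatch": 1, "other": 9}))
```

In the toy counts above, a single mismatch position that attracts 30% of breakpoints while covering 10% of the motif yields a ratio of about 3, the kind of enrichment panel b is designed to reveal.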

Extended Data Fig. 5 G4 motifs are prone to recurrent sequencing errors.

a, Insertions and b, deletions within G4 motifs. X-axis indicates positions within G-runs or within spacers between G-runs; spacers 1 nt in length are categorized separately. Y-axis is relative mutation frequency. Data are presented as mean values, with error bars indicating 95% binomial confidence intervals. Confidence intervals were derived from n = 76,156 individuals and the dynamic loci count. Blue: no sequencing quality filters; green and yellow: increasingly stringent filters. Arrows indicate the magnitude and direction of change in individual trinucleotide mutation frequencies before and after sequencing quality filters. Mutations with large changes in magnitude are highlighted in text.
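
The x-axis categories (positions within G-runs, positions within spacers, and 1-nt spacers as a separate class) can be assigned with a short sketch like the one below. It assumes the motif is given on its G-rich strand and that a G-run is any stretch of three or more consecutive Gs; that regular expression is an assumption rather than the Non-B DB definition, and it treats a long G-tract such as GGGGGG as a single run. The function name is hypothetical.

```python
import re
from typing import List, Tuple

G_RUN = re.compile(r"G{3,}")  # assumption: a G-run is >= 3 consecutive Gs


def label_g4_positions(motif: str) -> List[Tuple[str, int]]:
    """Label each base of a G4 motif as ('G-run', offset within the run),
    ('spacer', offset within the spacer) or ('spacer-1nt', 0) for 1-nt loops.
    Bases before the first or after the last G-run are labeled ('flank', 0)."""
    labels: List[Tuple[str, int]] = [("flank", 0)] * len(motif)
    runs = [m.span() for m in G_RUN.finditer(motif)]

    for start, end in runs:
        for i in range(start, end):
            labels[i] = ("G-run", i - start)

    # Spacers are the gaps between consecutive G-runs.
    for (_, prev_end), (next_start, _) in zip(runs, runs[1:]):
        kind = "spacer-1nt" if next_start - prev_end == 1 else "spacer"
        for i in range(prev_end, next_start):
            labels[i] = (kind, i - prev_end)
    return labels


if __name__ == "__main__":
    for base, label in zip("GGGTGGGAAGGG", label_g4_positions("GGGTGGGAAGGG")):
        print(base, label)
```

Per-category insertion and deletion counts would then be tallied over these labels, before and after sequencing quality filters, to produce the two panels.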

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

McGinty, R.J., Sunyaev, S.R. Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 30, 417–424 (2023). https://doi.org/10.1038/s41594-023-00936-6

  • DOI: https://doi.org/10.1038/s41594-023-00936-6