Clustering of mutations has been observed in cancer genomes as well as for germline de novo mutations (DNMs). We identified 1,796 clustered DNMs (cDNMs) within whole-genome-sequencing data from 1,291 parent–offspring trios to investigate their patterns and infer a mutational mechanism. We found that the number of clusters on the maternal allele was positively correlated with maternal age and that these clusters consisted of more individual mutations with larger intermutational distances than those of paternal clusters. More than 50% of maternal clusters were located on chromosomes 8, 9 and 16, in previously identified regions with accelerated maternal mutation rates. Maternal clusters in these regions showed a distinct mutation signature characterized by C>G transversions. Finally, we found that maternal clusters were associated with processes involving double-strand breaks (DSBs), such as meiotic gene conversions and de novo deletion events. This result suggested accumulation of DSB-induced mutations throughout oocyte aging as the mechanism underlying the formation of maternal mutation clusters.
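As an illustration of what "clustered" means operationally, the sketch below groups the DNMs of a single proband into clusters whenever neighbouring mutations on the same chromosome lie within a distance threshold. The function name `cluster_dnms`, the input layout and the `max_distance` default are assumptions made for this example; the threshold is a placeholder and is not a value stated in this text.

```python
from collections import defaultdict


def cluster_dnms(dnms, max_distance=20_000):
    """Group the DNMs of one proband into clusters.

    dnms: iterable of (chromosome, position) tuples.
    max_distance: hypothetical threshold in base pairs (an assumption,
        not taken from the text above).
    Returns a list of (chromosome, [positions]) with >= 2 mutations each.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in dnms:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_distance:
                # Close enough to the previous mutation: extend the cluster.
                current.append(pos)
            else:
                # Gap too large: keep the finished group only if it is a cluster.
                if len(current) > 1:
                    clusters.append((chrom, current))
                current = [pos]
        if len(current) > 1:
            clusters.append((chrom, current))
    return clusters


# Toy input (made-up coordinates, not data from the study).
example = [
    ("chr8", 1_000), ("chr8", 5_000), ("chr8", 900_000),
    ("chr16", 42), ("chr16", 15_042), ("chr16", 30_042),
]
clusters = cluster_dnms(example)
```

Chaining on the distance to the previous mutation, rather than to the first mutation of the cluster, lets a cluster span more than the threshold in total; whether that matches the study's definition is not specified here.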
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This study was funded by the Inova Health System with support from Fairfax County and philanthropic support from the Odeen family. We thank the Inova Translational Medicine Institute staff for supporting the study. We also thank the families who participated in the genomic studies that made this research possible. This work was partly financially supported by grants from the Netherlands Organization for Scientific Research (916-14-043 to C.G. and 918-15-667 to J.A.V.) and the European Research Council (ERC Starting grant DENOVO 281964 to J.A.V.).
This study used data generated by the Genome of the Netherlands Project. A full list of the investigators is available from http://www.nlgenome.nl/. Funding for the project was provided by the Netherlands Organization for Scientific Research under award number 184021007, dated July 9, 2009 and made available as a Rainbow Project of the Biobanking and Biomolecular Research Infrastructure Netherlands (BBMRI-NL). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI).
Integrated supplementary information
(a) Linear models for the numbers of clustered and unclustered DNMs. (b) Linear models for the numbers of cluster events. Grey shades indicate standard errors.
(a) Primary cohort and (b) replication cohort. Boxplot whiskers depict distance from quartile to a maximum of 1.58 times the interquartile range. Numbers indicate number of individuals per group. While the maternal age increases with the number of clusters, the paternal age does not.
(a) The fraction of probands with maternal and paternal clustered mutations (y-axis), grouped by parental age quantiles. Error bars indicate the binomial 95% confidence intervals. (b) The number of paternal and maternal cDNMs (y-axis) stratified by the distance to the nearest other cDNM (x-axis). (c) The size of the paternal and maternal age effect for clusters with at least one phased cDNM (y-axis) by inter-mutational distance (x-axis). Whiskers indicate the 95% confidence interval. (d) Age of the fathers at conception and (e) age of the mothers at conception (y-axis) by the number of mutations in the offspring’s largest mutation cluster originating from the respective parent (x-axis). We considered only clusters where at least one cDNM is on the allele from the respective parent (paternal allele for d and maternal allele for e). Numbers indicate the size of each group. Boxplot compartments: box: interquartile range; line: median; whiskers: extreme values <1.5 × interquartile range from box borders.
(a) The fraction of phased cDNMs per chromosome. Error bars indicate the binomial 95% confidence intervals. (b) The nucleotide substitution spectrum of maternal and paternal clusters and unclustered DNMs. Error bars indicate the binomial 95% confidence intervals. (c) The nucleotide substitution spectrum of cDNMs by location. Error bars indicate the binomial 95% confidence intervals.
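The captions report binomial 95% confidence intervals for fractions without naming the interval method. As one possible way to compute such an interval, the sketch below uses the Wilson score interval; the choice of method, the function name `wilson_interval` and the example counts are assumptions for illustration only.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    Returns (lower, upper) bounds on the true fraction.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z ** 2 / (4 * total ** 2)
    ) / denom
    return centre - half_width, centre + half_width


# Made-up counts: 50 of 100 mutations fall in the category of interest.
lower, upper = wilson_interval(50, 100)
```

An exact (Clopper-Pearson) interval would be an equally plausible reading of "binomial confidence interval"; the two differ mainly for small counts.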
Overview of regions enriched for maternal cluster mutations. X-axis and ideograms indicate chromosomal position. The red and blue histograms indicate the number of maternal cDNMs and paternal cDNMs identified in this study, respectively. The pale red and pale blue histograms indicate the number of maternal and paternal unclustered DNMs. The lowest track indicates normalized cSNP C>G score, which is predictive for maternal DNMs. (a) Full chromosome 8. (b) Region with increased maternal mutation rate on chromosome 9 (chr9: 0-10,000,000).
Overview of regions enriched for maternal cluster mutations. X-axis and ideograms indicate chromosomal position. The red and blue histograms indicate the number of maternal cDNMs and paternal cDNMs identified in this study, respectively. The pale red and pale blue histograms indicate the number of maternal and paternal unclustered DNMs. The lowest track indicates normalized cSNP C>G score, which is predictive for maternal DNMs. (a) Full chromosome 16. (b) Full chromosome 8. (c) Region with increased maternal mutation rate on chromosome 9 (chr9: 0-10,000,000). (d) Region with increased maternal mutation rate on chromosome 2 (chr2: 0-10,000,000).
Recombination scores (as defined by Kong et al.20) of cDNM regions. (a) Recombination scores of genomic regions harboring unclustered DNMs and cDNMs in the primary cohort. (b) Recombination scores of genomic regions harboring unclustered DNMs and cDNMs in the replication cohort. (c) Recombination scores of genomic regions harboring cSNPs. The numbers indicate one-sided p-values for a difference between the groups, based on the Wilcoxon rank sum test.
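For readers unfamiliar with the test named in this caption, the following is a minimal sketch of a one-sided Wilcoxon rank sum (Mann-Whitney U) test in plain Python. It uses the normal approximation without tie or continuity correction, which is a simplification chosen for this example and not necessarily what was used to produce the reported p-values; the function name and the toy scores are hypothetical.

```python
import math


def rank_sum_one_sided(x, y):
    """One-sided Wilcoxon rank sum test of 'x tends to be larger than y'.

    Returns (U, p) where U counts pairs with x_i > y_j (ties count 0.5)
    and p comes from the normal approximation (no tie or continuity
    correction, so it is only approximate for small or tied samples).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return u, p


# Toy scores (not data from the study).
u_stat, p_value = rank_sum_one_sided([5, 6, 7, 8], [1, 2, 3, 4])
```

In practice a statistics library would normally be used instead, since it handles ties and exact small-sample distributions.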
(a) Fitting to unclustered DNMs and cDNMs. (b) Fitting to maternal cDNMs and paternal cDNMs. The solid error bars indicate the standard deviation of resampled mutations’ contributions; the dashed error bars indicate 95% confidence intervals of the resampled mutations’ contributions.
The quality control variables are described in Supplementary Table 18. (a) First two principal components plotted against each other and colored by the software version of the data analysis pipeline. Spearman correlation coefficient of PC1 and average coverage: −0.893. (b) Variance explained by the principal components. (c) Principal components two and three plotted against each other and colored by the estimated ancestry of the sequenced individual.
(a) cSNPs are depleted of CpG>CpT mutations but enriched for the remaining C>G mutations, reproducing hallmarks of cDNM spectra. (b) The fraction of non-CpG C>G nucleotide substitutions in cSNP spectra decreases with inter-mutational distance, indicating a lower fraction of real clusters at larger distances.
Supplementary Figures 1–15, Supplementary Note and Supplementary Tables 1–4, 7–12, 14–25
List of clustered DNMs
Number of clusters per trio
List of cSNPs