De novo mutations arising on the paternal chromosome make the largest known contribution to autism risk, and correlate with paternal age at the time of conception. The recurrence risk for autism spectrum disorders is substantial, leading many families to decline future pregnancies, but the potential impact of assessing parental gonadal mosaicism has not been considered. We measured sperm mosaicism using deep whole-genome sequencing, for variants both present in an offspring and evident only in the father’s sperm, and identified single-nucleotide, structural and short tandem-repeat variants. We found that mosaicism quantification can stratify autism spectrum disorder recurrence risk due to de novo mutations into a vast majority with near 0% recurrence and a small fraction with a substantially higher and quantifiable risk, and we identified novel mosaic variants at risk for transmission to future offspring. This suggests that genetic counseling would benefit from the addition of sperm mosaicism assessment.
Aligned BAM files generated for this study through deep WGS or TAS are available on SRA (accession no. PRJNA588332). WGS data used for de novo calling are available through the NIMH Data Archive (NDA; collection ID: 2019). Long-read sequencing data are likewise available on NDA (collection ID: 2795). NDA access is regulated by the standard organizational process and is subject to review by NDA. Data are also available through the corresponding authors upon reasonable request. Additionally, summary tables of the data are included as Supplementary Information.
Algorithms used for mosaic variant detection were published previously. Any custom code is available through the corresponding authors upon reasonable request.
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722 e712 (2017).
O’Roak, B. J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012).
Neale, B. M. et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012).
Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012).
Jonsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).
Campbell, I. M. et al. Parent of origin, mosaicism, and recurrence risk: probabilistic modeling explains the broken symmetry of transmission genetics. Am. J. Hum. Genet. 95, 345–359 (2014).
Acuna-Hidalgo, R., Veltman, J. A. & Hoischen, A. New insights into the generation and role of de novo mutations in health and disease. Genome Biol. 17, 241 (2016).
Freed, D., Stevens, E. L. & Pevsner, J. Somatic mosaicism in the human genome. Genes (Basel) 5, 1064–1094 (2014).
Jonsson, H. et al. Multiple transmissions of de novo mutations in families. Nat. Genet. 50, 1674–1680 (2018).
Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
Brandler, W. M. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science 360, 327–331 (2018).
Brandler, W. M. et al. Frequency and complexity of de novo structural mutation in autism. Am. J. Hum. Genet. 98, 667–679 (2016).
Huang, A. Y. et al. Distinctive types of postzygotic single-nucleotide mosaicisms in healthy individuals revealed by genome-wide profiling of multiple organs. PLoS Genet. 14, e1007395 (2018).
Carvill, G. L. et al. GRIN2A mutations cause epilepsy-aphasia spectrum disorders. Nat. Genet. 45, 1073–1076 (2013).
Lemke, J. R. et al. Mutations in GRIN2A cause idiopathic focal epilepsy with rolandic spikes. Nat. Genet. 45, 1067–1072 (2013).
Turner, D. J. et al. Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nat. Genet. 40, 90–95 (2008).
Hehir-Kwa, J. Y. et al. De novo copy number variants associated with intellectual disability have a paternal origin and age bias. J. Med. Genet. 48, 776–778 (2011).
Escaramis, G., Docampo, E. & Rabionet, R. A decade of structural variants: description, history and methods to detect structural variation. Brief. Funct. Genomics 14, 305–314 (2015).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Huang, A. Y. et al. MosaicHunter: accurate detection of postzygotic single-nucleotide mosaicism through next-generation sequencing of unpaired, trio, and paired samples. Nucleic Acids Res. 45, e76 (2017).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Gao, Z. et al. Overlooked roles of DNA damage and maternal age in generating human germline mutations. Proc. Natl Acad. Sci. USA 116, 9491–9500 (2019).
Bernkopf, M. et al. Quantification of transmission risk in a male patient with a FLNB mosaic mutation causing Larsen syndrome: implications for genetic counseling in postzygotic mosaicism cases. Hum. Mutat. 38, 1360–1364 (2017).
Hancarova, M. et al. Parental gonadal but not somatic mosaicism leading to de novo NFIX variants shared by two brothers with Malan syndrome. Am. J. Med. Genet. A 179, 2119–2123 (2019).
Wilbe, M. et al. A novel approach using long-read sequencing and ddPCR to investigate gonadal mosaicism and estimate recurrence risk in two families with developmental disorders. Prenat. Diagn. 37, 1146–1154 (2017).
Yang, X. et al. Genomic mosaicism in paternal sperm and multiple parental tissues in a Dravet syndrome cohort. Sci. Rep. 7, 15677 (2017).
Goriely, A. & Wilkie, A. O. Paternal age effect mutations and selfish spermatogonial selection: causes and consequences for human disease. Am. J. Hum. Genet. 90, 175–200 (2012).
Hamdan, F. F. et al. Identification of a novel in-frame de novo mutation in SPTAN1 in intellectual disability and pontocerebellar atrophy. Eur. J. Hum. Genet. 20, 796–800 (2012).
Schwarz, J. M., Rodelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods 7, 575–576 (2010).
Pejaver, V. et al. MutPred2: inferring the molecular and phenotypic impact of amino acid variants. Preprint at bioRxiv https://doi.org/10.1101/134981 (2017).
Cooper, D. N., Krawczak, M., Polychronakos, C., Tyler-Smith, C. & Kehrer-Sawatzki, H. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum. Genet. 132, 1077–1130 (2013).
Snyder, M. W., Adey, A., Kitzman, J. O. & Shendure, J. Haplotype-resolved genome sequencing: experimental methods and applications. Nat. Rev. Genet. 16, 344–358 (2015).
Browning, S. R. & Browning, B. L. Haplotype phasing: existing methods and new developments. Nat. Rev. Genet. 12, 703–714 (2011).
Xia, Y., Liu, Y., Deng, M. & Xi, R. Pysim-sv: a package for simulating structural variation data with GC-biases. BMC Bioinformatics 18, 53 (2017).
Michaelson, J. J. et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151, 1431–1442 (2012).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Krupp, D. R. et al. Exonic mosaic mutations contribute risk for autism spectrum disorder. Am. J. Hum. Genet. 101, 369–390 (2017).
Wu, H., de Gannes, M. K., Luchetti, G. & Pilsner, J. R. Rapid method for the isolation of mammalian sperm DNA. Biotechniques 58, 293–300 (2015).
Regan, J. F. et al. A rapid molecular approach for chromosomal phasing. PLoS ONE 10, e0118270 (2015).
Untergasser, A. et al. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35, W71–W74 (2007).
Untergasser, A. et al. Primer3-new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
Koressaar, T. & Remm, M. Enhancements and modifications of primer design program Primer3. Bioinformatics 23, 1289–1291 (2007).
Xu, X. et al. Amplicon resequencing identified parental mosaicism for approximately 10% of ‘de novo’ SCN1A mutations in children with Dravet syndrome. Hum. Mutat. 36, 861–872 (2015).
Karczewski, K. J. et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. Preprint at bioRxiv https://doi.org/10.1101/531210 (2019).
Goss, P. J. & Lewontin, R. C. Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143, 589–602 (1996).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Collins, R. L., Stone, M. R., Brand, H., Glessner, J. T. & Talkowski, M. E. CNView: a visualization and annotation tool for copy number variation from whole-genome sequencing. Preprint at bioRxiv https://doi.org/10.1101/049536 (2016).
Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).
Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002).
Karolchik, D. et al. The UCSC table browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Gervais, A. L., Marques, M. & Gaudreau, L. PCRTiler: automated design of tiled and specific PCR primer pairs. Nucleic Acids Res. 38, W308–W312 (2010).
We thank the participants in this study for their contribution. M.W.B. was supported by an EMBO Long-Term Fellowship (no. ALTF 174-2015), which is co-funded by the Marie Curie Actions of the European Commission (nos. LTFCOFUND2013 and GA-2013-609409), and an Erwin Schrödinger Fellowship by the Austrian Science Fund (no. J 4197-B30). This study was supported by grants to J.G.G. from the NIH (nos. U01MH108898 and R01NS083823); the Simons Foundation Autism Research Initiative to J.G.G. (no. 571583), J.S. and M. Wigler (laboratory leader for A.B.M. and Z.W.); the NIH (nos. MH076431 and MH113715) to J.S.; and the Howard Hughes Medical Institute to J.G.G. Sequencing support was provided by the Rady Children’s Institute for Genomic Medicine and ONP. O.D. acknowledges support from the Silverman Family Foundation and Finding A Cure for Epilepsy and Seizures (FACES). We thank B. Hamilton, N. Chi, V. Stanley, A. Marsh, M. Wigler and L. Alexandrov for suggestions. We thank R. Sinkovits, A. Majumdar, S. Strande and the San Diego Supercomputer Center for hosting the computing infrastructure necessary for completing this project.
M.W.B., D.A., M.K., K.N.J., W.M.B., J.S. and J.G.G. are inventors on a provisional patent (PCT ref. no. SD2017-181-2PCT) filed by UC, San Diego, titled ‘Assessing risk of de novo mutations in males’.
Peer review information Kate Gao was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Plot showing the fraction of the genome that is covered at a given depth for blood and sperm following WGS with a target coverage of 200 ×. b, Plot showing the insert size of the reads for blood and sperm. c, Nanopore long-read technology (average read length 5,349 bp) was able to assign parental haplotype to 601/832 dSNVs in 13 children. Out of these, 501 were paternal, resulting in α~4 as reported previously. d-e, Binomial models for the detection limit of mosaic variants. Plots show the probability of detecting a given variant at a specific allelic fraction (AF) when requiring at least 3 alternate reads at different read-depths (d) or including a magnified inset for AF between 0.05 and 0 at 200 × (e). f, Analysis of the power of detection assuming a minimum requirement of 3 reads at 200 × sequencing. Plot shows the integrated probability of detection for the indicated tiers based on the curve seen in e. g-h, Plot of the fraction of detected variants (g) and the integrated detected fraction for the indicated AF ranges (h) of simulated data using Pysim. Results are from 10,000 variants simulated at 0.25, 0.20, 0.15, 0.10, 0.05, 0.02, and 0.01 AF. HaplotypeCaller was employed to detect variants as for data in Fig. 1.
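The binomial detection model in panels d-e can be illustrated with a short calculation. The sketch below assumes that the number of alternate reads at a site follows a binomial distribution with the read depth as the number of trials and the AF as the success probability; the function name and the example AF values are illustrative and are not taken from the study's code.

```python
from math import comb


def detection_probability(depth: int, af: float, min_alt_reads: int = 3) -> float:
    """Probability of seeing at least `min_alt_reads` alternate reads.

    Assumption: alternate read count ~ Binomial(depth, af), ignoring
    sequencing error and mapping bias.
    """
    # Sum the probabilities of observing 0 .. min_alt_reads-1 alternate reads
    p_below_threshold = sum(
        comb(depth, k) * af**k * (1.0 - af) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below_threshold


if __name__ == "__main__":
    # Illustrative AF values at the 200 x target depth named in the legend
    for af in (0.05, 0.02, 0.01, 0.005):
        p = detection_probability(200, af)
        print(f"AF={af:.3f}  P(>=3 alt reads at 200x)={p:.3f}")
```

Under this model, the probability of detection falls off steeply as the AF approaches the ratio of the read threshold to the depth, which is why a magnified view of the low-AF range is informative.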
a, 18 variants that could be assessed by ultra-deep target amplicon sequencing (TAS): shown are the reported 200 × WGS results (square with horizontal line) and the results from TAS (closed circle) (shown are estimated fraction ± binomial 95% CI). Sperm (left, green) and blood (right, orange). Dashed line and grey box: upper 95% CI of an unrelated control and the area beneath to visualize likely false positive variants. y-axis: allelic fraction (%) for a log2 transformation of the data. Red text: variants that were considered to have failed orthogonal validation: 15/18 variants were successfully confirmed. Underlined variants were confirmed, but likely annotated as the wrong class (all 5 are probably SDO rather than SDE). For all data points, the estimated fraction and CI are based on the fraction of mutant reads, see Supplementary Data 2 and 4. b, Allelic fraction (determined by ddPCR or WGS read counts) of the mutant allele with the highest allelic fraction in sperm (F05: Chr22:23082101A > G). Sperm and Blood indicate samples from the father, other samples (Blood/ddPCR) were derived from the mother, the child harboring the dSNV (II-2), or control (Ctrl) blood. Graph shows individual data points (experimental triplicates) and mean ± SEM for the ddPCR data.
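The legend reports each allelic fraction as an estimated fraction with a binomial 95% CI based on the fraction of mutant reads, without naming the interval method. As a minimal sketch, the example below uses the Wilson score interval, which is one common choice and an assumption here; the read counts in the usage example are invented for illustration.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(alt_reads: int, total_reads: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%).

    Assumption: the source only says "binomial 95% CI"; the Wilson method
    is used here as a stand-in, not as the authors' confirmed choice.
    """
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads and total_reads > 0")
    p_hat = alt_reads / total_reads
    denom = 1.0 + z * z / total_reads
    centre = (p_hat + z * z / (2.0 * total_reads)) / denom
    half_width = (
        z * sqrt(p_hat * (1.0 - p_hat) / total_reads + z * z / (4.0 * total_reads**2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical counts: 10 mutant reads out of 200 total reads
    low, high = wilson_interval(10, 200)
    print(f"estimated fraction = {10 / 200:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

An interval computed on an unrelated control in the same way gives an upper bound of the kind the dashed line represents: variants whose interval does not clear it are likely false positives.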
a, Plot showing the increase in dSNV number with paternal age at birth, as described previously (refs. 1,5). Dashed line shows a regression curve demonstrating this dependence (n = 14 trios, adjusted R² = 0.526, P = 0.0020). b, Plot showing the increase in dSNV number with paternal age at birth for paternal variants only. As expected, this correlation was stronger than for non-phased variants (n = 13 trios, adjusted R² = 0.736, P = 0.000107). c-d, Plots showing correlation for paternal age and the number of mosaic variants or the mean AF in sperm. Paternal age/the number of mosaic variants (c; n = 14 trios, adjusted R² = -0.048, P = 0.536) and paternal age/mean AF in sperm (d; n = 14 trios, adjusted R² = -0.047, P = 0.463) did not show any significant correlation. Adjusted R², coefficient of determination, and F-statistic nominal P-values are derived from a linear regression model through ordinary least squares. All graphs show individual data points, a regression line, and the 95% CI.
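The adjusted coefficient of determination reported in this legend can be negative, as in panels c and d, because it penalizes the ordinary value for the number of fitted parameters. The sketch below fits a one-predictor ordinary least squares model in plain Python and returns both quantities; the function name and the toy data are illustrative, and the F-statistic P-value is omitted because it needs an F-distribution routine that the standard library lacks.

```python
from typing import Sequence, Tuple


def ols_adjusted_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    # One predictor: residual degrees of freedom are n - 2
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted


if __name__ == "__main__":
    # Toy data with no linear trend: R^2 is 0, so the adjusted value is negative
    ages = [1.0, 2.0, 3.0, 4.0]
    counts = [1.0, -1.0, -1.0, 1.0]
    print(ols_adjusted_r2(ages, counts))
```

A slightly negative adjusted value, as reported for the mosaic-variant panels, therefore indicates that the fitted line explains essentially none of the variance.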
a, Mutational signatures (6 categories) for non-mosaic and mosaic dSNVs, compared to the overall gnomAD signature and a permuted subset (n = 1,000 permutations for n = 889 (non-mosaic) and n = 23 (mosaic) dSNVs; shown is the 95% band). Asterisks indicate observed signatures that lie outside the 95% band of the permuted variants. Non-mosaic variants are largely reminiscent of the gnomAD signature (with the exception of a significant depletion of T > G). Mosaic variants exhibit some differences, but none reach significance due to the low number of available mutations. b, Mutational signatures (96 categories; trinucleotide environment) for non-mosaic and mosaic dSNVs. c, Detailed view of the 96 mutational categories for non-mosaic and mosaic dSNVs, compared to the overall gnomAD signature and a permuted subset (n = 1,000 permutations for n = 889 (non-mosaic) and n = 23 (mosaic) dSNVs; shown is the 95% band). Dots indicate the observed mutational signature (black: within 95% band; red: outside the 95% band).
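A six-category signature is conventionally obtained by folding the twelve possible single-nucleotide substitutions onto the strand where the reference base is a pyrimidine, so that, for example, A > G is counted as T > C. The sketch below assumes that convention, which the legend does not spell out; the function names and the example variant list are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Fold a substitution onto the pyrimidine-reference strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change on the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def six_category_spectrum(variants: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of variants in each of the six folded classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in variants)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no variants supplied")
    return {cls: counts.get(cls, 0) / total for cls in CLASSES}


if __name__ == "__main__":
    # Hypothetical (ref, alt) pairs for illustration only
    example = [("A", "G"), ("G", "A"), ("C", "T"), ("T", "G")]
    print(six_category_spectrum(example))
```

Comparing such fractions against the same fractions computed on repeatedly resampled reference variants gives a permutation band of the kind shown in the panels.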
a-c, Calculated copy number (a, c) and fraction of supporting reads (b) for the 6q16.1 deletion in F01 and the 1p36.32 duplication as indicated. Orange band in a and c: ± 1 SD of the CN using similarly sized regions across the genome (n = 1,000 random regions, see Methods). Plot in b shows the estimated fraction of supporting reads (estimated fraction ± binomial 95% CI; based on the fraction of mutant reads, see Supplementary Data 7). Together, these approaches suggest that these dSVs are not mosaic in paternal sperm. Note that the fraction of supporting reads could not be used for the duplication due to the repetitive elements flanking this SV. d, Copy number variant plot for the duplication in F06 for the Proband (40 × ), Father (200 × both), and the mother (40 × ). Visualization was performed with the CNView tool (ref. 36; see Methods). e, Correlation of the number of dSTRΔs with paternal age at birth. Dashed line shows a regression curve (n = 14 trios, adjusted R² = -0.058, P = 0.598). Adjusted R², coefficient of determination, and F-statistic nominal P-value are derived from a linear regression model through ordinary least squares. Graph shows individual data points, a regression line, and the 95% CI. f, Number of STR repeat units for non-mosaic dSTRΔs or those that are mosaic. No significant difference can be observed between the two groups (n = 111 non-mosaic variants and n = 15 mosaic variants; two-tailed Mann-Whitney test; nominal P = 0.5490). Boxplots show median and quartiles with outliers as well as individual values. g, Detailed analysis of the TCTA repeat numbers in paternal, maternal, and child’s blood at low sequencing depth. Results show a de novo 13 × repeat in the child that is neither present in the father nor the mother. h, Sample reads showing the presence of a 10 × and 13 × allele in the child, a homozygous 10 × allele in the mother, a 10 × and a 12 × allele in the father, and the presence of a mosaic 13 × allele exclusively in paternal sperm.
a-c, AF (determined by ddPCR) of the mutant allele in paternal sperm (sperm) and maternal blood (mother) for the relevant dSNV in the 14 families. Part of this panel is also presented in Fig. 3. Ctrl: an unrelated sperm or blood sample, as indicated, acting as control. Graphs show individual data points (experimental triplicates) and mean ± SEM. d, Sanger sequencing results of paternal sperm for the locus harboring the dSNV for each family. Confirming the ddPCR results, F09, F10, and F13 showed mosaicism at their respective positions. e, Sanger sequencing results showing the C > T conversion locus in GRIN2A in F09 for all family members. The mutation was absent in the saliva of both parents, but present as a heterozygous allele in all 3 children.
Extended Data Fig. 7 ddPCR assessment of pathogenic structural variants and recurrent sampling of pathogenic DNMs in F01, F09, and F13.
a-c, AF (determined by ddPCR) of the mutant alleles in F09 (a), F10 (b), and F13 (c). DNA tested was derived from paternal sperm and the saliva (a and b) or blood (c, bl.) of the father, mother, or affected child. In addition, controls for sperm (sp) and blood (bl) are provided. d, AF (determined by ddPCR) comparing two biological replicates of paternal sperm for F01, F09, and F13. The samples showed comparable levels of AF over time for all three samples; however, F13 exhibited a minor but statistically significant difference. ***P < 0.001 (unpaired t-test, two-tailed, degrees of freedom = 12). e-g, Relative copy number (determined by ddPCR) for the three indicated dSVs for blood- and sperm-derived samples. Note that there is no detectable abnormality in the paternal sperm copy number above noise level, suggesting absence of sperm mosaicism in these samples. h, Direct copy number quantification of the duplication by ddPCR. All graphs show individual data points (experimental triplicates except for Affected in g [experimental duplicate], and F01 and F13 in d [7 experimental replicates]) and mean ± SEM.
a-d, Plots of the fraction of detected variants (a, c) and the integrated detected fraction for the indicated AF ranges (b, d) of simulated data using Pysim for the intersection of MuTect 2/Strelka 2 (a, b) and MosaicHunter (c, d). Results were from 10,000 variants simulated at 0.25, 0.20, 0.15, 0.10, 0.05, 0.02, and 0.01 AF. This was the same data set as used in Extended Data Fig. 1. The MuTect 2/Strelka 2 and MosaicHunter pipelines were employed with the same filters as for the data in Fig. 4.
Extended Data Fig. 9 Mosaic SNVs identified by unbiased analysis have a high validation rate and their AF differs depending on their origin.
a-c, 74 variants that could be assessed by ultra-deep target amplicon sequencing (TAS): shown are the reported 200 × WGS results (square with horizontal line) and the results from TAS (closed circle) (shown are estimated fraction ± binomial 95% CI). Sperm (left, green) and blood (right, orange). Dashed line and grey box: upper 95% CI of an unrelated control and the area beneath to visualize likely false positive variants. y-axis: allelic fraction (%) for a log2 transformation of the data. Plots are split by the three categories: SDO (a), BSS (b), and BDO (c). Red text denotes variants that were considered to have failed orthogonal validation: 13/19 (a), 21/21 (b), and 33/34 (c) were successfully confirmed. Underlined variants were confirmed, but likely annotated as the wrong class (that is, they are actually BSS for SDO and BDO variants in a and c, or are SDO (green text) or BDO (orange text) for BSS variants in b). For all data points, the estimated fraction and CI are based on the fraction of mutant reads, see Supplementary Data 2 and 8. d-f, Ranked plot of the estimated sperm and blood AF with 95% confidence intervals (estimated fraction ± binomial CI; based on the fraction of mutant reads, see Supplementary Data 8) for all variants detected in the three categories. SDO (d) and BDO (f) variants both show curves that are reminiscent of exponential decay, consistent with an increase of the number of mutations with expansion of the progenitor pool at a constant mutational rate. However, BSS (e) mosaicism for the first 40 variants appears to be more linear, suggesting that mutation rates for early divisions might be higher than those for later ones. This is consistent with previous models that estimated an elevated mutation rate in early embryonic development (ref. 14).
Extended Data Fig. 10 Mosaic variants do not exhibit clustering but differ in their mutational signatures depending on their origin.
a, Plot of the chromosomal location for each of the mosaic variants and their allelic fraction found in sperm from F01-08. Circles, triangles, and squares denote variants found to be mosaic by the dSNV approach, by the unbiased approach, or by both, respectively. b, Permutation simulations (n = 10,000 simulations of n = 23 mosaic dSNVs, n = 62 SDO mosaics, n = 123 SDO + BSS mosaics, n = 568 BDO mosaics, and n = 629 BDO + BSS mosaics) of variant locations to obtain mean and SD of broken stick fragment lengths. Vertical lines mark the observed value from mosaic dSNVs and mosaic variants from the indicated classes. These simulations illustrate that the observed distributions of variants along the chromosomes (as visualized in a for those that were mosaic in sperm) were within expectation. c, Detailed view of the 96 mutational categories for SDO, shared, and BDO mosaic variants, compared to the overall gnomAD signature and a permuted subset (n = 1,000 permutations for n = 68 (SDO), 72 (BSS), and 568 (BDO) gnomAD SNVs; shown is the 95% band). Dots indicate the observed mutational signature (black: within 95% band; red: outside the 95% band).
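The broken-stick idea behind panel b can be sketched as follows: variant positions cut a sequence into fragments, and clustering shows up as an unusually large spread of fragment lengths compared with randomly placed variants. The example below is a simplified illustration that treats the genome as a single interval and uses a one-sided empirical P value; both simplifications, along with all function names and the toy positions, are assumptions and do not reproduce the study's procedure.

```python
import random
from statistics import pstdev
from typing import List, Sequence


def fragment_lengths(positions: Sequence[float], length: float) -> List[float]:
    """Lengths of the pieces obtained by cutting [0, length] at each position."""
    points = [0.0] + sorted(positions) + [length]
    return [b - a for a, b in zip(points, points[1:])]


def fragment_sd(positions: Sequence[float], length: float) -> float:
    """Population SD of the broken-stick fragment lengths."""
    return pstdev(fragment_lengths(positions, length))


def permutation_null(n_variants: int, length: float, n_sim: int, seed: int = 0) -> List[float]:
    """Fragment-length SDs for uniformly placed variants (assumed null model)."""
    rng = random.Random(seed)
    return [
        fragment_sd([rng.uniform(0.0, length) for _ in range(n_variants)], length)
        for _ in range(n_sim)
    ]


def empirical_p(observed_sd: float, null_sds: Sequence[float]) -> float:
    """One-sided P: share of simulations with an SD at least as large."""
    exceed = sum(1 for s in null_sds if s >= observed_sd)
    return (1 + exceed) / (1 + len(null_sds))


if __name__ == "__main__":
    # Toy example: 23 tightly clustered positions on an interval of length 100
    clustered = [50.0 + 0.01 * i for i in range(23)]
    null = permutation_null(n_variants=23, length=100.0, n_sim=1000)
    print("observed SD:", fragment_sd(clustered, 100.0))
    print("empirical P:", empirical_p(fragment_sd(clustered, 100.0), null))
```

An observed value that falls inside the bulk of the simulated distribution, as the legend reports for the study's variants, gives no evidence of clustering.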
About this article
Cite this article
Breuss, M.W., Antaki, D., George, R.D. et al. Autism risk in offspring can be assessed through quantification of male sperm mosaicism. Nat Med 26, 143–150 (2020). https://doi.org/10.1038/s41591-019-0711-0