Low-level parental somatic mosaic SNVs in exomes from a large cohort of trios with diverse suspected Mendelian conditions



The goal of this study was to assess the scale of low-level parental mosaicism in exome sequencing (ES) databases.


We analyzed approximately 2000 family trio ES data sets from the Baylor-Hopkins Center for Mendelian Genomics (BHCMG) and Baylor Genetics (BG). Among apparent de novo single-nucleotide variants identified in the affected probands, we selected rare unique variants with variant allele fraction (VAF) between 30% and 70% in the probands and lower than 10% in one of the parents.
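The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `TrioCall` record, its field names, the inclusive 30%-70% bounds, and the requirement of at least one alternate read in the parent are all assumptions made for the example, and the rarity filter is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrioCall:
    """Hypothetical per-variant read counts for a proband and both parents."""
    variant_id: str
    proband_alt: int
    proband_depth: int
    mother_alt: int
    mother_depth: int
    father_alt: int
    father_depth: int


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele fraction; 0.0 when the site has no coverage."""
    return alt_reads / depth if depth > 0 else 0.0


def candidate_mosaic_parent(call: TrioCall, min_parent_alt_reads: int = 1) -> Optional[str]:
    """Return 'mother' or 'father' if exactly one parent looks low-level mosaic.

    Thresholds follow the text: proband VAF between 30% and 70%,
    parental VAF below 10%. Inclusive proband bounds and the minimum
    parental alternate read count are assumptions of this sketch.
    """
    proband_vaf = vaf(call.proband_alt, call.proband_depth)
    if not (0.30 <= proband_vaf <= 0.70):
        return None

    parents = (
        ("mother", call.mother_alt, call.mother_depth),
        ("father", call.father_alt, call.father_depth),
    )
    carriers = [
        name
        for name, alt, depth in parents
        if alt >= min_parent_alt_reads and vaf(alt, depth) < 0.10
    ]
    # Keep the variant only when a single parent carries low-level support.
    return carriers[0] if len(carriers) == 1 else None


example = TrioCall("chr1:12345:A>G", 20, 40, 2, 60, 0, 55)
print(candidate_mosaic_parent(example))
```

Raising `min_parent_alt_reads` to 2 makes the filter stricter, trading sensitivity for fewer false positive candidates.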


Of 102 candidate mosaic variants validated using amplicon-based next-generation sequencing, droplet digital polymerase chain reaction, or blocker displacement amplification, 27 (26.5%) were confirmed to be low-level (VAF between 1% and 10%) or very-low-level (VAF <1%) mosaic. Detection precision in parental samples with two or more alternate reads was 63.6% (BHCMG) and 43.6% (BG). In nine investigated individuals, we observed variability of mosaic ratios among blood, saliva, fibroblast, buccal, hair, and urine samples.
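As a rough illustration of the categories and the precision measure used here, the following Python sketch classifies a validated VAF and computes precision among candidates with a minimum number of parental alternate reads. The function names, the handling of the exact 1% and 10% boundaries, and the toy input are assumptions for the example and do not come from the study data.

```python
from typing import List, Optional, Tuple


def mosaic_level(validated_vaf: float) -> str:
    """Label a validated parental VAF (as a fraction, e.g. 0.05 for 5%).

    'very low' is VAF < 1% and 'low' is VAF between 1% and 10%, as in the
    text; treating both 1% and 10% as 'low' is an assumption of this sketch.
    """
    if not 0.0 < validated_vaf <= 1.0:
        raise ValueError("VAF must be in (0, 1]")
    if validated_vaf < 0.01:
        return "very low"
    if validated_vaf <= 0.10:
        return "low"
    return "above low-level range"


def precision_at_min_alt_reads(
    candidates: List[Tuple[int, bool]], min_alt_reads: int = 2
) -> Optional[float]:
    """Fraction of confirmed variants among candidates with enough parental alt reads.

    Each candidate is (parental_alt_reads, confirmed_by_orthogonal_method).
    Returns None when no candidate passes the read-count threshold.
    """
    selected = [confirmed for alt, confirmed in candidates if alt >= min_alt_reads]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Toy, made-up candidates: (alternate reads in parent, confirmed?)
toy = [(1, False), (2, True), (3, False), (5, True), (1, True)]
print(mosaic_level(0.004), mosaic_level(0.05))
print(precision_at_min_alt_reads(toy, min_alt_reads=2))
```

Computing precision separately per cohort, as reported above for BHCMG and BG, would only require grouping the candidate list by cohort before calling the function.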


Our computational pipeline enables robust discrimination between true and false positive candidate mosaic variants and efficient detection of low-level mosaicism in ES samples. We confirm that the presence of two or more alternate reads in the parental sample is a reliable predictor of low-level parental somatic mosaicism.

Fig. 1: Candidate mosaic variant selection in Baylor-Hopkins Center for Mendelian Genomics (BHCMG) cohort.
Fig. 2: Variant allele fraction (VAF) estimated using four different molecular methods: exome sequencing (ES), amplicon-based next-generation sequencing (NGS), blocker displacement amplification (BDA), and droplet digital polymerase chain reaction (ddPCR).
Fig. 3: Distribution of variant allele fractions (VAFs) among six different tissues: blood, saliva, buccal, skin fibroblast, hair, and urine.

Code availability

The source code of our filtering pipeline is publicly available at https://github.com/tgambin/LowLevelMosaicVariantCaller.


We thank our colleagues whose expertise greatly assisted this research, and Davut Pehlivan for helpful discussion. This study was supported by the US National Institutes of Health (NIH) Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD) grant R01HD087292 to P.S., National Human Genome Research Institute (NHGRI)/National Heart, Lung, and Blood Institute (NHLBI) grant UM1HG006542 to the Baylor-Hopkins Center for Mendelian Genomics (BHCMG), and NHGRI grant HG008986 to J.E.P.

J.R.L. has stock ownership in 23andMe, is a paid consultant for Regeneron Pharmaceuticals, and is a coinventor on multiple US and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. D.Y.Z. and L.R.W. have a patent pending on blocker displacement amplification. D.Y.Z., N.G.X., and L.R.W. are consultants for NuProbe Global. D.Y.Z. consults for Avenge Bio and owns equity in NuProbe Global and Torus Biosystems. The other authors declare no conflicts of interest.

