High-throughput sequencing of fetal DNA is a promising and increasingly common method for discovering all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for the genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele at a heterozygous site is covered, on average, by 50% of the reads, and can therefore lead to erroneous genotype calls. We present three methods for reducing genotyping error in the presence of MCC. All of them start from the output of GATK HaplotypeCaller run on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on the MCC fraction, which can itself be readily estimated from the high-throughput sequencing data. The first method uses a Bayesian probabilistic model to correct the fetal genotype calls produced by the MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. On the test sets, all three methods substantially improve accuracy compared with the original MCC-unaware HaplotypeCaller calls. Finally, we apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
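To make the effect of MCC on allele fractions concrete, the sketch below computes a posterior over fetal genotypes at a single biallelic site from read counts, parental genotypes, and a known contamination fraction `c`. It is an illustrative, generic binomial mixture model under stated assumptions, not the Bayesian model used in the paper: genotypes are alt-allele dosages (0, 1, 2), parental genotypes are taken as correct, de novo mutations are ignored, sequencing error is a single symmetric rate `eps` (an assumed parameter), and all function names are hypothetical.

```python
"""Illustrative MCC-aware fetal genotype posterior at one biallelic site.

Assumptions (hypothetical, not the published model):
  * genotypes are alt-allele dosages 0, 1, 2;
  * a fraction c of the sample's DNA is maternal and 1 - c is fetal;
  * alt read counts are binomial with a symmetric sequencing error eps;
  * the fetal prior is Mendelian given correct parental genotypes
    (de novo mutations ignored).
"""
import math

GENOTYPES = (0, 1, 2)


def transmit_prob(parent_gt: int, allele: int) -> float:
    """Probability that a parent with dosage parent_gt transmits allele (0 = ref, 1 = alt)."""
    p_alt = parent_gt / 2
    return p_alt if allele == 1 else 1 - p_alt


def mendelian_prior(mother_gt: int, father_gt: int) -> dict:
    """P(fetal genotype | parental genotypes)."""
    prior = {g: 0.0 for g in GENOTYPES}
    for a_m in (0, 1):
        for a_f in (0, 1):
            prior[a_m + a_f] += transmit_prob(mother_gt, a_m) * transmit_prob(father_gt, a_f)
    return prior


def expected_alt_fraction(fetal_gt: int, mother_gt: int, c: float, eps: float) -> float:
    """Expected alt-read fraction in the contaminated sample.

    Without MCC (c = 0) a heterozygous fetus gives 0.5; with MCC the
    maternal DNA shifts this toward mother_gt / 2.
    """
    true_frac = (1 - c) * fetal_gt / 2 + c * mother_gt / 2
    # Symmetric error: a true ref base reads as alt (and vice versa) with prob eps.
    return true_frac * (1 - eps) + (1 - true_frac) * eps


def fetal_posterior(ref_reads: int, alt_reads: int, mother_gt: int,
                    father_gt: int, c: float, eps: float = 0.01) -> dict:
    """Posterior over fetal genotypes given read counts, parents, and MCC fraction c."""
    prior = mendelian_prior(mother_gt, father_gt)
    log_post = {}
    for g in GENOTYPES:
        if prior[g] == 0.0:
            continue  # genotype impossible under Mendelian inheritance
        p = expected_alt_fraction(g, mother_gt, c, eps)
        # The binomial coefficient is the same for every g, so it cancels on normalization.
        log_post[g] = (math.log(prior[g])
                       + alt_reads * math.log(p)
                       + ref_reads * math.log(1 - p))
    # Normalize in log space to avoid underflow at high depth.
    m = max(log_post.values())
    weights = {g: math.exp(v - m) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: weights.get(g, 0.0) / total for g in GENOTYPES}


if __name__ == "__main__":
    # Heterozygous mother, hom-ref father, 40% MCC, 20 alt reads out of 100.
    print(fetal_posterior(ref_reads=80, alt_reads=20, mother_gt=1, father_gt=0, c=0.4))
```

In the usage example, a caller that expects about 50% alt reads for a heterozygote sees 20% alt reads as ambiguous, whereas under this mixture model a hom-ref fetus with a heterozygous mother at 40% contamination is expected to show about 20% alt reads, so the posterior concentrates almost entirely on genotype 0.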
This work was funded by the Skoltech Biomedical Initiative grant to GAB and DY.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Nabieva, E., Sharma, S.M., Kapushev, Y. et al. Accurate fetal variant calling in the presence of maternal cell contamination. Eur J Hum Genet 28, 1615–1623 (2020). https://doi.org/10.1038/s41431-020-0697-6