Accurate fetal variant calling in the presence of maternal cell contamination

Abstract

High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele at a heterozygous site is covered, on average, by 50% of the reads, and can therefore lead to erroneous genotype calls. We present a panel of methods for reducing genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by the MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
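
To make the effect of MCC on allele balance concrete, the following is a minimal illustrative sketch, not the authors' model. It assumes a biallelic site, genotypes coded as alternate-allele counts (0, 1, 2), a binomial read-count likelihood with a fixed per-read error rate, a purely Mendelian prior (no de novo mutations), and an MCC fraction defined as the share of sampled DNA that is maternal. All function and parameter names are hypothetical.

```python
from math import comb

def expected_alt_fraction(g_fetal, g_mother, mcc, err=0.01):
    """Expected fraction of alt-supporting reads in a fetal/maternal mixture.

    g_fetal, g_mother: alt-allele counts (0, 1 or 2).
    mcc: assumed fraction of the sampled DNA that is maternal.
    err: assumed symmetric per-read error rate.
    """
    mix = (1 - mcc) * g_fetal / 2 + mcc * g_mother / 2
    return mix * (1 - err) + (1 - mix) * err

def mendelian_prior(g_mother, g_father):
    """P(fetal alt count = 0, 1, 2) from parental genotypes, ignoring de novo events."""
    pm, pf = g_mother / 2, g_father / 2  # chance each parent transmits alt
    return [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf]

def fetal_posterior(alt, depth, g_mother, g_father, mcc, err=0.01):
    """Posterior over fetal genotype given alt/total read counts at one site."""
    prior = mendelian_prior(g_mother, g_father)
    weights = []
    for g in range(3):
        p = expected_alt_fraction(g, g_mother, mcc, err)
        likelihood = comb(depth, alt) * p**alt * (1 - p) ** (depth - alt)
        weights.append(prior[g] * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

# Mother heterozygous, father homozygous reference, 30% maternal DNA:
# 15 alt reads out of 100 is what a homozygous-reference fetus would produce.
posterior = fetal_posterior(alt=15, depth=100, g_mother=1, g_father=0, mcc=0.3)
```

In this hypothetical setting a homozygous-reference fetus is expected to show roughly 15% alternate reads purely because of the maternal admixture, which an MCC-unaware caller may interpret as evidence for a heterozygous genotype; conditioning on the MCC fraction and the parental genotypes shifts the posterior toward the homozygous-reference call.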

Fig. 1: Pipeline for accurate fetal variant calling in the presence of maternal cell contamination (MCC).
Fig. 2: Accuracy of the genotype-correction methods at various MCC fractions.

Data availability

https://archive.org/details/simulated_maternal_cell_contamination.

Code availability

https://github.com/bazykinlab/ML-maternal-cell-contamination.

Funding

This work was funded by the Skoltech Biomedical Initiative grant to GAB and DY.

Author information

Corresponding author

Correspondence to Elena Nabieva.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Nabieva, E., Sharma, S.M., Kapushev, Y. et al. Accurate fetal variant calling in the presence of maternal cell contamination. Eur J Hum Genet 28, 1615–1623 (2020). https://doi.org/10.1038/s41431-020-0697-6
