DeTiN: overcoming tumor-in-normal contamination

Abstract

Comparison of sequencing data from a tumor sample with data from a matched germline control is a key step for accurate detection of somatic mutations. Detection sensitivity for somatic variants is greatly reduced when the matched normal sample is contaminated with tumor cells. To overcome this limitation, we developed deTiN, a method that estimates the tumor-in-normal (TiN) contamination level and, in cases affected by contamination, improves sensitivity by reclassifying initially discarded variants as somatic.

Fig. 1: Results from in silico and in vitro validation of deTiN.
Fig. 2: Application of deTiN to CLL sequencing data.
Fig. 3: Application of deTiN to analysis of solid tumors with adjacent normal controls.

Acknowledgements

This work was funded in part by the NIH TCGA Genome Data Analysis Center (U24CA143845 to G.G.), the Paul C. Zamecnick, MD, Chair in Oncology at MGH Cancer Center (G.G.), the National Human Genome Research Institute, NIH (T32 HG002295 to A.T.-W.), and the NIH (NCI P01CA206978-01, R01CA182461-01, U10CA180861-01, and R01CA184922-02 to A.T.-W., C.S., and C.J.W.). C.J.W. is a Scholar of the Leukemia and Lymphoma Society.

Author information

Contributions

A.T.-W., C.S., and G.G. outlined and planned development. A.T.-W. and C.S. developed the method. A.T.-W. and M.R. performed genomic analysis of large data cohorts. A.T.-W., A.M., and N.L. performed and analyzed in vitro simulations. A.T.-W. and C.S. performed and analyzed in silico simulations. E.R., D.-A.L., and C.J.W. enabled sample acquisition for data analysis. T.G. provided histopathology review of TCGA healthy tumor-adjacent tissue samples. A.T.-W., M.M., C.S., C.J.W., and G.G. prepared the manuscript and figures.

Corresponding author

Correspondence to Gad Getz.

Ethics declarations

Competing interests

C.J.W. is a cofounder of Neon Therapeutics and a member of its scientific advisory board.

Integrated supplementary information

Supplementary Figure 1 Schematic of mutation calling using a pure normal and a tumor-contaminated normal with deTiN.

(a) Sequencing data from normal samples that are contaminated with tumor cells contain proportional evidence of alternate reads at sites of somatic mutation. Non-contaminated (pure) normals do not contain such evidence of somatic mutations. Somatic mutations with alternate reads in the normal are often rejected as potential germline or artifactual events. (b) deTiN estimates TiN by comparing allele-fractions of mutations and allelic imbalance caused by copy-number events in the tumor and normal samples. deTiN then uses a Bayesian approach to recover somatic single nucleotide variations (SSNVs) and short insertions and deletions (indels) that were rejected by mutation callers (Methods). Orange points represent germline sites (unaffected by TiN). Grey points represent tumor data. Black points show variants affected by TiN.
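The Bayesian recovery step described in this legend can be illustrated with a toy two-hypothesis comparison. This is a minimal sketch and not deTiN's actual model: it assumes that a somatic variant appears in the contaminated normal at an allele fraction of TiN × tumor allele fraction, that the only alternative is a heterozygous germline variant at allele fraction 0.5, and that the prior is flat. The function names and the default prior are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k alternate reads out of n."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

def posterior_somatic(normal_alt, normal_depth, tumor_af, tin, prior_somatic=0.5):
    """Posterior probability that a variant is somatic, given reads in the normal.

    Assumption (illustrative only): somatic sites show TiN * tumor_af in the
    normal; germline heterozygous sites show 0.5.
    """
    log_som = math.log(prior_somatic) + log_binom_pmf(
        normal_alt, normal_depth, tin * tumor_af)
    log_germ = math.log(1 - prior_somatic) + log_binom_pmf(
        normal_alt, normal_depth, 0.5)
    top = max(log_som, log_germ)  # subtract the max for numerical stability
    w_som = math.exp(log_som - top)
    w_germ = math.exp(log_germ - top)
    return w_som / (w_som + w_germ)

# A site with 3 of 100 alternate reads in the normal, tumor AF 0.5, TiN 0.1
print(posterior_somatic(3, 100, 0.5, 0.1))
```

Under these assumptions, a handful of alternate reads in the normal no longer forces rejection of the site, because that level of evidence is what contamination predicts.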

Supplementary Figure 2 Results from application of deTiN to VarScan and Strelka SSNVs.

(a–c) Mutation detection sensitivity with deTiN (red) and without deTiN (blue) (n = 1 for each TiN fraction). Error bars indicate 95% beta confidence intervals. (a) Mutation detection sensitivity using VarScan. (b) Mutation detection sensitivity using Strelka. The solid black line indicates the fraction of variants included in the output VCF by Strelka; the dotted line is the ratio between the number of variants called with deTiN and the number of candidate variants emitted by Strelka. (c) Mutation detection sensitivity of MuTect using sites identified by Strelka (using an uncontaminated normal sample).
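A 95% beta confidence interval on a sensitivity estimate, as used for the error bars here, can be approximated as follows. This is a sketch under assumptions not stated in the legend: a uniform Beta(1, 1) prior on the detection probability and Monte Carlo sampling in place of an exact inverse beta CDF. The function name and defaults are hypothetical.

```python
import random

def beta_interval(successes, trials, level=0.95, draws=20000, seed=0):
    """Approximate central interval of Beta(successes + 1, failures + 1).

    Assumes a uniform prior; quantiles are estimated from sorted random draws.
    """
    rng = random.Random(seed)
    a = successes + 1
    b = trials - successes + 1
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper

# Example: 80 of 100 known mutations detected
low, high = beta_interval(80, 100)
print(f"sensitivity 0.80, approximate 95% interval {low:.3f} to {high:.3f}")
```

With only one experiment per TiN fraction (n = 1), the interval width is driven entirely by the number of mutations evaluated.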

Supplementary Figure 3 Sensitivity analysis for simulations.

(a, b) Sensitivity results for SSNVs with high allele fraction (AF > 30%). (a) Weighted average sensitivity for high-allele-fraction SSNVs from in silico simulations with and without deTiN (n = 5 independent simulation experiments for each TiN level). Error bars indicate the standard error. (b) Sensitivity from in vitro experiments with and without deTiN (n = 1 sequencing experiment for each TiN level). Error bars indicate 95% confidence intervals (beta distribution). (c, d) Bar plots indicating the number of mutations called by MuTect without deTiN (dark blue), recovered by deTiN (teal), and false positives (yellow). (c) Results from the in silico experiments (n = 5). (d) Results from the in vitro experiment (n = 1). (e) Receiver operating characteristic curves for deTiN's somatic classification at four TiN levels generated using in vitro mixed samples (n = 1 for each TiN level). “FPS” indicates the number of false positives detected at the default threshold. (f) Area under the receiver operating characteristic curve (AUC) (y-axis) for all in vitro mixed TiN values (x-axis; one sample per TiN level). Error bars indicate 95% confidence intervals on the AUC (n = 100 independent bootstrap iterations).
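The AUC and its bootstrap confidence interval can be computed along the following lines. This is a generic sketch, not the authors' code: it assumes each variant has a classifier score and a true label, computes the AUC as the fraction of correctly ordered positive and negative pairs, and takes simple percentiles over 100 resamples. All function names are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=100, seed=0):
    """Percentile 95% interval on the AUC from resampling variants with replacement."""
    rng = random.Random(seed)
    indices = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if all(ys) or not any(ys):
            continue  # a resample needs both classes for the AUC to be defined
        stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(auc(scores, labels), bootstrap_auc_ci(scores, labels))
```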

Supplementary Figure 4 Results from deTiN downsampling experiments.

(a, b) Results from downsampling of true somatic variants (n = 100 iterations per number of variants). The median is shown in orange, and error bars indicate the 95% range. (a) SSNV-based TiN estimates with true TiN = 0.1. The dotted black line indicates the estimate using all variants. (b) Size of the confidence interval on the TiN estimate. The dotted blue line indicates a power-law regression on the median points. (c) aSCNA-based TiN estimate for various coverage levels. The dotted line indicates the true TiN of 0.015. Points indicate the aSCNA TiN estimate, and bars indicate the 95% confidence interval. Note that deTiN coverage filtering was removed to generate this panel (n = 1 per coverage level). (d) aSCNA-based TiN estimate from one segment with varying numbers of SNPs. Points indicate the mean TiN estimate from downsampling iterations (n = 100 iterations per number of SNPs). Error bars reflect 1 standard deviation. (e) DeTiN's TiN estimates for in vitro mixed samples with different tumor purities (n = 1 per purity level). Blue points indicate TiN estimates. Orange points indicate the expected TiN estimate based on the TiN obtained when using the pure tumor (taken from the in vitro experiment). Error bars represent 95% confidence intervals.
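A power-law regression of the kind mentioned for panel b can be fitted by ordinary least squares in log-log space. The sketch below assumes the model y = a · x^b, where x is the number of variants and y is the median confidence-interval size; the legend does not state the fitting procedure, so this is one common choice rather than the authors' method, and the function name is hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                       # exponent (slope in log-log space)
    a = math.exp(mean_y - b * mean_x)   # prefactor (intercept in log-log space)
    return a, b

# Synthetic example: interval size shrinking as 1 / sqrt(number of variants)
counts = [10, 100, 1000]
sizes = [2 * c ** -0.5 for c in counts]
print(fit_power_law(counts, sizes))
```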

Supplementary Figure 5 DeTiN output from aSCNA model illustrating the presence of a chromosome 2 deletion in a pure normal.

(a) Raw germline SNP data supporting aSCNA TiN model (n = 1 patient sample). SNPs are in grey. SNPs in regions with an aSCNA are colored in. Red points reflect the allele-fraction of SNPs in the normal, and blue points reflect the allele-fraction of SNPs in the tumor. (b) Posterior distribution over TiN levels based on aSCNAs (n = 1 patient sample). Unclustered data are in dashed grey, colored lines represent posterior of clustered aSCNA segments. The pink cluster is based only on segments in chromosome 2.

Supplementary Figure 6 DeTiN output from SSNV models of CD19 selected blood and saliva normals.

(a,b) Grey points denote allele fraction of variants considered for inclusion in the model. Blue points denote allele fraction of SSNVs detected by MuTect without deTiN. Red points show allele fraction of SSNVs recovered by deTiN. Error bars on red points indicate 95% beta confidence interval on the allele-fraction in the normal sample. The slope of the black dashed line indicates deTiN's TiN estimate. (a) SSNV model for CLL and a matched saliva sample pair (n = 1 patient sample); TiN=0. (b) SSNV model for CLL–matched CD19 blood sample pair (n = 1 patient sample); TiN=0.31.
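The relationship between the slope of the dashed line and the TiN estimate can be sketched as a one-parameter fit. The example assumes that the expected allele fraction of a somatic variant in the normal equals TiN × its tumor allele fraction, ignoring purity and copy number, and picks the TiN value on a grid that maximizes a binomial likelihood of the normal read counts. This is a simplified illustration, not deTiN's SSNV model, and the names are hypothetical.

```python
import math

def binom_loglik(alt, depth, p):
    """Binomial log-likelihood up to a constant that does not depend on p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    return alt * math.log(p) + (depth - alt) * math.log(1 - p)

def estimate_tin(sites, grid_size=101):
    """Grid-search maximum-likelihood TiN.

    sites: iterable of (tumor_af, normal_alt_reads, normal_depth).
    Assumption (illustrative only): normal AF = TiN * tumor AF.
    """
    sites = list(sites)
    best_tin, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        tin = i / (grid_size - 1)
        ll = sum(binom_loglik(alt, depth, tin * taf) for taf, alt, depth in sites)
        if ll > best_ll:
            best_tin, best_ll = tin, ll
    return best_tin

sites = [(0.5, 5, 100), (0.4, 4, 100), (0.3, 3, 100)]
print(estimate_tin(sites))
```

A saliva normal with no alternate reads at somatic sites would give an estimate at or near 0 under this sketch, while a contaminated blood normal would give a positive slope.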

Supplementary Figure 7 Heat map and bar plot illustrating recovery of SSNVs in the CLL cohort.

Samples are in columns, genes in rows. Blue boxes indicate variants detected prior to deTiN (“without deTiN”); red boxes indicate additional variants recovered by deTiN (“with deTiN”).

Supplementary Figure 8 DeTiN’s TiN estimates for all normal tissue data by tumor type.

Each point indicates the TiN estimate for one sample (n = 1477 patient samples).

Supplementary Figure 9 Allelic copy-number data supporting TiN solutions in pathologist-reviewed samples.

(a–e) Germline SNPs are shown as grey circles (n = 1 patient sample per panel); SNPs in regions with allelic imbalance are colored. Red points reflect the allele fraction of the SNPs in the normal, and blue points reflect the allele fraction of the SNPs in the tumor.

Supplementary Figure 10 Copy and allele data demonstrating convergent evolution in chr1q.

(a) Copy data for the adjacent normal tissue and breast invasive carcinoma (n = 1 patient sample). Coverage-based copy number ratio data is shown. (b) SNPs on chr1q are shown; yellow points indicate SNPs on the tumor amplified allele (AF>0.5); purple points indicate SNPs on the allele amplified in the normal.

Supplementary Figure 11

Graphical representations of deTiN variables and parameters.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Results, and Supplementary Note

Reporting Summary

Supplementary Table 1

DeTiN estimates and SSNV recovery data from in silico simulations

Supplementary Table 2

DeTiN estimates and SSNV recovery data from in vitro simulations

Supplementary Table 3

SSNV data and TiN estimates derived from mutation calling with CD19 selected normal samples

Supplementary Table 4

DeTiN's TiN estimates for 1,419 cases with tumor-adjacent-tissue normal samples derived from TCGA and SIGMA sample cohorts

Supplementary Table 5

DeTiN annotated allelic copy data supporting multiple TiN contamination rates in adjacent tissue normals

Supplementary Table 6

List of command-line arguments for deTiN

Supplementary Software

DeTiN source code

Cite this article

Taylor-Weiner, A., Stewart, C., Giordano, T. et al. DeTiN: overcoming tumor-in-normal contamination. Nat Methods 15, 531–534 (2018). https://doi.org/10.1038/s41592-018-0036-9
