Comparison of sequencing data from a tumor sample with data from a matched germline control is a key step for accurate detection of somatic mutations. Detection sensitivity for somatic variants is greatly reduced when the matched normal sample is contaminated with tumor cells. To overcome this limitation, we developed deTiN, a method that estimates the tumor-in-normal (TiN) contamination level and, in cases affected by contamination, improves sensitivity by reclassifying initially discarded variants as somatic.
This work was funded in part by the NIH TCGA Genome Data Analysis Center (U24CA143845 to G.G.), the Paul C. Zamecnick, MD, Chair in Oncology at MGH Cancer Center (G.G.), the National Human Genome Research Institute, NIH (T32 HG002295 to A.T.-W.), and the NIH (NCI P01CA206978-01, R01CA182461-01, U10CA180861-01, and R01CA184922-02 to A.T.-W., C.S., and C.J.W.). C.J.W. is a Scholar of the Leukemia and Lymphoma Society.
C.J.W. is a cofounder of Neon Therapeutics and a member of its scientific advisory board.
Integrated supplementary information
Supplementary Figure 1 Schematic of mutation calling using a pure normal and a tumor-contaminated normal with deTiN.
(a) Sequencing data from normal samples that are contaminated with tumor cells contain proportional evidence of alternate reads at sites of somatic mutation. Non-contaminated (pure) normals do not contain such evidence of somatic mutations. Somatic mutations with alternate reads in the normal are often rejected as potential germline or artifactual events. (b) deTiN estimates TiN by comparing allele-fractions of mutations and allelic imbalance caused by copy-number events in the tumor and normal samples. deTiN then uses a Bayesian approach to recover somatic single nucleotide variations (SSNVs) and short insertions and deletions (indels) that were rejected by mutation callers (Methods). Orange points represent germline sites (unaffected by TiN). Grey points represent tumor data. Black points show variants affected by TiN.
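The two steps in panel (b) can be illustrated with a minimal sketch. It is not deTiN's implementation: it assumes that the expected allele fraction of a somatic variant in the normal equals TiN times its allele fraction in the tumor, it ignores the aSCNA evidence entirely, and it compares only two hypotheses per site (somatic versus germline heterozygous). The function names, the grid size, the error floor, and the prior are all hypothetical choices made for illustration.

```python
import math


def _clamp(p, error_rate):
    """Keep a probability away from 0 and 1 so log() is always defined."""
    return min(max(p, error_rate), 1.0 - error_rate)


def estimate_tin(variants, grid_size=101, error_rate=1e-3):
    """Grid-based TiN estimate from somatic variants.

    variants: iterable of (tumor_af, normal_alt_reads, normal_depth).
    Assumption: normal alt reads ~ Binomial(depth, tin * tumor_af).
    Returns (most likely TiN, normalized likelihood over the grid).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_liks = []
    for tin in grid:
        ll = 0.0
        for tumor_af, n_alt, n_depth in variants:
            p = _clamp(tin * tumor_af, error_rate)
            ll += n_alt * math.log(p) + (n_depth - n_alt) * math.log(1.0 - p)
        log_liks.append(ll)
    peak = max(log_liks)
    weights = [math.exp(ll - peak) for ll in log_liks]  # avoid underflow
    total = sum(weights)
    posterior = [w / total for w in weights]
    return grid[posterior.index(max(posterior))], posterior


def somatic_posterior(tumor_af, n_alt, n_depth, tin,
                      prior_somatic=0.5, error_rate=1e-3):
    """Posterior probability that a site is somatic rather than germline het.

    Somatic hypothesis: normal AF = tin * tumor_af (assumed model).
    Germline hypothesis: normal AF = 0.5 (heterozygous SNP).
    The binomial coefficient is shared by both hypotheses and cancels.
    """
    p_som = _clamp(tin * tumor_af, error_rate)
    ll_som = n_alt * math.log(p_som) + (n_depth - n_alt) * math.log(1.0 - p_som)
    ll_germ = n_depth * math.log(0.5)
    log_odds = math.log(prior_somatic / (1.0 - prior_somatic)) + ll_som - ll_germ
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Toy data: normal allele fractions are one tenth of the tumor allele fractions.
toy_variants = [(0.5, 5, 100), (0.4, 4, 100), (0.3, 3, 100)]
tin_hat, _ = estimate_tin(toy_variants)
print("estimated TiN:", tin_hat)
print("P(somatic), 5/100 alt reads in normal:",
      somatic_posterior(0.5, 5, 100, tin_hat))
print("P(somatic), 50/100 alt reads in normal:",
      somatic_posterior(0.5, 50, 100, tin_hat))
```

In this toy setting, a site with a few alternate reads in the normal is consistent with contamination at the estimated TiN and is kept as somatic, whereas a site with roughly half of the normal reads supporting the alternate allele is better explained as germline.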
(a–c) Mutation detection sensitivity with deTiN (red) and without deTiN (blue) (n = 1 for each TiN fraction). Error bars indicate 95% beta confidence intervals. (a) Mutation detection sensitivity using VarScan. (b) Mutation detection sensitivity using Strelka. The solid black line indicates the fraction of variants included in the output VCF by Strelka; the dotted line is the ratio between the number of variants called with deTiN and the number of candidate variants emitted by Strelka. (c) Mutation detection sensitivity of MuTect using sites identified by Strelka (using an uncontaminated normal sample).
(a,b) Sensitivity results for high-allele-fraction (AF > 30%) SSNVs. (a) Weighted average sensitivity for high-AF SSNVs from in silico simulations with and without deTiN (n = 5 independent simulation experiments for each TiN level). Error bars indicate the standard error. (b) Sensitivity from in vitro experiments with and without deTiN (n = 1 sequencing experiment for each TiN level). Error bars indicate 95% confidence intervals (beta distribution). (c,d) Bar plots indicating the number of mutations called by MuTect without deTiN (dark blue), recovered by deTiN (teal), and false positives (yellow). (c) Results from the in silico experiments (n = 5). (d) Results from the in vitro experiment (n = 1). (e) Receiver operating characteristic curves for deTiN's somatic classification at four TiN levels generated using in vitro mixed samples (n = 1 for each TiN level). “FPS” indicates the number of false positives detected at the default threshold. (f) Area under receiver operating characteristic curves (AUC) (y-axis) for all in vitro mixed TiN values (x-axis; one sample per TiN level). Error bars indicate 95% confidence intervals on AUC (n = 100 independent bootstrap iterations).
(a) SSNV-based TiN estimates with true TiN = 0.1. (b) Size of confidence interval on TiN estimate. Dotted blue line indicates power-law regression on median points. (a,b) Results from down-sampling of true somatic variants (n = 100 iterations per number of variants). Median is shown in orange and error bars indicate 95% range. Dotted black line indicates the estimate using all variants in panel a. (c) aSCNA-based TiN estimate for various coverage levels. Dotted line indicates true TiN of 0.015. Points indicate aSCNA TiN estimate, and bars indicate 95% confidence intervals. Note that we removed deTiN coverage filtering to generate this figure (n = 1 per coverage level). (d) aSCNA-based TiN estimate from one segment with varying numbers of SNPs. Points indicate mean TiN estimate from down-sampling iterations (n = 100 iterations per number of SNPs). Error bars reflect 1 standard deviation. (e) DeTiN's TiN estimates for in vitro mixed samples with different tumor purities (n = 1 per purity level). Blue points indicate TiN estimates. Orange points indicate the expected TiN estimate based on the TiN obtained when using the pure tumor (taken from the in vitro experiment). Error bars represent 95% confidence intervals.
Supplementary Figure 5 DeTiN output from aSCNA model illustrating the presence of a chromosome 2 deletion in a pure normal.
(a) Raw germline SNP data supporting aSCNA TiN model (n = 1 patient sample). SNPs are in grey. SNPs in regions with an aSCNA are colored in. Red points reflect the allele-fraction of SNPs in the normal, and blue points reflect the allele-fraction of SNPs in the tumor. (b) Posterior distribution over TiN levels based on aSCNAs (n = 1 patient sample). Unclustered data are in dashed grey, colored lines represent posterior of clustered aSCNA segments. The pink cluster is based only on segments in chromosome 2.
(a,b) Grey points denote allele fraction of variants considered for inclusion in the model. Blue points denote allele fraction of SSNVs detected by MuTect without deTiN. Red points show allele fraction of SSNVs recovered by deTiN. Error bars on red points indicate 95% beta confidence interval on the allele-fraction in the normal sample. The slope of the black dashed line indicates deTiN's TiN estimate. (a) SSNV model for CLL and a matched saliva sample pair (n = 1 patient sample); TiN=0. (b) SSNV model for CLL–matched CD19– blood sample pair (n = 1 patient sample); TiN=0.31.
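The 95% beta confidence interval on a normal-sample allele fraction can be approximated as follows. This is a sketch under stated assumptions, not the authors' code: it assumes the interval is the central interval of a Beta(alt + 1, ref + 1) distribution (that is, a uniform prior on the allele fraction), and it finds the quantiles by summing the density on a fine grid so that only the Python standard library is needed. The function name `beta_interval` and the `steps` parameter are hypothetical.

```python
import math


def beta_interval(alt, ref, level=0.95, steps=20000):
    """Central interval of Beta(alt + 1, ref + 1) by grid integration.

    alt, ref: alternate and reference read counts at the site.
    Assumption: uniform prior on the allele fraction.
    """
    a, b = alt + 1, ref + 1
    # Evaluate the unnormalized log density at cell midpoints in (0, 1).
    xs = [(i + 0.5) / steps for i in range(steps)]
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x) for x in xs]
    peak = max(logs)
    dens = [math.exp(v - peak) for v in logs]  # rescale to avoid underflow
    total = sum(dens)

    lo_q = (1.0 - level) / 2.0
    hi_q = 1.0 - lo_q
    cum = 0.0
    lower = upper = None
    for x, d in zip(xs, dens):
        cum += d / total
        if lower is None and cum >= lo_q:
            lower = x
        if cum >= hi_q:
            upper = x
            break
    return lower, upper


# Example: 3 alternate reads out of 100 in the normal sample.
low, high = beta_interval(alt=3, ref=97)
print("95% interval on normal allele fraction:", low, high)
```

A wide interval at low normal depth is the reason a variant with zero or one alternate read in the normal can still be compatible with a non-zero TiN.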
Samples are in columns, genes in rows. Blue boxes indicate variants detected prior to deTiN (“without deTiN”); red boxes indicate additional variants recovered by deTiN (“with deTiN”).
Each point indicates the TiN estimate of each sample (n = 1477 patient samples).
Supplementary Figure 9 Allelic copy-number data supporting TiN solutions in pathologist-reviewed samples.
(a–e) SNPs are shown as grey circles (n = 1 patient sample per panel); germline SNPs in regions with allelic imbalance are colored in. Red points reflect the allele fraction of the SNPs in the normal, and blue points reflect the allele fraction of the SNPs in the tumor.
(a) Copy data for the adjacent normal tissue and breast invasive carcinoma (n = 1 patient sample). Coverage-based copy-number ratio data are shown. (b) SNPs on chr1q are shown; yellow points indicate SNPs on the tumor-amplified allele (AF > 0.5); purple points indicate SNPs on the allele amplified in the normal.
Graphical representations of deTiN variables and parameters.
Supplementary Figures 1–11, Supplementary Results, and Supplementary Note
DeTiN estimates and SSNV recovery data from in silico simulations
DeTiN estimates and SSNV recovery data from in vitro simulations
SSNV data and TiN estimates derived from mutation calling with CD19– selected normal samples
DeTiN's TiN estimates for 1,419 cases with tumor-adjacent-tissue normal samples derived from TCGA and SIGMA sample cohorts
DeTiN annotated allelic copy data supporting multiple TiN contamination rates in adjacent tissue normals
List of command-line arguments for deTiN
DeTiN source code
About this article
Cite this article
Taylor-Weiner, A., Stewart, C., Giordano, T. et al. DeTiN: overcoming tumor-in-normal contamination. Nat Methods 15, 531–534 (2018). https://doi.org/10.1038/s41592-018-0036-9