Abstract
Comparison of sequencing data from a tumor sample with data from a matched germline control is a key step for accurate detection of somatic mutations. Detection sensitivity for somatic variants is greatly reduced when the matched normal sample is contaminated with tumor cells. To overcome this limitation, we developed deTiN, a method that estimates the tumor-in-normal (TiN) contamination level and, in cases affected by contamination, improves sensitivity by reclassifying initially discarded variants as somatic.
Acknowledgements
This work was funded in part by the NIH TCGA Genome Data Analysis Center (U24CA143845 to G.G.), the Paul C. Zamecnik, MD, Chair in Oncology at MGH Cancer Center (G.G.), the National Human Genome Research Institute, NIH (T32 HG002295 to A.T.-W.), and the NIH (NCI P01CA206978-01, R01CA182461-01, U10CA180861-01, and R01CA184922-02 to A.T.-W., C.S., and C.J.W.). C.J.W. is a Scholar of the Leukemia and Lymphoma Society.
Contributions
A.T.-W., C.S., and G.G. outlined and planned development. A.T.-W. and C.S. developed the method. A.T.-W. and M.R. performed genomic analysis of large data cohorts. A.T.-W., A.M., and N.L. performed and analyzed in vitro simulations. A.T.-W. and C.S. performed and analyzed in silico simulations. E.R., D.-A.L., and C.J.W. enabled sample acquisition for data analysis. T.G. provided histopathology review of TCGA healthy tumor-adjacent tissue samples. A.T.-W., M.M., C.S., C.J.W., and G.G. prepared the manuscript and figures.
Ethics declarations
Competing interests
C.J.W. is a cofounder of Neon Therapeutics and a member of its scientific advisory board.
Integrated supplementary information
Supplementary Figure 1 Schematic of mutation calling using a pure normal and a tumor-contaminated normal with deTiN.
(a) Sequencing data from normal samples that are contaminated with tumor cells contain proportional evidence of alternate reads at sites of somatic mutation. Non-contaminated (pure) normals do not contain such evidence of somatic mutations. Somatic mutations with alternate reads in the normal are often rejected as potential germline or artifactual events. (b) deTiN estimates TiN by comparing allele-fractions of mutations and allelic imbalance caused by copy-number events in the tumor and normal samples. deTiN then uses a Bayesian approach to recover somatic single nucleotide variations (SSNVs) and short insertions and deletions (indels) that were rejected by mutation callers (Methods). Orange points represent germline sites (unaffected by TiN). Grey points represent tumor data. Black points show variants affected by TiN.
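The relationship in panel (b) can be illustrated with a minimal sketch. It assumes, as a simplification, that the expected alternate allele fraction of a somatic variant in the contaminated normal equals TiN times its tumor allele fraction, and it selects the TiN value on a grid that maximizes a binomial likelihood of the alternate read counts observed in the normal. The function name, the grid, and the toy read counts are hypothetical; this is not the deTiN implementation, which also uses aSCNA-based allelic imbalance and a Bayesian recovery step (Methods).

```python
import math

def estimate_tin(variants, grid_size=101):
    """Grid-search maximum-likelihood TiN estimate.

    variants: iterable of (tumor_af, normal_alt_reads, normal_depth).
    Assumption (simplified): expected normal AF = TiN * tumor AF.
    The binomial coefficient is omitted because it does not depend on TiN.
    """
    best_tin, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        tin = i / (grid_size - 1)
        ll = 0.0
        for tumor_af, alt, depth in variants:
            # Clamp the probability to avoid log(0) at the grid edges.
            p = min(max(tin * tumor_af, 1e-6), 1 - 1e-6)
            ll += alt * math.log(p) + (depth - alt) * math.log(1 - p)
        if ll > best_ll:
            best_tin, best_ll = tin, ll
    return best_tin

# Toy data (hypothetical): tumor AF, alt reads in normal, normal depth.
toy = [(0.5, 5, 100), (0.4, 4, 100), (0.3, 3, 100)]
tin_hat = estimate_tin(toy)
```

In this toy input every variant has a normal allele fraction equal to one tenth of its tumor allele fraction, so the grid search settles on a TiN of 0.1; a normal with no alternate reads yields 0.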
Supplementary Figure 2 Results from application of deTiN to VarScan and Strelka SSNVs.
(a–c) Mutation detection sensitivity with deTiN (red) and without deTiN (blue) (n = 1 for each TiN fraction). Error bars indicate 95% beta confidence intervals. (a) Mutation detection sensitivity using VarScan. (b) Mutation detection sensitivity using Strelka. The solid black line indicates the fraction of variants included in the output VCF by Strelka; the dotted line is the ratio between the number of variants called with deTiN and the number of candidate variants emitted by Strelka. (c) Mutation detection sensitivity of MuTect using sites identified by Strelka (using an uncontaminated normal sample).
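The error bars above are described as 95% beta confidence intervals. The caption does not state the exact construction, so the sketch below assumes the Clopper-Pearson interval, whose bounds are beta-distribution quantiles, and computes it with the standard library by bisecting on the binomial CDF. The function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve_decreasing(f, target, iters=60):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Two-sided (1 - alpha) interval for a proportion k/n.

    Assumption: the 'beta confidence interval' is the Clopper-Pearson
    interval; a Bayesian beta posterior interval would differ slightly.
    """
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Example (hypothetical counts): 80 of 100 true somatic variants detected.
low, high = clopper_pearson(80, 100)
```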
Supplementary Figure 3 Sensitivity analysis for simulations.
(a,b) Sensitivity results for high allele fraction (AF > 30%) SSNVs. (a) Weighted average sensitivity for high-AF SSNVs from in silico simulations with and without deTiN (n = 5 independent simulation experiments for each TiN level). Error bars indicate the standard error. (b) Sensitivity from in vitro experiments with and without deTiN (n = 1 sequencing experiment for each TiN level). Error bars indicate 95% confidence intervals (beta distribution). (c,d) Bar plots indicating the number of mutations called by MuTect without deTiN (dark blue), recovered by deTiN (teal), and false positives (yellow). (c) Results from the in silico experiments (n = 5). (d) Results from the in vitro experiment (n = 1). (e) Receiver operating characteristic curves for deTiN's somatic classification at four TiN levels generated using in vitro mixed samples (n = 1 for each TiN level). “FPS” indicates the number of false positives detected at the default threshold. (f) Area under receiver operating characteristic curves (AUC) (y-axis) for all in vitro mixed TiN values (x-axis; one sample per TiN level). Error bars indicate 95% confidence intervals on AUC (n = 100 independent bootstrap iterations).
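Panel (f) reports bootstrap confidence intervals on AUC. A minimal sketch of one way to obtain such an interval follows; it assumes a rank-based AUC and a percentile bootstrap over variants, neither of which is confirmed by the caption, and the function names and toy scores are hypothetical.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval on AUC (assumed resampling scheme)."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        # Skip resamples that lack one of the two classes.
        if 0 < sum(ys) < len(ys):
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi

# Toy classifier scores (hypothetical): 1 = true somatic, 0 = not somatic.
scores = [0.95, 0.90, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
point = auc(scores, labels)
ci_low, ci_high = bootstrap_auc_ci(scores, labels)
```

The default of 100 resamples mirrors the number of bootstrap iterations stated in the caption.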
Supplementary Figure 4 Results from deTiN downsampling experiments.
(a) SSNV-based TiN estimates with true TiN = 0.1. (b) Size of the confidence interval on the TiN estimate. The dotted blue line indicates a power-law regression on the median points. (a,b) Results from downsampling of true somatic variants (n = 100 iterations per number of variants). The median is shown in orange and error bars indicate the 95% range. The dotted black line in panel a indicates the estimate using all variants. (c) aSCNA-based TiN estimate for various coverage levels. The dotted line indicates the true TiN of 0.015. Points indicate the aSCNA TiN estimate, and bars indicate the 95% confidence interval. Note that we removed deTiN coverage filtering to generate this figure (n = 1 per coverage level). (d) aSCNA-based TiN estimate from one segment with various numbers of SNPs. Points indicate the mean TiN estimate from downsampling iterations (n = 100 iterations per number of SNPs). Error bars reflect 1 standard deviation. (e) DeTiN's TiN estimates for in vitro mixed samples with different tumor purities (n = 1 per purity level). Blue points indicate TiN estimates. Orange points indicate the expected TiN estimate based on the TiN obtained when using the pure tumor (taken from the in vitro experiment). Error bars represent 95% confidence intervals.
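The power-law regression mentioned for panel (b) can be sketched as an ordinary least-squares fit in log-log space. The fitting procedure is an assumption (the caption does not specify it), and the toy values below are hypothetical rather than taken from the figure.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept in log-log space = log(a)
    return a, b

# Toy medians (hypothetical): interval width shrinking as 1/sqrt(n).
n_variants = [10, 40, 160, 640]
ci_width = [0.2 / math.sqrt(n) for n in n_variants]
a, b = fit_power_law(n_variants, ci_width)
```

With these toy inputs the fitted exponent is close to -0.5, the scaling expected if the interval narrows with the square root of the number of variants.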
Supplementary Figure 5 DeTiN output from aSCNA model illustrating the presence of a chromosome 2 deletion in a pure normal.
(a) Raw germline SNP data supporting aSCNA TiN model (n = 1 patient sample). SNPs are in grey. SNPs in regions with an aSCNA are colored in. Red points reflect the allele-fraction of SNPs in the normal, and blue points reflect the allele-fraction of SNPs in the tumor. (b) Posterior distribution over TiN levels based on aSCNAs (n = 1 patient sample). Unclustered data are in dashed grey, colored lines represent posterior of clustered aSCNA segments. The pink cluster is based only on segments in chromosome 2.
Supplementary Figure 6 DeTiN output from SSNV models of CD19– selected blood and saliva normals.
(a,b) Grey points denote allele fraction of variants considered for inclusion in the model. Blue points denote allele fraction of SSNVs detected by MuTect without deTiN. Red points show allele fraction of SSNVs recovered by deTiN. Error bars on red points indicate 95% beta confidence interval on the allele-fraction in the normal sample. The slope of the black dashed line indicates deTiN's TiN estimate. (a) SSNV model for CLL and a matched saliva sample pair (n = 1 patient sample); TiN=0. (b) SSNV model for CLL–matched CD19– blood sample pair (n = 1 patient sample); TiN=0.31.
Supplementary Figure 7 Heat map and bar plot illustrating recovery of SSNVs in the CLL cohort.
Samples are in columns, genes in rows. Blue boxes indicate variants detected prior to deTiN (“without deTiN”); red boxes indicate additional variants recovered by deTiN (“with deTiN”).
Supplementary Figure 8 DeTiN’s TiN estimates for all normal tissue data by tumor type.
Each point indicates the TiN estimate for one sample (n = 1477 patient samples).
Supplementary Figure 9 Allelic copy-number data supporting TiN solutions in pathologist-reviewed samples.
(a–e) SNPs are shown as grey circles (n = 1 patient sample per panel); germline SNPs in regions with allelic imbalance are colored. Red points reflect the allele-fraction of the SNPs in the normal, and blue points are the allele-fraction of the SNPs in the tumor.
Supplementary Figure 10 Copy and allele data demonstrating convergent evolution in chr1q.
(a) Copy data for the adjacent normal tissue and breast invasive carcinoma (n = 1 patient sample). Coverage-based copy number ratio data is shown. (b) SNPs on chr1q are shown; yellow points indicate SNPs on the tumor amplified allele (AF>0.5); purple points indicate SNPs on the allele amplified in the normal.
Supplementary Figure 11 Graphical representations of deTiN variables and parameters.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–11, Supplementary Results, and Supplementary Note
Supplementary Table 1
DeTiN estimates and SSNV recovery data from in silico simulations
Supplementary Table 2
DeTiN estimates and SSNV recovery data from in vitro simulations
Supplementary Table 3
SSNV data and TiN estimates derived from mutation calling with CD19– selected normal samples
Supplementary Table 4
DeTiN's TiN estimates for 1,419 cases with tumor-adjacent-tissue normal samples derived from TCGA and SIGMA sample cohorts
Supplementary Table 5
DeTiN annotated allelic copy data supporting multiple TiN contamination rates in adjacent tissue normals
Supplementary Table 6
List of command-line arguments for deTiN
Supplementary Software
DeTiN source code
Cite this article
Taylor-Weiner, A., Stewart, C., Giordano, T. et al. DeTiN: overcoming tumor-in-normal contamination. Nat Methods 15, 531–534 (2018). https://doi.org/10.1038/s41592-018-0036-9
DOI: https://doi.org/10.1038/s41592-018-0036-9