Abstract

Comparison of sequencing data from a tumor sample with data from a matched germline control is a key step for accurate detection of somatic mutations. Detection sensitivity for somatic variants is greatly reduced when the matched normal sample is contaminated with tumor cells. To overcome this limitation, we developed deTiN, a method that estimates the tumor-in-normal (TiN) contamination level and, in cases affected by contamination, improves sensitivity by reclassifying initially discarded variants as somatic.
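The core idea of estimating a contamination level can be illustrated with a deliberately simplified sketch. It assumes that, at each somatic site, the expected alternate-allele fraction in the normal equals TiN multiplied by the allele fraction in the tumor, and it picks the TiN value on a grid that maximizes a binomial likelihood. This is not deTiN's implementation: the function names, the grid search, and the toy read counts are all hypothetical, and the sketch ignores copy-number information.

```python
import math

def binom_loglik(alt, depth, p):
    """Binomial log-likelihood of `alt` alternate reads out of `depth` (up to a constant)."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
    return alt * math.log(p) + (depth - alt) * math.log(1 - p)

def estimate_tin(sites, grid_step=0.01):
    """Return the grid value of TiN with the highest summed log-likelihood.

    Each site is a tuple (tumor_af, normal_alt_reads, normal_depth).
    Assumption of this sketch: expected normal AF = TiN * tumor AF.
    """
    n_steps = int(round(1 / grid_step))
    best_tin, best_ll = 0.0, -math.inf
    for k in range(n_steps + 1):
        tin = k * grid_step
        ll = sum(binom_loglik(a, n, tin * f) for f, a, n in sites)
        if ll > best_ll:
            best_tin, best_ll = tin, ll
    return best_tin

# Hypothetical toy data: normal allele fractions are roughly 10% of tumor ones.
sites = [(0.40, 4, 100), (0.50, 5, 100), (0.30, 3, 100), (0.45, 5, 110)]
tin_hat = estimate_tin(sites)
```

With an estimate of this kind in hand, a variant that shows a few alternate reads in the normal can be judged against the read count expected from contamination alone, rather than being rejected outright as germline.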

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

This work was funded in part by the NIH TCGA Genome Data Analysis Center (U24CA143845 to G.G.), the Paul C. Zamecnik, MD, Chair in Oncology at MGH Cancer Center (G.G.), the National Human Genome Research Institute, NIH (T32 HG002295 to A.T.-W.), and the NIH (NCI P01CA206978-01, R01CA182461-01, U10CA180861-01, and R01CA184922-02 to A.T.-W., C.S., and C.J.W.). C.J.W. is a Scholar of the Leukemia and Lymphoma Society.

Author information

Author notes

  1. These authors contributed equally: Amaro Taylor-Weiner and Chip Stewart.

Affiliations

  1. Broad Institute of Harvard and MIT, Cambridge, MA, USA

    • Amaro Taylor-Weiner
    • Chip Stewart
    • Mendy Miller
    • Mara Rosenberg
    • Alyssa Macbeth
    • Niall Lennon
    • Esther Rheinbay
    • Dan-Avi Landau
    • Catherine J. Wu
    • Gad Getz
  2. Harvard University, Cambridge, MA, USA

    • Amaro Taylor-Weiner
  3. Department of Pathology, University of Michigan, Ann Arbor, MI, USA

    • Thomas Giordano
  4. Department of Medicine, Weill Cornell Medicine, New York, NY, USA

    • Dan-Avi Landau
  5. Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA

    • Dan-Avi Landau
  6. New York Genome Center, New York, NY, USA

    • Dan-Avi Landau
  7. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA

    • Catherine J. Wu
  8. Department of Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA

    • Catherine J. Wu
  9. Department of Medicine, Harvard Medical School, Boston, MA, USA

    • Catherine J. Wu
  10. Department of Pathology, Harvard Medical School, Boston, MA, USA

    • Gad Getz
  11. Cancer Center, Massachusetts General Hospital, Boston, MA, USA

    • Gad Getz
  12. Department of Pathology, Massachusetts General Hospital, Boston, MA, USA

    • Gad Getz

Contributions

A.T.-W., C.S., and G.G. outlined and planned development. A.T.-W. and C.S. developed the method. A.T.-W. and M.R. performed genomic analysis of large data cohorts. A.T.-W., A.M., and N.L. performed and analyzed in vitro simulations. A.T.-W. and C.S. performed and analyzed in silico simulations. E.R., D.-A.L., and C.J.W. enabled sample acquisition for data analysis. T.G. provided histopathology review of TCGA healthy tumor-adjacent tissue samples. A.T.-W., M.M., C.S., C.J.W., and G.G. prepared the manuscript and figures.

Competing interests

C.J.W. is a cofounder of Neon Therapeutics and a member of its scientific advisory board.

Corresponding author

Correspondence to Gad Getz.

Integrated supplementary information

  1. Supplementary Figure 1 Schematic of mutation calling using a pure normal and a tumor-contaminated normal with deTiN.

    (a) Sequencing data from normal samples that are contaminated with tumor cells contain proportional evidence of alternate reads at sites of somatic mutation. Non-contaminated (pure) normals do not contain such evidence of somatic mutations. Somatic mutations with alternate reads in the normal are often rejected as potential germline or artifactual events. (b) deTiN estimates TiN by comparing allele-fractions of mutations and allelic imbalance caused by copy-number events in the tumor and normal samples. deTiN then uses a Bayesian approach to recover somatic single nucleotide variations (SSNVs) and short insertions and deletions (indels) that were rejected by mutation callers (Methods). Orange points represent germline sites (unaffected by TiN). Grey points represent tumor data. Black points show variants affected by TiN.

  2. Supplementary Figure 2 Results from application of deTiN to VarScan and Strelka SSNVs.

    (a–c) Mutation detection sensitivity with deTiN (red) and without deTiN (blue) (n = 1 for each TiN fraction). Error bars indicate 95% beta confidence intervals. (a) Mutation detection sensitivity using VarScan. (b) Mutation detection sensitivity using Strelka. The solid black line indicates the fraction of variants included in the output VCF by Strelka; the dotted line is the ratio between the number of variants called with deTiN and the number of candidate variants emitted by Strelka. (c) Mutation detection sensitivity of MuTect using sites identified by Strelka (using an uncontaminated normal sample).

  3. Supplementary Figure 3 Sensitivity analysis for simulations.

    (a, b) Sensitivity results for high-allele-fraction (AF > 30%) SSNVs. (a) Weighted average sensitivity for high-AF SSNVs from in silico simulations with and without deTiN (n = 5 independent simulation experiments for each TiN level). Error bars indicate the standard error. (b) Sensitivity from in vitro experiments with and without deTiN (n = 1 sequencing experiment for each TiN level). Error bars indicate 95% confidence intervals (beta distribution). (c, d) Bar plots indicating the number of mutations called by MuTect without deTiN (dark blue), recovered by deTiN (teal), and false positives (yellow). (c) Results from the in silico experiments (n = 5). (d) Results from the in vitro experiment (n = 1). (e) Receiver operating characteristic curves for deTiN's somatic classification at four TiN levels generated using in vitro mixed samples (n = 1 for each TiN level). “FPS” indicates the number of false positives detected at the default threshold. (f) Area under the receiver operating characteristic curve (AUC; y-axis) for all in vitro mixed TiN values (x-axis; one sample per TiN level). Error bars indicate 95% confidence intervals on AUC (n = 100 independent bootstrap iterations).

  4. Supplementary Figure 4 Results from deTiN downsampling experiments.

    (a) SSNV-based TiN estimates with true TiN = 0.1. (b) Size of confidence interval on TiN estimate. Dotted blue line indicates power-law regression on median points. (a, b) Results from downsampling of true somatic variants (n = 100 iterations per number of variants). Median is shown in orange and error bars indicate 95% range. Dotted black line indicates the estimate using all variants in panel a. (c) aSCNA-based TiN estimate for various coverage levels. Dotted line indicates true TiN of 0.015. Points indicate aSCNA TiN estimate, and bars indicate 95% confidence intervals. Note that we removed deTiN coverage filtering to generate this figure (n = 1 per coverage level). (d) aSCNA-based TiN estimate from one segment with various numbers of SNPs. Points indicate mean TiN estimate from downsampling iterations (n = 100 iterations per number of SNPs). Error bars reflect 1 standard deviation. (e) DeTiN's TiN estimates for in vitro mixed samples with different tumor purities (n = 1 per purity level). Blue points indicate TiN estimates. Orange points indicate the expected TiN estimate based on the TiN obtained when using the pure tumor (taken from the in vitro experiment). Error bars represent 95% confidence intervals.

  5. Supplementary Figure 5 DeTiN output from aSCNA model illustrating the presence of a chromosome 2 deletion in a pure normal.

    (a) Raw germline SNP data supporting aSCNA TiN model (n = 1 patient sample). SNPs are in grey. SNPs in regions with an aSCNA are colored in. Red points reflect the allele-fraction of SNPs in the normal, and blue points reflect the allele-fraction of SNPs in the tumor. (b) Posterior distribution over TiN levels based on aSCNAs (n = 1 patient sample). Unclustered data are in dashed grey, colored lines represent posterior of clustered aSCNA segments. The pink cluster is based only on segments in chromosome 2.

  6. Supplementary Figure 6 DeTiN output from SSNV models of CD19 selected blood and saliva normals.

    (a,b) Grey points denote allele fraction of variants considered for inclusion in the model. Blue points denote allele fraction of SSNVs detected by MuTect without deTiN. Red points show allele fraction of SSNVs recovered by deTiN. Error bars on red points indicate 95% beta confidence interval on the allele-fraction in the normal sample. The slope of the black dashed line indicates deTiN's TiN estimate. (a) SSNV model for CLL and a matched saliva sample pair (n = 1 patient sample); TiN=0. (b) SSNV model for CLL–matched CD19 blood sample pair (n = 1 patient sample); TiN=0.31.

  7. Supplementary Figure 7 Heat map and bar plot illustrating recovery of SSNVs in the CLL cohort.

    Samples are in columns, genes in rows. Blue boxes indicate variants detected prior to deTiN (“without deTiN”); red boxes indicate additional variants recovered by deTiN (“with deTiN”).

  8. Supplementary Figure 8 DeTiN’s TiN estimates for all normal tissue data by tumor type.

    Each point indicates the TiN estimate of each sample (n = 1477 patient samples).

  9. Supplementary Figure 9 Allelic copy-number data supporting TiN solutions in pathologist-reviewed samples.

    (a–e) SNPs are shown as grey circles (n = 1 patient sample per panel); germline SNPs in regions with allelic imbalance are colored. Red points reflect the allele fraction of the SNPs in the normal, and blue points reflect the allele fraction of the SNPs in the tumor.

  10. Supplementary Figure 10 Copy and allele data demonstrating convergent evolution in chr1q.

    (a) Copy data for the adjacent normal tissue and breast invasive carcinoma (n = 1 patient sample). Coverage-based copy number ratio data is shown. (b) SNPs on chr1q are shown; yellow points indicate SNPs on the tumor amplified allele (AF>0.5); purple points indicate SNPs on the allele amplified in the normal.

  11. Supplementary Figure 11

    Graphical representations of deTiN variables and parameters.
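Several of the captions above report 95% beta confidence intervals on sensitivities and allele fractions. The captions do not state how these intervals were parameterized, so the following is only a minimal sketch of one common choice: an equal-tailed interval from a Beta(successes + 1, failures + 1) distribution, which corresponds to a uniform prior. The function name, the prior, and the grid-based integration are assumptions of this sketch, not details taken from the source.

```python
import math

def beta_interval(successes, trials, level=0.95, grid=20000):
    """Equal-tailed interval from Beta(successes + 1, failures + 1).

    The uniform Beta(1, 1) prior is an assumption of this sketch.
    The CDF is approximated by midpoint-rule integration of the density.
    """
    a = successes + 1
    b = trials - successes + 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = 1.0 / grid
    xs = [(i + 0.5) * width for i in range(grid)]  # cell midpoints in (0, 1)
    pdf = [
        math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
        for x in xs
    ]
    total = sum(pdf)  # renormalize so the discrete CDF ends at 1
    lo_target = (1 - level) / 2
    hi_target = 1 - lo_target
    lower = upper = None
    cum = 0.0
    for x, d in zip(xs, pdf):
        cum += d / total
        if lower is None and cum >= lo_target:
            lower = x
        if upper is None and cum >= hi_target:
            upper = x
            break
    return lower, upper

# Hypothetical example: 90 of 100 known mutations detected.
low, high = beta_interval(90, 100)
```

The interval is asymmetric around the observed fraction when the fraction is close to 0 or 1, which is why it is a natural fit for sensitivities and low allele fractions.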

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–11, Supplementary Results, and Supplementary Note

  2. Reporting Summary

  3. Supplementary Table 1

    DeTiN estimates and SSNV recovery data from in silico simulations

  4. Supplementary Table 2

    DeTiN estimates and SSNV recovery data from in vitro simulations

  5. Supplementary Table 3

    SSNV data and TiN estimates derived from mutation calling with CD19 selected normal samples

  6. Supplementary Table 4

    DeTiN's TiN estimates for 1,419 cases with tumor-adjacent-tissue normal samples derived from TCGA and SIGMA sample cohorts

  7. Supplementary Table 5

    DeTiN annotated allelic copy data supporting multiple TiN contamination rates in adjacent tissue normals

  8. Supplementary Table 6

    List of command-line arguments for deTiN

  9. Supplementary Software

    DeTiN source code

About this article

DOI

https://doi.org/10.1038/s41592-018-0036-9
