High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants



Accurate identification of tumor-derived somatic variants in plasma circulating cell-free DNA (cfDNA) requires understanding of the various biological compartments contributing to the cfDNA pool. We sought to define the technical feasibility of a high-intensity sequencing assay of cfDNA and matched white blood cell DNA covering a large genomic region (508 genes; 2 megabases; >60,000× raw depth) in a prospective study of 124 patients with metastatic cancer, with contemporaneous matched tumor tissue biopsies, and 47 controls without cancer. The assay displayed high sensitivity and specificity, allowing for de novo detection of tumor-derived mutations and inference of tumor mutational burden, microsatellite instability, mutational signatures and sources of somatic mutations identified in cfDNA. The vast majority of cfDNA mutations (81.6% in controls and 53.2% in patients with cancer) had features consistent with clonal hematopoiesis. This cfDNA sequencing approach revealed that clonal hematopoiesis constitutes a pervasive biological phenomenon, emphasizing the importance of matched cfDNA–white blood cell sequencing for accurate variant interpretation.


Fig. 1: Assay workflow and reproducibility.
Fig. 2: Concordance of cfDNA variants with tumor biopsy.
Fig. 3: TMB and mutational signatures derived from cfDNA targeted assay.
Fig. 4: Characterization of biological sources and composition of cfDNA variants.
Fig. 5: Characterization of WBC variants.

Data availability

The assembled prospective somatic mutational data from cfDNA, WBCs and tumors for the entire cohort are provided as Supplementary Tables 11–13. The raw cfDNA and WBC sequencing data have been deposited in the European Genome-phenome Archive under accession number EGAS00001003755. All code and scripts are available for academic use at https://github.com/ndbrown6/MSK-GRAIL-TECHVAL.


References

  1. Stroun, M., Anker, P., Lyautey, J., Lederrey, C. & Maurice, P. A. Isolation and characterization of DNA from the plasma of cancer patients. Eur. J. Cancer Clin. Oncol. 23, 707–712 (1987).
  2. Leon, S. A., Shapiro, B., Sklaroff, D. M. & Yaros, M. J. Free DNA in the serum of cancer patients and the effect of therapy. Cancer Res. 37, 646–650 (1977).
  3. Diaz, L. A. Jr & Bardelli, A. Liquid biopsies: genotyping circulating tumor DNA. J. Clin. Oncol. 32, 579–586 (2014).
  4. Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).
  5. Lanman, R. B. et al. Analytical and clinical validation of a digital sequencing panel for quantitative, highly accurate evaluation of cell-free circulating tumor DNA. PLoS ONE 10, e0140712 (2015).
  6. Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 8, 1324 (2017).
  7. Aravanis, A. M., Lee, M. & Klausner, R. D. Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168, 571–574 (2017).
  8. Acuna-Hidalgo, R. et al. Ultra-sensitive sequencing identifies high prevalence of clonal hematopoiesis-associated mutations throughout adult life. Am. J. Hum. Genet. 101, 50–64 (2017).
  9. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
  10. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
  11. Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA 106, 19096–19101 (2009).
  12. Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).
  13. Rothwell, D. G. et al. Utility of ctDNA to support patient selection for early phase clinical trials: the TARGET study. Nat. Med. 25, 738–743 (2019).
  14. Przybyl, J. et al. Combination approach for detecting different types of alterations in circulating tumor DNA in leiomyosarcoma. Clin. Cancer Res. 24, 2688–2699 (2018).
  15. Parikh, A. R. et al. Liquid versus tissue biopsy for detecting acquired resistance and tumor heterogeneity in gastrointestinal cancers. Nat. Med. 25, 1415–1421 (2019).
  16. Risques, R. A. & Kennedy, S. R. Aging and the rise of somatic cancer-associated mutations in normal tissues. PLoS Genet. 14, e1007108 (2018).
  17. Steensma, D. P. et al. Clonal hematopoiesis of indeterminate potential and its distinction from myelodysplastic syndromes. Blood 126, 9–16 (2015).
  18. Bowman, R. L., Busque, L. & Levine, R. L. Clonal hematopoiesis and evolution to hematopoietic malignancies. Cell Stem Cell 22, 157–170 (2018).
  19. Busque, L., Buscarlet, M., Mollica, L. & Levine, R. L. Concise review: age-related clonal hematopoiesis: stem cells tempting the Devil. Stem Cells 36, 1287–1294 (2018).
  20. Coombs, C. C. et al. Therapy-related clonal hematopoiesis in patients with non-hematologic cancers is common and associated with adverse clinical outcomes. Cell Stem Cell 21, 374–382.e4 (2017).
  21. Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).
  22. Jaiswal, S. et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 377, 111–121 (2017).
  23. Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
  24. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
  25. Phallen, J. et al. Direct detection of early-stage cancers using circulating tumor DNA. Sci. Transl. Med. 9, eaan2415 (2017).
  26. Gillis, N. K. et al. Clonal haemopoiesis and therapy-related myeloid malignancies in elderly patients: a proof-of-concept, case-control study. Lancet Oncol. 18, 112–121 (2017).
  27. Liu, J. et al. Biological background of the genomic variations of cf-DNA in healthy individuals. Ann. Oncol. 30, 464–470 (2018).
  28. Hu, Y. et al. False-positive plasma genotyping due to clonal hematopoiesis. Clin. Cancer Res. 24, 4437–4443 (2018).
  29. Janku, F. et al. Development and validation of an ultradeep next-generation sequencing assay for testing of plasma cell-free DNA from patients with advanced cancer. Clin. Cancer Res. 23, 5648–5656 (2017).
  30. Thompson, J. C. et al. Detection of therapeutically targetable driver and resistance mutations in lung cancer patients by next-generation sequencing of cell-free circulating tumor DNA. Clin. Cancer Res. 22, 5772–5782 (2016).
  31. Guibert, N. et al. Amplicon-based next-generation sequencing of plasma cell-free DNA for detection of driver and resistance mutations in advanced non-small cell lung cancer. Ann. Oncol. 29, 1049–1055 (2018).
  32. Sacher, A. G. et al. Prospective validation of rapid plasma genotyping for the detection of EGFR and KRAS mutations in advanced lung cancer. JAMA Oncol. 2, 1014–1022 (2016).
  33. Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
  34. Cheng, D. T. et al. Memorial Sloan Kettering-integrated mutation profiling of actionable cancer targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J. Mol. Diagn. 17, 251–264 (2015).
  35. Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016).
  36. Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011).
  37. Razavi, P. et al. The genomic landscape of endocrine-resistant advanced breast cancers. Cancer Cell 34, 427–438.e6 (2018).
  38. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  39. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
  40. Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).
  41. Polak, P. et al. A mutational signature reveals alterations underlying deficient homologous recombination repair in breast cancer. Nat. Genet. 49, 1476–1486 (2017).
  42. Gerhauser, C. et al. Molecular evolution of early-onset prostate cancer identifies molecular risk markers and clinical trajectories. Cancer Cell 34, 996–1011.e8 (2018).
  43. De Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, 251–256 (2014).
  44. Le, D. T. et al. PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med. 372, 2509–2520 (2015).
  45. Merker, J. D. et al. Circulating tumor DNA analysis in patients with cancer: American Society of Clinical Oncology and College of American Pathologists joint review. Arch. Pathol. Lab. Med. 142, 1242–1253 (2018).
  46. Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).
  47. Schultheis, A. M. et al. Massively parallel sequencing-based clonality analysis of synchronous endometrioid endometrial and ovarian carcinomas. J. Natl Cancer Inst. 108, djv427 (2016).
  48. Hsu, J. I. et al. PPM1D mutations drive clonal hematopoiesis in response to cytotoxic chemotherapy. Cell Stem Cell 23, 700–713.e6 (2018).
  49. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).
  50. Dawson, S. J. et al. Analysis of circulating tumor DNA to monitor metastatic breast cancer. N. Engl. J. Med. 368, 1199–1209 (2013).
  51. Chabon, J. J. et al. Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients. Nat. Commun. 7, 11815 (2016).
  52. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).
  53. Swanton, C. et al. Prevalence of clonal hematopoiesis of indeterminate potential (CHIP) measured by an ultra-sensitive sequencing assay: exploratory analysis of the Circulating Cancer Genome Atlas (CCGA) study. J. Clin. Oncol. 36, 12003 (2018).
  54. Mansukhani, S. et al. Ultra-sensitive mutation detection and genome-wide DNA copy number reconstruction by error-corrected circulating tumor DNA sequencing. Clin. Chem. 64, 1626–1635 (2018).
  55. Shen, R. & Seshan, V. E. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 44, e131 (2016).
  56. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
  57. Ulmert, D. et al. A novel automated platform for quantifying the extent of skeletal tumour involvement in prostate cancer patients using the Bone Scan Index. Eur. Urol. 62, 78–84 (2012).
  58. Armstrong, A. J. et al. Phase 3 assessment of the automated Bone Scan Index as a prognostic imaging biomarker of overall survival in men with metastatic castration-resistant prostate cancer: a secondary analysis of a randomized clinical trial. JAMA Oncol. 4, 944–951 (2018).
  59. Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
  60. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
  61. Chang, M. T. et al. Accelerating discovery of functional mutant alleles in cancer. Cancer Discov. 8, 174–183 (2018).
  62. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  63. Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).
Acknowledgements


We thank the following GRAIL and Memorial Sloan Kettering Cancer Center associates for helpful discussions and contributions to this body of work: M. Berger, N. Schultz, C. Bain, M. Chung, M. Eriksen, T. Liu, R. Mauntz, A. Mich, J. Nguyen, Y. Park, S. Ramani, E. Scott, K. Shashidhar, C. Tom, S. Wen, D. Reales, J. Galle, R. Cambria and members of the Memorial Sloan Kettering Office of Clinical Research. This work was supported by GRAIL and National Institutes of Health awards P30 CA008748 (Memorial Sloan Kettering Cancer Center) and R01 CA234361 (D.B.S.), the Breast Cancer Alliance Young Investigator Award (P.R.), the Breast Cancer Research Foundation (J.S.R.-F.), and Congressionally Directed Medical Research Programs W81XWH-15-1-0547 (J.S.R.-F.) and GC229671 (J.S.R.-F.).

Author information

P.R., B.T.L., D.B.S., A.M.A. and J.S.R.-F. conceived the study. P.R., B.T.L., B.J., W.A., K.J., C.H., A.A., R.V.S., Q.L., L.S., N.E., J.Y., H.X., M.P.H., A.S.-Z., W.F.N., J.M.I., V.W.R., G.P., M. Ladanyi, A.S., A.S.H., M. Lee, D.M.H., D.R.J., M.M., G.J.R., H.I.S., C.M.R., M.E.R., L.A.D., D.B.S. and A.M.A. acquired the data. P.R., D.N.B., E.H., R.S., I.D.B., O.V., R.L., T.M., Q.L., A.W.B., A.M.A. and J.S.R.-F. analyzed and interpreted the data. P.R., D.N.B., E.H., R.S., I.D.B., O.V., S.G., A.W.B., A.M.A. and J.S.R.-F. performed the bioinformatics and genomic analyses. P.R., D.N.B., E.H., M.P.H., A.M.A. and J.S.R.-F. wrote the manuscript, with input from all authors. All authors reviewed and approved the manuscript.

Correspondence to Pedram Razavi or Jorge S. Reis-Filho.

Ethics declarations

Competing interests

P.R. reports consulting and serving on the advisory board for Novartis, as well as receiving institutional research support from Illumina and GRAIL. B.T.L. reports consulting and serving on the advisory board for Genentech, Thermo Fisher Scientific, Guardant Health, Hengrui Therapeutics, Mersana Therapeutics and Biosceptre Australia, as well as receiving institutional research support from Illumina, GRAIL, Genentech and AstraZeneca. W.A. reports consulting and advising for Clovis Oncology, Janssen, ORIC Pharmaceuticals and MORE Health, as well as receiving honoraria from CARET, institutional research support from AstraZeneca, Zenith Epigenetics, Clovis Oncology and GlaxoSmithKline, and travel, accommodation and expenses from GlaxoSmithKline and Clovis Oncology. J.M.I. holds equity in LumaCyte and has received institutional research support from GRAIL and Guardant Health. G.P. is on the Scientific Advisory Board for Tizona Therapeutics and has consulted for Merck, Bristol-Myers Squibb and Kyowa Hakko Kirin. D.M.H. reports stock and other ownership interests in Fount, as well as consulting and advising for Chugai Pharmaceutical, Boehringer Ingelheim, AstraZeneca, Pfizer, Bayer, Genentech and Fount. He has also received research funding from AstraZeneca, Puma Biotechnology, Loxo and Bayer, and travel, accommodation and expenses from Genentech and Chugai Pharmaceutical. G.J.R. received consulting fees from Genentech/Roche in 2016, as well as institutional research support for clinical research from Pfizer, Roche/Genentech and Takeda. C.M.R. has consulted on oncology drug development for AbbVie, Amgen, Ascentage, AstraZeneca, Bicycle, Celgene, Chugai, Daiichi Sankyo, Genentech/Roche, G1 Therapeutics, Loxo, Novartis, Pharmamar and Seattle Genetics. He is also on the scientific advisory boards of Harpoon Therapeutics and Elucida Oncology. L.A.D. is a member of the board of directors of Personal Genome Diagnostics (PGDx) and Jounce Therapeutics, and is a paid consultant for PGDx and NeoPhore. He is also an uncompensated consultant for Merck, but has received research support for clinical trials from Merck. At Johns Hopkins University, he is an inventor of multiple licensed patents related to technology used for circulating tumor DNA analyses and mismatch repair deficiency for diagnosis and therapy. Some of these licenses and relationships are associated with equity or royalty payments made directly to L.A.D. and Johns Hopkins University. He also holds equity in PGDx, Jounce Therapeutics, Thrive Earlier Detection and NeoPhore. His spouse holds equity in Amgen. The terms of all of these arrangements are being managed by Johns Hopkins University and the Memorial Sloan Kettering Cancer Center in accordance with their conflict of interest policies. D.B.S. received honoraria and/or consulted for Pfizer, Loxo Oncology, Illumina, Intezyne and Vividion Therapeutics. J.S.R.-F. reports receiving personal/consultancy fees from VolitionRx, Paige.AI, Goldman Sachs, REPARE Therapeutics, GRAIL, Ventana Medical Systems, Roche, Genentech and InviCRO outside of the scope of the submitted work. B.J., E.H., C.H., O.V., T.M., S.G., R.V.S., Q.L., L.S., N.E., J.Y., A.W.B., M. Lee, A.S., H.X., M.P.H., W.F.N. and A.M.A. are (or were) GRAIL employees and hold stock and/or other ownership interests in GRAIL. A.W.B. additionally reports Foresite ownership interest. The other authors declare no competing interests.

Additional information

Peer review information: Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Study overview.

Patient enrollment, inclusion and evaluable group are defined in the blue boxes. Detailed clinical, tissue and cfDNA exclusions are shown in the gray boxes.

Extended Data Fig. 2 Comparison of sequence depth and raw error rate distributions across cancer cohorts (n=124) and non-cancer controls (n=47).

(a) Comparison of deduplicated and uncollapsed mean target sequence depth between cfDNA and WBC. The p values were obtained using paired two-sided Mann-Whitney U-tests comparing cfDNA against WBC. (b) Deduplicated and collapsed mean target sequence depth in cfDNA and WBC between the different cancer cohorts and non-cancer controls. (c) Association between the amount of cfDNA used for library preparation and the mean target deduplicated and collapsed sequencing depth. The diagonal line represents a linear regression with 99% confidence intervals. The p value was obtained using an F-test. (d) Distribution of mean target deduplicated and collapsed sequencing depth across the different cohorts. (e,f) Comparison of (e) raw substitution error rate and (f) raw substitution and indel error rate across the different cohorts. In (b) and (d–f), the p values were obtained from pairwise comparisons using two-sided Mann-Whitney U-tests and adjusted for multiple testing using the Bonferroni method. In (e), the substitution error rate represents the percentage of collapsed bases with a non-reference base. Similarly, in (f), the combined error rate represents the percentage of collapsed bases with a non-reference base or an indel. In all panels, the cohorts consist of n = 39 MBC, n = 41 NSCLC and n = 44 CRPC patients and n = 47 non-cancer controls. In (a,b) and (d–f), the horizontal bars indicate the median and the boxes represent the interquartile range (IQR). The whiskers extend to 1.5 × IQR on either side.
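
The Bonferroni adjustment named in this caption can be illustrated with a minimal sketch. The p values below are invented placeholders rather than results from the study, and `bonferroni_adjust` is a hypothetical helper name, not a function from the authors' code.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for the six pairwise comparisons that arise
# from four cohorts (MBC, NSCLC, CRPC, controls). Illustrative only.
raw = [0.001, 0.02, 0.04, 0.2, 0.5, 0.9]
adjusted = bonferroni_adjust(raw)

for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw={p_raw:.3f}  adjusted={p_adj:.3f}")
```

With four cohorts there are six pairwise tests, so a raw p value has to fall below 0.05 / 6 to remain below 0.05 after adjustment.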

Extended Data Fig. 3 Hierarchical Bayesian model for calibrated analysis of somatic cfDNA variants and performance assessment.

(a,b) Plate models showing the hierarchy of statistical relationships for (a) single nucleotide variants and (b) small insertions and deletions influencing the observed quantity of alternate alleles \(y_{np}\) in each sample n at each position p, conditional on the latent parameters μ (the rate of events), θ (the type of event), α and β, as well as on fixed covariates \(x_p\) (of X types) such as trinucleotide context and, separately, the depth of sequencing at a position (\(d_p\)). Note that insertions and deletions have additional complexity, as one must account for the length of the insertion or deletion event in the model: insertions and deletions of differing lengths have differing probabilities. The model was fitted to the training data consisting of n = 43 unrelated non-cancer controls; estimates for the parameters were then fixed and applied to new samples for scoring. (c,d) The posterior distributions of site-specific \(\lambda_p\) (\(\mu_p d_p\)) were summarized by their mean \(\mu_p\) and displayed for a subset of representative sites in (c) by type of mutation and (d) by trinucleotide context. In both panels, the midpoint indicates the mean and the vertical bars represent the 95% Gaussian confidence limits based on the t-distribution. (e) Estimated \(\mu_p\) against the observed \(\lambda_p\) for samples in the training set. Note that the data points at the bottom are all positions p with non-zero mean posterior \(\mu_p\) and zero observed alternate allele counts. (f) Comparison of the estimated probability of observing an event (x-axis) with the actual empirical probability of observing such an event (y-axis). The plot was calibrated based on estimates of \(\mu_p\) on chromosome 21. Note that the initial sharp rise reflects the number of sites with zero observed alternate allele counts, whilst the excess of low-probability events at the other end reflects the difficulty of stringently filtering out rare biological events such as clonal hematopoiesis. (g) Mean number of variants detected in healthy control individuals (x-axis) against the recall rate of biopsy-matched variants (y-axis) for the different cancer types. At Q60, one can expect one false positive per million bases. Here, to exclude potentially CH-derived variants, a fixed threshold of 0.8 on the posterior probability of detected variants originating from cfDNA (i.e. PGTKXGDNA) was adopted. (h) Mean number of variants detected in healthy control individuals (x-axis) against the recall rate of biopsy-matched variants (y-axis) at different probabilities for allowing variants to be assigned to cfDNA. The thresholds displayed were obtained by cross-validation, holding out each cancer type and selecting a threshold that retains most of the biopsy-matched variants whilst still filtering out variants of potential hematopoietic origin. Here, to exclude variants potentially due to noise, a fixed threshold of Q60 was adopted.
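
The statement that Q60 corresponds to one false positive per million bases is consistent with a Phred-style quality scale. The sketch below assumes that convention, Q = -10 log10(p). The caption does not spell out the formula, and the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)


# Expected false positives per megabase at the Q60 threshold,
# assuming each base is an independent opportunity for a false call.
expected_fp_per_mb = phred_to_error_prob(60) * 1_000_000
print(expected_fp_per_mb)
```

Under this assumption, lowering the threshold by 10 units (for example to Q50) would allow roughly ten times as many false positives per megabase.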

Extended Data Fig. 4 Reproducibility of the high-intensity DNA assay.

Six patient samples were selected for processing using two versions of the assay protocol (V1 and V2). These are labelled Replicate 1 and Replicate 2. A subset of three samples was further retested using version V2 and labelled Replicate 3. The panels illustrate the pairwise comparisons of measured VAF between all available replicates for each patient. In all panels, the variants are shape coded based on their origin (whether they were also detected in the matched tumor biopsy) and color coded according to their category (whether they were detected in both replicates and whether they were assigned to similar source categories, i.e. VUSO, WBC-matched or noise). In all panels, the samples are labelled on top.

Extended Data Fig. 5 Top mutated genes carrying VUSO and 96 base substitution profiles of ten hypermutated cfDNA samples.

(a) Frequency of genomic alterations in cfDNA of 47 non-cancer controls and 124 cancer patients. The genes were sorted by their frequency of alterations in the tumor. The colors indicate whether the alterations were biopsy-matched, detected in the tumor but below the threshold of the MSK-IMPACT assay (biopsy-subthreshold), or were specific to cfDNA (i.e. variants of unknown source, VUSO). (b) Correlation of the number of VUSO per gene and per patient (y-axis) in the ten hypermutated and 114 non-hypermutated cancer patients against the length of the coding region sequenced (x-axis) of each target gene. (c–e) Heat maps showing the top mutated genes harboring somatic variants detected in plasma cfDNA that are neither tumor-matched (biopsy-matched or subthreshold) nor WBC-matched across each cohort in (c) 47 non-cancer controls, (d) 114 non-hypermutated and (e) 10 hypermutated cancer patients. The numbers in the cells indicate the number of patients. (f) 96 base substitution profiles of the 10 hypermutated patients. For each patient, the number of C>A, C>G, C>T, T>A, T>C and T>G substitutions together with the sequence context immediately 3′ and 5′ are expressed as a percentage of the total number of substitutions.
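
The 96 base substitution profile in panel (f) can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (a reference trinucleotide plus the alternate base) and all function names are hypothetical, and purine-reference substitutions are folded onto the pyrimidine strand, which is the usual convention for the six substitution classes listed in the caption.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                 ("T", "A"), ("T", "C"), ("T", "G")]

# 6 substitution classes x 16 flanking contexts = 96 channels.
ALL_CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alt in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(trinuc, alt):
    """Return a pyrimidine-centred label such as 'A[C>T]G'."""
    if trinuc[1] in "AG":  # purine reference: describe the other strand
        trinuc, alt = revcomp(trinuc), COMPLEMENT[alt]
    return f"{trinuc[0]}[{trinuc[1]}>{alt}]{trinuc[2]}"


def profile(mutations):
    """Percentage of substitutions falling in each of the 96 channels."""
    counts = Counter(channel(trinuc, alt) for trinuc, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        return {ch: 0.0 for ch in ALL_CHANNELS}
    return {ch: 100.0 * counts[ch] / total for ch in ALL_CHANNELS}


# Hypothetical substitutions: (reference trinucleotide, alternate base).
example = [("ACG", "T"), ("CGT", "A"), ("TTA", "C"), ("ACG", "T")]
prof = profile(example)
```

In this toy input, the `CGT` to `CAT` substitution is reported on the opposite strand as `A[C>T]G`, so it is counted together with the two `ACG` to `ATG` events.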

Extended Data Fig. 6 Characterization of biological sources and composition of cfDNA variants.

(a) The bar plots show the number of somatic variants detected in plasma cfDNA per megabase (Mb, y-axis) for each sample (x-axis), stratified by cancer status and biological sources and ordered by increasing number of somatic WBC-matched variants. The panels show control samples (top left) and patients with MBC (top right), NSCLC (bottom left) and CRPC (bottom right). The colors indicate WBC-matched variants, tumor biopsy-matched variants, biopsy-subthreshold and VUSO. (b) Top mutated genes carrying WBC-matched variants for each cohort. The numbers in the cells indicate the overall number of variants for each gene in the corresponding cohort. In (a,b), the cohorts consist of n = 39 MBC, n = 41 NSCLC and n = 44 CRPC patients. Additionally, in (a), n = 47 non-cancer controls are shown. (c,d) Distribution of variant allele fractions (VAFs) of somatic mutations detected in cfDNA and WBC using the high-intensity sequencing assay, where variants are color coded according to source of origin. Somatic variants are displayed for n = 114 non-hypermutated cancer patients and n = 47 non-cancer controls. The allelic (AD) and total (DP) depths are obtained from raw pileups without base alignment quality (BAQ) filtering. In (c), the VAF is smoothed with added pseudocounts to AD and DP such that \(AD^\prime = AD + 2\) and \(DP^\prime = DP + 4\). In (d), variants detected with zero AD in WBC were displayed as 0.01% VAF in WBC due to the logarithmic scaled axes.
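
The pseudocount smoothing in (c) and the display floor in (d) can be written out as a short sketch. The function names are hypothetical, and the depths in the example are invented for illustration.

```python
def smoothed_vaf(ad, dp):
    """VAF with pseudocounts: AD' = AD + 2 and DP' = DP + 4, as in panel (c)."""
    return (ad + 2) / (dp + 4)


def display_vaf_percent(ad, dp, floor_percent=0.01):
    """Raw VAF in percent, with zero-AD variants shown at 0.01% as in panel (d)."""
    if ad == 0:
        return floor_percent
    return 100.0 * ad / dp


# Hypothetical variant with 5 supporting reads out of 996 total.
print(smoothed_vaf(5, 996))         # (5 + 2) / (996 + 4)
print(display_vaf_percent(0, 996))  # floored so it can sit on a log axis
```

The pseudocounts pull estimates from very shallow positions towards 0.5 and have little effect at the depths typical of this assay. The floor exists only because zero cannot be placed on a logarithmic axis.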

Extended Data Fig. 7 Somatic mutations occurring at high sequencing depth in cfDNA.

Somatic mutations detected at sequencing depth >10,000 in cfDNA occur mostly in hypermutated samples and are related to sample-level mean target collapsed depth, which is itself a function of the amount of input DNA used for library preparation. (a) Number of somatic mutations occurring at >10,000 sequence depth (n = 215) per patient, categorized into WBC-matched, VUSO or Tumor-matched, where the latter category is composed of Biopsy-matched and Biopsy-subthreshold mutations. (b) Variant-level collapsed depth for all somatic mutations detected in cfDNA, categorized into Tumor-matched, VUSO or WBC-matched and grouped according to the amount of input DNA used for library preparation. (c) Variant-level collapsed depth for all somatic mutations detected in cfDNA against sample-level mean collapsed target depth. (d) Variant-level collapsed depth for all somatic mutations against modeled VAF in cfDNA. 121 of 215 (56.3%) somatic mutations detected at sequencing depth >10,000 in cfDNA occurred in the hypermutated patient MSK-VB-0023. (e,f) Log2 ratios of (e) tumor biopsy and (f) cfDNA of patient MSK-VB-0023. The tumor biopsy and cfDNA showed similar copy number alterations (i.e. 1q+ and 16q-). No high-level copy number amplifications that could explain the high sequencing depth were observed in either the tumor biopsy or the cfDNA. Three replicate sequencing runs of cfDNA and WBC were available for that patient. (g,h) Pairwise comparisons of VAF for the 121 mutations detected at depth >10,000 using version V1 of the assay. In (a), ‘1’ denotes hypermutated samples. In (b), the midpoint indicates the median whilst the violins extend to the full range of the data. In (b–d), the sequencing depths of somatic variants for the cohort of n = 124 cancer patients are shown. In (e,f), the grey points represent the raw Log2 ratios and are ordered according to their genomic coordinates. The solid red lines indicate the segmented values. In (g,h), the variants are shape coded based on their origin (whether they were also detected in the matched tumor biopsy) and color coded according to their category (whether they were called in both replicates and assigned to similar source categories, namely VUSO, WBC-matched or noise).

Extended Data Fig. 8 Characterization of CH derived variants through direct analysis of WBC.

(a) CH-related somatic mutations in the top 14 mutated genes across the 114 non-hypermutated cancer patients and 47 non-cancer controls together with the marginal frequencies by patient (top) and by gene (right). DNMT3A, TET2 and PPM1D are the top mutated genes in WBC and harbor multiple hits (i.e. two or more mutations per patient). (b) Clustering within genes of CH-derived mutations detected in WBC. The clusters and associated p values were computed using a modification of OncodriveCLUST (ref. 63), which assumes the number of mutations in clusters follows a Poisson distribution. The resulting p values are two-sided. (c,d) Distribution of mutations according to genomic coordinates in (c) PPM1D and (d) DNMT3A. Mutations detected in PPM1D are clustered in the C-terminus of the protein.

Extended Data Fig. 9 Copy number profile derived from cfDNA of non-cancer controls and cancer patients.

(a,b) Log2 ratios estimated from the cfDNA of (a) n = 24 female and (b) n = 23 male control individuals. For each individual, the raw Log2 ratios were smoothed using a cubic spline. The two panels show the superimposed splines for all control samples according to gender. (c–e) Log2 ratios of tumor biopsies (top panels) and their corresponding matched cfDNA (bottom panels) for three cases, (c) MSK-VB-0008, (d) MSK-VL-0056 and (e) MSK-VP-0004, where amplification of CCND1, FGFR1, EGFR and a homozygous deletion of BRCA2 were reported, respectively. The arrows point to the reported amplifications or deletions. The segmented Log2 ratios were used to compute the Pearson correlation coefficient comparing segments overlapping >75% in the tumor biopsies and cfDNA samples. In (a–e), the Log2 ratios are displayed according to their genomic coordinates. In (c–e), the grey dots show the raw estimates while the red lines represent the segmented values. (f) The association of Pearson’s r with the ctDNA fraction and purity of the tumor biopsies. The cohort consists of n = 124 cancer patients with paired tumor biopsy and cfDNA samples. The p values were obtained using a permutation-based one-sided Jonckheere-Terpstra test for increasing Pearson’s r with ctDNA fraction or tumor purity. The horizontal bars indicate the median and the boxes represent the interquartile range (IQR). The whiskers extend to 1.5 × IQR on either side. NE, not evaluable.
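
The comparison of segmented Log2 ratios between tumor biopsy and cfDNA can be sketched as below. The caption does not say how the >75% overlap is measured, so this sketch assumes it is the shared length divided by the length of the longer segment. The segment tuples, values and function names are all hypothetical.

```python
import math


def overlap_fraction(a, b):
    """Shared length over the longer segment's length (an assumed definition)."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return shared / max(a[1] - a[0], b[1] - b[0])


def matched_log2_pairs(tumor, cfdna, min_overlap=0.75):
    """Pair up same-chromosome segments whose overlap exceeds min_overlap."""
    pairs = []
    for t_chrom, t_start, t_end, t_log2 in tumor:
        for c_chrom, c_start, c_end, c_log2 in cfdna:
            if t_chrom != c_chrom:
                continue
            if overlap_fraction((t_start, t_end), (c_start, c_end)) > min_overlap:
                pairs.append((t_log2, c_log2))
    return pairs


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical segments: (chromosome, start, end, segmented log2 ratio).
tumor = [("1", 0, 100, 0.5), ("1", 100, 200, -0.4),
         ("2", 0, 300, 0.0), ("3", 0, 100, 0.8)]
cfdna = [("1", 0, 90, 0.2), ("1", 110, 200, -0.1),
         ("2", 0, 300, 0.05), ("3", 50, 200, 0.3)]

pairs = matched_log2_pairs(tumor, cfdna)
r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
```

In this toy input the chromosome 3 segments share only a third of the longer segment, so they are excluded before the correlation is computed. The cfDNA values are smaller in magnitude than the tumor values, as would be expected when tumor-derived DNA is diluted, yet the correlation stays high because Pearson's r is insensitive to that scaling.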

Extended Data Fig. 10 Comparison of copy number alterations in tumor biopsies and matched cfDNA samples.

(a) Heatmap of all genes where an amplification or a homozygous deletion was found in either the tumor biopsy or cfDNA. The samples are interleaved (i.e. tumor biopsy and cfDNA) and represented along the rows, whilst genes are ordered in columns relative to their genomic coordinates. (b,c) Receiver operating characteristic curves comparing (b) copy number amplifications and (c) homozygous deletions detected in the tumor biopsies with the absolute copy numbers inferred in cfDNA. Each tumor-cfDNA sample pair was used to construct individual curves. These were averaged after fitting a local polynomial regression and estimating the sensitivities over fixed intervals of specificities. In (a–c), only tumor-cfDNA sample pairs from n = 49 patients with ctDNA fraction >10% were used. (d) Four MBC patients (MSK-VB-0006, MSK-VB-0044, MSK-VB-0059 and MSK-VB-0069) with a reported amplification of ERBB2 on chromosome 17q are shown together with one NSCLC patient, MSK-VL-0044, with a reported MET amplification on chromosome 7q. The tumor biopsies are displayed on the left and the matched cfDNA are shown on the right together with the corresponding chromosome ideogram. The genomic coordinates of ERBB2 and MET are displayed by orange arrows and labelled accordingly.

Supplementary information

Supplementary Information

Supplementary Methods, Fig. 1 and Tables 1, 2, 4, 5 and 7–10

Reporting Summary

Supplementary Tables

Supplementary Tables 3, 6, 11, 12 and 13

Supplementary Data

Source Data Supplementary Fig. 1

Source data


Cite this article

Razavi, P., Li, B.T., Brown, D.N. et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat Med 25, 1928–1937 (2019) doi:10.1038/s41591-019-0652-7
