Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring

Abstract

In many areas of oncology, we lack sensitive tools to track low-burden disease. Although cell-free DNA (cfDNA) shows promise in detecting cancer mutations, we found that the combination of low tumor fraction (TF) and limited number of DNA fragments restricts low-disease-burden monitoring through the prevailing deep targeted sequencing paradigm. We reasoned that breadth may supplant depth of sequencing to overcome the barrier of cfDNA abundance. Whole-genome sequencing (WGS) of cfDNA allowed ultra-sensitive detection, capitalizing on the cumulative signal of thousands of somatic mutations observed in solid malignancies, with TF detection sensitivity as low as 10⁻⁵. The WGS approach enabled dynamic tumor burden tracking and postoperative residual disease detection, associated with adverse outcome. Thus, we present an orthogonal framework for cfDNA cancer monitoring via genome-wide mutational integration, enabling ultra-sensitive detection, overcoming the limitation of cfDNA abundance and empowering treatment optimization in low-disease-burden oncology care.
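The breadth-versus-depth argument can be illustrated with a simple binomial model. The sketch below is not the authors' model: it assumes every read covering a tumor mutation site is tumor-derived with probability equal to TF, and it ignores sequencing error, zygosity and copy number. The function name `detection_probability` and the example values (10,000 unique fragments at one targeted site; 10,000 mutation sites at 35× coverage) are illustrative assumptions.

```python
import math


def detection_probability(tumor_fraction, n_sites, depth):
    """P(at least one tumor-derived read) under a simple binomial model.

    Assumption (not from the paper): each of the n_sites * depth reads
    covering a mutation site is tumor-derived with probability tumor_fraction,
    independently of the others, and sequencing error is ignored.
    """
    if not 0.0 <= tumor_fraction < 1.0:
        raise ValueError("tumor_fraction must be in [0, 1)")
    n_observations = n_sites * depth
    # 1 - (1 - TF)^n, computed with log1p/expm1 for numerical stability.
    return -math.expm1(n_observations * math.log1p(-tumor_fraction))


# One targeted site sequenced from 10,000 unique fragments (illustrative).
deep_targeted = detection_probability(1e-5, n_sites=1, depth=10_000)

# 10,000 somatic mutation sites at 35x genome-wide coverage (illustrative).
genome_wide = detection_probability(1e-5, n_sites=10_000, depth=35)

print(f"deep targeted: {deep_targeted:.3f}, genome-wide: {genome_wide:.3f}")
```

Under these assumptions, the number of informative observations is the product of sites and depth, so adding sites compensates for a fixed, limited number of cfDNA fragments per site.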

Fig. 1: Low cfDNA input material limits sensitive ctDNA mutation detection with deep targeted sequencing but can be overcome by genome-wide mutational integration.
Fig. 2: Patient-specific genome-wide SNV integration provides ultra-sensitive ctDNA detection and precision TF estimation.
Fig. 3: Patient-specific genome-wide CNA integration provides ultra-sensitive ctDNA detection and precision TF estimation.
Fig. 4: Detection of ctDNA using MRDetect in melanoma during immunotherapy and colon cancer postoperatively.
Fig. 5: Detection of ctDNA using MRDetect in LUAD preoperatively and postoperatively.

Data availability

Sequence data has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001004406.

Code availability

The analytic code used for this work is provided for non-commercial use at: https://ctl.cornell.edu/technology/mrdetect-licence-request.

References

1. Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).
2. Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).
3. Sadeh, R. et al. ChIP-seq of plasma cell-free nucleosomes identifies cell-of-origin gene expression programs. Preprint at https://www.biorxiv.org/content/10.1101/638643v1 (2019).
4. Moss, J. et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat. Commun. 9, 5068 (2018).
5. Shen, S. Y. et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature 563, 579–583 (2018).
6. Snyder, M. W., Kircher, M., Hill, A. J., Daza, R. M. & Shendure, J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164, 57–68 (2016).
7. Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019).
8. Wang, S. et al. Potential clinical significance of a plasma-based KRAS mutation analysis in patients with advanced non-small cell lung cancer. Clin. Cancer Res. 16, 1324–1330 (2010).
9. Kobayashi, S. et al. EGFR mutation and resistance of non-small-cell lung cancer to gefitinib. N. Engl. J. Med. 352, 786–792 (2005).
10. Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 8, 1324 (2017).
11. Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).
12. Diehl, F. et al. Circulating mutant DNA to assess tumor dynamics. Nat. Med. 14, 985–990 (2008).
13. Sozzi, G. et al. O-297 quantification of free circulating DNA as a diagnostic marker in lung cancer. Lung Cancer 41, S86–S87 (2003).
14. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).
15. Wang, Y. et al. Prognostic potential of circulating tumor DNA measurement in postoperative surveillance of nonmetastatic colorectal cancer. JAMA Oncol. 5, 1118–1123 (2019).
16. van Wezel, E. M. et al. Whole-genome sequencing identifies patient-specific DNA minimal residual disease markers in neuroblastoma. J. Mol. Diagn. 17, 43–52 (2015).
17. Abbosh, C. et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545, 446–451 (2017).
18. Phallen, J. et al. Direct detection of early-stage cancers using circulating tumor DNA. Sci. Transl. Med. 9, eaan2415 (2017).
19. Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).
20. Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016).
21. Kennedy, S. R. et al. Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protoc. 9, 2586–2606 (2014).
22. Campbell, B. B. et al. Comprehensive analysis of hypermutation in human cancer. Cell 171, 1042–1056.e10 (2017).
23. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
24. Spinella, J.-F. et al. SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing. BMC Genomics 17, 912 (2016).
25. Taylor, A. M. et al. Genomic and functional approaches to understanding cancer aneuploidy. Cancer Cell 33, 676–689 (2018).
26. Reinert, T. et al. Analysis of plasma cell-free DNA by ultradeep sequencing in patients with stages I to III colorectal cancer. JAMA Oncol. 5, 1124–1131 (2019).
27. Kim, C. G. et al. Effects of microsatellite instability on recurrence patterns and outcomes in colorectal cancers. Br. J. Cancer 115, 25–33 (2016).
28. Chida, K. et al. Spontaneous regression of transverse colon cancer: a case report. Surg. Case Rep. 3, 65 (2017).
29. Karakuchi, N. et al. Spontaneous regression of transverse colon cancer with high-frequency microsatellite instability: a case report and literature review. World J. Surg. Oncol. 17, 19 (2019).
30. Mouliere, F. et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci. Transl. Med. 10, eaat4921 (2018).
31. Jiang, P. et al. Preferred end coordinates and somatic variants as signatures of circulating tumor DNA associated with hepatocellular carcinoma. Proc. Natl Acad. Sci. USA 115, E10925–E10933 (2018).
32. Jiang, P. et al. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc. Natl Acad. Sci. USA 112, E1317–E1325 (2015).
33. Bauml, J. & Levy, B. Clonal hematopoiesis: a new layer in the liquid biopsy story in lung cancer. Clin. Cancer Res. 24, 4352–4354 (2018).
34. Razavi, P. et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat. Med. 25, 1928–1937 (2019).
35. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
36. Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
37. Salk, J. J. et al. Ultra-sensitive TP53 sequencing for cancer detection reveals progressive clonal selection in normal tissue over a century of human lifespan. Cell Rep. 28, 132–144 (2019).
38. Goldstraw, P. et al. The IASLC lung cancer staging project: proposals for revision of the TNM stage groupings in the forthcoming (eighth) edition of the TNM classification for lung cancer. J. Thorac. Oncol. 11, 39–51 (2016).
39. Hanna, N. Systemic therapy in resectable non-small cell lung cancer. UpToDate https://www.uptodate.com/contents/systemic-therapy-in-resectable-non-small-cell-lung-cancer (2019).
40. Fox, E. J., Reid-Bayliss, K. S., Emond, M. J. & Loeb, L. A. Accuracy of next generation sequencing platforms. Next Gener. Seq. Appl. 1, 1000106 (2014).
41. TruSeq DNA PCR-Free Reference Guide. https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_truseq/truseq-dna-pcr-free-workflow/truseq-dna-pcr-free-workflow-reference-1000000039279-00.pdf (2017).
42. Guerrera, F. et al. The influence of tissue ischemia time on RNA integrity and patient-derived xenografts (PDX) engraftment rate in a non-small cell lung cancer (NSCLC) biobank. PLoS ONE 11, e0145100 (2016).
43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
44. Jiang, H., Lei, R., Ding, S.-W. & Zhu, S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15, 182 (2014).
45. Bergmann, E. A., Chen, B.-J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics 32, 3196–3198 (2016).
46. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
47. Wilm, A. et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40, 11189–11201 (2012).
48. Xi, R., Luquette, J., Hadjipanayis, A., Kim, T.-M. & Park, P. J. BIC-seq: a fast algorithm for detection of copy number alterations based on high-throughput sequencing data. Genome Biol. 11, O10 (2010).
49. Kothen-Hill, S. T. et al. Deep learning mutation prediction enables early stage lung cancer detection in liquid biopsy. ICLR 2018 Conference https://openreview.net/forum?id=H1DkN7ZCZ (2018).
50. Hadi, K. et al. Novel patterns of complex structural variation revealed across thousands of cancer genome graphs. Preprint at https://www.biorxiv.org/content/10.1101/836296v1 (2019).
51. Underhill, H. R. et al. Fragment length of circulating tumor DNA. PLoS Genet. 12, e1006162 (2016).

Acknowledgements

We thank the Landau lab and the New York Genome Center computational biology and sequencing teams for help and feedback throughout this work. A.Z. is supported by an EMBO long-term fellowship (ALTF 140-2016). D.A.L. is supported by the Burroughs Wellcome Fund Career Award for Medical Scientists, the Pershing Square Sohn Prize for Young Investigators in Cancer Research and the National Institutes of Health Director’s New Innovator Award (DP2-CA239065). This work was supported by the Mark Foundation ASPIRE Award, the American Lung Association Cancer Discovery Award, the Daedalus Fund for Innovation and the Meyer Cancer Center.

Author information

Contributions

D.A.L., A.Z. and N.K.A. conceived and designed the project. K.P., A.Z., S.R., D.R., G.G., J.R., N.D.O., C.A., M.M., C.F.S., S.K., S.F., G.G.I., V.A., B.H.-L., J.I., N.K.A., A.W., P.W., A.M.R., M.K.C., D.F., T.S., B.M., T.K., J.D.W., G.B. and D.A.L. performed patient selection and prepared samples for sequencing. A.Z., R.C.S., M.S., S.T.K.H., A.J.W., S.D., P.O.B., A.H.L., D.M., G.H., K.Y.H., W.L., C.S., C.C.K., F.G., N.R. and D.A.L. performed the computational genomics analyses. A.Z., R.C.S. and D.A.L. wrote the manuscript with comments and contributions from all authors.

Corresponding author

Correspondence to Dan A. Landau.

Ethics declarations

Competing interests

D.A.L., A.Z., V.A. and S.T.K.H. submitted two patent applications. D.A.L. and A.Z. are co-founders of C2i Genomics. D.A.L. participated in an advisory board for Illumina and has received research support. J.D.W. consulted for Adaptive Biotech, Advaxis, Amgen, Apricity, Array BioPharma, Ascentage Pharma, Astellas, Bayer, Beigene, Bristol Myers Squibb, Celgene, Chugai, Elucida, Eli Lilly, F Star, Genentech, Imvaq, Janssen, Kleo Pharma, Kyowa Hakko Kirin, Linneaus, MedImmune, Merck, Neon Therapeutics, Northern Biologics, Ono, Polaris Pharma, Polynoma, Psioxus, Puretech, Recepta, Takara Bio, Trieza, Turvax, Sellas Life Sciences, Serametrix, Surface Oncology, Syndax and Syntalogic. J.D.W. also received research support from Bristol Myers Squibb, MedImmune, Merck Pharmaceuticals and Genentech. J.D.W. holds equity in Potenza Therapeutics, Tizona Pharmaceuticals, Adaptive Biotechnologies, Elucida, Imvaq, Beigene, Trieza and Linneaus.

Additional information

Peer review information Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Deep targeted sequencing analysis and DNA extraction optimization.

a,b, Re-analysis of previously published data (ref. 18). a, Histogram of samples detected as a function of the maximum variant allele fraction (VAF) in stage I-IV cancer. Maximum VAF serves as an estimate of tumor fraction (TF). Patient histogram (solid bars) and empirical cumulative distribution (solid line) are shown, displaying stage-dependent detection probability from 50% sensitivity in stage I cancer to >90% sensitivity in stage IV. Color coding represents different cancer stages. The number of patients considered per cancer stage is indicated in the figure. b, Histogram of variants (mutations) detected as a function of VAF for different cancer stages. Higher cancer stage exhibits an increase in the mutation VAF, associated with higher probability of detection. The number of mutations included in the analysis per cancer stage is indicated in the figure. c, Comparison of cfDNA yields across various leading commercial extraction kits. Multiple extractions (n = 3) for each kit were performed over 1 mL aliquots of the same input plasma sample (from plasmapheresis of a normal donor), and total mass (per 1 mL) was determined with Qubit (ThermoFisher, Waltham, MA). Mean value and confidence interval (standard deviation) are shown. d, Extracted cfDNA was evaluated with Nanodrop (ThermoFisher, Waltham, MA) to detect impurities such as salt and genomic DNA. Omega Bio-Tek (Norcross, GA) had the lowest amount of contaminant carry-over compared to other kits. Multiple tests (n = 3, solid lines) for each kit were performed over 1 mL aliquots of the same input plasma sample.

Extended Data Fig. 2 Patient-specific genome-wide SNV integration and error suppression.

a, Base quality (BQ) signal for high-quality germline single nucleotide polymorphisms (SNPs; n = 9,142,326) vs. low VAF (single supporting read) artifactual variants (n = 407,061) from a representative PBMC sample (Pat.01), showing distinct separation between the BQ distributions of SNPs and low VAF artifacts (P value < 10⁻¹⁰⁰, two-sample t-test), and supporting effective removal of sequencing artifacts by BQ filtering. b, Receiver-operating-curve (ROC) analysis for BQ filtering including germline SNPs (true labels) and low VAF artifactual variants (false labels), using the same data set as in (a), and showing high filter performance for this simple quality metric (AUC = 0.9, n = 9,142,326). c, Variant position-in-read (PIR) shows an association between low VAF artifactual variants (n = 407,061) and position at the 3' end of the sequencing read, while germline SNPs (n = 9,142,326) show uniform spread across the sequencing read length. d, Support vector machine (SVM) classification performance between germline SNPs (random subsampling, n = 100,000) vs. low VAF artifactual variants (random subsampling, n = 100,000) from all PBMC WGS data (n = 8). Performance of SVM and random forest classification was compared over the same sample set with 10-fold cross validation. e, Box plot of error rate estimations before error suppression (blue) and after SVM-based error suppression (red) over four cancer types and 40 PBMC-derived replicates (2 patients per cancer type, 20 replicates per patient's PBMC sample). Error rate was calculated as the number of mismatches detected over the number of bp checked. The analysis shows a uniform error reduction (median 14-fold reduction, range 11-17). f-h, Patient-specific SNV signal-to-noise quantification over a range of TFs (10⁻⁵ to 10⁻²) compared to basal noise signal detected in control (TF = 0, subsampled PBMC DNA fragments) samples (left column). Signal-to-noise was estimated by calculating the log difference between the number of detections in each plasma-like admixture (TF > 0) and the mean number of detections in the controls (TF = 0). Analysis was done separately using tumor and matched germline (PBMC) WGS from lung (f, Pat.04), breast (g, Pat.05) and osteosarcoma (h, Pat.08) patients. Inset panel shows discrimination of tumor and control samples down to a tumor fraction of 10⁻⁵ after utilizing machine-learning-based sequencing error suppression (red) vs. reduced sensitivity with the raw unfiltered data (blue). i, Benchmarking of mutation detection performance for a mutation-centric method (ref. 23) vs. the read-centric method (MRDetect). Patient-specific SNV signal-to-noise quantification over a range of TFs (10⁻⁵ to 2 × 10⁻¹) compared to basal noise signal detected in control (TF = 0, subsampled PBMC DNA fragments) samples (right column). Signal-to-noise was estimated by calculating the log difference between the number of detections in each plasma-like admixture (TF > 0) and the mean number of detections in the controls (TF = 0). j, Single nucleotide variant (SNV) point mutation detection in plasma mixtures with different tumor fractions (TF > 0) and controls (TF = 0) is shown. Y-axis shows the number of detections (variants observed in tumor WGS and also detected in the synthetic plasma admixture) as a function of TF (x-axis). Red line constitutes the number of detections predicted for each TF based on the mutation load, coverage and noise model. Gray area represents the area under the background noise model threshold (1.5 s.d.), showing robust discrimination from noise for TF > 10⁻³. Analysis was done on the 35× coverage lung cancer (Pat.04) admixture cohort. Centre values represent mean and error bars represent standard deviation. In (f-j), n = 11 independent admixture samples for TFs > 0 and n = 20 independently down-sampled PBMC replicates for the control (TF = 0) of each patient. Throughout the figure, boxplots represent median, bottom and upper quartile; whiskers correspond to 1.5 × IQR.
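The two quantities defined in this caption, the error rate (mismatches over bp checked) and the signal-to-noise (log difference between admixture detections and the mean of control detections), can be sketched as follows. The caption does not state the logarithm base; base 10 is assumed here, and the function names are illustrative rather than taken from MRDetect.

```python
import math
from statistics import mean


def error_rate(mismatches, bases_checked):
    """Error rate as defined in panel e: mismatches detected per bp checked."""
    return mismatches / bases_checked


def signal_to_noise(admixture_detections, control_detections):
    """Log difference between each admixture's detection count and the mean
    detection count of the TF = 0 controls.

    Assumption: base-10 logarithm (the base is not given in the caption).
    Counts must be positive, since the log of zero is undefined.
    """
    noise = math.log10(mean(control_detections))
    return [math.log10(count) - noise for count in admixture_detections]


# Illustrative counts only, not data from the study.
controls = [90, 110]           # detections in TF = 0 replicates
admixtures = [1000, 100]       # detections in two TF > 0 admixtures
print(signal_to_noise(admixtures, controls))
print(error_rate(5, 1000))
```

A value near zero means the admixture is indistinguishable from the control mean on this scale, while a value of 1 corresponds to a tenfold excess of detections.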

Extended Data Fig. 3 Patient-specific genome-wide SNV integration provides accurate read out of tumor fraction.

a-h, Tumor fraction (TF) inference plots using MRDetect genome-wide SNV integration for two melanomas (a: Pat.01, b: Pat.02), two lung cancers (c: Pat.03, d: Pat.04), two breast cancers (e: Pat.05, f: Pat.06), and two osteosarcomas (g: Pat.07, h: Pat.08). Each plot was generated using in silico admixtures of varying TF (range 10⁻⁵ to 0.2) with 35× depth of coverage by randomly downsampling and mixing tumor reads (mean coverage 97×, range 85×-110×) and germline reads (mean coverage 49×, range 43×-56×) from WGS data (see Methods; Supplementary Table 1). For TFs > 0, n = 11 independent admixture samples. For the control (TF = 0), n = 20 independently down-sampled PBMC replicates. We observed accurate TF estimation as low as 10⁻⁵, discriminated from control (TF = 0) samples (left box-plot), with high Pearson correlation (two-sided test) between the input TF mixture (x-axis) and the SNV-based estimated TF prediction, confirming accurate inference based on genome-wide mutational integration. Throughout the figure, boxplots represent median, bottom and upper quartile; whiskers correspond to 1.5 × IQR.
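One way to build such an in silico admixture is to keep each tumor read and each germline read with a probability chosen so that the mixture reaches the target coverage and TF. The sketch below is an assumption about how this could be done, not the procedure from the Methods: it treats the tumor sample as 100% pure, and the function names and the `seed` parameter are hypothetical.

```python
import random


def admixture_sampling_fractions(tumor_fraction, target_cov, tumor_cov, germline_cov):
    """Per-read keep probabilities for tumor and germline WGS reads.

    Assumption: the tumor sample is treated as pure, so a fraction
    tumor_fraction of the target coverage is drawn from tumor reads and
    the remainder from germline reads.
    """
    p_tumor = tumor_fraction * target_cov / tumor_cov
    p_germline = (1.0 - tumor_fraction) * target_cov / germline_cov
    if not (0.0 <= p_tumor <= 1.0 and 0.0 <= p_germline <= 1.0):
        raise ValueError("target coverage exceeds the available coverage")
    return p_tumor, p_germline


def mix_reads(tumor_reads, germline_reads, p_tumor, p_germline, seed=0):
    """Randomly downsample both read sets and pool the retained reads."""
    rng = random.Random(seed)
    kept = [read for read in tumor_reads if rng.random() < p_tumor]
    kept += [read for read in germline_reads if rng.random() < p_germline]
    return kept


# Mean coverages quoted in the caption: tumor 97x, germline 49x, target 35x.
p_tumor, p_germline = admixture_sampling_fractions(0.2, 35, 97, 49)
print(p_tumor, p_germline)
```

Independent replicates, as used for the control (TF = 0) samples, would correspond to repeating the downsampling with different seeds.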

Extended Data Fig. 4 CNA load across tumor types.

Histogram of CNA load in WGS samples across cancer types from the TCGA cohort, measured as the size of the genome altered by CNA (in log₁₀ Mb). Dashed lines represent the percentage of samples that have a CNA load of over 10 Mb and 1 Gb, respectively, for each cancer type. Cancer types include LUSC: Lung squamous cell carcinoma (n = 50), HNSC: Head and Neck squamous cell carcinoma (n = 50), CESC: Cervical squamous cell carcinoma and endocervical adenocarcinoma (n = 18), OV: Ovarian serous cystadenocarcinoma (n = 50), KICH: Kidney Chromophobe (n = 50), COAD: Colon adenocarcinoma (n = 53), THCA: Thyroid carcinoma (n = 50), LUAD: Lung adenocarcinoma (n = 152), ESCA: Esophageal carcinoma (n = 19).

Extended Data Fig. 5 CNA based detection of ctDNA tumor fraction.

a-c, Tumor fraction (TF) inference using MRDetect genome-wide CNA integration for representative patients, including melanoma (a: Pat.01) and lung cancer (b: Pat.03, c: Pat.04). Each plot was generated using in silico admixtures of varying TF (range 10⁻⁵ to 0.2) at 18× coverage, by randomly downsampling and mixing tumor reads (mean coverage 97×, range 85×-110×) and germline reads (mean coverage 49×, range 43×-56×) from WGS data (see Methods; Supplementary Table 1). Twenty replicates were used for each TF > 0 sample and for the control (TF = 0) samples, showing accurate TF estimation as low as 5 × 10⁻⁵, discriminated from control (TF = 0) samples (left box-plot), with high Pearson correlation between the input TF mixture (x-axis) and the CNA-based estimated TF prediction (two-sided test). d-f, Tumor fraction (TF) inference in neutral regions (no copy number gain or loss in the tumor WGS data) for the same in silico admixtures (d: melanoma, Pat.01; e: lung, Pat.03; f: lung, Pat.04) shows the expected low Pearson correlation between input admixture TF and the signal (two-sided test), consistent with no expected coverage changes in the plasma admixtures in these regions. Throughout the figure, boxplots represent median, bottom and upper quartile; whiskers correspond to 1.5 × IQR.

Extended Data Fig. 6 CNA and SNV MRDetect correlation and further error suppression.

a, Spearman correlation between SNV and CNA TF estimation across TF admixtures for a lung tumor (Pat.03) shows high correlation (two-sided test) between the two orthogonal inference methods. Red dots correspond to cancer plasma (TF > 0) samples and blue dots correspond to control plasma (TF = 0) samples. Detection thresholds (dashed lines) were set at TF < 5 × 10⁻⁵ for both methods. Eleven replicates were used for each TF > 0 sample, and 20 replicates were used for the control (TF = 0) samples. b, Comparison to an orthogonal CNA-based TF method, ichor-CNA (ref. 15). Analyzing the same cohort of breast cancer (Pat.05) in silico synthetic admixture samples shows concordance in TF estimation for high TF (TF > 5 × 10⁻³), with extension of detection for MRDetect to lower TFs. The same 20 replicates were used for both MRDetect and ichor-CNA for each of the TF > 0 and control (TF = 0) samples. c, Proportion of variant-concordant (brown) vs. discordant (gray) read pairs (R1 and R2) detected in germline SNPs and artifactual variants. Analysis was done across 10 control (benign lung lesions) plasma samples, comparing read pairs associated with germline SNPs (right bar) vs. read pairs associated with artifactual variants (left bar) per plasma sample. The artifactual variants were defined by read pairs with variants overlapping the union of all patient somatic SNV compendia across all LUAD patients that were observed with the same variant in the control plasma sample. The number of read pairs used in the analysis is indicated above each bar. d, Median genome-wide normalized (divided by mean coverage) coverage from matched germline PBMC WGS samples from patients with LUAD (cyan, n = 15) and control plasma WGS samples from patients with benign lung lesions (red, n = 11) before and after robust Z-score normalization. e, Median absolute deviation (MAD) calculated over normalized coverage (that is, divided by mean coverage) from matched germline PBMC and control plasma WGS samples as in (d) before and after robust Z-score normalization. Throughout the figure, boxplots represent median, bottom and upper quartile; whiskers correspond to 1.5 × IQR.
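Panels d and e refer to robust Z-score normalization and the median absolute deviation (MAD). The exact formula used by the authors is not given in this caption; the sketch below uses the common convention of centering on the median and scaling by 1.4826 × MAD, which makes the scale comparable to a standard deviation for normally distributed data. Function names are illustrative.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def robust_z_scores(values, scale=1.4826):
    """Robust Z-scores: (x - median) / (scale * MAD).

    Assumption: scale = 1.4826 is the conventional consistency constant;
    the caption does not state which constant, if any, was used.
    """
    center = median(values)
    spread = scale * mad(values)
    if spread == 0:
        raise ValueError("MAD is zero; robust Z-score is undefined")
    return [(v - center) / spread for v in values]


# Illustrative normalized-coverage values with one outlying bin.
coverage = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mad(coverage))
print(robust_z_scores(coverage))
```

Because the median and MAD are barely moved by the outlying value, the outlier receives a very large score while the remaining bins stay close to zero, which is the property that motivates a robust rather than mean-based normalization.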

Extended Data Fig. 7 Serial application of MRDetect to monitor melanoma response to immunotherapy.

Melanoma treatment response (Patient MEL02) during immunotherapy (nivolumab) is monitored by blood samples. Upper panels, treatment monitoring by computed tomography (CT) shows response to therapy but residual disease after 3 months of therapy. Middle panel, MRDetect Z-scores effectively track tumor responses, matching radiographic changes, at higher temporal resolution than is feasible with imaging. Lower panel, ichor-CNA captures treatment response dynamics but shows a lower signal-to-noise ratio than the MRDetect method. For both MRDetect and ichor-CNA methods, log Z-score is calculated from a single plasma sample for each timepoint compared to a panel of control samples (n = 30). Throughout the figure, boxplots represent median, bottom and upper quartile; whiskers correspond to 1.5 × IQR.

Extended Data Fig. 8 MRDetect performance in colorectal monitoring.


Extended Data Fig. 9 MRDetect performance in LUAD monitoring.

a, Robust Z-score discrimination between signal detected across 20 random subsamplings (80% of reads per subsampling iteration) of LUAD patient pre-operative plasma (black, n = 36 patients) and the control plasma test cohort (gray, n = 30). The signal was measured on the subsampling set and control test set using the same patient-specific point mutation (SNV) compendium. Z-score was calculated using the noise parameters estimated in the control test cohort (see Methods). b, Receiver operating characteristic (ROC) analysis was performed over all SNV-based Z-score values calculated on the patients’ pre-operative plasma and control plasma as in (a). c, Cross-patient noise evaluation. Robust Z-score discrimination between signal detected across 20 random subsamplings (80% of reads per subsampling iteration) of LUAD patient pre-operative plasma WGS (black, n = 36 patients) and cross-patient noise, estimated by applying the patient-specific compendium to all other patients’ pre-operative plasma (gray, n = 35). Z-score was calculated using the noise parameters estimated in the cross-patient cohort (see Methods). d, ROC analysis was performed over all SNV-based Z-score values calculated on the matched patients and cross-patient plasma. e, Z-score discrimination between MRDetect-CNA on LUAD patient pre-operative plasma (red, n = 36 patients) compared to signal detected in neutral regions (as a negative control, blue), the control plasma test cohort (n = 30) and the cross-patient cohort (n = 35). Cross-patient noise was estimated by applying the patient-specific CNA compendium to all other patients’ plasma samples (n = 35). Z-score was calculated using the noise parameters estimated in the control plasma cohort. f, ROC analysis was performed over all CNA-based Z-score values calculated on the patients’ pre-operative plasma and control patients. g, ROC analysis was performed over all ichor-CNA (ref. 4) TF values calculated on the LUAD patients’ pre-operative plasma and control patients (n = 66). Interestingly, the two patient plasma samples detected by ichor-CNA included events that do not appear in the tumor; one of them was found to be a PBMC-specific somatic event (potentially from clonal hematopoiesis). h, Z-score discrimination between signal detected across 20 random subsamplings (80% of reads per subsampling iteration) of LUAD patient plasma WGS (n = 22 patients) collected at a median of 17 days after surgery and a cohort of control plasma test samples (gray, n = 30). The signal was measured on the matched plasma and control set using the same patient-specific point mutation (SNV) compendium. Z-score was calculated using the noise parameters estimated in the control cohort (see Methods). i, Z-score discrimination between MRDetect-CNA on LUAD patient plasma (red, n = 22 patients) collected at a median of 17 days after surgery, compared to signal detected in neutral regions (as a negative control, blue) and the control plasma cohort (n = 30). Z-score was calculated using the noise parameters estimated in the control plasma cohort (see Methods). Throughout the figure, boxplots represent median, bottom and upper quartile; whiskers correspond to 1.5 × IQR.
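
As an illustration of the quantities named in this legend, the following is a minimal Python sketch of read subsampling, Z-scoring against a control cohort and ROC area under the curve. It is not the MRDetect implementation. The function names, the 0/1 read encoding, and the use of the control mean and standard deviation as the "noise parameters" are assumptions made for the example; the authors' actual noise model is described in their Methods.

```python
import random
import statistics


def subsampled_signal(read_flags, fraction=0.8, iterations=20, seed=0):
    """Median detection rate over random read subsamples.

    read_flags is a hypothetical encoding: 1 if a read carries a mutation
    from the patient-specific compendium, 0 otherwise.
    """
    rng = random.Random(seed)
    k = int(len(read_flags) * fraction)
    rates = [sum(rng.sample(read_flags, k)) / k for _ in range(iterations)]
    return statistics.median(rates)


def z_scores(patient_signals, control_signals):
    """Z-score each patient signal against control-cohort noise.

    Assumption: noise is summarized by the control mean and sample SD.
    """
    mu = statistics.mean(control_signals)
    sigma = statistics.stdev(control_signals)
    return [(s - mu) / sigma for s in patient_signals]


def roc_auc(positives, negatives):
    """AUC as the probability that a positive outranks a negative.

    Ties contribute 0.5, which matches the area under the empirical ROC curve.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

For example, `z_scores([3.0], [1.0, 2.0, 3.0])` returns `[1.0]`, because the control values have mean 2 and sample standard deviation 1. Passing patient and control Z-scores to `roc_auc` gives the area under the kind of ROC curve the legend describes.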

Extended Data Fig. 10 Imaging of sample monitored LUAD cases and fragment length analysis.

Positron emission tomography–computed tomography (PET-CT) of two patients (LUAD#3 and LUAD#6) confirms no radiographically observable metastatic spread at the time of surgery (a and c, respectively), while metastatic recurrence was identified by PET-CT approximately six months post-operatively (b and d, respectively). e–h, Representative fragment size histograms showing the distribution of DNA fragments as a function of fragment length. DNA fragments associated with tumor mutations (gray) are significantly shorter than DNA fragments associated with artifactual, non-patient-specific detections (red, derived from applying cross-patient mutational compendia; median P value < 10⁻³, two-sample t-test). Plots include two pre-operative plasma samples (e: LUAD#18, tumor mutation detection n = 35, artifactual detection n = 18802; f: LUAD#31, tumor mutation detection n = 82, artifactual detection n = 32530) and matching plasma samples after surgery (g: LUAD#18, tumor mutation detection n = 55, artifactual detection n = 22737; h: LUAD#31, tumor mutation detection n = 27, artifactual detection n = 14610). i, Kernel density estimator (KDE) trained to discriminate between tumor-derived (human-aligned reads from a patient-derived xenograft model, see Methods, blue) and normal-derived (from control plasma samples, orange) cfDNA based on the fragment size signature. The log difference between the tumor and normal density functions (black solid line) was used as a score function that integrates the fragment size shift signal across the entire fragment size distribution (80–600 bp). j, Pre-operative cfDNA showed a significant shift (two-sample t-test) in tumor-specific mutation detections in the patient plasma (red, n = 563), compared to non-tumor (cross-patient) detected mutations in the same samples (blue, n = 4184). Violin plots depict kernel density estimates of the density distribution. Center dashed lines represent the median and dashed lines represent the interquartile range.
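
The score function in panel i can be sketched as a log ratio of two kernel density estimates. The Python example below is a minimal illustration under stated assumptions, not the authors' code: the Gaussian kernel, the fixed bandwidth, the small `eps` guard against a zero density, and averaging the per-fragment log difference are all choices made for this example. Only the 80–600 bp range and the log difference between the tumor and normal densities come from the legend.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function: a Gaussian KDE over fragment lengths (bp)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def fragment_score(lengths, tumor_density, normal_density,
                   lo=80, hi=600, eps=1e-12):
    """Mean log(tumor density) - log(normal density) over fragments in [lo, hi].

    Positive values indicate a tumor-like size profile. The averaging step and
    eps are assumptions for this sketch.
    """
    kept = [x for x in lengths if lo <= x <= hi]
    if not kept:
        raise ValueError("no fragments in the scored size range")
    total = sum(
        math.log(tumor_density(x) + eps) - math.log(normal_density(x) + eps)
        for x in kept
    )
    return total / len(kept)
```

With hypothetical training lengths such as `gaussian_kde([140, 145, 150], 10)` for tumor-derived fragments and `gaussian_kde([165, 167, 170], 10)` for normal-derived fragments, a 145 bp fragment scores positive and a 168 bp fragment scores negative.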

Supplementary information

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–5

Cite this article

Zviran, A., Schulman, R.C., Shah, M. et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat Med (2020). https://doi.org/10.1038/s41591-020-0915-3