Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA


Circulating tumour DNA (ctDNA) can be used to detect and profile residual tumour cells persisting after curative intent therapy [1]. The study of large patient cohorts incorporating longitudinal plasma sampling and extended follow-up is required to determine the role of ctDNA as a phylogenetic biomarker of relapse in early-stage non-small-cell lung cancer (NSCLC). Here we developed ctDNA methods tracking a median of 200 mutations identified in resected NSCLC tissue across 1,069 plasma samples collected from 197 patients enrolled in the TRACERx study [2]. A lack of preoperative ctDNA detection distinguished biologically indolent lung adenocarcinoma with good clinical outcome. Postoperative plasma analyses were interpreted within the context of standard-of-care radiological surveillance and administration of cytotoxic adjuvant therapy. Landmark analyses of plasma samples collected within 120 days after surgery revealed ctDNA detection in 25% of patients, including 49% of all patients who experienced clinical relapse; 3 to 6 monthly ctDNA surveillance identified impending disease relapse in an additional 20% of landmark-negative patients. We developed a bioinformatic tool (ECLIPSE) for non-invasive tracking of subclonal architecture at low ctDNA levels. ECLIPSE identified patients with polyclonal metastatic dissemination, which was associated with a poor clinical outcome. By measuring subclone cancer cell fractions in preoperative plasma, we found that subclones seeding future metastases were significantly more expanded compared with non-metastatic subclones. Our findings will support (neo)adjuvant trial advances and provide insights into the process of metastatic dissemination using low-ctDNA-level liquid biopsy.

Fig. 1: Overview of the cohort and ctDNA calling.
Fig. 2: Genomic and transcriptomic predictors of ctDNA detection in early-stage NSCLC.
Fig. 3: Postoperative minimal residual disease detection in early-stage NSCLC.
Fig. 4: Clonality measurements in preoperative plasma overcome sampling bias from a single tissue sample and predict metastatic seeding potential.
Fig. 5: Longitudinal measurements of clonal evolution in the plasma from surgery to therapy and recurrence.

Data availability

The cfDNA sequencing files, RNA-seq data and multiregion tumour exome sequencing data (in each case from the TRACERx study) used or analysed during this study have been deposited at the European Genome–phenome Archive (EGA), hosted by The European Bioinformatics Institute (EBI) and the Centre for Genomic Regulation (CRG), under accession codes EGAS00001006494, EGAS00001006517 and EGAS00001006494. The data are under controlled access owing to the nature of the data and commercial partnership arrangements. Details on how to apply for access are available on the linked page.

Code availability

ECLIPSE is available as an R package to install from GitHub and is available only for academic non-commercial research purposes. Code used to produce the figures in this paper is available on request.

References


  1. Moding, E. J., Nabet, B. Y., Alizadeh, A. A. & Diehn, M. Detecting liquid remnants of solid tumors: circulating tumor DNA minimal residual disease. Cancer Discov. 11, 2968–2986 (2021).

  2. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).

  3. Chabon, J. J. et al. Integrating genomic features for non-invasive early lung cancer detection. Nature 580, 245–251 (2020).

  4. Peng, M. et al. Circulating tumor DNA as a prognostic biomarker in localized non-small cell lung cancer. Front. Oncol. 10, 561598 (2020).

  5. Xia, L. et al. Perioperative ctDNA-based molecular residual disease detection for non-small cell lung cancer: a prospective multicenter cohort study (LUNGCA-1). Clin. Cancer Res. 28, 3308–3317 (2021).

  6. Chaudhuri, A. A. et al. Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling. Cancer Discov. 7, 1394–1403 (2017).

  7. Abbosh, C. et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545, 446–451 (2017).

  8. Gale, D. et al. Residual ctDNA after treatment predicts early relapse in patients with early-stage non-small cell lung cancer. Ann. Oncol. 33, 500–510 (2022).

  9. Zhang, J.-T. et al. Longitudinal undetectable molecular residual disease defines potentially cured population in localized non-small cell lung cancer. Cancer Discov. 12, 1690–1701 (2022).

  10. Powles, T. et al. ctDNA guiding adjuvant immunotherapy in urothelial carcinoma. Nature 595, 432–437 (2021).

  11. Tie, J. et al. Circulating tumor DNA analysis guiding adjuvant therapy in stage II colon cancer. N. Engl. J. Med. 386, 2261–2272 (2022).

  12. Parikh, A. R. et al. Liquid versus tissue biopsy for detecting acquired resistance and tumor heterogeneity in gastrointestinal cancers. Nat. Med. 25, 1415–1421 (2019).

  13. Murtaza, M. et al. Multifocal clonal evolution characterized using circulating tumour DNA in a case of metastatic breast cancer. Nat. Commun. 6, 8760 (2015).

  14. Herberts, C. et al. Deep whole-genome ctDNA chronology of treatment-resistant prostate cancer. Nature 608, 199–208 (2022).

  15. Lung Cancer: Diagnosis and Management NICE Guideline NG122 (NICE, 2019).

  16. Zheng, Z. et al. Anchored multiplex PCR for targeted next-generation sequencing. Nat. Med. 20, 1479–1484 (2014).

  17. Abbosh, C., Birkbak, N. J. & Swanton, C. Early stage NSCLC—challenges to implementing ctDNA-based screening and MRD detection. Nat. Rev. Clin. Oncol. 15, 577–586 (2018).

  18. Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).

  19. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).

  20. Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).

  21. Biswas, D. et al. A clonal expression biomarker associates with lung cancer mortality. Nat. Med. 25, 1540–1548 (2019).

  22. Burrell, R. A. et al. Replication stress links structural and numerical cancer chromosomal instability. Nature 494, 492–496 (2013).

  23. Wang, Z. C. et al. Profiles of genomic instability in high-grade serous ovarian cancer predict treatment outcome. Clin. Cancer Res. 18, 5806–5815 (2012).

  24. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).

  25. Shih, D. J. H. et al. Genomic characterization of human brain metastases identifies drivers of metastatic lung adenocarcinoma. Nat. Genet. 52, 371–377 (2020).

  26. Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).

  27. Garcia-Murillas, I. et al. Assessment of molecular relapse detection in early-stage breast cancer. JAMA Oncol. 5, 1473–1478 (2019).

  28. Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature (2023).

  29. Litchfield, K. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 184, 596–614 (2021).

  30. McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).

  31. Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature (2023).

  32. Martínez-Ruiz, C. et al. Genomic–transcriptomic evolution in lung cancer and metastasis. Nature (2023).

  33. Moding, E. J. et al. Circulating tumor DNA dynamics predict benefit from consolidation immunotherapy in locally advanced non-small cell lung cancer. Nat. Cancer 1, 176–183 (2020).

  34. Chen, K. et al. Perioperative dynamic changes in circulating tumor DNA in patients with lung cancer (DYNAMIC). Clin. Cancer Res. 25, 7058–7067 (2019).

  35. Li, N. et al. Perioperative circulating tumor DNA as a potential prognostic marker for operable stage I to IIIA non–small cell lung cancer. Cancer 128, 708–718 (2021).

  36. Kurtz, D. M. et al. Enhanced detection of minimal residual disease by targeted sequencing of phased variants in circulating tumor DNA. Nat. Biotechnol. 39, 1537–1547 (2021).

  37. Cohen, J. D. et al. Detection of low-frequency DNA variants by targeted sequencing of the Watson and Crick strands. Nat. Biotechnol. 39, 1220–1227 (2021).

  38. Gydush, G. et al. Massively parallel enrichment of low-frequency alleles enables duplex sequencing at low depth. Nat. Biomed. Eng. 6, 257–266 (2022).

  39. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).

  40. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  41. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396–398 (2014).

  42. Miller, C. A. et al. Visualizing tumor evolution with the fishplot package for R. BMC Genom. 17, 880 (2016).

  43. Frankell, A. M., Colliver, E., Mcgranahan, N. & Swanton, C. cloneMap: a R package to visualise clonal heterogeneity. Preprint at bioRxiv (2022).

  44. Birkbak, N. J. & Mcgranahan, N. Cancer genome evolutionary trajectories in metastasis. Cancer Cell 37, 8–19 (2020).

  45. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).

  46. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108 (2016).

  47. Signorell, A., Aho, K., Alfons, A., Anderegg, N. & Aragon, T. DescTools: tools for descriptive statistics. R package version 0.99 (2023).

  48. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  49. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  50. Yu, G. & He, Q.-Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol. Biosyst. 12, 477–479 (2016).

  51. Kassambara, A. rstatix: pipe-friendly framework for basic statistical tests. R package version 0.7.1 (2022).

  52. Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).

  53. Sanchez-Vega, F. et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell 173, 321–337 (2018).

  54. Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 50, 1189–1195 (2018).

  55. Chung, N. C., Miasojedow, B., Startek, M. & Gambin, A. Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data. BMC Bioinform. 20, 644 (2019).

  56. Larsson, J. eulerr: area-proportional Euler and Venn diagrams with ellipses. R package version 7.0.0 (2022).

  57. Yu, G. ggplotify: convert plot to ‘grob’ or ‘ggplot’ object. R package version 0.1.0 (2021).

  58. Therneau, T. M. survival: a package for survival analysis in R. R package version v.3.2-13 (2021).

  59. Wiesweg, M. survivalAnalysis: high-level interface for survival analysis and associated plots. R package version 0.3.0 (2022).

  60. Kassambara, A., Kosinski, M. & Biecek, P. survminer: drawing survival curves using ‘ggplot2’. R package version 0.4.9 (2021).

  61. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2021).

  62. Wickham, H. et al. Welcome to the tidyverse. J. Open Source Softw. 4, 1686 (2019).

  63. Dowle, M. et al. data.table: extension of ‘data.frame’. R package version 1.14.6 (2022).

  64. Wickham, H. et al. readxl: read excel files. R package version 1.4.1 (2022).

  65. Klik, M. fst: lightning fast serialization of data frames. R package version 0.9.8 (2022).

  66. Yaari, G., Bolen, C. R., Thakar, J. & Kleinstein, S. H. Quantitative set analysis for gene expression: a method to quantify gene set differential expression including gene-gene correlations. Nucleic Acids Res. 41, e170 (2013).

  67. Turner, J. A., Bolen, C. R. & Blankenship, D. M. Quantitative gene set analysis generalized for repeated measures, confounder adjustment, and continuous covariates. BMC Bioinform. 16, 272 (2015).

  68. Meng, H., Yaari, G., Bolen, C. R., Avey, S. & Kleinstein, S. H. Gene set meta-analysis with quantitative set analysis for gene expression (QuSAGE). PLoS Comput. Biol. 15, e1006899 (2019).

  69. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).

  70. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).

  71. Kassambara, A. ggpubr: ‘ggplot2’ based publication ready plots. R package version 3.3.5 (2020).

  72. Slowikowski, K. ggrepel: automatically position non-overlapping text labels with ‘ggplot2’. R package version 0.9.2 (2022).

  73. Clarke, E. ggbeeswarm: categorical scatter (violin point) plots. R package version 0.7.1 (2022).

  74. Wickham, H. et al. scales: scale functions for visualization. R package version 1.2.1 (2022).

  75. Pedersen, T. L. ggforce: accelerating ‘ggplot2’. R package version 0.4.1 (2022).

  76. Wilke, C. O. cowplot: streamlined plot theme and plot annotations for ‘ggplot2’. R package version 1.1.1 (2020).

  77. Lakatos, E. et al. LiquidCNA: tracking subclonal evolution from longitudinal liquid biopsies using somatic copy number alterations. iScience 24, 102889 (2021).

Acknowledgements

The TRACERx study (NCT01888601) is sponsored by University College London (UCL/12/0279) and has been approved by an independent Research Ethics Committee (13/LO/1546). TRACERx is funded by Cancer Research UK (C11496/A17786) and is coordinated through the Cancer Research UK and UCL Cancer Trials Centre, which has a core grant from CRUK (C444/A15953). We thank the patients and relatives who participated in the TRACERx study, and all site personnel, investigators, funders and industry partners who supported the generation of the data within this study. In particular, we acknowledge the support of staff at the Scientific Computing, the Advanced Sequencing Facility and Experimental Histopathology departments at the Francis Crick Institute. We also thank J. Brock for help. This work was supported by the Cancer Research UK Lung Cancer Centre of Excellence and the CRUK City of London Centre Award (C7893/A26233). M.A.B. is supported by Cancer Research UK, the Rosetrees Trust and the Francis Crick Institute. N.J.B. is a fellow of the Lundbeck Foundation (R272-2017-4040) and acknowledges funding from Aarhus University Research Foundation (AUFF-E-2018-7-14) and the Novo Nordisk Foundation (NNF21OC0071483). A. Huebner is supported by Cancer Research UK. D.A.M. is supported by the Cancer Research UK Lung Cancer Centre of Excellence (C11496/A30025). T.B.K.W. is supported by the Francis Crick Institute, as well as the Marie Curie ITN Project PLOIDYNET (FP7-PEOPLE-2013, 607722), the Breast Cancer Research Foundation (BCRF), Royal Society Research Professorships Enhancement Award (RP/EA/180007) and the Foulkes Foundation. T.K. is supported by the JSPS Overseas Research Fellowships Program (202060447). C.M.-R. is supported by the Rosetrees (M630) and Wellcome trusts. E.L.L. receives funding from NovoNordisk Foundation (16584). C.T.H. has received funding from NIHR University College London Hospitals Biomedical Research Centre. M.J.-H. is a CRUK Career Establishment Awardee and has received funding from CRUK, the NIH National Cancer Institute, the IASLC International Lung Cancer Foundation, the Lung Cancer Research Foundation, the Rosetrees Trust, UKI NETs, NIHR and the NIHR UCLH Biomedical Research Centre. N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (grant no. 211179/Z/18/Z) and also receives funding from Cancer Research UK, Rosetrees and the NIHR BRC at University College London Hospitals and the CRUK University College London Experimental Cancer Medicine Centre. T.L.C. acknowledges funding support from the Howard Hughes Medical Institute, and the Radiation Oncology Institute; G.I.E. from Cancer Research UK (A29210) and the European Research Council Advanced Investigator Award (294851); H.J.W.L.A. from the NIH (NIH-USA U24CA194354, NIH-USA U01CA190234, NIH-USA U01CA209414 and NIH-USA R35CA22052) and the European Union–European Research Council (grant agreement no. 866504); and K.L. from the UK Medical Research Council (MR/V033077/1), the Rosetrees Trust and Cotswold Trust (A2437), the Royal Marsden Cancer Charity and Melanoma Research Alliance. BioRender was used in the generation of Fig. 4 and Extended Data Fig. 7. C.S. is a Royal Society Napier Research Professor (RSRP\R\210001); and is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2041), the UK Medical Research Council (CC2041) and the Wellcome Trust (CC2041). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any author accepted manuscript version arising from this submission. C.S. is funded by Cancer Research UK (TRACERx (C11496/A17786), PEACE (C416/A21999) and CRUK Cancer Immunotherapy Catalyst Network); Cancer Research UK Lung Cancer Centre of Excellence (C11496/A30025); the Rosetrees Trust, Butterfield and Stoneygate Trusts; the NovoNordisk Foundation (ID16584); the Royal Society Professorship Enhancement Award (RP/EA/180007); the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre; the Cancer Research UK–University College London Centre; the Experimental Cancer Medicine Centre; the Breast Cancer Research Foundation (US) BCRF-22-157; Cancer Research UK Early Detection and Diagnosis Primer Award (Grant EDDPMA-Nov21/100034); and The Mark Foundation for Cancer Research Aspire Award (Grant 21-029-ASP). This work was supported by a Stand Up to Cancer-LUNGevity-American Lung Association Lung Cancer Interception Dream Team Translational Research grant (grant no. SU2C-AACR-DT23-17 to S. M. Dubinett and A. E. Spira). Stand Up To Cancer is a division of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the Scientific Partner of SU2C. C.S. is in receipt of an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 835297).

Author information

Contributions

C.A., A.M.F., N.J.B., N.M. and C.S. co-wrote the manuscript. C.A., A.M.F., N.J.B., J.S. and C.S. conceived the study design. C.A., A.M.F., J.K., K.G., C.P., D.B., T.L.C., J.W., C.M.-R., M.A.B., O.P., T.B.K.W., E.L.L., A. Huebner, D.A.M., R. Salgado, F.G., A.J.P., E.M., D.E.C., C.T.H., M.J.-H. and N.J.B. integrated clinicopathological data, transcriptomic data, exome data and ctDNA data. C.A., T.H., A.G., A.L., J.S., M.R.S., K.L., L.J., C.P. and C.S. worked to develop and validate the MRD calling algorithm used in this manuscript. A.M.F. developed ECLIPSE and performed analyses of clonal composition used in this manuscript. A.G., M.M., A.C., L.J., P.R. and R.D.D. conducted AMP NGS experimental work for ctDNA data. K.G. performed GISTIC copy-number analysis. S.V., S.W., N.C., J.R., R.D.D., M.M., A.C. and J.A.S. provided oversight of TRACERx patient sample storage and/or DNA extraction and/or sequencing of patient samples. T.L.C., J.W. and H.J.W.L.A. performed radiomic analyses of baseline CT scans. T.H., M.R.S., A.G., A.S., A.O. and A.L. conducted ArcherDx variant selection, PSP design and informatic processing of AMP data. A.M.F., K.G., M.A.B., O.P., T.B.K.W., E.L.L., A. Huebner, D.E.C. and N.M. conducted multiregion sequencing and phylogenetic tree analyses and identified TRACERx variants for PSP design. D.A.M. conducted the pathological review. A. L’Hernault, A.G., L.H., P.R., H.B. and N.G.-H. designed and conducted analytical validation experiments of the AMP MRD assay. C.A. and T.H. designed and conducted in silico specificity experiments for the AMP assay. D.B. and N.J.B. conducted ORACLE analyses. C.A. and T.K. conducted reviews of radiological imaging reports. R.M.K., D.H., D.S., G.I.E. and J.C.B. gave advice on analyses performed in this paper. M.J.-H., J.A.S. and C.S. designed the study protocols. A. Hackshaw gave statistical advice. C.A., N.M., M.J.-H., N.J.B. and C.S. provided overall study oversight. All of the authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Christopher Abbosh, Nicholas McGranahan or Charles Swanton.

Ethics declarations

Competing interests

C.A. has received speaking honoraria or expenses from AstraZeneca and Bristol-Myers Squibb and reports employment at AstraZeneca. C.A. and C.S. are listed as inventors on a European patent application relating to assay technology to detect tumour recurrence (PCT/GB2017/053289). This patent has been licensed to commercial entities and, under their terms of employment, C.A. and C.S. are due a revenue share of any revenue generated from such license(s). C.A. and C.S. declare a patent application (PCT/US2017/028013) for methods to detect lung cancer. A.M.F., C.A. and C.S. are named inventors on a patent application to determine methods and systems for tumour monitoring (PCT/EP2022/077987). C.A., C.S., K.L., C.P., T.H., L.J., M.R.S., A.G. and A. Licon are named inventors on a provisional patent protection related to a ctDNA detection algorithm. S.V. is listed as a co-inventor on a patent of methods for detecting molecules in a sample (US patent, 10,578,620). T.H., A.G., M.M., A.C., A.S., A.O., L.J., P.R., M.R.S., R.D.D., A.L. and J.S. are former or current employees of Invitae or ArcherDx and report stock ownership. D.B. reports personal fees from NanoString and AstraZeneca and has a patent (PCT/GB2020/050221) application on methods for cancer prognostication. M.A.B. has consulted for Achilles Therapeutics. D.A.M. reports speaker fees from AstraZeneca, Eli Lilly and Takeda; consultancy fees from AstraZeneca, Thermo Fisher Scientific, Takeda, Amgen, Janssen, MIM Software, Bristol-Myers Squibb and Eli Lilly; and has received educational support from Takeda and Amgen. N.G.-H., A. L’Hernault, H.B., D.H., D.S. and J.C.B. report stock ownership and employment at AstraZeneca. A. Hackshaw has received fees for being a member of independent data monitoring committees for Roche-sponsored clinical trials, and academic projects co-ordinated by Roche. C.T.H. has received speaker fees from AstraZeneca. M.J.-H. has consulted for, and is a member of, the Achilles Therapeutics scientific advisory board and steering committee; has received speaker honoraria from Pfizer, Astex Pharmaceuticals, Oslo Cancer Cluster; and is listed as a co-inventor on a European patent application relating to methods to detect lung cancer (PCT/US2017/028013). This patent has been licensed to commercial entities and, under terms of employment, M.J.-H. is due a share of any revenue generated from such license(s). N.J.B. is listed as a co-inventor on a patent to identify responders to cancer treatment (PCT/GB2018/051912), has a patent application (PCT/GB2020/050221) on methods for cancer prognostication and a patent on methods for predicting anti-cancer response (US14/466,208). H.J.W.L.A. has received personal fees and stock from Onc.AI, Sphera and Love Health, and speaking honoraria from Bristol-Myers Squibb. K.L. has a patent (CA3068366A) on indel burden and CPI response pending and speaker fees from Roche tissue diagnostics and Ellipses Pharmaceuticals, research funding from CRUK TDL/Ono/LifeArc alliance, Genesis Therapeutics and consulting roles with Monopteros Therapeutics and Kynos Therapeutics (all outside of this work). N.M. has received consultancy fees and has stock options in Achilles Therapeutics; and holds European patents relating to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004) and predicting survival rates of patients with cancer (PCT/GB2020/050221). C.S. acknowledges grant support from AstraZeneca, Boehringer-Ingelheim, Bristol-Myers Squibb, Pfizer, Roche-Ventana, Invitae (previously Archer Dx, collaboration in minimal residual disease sequencing technologies), Ono Pharmaceutical and Personalis; he is an AstraZeneca advisory board member and chief investigator for the AZ MeRmaiD 1 and 2 clinical trials and is also co-chief investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL’s scientific advisory board. He receives consultant fees from Achilles Therapeutics (also a member of the scientific advisory board), Bicycle Therapeutics (also a member of the scientific advisory board), Genentech, Medicxi, Roche Innovation Centre–Shanghai, Metabomed (until July 2022) and the Sarah Cannon Research Institute; has received honoraria from Amgen, AstraZeneca, Pfizer, Novartis, GlaxoSmithKline, MSD, Bristol-Myers Squibb, Illumina and Roche-Ventana; had stock options in Apogen Biotechnologies and GRAIL until June 2021, and currently has stock options in Epic Bioscience, Bicycle Therapeutics, and has stock options and is co-founder of Achilles Therapeutics; and holds additional patent applications related to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221), identifying patients who respond to cancer treatment (PCT/GB2018/051912) and both a European and US patent application related to identifying insertion/deletion mutation targets (PCT/GB2018/051892).

Peer review

Peer review information

Nature thanks Aadel Chaudhuri and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 TRACERx ctDNA cohort sequencing parameters.

A. Stacked bar plot of patient-specific panels (PSPs) designed from primary tumour sequencing data showing the number of clonal (dark red) and subclonal (light red) variants per panel. Variants lacking clonality information are displayed in grey (median of 3 variants per patient, range 1 to 20; these mutations are either no longer called by TRACERx or called by ArcherDx but not TRACERx, see Methods). A median of 126 clonal variants (range 21 to 195) and 64 subclonal variants (range 0 to 174) were tracked by the PSPs. Clonality was determined by PyClone analyses of multi-region exome data derived from primary resections of NSCLC (see Methods); in the absence of PyClone data, variants present in all multi-region sequenced tumour samples were called clonal. B. Violin plot demonstrating the % of subclonal clusters derived from multi-region tumour exome data tracked by PSPs on a per-patient basis. A median of 88% (range 0 to 100) of the subclonal mutation clusters present in each patient’s multi-region exome-derived phylogenetic tree were tracked. 184 tumours with phylogenetic trees were included. C. Distribution of cfDNA input values for the cohort, median input of 23 ng, n = 1069 samples. Capping at 60 ng input was performed for some of the cohort, explaining the peak at this value; for the remainder of the cohort, all cfDNA extracted was input into the assay (colours represent different cfDNA input categories as indicated). D. Histogram demonstrating the distribution of per-variant unique sequencing depth values across the cohort; unique depth refers to error-controlled depth achieved across a position targeted by a PSP (at least 5 unique molecular identifier (UMI) matched reads required to create a consensus error-controlled read, see Methods). The median unique depth per variant tracked by a PSP was 2226x (range 0 to 53789x, n = 201910). E. Correlation between cfDNA input (ng, Y axis) into the assay and the median UMI-corrected depth achieved across a PSP across 1069 plasma timepoints (X axis). Spearman’s R value = 0.63 and two-sided P value < 2.2e-16. F. Association between median deduplication ratio achieved in a sample (Y axis) and cfDNA input into the assay (ng, X axis); duplication ratio refers to the median number of duplicate UMI-supported reads within a read family. Resequencing of samples where the median duplication ratio was less than 10 was performed where possible to maximize recoverable information from cfDNA samples, given that 5 UMI-supported reads are required to make a UMI family. 17 of 1069 evaluated cfDNA samples exhibited a final median deduplication ratio less than 5 (corresponds to the horizontal line on the plot). Colours correspond to different cfDNA input categories and match panel C. G-H. Boxplots demonstrating the error rates (%, Y axis) for each of 96 mutation trinucleotide contexts (X axis, 192 mutation trinucleotide contexts [TNCs] simplified to 96 reverse-complement identical mutation types), with plots divided by transition event (G) and transversion event (H). Background position data from n = 1069 cell-free DNA libraries were used to generate the plots; variants predicted to exhibit low background error rates from pilot data analyses were prioritized for PSP design. Hinges correspond to first and third quartiles, whiskers extend to the largest/smallest value no further than 1.5x the interquartile range. Centre lines represent medians.
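
The simplification of 192 mutation trinucleotide contexts to 96 reverse-complement identical types mentioned for panels G and H can be illustrated with a short sketch. This is not the authors' code; the function names are hypothetical, and the convention of reporting each substitution with a pyrimidine (C or T) as the reference base is an assumption based on common practice.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# After collapsing to a pyrimidine reference, the transitions are C>T and T>C.
TRANSITIONS = {("C", "T"), ("T", "C")}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(context, alt):
    """Express a substitution with a pyrimidine as the central reference base."""
    if context[1] in "CT":
        return context, alt
    return revcomp(context), alt.translate(COMPLEMENT)


def all_mutation_types():
    """Yield every (trinucleotide context, alternate base) pair on one strand."""
    for five, ref, three, alt in product(BASES, repeat=4):
        if ref != alt:
            yield five + ref + three, alt


raw = list(all_mutation_types())
collapsed = {canonical(context, alt) for context, alt in raw}
transitions = {m for m in collapsed if (m[0][1], m[1]) in TRANSITIONS}
transversions = collapsed - transitions

print(len(raw), len(collapsed), len(transitions), len(transversions))
# prints: 192 96 32 64
```

Under this convention, a G>A change in an AGT context is counted together with C>T in ACT, so 32 of the 96 collapsed types are transitions and 64 are transversions.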

Extended Data Fig. 2 MRD calling thresholds and analytical validation.

A-D. Pre- and postoperative MRD caller P values (Y axis MRD caller P value, one-sided Poisson test, see Methods) observed in pilot-phase of the project. X axis displays clonal ctDNA levels. A. Postoperative samples from n = 5 patients who did not have recurrence of their NSCLC; all n = 55 patient samples had caller P values in excess of P > 0.1 threshold meaning that they were deemed negative for ctDNA. B. Postoperative caller P values observed in n = 5 patients who had relapse of their NSCLC. 1 of 13 calls was made between caller P values of 0.1 and 0.01, the remaining 12 calls were made at a caller P value less than 0.01. C. Preoperative ctDNA calls from pilot cohort; 7 patients had positive ctDNA in plasma prior to surgery, all calls were made at caller P values < 0.01. D. In-silico simulation analysis to assess MRD caller specificity. 3157 mock MRD panels were generated within the evaluable pilot patient libraries and MRD caller P values were assessed. At a caller P value < 0.1 threshold, 121/3157 simulated mock panels were ctDNA positive (in-silico specificity of 96.2%); at a caller P value threshold < 0.01, 22/3157 simulated mock panels were ctDNA positive (in-silico specificity of 99.3%). E-F. Analytical validation of 50 variant MRD detection panels. E. Fragmented DNA with a known single nucleotide polymorphism (SNP) profile was spiked into a second background of fragmented DNA with a different SNP profile and a patient-specific panel targeted 50 alternate positions present in spiked-in DNA. 559 data points were generated across different DNA input quantities indicated, to establish the limit of detection plots. The Y axis and centre of the error bars demonstrate sensitivity (defined as the proportion of all repeats that resulted in MRD detection using a caller P value of 0.01). The confidence intervals on the plot are Clopper-Pearson confidence intervals (95% CIs). 
The X axis shows the quantity of variant germline DNA that was spiked into each repeat expressed as a percentage of total DNA in that sample. F. Circulating tumour DNA samples with high variant allele fractions were spiked into a different cell-free DNA background. Variant positions in ctDNA were targeted with a 50 variant panel; 100 data points were generated across the DNA input quantities indicated. Axes and error bars are the same as (E). G. Data from analyses of 48 blank samples donated by 24 healthy participants; caller P values are displayed. H. Barplots demonstrating the intended allele frequencies and the measured allele frequencies in the different spike-ins presented in part (E) and part (F); only data from variant DNA positive samples are presented. The colours of the barplot represent different DNA input masses as shown by the legend. The error bars on the plot represent the mean value of all positive spike-in samples +/− standard deviation of the values. Where the error bar is absent, this is because at this spike-in level and DNA input mass, only one positive sample was observed. Where the error bar led to an observed mean AF less than 0, the error bar was stopped at 0 for visualization purposes (the 0.05% spike-in, 2 ng input mass case). The horizontal dashed lines correspond to 0.1%, 0.05%, and 0.01% spike-in categories. Each data point is represented on the plots by a circle. n = 369 variant DNA positive samples displayed in LOD1 barchart, n = 93 variant DNA positive samples displayed in LOD2 barchart. I. Comparison between the content of cell-free DNA input into ddPCR reactions (yellow) and AMP PCR reactions (blue). Hinges correspond to first and third quartiles, whiskers extend to the largest/smallest value no further than 1.5x the interquartile range. Centre lines represent medians. Each dot on the plot represents a data point; lines connect paired samples from the same patient. 
Significantly more cell-free DNA was input into ddPCR reactions (paired two-sided Wilcoxon-test P = 0.01366). J. Orthogonal comparison between ctDNA detection based on AMP panels used in TRACERx and ddPCR against a single clonal variant. ddPCR ctDNA positive call threshold was two mutant droplets (bottom table) and one mutant droplet (top table). Percentage positive agreement (PPA) and percentage negative agreement (NPA) using ddPCR as the comparator is displayed in the table. Two-sided Fisher’s test P values are demonstrated under the cross tables. K. A 300 mutation patient-specific panel was designed and applied to 10 ng DNA samples containing spike-in variant levels from 0% to 0.1%. In silico sub-sampling of the 300 mutations was performed (3 x 200 mutation in silico panels, 3x 100 mutation in silico panels and 3x 50 mutation in silico panels, see methods) and sensitivities are categorized by the number of mutations targeted by the panel.
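
The in-silico specificities in panel D and the Clopper-Pearson intervals in panels E-F follow from standard binomial arithmetic. The sketch below is one way to reproduce them with the Python standard library; all function names are hypothetical, the bisection solver is an illustrative choice rather than the authors' implementation, and the 9-of-12 replicate example is invented for demonstration.

```python
import math


def specificity(false_positives, n_negatives):
    """Fraction of known-negative panels that were called negative."""
    return 1.0 - false_positives / n_negatives


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def _bisect_increasing(f, target, iterations=60):
    """Solve f(p) = target on [0, 1] for a function f that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k detections in n replicates."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


# Counts taken from panel D of the legend.
spec_p01 = round(100 * specificity(121, 3157), 1)   # caller P value < 0.1
spec_p001 = round(100 * specificity(22, 3157), 1)   # caller P value < 0.01

# Hypothetical spike-in level: MRD detected in 9 of 12 replicates.
sens_low, sens_high = clopper_pearson(9, 12)
```

The two specificity values round to the 96.2% and 99.3% quoted in the legend. With only around a dozen replicates per spike-in level, the exact interval is wide, which is why the limit-of-detection plots show broad error bars.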

Extended Data Fig. 3 Preoperative ctDNA detection.

A. Flow diagram demonstrating different cohorts analysed in this manuscript; the top part of the flow diagram shows the total number of plasma samples that were intended to be analysed (n = 1095 from 197 patients), which was reduced to 1069 samples due to single nucleotide polymorphism mismatches between cfDNA and tissue exome data in 26 cases, suggesting sample swap. These samples were analysed in 3 main cohorts, the pilot cohort (left), the preoperative cohort (middle), and the postoperative cohort (right). The postoperative cohort was divided into different categories based on landmark evaluability (relating to samples donated within 120 days of surgery to enable a landmark ctDNA analysis). B. Heatmap demonstrating individual tumour-specific clonal ctDNA fractions in patients with synchronous primaries diagnosed at baseline. The annotation rows of the heatmap show the ctDNA call present in that sample across all variants interrogated by the MRD caller, the highest pathological TNM stage, the individual histology, and individual tumour volumes of the two synchronous tumours present at baseline (for this category, grey represents absent data or volume unevaluable). C. Boxplot demonstrating the difference in pack-year history across 187 preoperative ctDNA positive NSCLC patients and preoperative ctDNA negative NSCLC patients. Hinges correspond to first and third quartiles, whiskers extend to the largest/smallest value no further than 1.5x the interquartile range. Centre lines represent medians. P value represents a Wilcoxon rank sum test. D. Kaplan-Meier curves demonstrating freedom from recurrence outcomes in ctDNA high (dark red), ctDNA low (blue), and ctDNA negative (grey) single primary adenocarcinoma patients (left) and single primary non-adenocarcinoma patients (right). ctDNA high and low were categorized based on median clonal ctDNA levels across ctDNA positive cases and relate to above and below 0.16%. Log-rank P values are displayed on each plot. E. 
Multivariable Cox regression analyses of Overall Survival (OS) and Freedom From Recurrence (FFR, defined as recurrence only) in patients with single (non-synchronous) NSCLC; evaluating ctDNA detection status, pTNM stage (Tumour Node Metastasis pathological stage version 7, categories I, II or III), whether adjuvant therapy was administered, age, and log10-transformed unique sequencing depth as predictors in adenocarcinomas and non-adenocarcinomas separately. Unique sequencing depth was included to adjust for under sequenced samples, representing potential false negatives. n = 88 adenocarcinoma patients and n = 81 non-adenocarcinoma patients were analysed for FFR and OS. On the forest plots, the diamond represents the multivariable Hazard Ratio (HR) with error-bars corresponding to 95% confidence intervals (CI). Multivariable P values (p) are displayed on the plot alongside the number of patients in each category (N). Reference categories were ctDNA positive patients, pTNM stage I patients and patients given adjuvant therapy. The exact Cox regression P value for the Outcome: ctDNA -ve category in the FFR adenocarcinoma plot = 0.00022. F. Heatmap showing the site of relapse in recurrent adenocarcinoma cases divided by whether preoperative ctDNA was detected (dark red, right) or undetected (grey, left). Intrathoracic (mediastinum, locoregional, ipsilateral lung, distant lung – green colours) or extrathoracic (bone, brain, liver, adrenal, extrathoracic lymph nodes or other extrathoracic site – red colours) sites of relapse are shown (sites shown are metastatic sites diagnosed within 180 days of clinical relapse). Heatmap is annotated by Tumour Node Metastasis pathological version 7 stage. G. Kaplan-Meier curve demonstrating post-relapse survival in recurrent adenocarcinoma patients (n = 38) stratified by preoperative ctDNA positive (red) or preoperative ctDNA negative (grey). Log-rank P value is displayed on the plot.

Extended Data Fig. 4 Volume and phenotypic analysis of ctDNA positive and ctDNA negative adenocarcinomas.

A. Flow chart demonstrating patients available for volumetric analyses and reasons for exclusion. B. Histogram showing the number of NSCLC cases by volume, with ctDNA positive samples shown as red bars, and ctDNA negative samples shown as grey bars. n = 150 volume evaluable cases. C. Volume versus log10-transformed clonal ctDNA level correlation plot with each individual TRACERx case that was ctDNA positive as a point and coloured by adenocarcinoma status (dark red) and squamous or other histology (dark blue). Fitted line represents a linear model line categorized by tumour histology. Below the correlation plot is a table describing a linear multivariable model based on these data to predict log10-transformed clonal ctDNA levels based on tumour volume and histology (adenocarcinoma and squamous and other categories). P values represent linear model adjusted P values, n = 96 ctDNA positive, volume evaluable NSCLCs analysed. D. Based on a multivariable linear regression model fitted to the data in (C), we categorized ctDNA negative adenocarcinomas as biological low-shedders or technical non-shedders (see methods). If a particular tumour volume resulted in an estimated clonal mutation ctDNA level above the clonal ctDNA level a library could detect (95% lower confidence interval for estimated clonal ctDNA level based on tumour volume is above detectable clonal ctDNA level in the preoperative cfDNA library from that patient), then the case was classed as a probable biological low-shedder (red on histogram); otherwise, the case was classed as a probable technical non-shedder (turquoise on histogram). Y axis represents the lower 95% confidence estimate for clonal mutation ctDNA level divided by the minimally detectable clonal mutation ctDNA level (MDCL) for that patient’s panel. The X axis is each individual patient analysed. Data from n = 47 ctDNA negative adenocarcinomas presented. E. 
Violin box-plots comparing tumour purity in ctDNA low-shedder adenocarcinomas (blue, n = 79 tumour regions from 28 patients) and ctDNA positive adenocarcinomas (red, n = 166 tumour and lymph node regions from 35 patients). Pairwise comparisons are performed using linear mixed-effects models, P values are two-sided. Boxplot hinges correspond to first and third quartiles, whiskers extend to the largest/smallest value no further than 1.5x the interquartile range and centre lines represent medians. Violins represent the distribution of the underlying data. F. Barplots showing gene-level driver alterations between ctDNA positive adenocarcinomas (n = 39 patients) and ctDNA negative low-shedder adenocarcinomas (n = 31 patients). Colours denote ctDNA detection status. Y axis shows the top 14 most frequently altered genes, X axis shows the percentage of patients carrying an alteration in the gene per detection category. NS: Not significant (two-sided Fisher’s exact test with FDR P value adjustment). G. Pathway-level driver mutations between ctDNA positive adenocarcinomas (n = 39 patients) and ctDNA negative low-shedder adenocarcinomas (n = 31 patients). X axis shows patient IDs, Y axis shows pathways following the Sanchez-Vega definition. Top bar denotes ctDNA detection status (dark red represents ctDNA positives, blue represents biological low-shedders). Heatmap colours display mutations; blue denote clonal mutations and red denote subclonal mutations. No pathway showed significant enrichment in either ctDNA shedder or non-shedder adenocarcinomas (NS: Not significant, using two-sided Fisher’s exact test with FDR P value adjustment). H. Whole genome doubling status per tumour comparing ctDNA positive adenocarcinomas to ctDNA negative low-shedder adenocarcinomas, using two-tailed Fisher’s exact test. Yellow represents the number of tumours subjected to whole genome doubling in at least one region, turquoise represents tumours without any whole genome doublings. I. 
Volume by ctDNA shedding status. Biological non-shedders in red represent the smallest quartile samples. After removal of these from the analysis, no significant difference in tumour volume was found between ctDNA positives and ctDNA low-shedders. Pairwise comparisons are made with two–sided Wilcoxon rank sum tests. J. Venn diagram showing the overlap between significantly differentially expressed genes between ctDNA positive and ctDNA low shedder adenocarcinomas obtained from the full dataset, relative to the volume-adjusted dataset. Comparisons are made by computing the Jaccard similarity index and the corresponding two-sided P value using the exact method. K. Venn diagram showing the overlap between significantly altered cytobands as called by GISTIC, comparing ctDNA positive to ctDNA low shedder adenocarcinomas obtained from the full dataset, relative to the volume-adjusted dataset. Statistical testing follows (J).
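
The classification rule in panel D compares two numbers per patient: the lower 95% confidence estimate of the clonal ctDNA level predicted from tumour volume, and the minimally detectable clonal ctDNA level (MDCL) of that patient's library. A minimal Python sketch of that rule follows; the function name and the example values are hypothetical, and fitting the volume model itself is not shown.

```python
def classify_ctdna_negative(lower95_predicted_level, mdcl):
    """Return (ratio, label) for one ctDNA negative adenocarcinoma.

    lower95_predicted_level: lower 95% confidence estimate of the clonal ctDNA
        level predicted from tumour volume (same units as mdcl, e.g. percent).
    mdcl: minimally detectable clonal ctDNA level for the patient's panel.
    """
    if mdcl <= 0:
        raise ValueError("mdcl must be positive")
    ratio = lower95_predicted_level / mdcl
    if ratio > 1:
        # The library should have detected ctDNA at the predicted level.
        label = "probable biological low-shedder"
    else:
        # The library was not sensitive enough to rule out normal shedding.
        label = "probable technical non-shedder"
    return ratio, label


# Hypothetical patients: (lower 95% predicted level, MDCL), both in percent.
example_a = classify_ctdna_negative(0.020, 0.005)
example_b = classify_ctdna_negative(0.002, 0.005)
```

The ratio returned here corresponds to the Y axis described for the panel D histogram, so values above 1 fall in the low-shedder group.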

Extended Data Fig. 5 Exploration of unexpected MRD positive results in non-relapse patients.

A. Table demonstrating details of unexpected ctDNA positive results in patients who did not have disease recurrence. B. CRUK0498 false positive analysis: Dot-plots represent confidently detected variants at illustrated cfDNA sampling timepoints (left panel), variants confidently detected in normal tissue, control DNA, and peripheral-blood mononuclear cell (PBMC, buffy-coat) DNA based on application of CRUK0498’s patient specific panel to these respective samples (middle panel) and the mutant allele frequencies of selected variants in tumour tissue exome data (right panel). The four variants in the legend (variants in genes ATP2C1, DDIT4L, EYS, and TUSC3) represent variants confidently called at 50% or more of the timepoints across the cfDNA samples (note that confidently called means an individual variant Poisson one-sided P value of <0.01 [generated by MRD caller, see methods]). C. A haematoxylin and eosin image from patient CRUK0498’s tumour where exome analysis detected the variants in genes ATP2C1, DDIT4L, EYS and TUSC3 at high variant allele-frequencies. This image shows a dense lymphocyte aggregate in this tumour region. Scale bar below image. A single image was analysed. D. A further 19 preoperative PBMC samples were analysed from TRACERx patients; no confident panel-wide variant DNA calls were made in these patients’ PBMC samples using the MRD calling algorithm. E. Variant-level analyses of the preoperative PBMC samples analysed in panel (D) highlighted that 12 of 3621 variants interrogated by the panels were detected (variant level one-sided Poisson P value < 0.01). 8 of 12 detected variants were removed from the MRD caller algorithm in cell-free DNA analyses (cfDNA) due to triggering filters highlighted in the heatmap annotation. Only 2 of the 4 remaining variants carried deep alternate reads in the respective patients’ preoperative cfDNA sample (red arrows). 
The heatmap shows the cfDNA variant allele frequency and the WBC variant allele frequency of the detected variants (grey colour represents no detection of the variant). Two mistargeted germline variants are highlighted by black arrows for patient CRUK0296, variants were targeted in error by the industry panel design pipeline but not by the TRACERx exome pipeline (methods), and were filtered from the MRD calling algorithm due to triggering the outlier filter (dao imbalance filter, dark red).

Extended Data Fig. 6 Expanded postoperative ctDNA and imaging surveillance analysis.

A. Analysis of 13 patients who experienced intracranial relapse who were positive for ctDNA in a postoperative blood sample. The X axis shows the clonal ctDNA level at the point of postoperative ctDNA detection and the Y axis shows the day of postoperative ctDNA detection. Points are coloured based on whether the intracranial relapse was solitary (green), accompanied by another extracranial site (red), or unconfirmed solitary (blue, no extracranial imaging performed) and are shaped by landmark ctDNA status. B. Heatmap of clonal mutation ctDNA level data at first postoperative ctDNA detection. The annotation rows show the landmark ctDNA status of the patient (landmark positive, ctDNA detected within 120 days postoperatively; landmark negative, ctDNA negative within 120 days postoperatively; unevaluable, landmark status cannot be established), the day ctDNA was detected postoperatively, the histology of the primary tumour, and lead time (days from ctDNA detection to clinical relapse). Where lead time was not applicable (for example incompletely resected disease, ctDNA detected post-relapse, see methods) lead time is coloured grey. The next two rows (bar charts) demonstrate the number of clonal or subclonal mutations tracked by an AMP patient-specific panel (PSP); if the bar is blue, it represents confident detection of an individual variant (based on an individual variant P value of <0.01 [one sided Poisson test based on MRD caller output, see methods]), if the bar is black, it represents absence of confident calling of a variant, if the bar is red, it represents that a variant was filtered by the MRD calling algorithm. The final row represents the mean clonal ctDNA level at the first ctDNA detection time point for a patient. This is on a log-10 scale as displayed in the heatmap legend. For patient CRUK0296, ctDNA detection occurred but clonal ctDNA levels were 0% (grey bar) as the mutation driving ctDNA detection postoperatively did not have a clonal status. 
C. Longitudinal per-patient plots in 12 patients who were ctDNA positive prior to adjuvant therapy. Plots are annotated with lead time (L-t), scans performed, and treatment administered (see legend). The Y axis represents clonal ctDNA levels and each circle on the plot represents a blood sampling time point. If the circle is red, it indicates that the blood sample was positive for ctDNA using the MRD caller. The X axis displays days post-surgery. D-E. Kaplan-Meier curves in the landmark evaluable population (patients who donated blood within 120 days post-surgery before treatment or clinical recurrence, n = 102/108 landmark evaluable patients were evaluable for survival analysis, see methods for exclusions) showing overall survival (OS, D) or freedom from recurrence (FFR, E) outcomes for landmark positive (dark red) versus landmark negative (grey) patients. Log-rank P values displayed on curves. F. Boxplots showing the distribution of lead times (times from ctDNA detection to clinical recurrence) categorized by patient landmark ctDNA status. Hinges correspond to first and third quartiles, whiskers extend to the largest/smallest value no further than 1.5x the interquartile range. Centre lines represent medians. Kruskal-Wallis test P = 0.0057, unadjusted pairwise Wilcoxon-tests compare individual categories, n = 63 patients analysed. G. Pie charts demonstrate the number of occurrences of specified ctDNA detection statuses (red – ctDNA negative, green – ctDNA positive, blue – no ctDNA status established), preceding a scan showing no new changes (left) or new equivocal extracranial changes (middle). The ctDNA positive and negative categories are then broken down further into a patient-level analysis showing the outcomes of patients who experienced the occurrence of the specified imaging and ctDNA status event(s). H. 
Barchart showing the count of specific equivocal anatomical sites noted on scans showing new equivocal changes; equivocal lung lesions and lymph nodes were the most common abnormal equivocal findings on NSCLC surveillance imaging. Multiple equivocal sites can be observed on one scan. I. Barplot of eventual site of relapse and ctDNA status in 33 patients with ctDNA status established prior to surveillance imaging, showing new equivocal lymph node enlargement. The X axis shows the patient ctDNA detection status preceding surveillance scans. The Y axis shows the patient count. Patient CRUK0090 exhibited occurrences of both negative and positive ctDNA statuses prior to separate equivocal lymphadenopathy scans, so is present in both ctDNA positive and negative categories. Other patients are only included once. Patient CRUK0234 was diagnosed with an unresected lymph node, was ctDNA negative postoperatively and included in the analysis. The barcharts are filled with recurrence status of patients in these categories. Recurred with LN refers to lymph node involvement at relapse (dark red colour). Recurred with no LN refers to recurrence with no lymph node involvement (green colour).
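
The landmark status and lead-time definitions used throughout this figure can be expressed as a short Python sketch. It is a simplification under stated assumptions: the function names and the (day, result) sample layout are hypothetical, the 120-day boundary is treated as inclusive, and the exclusions described in the methods (for example samples taken after adjuvant treatment or incompletely resected disease) are not modelled.

```python
# From the legend: landmark samples are those donated within 120 days of surgery.
LANDMARK_WINDOW_DAYS = 120


def landmark_status(samples):
    """Classify a patient from (day_post_surgery, ctdna_positive) tuples."""
    in_window = [positive for day, positive in samples
                 if 0 <= day <= LANDMARK_WINDOW_DAYS]
    if not in_window:
        return "unevaluable"
    return "landmark positive" if any(in_window) else "landmark negative"


def lead_time(samples, relapse_day):
    """Days from first ctDNA detection to clinical relapse, or None.

    None is returned when ctDNA was never detected before relapse, matching
    the cases the legend describes as lead time not applicable.
    """
    positive_days = [day for day, positive in samples
                     if positive and day < relapse_day]
    if not positive_days:
        return None
    return relapse_day - min(positive_days)


# Hypothetical patients: each sample is (day post-surgery, ctDNA positive?).
patient_a = [(30, False), (100, True), (200, True)]
patient_b = [(50, False), (250, True)]
patient_c = [(150, False)]

status_a, lead_a = landmark_status(patient_a), lead_time(patient_a, 400)
status_b, lead_b = landmark_status(patient_b), lead_time(patient_b, 300)
status_c, lead_c = landmark_status(patient_c), lead_time(patient_c, 300)
```

Patient B illustrates the surveillance scenario: landmark negative, but a later positive sample still precedes clinical relapse and yields a lead time.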

Extended Data Fig. 7 ECLIPSE methodology.

A. A conceptual overview of the ECLIPSE method and data input types. CCF, cancer cell fraction; VAF, variant allele fraction. The schematic was created using BioRender. B. Equation to calculate tumour purity (the % of cells from which the DNA was derived which are tumour cells, see Supplementary Note 1, also termed ‘cellularity’ or ‘aberrant cell fraction’) using clonal mutations. C. Equation to calculate cancer cell fraction (CCF). Multiplicity = the number of mutated DNA copies in each mutated cell, CNt = total copy number in the tumour, CNn = total copy number in normal (non-tumour) cells, VAF = variant allele fraction, P = tumour purity (the % of cells from which the DNA was derived which are tumour cells, see Supplementary Note 1). D. Percentage change in mean multiplicity of clonal mutations comparing measurements in surgically excised tissue samples to tissue samples taken at relapse (46 patients with paired primary and recurrence tissue samples plotted). E. A comparison between mean clonal VAF of mutations and ctDNA tumour purity as calculated by ECLIPSE where data points (plasma samples) are coloured by the average copy number of tracked clonal mutations (measured using tissue sequencing). Multi-tumour patients and samples with evidence of copy number instability at relapse are excluded. A total of 322 samples from 134 patients are plotted.
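
The equations in panels B and C appear only in the figure image. As an illustration, the Python sketch below assumes they follow the widely used relation VAF = P × CCF × Multiplicity / (P × CNt + (1 − P) × CNn), which uses exactly the quantities the legend defines. Whether ECLIPSE uses this precise form is an assumption, and the function names and example numbers are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, multiplicity, cn_t, cn_n=2):
    """CCF from VAF under the assumed relation

        VAF = purity * CCF * multiplicity / (purity * cn_t + (1 - purity) * cn_n)

    purity is a fraction between 0 and 1 (not a percentage).
    """
    mean_copies_per_cell = purity * cn_t + (1 - purity) * cn_n
    return vaf * mean_copies_per_cell / (purity * multiplicity)


def purity_from_clonal_vaf(vaf, multiplicity, cn_t, cn_n=2):
    """Purity implied by a clonal mutation, i.e. the same relation with CCF = 1."""
    return vaf * cn_n / (multiplicity - vaf * (cn_t - cn_n))


# Hypothetical plasma sample: 1% tumour-derived DNA, a clonal mutation present
# on 1 of 3 tumour copies, and diploid normal cells.
true_purity = 0.01
clonal_vaf = true_purity * 1 / (true_purity * 3 + (1 - true_purity) * 2)

estimated_purity = purity_from_clonal_vaf(clonal_vaf, multiplicity=1, cn_t=3)

# A subclonal mutation at the same locus state with half the clonal VAF.
subclone_ccf = cancer_cell_fraction(clonal_vaf / 2, estimated_purity,
                                    multiplicity=1, cn_t=3)
```

The second function is simply the first solved for purity with CCF fixed at 1, which mirrors the legend's description of estimating purity from clonal mutations before computing subclone CCFs.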

Extended Data Fig. 8 Subclone detection sensitivity of ECLIPSE.

A. Minimally detectable CCF for each ctDNA positive sample compared to clonal ctDNA levels for each sample. All ctDNA positive samples included (N = 354). Minimally detectable CCF was calculated using the minimum number of required reads for a positive (P < 0.01) clone detection call (methods). B. Minimally detectable CCF over time for each patient with a horizontal line indicating the threshold for high subclone sensitivity samples (20% CCF). All ctDNA positive samples included (N = 354). 61% of preoperative MRD positive samples were considered high subclone sensitivity and 66% of postoperative samples were considered of high subclone sensitivity (overall 64% of samples). C. A histogram of clonal ctDNA levels for all ctDNA positive samples (N = 354) with vertical lines indicating thresholds for ECLIPSE evaluability and for traditional clonal deconvolution evaluability used for TRACERx tissue samples28 and previous clonal deconvolution approaches in ctDNA14,77. D. A histogram of maximum clonal ctDNA levels observed in post-operative samples for each patient with vertical lines indicating thresholds for ECLIPSE evaluability and for traditional clonal deconvolution evaluability (see C). This is shown for 66 patients who relapsed with ctDNA positive postoperative plasma. E. Validation of ECLIPSE detection rates across varying subclonal mutation number, clonal ctDNA level, subclone cancer cell fraction and DNA input amount into the assay. Subclones were constructed using ground truth in vitro spike-in experiments with 10-12 technical replicates for each input mass-allele fraction combination. These ground truth mutant allele fractions were then mixed in silico to construct 76,263 subclones varying across these parameters. Data from these experimentally derived subclones were then run through ECLIPSE and subclone detection rates across each of these parameters are depicted.

Extended Data Fig. 9 Time-matched comparisons between subclonal structure measured in plasma and in tissue at surgery.

A. Correlation between cancer cell fractions (CCFs) as measured in preoperative plasma samples with phylogenetic data, >0.1% clonal ctDNA level & >=10 ng DNA input (high subclone sensitivity samples) with ECLIPSE and those measured with multi-region tissue sequencing (M-seq) at surgery (N = 71 patients and 684 subclones included). B. Copy number unaware CCFs calculated only using VAFs (methods) compared to tissue CCF from M-seq. All preoperative samples with phylogenetic data, >0.1% clonal ctDNA level & >=10 ng DNA input (high subclone sensitivity samples) were included (N = 71 patients and 684 subclones included). C. A scatter plot demonstrating the relationship between clonal ctDNA level and the proportion of multi-region tumour exome (M-seq) defined subclones detected by ECLIPSE based on varying subclonal cancer cell fractions as indicated, loess lines are fitted to the plots, n = 117 ctDNA positive preoperative samples. D. A comparison of preoperative plasma CCFs and the average CCFs across all tissue regions sampled at surgery for clones that were unique to one tumour tissue region and for clones that were distributed across more than two tumour tissue regions. N = 71 patients and 684 subclones included. A Wilcoxon-test was used to compare groups. E. A comparison of preoperative plasma CCFs and the average CCFs across all tissue regions sampled at surgery for clones that were unique to one tumour tissue region separated between small (<20 cm3), medium (>20 cm3 & <100 cm3), and large (>100 cm3) tumours as measured on preoperative PET/CT scans. N = 71 patients and 684 subclones included. A Wilcoxon-test was used to compare groups. F. A comparison of detection rates in preoperative plasma for 20% CCF subclones across a range of clonal ctDNA levels split by whether the subclones were spread across multiple primary tumour tissue regions or were limited to only a single primary tumour tissue region. 1924 subclones were assessed in 197 preoperative plasma samples. G. 
A map of tumour clones with areas of multi-regional tissue sampling indicated and clones which are over- and undersampled highlighted. Most of the undersampled clones are in fact not in the sampled areas creating a bias towards oversampling in clones which we are able to detect, an effect also called the ‘winner’s curse’. H. A ROC curve describing the sensitivity and specificity of detecting clonal illusion mutations using plasma-based CCFs with 95% confidence intervals generated using bootstrapping across 500-fold cross-validation (N = 71 tumours).

Extended Data Fig. 10 Clonal composition measurements in ctDNA after surgery.

A. An overview of clonal structure evaluability at relapse for TRACERx patients in our cohort (N = 75 tumours) using either cell-free DNA and ECLIPSE or relapse tissue and WES/PyClone. B. ctDNA detection status post-operatively of subclones split by detection status in metastatic tissue. Untracked subclones (those without any mutations included in the PSP panels) were excluded (N = 26 tumours). P value indicates the result from Fisher’s exact test. C. Clonal (estimated as present in 100% of tumour cells) vs subclonal (estimated as present in <100% of cells) status at relapse of primary tumour subclones by whether they were detected in cfDNA and metastatic tissue or cfDNA alone (N = 26 tumours). P value indicates the result from a Fisher’s exact test. D. Metastatic dissemination class determined by tissue and by cfDNA in 22 cases with a metastatic biopsy, a postoperative high subclone sensitivity plasma sample, and a phylogenetic tree constructed. E. Overall survival Kaplan-Meier plot demonstrating time from the first MRD positive timepoint to death stratified by ECLIPSE metastatic dissemination class at relapse (monoclonal: light blue, polyclonal polyphyletic: purple, and polyclonal monophyletic: green). HR: Hazard ratio, CI: confidence interval. 44 patients were included in this analysis. The P value indicates the result of a log-rank test. F. A multivariable Cox proportional hazards model to predict overall survival from the time of first MRD detection including the clonality of metastatic dissemination at relapse, stage, maximum postoperative clonal ctDNA level, average DNA assay input, histology, and whether the first plasma sample after surgery was ctDNA positive, including only relapse patients. 44 patients were included in this analysis. Error bars indicate 95% confidence intervals. G. 
The frequency of high confidence subclonal to clonal bottlenecks (methods) at the latest possible plasma sample time point with sufficient clonal ctDNA level (high sensitivity subclone samples, N = 44 tumours) and which of these subclones harbour subclonal neoantigens (NAGs) which therefore become clonal at relapse. H. In cases of clonal bottlenecking at relapse, the percentage increase in the number of clonal mutations is shown as a box and whisker plot with the absolute number of new clonal mutations (N = 18 tumours). I. In cases of clonal bottlenecking at relapse, the percentage increase in the number of clonal NAGs is shown as a box and whisker plot with the absolute number of new clonal NAGs (N = 18 tumours). NAG = Neoantigen.

Supplementary information

Supplementary Note

Reporting Summary

Supplementary Fig. 1

Longitudinal subclonal analyses across all relapsing patients with available phylogenetic trees and at least one postoperative time point with high subclone sensitivity (n = 44 patients).

Supplementary Tables

Supplementary Tables 1–21.

Supplementary Data

Legends for Supplementary Tables 1-21.

Cite this article

Abbosh, C., Frankell, A.M., Harrison, T. et al. Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA. Nature 616, 553–562 (2023).
