A clonal expression biomarker associates with lung cancer mortality

Abstract

An aim of molecular biomarkers is to stratify patients with cancer into disease subtypes predictive of outcome, improving diagnostic precision beyond clinical descriptors such as tumor stage1. Transcriptomic intratumor heterogeneity (RNA-ITH) has been shown to confound existing expression-based biomarkers across multiple cancer types2,3,4,5,6. Here, we analyze multi-region whole-exome and RNA sequencing data for 156 tumor regions from 48 patients enrolled in the TRACERx study to explore and control for RNA-ITH in non-small cell lung cancer. We find that chromosomal instability is a major driver of RNA-ITH, and existing prognostic gene expression signatures are vulnerable to tumor sampling bias. To address this, we identify genes expressed homogeneously within individual tumors that encode expression modules of cancer cell proliferation and are often driven by DNA copy-number gains selected early in tumor evolution. Clonal transcriptomic biomarkers overcome tumor sampling bias, associate with survival independent of clinicopathological risk factors, and may provide a general strategy to refine biomarker design across cancer types.

Fig. 1: Tumor sampling bias confounds lung cancer biomarkers.
Fig. 2: RNA inter- and intratumor heterogeneity quadrants.
Fig. 3: Clonal gene selection improves prognostic accuracy over conventional biomarker design and beyond clinicopathological risk factors.
Fig. 4: Pan-cancer prognostic relevance and the genomic underpinning of RNA heterogeneity quadrants.

Data availability

Sequence data used during the study are available for noncommercial research purposes through the Cancer Research UK & University College London Cancer Trials Centre (ctc.tracerx@ucl.ac.uk). Access will be granted once a TRACERx data access committee has evaluated a project proposal and an appropriate data access agreement has been entered into, subject to any applicable ethical approvals.

Code availability

Code is available at https://github.com/dhruvabiswas/tracerx-oracle.

References

  1. Vargas, A. J. & Harris, C. C. Biomarker development in the precision medicine era: lung cancer as a case study. Nat. Rev. Cancer 16, 525–537 (2016).

  2. Lee, W.-C. et al. Multiregion gene expression profiling reveals heterogeneity in molecular subtypes and immunotherapy response signatures in lung cancer. Mod. Pathol. 31, 947–955 (2018).

  3. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).

  4. Gulati, S. et al. Systematic evaluation of the prognostic impact and intratumour heterogeneity of clear cell renal cell carcinoma biomarkers. Eur. Urol. 66, 936–948 (2014).

  5. Gyanchandani, R. et al. Intratumor heterogeneity affects gene expression profile test prognostic risk stratification in early breast cancer. Clin. Cancer Res. 22, 5362–5369 (2016).

  6. Gulati, S., Turajlic, S., Larkin, J., Bates, P. A. & Swanton, C. Relapse models for clear cell renal carcinoma. Lancet Oncol. 16, e376–e378 (2015).

  7. Beer, D. G. et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 8, 816–824 (2002).

  8. Bianchi, F. et al. Survival prediction of stage I lung adenocarcinomas by expression of 10 genes. J. Clin. Invest. 117, 3436–3444 (2007).

  9. Garber, M. E. et al. Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl Acad. Sci. USA 98, 13784–13789 (2001).

  10. Kratz, J. R. et al. A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies. Lancet 379, 823–832 (2012).

  11. Krzystanek, M., Moldvay, J., Szüts, D., Szallasi, Z. & Eklund, A. C. A robust prognostic gene expression signature for early stage lung adenocarcinoma. Biomark. Res. 4, 4 (2016).

  12. Li, B., Cui, Y., Diehn, M. & Li, R. Development and validation of an individualized immune prognostic signature in early-stage nonsquamous non-small cell lung cancer. JAMA Oncol. 3, 1529–1537 (2017).

  13. Raz, D. J. et al. A multigene assay is prognostic of survival in patients with early-stage lung adenocarcinoma. Clin. Cancer Res. 14, 5565–5570 (2008).

  14. Shukla, S. et al. Development of a RNA-Seq based prognostic signature in lung adenocarcinoma. J. Natl Cancer Inst. 109, djw200 (2017).

  15. Wistuba, I. I. et al. Validation of a proliferation-based expression signature as prognostic marker in early stage lung adenocarcinoma. Clin. Cancer Res. 19, 6261–6271 (2013).

  16. Shedden, K. et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat. Med. 14, 822–827 (2008).

  17. Subramanian, J. & Simon, R. Gene expression-based prognostic signatures in lung cancer: ready for clinical use? J. Natl Cancer Inst. 102, 464–474 (2010).

  18. Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).

  19. Boutros, P. C. The path to routine use of genomic biomarkers in the cancer clinic. Genome Res. 25, 1508–1513 (2015).

  20. Blackhall, F. H. et al. Stability and heterogeneity of expression profiles in lung cancer specimens harvested following surgical resection. Neoplasia 6, 761–767 (2004).

  21. Bachtiary, B. et al. Gene expression profiling in cervical cancer: an exploration of intratumor heterogeneity. Clin. Cancer Res. 12, 5632–5640 (2006).

  22. Barranco, S. C. et al. Intratumor variability in prognostic indicators may be the cause of conflicting estimates of patient survival and response to therapy. Cancer Res. 54, 5351–5356 (1994).

  23. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).

  24. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).

  25. The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).

  26. Djureinovic, D. et al. Profiling cancer testis antigens in non-small-cell lung cancer. JCI Insight 1, e86837 (2016).

  27. Goldstraw, P. et al. The IASLC lung cancer staging project: proposals for the revision of the TNM stage groupings in the forthcoming (seventh) edition of the TNM classification of malignant tumours. J. Thorac. Oncol. 2, 706–714 (2007).

  28. Okayama, H. et al. Identification of genes upregulated in ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas. Cancer Res. 72, 100–111 (2012).

  29. Rousseaux, S. et al. Ectopic activation of germline and placental genes identifies aggressive metastasis-prone lung cancers. Sci. Transl. Med. 5, 186ra66 (2013).

  30. Der, S. D. et al. Validation of a histology-independent prognostic gene signature for early-stage, non-small-cell lung cancer including stage IA patients. J. Thorac. Oncol. 9, 59–64 (2014).

  31. Venet, D., Dumont, J. E. & Detours, V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS Comput. Biol. 7, e1002240 (2011).

  32. Tang, H. et al. Comprehensive evaluation of published gene expression prognostic signatures for biomarker-based lung cancer clinical studies. Ann. Oncol. 28, 733–740 (2017).

  33. Chen, H.-Y. et al. A five-gene signature and clinical outcome in non-small-cell lung cancer. N. Engl. J. Med. 356, 11–20 (2007).

  34. Reka, A. K. et al. Epithelial–mesenchymal transition-associated secretory phenotype predicts survival in lung cancer patients. Carcinogenesis 35, 1292–1300 (2014).

  35. Strauss, G. M. et al. Adjuvant paclitaxel plus carboplatin compared with observation in stage IB non-small-cell lung cancer: CALGB 9633 with the Cancer and Leukemia Group B, Radiation Therapy Oncology Group, and North Central Cancer Treatment Group study groups. J. Clin. Oncol. 26, 5043–5051 (2008).

  36. Pignon, J.-P. et al. Lung adjuvant cisplatin evaluation: a pooled analysis by the LACE collaborative group. J. Clin. Oncol. 26, 3552–3559 (2008).

  37. Goldstraw, P. et al. The IASLC lung cancer staging project: proposals for revision of the TNM stage groupings in the forthcoming (eighth) edition of the TNM classification for lung cancer. J. Thorac. Oncol. 11, 39–51 (2016).

  38. Robinson, D. R. et al. Integrative clinical genomics of metastatic cancer. Nature 548, 297–303 (2017).

  39. Danaher, P. et al. Gene expression markers of tumor infiltrating leukocytes. J. Immunother. Cancer 5, 18 (2017).

  40. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).

  41. Lambrechts, D. et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat. Med. 24, 1277–1289 (2018).

  42. Gentles, A. J. et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat. Med. 21, 938–945 (2015).

  43. Uhlen, M. et al. A pathology atlas of the human cancer transcriptome. Science 357, eaan2507 (2017).

  44. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).

  45. Mlecnik, B. et al. Comprehensive intrametastatic immune quantification and major impact of immunoscore on survival. J. Natl Cancer Inst. 110, 97–108 (2018).

  46. Yachida, S. et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467, 1114–1117 (2010).

  47. Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).

  48. Kim, T.-M. et al. Subclonal genomic architectures of primary and metastatic colorectal cancer based on intratumoral genetic heterogeneity. Clin. Cancer Res. 21, 4461–4472 (2015).

  49. Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018).

  50. Tang, H. et al. A 12-gene set predicts survival benefits from adjuvant chemotherapy in non-small cell lung cancer patients. Clin. Cancer Res. 19, 1577–1586 (2013).

  51. Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient generation of transcriptomic profiles by random composite measurements. Cell 171, 1424–1436 (2017).

  52. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018).

  53. Hugo, W. et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell 165, 35–44 (2016).

  54. Dobin, A. et al. STAR: ultrafast universal RNA-Seq aligner. Bioinformatics 29, 15–21 (2013).

  55. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).

  56. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 15, 550 (2014).

  57. Wan, Y.-W., Allen, G. I. & Liu, Z. TCGA2STAT: simple TCGA data access for integrated statistical analysis in R. Bioinformatics 32, 952–954 (2016).

  58. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).

  59. Li, Q., Birkbak, N. J., Gyorffy, B., Szallasi, Z. & Eklund, A. C. Jetset: selecting the optimal microarray probe set to represent a gene. BMC Bioinformatics 12, 474 (2011).

  60. Yu, G. & He, Q.-Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).

  61. Chen, J. J. W. et al. Global analysis of gene expression in invasion by a lung cancer model. Cancer Res. 61, 5223–5230 (2001).

  62. Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008).

  63. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

  64. Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).

Acknowledgements

D.B. was the recipient of a Jean Shanks Foundation MBPhD studentship and also receives funding from the MBPhD program at University College London, as well as the NIHR BRC at University College London Hospitals. N.J.B. is a fellow of the Lundbeck Foundation and acknowledges funding from the Aarhus University Research Foundation and the Danish Cancer Society. K.L. is funded by the UK Medical Research Council (MR/P014712/1). J.M., B.D. and J.F. are supported by the Hungarian Science Foundation (OTKA-K129065). I.C. is supported by NVKP_16-1-2016-0004. Z.S. is supported by NAP2-2017-1.2.1-NKP-0002 and the Breast Cancer Research Foundation (BCRF-18-159). N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (grant number 211179/Z/18/Z), and also receives funding from Cancer Research UK (CRUK), Rosetrees and the NIHR BRC at University College London Hospitals. C.S. is Royal Society Napier Research Professor. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001169, FC001202), the UK Medical Research Council (FC001169, FC001202) and the Wellcome Trust (FC001169, FC001202). C.S. is funded by Cancer Research UK (TRACERx and CRUK Cancer Immunotherapy Catalyst Network), the CRUK Lung Cancer Centre of Excellence, Stand Up 2 Cancer (SU2C), the Rosetrees Trust, the Butterfield and Stoneygate Trusts, NovoNordisk Foundation (ID16584), the Prostate Cancer Foundation and the Breast Cancer Research Foundation (BCRF). The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013) Consolidator Grant (FP7-THESEUS-617844), European Commission ITN (FP7-PloidyNet 607722), an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement 835297). Support was also provided to C.S. by the National Institute for Health Research, the University College London Hospitals Biomedical Research Centre and the Cancer Research UK University College London Experimental Cancer Medicine Centre.

Author information

D.B. and N.J.B. conceived the project, designed the experiments, performed the bioinformatics analyses and wrote the manuscript. R.R., E.L.L., K.P., S.B., M.K., T.B.K.W. and G.A.W. performed data processing and bioinformatics analyses. C.T.H., Y.W., D.A.M., M.S., C.A. and M.A.B. gave advice on clinical interpretation. D.D., L.L.F., M.G., B.D., J.F., H.B. and J.M. performed the sample collection, curated the clinical data and helped with data interpretation. K.L., I.C., Z.S. and J.H. helped to direct the avenues of bioinformatics analysis. S.V. performed the sample preparation and RNA extraction. M.J.-H. designed the TRACERx study protocols and helped to analyze the clinical characteristics of the patients. J.Botling, A.M.C., P.M. and J.Bartek provided access to additional RNA-Seq datasets and gave feedback on the manuscript. A.H. provided statistical advice. N.M. and C.S. conceived the project, designed the experiments and helped write the manuscript. N.J.B., N.M. and C.S. supervised the study. All authors reviewed and approved the manuscript.

Correspondence to Nicolai J. Birkbak or Nicholas McGranahan or Charles Swanton.

Ethics declarations

Competing interests

C.S. receives grant support from Pfizer, AstraZeneca, BMS, and Roche-Ventana. C.S. has consulted for Pfizer, Novartis, GlaxoSmithKline, MSD, BMS, Celgene, AstraZeneca, Illumina, Genentech, Roche-Ventana, GRAIL, Medicxi and the Sarah Cannon Research Institute and is an advisor for Dynamo Therapeutics. C.S. holds shares in Apogen Biotechnologies, Epic Bioscience and GRAIL, and has stock options in, and is co-founder of, Achilles Therapeutics. R.R. has stock options in, and has consulted for, Achilles Therapeutics. C.A. has received speaking honoraria or expenses from Novartis, Roche, AstraZeneca and BMS. M.A.B. has consulted for Achilles Therapeutics. G.A.W. holds shares in Achilles Therapeutics. M.J.-H. has consulted, and is an advisor, for Achilles Therapeutics. D.B., N.J.B., N.M. and C.S. are co-inventors on a UK patent application (1901439.8) filed by Cancer Research Technology relating to methods of predicting survival rates for patients with cancer.

Additional information

Peer review information: Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Patient cohorts included in the study.

a, CONSORT diagram for patient recruitment (left) and composition by tumour stage (right) of the TRACERx cohort. b, Patient composition of two RNAseq datasets: The Cancer Genome Atlas cohort (left), and the Uppsala cohort (right). c, Patient composition of four microarray datasets: Der et al, GSE50081 (top left); Okayama et al, GSE31210 (top right); Rousseaux et al, GSE30219 (bottom left); Shedden et al, GSE68465 (bottom right). Tumour stage (x-axis) and therapy status (colour) are indicated for all patient composition bar charts. LUAD = lung adenocarcinoma, LUSC = lung squamous cell carcinoma.

Extended Data Fig. 2 Analysis of the most variably expressed genes in TRACERx.

a, The dendrogram and coloured heatmap (top) show the hierarchical clustering of tumour regions (columns) in the TRACERx multi-region RNAseq cohort (156 tumour regions, 48 NSCLC patients, stage I-III) according to the top 500 variably expressed genes (rows). The sparse heatmap (bottom) shows tumour regions (coloured by histology) per patient (rows). b, Kaplan-Meier survival analysis of the two largest patient clusters from the dendrogram in (a). Statistical significance was tested with a two-sided log-rank test. c, The hierarchical clustering approach taken to quantify discordance rates for published signatures is illustrated for a non-RNAseq signature, Kratz et al10, in TRACERx (n = 28 LUAD patients, stage I-III). As previously described by Gyanchandani et al5, this clustering approach provides a metric that is invariant to the gene expression profiling platform. For a given number of clusters, clustering concordance was quantified as the percentage of TRACERx patients with all tumour regions in the same cluster. This analysis was run iteratively from 2 to 28 clusters; 28 is the total number of TRACERx LUAD patients, hence clustering concordance of 100% at 28 clusters is the theoretical upper limit using this metric. The dendrogram and coloured heatmap (top) show the clustering of tumour regions (columns) according to the expression pattern of genes comprising the prognostic signature (rows). The grayscale heatmap (bottom left) shows tumour regions per patient (rows). For a range of clusters (2, 3, 14, 28), the coloured bars (middle left) show the assignment of tumour regions to clusters, the grayscale bars (bottom right) show which patients have their tumour regions discordantly assigned (gray) across clusters, and the pie charts (middle right) show the percentage of discordantly classified patients. d, Discordance rates for 9 published LUAD prognostic signatures7,8,9,10,11,12,13,14,15 plotted as the percentage of patients with tumour regions clustering together against the number of clusters. Vertical dashed lines mark a range of clusters (2, 3, 14, 28) as highlighted in (c).
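The concordance metric in panel c can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the cluster label of every tumour region has already been obtained (for example by cutting a hierarchical clustering tree at a given number of clusters), and the function and argument names are hypothetical.

```python
from collections import defaultdict


def clustering_concordance(region_patient, region_cluster):
    """Percentage of patients whose tumour regions all share one cluster.

    region_patient: dict mapping region ID -> patient ID (hypothetical input layout).
    region_cluster: dict mapping region ID -> cluster label for one choice of
        cluster number (assumed to come from a clustering step run elsewhere).
    """
    if not region_patient:
        raise ValueError("at least one tumour region is required")
    clusters_by_patient = defaultdict(set)
    for region, patient in region_patient.items():
        clusters_by_patient[patient].add(region_cluster[region])
    # A patient is concordant when every region falls in the same cluster.
    concordant = sum(1 for labels in clusters_by_patient.values() if len(labels) == 1)
    return 100.0 * concordant / len(clusters_by_patient)


def discordance_rate(region_patient, region_cluster):
    """Complement of the concordance percentage."""
    return 100.0 - clustering_concordance(region_patient, region_cluster)
```

Repeating the call for each number of clusters from 2 up to the number of patients would give one curve per signature, in the spirit of panel d.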

Extended Data Fig. 3 Intra- and inter-tumour RNA heterogeneity scores.

a, Gene-wise and patient-wise RNA-ITH scores were calculated using multi-region RNAseq data (normalized count values) from TRACERx tumours (n=28 LUAD patients, 89 tumour regions, stage I-III). For a given tumour, the standard deviation of expression values for a particular gene across tumour regions was calculated yielding a gene-specific, patient-specific measure of RNA-ITH (σg,p). This was repeated for all genes, then all tumours, generating a matrix of σg,p values. Gene-wise RNA-ITH values are summarised as the average (median) value per gene across all tumours in the cohort (σg). Conversely, patient-wise RNA-ITH values are summarised as the average (median) value per tumour across all expressed genes (σp). Dashed lines indicate mean values. b, The scatter plots show the Spearman correlation between the chosen metric of intra-tumour expression variability (standard deviation) and alternative metrics, median absolute deviation (left) or coefficient of variation (right), as calculated in the TRACERx cohort (n=28 LUAD patients, 89 tumour regions, stage I-III). c, Diagram illustrating the calculation of gene-wise inter-tumour RNA heterogeneity scores through the random sampling of tumour regions from the TRACERx cohort (n=28 LUAD patients, 89 tumour regions, stage I-III; see Methods). d, The scatter plot shows the Spearman correlation between inter-tumour RNA heterogeneity scores calculated in TRACERx (n=28 LUAD patients, 89 tumour regions, stage I-III), randomly sampled to yield a sham single-biopsy cohort, and TCGA (n = 469 LUAD patients, stage I-III), a true single-biopsy cohort.
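The calculation in panel a can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the nested-dictionary input layout and the function name are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed because the caption does not say which estimator was used.

```python
from statistics import median, stdev


def rna_ith_scores(expr):
    """Return (sigma_gp, sigma_g, sigma_p) from multi-region expression values.

    expr: {patient: {region: {gene: normalized expression value}}}
          (hypothetical layout chosen for this sketch).
    """
    sigma_gp = {}
    for patient, regions in expr.items():
        if len(regions) < 2:
            continue  # a sample standard deviation needs at least two regions
        shared_genes = set.intersection(*(set(r) for r in regions.values()))
        # Per-gene, per-patient SD of expression across tumour regions.
        sigma_gp[patient] = {
            g: stdev([r[g] for r in regions.values()]) for g in shared_genes
        }
    all_genes = set().union(*(set(s) for s in sigma_gp.values()))
    # Gene-wise score: median across tumours.
    sigma_g = {
        g: median([s[g] for s in sigma_gp.values() if g in s]) for g in all_genes
    }
    # Patient-wise score: median across genes.
    sigma_p = {p: median(list(s.values())) for p, s in sigma_gp.items()}
    return sigma_gp, sigma_g, sigma_p
```

Swapping `stdev` for a median absolute deviation or coefficient of variation would give the alternative metrics compared in panel b.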

Extended Data Fig. 4 Clustering concordance and published prognostic signatures.

a, Clustering concordance scores calculated in TRACERx (n=28 LUAD patients, 89 tumour regions, stage I-III) with the method used to estimate the sampling bias of microarray signatures, as described by Gyanchandani et al5 (see Extended Data Fig. 2c,d). For each gene, a curve is calculated for the number of patients with all regions in the same cluster against the number of clusters (2–28 clusters). Curves for five genes (minimum = CKMT2, lower quartile = CYSLTR2, median = MCM2, upper quartile = MFSD1, maximum = HOXC11) are shown (top), in addition to summarised clustering concordance scores for all genes (bottom). b, Gene-wise clustering concordance scores stratified by RNA heterogeneity quadrant, both calculated in TRACERx (n=28 LUAD patients, 89 tumour regions, stage I-III). Boxplots represent the median, 25th and 75th percentiles and the vertical bars span the 5th to the 95th percentiles. Statistical significance was tested with a two-sided Wilcoxon signed rank sum test. “*” indicates a P-value < 0.05, “**” indicates a P-value < 0.01, “***” indicates a P-value < 0.001.

Extended Data Fig. 5 Analysis of published prognostic signatures for LUAD by RNA heterogeneity quadrant.

a, The composition of published prognostic signatures by RNA heterogeneity quadrant, plotted in order of increasing percentage of Q4 genes (low intra- and high inter-tumour heterogeneity). b, Percentage of genes expected (total no. genes, as indicated in Fig. 2a) versus observed (in 9 published LUAD prognostic signatures7,8,9,10,11,12,13,14,15) per RNA heterogeneity quadrant. Statistical significance was tested with a two-sided Fisher’s exact test. The ability of published prognostic genes for LUAD (the combined gene list from nine published signatures, 242 unique genes) to maintain prognostic value across patient cohorts is assessed (using Cox univariate survival analysis) in four microarray datasets: Shedden et al, GSE68465 (c); Okayama et al, GSE31210 (d); Der et al, GSE50081 (e); Rousseaux et al, GSE30219 (f). Boxplots represent the median, 25th and 75th percentiles and the vertical bars span the 5th to the 95th percentiles. Statistical significance was tested with a two-sided Wilcoxon signed rank sum test. “*” indicates a P-value < 0.05, “**” indicates a P-value < 0.01, “***” indicates a P-value < 0.001.

Extended Data Fig. 6 Prognostic signature design.

a, Biomarkers are designed using state-of-the-art signature construction methods, replicated from Shukla et al14 (signature A and B), Chen et al33 (signature C), Reka et al34 (signature D) and Kratz et al10 (signature E). In parallel, the “prognostic significance” filters (present in each signature construction method) were substituted with “clonal expression” filters, generating corresponding clonal signatures (signatures A-clonal, B-clonal, C-clonal, D-clonal, and E-clonal). Published signature construction methods are indicated in orange, novel methods integrating clonal biomarker design are indicated in blue. All signatures are developed in TCGA LUAD patients (n=469, stage I-III) as the training dataset. b, Flow diagram illustrating the gene selection steps for ORACLE. Criteria to identify prognostic and clonally expressed genes, and the number of genes selected at each step are indicated. c, Optimization of the number of genes to select at the clustering concordance step through 10-fold cross-validation in the training cohort (TCGA, n=469 LUAD patients, stage I-III). The optimal number of genes, with the lowest cross-validation error, is shown by the vertical red line. d, The cut-off to dichotomize the ORACLE risk-score into ‘high’ and ‘low’ risk groups is optimized in the training cohort (TCGA, n=469 LUAD patients, stage I-III). The horizontal blue line indicates a log-rank P-value = 0.01 and the optimal cut-off is shown by the vertical red line. Statistical significance was tested with a two-sided log-rank test. e, Tumour sampling bias of the ORACLE signature assessed using multi-region RNAseq data from TRACERx (n=28 LUAD patients, 89 tumour regions, stage I-III). Each point represents a single tumour region, vertical lines display the range for each patient, and patients are ordered by predicted survival risk score. Points are coloured according to the risk classification of tumour regions within a patient: concordant low-risk (blue), concordant high-risk (red), or discordant (gray).
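The per-patient classification shown in panel e can be illustrated with a short sketch. It assumes each tumour region already has a risk score and that a single cut-off separates the groups; the function name, the example cut-off and the choice to treat scores strictly above the cut-off as high risk are assumptions made for illustration, not details taken from the study.

```python
def classify_patient(region_scores, cutoff):
    """Label a patient from the risk scores of that patient's tumour regions.

    region_scores: iterable of per-region risk scores (hypothetical input).
    cutoff: threshold dividing low from high risk; scores strictly above it
        are treated as high risk (an assumption of this sketch).
    """
    scores = list(region_scores)
    if not scores:
        raise ValueError("at least one tumour region is required")
    is_high = [score > cutoff for score in scores]
    if all(is_high):
        return "concordant high-risk"
    if not any(is_high):
        return "concordant low-risk"
    return "discordant"


def discordant_fraction(scores_by_patient, cutoff):
    """Fraction of patients whose regions fall on both sides of the cut-off."""
    labels = [classify_patient(s, cutoff) for s in scores_by_patient.values()]
    return labels.count("discordant") / len(labels)
```

A signature with little tumour sampling bias would yield a small discordant fraction under this definition.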

Extended Data Fig. 7 Risk stratification using ORACLE.

a, Kaplan-Meier plot of ORACLE in the RNAseq-based validation cohort (Uppsala, n=103 LUAD patients, stage I-III). Statistical significance was tested with a two-sided log-rank test. The ability of substaging criteria (b) and ORACLE (c) to split patients into prognostically informative groups is tested in stage I patients using the updated TNM version 8 criteria37, shown as Kaplan-Meier plots for the Uppsala RNAseq dataset (n=53 LUAD patients, stage I, TNMv8). Statistical significance was tested with a two-sided log-rank test. d, The distribution of ORACLE risk scores by disease stage, shown for the Uppsala cohort (n=103 LUAD patients, stage I-III) and the MET500 cohort38 (n=8 metastatic samples from patients with LUAD primary tumours). Boxplots represent the median, 25th and 75th percentiles and the vertical bars span the 5th to the 95th percentiles. Statistical significance was tested with a Wilcoxon signed rank sum test. No corrections were made for multiple comparisons. e, The scatter plot shows the Spearman correlation between Ki67 staining % and ORACLE risk-scores in the TRACERx cohort (n=28 LUAD patients, 89 tumour regions, stage I-III).

Extended Data Fig. 8 ORACLE as a cancer cell expression signature.

a, Spearman correlations between the infiltration of immune cell subsets, calculated from RNAseq data using the method described by Danaher et al39, and ORACLE risk-scores in the TCGA dataset (n=469 patients, stage I-III). b, The scatter plot shows the Spearman correlation between ORACLE risk score and tumour purity assessed from whole-exome sequencing data using ASCAT, as described by Van Loo et al40, in TRACERx (n=28 LUAD patients, 84 tumour regions, stage I-III). c, Lambrechts et al41 performed single-cell RNAseq on 52,698 cells sourced from 5 NSCLC patients, then defined 7 clusters of stromal cell genes and provided a per-cluster expression measure for every gene. The relative expression levels (y-axis) for each stromal cluster (coloured by cell-type, see figure legend) are plotted for all 23 genes comprising the ORACLE signature (bottom 3 rows). To aid interpretation, a marker gene for each of the 7 stromal cell clusters is also plotted (top row) for comparison: alveolar (AGER), B cell (MS4A1), epithelial (EPCAM), fibroblast (COL6A2), myeloid (CD68), T cell (CD3D), and vascular (FLT1) cell-types. d, Pearson correlations between the expression of individual ORACLE genes and copy-number state at the corresponding gene locus in the TRACERx cohort (n=28 LUAD patients, 89 tumour regions, stage I-III). Significant correlations (P<0.05) are marked in red, non-significant correlations are marked in blue.

Extended Data Fig. 9 Patient-level estimates of RNA-ITH and association with tumour cellular composition.

a, RNA-ITH scores calculated from each tumour by sampling one to N biopsies (where N is the total number of biopsies yielded by that tumour) in TRACERx (n=48 NSCLC patients, 156 tumour regions, stage I-III). For each patient the RNA-ITH score (y-axis) is plotted for all possible subgroups of tumour regions against the number of biopsies (x-axis). The mean (red line) and standard deviation (blue lines) are shown for each tumour. b, The scatter plots show the Spearman correlation between patient-level RNA-ITH scores and RNAseq-based immune infiltration measures, calculated from RNAseq data using the method described by Danaher et al39 in TRACERx (n=48 NSCLC patients, 156 tumour regions, stage I-III). c, The scatter plot shows the Spearman correlation between patient-level RNA-ITH scores and tumour purity assessed from whole-exome sequencing data using ASCAT, as described by Van Loo et al40, in TRACERx (n=48 NSCLC patients, 156 tumour regions, stage I-III).
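The subsampling in panel a can be sketched by enumerating every subset of a tumour's regions with `itertools.combinations`. This is a rough illustration with hypothetical names and input layout. It assumes the patient-level score is the median, across genes, of the per-gene sample standard deviation across the chosen regions, and it starts at two biopsies because a sample standard deviation is undefined for a single region; how the single-biopsy case was handled in the study is not stated here.

```python
from itertools import combinations
from statistics import mean, median, stdev


def patient_ith(regions):
    """Median across genes of the per-gene SD across the given regions.

    regions: list of {gene: expression value} dicts, one per tumour region
        (hypothetical layout); at least two regions are required.
    """
    shared_genes = set.intersection(*(set(r) for r in regions))
    return median([stdev([r[g] for r in regions]) for g in shared_genes])


def ith_by_biopsy_count(regions):
    """Map biopsy count k -> RNA-ITH scores for every k-region subset."""
    scores = {}
    for k in range(2, len(regions) + 1):
        scores[k] = [patient_ith(list(subset)) for subset in combinations(regions, k)]
    return scores


def summarise(scores):
    """Mean and spread of the scores per biopsy count (spread is 0.0 for one subset)."""
    return {
        k: (mean(v), stdev(v) if len(v) > 1 else 0.0) for k, v in scores.items()
    }
```

Plotting the summary against the number of biopsies would show how quickly the score stabilises as more regions are sampled.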

Extended Data Fig. 10 Pathway analysis by RNA heterogeneity quadrant.

The top 10 Reactome pathways for each RNA heterogeneity quadrant are plotted: low inter- and high intra- (Q1, a), low inter- and low intra- (Q2, b), high inter- and high intra- (Q3, c), high inter- and low intra- (Q4, d).
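The quadrant labels used in this caption can be reproduced with a small helper. The sketch below is illustrative only: the function names are hypothetical, and the thresholds default to the cohort mean of each score, which is an assumption made here because the caption does not state how "high" and "low" were defined.

```python
from statistics import mean


def assign_quadrant(intra, inter, intra_cut, inter_cut):
    """Return Q1-Q4 for one gene, following the labels in the caption.

    Q1: low inter, high intra; Q2: low inter, low intra;
    Q3: high inter, high intra; Q4: high inter, low intra.
    """
    high_intra = intra > intra_cut
    high_inter = inter > inter_cut
    if not high_inter:
        return "Q1" if high_intra else "Q2"
    return "Q3" if high_intra else "Q4"


def quadrants(intra_scores, inter_scores):
    """Assign every shared gene to a quadrant.

    intra_scores, inter_scores: dicts mapping gene -> heterogeneity score.
    Thresholds are the mean of each score over shared genes (an assumption).
    """
    genes = sorted(intra_scores.keys() & inter_scores.keys())
    intra_cut = mean(intra_scores[g] for g in genes)
    inter_cut = mean(inter_scores[g] for g in genes)
    return {
        g: assign_quadrant(intra_scores[g], inter_scores[g], intra_cut, inter_cut)
        for g in genes
    }
```

Genes labelled Q4 under this scheme are those with low intra-tumour and high inter-tumour heterogeneity.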
