Detecting repeated cancer evolution from multi-region tumor sequencing data

Abstract

Recurrent successions of genomic changes, both within and between patients, reflect repeated evolutionary processes that are valuable for the anticipation of cancer progression. Multi-region sequencing allows the temporal order of some genomic changes in a tumor to be inferred, but the robust identification of repeated evolution across patients remains a challenge. We developed a machine-learning method based on transfer learning that allowed us to overcome the stochastic effects of cancer evolution and noise in data and identified hidden evolutionary patterns in cancer cohorts. When applied to multi-region sequencing datasets from lung, breast, renal, and colorectal cancer (768 samples from 178 patients), our method detected repeated evolutionary trajectories in subgroups of patients, which were reproduced in single-sample cohorts (n = 2,935). Our method provides a means of classifying patients on the basis of how their tumor evolved, with implications for the anticipation of disease progression.

Fig. 1: Identifying repeated evolution in cancer multi-region sequencing data using transfer learning.
Fig. 2: Synthetic test of the method and biological validation.
Fig. 3: Repeated evolutionary trajectories in lung cancer.
Fig. 4: Repeated evolutionary trajectories in breast cancer.
Fig. 5: Stratifying single-sample cross-sectional cohorts with repeated evolutionary trajectories.

Acknowledgements

This work is supported by the Wellcome Trust (202778/B/16/Z to A.S.; 202778/Z/16/Z to T.A.G.; 105104/Z/14/Z to the Centre for Evolution and Cancer, Institute of Cancer Research), Cancer Research UK (A22909 to A.S.; A19771 to T.A.G.), the Institute of Cancer Research (Chris Rokos Fellowship in Evolution and Cancer to A.S.), and ERC (MLCS 306999 to G.S.).

Author information

Contributions

G.C., G.S., and A.S. designed the approach and interpreted the results. G.C. defined the method, and G.C. and Y.G. implemented it. G.C., Y.G., and D.R. analyzed the data. I.T. contributed data. G.S. and A.S. supervised the study with input from T.A.G. All of the authors drafted and approved the manuscript.

Corresponding authors

Correspondence to Giulio Caravagna, Guido Sanguinetti or Andrea Sottoriva.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Synthetic test: example CCF and phylogenies.

Two example cases of ambiguous (A) and non-ambiguous (B) Cancer Cell Fraction (CCF) values. In the top case, we sequence r = 3 regions and detect c = 4 clones (via subclonal deconvolution), in which we annotate 4 distinct driver lesions; the color represents the driver. When we compute the possible phylogenetic trees from the CCF values reported in the data matrix, we find 6 equally scoring trees: they all have the same score and no violations of the pigeonhole principle. Each of the solutions, however, provides a different ordering of the drivers (information transfer; see also Supplementary Figure 2). By chance, in this case the true model is not top-ranked, and hence cannot be trivially retrieved with a standard uncorrelated fit of the CCF data; for this reason, we term this CCF dataset ambiguous. The bottom case, instead, shows non-ambiguous CCF data where the true model ranks top. Here we model a patient with c = 6 clones; light-gray nodes have no driver annotated. This patient has many more phylogenetic trees associated, but the true model ranks top (linear path). Other equally scoring trees have the same driver ordering as the true model, as they differ only in the placement of clones without drivers (5 and 6). Violations start after the 3rd-ranked model; the fifth-ranked model mistakes the parent of clone 4, thus transferring “wrong” orderings.

Supplementary Figure 2 Information transfer in REVOLVER.

We can build a model T for a patient that captures the evolutionary trajectories of its tumour. The model is a tree whose nodes are the groups of alterations annotated in the patient’s data, some of which are flagged as drivers. There can be one (A) or more (B) drivers annotated in each of the input groups. We focus on trajectories that involve recurrent drivers appearing in several patients; we would like these trajectories to be consistent across patients (repeated evolution). REVOLVER is a Transfer Learning approach that scans several possible models for a patient, trying to match predictions across multiple patients. The method “transfers” orderings estimated from evolutionary trajectories, which are extracted by the transitive closure of a path in the tree. An ordering connects two driver alterations, as annotated in the cartoon; here, for instance, the green driver is upstream of the brown one (A). The germline (GL) is added to transfer the information on the tumour-initiating driver. The information transfer is not necessarily a tree; it is a graph whenever more than one driver is annotated in a group (B), as in the bottom panel.
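The extraction of orderings described in this caption can be illustrated with a short sketch. It is not the REVOLVER implementation: the data layout (`parent`, `drivers`), the function name `information_transfer`, and the rule of linking each driver to the drivers of its nearest driver-carrying ancestor group are assumptions made only for illustration.

```python
GL = "GL"  # germline root, as named in the caption


def information_transfer(parent, drivers):
    """Return the set of (upstream_driver, downstream_driver) orderings.

    parent:  maps each group of alterations to its parent group
             (groups attached to the root map to GL).
    drivers: maps each group to the list of drivers annotated in it
             (the list may be empty).
    """
    edges = set()
    for group, group_drivers in drivers.items():
        if not group_drivers:
            continue
        # Walk up the path to the root, skipping groups without drivers
        # (assumed reading of "transitive closure of a path").
        ancestor = parent.get(group, GL)
        while ancestor != GL and not drivers.get(ancestor):
            ancestor = parent.get(ancestor, GL)
        upstream = [GL] if ancestor == GL else drivers[ancestor]
        # Several drivers in one group yield several edges, so the
        # result can be a graph rather than a tree.
        for u in upstream:
            for d in group_drivers:
                edges.add((u, d))
    return edges


# Toy tree: GL -> g1 -> g2 -> g3, where g2 carries no driver.
parent = {"g1": GL, "g2": "g1", "g3": "g2"}
drivers = {"g1": ["A", "B"], "g2": [], "g3": ["C"]}
transfer = information_transfer(parent, drivers)
print(sorted(transfer))
```

In the toy tree, group g1 holds two drivers, so driver C receives two upstream orderings, which is the situation the caption describes for panel B.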

Supplementary Figure 3 REVOLVER.

First (A) and second (B) steps of the REVOLVER algorithm to fit the data from a cohort of cancer patients. The first step is an Expectation Maximization, of which we show the optimization gradient in the E- and M-steps of the fit. We are interested in repeated trajectories among drivers observed in more than one patient (coloured nodes; see also Supplementary Figure 2). During the fit, we have identified the best model for this patient (left), but the next iteration of the EM might change our best guess for this patient. Here we focus on the trajectory for the gray driver, currently downstream of the green ones in the information transfer. REVOLVER measures the correlation of this tree against the ones fit to the rest of the cohort: in this example this prediction is supported by only one other model, while three suggest an alternative trajectory initiated by the turquoise driver (central panel). Via w, we define a gradient that can induce a new scoring of the trees by means of a penalized likelihood; the model to the right is the new best (maximum likelihood estimate) since its trajectory is more correlated to the rest of the patients. Notice that we can place the gray mutation in 5 different positions and still obtain the same information transfer; in this case the one that we select is driven entirely by the likelihood (red asterisks). This change is driven by a combination of factors: (i) how much better the “alternative” model explains this patient's data with respect to the original model, and (ii) how strong the consensus (information transfer) is on the trajectory of the gray and turquoise drivers. Once we have converged to the EM solution, we can further expand our models (B) with Transfer Learning. Intra-group trajectories for drivers that belong to the same node of the tree cannot be inferred from the data of a single patient. This is the case here for A, B, C and D, which are clonal drivers. After correlating the structures of the models, however, we can observe the orderings of A, B, C and D in the rest of the cohort via w. Here, we show a graph representation of w (central panel), and highlight in red the Maximum Likelihood Estimate (MLE) of the driver upstream of each one of A, B, C and D (most frequent parent). We then expand the node of our model to reflect those orderings. Uncertainty is reflected in the structure of the estimated paths; it should be a linear chain of events (assuming that A, B, C and D are all true drivers) but w might not be able to retrieve it. For instance, in this example, we are not sure whether the pink driver is downstream of the green or the turquoise one, and we have no evidence of the ordering between the gray and pink drivers either.
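The "most frequent parent" rule mentioned in this caption can be sketched as a simple count over the orderings observed in the cohort. This is an illustrative reading, not the published algorithm: the function name `most_frequent_parents`, the input layout (one set of ordered driver pairs per patient), and the choice to report ties as a list are all assumptions.

```python
from collections import Counter, defaultdict


def most_frequent_parents(orderings_per_patient):
    """For each driver, return the upstream driver(s) seen most often.

    orderings_per_patient: iterable of sets of (upstream, downstream)
    pairs, one set per patient. Ties are kept as a sorted list so that
    uncertainty stays visible.
    """
    counts = defaultdict(Counter)
    for edges in orderings_per_patient:
        for upstream, downstream in edges:
            counts[downstream][upstream] += 1
    best = {}
    for child, parent_counts in counts.items():
        top = max(parent_counts.values())
        best[child] = sorted(p for p, k in parent_counts.items() if k == top)
    return best


# Toy cohort of four patients (hypothetical driver names).
cohort = [
    {("GL", "A"), ("A", "B")},
    {("GL", "A"), ("A", "B")},
    {("GL", "B"), ("B", "A")},
    {("GL", "A"), ("A", "C")},
]
parents = most_frequent_parents(cohort)
print(parents)
```

When two candidate parents are equally frequent, both are returned, which mirrors the caption's point that the cohort may not resolve every ordering.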

Supplementary Figure 4 Synthetic test: performance.

Comparison between REVOLVER and standard uncorrelated inference (A), and test of the effect of noise on the performance (B). In the first test we simulate various cohort sizes (n = 10, 50, 100) and proportions of patients harbouring ambiguous CCF (p = 10, 20, 50%; Supplementary Figure 1). We simulate bulk sampling from r = 1, 3, 10 regions; all patients have the same r. There are 4 drivers per true tree, one per clone. The performance is measured as the proportion of correct parents (true positives) for the four drivers, in N = 20 independent replicates. REVOLVER’s Transfer Learning allows disambiguating almost all cases of ambiguous CCFs in each cohort, validating the approach compared to standard uncorrelated inference. Histograms report summaries for this test, such as the number of phylogenies per patient, the number of combinations of information transfer and the number of cases in which REVOLVER fits a model which, without Transfer Learning, would have ranked lower (non-top). In the second test we assess how the performance decreases when we add Gaussian noise with mean 0 and low/high standard deviation (σ = 0.01/0.05), and re-run the analysis. Noise is applied independently to each entry of the input CCF matrix, with fluctuations of ±0.2 (high noise) that heavily confound CCF values. For 100 random cohorts in each simulated experiment we observed that, with high noise, over 52% of the models transfer edges with reversed orientation. Very large noise leads to lower performance because REVOLVER transfers “noise”, eventually degenerating to the point where the whole transfer is pointless. Notice that the number of non-top fits increases with noise; this suggests that the true model receives a lower rank when we include noise, rendering the inference harder.
Top and bottom boxes of the plot are 25th and 75th percentiles, centerline is the mean; the upper whisker is located at the smaller of the maximum value and 75th percentile + 1.5 IQR (Inter Quartile Range), and the lower whisker is located at the larger of the minimum value and 25th percentile − 1.5 IQR; dots are outliers, which are less than 25th percentile − 1.5 IQR or more than 75th percentile + 1.5 IQR.
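The two quantities used in this synthetic test, entry-wise Gaussian noise on the CCF matrix and the proportion of correct parents, can be sketched as follows. The function names, the dictionary layout for parent assignments, and the clipping of noisy values to the interval [0, 1] are assumptions introduced for illustration; the caption does not state how out-of-range values were handled.

```python
import random


def add_gaussian_noise(ccf, sigma, rng):
    """Add independent N(0, sigma) noise to every entry of a CCF matrix.

    Clipping to [0, 1] is an assumption of this sketch.
    """
    return [
        [min(1.0, max(0.0, value + rng.gauss(0.0, sigma))) for value in row]
        for row in ccf
    ]


def proportion_correct_parents(true_parent, inferred_parent):
    """Fraction of drivers whose inferred parent equals the true parent."""
    hits = sum(
        1 for driver, parent in true_parent.items()
        if inferred_parent.get(driver) == parent
    )
    return hits / len(true_parent)


# Four drivers on a linear true tree, one per clone (toy example).
true_parent = {"A": "GL", "B": "A", "C": "B", "D": "C"}
inferred_parent = {"A": "GL", "B": "A", "C": "A", "D": "C"}
score = proportion_correct_parents(true_parent, inferred_parent)

# Rows are clones, columns are regions (hypothetical values).
ccf = [[1.0, 1.0, 1.0], [0.6, 0.0, 0.3]]
noisy = add_gaussian_noise(ccf, sigma=0.05, rng=random.Random(1))
print(score, noisy)
```

Passing a seeded `random.Random` instance keeps each simulated replicate reproducible.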

Supplementary Figure 5 Analysis of colorectal cancers: full size clustering’s results.

Extended version of Figure 2b, Main Text. At the top we show all evolutionary trajectories detected in this cohort, and at the bottom all the drivers annotated in the cohort, as well as the clonality status (average of binary observations for this cohort). Both heatmap panels are sorted by frequency of the annotated variable in the overall colorectal cohort. At the bottom we also show the counts of all the trajectories identified with REVOLVER, and their counts in the group. Further annotations show the number of occurrences of each driver, and the clonality status, across the full cohort.

Supplementary Figure 6 Analysis of TRACERx lung cancers: full size clustering’s results.

Extended version of Fig. 3a, Main Text, without the top dendrogram. At the top we show all evolutionary trajectories detected in at least 3 patients, and at the bottom all the drivers annotated in the cohort. The top set of drivers, with larger rows, is the set of most frequent ones. Both heatmap panels are sorted by frequency of the annotated variable in the overall TRACERx cohort.

Supplementary Figure 7 Repeated evolutionary trajectories in the TRACERx cohort.

The 10 clusters of TRACERx tumours detected by REVOLVER, and their repeated evolutionary trajectories. For each group we report a graph where we annotate the group size (n), the number of times a trajectory is detected within the group on the edge, and the number of times each alteration is clonal or subclonal across all patients. GL stands for germline. In this plot, we show trajectories that occur at least 3 times.

Supplementary Figure 8 Stability of REVOLVER’s cluster for the TRACERx cohort.

We used a jackknife approach to estimate clustering's stability, as measured via the probability that two patients are clustered together in a resampling process (A), and the number of patients harbouring an edge, across all resamples. These statistics are computed by resampling the cohort N = 1,000 times, each time removing a random percentage (p = 10%) of patients and re-computing fit and clustering with the original parameters. The heatmap shows the empirical probability estimated via this jackknife approach, and the boxplot on the right is computed per cluster; for each cluster the number of points used is equal to n(n − 1)/2, where n is the cluster size (Main Text, Fig. 3). We report the counts of edges per patient in the bottom boxplot, where we annotate those with median above four. Top and bottom boxes of each boxplot are 25th and 75th percentiles, centerline is the mean; the upper whisker is located at the smaller of the maximum value and 75th percentile + 1.5 IQR (Inter Quartile Range), and the lower whisker is located at the larger of the minimum value and 25th percentile − 1.5 IQR; dots are outliers, which are less than 25th percentile − 1.5 IQR or more than 75th percentile + 1.5 IQR.
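The co-clustering probability described in this caption can be sketched with a generic resampling loop. The sketch assumes a user-supplied `cluster_fn` that refits and reclusters the retained patients; here it is replaced by a deterministic toy function. The function name, its parameters, and the choice to divide by the number of resamples in which both patients were retained are assumptions, not details taken from the paper.

```python
import random
from itertools import combinations


def jackknife_coclustering(patients, cluster_fn, n_resamples=1000,
                           drop_fraction=0.10, seed=0):
    """Estimate, for every pair of patients, the probability that they
    fall in the same cluster when a random fraction of the cohort is
    removed and the clustering is recomputed."""
    rng = random.Random(seed)
    ordered = sorted(patients)
    n_drop = max(1, round(drop_fraction * len(ordered)))
    together = {pair: 0 for pair in combinations(ordered, 2)}
    present = dict.fromkeys(together, 0)
    for _ in range(n_resamples):
        dropped = set(rng.sample(ordered, n_drop))
        kept = [p for p in ordered if p not in dropped]
        labels = cluster_fn(kept)  # maps patient -> cluster label
        for a, b in combinations(kept, 2):
            present[(a, b)] += 1
            if labels[a] == labels[b]:
                together[(a, b)] += 1
    return {pair: together[pair] / present[pair]
            for pair in together if present[pair]}


def toy_cluster_fn(kept):
    # Stand-in for "refit and recluster": group by the first letter.
    return {p: p[0] for p in kept}


patients = ["L1", "L2", "L3", "L4", "L5", "B1", "B2", "B3", "B4", "B5"]
prob = jackknife_coclustering(patients, toy_cluster_fn, n_resamples=200)
print(prob[("B1", "B2")], prob[("B1", "L1")])
```

With a real pipeline, `cluster_fn` would rerun the fit and clustering with the original parameters on the retained patients, as the caption states; within a cluster of size n this yields n(n − 1)/2 pairwise values.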

Supplementary Figure 9 REVOLVER’s TRACERx clusters against clusters of occurrences.

We compare REVOLVER’s clusters to those obtained by clustering the binary occurrences of the annotated drivers: we consider the pattern of occurrence in the cohort (A) and the clonality status (B). In both cases the input data is a feature matrix where an entry is 1 if the driver has CCF above 0 in any of the samples of a patient; for both cases we show a tanglegram and the entanglement score, as well as the dendrogram coloured with REVOLVER's clusters.
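The binary feature matrix defined in this caption (an entry is 1 if the driver has CCF above 0 in any sample of a patient) can be built as in the sketch below. The nested-dictionary input layout and the function name `occurrence_matrix` are assumptions chosen for illustration, and the CCF values are made up.

```python
def occurrence_matrix(ccf_by_patient, drivers):
    """Return {patient: [0/1 per driver]}, with 1 when the driver has
    CCF > 0 in at least one sample of that patient."""
    return {
        patient: [
            1 if any(value > 0 for value in samples.get(driver, [])) else 0
            for driver in drivers
        ]
        for patient, samples in ccf_by_patient.items()
    }


# Hypothetical CCF values: patient -> driver -> one value per sample.
ccf_by_patient = {
    "P1": {"TP53": [1.0, 1.0, 1.0], "KRAS": [0.0, 0.4, 0.0]},
    "P2": {"TP53": [0.0, 0.0], "EGFR": [1.0, 1.0]},
}
drivers = ["EGFR", "KRAS", "TP53"]
features = occurrence_matrix(ccf_by_patient, drivers)
print(features)
```

A driver that is absent from a patient's annotation is treated as not occurring, which gives the same 0 entry as a driver with CCF equal to 0 in every sample.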

Supplementary Figure 10 Alternative evolutionary analysis from single-sample cross-sectional data with TRONCO.

We show the data (A) and the fit (B). We run the CAPRI algorithm from TRONCO/PiCnIc on a feature matrix where an entry is 1 if the driver has CCF above 0 in any of the samples of a patient; here we show only the drivers that occur in at least 7% of the samples in the cohort (n = 99), and the fit is run with all drivers that occur in at least 2% of the cohort. CAPRI uses a statistical procedure that scans the patterns of co-occurrence of the input alterations in cross-sectional single-sample tumours, and infers a Suppes-Bayes Causal Network. The graph is annotated with non-parametric bootstrap scores, and with REVOLVER’s edge-specific jackknife scores (Supplementary Notes). NA stands for edges that are never detected across resamples. These predicted associations overlap only partially with the trajectories inferred with REVOLVER, which estimates phylogenetic orderings from each patient. For instance, this model suggests a possible transition between PIK3CA and SOX2 which, however, is undetected via phylogenetic analysis of TRACERx CCFs. Conversely, no transitions are estimated for EGFR, while phylogenetic analysis determines two subgroups associated with trajectories initiated by this driver.

Supplementary Figure 11 Survival analysis from single-sample cross-sectional data with REVOLVER’s clusters.

From REVOLVER’s clusters we can create a decision tree (A) to classify large single-sample cross-sectional cohorts (B) and test significant differences in survival outcomes (C). Here the decision tree is manually curated from the most frequent features (edges/drivers) in the 10 clusters identified by REVOLVER. With this tree we could stratify n = 589 tumours from two TCGA projects and one Broad Institute project (the groups are annotated with the cluster color) and analyze their disease-free survival with standard Kaplan-Meier curves (time unit in months). Curves are compared via logrank test (two-sided) at level 0.05; shaded regions represent 95% confidence intervals of the curves; in the panel we show only pairwise comparisons with significantly different survival risks (p < 0.05).
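For readers unfamiliar with the Kaplan-Meier curves mentioned in this caption, the product-limit estimator can be written in a few lines. This is a generic textbook sketch with made-up follow-up times, not the analysis code used for the figure; the function name `kaplan_meier` and the input layout are assumptions, and the logrank comparison and confidence intervals are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time of each patient (e.g. months).
    events: 1 if the event (e.g. relapse) was observed at that time,
            0 if the patient was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Made-up follow-up data for six patients.
times = [6, 7, 10, 15, 19, 25]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at observed event times.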

Supplementary Figure 12 Analysis of breast cancers: full size clustering’s results.

Extended version of Fig. 4a, Main Text, without the top dendrogram. At the top we show all evolutionary trajectories detected in at least 2 patients, and at the bottom all the drivers annotated in the cohort. The top set of drivers, with larger rows, is the set of most frequent ones. Both heatmap panels are sorted by frequency of the annotated variable in the overall breast cohort.

Supplementary Figure 13 Repeated evolutionary trajectories in the breast cohort.

The 6 clusters of breast tumours detected by REVOLVER, and their repeated evolutionary trajectories. For each group we report a graph where we annotate the group size (n), the number of times a trajectory is detected within the group on the edge, and the number of times each alteration is clonal or subclonal across all patients. GL stands for germline. In this plot, we show trajectories that occur at least 3 times.

Supplementary Figure 14 Stability of REVOLVER’s cluster for the breast cohort.

We used a jackknife approach to estimate clustering's stability, as measured via the probability that two patients are clustered together in a resampling process (A), and the number of patients harbouring an edge, across all resamples. These statistics are computed by resampling the cohort N = 1,000 times, each time removing a random percentage (p = 10%) of patients and re-computing fit and clustering with the original parameters. The heatmap shows the empirical probability estimated via this jackknife approach, and the boxplot on the right is computed per cluster; for each cluster the number of points used is equal to n(n − 1)/2, where n is the cluster size (Main Text, Fig. 4). We report the counts of edges per patient in the bottom boxplot, where we annotate those with median above four. Top and bottom boxes of each boxplot are 25th and 75th percentiles, centerline is the mean; the upper whisker is located at the smaller of the maximum value and 75th percentile + 1.5 IQR (Inter Quartile Range), and the lower whisker is located at the larger of the minimum value and 25th percentile − 1.5 IQR; dots are outliers, which are less than 25th percentile − 1.5 IQR or more than 75th percentile + 1.5 IQR.
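The whisker and outlier rule quoted in the box-plot descriptions can be written out literally as below. The quartile interpolation method (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the captions do not say how percentiles were computed, and the function name `boxplot_summary` is hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Box, whiskers and outliers following the rule in the captions."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    return {
        "q1": q1,
        "q3": q3,
        # Smaller of the maximum value and 75th percentile + 1.5 IQR.
        "upper_whisker": min(max(values), upper_fence),
        # Larger of the minimum value and 25th percentile - 1.5 IQR.
        "lower_whisker": max(min(values), lower_fence),
        "outliers": [v for v in values
                     if v < lower_fence or v > upper_fence],
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Note that this places a whisker exactly at the fence when data exceed it, as the captions state; many plotting libraries instead draw the whisker at the most extreme data point that lies inside the fence.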

Supplementary Figure 15 REVOLVER’s breast clusters against clusters of occurrences.

We compare REVOLVER’s clusters to those obtained by clustering the binary occurrences of the annotated drivers: we consider the pattern of occurrence in the cohort (A) and the clonality status (B). In both cases the input data is a feature matrix where an entry is 1 if the driver has CCF above 0 in any of the samples of a patient; for both cases we show a tanglegram and the entanglement score, as well as the dendrogram coloured with REVOLVER's clusters.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Notes 1–4

Reporting Summary

Supplementary Table 1

Analysis of colorectal cancers

Supplementary Table 2

Analysis of lung cancers

Supplementary Table 3

Analysis of breast cancers

Supplementary Table 4

Analysis of kidney cancers

Supplementary Software

REVOLVER software package (R source code), along with vignettes

Cite this article

Caravagna, G., Giarratano, Y., Ramazzotti, D. et al. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nat Methods 15, 707–714 (2018). https://doi.org/10.1038/s41592-018-0108-x

  • DOI: https://doi.org/10.1038/s41592-018-0108-x
