A community effort to create standards for evaluating tumor subclonal reconstruction


Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating them. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.

Fig. 1: Features of tumor subclonal reconstruction.
Fig. 2: Quantifying performance of subclonal reconstruction algorithms.
Fig. 3: Simulating subclonal CNAs in tumor BAM files and spiking somatic mutations.
Fig. 4: Simulated realistic tumor genomes.
Fig. 5: Error profiles of subclonal reconstruction algorithms.
Fig. 6: Impact of CNA error profiles on subclonal reconstruction.

Data availability

Sequence files are available at EGA under study accession no. EGAD00001003971.

Code availability

BAMSurgeon is available at: https://github.com/adamewing/bamsurgeon. The framework for subclonal mutation simulation is available at http://search.cpan.org/~boutroslb/NGS-Tools-BAMSurgeon-v1.0.0/. The PhaseTools BAM phasing toolkit is available at https://github.com/mateidavid/phase-tools. Scripts providing the complete scoring harness are available at: https://github.com/asalcedo31/SMC-Het_Scoring/smc_het_eval.



Acknowledgements

We thank the members of our laboratories for support, and Sage Bionetworks and the DREAM Challenge organization for their ongoing support of the SMC-Het Challenge. In particular, we thank T. Norman, J.C. Bare, S. Friend and G. Stolovitzky for their patience, technical support and scientific insight. We also thank R. Sun and C. Curtis for kindly sharing code for calculating the intra-tumor heterogeneity metrics and building the support vector machine predictor in multi-region sequencing simulations. This study was conducted with the support of the Ontario Institute for Cancer Research to P.C.B. and J.T.S. through funding provided by the Government of Ontario. This work was supported by Prostate Cancer Canada and is proudly funded by the Movember Foundation (Grant no. RS2014-01 to P.C.B.). This study was conducted with the support of Movember funds through Prostate Cancer Canada and with the additional support of the Ontario Institute for Cancer Research, funded by the Government of Ontario. This project was supported by Genome Canada through a Large-Scale Applied Project contract to P.C.B., S.P. Shah and R.D. Morin. This work was supported by the Discovery Frontiers: Advancing Big Data Science in Genomics Research program, which is jointly funded by the Natural Sciences and Engineering Research Council of Canada, the Canadian Institutes of Health Research (CIHR), Genome Canada and the Canada Foundation for Innovation (CFI). Q.M. is a Canada CIFAR AI chair and is supported by an Associate Investigator award from OICR. This research is part of the University of Toronto’s Medicine by Design initiative, which receives funding from the Canada First Research Excellence Fund (CFREF). J.A.W. was partially supported by an Ontario Graduate Scholarship. This work was supported by The Francis Crick Institute, which receives its core funding from Cancer Research UK (grant no. FC001202), the UK Medical Research Council (grant no. FC001202), and the Wellcome Trust (grant no. FC001202). M.T. is a postdoctoral fellow supported by the European Union’s Horizon 2020 research and innovation program (Marie Sklodowska-Curie Grant Agreement no. 747852-SIOMICS). P.V.L. is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support toward the establishment of The Francis Crick Institute. This project was enabled through access to the MRC eMedLab Medical Bioinformatics infrastructure, supported by the UK Medical Research Council (grant no. MR/L016311/1 to M.T. and P.V.L.). A.S. was partly supported by a CIHR CGS-doctoral award. P.C.B. was supported by a Terry Fox Research Institute New Investigator Award and a CIHR New Investigator Award. D.C.W. is supported by the Li Ka Shing Foundation. The Galaxy portions of the evaluation system were supported by National Institutes of Health (NIH) grant nos. U41 HG006620 and R01 AI134384-01 as well as NSF grant no. 1661497. The following NIH grants supported this work: no. R01-CA180778 (to J.M.S.), no. U24-CA143858 (to J.M.S.) and no. P30-CA008748 (to Thompson, subgrant to Q.M.). We thank Google Inc. (in particular N. Deflaux) for their ongoing support of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. This work was supported by the NIH/NCI under award no. P30CA016042.

Author information

All authors edited and approved the final manuscript. A.S. wrote the first draft of the paper, designed experiments, performed statistical analyses, performed bioinformatics analyses and performed data visualization. M.T. wrote the first draft of the paper, designed experiments, generated tools and reagents, performed statistical analyses, performed bioinformatics analyses and performed data visualization. S.M.G.E. wrote the first draft of the paper, generated tools and reagents, performed bioinformatics analyses and performed data visualization. A.G.D. wrote the first draft of the paper, designed experiments, generated tools and reagents and performed bioinformatics analyses. M.D., S.D., L.Y.L., S.S., H.Z., J.M.C., A.B., C.M.L., I.U. and B.L. generated tools and reagents. K.Z. and T.-H.O.Y. generated tools and reagents and performed bioinformatics analyses. A.D.E. generated tools and reagents and supervised research. N.M.W. performed bioinformatics analyses and performed data visualization. J.A.W., M.K., H.Z. and C.V.A. performed bioinformatics analyses. C.P. performed data visualization. J.T.S., J.M.S., D.A. and Y.G. supervised research. K.E. wrote the first draft of the paper and supervised research. D.C.W. designed experiments and supervised research. Q.M. wrote the first draft of the paper, designed experiments, generated tools and reagents and supervised research. P.V.L. wrote the first draft of the paper, designed experiments and supervised research. P.C.B. wrote the first draft of the paper, designed experiments and supervised research.

Correspondence to Paul C. Boutros.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Supplementary Figs. 1–6, Tables 2–5 and Notes 1–3.

Reporting Summary

Supplementary Table 1

Benchmark scores. Unnormalized benchmark scores for all tumors and all subchallenges, varying depth, mutation caller, CNA input and subclonal reconstruction algorithm. The number of SNVs detected (SNVs) and the false positive (FP), false negative (FN), true positive (TP) and true negative (TN) rates for SNV detection are included, as well as the estimated cF.

About this article

Cite this article

Salcedo, A., Tarabichi, M., Espiritu, S.M.G. et al. A community effort to create standards for evaluating tumor subclonal reconstruction. Nat Biotechnol 38, 97–107 (2020). https://doi.org/10.1038/s41587-019-0364-z
