Subclonal diversification of primary breast cancer revealed by multiregion sequencing

Journal name:
Nature Medicine


The sequencing of cancer genomes may enable tailoring of therapeutics to the underlying biological abnormalities driving a particular patient's tumor. However, sequencing-based strategies rely heavily on representative sampling of tumors. To understand the subclonal structure of primary breast cancer, we applied whole-genome and targeted sequencing to multiple samples from each of 50 patients' tumors (303 samples in total). The extent of subclonal diversification varied among cases and followed spatial patterns. No strict temporal order was evident, with point mutations and rearrangements affecting the most common breast cancer genes, including PIK3CA, TP53, PTEN, BRCA2 and MYC, occurring early in some tumors and late in others. In 13 out of 50 cancers, potentially targetable mutations were subclonal. Landmarks of disease progression, such as resistance to chemotherapy and the acquisition of invasive or metastatic potential, arose within detectable subclones of antecedent lesions. These findings highlight the importance of including analyses of subclonal structure and tumor evolution in clinical trials of primary breast cancer.

At a glance


  1. Figure 1: Study design.

    (a) Summary of samples within cohorts 1 and 2. n values denote the number of subjects. (b) Geographical sampling approach for tumor hemispheres 1 and 2, plus one or two involved lymph nodes in three cases. NW, northwest; NE, northeast; SW, southwest; SE, southeast; NGS, next-generation sequencing. For multifocal cancers, all samples were taken from the single largest focus. (c) Source of retrospective clinical samples in relation to primary tumor management. NAC, neo-adjuvant chemotherapy; pCR, pathological complete response; RD, residual disease.

  2. Figure 2: Systematic sampling revealed spatial and temporal tumor evolution.

    (a–d) Coxcomb plots presenting somatic mutation genotypes organized according to the sample schema described in Figure 1b. Point estimates of the variant-allele frequency (VAF) or copy number (logR) are represented by lateral extension of an outlined wedge. Pale wedges with no outline represent the 95% confidence interval; if coverage is low, the confidence of the VAF is reduced and the pale wedge appears beyond the point estimate. IDCA, invasive ductal carcinoma. Driver mutations and arm-level copy-number gains (+) and losses (−) detected in each cancer are annotated in the case-specific mutation legends. Significant heterogeneity among point mutations in individual cancers was determined using generalized linear models and Benjamini-Hochberg correction: q < 0.05 indicates significant point-mutational heterogeneity (S); NS, not significant. Mock phylogenetic trees are also shown. The presence and absence of mutations across related samples indicated distinct subclones and dictated the branching structure, and the number of mutations in each subclone determined the branch length. (a) No detected intratumoral heterogeneity (q = 0.8). (b,c) Local expansion of subclones (arrowheads). (d) Complex intermixing of subclones: individual mutations (each highlighted with a different-colored arrowhead, with colors corresponding to those in the tree to the right) appeared in different combinations of samples. Coxcomb plots and heat maps for every cancer in the cohort are available at

  3. Figure 3: Subclonal patterns in multifocal breast cancers.

    (a–e) Targeted capture genomic analysis of subclonal structure (a–c,e) and immunohistochemistry (IHC; d) of multifocal cancers. Coxcomb plots and mock phylogenetic trees were generated as described for Figure 2, and the scale legend for that figure applies here. Plots from multiple samples from the same tumor focus are grouped together. Colored arrowheads identify subclones that were shared by fewer than all invasive foci. (a) Case PD14753: genotypes of five samples from three disease foci indicated deep branching of the tree, driver heterogeneity and subclone intermingling across foci. (b) Case PD9193: genotypes of seven samples from four disease foci demonstrated subclone intermingling. Orientation in mastectomy specimen: UIQ, upper inner quadrant; UOQ, upper outer quadrant; LIQ, lower inner quadrant; LOQ, lower outer quadrant. F1–F4, foci 1–4. (c) Case PD9694: parallel evolution with two unique PTEN driver mutations in different foci. The schematic representation of the mastectomy specimen (center) shows pathological features in the specimen; the dashed horizontal line represents the deep (chest wall) margin. Numbered foci are outlined in the image of a formalin-fixed paraffin-embedded (FFPE) tissue section. Scale bar, 3 mm. (d) Case PD9694: PTEN IHC showed that PTEN protein was present in DCIS but lost in invasive disease foci 1 and 2. Scale bars, 100 μm. (e) Genotypes of three samples from two disease foci in PD9770 before chemotherapy and two samples from focus 1 after neoadjuvant chemotherapy. Focus 2 exhibited a complete pathological response to three cycles of each chemotherapy agent. IDCA, invasive ductal carcinoma; LOH, loss of heterozygosity; TN, triple negative; UPD, uniparental disomy; VAF, variant-allele frequency.

  4. Figure 4: The genome-wide spectrum of branching evolution.

    (a–c) Phylogenetics (a,c) and subclonal composition (b) of primary cancers. (a) Phylogenetic trees generated by clustering genome-wide point mutation data from ten multiregion primary cancer samples. Relative branch lengths were determined from the proportion of mutations in each branch. An 'x' indicates the most recent common ancestor inferred from treatment-naive samples alone. (a,c) Cases for which post-treatment samples were available (green bars above trees); red nodes indicate where subclones detected only after treatment (branches with red outlines) emerged in the tree. Branches detected only among pre-treatment samples are indicated by a purple outline; black branches indicate detection in both pre- and post-chemotherapy samples. Genes likely to be driver genes are colored according to mutation type: amplification, red text; homozygous deletion, blue text; point mutation, black text; and potentially relevant structural variants, purple text. Cancer type is specified: TN, triple negative; DCIS, ductal carcinoma in situ. Types of neo-adjuvant chemotherapy (NAC): Epi, epirubicin; T, docetaxel; P, paclitaxel; FEC, fluorouracil, epirubicin and cyclophosphamide. Panel c shows mock trees inferred from targeted capture data for samples with pre- and post-treatment samples. Six samples with no branching are not presented. In b, colors correspond to the tree branch directly above in a, and the area is proportional to the percentage of cells in that sample that contained the mutations in that branch. In c, branches are colored as stated above for genome data. (d) Pearson's correlation for heterogeneity estimates from whole-genome and targeted capture data.

  5. Figure 5: Subclonal driver mutations and parallel evolution.

    (a) Heat map of somatic driver mutations and copy-number (CN) changes identified from genomic sequencing of 50 tumors. Single-base substitutions (subs) and small insertions and deletions (indels) are denoted by dark red squares when detected in all associated samples from the tumor (omnipresent) and pink squares when present in fewer than all samples or when clearly subclonal. Omnipresent and heterogeneous copy-number changes are denoted by dark blue and light blue squares, respectively. TN, triple negative; SV, structural variant. (b–d) Three examples of parallel evolution; the fourth example is in Figures 3c,d and 4a (PD9694). (c) One possible phylogenetic tree and sample subclonal compositions inferred from targeted capture data (as described in Figs. 2 and 4c) with TP53 mutations arising on three branches. Coxcomb plots for PD9850 are in Supplementary Figure 3. Multiple independent episomal amplification events in FGFR2 (b) and two independent deletions in RUNX1 (d) were detected in two samples from the same cancer. In copy-number graphs (b,d) the black dots reflect the number of copies of genomic DNA from that specific locus, with a value greater than 2 reflecting a net gain and a value less than 2 reflecting a loss. Reconstructed rearrangement breakpoints are represented by colored lines according to whether they were detectable in pre- (purple) or post- (red) chemotherapy samples only. The type of event is indicated by the position of the arc joining the breakpoints. D, deletion; TD, tandem duplication; IFD, in-frame deletion; QC, quality control; LOH, loss of heterozygosity; RD, residual disease; pCR, pathological complete response; pS, primary surgery; ACF, aberrant cell fraction.

  6. Figure 6: Structural variants shape cancer evolution.

    (a) Comparison of the proportions of substitutions (subs) and structural variants (SVs) that are subclonal in each cancer. Inset shows scatter plot and Pearson's correlation coefficient (r). (b) Clonal and subclonal complex rearrangements (as described in the Supplementary Note) and arm-level loss-of-heterozygosity (LOH) events. The average genome-wide ploidy is indicated. TP, tetraploid (four copies); D, diploid (two copies). (c) Breakdown of clonal and subclonal structural variants by category (Del, deletion; Inv, inversion; TD, tandem duplication; Trans, interchromosomal translocation). For each cancer, the total number of mutations assigned to the trunk (T) or branches (B) is indicated at the top left, and the proportion of each mutation type that was subclonal (i.e., within the branches) is presented above each bar. (d) Case PD9770: examples of two subclonal complex structural rearrangements arising on separate branches of the phylogenetic tree. In PD9770c, structural rearrangements link multiple regions of amplification across three chromosomes. Amplifications include multiple genomic regions that have been previously identified as recurrently amplified in cancers (red arrows); the locations of known oncogenes are marked by pink bars. In PD9770d these events are not seen, but a breakage-fusion-bridge (BFB) event amplifies segments including the CDC7 gene. Rearrangement types included interchromosomal translocations (Tr), deletions (Del; purple), tandem duplications (TD; brown), head-to-head inversions (HH; blue) and tail-to-tail inversions (TT; green).
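
The Figure 2 legend relies on two statistical quantities: a 95% confidence interval around each VAF point estimate, which widens when coverage is low, and Benjamini-Hochberg q values with a 0.05 threshold. The following is a minimal sketch of both, not the authors' implementation. It assumes a Wilson score interval for the VAF (the legend does not say which interval was used) and takes the per-cancer p values as given inputs, since the generalized linear models that produced them are not described here. The function names `wilson_interval` and `benjamini_hochberg` are hypothetical.

```python
import math


def wilson_interval(alt_reads, depth, z=1.96):
    """Return (vaf, low, high) for a variant seen in alt_reads of depth reads.

    Assumption: a Wilson score interval stands in for whatever interval
    the original analysis used. z=1.96 gives an approximate 95% interval.
    """
    if depth == 0:
        # No coverage: the VAF is undefined and the interval is uninformative.
        return float("nan"), 0.0, 1.0
    vaf = alt_reads / depth
    denom = 1.0 + z * z / depth
    centre = (vaf + z * z / (2 * depth)) / denom
    half = z * math.sqrt(vaf * (1 - vaf) / depth + z * z / (4 * depth * depth)) / denom
    return vaf, max(0.0, centre - half), min(1.0, centre + half)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as p_values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so each q value is the minimum
    # of p * n / rank over all ranks at or above the current one.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


def is_heterogeneous(q_value, threshold=0.05):
    """Label a cancer S (significant) or NS, following the legend's cutoff."""
    return "S" if q_value < threshold else "NS"
```

With this sketch, a variant supported by 1 of 10 reads and one supported by 10 of 100 reads share the same point estimate of 0.1, but the first has a much wider interval, which is the behavior the pale wedges in the coxcomb plots are described as showing.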

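The legends for Figures 2 and 4 state that the presence and absence of mutations across related samples defined the branches of the mock phylogenetic trees, and that branch length followed the number (or proportion) of mutations in each branch. The sketch below illustrates only that grouping step under simplifying assumptions: mutation calls are already binary (detected or not detected per sample), and no VAF-based clustering, purity or copy-number correction is attempted. The input layout and the names `branch_lengths` and `subclonal_fraction` are hypothetical.

```python
from collections import Counter


def branch_lengths(calls, samples):
    """Group mutations by the set of samples in which they were detected.

    calls:   dict mapping a mutation identifier to the set of sample names
             in which it was called (hypothetical input layout).
    samples: ordered list of all sample names from one cancer.

    Returns a dict mapping each presence pattern (a tuple of sample names)
    to (mutation count, proportion of all mutations). The pattern that
    contains every sample corresponds to the trunk.
    """
    counts = Counter()
    for mutation, present_in in calls.items():
        pattern = tuple(s for s in samples if s in present_in)
        if pattern:  # ignore mutations detected in no sample
            counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: (n, n / total) for pattern, n in counts.items()}


def subclonal_fraction(calls, samples):
    """Proportion of mutations that are absent from at least one sample."""
    branches = branch_lengths(calls, samples)
    trunk = tuple(samples)
    trunk_share = branches.get(trunk, (0, 0.0))[1]
    return 1.0 - trunk_share if branches else 0.0
```

Grouping by exact presence pattern is the simplest reading of the legends. Deciding which patterns nest inside which, and therefore how the branches connect, would need an additional step that this sketch leaves out.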


Author information


  1. Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton, UK.

    • Lucy R Yates,
    • Moritz Gerstung,
    • Gunes Gundem,
    • Peter Van Loo,
    • Ludmil B Alexandrov,
    • Helen Davies,
    • Yilong Li,
    • Young Seok Ju,
    • Manasa Ramakrishna,
    • Serena Nik-Zainal,
    • Stuart McLaren,
    • Adam Butler,
    • Sancha Martin,
    • Dominic Glodzik,
    • Andrew Menzies,
    • Keiran Raine,
    • Jonathan Hinton,
    • David Jones,
    • Laura J Mudie,
    • Michael R Stratton,
    • David C Wedge &
    • Peter J Campbell
  2. Department of Oncology, The University of Cambridge, Cambridge, UK.

    • Lucy R Yates
  3. Section of Oncology, Department of Clinical Science, University of Bergen, Bergen, Norway.

    • Stian Knappskog &
    • Per Eystein Lønning
  4. Department of Oncology, Haukeland University Hospital, Bergen, Norway.

    • Stian Knappskog &
    • Per Eystein Lønning
  5. Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium.

    • Christine Desmedt,
    • Denis Larsimont,
    • Delphine Vincent,
    • Pierre-Yves Adnet,
    • Marion Maetens,
    • Michail Ignatiadis &
    • Christos Sotiriou
  6. Department of Human Genetics, University of Leuven, Leuven, Belgium.

    • Peter Van Loo
  7. Department of Surgery, Haukeland University Hospital, Bergen, Norway.

    • Turid Aas
  8. Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.

    • Ludmil B Alexandrov
  9. Department of Pathology, Haukeland University Hospital, Bergen, Norway.

    • Hans Kristian Haugland &
    • Peer Kaare Lilleng
  10. The Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway.

    • Peer Kaare Lilleng
  11. Dana-Farber Cancer Institute, Boston, Massachusetts, USA.

    • Bing Jiang,
    • April Greene-Colozzi,
    • Aquila Fatima &
    • Andrea L Richardson
  12. Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

    • Andrea L Richardson


L.R.Y. and P.J.C. designed and directed the study and prepared the manuscript. L.R.Y. and M.G. performed analyses and prepared figures. S.K., T.A. and P.E.L. contributed to the study design and sample preparation for cohort 1. C.D., C.S., M.I. and M.M. contributed to the study design and sample preparation for cohort 2. D.C.W., P.V.L., G.G., H.D., Y.S.J., S. McLaren, M.R., S.N.-Z., A.B., D.G., A.M., K.R., J.H., D.J., M.R.S., Y.L. and L.B.A. contributed to analysis. S. Martin managed samples. A.L.R., D.L., H.K.H. and P.K.L. conducted histopathological assessment. P.-Y.A., D.V., B.J., A.G.-C. and A.F. performed DNA extraction. L.J.M. contributed to library preparation, PCR and gel electrophoresis.

Competing financial interests

P.J.C. and M.R.S. are founders, stockholders and consultants for 14M Genomics Ltd, a genomics diagnostic company.


Supplementary information

PDF files

  1. Supplementary Text and Figures (7,903 KB)

    Supplementary Figures 1–6, Supplementary Note

Zip files

  1. Supplementary Source Code (12 KB)

    R code for mutation clustering

Excel files

  1. Supplementary Table 1 (147 KB)

    Patient and sample characteristics

  2. Supplementary Table 2 (94 KB)

    Sequencing coverage

  3. Supplementary Table 3 (60 KB)

    Annotation of potential driver genes

  4. Supplementary Table 4 (205 KB)

    Validation data

  5. Supplementary Table 5 (148 KB)

    Mutation clusters

  6. Supplementary Table 6 (73 KB)

    Heterogeneity scores

  7. Supplementary Table 7 (134 KB)

    Mutation and copy number calls from capture data

  8. Supplementary Table 8 (204 KB)

    Coding mutations and oncogenic copy number events from whole genome data
