Our knowledge of copy number evolution during the expansion of primary breast tumours is limited (refs. 1,2). Here, to investigate this process, we developed a single-cell, single-molecule DNA-sequencing method and performed copy number analysis of 16,178 single cells from 8 human triple-negative breast cancers and 4 cell lines. The results show that breast tumours and cell lines comprise a large milieu of subclones (7–22) that are organized into a few (3–5) major superclones. Evolutionary analysis suggests that after clonal TP53 mutations, multiple loss-of-heterozygosity events and genome doubling, there was a period of transient genomic instability followed by ongoing copy number evolution during the primary tumour expansion. By subcloning single daughter cells in culture, we show that tumour cells rediversify their genomes and do not retain isogenic properties. These data show that triple-negative breast cancers continue to evolve chromosome aberrations and maintain a reservoir of subclonal diversity during primary tumour growth.
The data from this study were deposited in the NCBI Sequence Read Archive under accession number PRJNA629885.
Code used in this study is available at https://github.com/navinlabcode/ACT_paper.
Davis, A., Gao, R. & Navin, N. Tumor evolution: linear, branching, neutral or punctuated? Biochim. Biophys. Acta Rev. Cancer 1867, 151–161 (2017).
Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).
Pfister, K. et al. Identification of drivers of aneuploidy in breast tumors. Cell Rep. 23, 2758–2769 (2018).
Xu, J., Huang, L. & Li, J. DNA aneuploidy and breast cancer: a meta-analysis of 141,163 cases. Oncotarget 7, 60218–60229 (2016).
Gordon, D. J., Resio, B. & Pellman, D. Causes and consequences of aneuploidy in cancer. Nat. Rev. Genet. 13, 189–203 (2012).
Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990).
Gao, R. et al. Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nat. Genet. 48, 1119–1130 (2016).
Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
Cross, W. et al. The evolutionary landscape of colorectal tumorigenesis. Nat. Ecol. Evol. 2, 1661–1672 (2018).
Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Cross, W. Ch., Graham, T. A. & Wright, N. A. New paradigms in clonal evolution: punctuated equilibrium in cancer. J. Pathol. 240, 126–136 (2016).
Hadimioglu, B., Stearns, R. & Ellson, R. Moving liquids with sound: the physics of acoustic droplet ejection for robust laboratory automation in life sciences. J. Lab. Autom. 21, 4–18 (2016).
Zahn, H. et al. Scalable whole-genome single-cell library preparation without preamplification. Nat. Methods 14, 167–173 (2017).
Chavez, K. J., Garimella, S. V. & Lipkowitz, S. Triple negative breast cancer cell lines: one tool in the search for better treatment of triple negative breast cancer. Breast Dis. 32, 35–48 (2010).
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160 (2014).
Cross, W. et al. Stabilising selection causes grossly altered but stable karyotypes in metastatic colorectal cancer. Preprint at https://doi.org/10.1101/2020.03.26.007138 (2020).
Fehrmann, R. S. et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat. Genet. 47, 115–125 (2015).
Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330 (2018).
Greenfield, E. A. Single-cell cloning of hybridoma cells by limiting dilution. Cold Spring Harb. Protoc. https://doi.org/10.1101/pdb.prot103192 (2019).
Zong, C., Lu, S., Chapman, A. R. & Xie, X. S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626 (2012).
Xi, L. et al. New library construction method for single-cell genomes. PLoS ONE 12, e0181163 (2017).
Laks, E. et al. Clonal decomposition and DNA replication states defined by scaled single-cell genome sequencing. Cell 179, 1207–1221 (2019).
Vitak, S. A. et al. Sequencing thousands of single-cell genomes with combinatorial indexing. Nat. Methods 14, 302–308 (2017).
Baslan, T. et al. Genome-wide copy number analysis of single cells. Nat. Protoc. 7, 1024–1041 (2012).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007).
Hahsler, M., Piekenbrock, M. & Doran, D. Fast density-based clustering with R. J. Stat. Softw. https://doi.org/10.18637/jss.v091.i01 (2019).
Leung, M. L. et al. Highly multiplexed targeted DNA sequencing from single nuclei. Nat. Protoc. 11, 214–235 (2016).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Nilsen, G. et al. Copynumber: efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics 13, 591 (2012).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Lun, A. T., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal 1695 (2006).
McInnes, L., Healy, J. & Astels, S. hdbscan: hierarchical density based clustering. JOSS 2, 205 (2017).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Zhang, Z., Lange, K. & Sabatti, C. Reconstructing DNA copy number by joint segmentation of multiple sequences. BMC Bioinformatics 13, 205 (2012).
Desper, R. & Gascuel, O. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J. Comput. Biol. 9, 687–705 (2002).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
McKenna, A. et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Knaus, B. J. & Grünwald, N. J. vcfr: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 17, 44–53 (2017).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).
Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 7, 20 (2013).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Soneson, C., Love, M. I. & Robinson, M. D. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 4, 1521 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Korotkevich, G., Sukhov, V. & Sergushichev, A. Fast gene set enrichment analysis. Preprint at https://doi.org/10.1101/060012 (2019).
R Core Team. R: A Language and Environment for Statistical Computing http://www.R-project.org/ (R Foundation for Statistical Computing, 2013).
Kassambara, A. rstatix: Pipe-Friendly Framework for Basic Statistical Tests https://CRAN.R-project.org/package=rstatix (2020).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 56–61 (2010).
This work was supported by grants to N.E.N. from the American Cancer Society (129098-RSG-16-092-01-TBG), the National Cancer Institute (R01CA240526, R01CA236864), the Emerson Collective Cancer Research Fund (144300-80-121457-19) and the CPRIT Single Cell Genomics Center (RP180684). N.E.N. is an AAAS Wachtel Scholar and AAAS Fellow. This study was supported by core facility grants (CA016672, CA016672) from the NIH. This work was supported by the Dana-Farber Cancer Institute Physical Sciences Oncology Center (grant no. U54CA143798 to F.M.) and the DFCI Center for Cancer Evolution (to F.M.). T.B., M.T. and P.V.L. are supported by the Francis Crick Institute, Cancer Research UK (FC001202), the UK Medical Research Council (FC001202) and the Wellcome Trust (FC001202). P.V.L. is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of the Francis Crick Institute. T.B. is supported by a fellowship from the Boehringer Ingelheim Fonds. We thank K. Polyak, H. Tang, L. Ramagli, E. Thompson, S. Shah, A. McPherson, N. Ramesh and A. Kalia for their assistance on this project.
F.M. is the co-founder of an oncology company. The other authors declare no competing interests.
Peer review information Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, ACT single-cell DNA library size distributions for TN1, TN2 and TN3 after pooling 384 cell libraries. b, Schematic showing the use of positional barcoding information to determine single-molecule information by tagmentation during ACT, compared to whole-genome amplification using DOP-PCR, where the original DNA fragmentation sites of single molecules cannot be resolved. c, Breadth of coverage for sparse depth data from different scDNA-seq methods plotted by individual samples, using n = 100 random cells per sample. d, Overdispersion of bin counts for sparse depth data from different scDNA-seq methods plotted by individual samples, using n = 100 random cells per sample. e, Distribution of sequencing reads across a diploid region of chromosome 4p14 for a single SK-BR-3 cell sequenced by DOP-PCR compared to ACT, in which the PCR duplicates were retained or removed to obtain single-molecule data. f, Distribution of sequencing reads across a diploid region of chromosome 4p (top) and 10q (bottom) for a single SK-BR-3 cell sequenced by DOP-PCR compared to ACT, with or without duplicate molecules retained. g, Lorenz curves of coverage uniformity for ACT, DOP-PCR and one bulk DNA-seq dataset from SK-BR-3 single cells, downsampled to equal coverage depth. h, Breadth of coverage as a function of pseudo-bulk reconstruction by combining multiple cells for ACT, DOP-PCR and bulk sequencing.
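As an illustration of the Lorenz curves mentioned in panel g, the following minimal Python sketch computes a Lorenz curve from per-bin read counts. It is not the authors' code: the function names, the toy bin counts and the Gini-style summary are assumptions introduced here for illustration only.

```python
from typing import List, Tuple


def lorenz_curve(bin_counts: List[int]) -> List[Tuple[float, float]]:
    """Return (cumulative fraction of bins, cumulative fraction of reads) points.

    Bins are sorted from least to most covered, so perfectly uniform
    coverage gives points on the diagonal y = x.
    """
    counts = sorted(bin_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(counts)
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        points.append((i / n, running / total))
    return points


def gini_index(bin_counts: List[int]) -> float:
    """Summarise a Lorenz curve as 1 - 2 * (area under the curve).

    0 means perfectly uniform coverage; larger values mean that reads
    are concentrated in fewer bins. (Hypothetical summary statistic,
    not taken from the paper.)
    """
    pts = lorenz_curve(bin_counts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return 1 - 2 * area


# Toy example: one evenly covered cell and one with all reads in one bin.
uniform_cell = [5, 5, 5, 5]
skewed_cell = [0, 0, 0, 20]
print(gini_index(uniform_cell), gini_index(skewed_cell))
```

Comparing methods this way is only meaningful at matched depth, which is why the legend notes that the data were downsampled to equal coverage.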
a, FACS profiles of DAPI-stained nuclei flow-sorted for ACT from eight patients with TNBC, showing ploidy distributions, with vertical red lines showing the sorting gates. b, Shannon diversity indexes calculated from the single-cell copy number data from each of the eight individuals with 95% confidence intervals indicated. c, Heat map of the genomic regions of cCNAs, sCNAs and uCNAs across the eight tumour samples. d, Distributions of the genomic segment sizes of clonal, subclonal and unique CNAs across the eight tumours. e, Proportion of genome altered relative to the tumour ploidy classified as copy number losses in blue, neutral ground state copy number in white and gains in red. f, Bootstrapping of subclone clusters showing the mean Jaccard similarity for each subclone across the eight tumours. g, Scatter plots of number of cells in each subclone cluster by mean Jaccard similarity for each of the eight tumours.
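Panels b and f refer to two standard measures, the Shannon diversity index and the Jaccard similarity. The sketch below shows how each could be computed from subclone assignments. It is a generic illustration under stated assumptions (natural logarithm, hypothetical cell and subclone labels) and does not reproduce the authors' bootstrapping or confidence-interval procedure.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_index(labels: Iterable[Hashable]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over subclone frequencies p_i."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells provided")
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def jaccard_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of cell identifiers."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty clusters are treated as identical
    return len(set_a & set_b) / len(union)


# Hypothetical tumour with two equally sized subclones: H = ln 2.
cells = ["c1"] * 50 + ["c2"] * 50
print(shannon_index(cells))

# Hypothetical cluster membership before and after one resampling round.
original_cluster = {"cell1", "cell2", "cell3"}
resampled_cluster = {"cell2", "cell3", "cell4"}
print(jaccard_similarity(original_cluster, resampled_cluster))
```

In a bootstrap setting, the Jaccard value would be averaged over many resampling rounds per subclone; that loop is omitted here because the resampling scheme is not described in the legend.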
a, Clustered heat maps of single-cell copy number profiles for TN3–TN8 with left annotation bars representing superclones and subclones, and bottom annotation bars representing different genomic regions of CNA classes as well as annotations for selected breast cancer genes. b, Matrix plots for TN3–TN8, showing integer copy number states for selected breast cancer genes in regions of cCNAs, sCNAs and uCNAs across the different subclones in each tumour.
a, Co-clustering of ACT and 10X Genomics copy number data for samples TN1 (n = 1,976 cells) and TN3 (n = 2,171 cells), showing subclones detected in the merged datasets. b, Frequency of subclones detected on each platform in the merged datasets from 10X and ACT. c, Clustered heat maps of single-cell copy number profiles for TN1 and TN3 with left annotation bars representing the scDNA-seq technology platform and the different subclones, with annotations for selected breast cancer genes indicated below. d, Bar plots of copy number state frequencies of selected breast cancer genes for ACT and 10X CNV showing the proportion of copy number states for all cells separated by platform.
a, Most frequent exonic mutations in genes with significant SIFT (<0.05) and PolyPhen-2 (>0.85) scores. b, Exome mutation counts of each tumour indicating mutations that were classified as clonal or subclonal based on allele-specific copy number frequencies. c, Density plots showing the probability of genome doubling as a function of relative mutational time for 7 out of the 8 patients with TNBC with sufficient number of truncal exome mutations. d, Minimum evolution trees of single-cell copy number profiles using Manhattan distances for TN3–TN8, indicating the distance from the diploid root node to the MRCA and the distance from the MRCA to the terminal nodes. Annotations indicate the timing of genome doubling and timing of TP53 mutations before WGD in all of the tumours. e, Summary of the truncal distances from the diploid root node to the MRCA and the branching distances from the MRCA to the last terminal node.
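To make the distance measure in panels d and e concrete, the sketch below computes Manhattan distances between integer copy number profiles. All profiles are hypothetical toy vectors, and the direct pairwise distances shown here only approximate the idea: in the paper the truncal and branching distances are read from minimum evolution trees, which this example does not build.

```python
from typing import Sequence


def manhattan_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Sum of absolute copy number differences across matched genomic bins."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same bins")
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical profiles over six genomic bins (not real data).
diploid_root = [2, 2, 2, 2, 2, 2]   # normal diploid genome
mrca_profile = [4, 4, 3, 4, 2, 4]   # inferred most recent common ancestor
tumour_cell = [4, 5, 3, 4, 1, 4]    # one terminal single-cell profile

truncal = manhattan_distance(diploid_root, mrca_profile)    # root to MRCA
branching = manhattan_distance(mrca_profile, tumour_cell)   # MRCA to cell
print(truncal, branching)
```

A long truncal distance relative to the branching distance corresponds to many copy number changes being shared by all tumour cells, with fewer changes acquired after the MRCA.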
a, Left, minimum evolution trees after the MRCA generated using the consensus CNA profiles of subclones for TN3–TN8 rooted by a neutral node to the MRCA and coloured by superclones and subclones. Right, heat maps of consensus subclone profiles, with annotations for the superclones and subclones on left annotation bars and bottom annotation bars showing different CNA classes, as well as selected breast cancer genes. The last row in the clustered heat maps shows the inferred MRCA copy number profiles. b, Genome-wide copy number profiles of TNBC tumours with segments of the rounded total copy number (orange) and the rounded number of copies of the minor allele (blue). Thick segments are ASCAT profiles from the exome bulk, and thinner segments are from the superclones with slight offset relative to integer values for visualization. For each superclone, parentheses show the percentage of the genomic region in which both the minor and major allele copy numbers are the same as in the exome, restricting analysis to the genomic region where the total is also the same.
a, Comparison of the expected CNA frequency spectrum obtained from theory and simulation. Simulations include a flexible fitness distribution, whereas the theoretical analysis considers neutral and lethal changes only. Different colours correspond to varying the increase in CNA rate during the transient instability phase, and the tumour size at which the instability subsides. Exact parameters are provided in the Supplementary Methods. b, Maximum-likelihood fits for the breakpoint frequency spectra obtained for TNBC tumours under models of gradual and transient instability after PCNE; parameter values for simulations and further details are provided in the Supplementary Methods. c, Maximum-likelihood fits for the breakpoint frequency spectra obtained from expanded clones of MDA-MB-231 under models of gradual and transient instability. Further details are provided in the Supplementary Methods.
a, b, Clustered heat maps of single-cell copy number data from the BT-20 (n = 1,231 cells) and MDA-MB-157 (n = 1,210 cells) cell lines, in which left annotation bars represent superclones and subclones, and the bottom annotation bar represents different classes of CNA types. c, Number of superclones and subclones identified in the TNBC cell lines. d, Number of clonal, subclonal and unique CNAs detected in the four TNBC cell lines, as well as the two MDA-MB-231 expanded daughter cells. e, Distributions of the genomic sizes of clonal, subclonal and unique CNAs across the four TNBC cell lines and the two MDA-MB-231 expanded daughter cell lines. f, Shannon indexes calculated from the single-cell copy number profiles from the four TNBC cell lines and the two expanded MDA-MB-231 daughter cells with 95% confidence intervals. g, Microscopic field of DNA-FISH experiments of MDA-MB-231 using AKT3 and BCAS2 probes at 60× original magnification. h, Bar plots showing the results of DNA-FISH copy number states counted across 1,000 cells for each of the probes compared to the ACT data. i, Clustered heat map of single-cell copy number data for MDA-MB-231 EX2 cell line expansion (n = 897 cells), in which left annotation bars represent superclones and subclones, and the bottom annotation bar represents different classes of CNA types.
a, Schematic of physical single-cell subcloning experiments of daughter cells to generate 78 expansions from the MDA-MB-231 parental cell line. b, Co-clustering of the single-cell copy number data from the parental MDA-MB-231 cell line (n = 820 cells) with the 78 expanded clone bulk DNA-seq copy number profiles. c, Principal component analysis of bulk RNA-seq profiles of the 78 expanded daughter cell lines in triplicate, with contour colour representing superclones and point colour representing the subclone clusters from the genotypes of the single-cell and bulk DNA-seq co-clustering. d, Clustered heat map of bulk DNA copy number profiles from the 78 expanded clones, with left annotation bars representing superclones and subclones, as determined by co-clustering with the parental single-cell copy number data. e, Mean gene expression levels of different copy number states for 78 expansions from the MDA-MB-231 parental cell line. f, Cumulative number of subclonal segments as a function of Kruskal–Wallis test P-value, in which the red line denotes a P-value of 0.05. g, Mean gene expression as a function of copy number segments with points representing expanded clusters for two subclonal CNAs on chr11 and chr19. h, i, Consensus integer copy number profiles of the 10 expanded clone clusters on chromosome 11 (h) and chromosome 19 (i) (top) with matched RNA-seq expression (bottom) using moving windows of 100 genes. Right, selected breast cancer genes in subclonal CNA regions and their corresponding box plots of RNA expression for each expanded cluster. Box plots show the median, box edges represent the first and third quartiles, and the whiskers extend to 1.5× interquartile range. j, Cancer hallmark signatures with significant variability of normalized enrichment scores (NES) across the expanded clone clusters.
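The box plot convention stated in this legend (median, first and third quartiles, whiskers to 1.5× the interquartile range) can be sketched as follows. This is a generic illustration using the Python standard library: the quartile interpolation method and the expression values are assumptions, and plotting software may compute quartiles slightly differently.

```python
import statistics
from typing import Dict, List, Sequence, Union


def box_plot_stats(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Compute the summary statistics drawn in a conventional box plot."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points
    # (assumed convention; other quantile definitions exist).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical expression values for one expanded cluster.
expression = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_plot_stats(expression))
```

Values beyond the whiskers are reported separately as outliers rather than stretching the whisker, which keeps a single extreme measurement from dominating the plot.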
a–c, Three models of chromosome evolution dynamics during the expansion of primary TNBC tumours, with schematic plots of chromosome accumulation over time (left) and Muller plots of clonal frequencies (right). a, Gradual model of copy number evolution, in which CNAs are acquired sequentially throughout tumour progression leading to the expansion of successive subclones over time. b, Punctuated copy number evolution model, in which an initial burst of instability generates a large number of CNAs and subclones that undergo stable expansions to form the primary tumour mass, with no (or few) new CNAs acquired after the initial burst. c, Model of punctuated evolution and transient instability, in which the early acquisition of TP53 mutations and genome doubling lead to a burst of genomic instability in which a large number of CNA events are acquired and subclones are generated. These events are followed by a period of transient instability and ongoing copy number evolution during the expansion of the primary tumour mass, which leads to the generation of additional subclones and genomic diversity.
This file contains Supplementary Table 1: Relevant clinical information for the eight TNBC tumors and four TNBC cell lines that were analyzed in this study; Supplementary Table 2: Exome sequencing mutational information from bulk DNA exome of the 8 TNBC tumors; Supplementary Table 3: List of dual barcodes from ACT protocol; and Supplementary Methods: Extensive description of the methodology used for the mathematical modeling section, including Supplementary Figures 1-20 and additional references.
About this article
Cite this article
Minussi, D.C., Nicholson, M.D., Ye, H. et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 592, 302–308 (2021). https://doi.org/10.1038/s41586-021-03357-x