Single-cell genomic variation induced by mutational processes in cancer

Funnell, Tyler; O’Flanagan, Ciara H.; Williams, Marc J.; McPherson, Andrew; McKinney, Steven; Kabeer, Farhia; Lee, Hakwoo; Salehi, Sohrab; Vázquez-García, Ignacio; Shi, Hongyu; Leventhal, Emily; Masud, Tehmina; Eirew, Peter; Yap, Damian; Zhang, Allen W.; Lim, Jamie L. P.; Wang, Beixi; Brimhall, Jazmine; Biele, Justina; Ting, Jerome; Au, Vinci; Van Vliet, Michael; Liu, Yi Fei; Beatty, Sean; Lai, Daniel; Pham, Jenifer; Grewal, Diljot; Abrams, Douglas; Havasov, Eliyahu; Leung, Samantha; Bojilova, Viktoria; Moore, Richard A.; Rusk, Nicole; Uhlitz, Florian; Ceglia, Nicholas; Weiner, Adam C.; Zaikova, Elena; Douglas, J. Maxwell; Zamarin, Dmitriy; Weigelt, Britta; Kim, Sarah H.; Da Cruz Paula, Arnaud; Reis-Filho, Jorge S.; Martin, Spencer D.; Li, Yangguang; Xu, Hong; de Algara, Teresa Ruiz; Lee, So Ra; Llanos, Viviana Cerda; Huntsman, David G.; McAlpine, Jessica N.; Shah, Sohrab P.; Aparicio, Samuel

doi:10.1038/s41586-022-05249-0

Article
Open access
Published: 26 October 2022

Tyler Funnell ORCID: orcid.org/0000-0003-1612-5644^1,2^na1,
Ciara H. O’Flanagan³^na1,
Marc J. Williams ORCID: orcid.org/0000-0001-5524-4174²^na1,
Andrew McPherson²,
Steven McKinney ORCID: orcid.org/0000-0002-1611-0867³,
Farhia Kabeer^3,4,
Hakwoo Lee^3,4,
Sohrab Salehi²,
Ignacio Vázquez-García²,
Hongyu Shi²,
Emily Leventhal²,
Tehmina Masud³,
Peter Eirew³,
Damian Yap ORCID: orcid.org/0000-0002-5370-4592³,
Allen W. Zhang³,
Jamie L. P. Lim²,
Beixi Wang³,
Jazmine Brimhall³,
Justina Biele³,
Jerome Ting³,
Vinci Au ORCID: orcid.org/0000-0002-3587-6200³,
Michael Van Vliet³,
Yi Fei Liu³,
Sean Beatty ORCID: orcid.org/0000-0001-6819-7071³,
Daniel Lai ORCID: orcid.org/0000-0001-9203-6323^3,4,
Jenifer Pham³,
Diljot Grewal²,
Douglas Abrams²,
Eliyahu Havasov²,
Samantha Leung²,
Viktoria Bojilova²,
Richard A. Moore⁵,
Nicole Rusk ORCID: orcid.org/0000-0003-2663-6288²,
Florian Uhlitz²,
Nicholas Ceglia²,
Adam C. Weiner ORCID: orcid.org/0000-0002-5968-3606^1,2,
Elena Zaikova³,
J. Maxwell Douglas³,
Dmitriy Zamarin ORCID: orcid.org/0000-0002-0094-0161⁶,
Britta Weigelt ORCID: orcid.org/0000-0001-9927-1270⁷,
Sarah H. Kim⁸,
Arnaud Da Cruz Paula⁸,
Jorge S. Reis-Filho ORCID: orcid.org/0000-0003-2969-3173⁷,
Spencer D. Martin⁴,
Yangguang Li³,
Hong Xu³,
Teresa Ruiz de Algara³,
So Ra Lee³,
Viviana Cerda Llanos³,
David G. Huntsman^3,4,
Jessica N. McAlpine ORCID: orcid.org/0000-0001-6003-485X⁹,
IMAXT Consortium,
Sohrab P. Shah ORCID: orcid.org/0000-0001-6402-523X² &
Samuel Aparicio ORCID: orcid.org/0000-0002-0487-9599^3,4

Nature volume 612, pages 106–115 (2022)

Abstract

How cell-to-cell copy number alterations that underpin genomic instability¹ in human cancers drive genomic and phenotypic variation, and consequently the evolution of cancer², remains understudied. Here, by applying scaled single-cell whole-genome sequencing³ to wild-type, TP53-deficient and TP53-deficient;BRCA1-deficient or TP53-deficient;BRCA2-deficient mammary epithelial cells (13,818 genomes), and to primary triple-negative breast cancer (TNBC) and high-grade serous ovarian cancer (HGSC) cells (22,057 genomes), we identify three distinct ‘foreground’ mutational patterns that are defined by cell-to-cell structural variation. Cell- and clone-specific high-level amplifications, parallel haplotype-specific copy number alterations and copy number segment length variation (serrate structural variations) had measurable phenotypic and evolutionary consequences. In TNBC and HGSC, clone-specific high-level amplifications in known oncogenes were highly prevalent in tumours bearing fold-back inversions, relative to tumours with homologous recombination deficiency, and were associated with increased clone-to-clone phenotypic variation. Parallel haplotype-specific alterations were also commonly observed, leading to phylogenetic evolutionary diversity and clone-specific mono-allelic expression. Serrate variants were increased in tumours with fold-back inversions and were highly correlated with increased genomic diversity of cellular populations. Together, our findings show that cell-to-cell structural variation contributes to the origins of phenotypic and evolutionary diversity in TNBC and HGSC, and provide insight into the genomic and mutational states of individual cancer cells.

Main

The identification and characterization of endogenous mutational processes^4,5,6 have transformed our understanding of cancer genomes^{6,7,8,9,10,11}, and have led to improved prognostic and therapeutic stratification of cancers with genomic instability^12,13,14. However, mutational processes are typically inferred from bulk whole-genome sequencing (WGS), which yields aggregate signals from pools of DNA composed of millions of cells. Thus, contemporaneous post-mitotic cell-to-cell variation due to genomic instability is not detectable in bulk sequencing, and has been understudied. Single-cell WGS can readily decompose clone-specific and cellular genomic events^2,3,15, enabling the calculation of copy number alteration (CNA) and structural variation (SV) accrual rates and mutational patterns over thousands of individual cells. This allows for the separation of evolutionary vestigial events, which are present in initial clonal expansions, from contemporaneous ‘foreground’ events, which reflect ongoing mechanisms of cell-to-cell genomic diversification. For example, breakage–fusion–bridge cycles (BFBCs) and homologous recombination deficiency (HRD) are endogenous mutational processes that accrue SVs with specific patterns including tandem duplications, interstitial deletions and fold-back inversions (FBIs) that generate high-level copy number amplifications^5,10,12,14. Because HRD and BFBCs are predicted to induce cell-specific structural changes on individual maternal or paternal alleles, a haplotype-specific analysis is essential for a comprehensive account of genome-scale structural variation. Here we combine single-cell approaches with haplotype-specific analysis to reveal how different mutational processes diversify the genomes of individual cancer cells and thereby determine phenotypic variation and evolutionary selection in human tumours. 
We apply scaled single-cell WGS and haplotype-specific analysis to an in vitro cell line system with experimentally induced HRD-associated genomic instability and human breast and ovarian tumours defined by SV-associated mutational processes^{6,10,12,16,17}. Our study reveals three sources of cell-to-cell variation in cancer genomes, with implications for interpreting phenotypic diversity and evolutionary selection in cancers with genomic instability.

Induced single-cell genomic instability

We first developed a combined experimental and computational approach for studying genome-scale cell-to-cell variation in human cells, by establishing an in vitro isogenic system of breast epithelium with induced HRD and defined temporal passaging. We generated TP53 (ref. ¹⁸), TP53 and BRCA1, and TP53 and BRCA2 loss-of-function genotype lineages from diploid non-transformed 184-hTERT mammary epithelial cells¹⁹ using CRISPR–Cas9 editing (Fig. 1a, Extended Data Figs. 1 and 2a,b and Supplementary Table 1). We then subjected these cells to tagmentation whole-genome single-cell sequencing (DLP+), which enables scaled analysis of each population and inference of cell-specific rates of structural alterations³. In addition, we developed a computational method called SIGNALS, a hidden Markov model (HMM) which phases copy number events to individual homologues²⁰ in single-cell genomes to quantify haplotype-specific CNA as a source of cell-to-cell variation. SIGNALS was benchmarked on the ovarian cancer cell line OV2295, and when evaluated across different technologies and tumour types showed increased genomic and cellular resolution (0.5 Mb) compared with previously published methods^21,22, identified cell-to-cell diversity that would be unclear when relying on total copy number, and exhibited the expected distributions of phased somatic point mutation variant allele fractions (VAFs) resulting from haplotype-specific gains and losses (Extended Data Fig. 3 and Supplementary Note).

**Fig. 1: Single-cell genome properties of CRISPR–Cas9-derived isogenic genotypes of 184-hTERT mammary epithelial cell lines.**

Single-cell WGS libraries (DLP+) (median 0.04× coverage, interquartile range (IQR) 0.03) from each genotype combination were generated as follows: 184-hTERT wild type (n = 878 genomes), 184-hTERT^TP53−/− (TP53^−/−, two lines, n = 1,634), 184-hTERT^{TP53−/−,BRCA1+/−} (BRCA1^+/−, n = 377), 184-hTERT^{TP53−/−,BRCA1−/−} (BRCA1^−/−, n = 382), 184-hTERT^{TP53−/−;BRCA2−/−} (BRCA2^−/−, two lines, n = 887) and 184-hTERT^{TP53−/−;BRCA2+/−} (BRCA2^+/−, n = 472) (Fig. 1a, Extended Data Fig. 4 and Supplementary Tables 2 and 3). Per-cell copy number distributions showed a progressive increase in the rates of CNA as a function of TP53 and BRCA1 or BRCA2 loss (Fig. 1b–e). In addition, we observed increasing whole-genome polyploidy (Fig. 1f), chromosomal missegregation (Fig. 1g and Methods) and per-cell alteration counts in TP53^−/−; BRCA2^−/− and BRCA1^−/− cells, respectively, relative to wild-type cells. BRCA1^−/− genomes (median 53 events per cell) also contained higher rates of per-cell segmental alteration counts (Fig. 1h) relative to BRCA2^−/− (30 and 10), BRCA2^+/− (6), BRCA1^+/− (6), TP53^−/− (5) or wild-type (1) cell lines (all P < 10⁻¹⁰). In BRCA1^−/− cell lines, most cells had also undergone whole-genome duplication, consistent with BRCA1- and BRCA2-deficient cancers^23,24 (Fig. 1e,f). We then compared distributions of the ratio of gains to losses over cells, assuming that unbalanced ratios would indicate tolerance away from neutrality. The ratio was balanced in the wild-type cells and in BRCA2^+/− cells; however, BRCA1^−/−, BRCA1^+/−, and BRCA2^−/− cells exhibited skewed ratios towards losses relative to wild-type cells (P < 0.05) (Fig. 1i). SIGNALS analysis revealed extensive loss of heterozygosity (LOH) and haplotype-specific events across cells (Fig. 1b–e), with higher rates of segmental homozygosity in BRCA1^−/− (6.3×, P < 10⁻¹⁰) and BRCA2^−/− (13.5× and 2.5×, P < 10⁻¹⁰) relative to TP53^−/− (Fig. 1j). 
Analysis of cell-to-cell pairwise haplotype-specific copy number (HSCN) distances^20,25 found that TP53^−/− induced a 3.9-fold (SA906a) and 1.9-fold (SA906b) increase in cell-to-cell divergence, BRCA2^−/− induced a 4.5-fold (SA1055) and 2.6-fold (SA1056) increase and BRCA1^−/− induced a 13.7-fold increase (Fig. 1k; P < 10⁻¹⁰) relative to pairwise distances in wild-type cells.
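
The fold increases in cell-to-cell divergence reported above can be illustrated with a small sketch. It assumes a deliberately simplified distance (the number of genomic bins whose haplotype-specific state differs between two cells), which is not necessarily the distance used in the cited methods; the function names and the toy profiles are hypothetical.

```python
from itertools import combinations
from statistics import mean

def hscn_distance(cell_a, cell_b):
    """Count bins whose (A, B) allele copy state differs between two cells.

    Simplifying assumption: a per-bin mismatch count stands in for the
    haplotype-specific copy number distance described in the text.
    """
    if len(cell_a) != len(cell_b):
        raise ValueError("cells must be binned identically")
    return sum(1 for x, y in zip(cell_a, cell_b) if x != y)

def mean_pairwise_distance(cells):
    """Average distance over all unordered pairs of cells."""
    pairs = list(combinations(cells, 2))
    if not pairs:
        raise ValueError("need at least two cells")
    return mean(hscn_distance(a, b) for a, b in pairs)

def fold_increase(test_cells, reference_cells):
    """Divergence of a population relative to a reference population."""
    return mean_pairwise_distance(test_cells) / mean_pairwise_distance(reference_cells)

# Toy profiles: each cell is a list of (A copies, B copies) per bin.
wild_type = [
    [(1, 1), (1, 1), (1, 1), (1, 1)],
    [(1, 1), (1, 1), (1, 1), (1, 1)],
    [(1, 1), (1, 1), (1, 0), (1, 1)],
]
mutant = [
    [(1, 1), (2, 1), (1, 0), (1, 1)],
    [(1, 1), (1, 1), (0, 1), (1, 1)],
    [(2, 1), (1, 1), (1, 1), (1, 1)],
]

print(fold_increase(mutant, wild_type))
```

In the toy mutant population, the first two cells carry losses of different alleles in the third bin (1|0 and 0|1). A distance on total copy number would treat these bins as identical, whereas the haplotype-specific comparison counts them as different.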

We next tested whether haplotype-specific analysis at single-cell resolution could identify properties of mutational processes (Fig. 2). BFBC processes induce segmental amplifications adjacent to terminal losses on the same homologue, staircase-like copy number patterns and clustered FBI breakpoints^26,27,28 (Fig. 2b–d and Extended Data Fig. 5a). Using haplotype-specific alterations, we identified subclonal and variable amplitude high-level amplifications (HLAMPs, defined by 10 or more copies). HLAMPs were rare in the wild-type setting but increased with TP53 loss of function, and further increased with BRCA1 or BRCA2 loss of function (Fig. 2a). Notably, some HLAMPs were consistent with BFBCs, and affected known oncogenes including MYC (SA1188, Fig. 2b; SA906a, Fig. 2c) and PIK3CA (SA1054, Fig. 2c). An early passage of SA1188 BRCA2^+/− (1,395 cells; Extended Data Fig. 5a,b) exhibited hallmark patterns of BFBCs on chr. 3q through the presence of extant cells mapping to expected stepwise stages of progression with successive cell divisions. This included clusters of cells with reciprocal gains and losses, clusters in which the loss was extended and clusters in which a segmental amplification was adjacent to a terminal loss, including examples of cells with PIK3CA amplification (Extended Data Fig. 5c–f). Thus, an in vitro system characterized by population-scale single-cell sequencing revealed specific cell divisions that generated cell-to-cell variation in the amplitude and genomic structure of HLAMPs.

**Fig. 2: Processes that generate cell-to-cell variation in single-cell genomes.**

We then quantified the extent of parallel HSCN alterations, whereby cells with an identical total copy number at a given locus were composed of subsets of cells segregated by altered maternal or paternal alleles²⁹ (Fig. 2e,f). The rates of parallel losses and gains were increased in TP53^−/− cells relative to wild-type cells, and were further increased in BRCA1^−/− and BRCA2^−/− populations (Fig. 2g). Notably, the parallel events affected transcriptional phenotypes that resulted from the loss of either allele A or allele B; chr. 2q in SA906b provides an example (Extended Data Fig. 5g,h). Chr. 2q losses in matched single-cell RNA sequencing (scRNA-seq) data were readily identified by SIGNALS (Supplementary Note), and cells with the loss of allele A or B clustered together in gene expression space (Extended Data Fig. 5i). The nearest neighbours of monosomic 2q cells in scRNA-seq were equally enriched for losses of both A and B alleles (Extended Data Fig. 5j), suggesting that maternal and paternal allelic losses converge on a common transcriptional phenotype.

In addition to multi-allelic variation, we observed extensive cell-to-cell variation in the genomic locations of breakpoints of CNA events. The precise boundaries of CNAs from cell to cell yielded a pattern that we term 'serrate structural variation' (SSV) (Fig. 2h), which consists of a modal breakpoint across cells, with ‘tails’ that reflect either a progressive accumulation or ‘erosion’ away from the modal breakpoint. The aggregate, consensus copy number profiles over cells across the entire SSV regions (analogous to what would be seen in bulk sequencing libraries), revealed sloping copy number changes between integer values, indicative of an averaged signal with underlying variance (Fig. 2h). In some cases, these events were restricted to a single allele (for example, SA906a chr. 19), whereas in others, both alleles were implicated (for example, SA906b chr. 2). Further DLP+ sequencing from serial passaging¹⁸ of these cells (an additional 7,793 cells), indicated that SSV events distributed across serial passaging, consistent with an ongoing mutational process (Fig. 2h).
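
The sloping consensus profile described above follows directly from averaging step-shaped single-cell profiles whose breakpoints differ. The sketch below is a toy illustration under assumed values (eight bins, a transition from two to three copies, six cells); the function names are hypothetical and no real data are used.

```python
def cell_profile(n_bins, breakpoint, left_cn=2, right_cn=3):
    """Integer copy number profile with a single breakpoint at a given bin."""
    return [left_cn if i < breakpoint else right_cn for i in range(n_bins)]

def consensus_profile(cells):
    """Per-bin mean copy number across cells, analogous to a bulk signal."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]

# Modal breakpoint at bin 4, with a 'tail' of cells whose breakpoint lies further right.
breakpoints = [4, 4, 4, 4, 5, 6]
cells = [cell_profile(8, bp) for bp in breakpoints]
consensus = consensus_profile(cells)

print(consensus)
```

Every individual cell has integer copy number in every bin, yet the averaged profile takes non-integer values across the bins where breakpoints vary, which is the sloping signal that an aggregate profile would show.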

In summary, the induction of genomic instability in breast epithelium yielded progressively higher rates of genomic divergence between individual cells, measurable as rate distributions with scaled single-cell WGS and cell-specific CNAs. The resulting 'foreground' cell-to-cell variation could be further characterized as clone- and cell-specific HLAMP, parallel allele alteration and serriform patterns of copy number breakpoints in the cellular population.

Cell-level CNA variation in HGSC and TNBC

On the basis of observing foreground mutational patterns defined by cell-to-cell variation, we next asked how the foreground event types distributed as a function of HRD and non-HRD mutational processes in TNBC and HGSC cancers. To identify appropriate patient tumour samples for this comparison, we first constructed a ‘meta-cohort’ of 309 patients comprising 170 patients with HGSC and 139 patients with TNBC with bulk tumour–normal paired WGS to infer the distribution of established mutational processes (106 TNBC and 22 HGSC genomes were newly sequenced for this study and combined with published HGSC^12,30,31,32 and TNBC^{3,18,33,34,35} datasets (Extended Data Fig. 1)). We applied a previously described correlated topic model machine learning approach (MMCTM)¹⁰ and recapitulated previously described groups of tumours. Distinct structural copy number mutational features in both TNBC and HGSC were observed as follows: HRD-Dup (enriched in small tandem duplications and BRCA1 mutations), HRD-Del (enriched in deletions, BRCA2 mutations), FBI (enriched in FBIs and CCNE1 amplification) and TD (enriched in large tandem duplications, CDK12 mutations) (Extended Data Figs. 6 and 7 and Supplementary Tables 4 and 5). Prognostic association of these groups in the meta-cohort of patients with HGSC was consistent with previous findings^10,12 (P = 0.0038; Extended Data Fig. 7d), with HRD-Del at a higher median survival than HRD-Dup, followed by FBI and TD with the worst median survival (Extended Data Fig. 7e; P = 0.0022). We then selected 23 cases (16 HGSC and 7 TNBC) from the meta-cohort across a range of signature types (Extended Data Fig. 1 and Supplementary Table 4), from which we generated patient-derived xenografts (PDXs), passaged over a multi-year period using subcutaneous engraftment³³ (Extended Data Fig. 2c and Methods). 
DLP+ libraries from HRD-Dup (n = 8), TD (n = 3), and FBI (n = 12) PDXs and patient tissues yielded a total of 22,057 genomes (median 556 per series), and a median of 1.96 million reads per genome (median 0.05× coverage, IQR 0.05; Extended Data Fig. 4 and Supplementary Tables 2 and 3). Single-nucleotide variant (SNV) and SV mutational signature profiles that were inferred from DLP+-derived pseudobulk from the PDXs clustered with their bulk WGS counterparts (Extended Data Fig. 8a), indicating consistent mutational signature types without significant distortion of the signals from the original source tumour. In addition, SIGNALS analysis from DLP+ showed that the proportion of the genome identified as homozygous was highly correlated with bulk sequencing (R = 0.9, P < 0.001; Extended Data Fig. 8b), and that VAFs of somatic mutations were distributed as expected (Extended Data Fig. 8c), indicating accurate single-cell HSCN inference.

Cellular copy number profiles revealed extensive subclonal heterogeneity and cell-to-cell variation in both HRD and FBI tumours (Fig. 3a,b and Extended Data Fig. 8d). However, FBI cells exhibited higher overall rates of polyploidy (Fig. 3c; P = 0.02) and chromosomal missegregation relative to HRD-Dup (Fig. 3d; P = 0.0015). In addition, FBI tumours accrued gains at a significantly higher rate than did HRD-Dup tumours, with more skewing of the gain/loss ratio (4.9 versus 2.1, P = 0.04; Fig. 3e and Extended Data Fig. 8e,f). This was more pronounced when considering the baseline ploidy of the tumours (P = 0.0012; Extended Data Fig. 8g). Indeed, higher rates of polyploidy and segmental copy number gains may provide a greater opportunity for—and greater tolerance of—the large interstitial deletions that are found in some FBI cancers (Extended Data Fig. 7b). Pairwise HSCN distances between cells, reflecting cell-population diversity, yielded highly variable distributions across samples, ranging from a median value of 2 for the diploid TD case SA1047 to more than 123 for the pentaploid FBI case SA604 (Extended Data Fig. 8h). FBI tumours were more diverse than HRD-Dup or TD samples, with average HSCN distances of 71 (FBI), 46 (HRD-Dup) and 26 (TD) (P = 0.047 FBI versus HRD-Dup, P = 0.031 FBI versus TD; Extended Data Fig. 8i). Thus, considering whole-genome duplication, overall rates of segmental aneuploidy and the gain/loss ratio, FBI and HRD-Dup tumours showed markedly different patterns of CNA accrual at the single-cell level.

**Fig. 3: Single-cell genome properties of PDX models and patient tissues.**

HLAMP amplitude varies within FBI tumours

Next, we determined whether the CNA patterns that gave rise to single-cell variation in the cell lines could also explain cell-to-cell variation in the amplitude of HLAMPs in the tumours. We found extensive heterogeneity in the amplitude of HLAMPs across clonal populations within tumours that would otherwise be obscured in bulk sequencing. By first focusing on a specific example, we assessed the phenotypic effect of a clone-specific HLAMP in the KRAS locus, present with average copy number 16.1 in a clone with 55 cells relative to a sibling clone composed of 230 cells that lacked the amplification (Fig. 4a). KRAS was differentially expressed between cell clusters from matched scRNA-seq data (maximum log-transformed fold change (logFC) = 0.346, q < 0.05; Fig. 4b), and immunohistochemistry for KRAS both in tissue from the primary patient and in PDX tissue corroborated a punctate pattern of expression across spatially separated regions within tumour sections (Fig. 4c). Thus, in a specific example, clone-specific HLAMP of an oncogene in a minor clone, otherwise not detectable with bulk methods, revealed co-associated clone-specific phenotypic variation. Across the dataset as a whole, FBI tumours had a 1.9-fold higher median HLAMP copy variance than did the other tumours (P = 0.00096; Fig. 4d,e), consistent with continual plasticity of HLAMP amplitude as a general property of FBI. Most events were less than 10 Mb in width (56%; Fig. 4f) and exhibited a distribution of maximum observed copy number with median 16.1 and IQR 8.7 (Fig. 4g). Furthermore, we noticed that amplitude variation in HLAMPs affected numerous other known oncogenes, including ERBB2 (DG1197), KIT (DG1197), KRAS (SA1049 and SA604), MYC (SA1184 and SA1051), CCNE1 (DG1134, SA1162 and SA604) and FGFR1 (SA1049 and SA535) (Fig. 4h). 
Notably, oncogenes with a variable copy number between cells and clones also exhibited greater variability in gene expression than did other genes, as measured by matched scRNA-seq (P = 0.012; Fig. 4i).
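
As a rough illustration of clone-level variation in HLAMP amplitude, the sketch below flags a locus as an HLAMP using the threshold stated earlier (10 or more copies) and summarises the spread of mean copy number across clones. The exact variance statistic used in the study is not specified in this section, so the population variance of per-clone means is an assumption, and the function names and toy copy numbers are hypothetical.

```python
from statistics import mean, pvariance

HLAMP_MIN_COPIES = 10  # threshold given in the text: 10 or more copies

def is_hlamp(copy_number):
    """True if a copy number meets the high-level amplification threshold."""
    return copy_number >= HLAMP_MIN_COPIES

def clone_mean_copies(cell_copies, cell_clones):
    """Mean copy number at one locus for each clone."""
    by_clone = {}
    for copies, clone in zip(cell_copies, cell_clones):
        by_clone.setdefault(clone, []).append(copies)
    return {clone: mean(values) for clone, values in by_clone.items()}

def hlamp_copy_variance(cell_copies, cell_clones):
    """Variance of per-clone mean copy number, or None if no clone has an HLAMP.

    Assumption: population variance across clone means is used as a simple
    stand-in for the 'HLAMP copy variance' mentioned in the text.
    """
    means = clone_mean_copies(cell_copies, cell_clones)
    if not any(is_hlamp(m) for m in means.values()):
        return None
    return pvariance(list(means.values()))

# Toy locus: clone A carries the amplification, clone B does not.
copies = [16, 17, 15, 2, 2, 2, 2]
clones = ["A", "A", "A", "B", "B", "B", "B"]

print(clone_mean_copies(copies, clones))
print(hlamp_copy_variance(copies, clones))
```

A locus amplified to a similar level in every clone would give a variance near zero under this summary, whereas a clone-specific amplification such as the toy example gives a large value.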

**Fig. 4: HLAMP copy number variation.**

To determine the structural processes that lead to these events, we examined the rearrangement properties of variable HLAMPs and found that they were enriched for FBIs, consistent with BFBCs being a central mechanism of variable HLAMPs in FBI tumours. We also found clusters enriched for simple tandem duplications driving variable HLAMPs in HRD-Dup tumours (Methods and Extended Data Fig. 9a), providing further evidence for the different aetiological origin of these events in FBI and HRD-Dup tumours. This analysis also revealed that in many cases, clone-specific HLAMPs were part of complex genomic structures involving multiple chromosomes. For example, variable amplitude around the CTNNB1 locus in SA1096 coincided with a translocation between chr. 3 and chr. 6 (Fig. 4j). Long-read single-molecule nanopore sequencing³⁶ of the same samples validated the presence of this rearrangement (Extended Data Fig. 9d). Other examples of complex inter-chromosomal HLAMPs with orthogonal long-read sequencing included fixed non-variable amplification of CCNE1 in SA530 (chr. 4 and chr. 19), variable MYC amplification in SA1184 (chr. 3 and chr. 8) and amplification on 5q in SA1184 (Extended Data Fig. 9b,c and Supplementary Note).

Thus, cell-to-cell variability in HLAMP—which is not observable with bulk sequencing—is a pervasive mutational pattern that is most pronounced in FBI tumours, and consists of clone-specific complex rearrangements that influence phenotype through variable oncogene expression.

Haplotype-specific parallel evolution

We next investigated the extent of haplotype-specific parallel copy number evolution in tumours (Fig. 5). Phylogenetic tree analysis using breakpoints inferred from total copy number across the whole genome^18,37 (see Methods) revealed that in some cases, alleles segregated into distinct clades on the tree; for example, gains of 1q in SA1049 (Fig. 5a) and losses at the terminal end of chr. 10 in SA1053 (Fig. 5b). In other cases, gains and losses of different alleles were sporadic and were distributed more randomly across the tree, such as chr. 8 in SA1093 (Fig. 5c). Parallel copy number events were validated using VAFs of mutations found in these regions, in which—as expected—the VAF distribution inverted between two expected values, depending on allelic composition (Fig. 5d). We contend that in bulk sequencing, represented here by pseudobulk with mixtures (see Methods), the computed VAF no longer reflected the underlying copy number state in a heterogeneous mix of cells (Fig. 5e). We therefore suggest that accurate cancer cell fraction (CCF) inference, which depends on accurate VAF values, may be challenging in tumours with parallel copy number evolution.
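
The VAF inversion and its loss in pseudobulk can be made concrete with a small worked example. The sketch assumes pure tumour cells, a mutation that sits on allele A and is present on every copy of A, and two clades with the same total copy number (three) but different alleles gained; the function names and the 50:50 mixture are illustrative assumptions.

```python
from fractions import Fraction

def expected_vaf(mutated_copies, total_copies):
    """Expected VAF in a pure population: mutated copies over total copies."""
    return Fraction(mutated_copies, total_copies)

def pseudobulk_vaf(populations):
    """Expected VAF in a mixture.

    populations: list of (cell_fraction, mutated_copies, total_copies).
    """
    mutated = sum(f * m for f, m, _ in populations)
    total = sum(f * t for f, _, t in populations)
    return mutated / total

# Clade 1 gained allele A (2|1): the mutation is on two of three copies.
vaf_gain_a = expected_vaf(2, 3)
# Clade 2 gained allele B (1|2): the mutation is on one of three copies.
vaf_gain_b = expected_vaf(1, 3)
# Equal mixture of the two clades, as in a pseudobulk sample.
mixed = pseudobulk_vaf([(Fraction(1, 2), 2, 3), (Fraction(1, 2), 1, 3)])

print(vaf_gain_a, vaf_gain_b, mixed)
```

Within each clade the VAF takes one of the two values expected for a total copy number of three, but the mixture yields a value that matches neither, even though the mutation is present in every cell. An analysis that sees only total copy number three would have no single multiplicity that explains the mixed VAF.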

**Fig. 5: Haplotype-specific parallel copy number evolution.**

We confirmed that parallel CNAs influence transcription with matched scRNA-seq. Inactivation of TP53 is invariably mediated by LOH of chr. 17 in these cancer types, and chr. 17 was indeed mono-allelically expressed across all tumours—in contrast to the hTERT wild-type cell line, which was used here as a control population (Fig. 5i). In addition, genes located at the terminal end of chr. 10 in SA1053 (Fig. 5b), were mono-allelically expressed in 100% of cells, with one cluster of cells expressing the B allele and another group of cells expressing the A allele (Fig. 5f,g). Across all data with matched scRNA-seq, mean BAF values per segment per sample measured in single-cell DNA sequencing (scDNA-seq) were strongly correlated (R = 0.91, P < 10⁻⁵) with those measured in scRNA-seq (Fig. 5h), consistent with allele bias at the DNA level translating to consequent allele bias in expression.

Notably, all tumours exhibited parallel CNA evolution. We classified genomic segments as parallel CNAs if more than 1% of cells had gain or loss of both the A and B alleles and assigned the clonality using total copy number as follows: clonal (CCF > 80%, as in Fig. 5a,b), subclonal (20% < CCF ≤ 80%) or rare (CCF ≤ 20%, as in Fig. 5c). Every tumour had at least one parallel CNA event, with most containing parallel CNAs at different clonalities (Fig. 5j). Across all samples, an average of 6% of clonal segments, 15% of subclonal segments and 7% of rare segments contained parallel CNAs, with a trend for higher event rates in FBI relative to HRD-Dup (Extended Data Fig. 8j,k; P = 0.02 for subclonal, P > 0.05 for clonal and rare). Motivated by the sporadic pattern of losses of both alleles observed in Fig. 5c, we then tested whether parallel CNAs due to losses were more common on a tetraploid versus a diploid background, using ancestral state reconstruction to estimate the event rate across the phylogenetic tree (see Methods). We found that on a diploid (1|1) background, parallel gains were more common than losses, but that on a tetraploid (2|2) background, parallel losses became more common than gains. This was true for whole chromosome, chromosome arm and segmental aneuploidies (Fig. 5j). The number of parallel CNAs was significantly correlated with both copy number and phylogenetic distances computed using total copy number (Fig. 5k,l). Thus, parallel copy number evolution was a pervasive feature, affecting the interpretation of somatic mutations, haplotype-specific expression and overall levels of genomic diversity in TNBC and HGSC tumours.
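
The classification rule described above can be sketched in a few lines. This is one possible reading of the rule, not the study's implementation: a segment is called parallel when more than 1% of cells gain allele A and more than 1% gain allele B (or likewise for losses), and the CCF is taken as the fraction of cells whose total copy number differs from an assumed baseline. The function names, the baseline argument and the toy segment are hypothetical.

```python
def clonality(ccf):
    """Map a cancer cell fraction to the three classes named in the text."""
    if ccf > 0.8:
        return "clonal"
    if ccf > 0.2:
        return "subclonal"
    return "rare"

def classify_segment(states, baseline=(1, 1), min_fraction=0.01):
    """Classify one segment from per-cell (A copies, B copies) states.

    Assumptions: gains and losses are measured against `baseline`, and the
    CCF is the fraction of cells whose total copy number differs from it.
    """
    n = len(states)
    if n == 0:
        raise ValueError("no cells")
    base_a, base_b = baseline
    gain_a = sum(a > base_a for a, _ in states) / n
    gain_b = sum(b > base_b for _, b in states) / n
    loss_a = sum(a < base_a for a, _ in states) / n
    loss_b = sum(b < base_b for _, b in states) / n
    parallel = (gain_a > min_fraction and gain_b > min_fraction) or (
        loss_a > min_fraction and loss_b > min_fraction
    )
    ccf = sum(a + b != base_a + base_b for a, b in states) / n
    return {"parallel": parallel, "ccf": ccf, "clonality": clonality(ccf)}

# Toy segment: 60 cells gained allele A, 30 gained allele B, 10 are unaltered.
segment = [(2, 1)] * 60 + [(1, 2)] * 30 + [(1, 1)] * 10
result = classify_segment(segment)

print(result)
```

The toy segment would look like a single clonal gain at the level of total copy number, while the allele-level counts show that it is made up of two groups of cells that gained different alleles.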

Increased CNA serriformity in FBI tumours

SSVs first identified in cell lines from single-cell WGS represent a structural mutation type that is not identifiable using bulk WGS, as the cell-to-cell variation in copy number breakpoints is obscured. We analysed the tumour DLP+ data for the presence of SSVs (Fig. 6a; heat maps 2–5 from left). SSVs, occurring at a megabase length scale, were distinct from small cell-to-cell variations in copy number breakpoint localization, which may occur owing to fluctuations in sequencing coverage rather than true changes in copy number (for example, Fig. 6a; invariant copy number heat map boundaries, heat map 1 from left). SSVs were also visible in single cells comprising the serration pattern (Fig. 6b). Additional confirmation of the SSV scale was obtained from allele phasing, in which the concomitant loss of heterozygosity was observed (Fig. 6b, bottom track). We computed serration scores per breakpoint event to identify the relative degree of variation in breakpoints across cells in each cancer. Scores were calculated as the fraction of event-containing cells with rare (less than 5% of cells) event breakpoint positions (see Methods) and with breakpoint regions that met size and prevalence criteria (≥20 Mbp, that is, 40 genomic bins; ≥100 cells with breakpoint event) to permit the detection of positional and cell-to-cell variation. Variable cell-to-cell breakpoints were common, with 6.6% of regions having serration scores of 0.15 or higher (that is, 15% of cells or more have a rare event breakpoint position) across all cases, with FBI cases having the highest (12.1%), HRD-Dup cases the lowest (1.2%), and TD cases intermediate (10.5%) rates. Comparison of distributions of serration using a mixed effects linear model accounting for individual variation indicated that FBI cases had higher degrees of breakpoint variance in breakpoint regions (P = 0.0081; Fig. 6c,d). 
We further observed that serration scores increased in cases with more polyploid cells (R = 0.68, P = 0.001; Fig. 6e), and as a function of cell-to-cell HSCN distance (R = 0.62, P = 0.0033; Fig. 6f), implicating SSVs as an additional genome-diversifying mechanism in TNBC and HGSC cancers.
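
The serration score described above can be approximated as follows. The sketch takes one breakpoint position (as a bin index) per event-containing cell and returns the fraction of those cells whose breakpoint position is rare, using the thresholds quoted in the text (less than 5% of cells, regions of at least 40 bins, at least 100 cells). How breakpoint events are matched across cells is described in the Methods and is not reproduced here; the function name and toy data are hypothetical.

```python
from collections import Counter

def serration_score(breakpoint_bins, region_bins, rare_fraction=0.05,
                    min_region_bins=40, min_cells=100):
    """Fraction of event-containing cells with a rare breakpoint position.

    breakpoint_bins: one breakpoint bin index per event-containing cell.
    Returns None when the region fails the size or prevalence criteria.
    """
    n_cells = len(breakpoint_bins)
    if region_bins < min_region_bins or n_cells < min_cells:
        return None
    counts = Counter(breakpoint_bins)
    # A position is rare when fewer than `rare_fraction` of cells share it.
    rare_cells = sum(c for c in counts.values() if c / n_cells < rare_fraction)
    return rare_cells / n_cells

# Toy event: 164 cells share the modal breakpoint, 36 cells are spread
# over four neighbouring positions (9 cells each, that is, 4.5% per position).
toy_breakpoints = [50] * 164 + [51] * 9 + [52] * 9 + [53] * 9 + [54] * 9
score = serration_score(toy_breakpoints, region_bins=60)

print(score)
```

Under these assumptions the toy event scores above the 0.15 cut-off mentioned in the text, whereas an event in which every cell shares the modal breakpoint scores zero.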

**Fig. 6: Breakpoint serriform variability.**

Discussion

Our findings show that cell-to-cell variation at the level of structural and copy number alteration is a pervasive 'foreground' feature of TNBC and HGSC cancers that is exhibited against distinct endogenous mutational processes of genomic instability. Because CNAs can influence the expression levels of hundreds of genes, each of the foreground mutational patterns provides extensive and distinct genomic diversity upon which selection may act. Oncogenic HLAMPs are understood to be key drivers of tumour progression and are prognostic in HGSC when co-localized with FBIs¹². Here we reveal an additional layer of complexity, finding that the amplitude of HLAMPs can vary substantially between cells. Although this has been recognized as a defining feature of extrachromosomal DNA amplifications³⁸, we propose that it is also a general property of other classes of HLAMPs, such as those mediated by BFBCs and by complex inter-chromosomal rearrangement processes³⁹. This has important implications for therapeutic strategies to target frequently altered oncogenes, as cancer types of high genomic instability may be predisposed to containing treatment-resistant clones. Multi-allelic variation within the same locus is also a highly prevalent feature of breast and ovarian cancers, consistent with some previous observations in other cancers^21,22,30. Notably, events that appeared clonal at the total copy number level were often composed of distinct clades with different alleles gained or lost; this might reflect evolutionary convergence for favourable karyotypes at the total copy number level, as shown by transcriptional phenotypic convergence. Evolutionary time series modelling¹⁸ is likely to further help to resolve patterns of phenotypic selection from parallel CNAs. 
We also highlight that sporadic gains and losses happen on both alleles, with rates increased on a whole-genome-doubled background relative to diploid, potentially reflecting increased fitness tolerance owing to genomic redundancy. Finally, megabase-scale copy length variation at a single-cell level (SSV) has been observed in vitro with cell-selected single-cell sequencing¹. Here we show with single-cell genome sequencing at the cell-population level that SSVs are in fact prevalent in TNBC and HGSC and distribute across clones within tumours. Although the underlying mechanisms that generate SSVs are unknown, they represent a new class of variation that may contribute to the structural copy evolution of tumours enriched in the FBI background and in polyploid genome states. We observed each of the foreground mutational patterns in all mutational processes, but FBI-type tumours showed a significant enrichment in all three foreground patterns. As such, FBI may comprise a distinct phenotypic class in which foreground mutational patterns generate diversity that could underlie poor prognostic significance. We conclude that scaled single-cell sequencing is a useful means to reveal hidden cellular states of structural copy number diversity in genomically unstable tumours. The data that we present here show that foreground mutational patterns are key determinants of genomically encoded phenotypic diversity and consequent ‘evolvability’ in cancer.

Methods

Generation and culture of human mammary epithelial cell lines

The wild-type human mammary epithelial cell line 184-hTERT L9 (SA039) and isogenic 184-hTERT TP53 knockout (SA906) cell line, generated from 184-hTERT L9, were cultured as previously described^18,19,40 in Mammary epithelial cell growth basal medium (MEBM) (Lonza) supplemented with the SingleQuots kit (Lonza), 5 μg ml⁻¹ transferrin (Sigma-Aldrich) and 10 μM isoproterenol (Sigma-Aldrich). Additional truncation mutations (Supplementary Table 4) of BRCA1 (SA1054: c.[427_441+36delGAAAATCCTTCCTTGGTAAAACCATTTGTTTTCTTC];[437_441+8delCCTTGGTAAAACC]; SA1292 c.[71_75delGTCCC];[=]) and BRCA2 (SA1056: c.[6997delG];[6997_6998delGT]; SA1188: c.[6997_6999delGT];[=]; SA1055: c.[3507_3522delinsGA];[3509_3520delinT]) (hg19) were introduced by CRISPR–Cas9 nuclease (pX330 hSpCas9) with an RFP reporter gene using Mirus TransIT LT1 transfection (Mirus Bio). Clonal populations were generated by flow sorting and propagating single RFP-positive cells. Mutations were verified by TOPO cloning and Sanger sequencing of both alleles for genotypes, protein expression by western blotting and absence of off-target effects by sequencing of the top hits. SNV positions from Sanger sequencing data were annotated with information from GENCODE v.19 (ref. ⁴¹). Variant sequence and position were used to annotate variant calls with records from ClinVar 20200206_data_release⁴² and COSMIC v.91 (ref. ⁴³). Although multiple BRCA2 homozygous loss-of-function alleles could be derived from 184-hTERT^{p53−/−;BRCA2+/−} intermediates, only a single homozygous BRCA1 allele was retrieved from the 119 clones of 184-hTERT^{p53−/−,BRCA1+/−} that were screened, emphasizing that even with a p53 deletion, full loss of BRCA1 is initially negatively selected. OV2295 cells⁴⁴ were maintained in a 1:1 mix of Media 199 (Sigma-Aldrich) and MCDB 105 (Sigma-Aldrich) supplemented with 10% fetal bovine serum (FBS) under normoxic conditions.
Cell lines were authenticated by short tandem repeat (STR) profiling and tested for mycoplasma by LabCorp.

Immunoblotting

184-hTERT cells were lysed directly in 1× Laemmli buffer supplemented with 7.5% β-mercaptoethanol and proteins were denatured at 95 °C for 15 min. Protein from 250,000 cells was resolved on a 4–15% acrylamide gel (Biorad) or 3–8% Tris-acetate acrylamide gel (Novex) and transferred to a nitrocellulose membrane with Towbin transfer buffer overnight at 30 V at 4 °C. Blots were blocked with 5% milk in TBST for 1 h and incubated overnight at 4 °C with mouse anti-p53 (Santa Cruz SC-126, 1:500 in 5% bovine serum albumin (BSA)), mouse anti-BRCA1 (Santa Cruz SC-6954, 1:200 in 5% BSA), mouse anti-BRCA2 (Millipore OP95, 1:200 in 5% BSA) or goat anti-GAPDH (SC-48166, 1:500 in 5% BSA). Blots were washed five times for 5 min in TBST and incubated with anti-goat HRP-conjugated secondary antibody (Abcam ab6721, 1:5,000 in 5% BSA) for 1 h at room temperature, then washed five times for 5 min and the signal was imaged using Immobilon Western Chemiluminescent HRP Substrate (MilliporeSigma, WBKL20500) and the ImageQuant LAS 4000 (GE Healthcare) using the ImageQuant TL software.

Verification of mutations in the 184-hTERT cell line

Genomic DNA was extracted from 184-hTERT cell lines and regions of interest of BRCA1 or BRCA2 were amplified by PCR. Amplicons were inserted into a pCR-TOPO vector and transformed into Escherichia coli using the TOPO TA cloning kit (Thermo Fisher Scientific). Colonies were selected, DNA purified by Purelink Quick Plasmid Miniprep kit (Thermo Fisher Scientific) and sequenced by Sanger sequencing to assess CRISPR-induced mutations.

Acquisition of samples from patients and patient consent

Samples were acquired with informed consent, according to procedures approved by the Ethics Committees at the University of British Columbia. Patients with breast cancer undergoing diagnostic biopsy or surgery were recruited and samples were collected under protocols H06-00289 (BCCA-TTR-BREAST), H11-01887 (Neoadjuvant Xenograft Study), H18-01113 (Large-scale genomic analysis of human tumours) or H20-00170 (Linking clonal genomes to tumour evolution and therapeutics). HGSC samples were obtained from women undergoing debulking surgery under protocols H18-01652 and H18-01090. Banked HGSC and TNBC specimens were obtained at the Memorial Sloan Kettering Cancer Center following Institutional Review Board (IRB) approval and patient informed consent (protocols 15–200 (HGSC) and 18–376 (TNBC)). HGSC and TNBC clinical assignments were performed according to American Society of Clinical Oncology guidelines for ER, PR and HER2 positivity.

Xenografting

Fragments of tumours from patients were chopped finely with scalpels and mechanically disaggregated for one minute using a Stomacher 80 Biomaster (Seward Limited) in 1 ml cold DMEM/F-12 with glucose, l-glutamine and HEPES (Lonza 12–719F). Two hundred microlitres of medium containing cells or organoids from the resulting suspension was divided equally for transplantation into four mice. The remaining tissue fragments were cryopreserved viably in DMEM/F-12 supplemented with 47% FBS and 6% dimethyl sulfoxide (DMSO). Tumours were transplanted in mice as previously described (Eirew) in accordance with SOP BCCRC 009. Female NOD/SCID/IL2rγ^−/− (NSG) and NOD/Rag1^−/−Il2rγ^−/− (NRG) mice were bred and housed at the Animal Resource Centre (ARC) at the British Columbia Cancer Research Centre. For subcutaneous transplants, mechanically disaggregated cells and clumps of cells were resuspended in 150–200 µl of a 1:1 v/v mixture of cold DMEM/F-12:Matrigel (BD Biosciences). Female mice (8–12 weeks old) were anaesthetized with isoflurane and the mechanically disaggregated cell clump suspension was transplanted under the skin on the left flank using a 1-ml syringe and 21-gauge needle. Mice were housed at an 18–25 °C temperature range and 20–70% humidity range, with a 12-h daylight cycle (on at 06:00; off at 18:00). All animal experimental work was approved by the animal care committee (ACC) and animal welfare and ethical review committee at the University of British Columbia (UBC) under protocol A19-0298.

Tissue processing

Xenograft-bearing mice were euthanized when the size of the tumours approached 1,000 mm³ in volume, according to the limits of the experimental protocol. The tumour material was excised aseptically and processed as described for primary tumour. A section of tumour was fixed in 10% buffered formalin for 24 h, dehydrated in 70% ethanol and paraffin-embedded before duplicate 1-mm cores were used to generate tissue microarrays for staining and pathological review. Remaining tumour was finely chopped and gently paddle-blended, and released single cells and fragments were viably frozen in DMEM supplemented with 47% FBS and 6% DMSO.

Histopathology of PDX tumours

Deparaffinized 4-µm sections of tissue microarrays were stained with haematoxylin and eosin or KRAS (Lifespan Bioscience, LS-B4683, 1:50), performed using the Ventana Discovery XT platform and the UltraMap DAB detection kit. HGSC pathology was confirmed by an anatomical pathology resident at University of British Columbia, under the supervision of a certified staff pathologist.

WGS

Genomic DNA was extracted from frozen tissue fragments using the DNeasy Blood and Tissue kit (Qiagen). Whole-genome libraries were constructed for 309 tumour–normal pairs and sequenced on the Illumina HiSeqX according to Illumina protocols, generating 100-bp paired-end reads at an estimated coverage of between 40× (normal) and 80× (tumour). Sequenced reads were aligned to the human reference GRCh37 (hg19) using BWA-MEM.

Long-read sequencing

High-molecular weight (HMW) DNA was extracted from fresh and/or frozen tissue fragments using the MagAttract HMW DNA Kit (Qiagen) and size-selected using Blue Pippin for single long-molecule sequencing on the PromethION (Oxford Nanopore Technologies).

Generation of single-cell suspensions and nuclei for scDNA-seq

Viably frozen aliquots of patient tissues and PDX tumours were thawed and either homogenized and lysed using Nuclei EZ Buffer (Sigma) or enzymatically dissociated using a collagenase/hyaluronidase 1:10 (10×) enzyme mix (STEMCELL Technologies), as described previously^3,18. Cells and nuclei were stained with CellTrace CFSE (Life Technologies) and LIVE/DEAD Fixable Red Dead Cell Stain (Thermo Fisher Scientific) in a 0.04% BSA/PBS (Miltenyi Biotec 130-091-376) incubated at 37 °C for 20 min. Cells were pelleted and resuspended in 0.04% BSA/PBS. This single-cell suspension was loaded into a contactless piezoelectric dispenser (Cellenone or sciFLEXARRAYER S3, Scienion) and spotted into open nanowell arrays (SmartChip, TakaraBio) preprinted with unique dual index sequencing primer pairs. Occupancy and cell state were confirmed by fluorescent imaging and wells were selected for single-cell copy number profiling using the DLP+ method³. In brief, cell dispensing was followed by enzymatic and heat lysis. After cell lysis, tagmentation mix (14.335 nl TD buffer, 3.5 nl TDE1 and 0.165 nl 10% Tween-20) in PCR water was dispensed into each well followed by incubation and neutralization. For BRCA1^+/− cells, the tagmentation mix consisted of 10 nl TB1 buffer and 10 nl BT1 enzyme without Tween-20 in PCR water. Final recovery and purification of single-cell libraries was done after eight cycles of PCR. Pooled single-cell libraries were analysed using the Agilent Bioanalyzer 2100 HS kit. Libraries were sequenced at the UBC Biomedical Research Centre on the Illumina NextSeq 550 (mid- or high-output, paired-end 150-bp reads), or at the Genome Sciences Centre on the Illumina HiSeq2500 (paired-end 125-bp reads) and Illumina HiSeqX (paired-end 150-bp reads). The data were then processed through a quantification and statistical analysis pipeline³.

Generation of 10X scRNA-seq data

The 184-hTERT cells were pelleted and gently resuspended in 200 µl PBS followed by 800 µl 100% methanol and incubation at −20 °C for 30 min to fix, dehydrate and shrink cells. PDX tumour fragments were dissociated into single cells using collagenase/hyaluronidase at 37 °C for 2 h for TNBC tumours or with cold active Bacillus licheniformis (Creative Enzymes NATE0633) in PBS supplemented with 5 mM CaCl₂ and 125 U ml⁻¹ DNase for HGSC tumours, as described previously⁴⁵ with additional mechanical dissociation using a gentleMACS dissociator (Miltenyi Biotec). Cells were then pelleted and resuspended in 0.04% BSA/PBS and immediately loaded onto a 10X Genomics Chromium single-cell controller targeting 3,000 cells for recovery. Libraries were prepared according to the 10X Genomics Single Cell 3′ Reagent kit standard protocol. Libraries were then sequenced on an Illumina NextSeq 500/550 with 42-bp paired-end reads, or a HiSeq2500 v4 with 125-bp paired-end reads. 10X Genomics Cell Ranger 3.0.2 was used to perform demultiplexing, counting and alignment to GRCh38 and mm10.

Processing of bulk whole-genome data

SNV and SV calls for 121 HGSC samples were acquired from a previous study¹². For new samples, reads were aligned to the hg19 reference genome using BWA-MEM. Processing proceeded as per the aforementioned study¹² to maintain consistency.

SNVs were called with MutationSeq⁴⁶ (probability threshold = 0.9) and Strelka⁴⁷. The intersection of calls from these methods was retained; however, SNVs falling in blacklist regions were removed. The blacklist regions include the UCSC Genome Browser Duke and DAC blacklists, and those in the CRG Alignability 36mer track that had more than two mismatched nucleotides. SNVs were then annotated with OncoKB⁴⁸ for variant impact.

SVs were called using deStruct⁴⁹ and LUMPY⁵⁰, and breakpoints called by both methods were retained. We then filtered events with the following criteria: any breakpoints falling in the blacklists described above; ≤30-bp inter-breakpoint distance; <1,000-bp deletion; any breakpoints with fewer than 5 supporting reads in the tumour sample or any read support in the matched normal sample.
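
As an illustration only, the SV filtering criteria above can be sketched in Python. The record layout (`SV`), the blacklist representation and the function names are hypothetical and are not taken from the pipeline used in the study. The sketch also assumes that an event meeting any one of the listed criteria is removed, and that the inter-breakpoint distance is defined only for events whose two ends lie on the same chromosome.

```python
from dataclasses import dataclass


@dataclass
class SV:
    """Hypothetical minimal SV record (field names are illustrative)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    sv_type: str        # e.g. "deletion", "duplication", "inversion", "translocation"
    tumour_reads: int   # supporting reads in the tumour sample
    normal_reads: int   # supporting reads in the matched normal sample


def in_blacklist(chrom, pos, blacklist):
    """blacklist: dict mapping chromosome -> list of (start, end) half-open intervals."""
    return any(start <= pos < end for start, end in blacklist.get(chrom, []))


def keep_sv(sv, blacklist):
    """Return True if the event survives all of the filters described in the text."""
    # Either breakpoint falling in a blacklist region removes the event.
    if in_blacklist(sv.chrom1, sv.pos1, blacklist) or in_blacklist(sv.chrom2, sv.pos2, blacklist):
        return False
    if sv.chrom1 == sv.chrom2:
        span = abs(sv.pos2 - sv.pos1)
        if span <= 30:                                   # <=30-bp inter-breakpoint distance
            return False
        if sv.sv_type == "deletion" and span < 1000:     # <1,000-bp deletion
            return False
    # Fewer than 5 tumour reads, or any read support in the matched normal.
    if sv.tumour_reads < 5 or sv.normal_reads > 0:
        return False
    return True
```

Expressing the filter as a single predicate keeps each criterion visible on its own line, so a threshold can be changed without touching the others.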

Gene mutation enrichment analysis was performed using the hypergeometric test for SNVs, amplifications and deep deletions separately, comparing each signature stratum to all other samples.
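
For readers unfamiliar with the hypergeometric test in this setting, a minimal Python sketch follows. It assumes a one-sided (enrichment) test, which the text does not state explicitly, and the function and argument names are illustrative rather than those of the analysis code.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_mutated, n_stratum, n_stratum_mutated):
    """Upper-tail hypergeometric P value.

    n_total:            all samples in the cohort
    n_mutated:          samples carrying the alteration in a given gene
    n_stratum:          samples in the signature stratum being tested
    n_stratum_mutated:  samples in the stratum that carry the alteration

    Returns P(X >= n_stratum_mutated) when n_stratum samples are drawn
    without replacement from the cohort.
    """
    upper = min(n_mutated, n_stratum)
    denom = comb(n_total, n_stratum)
    numer = sum(
        comb(n_mutated, k) * comb(n_total - n_mutated, n_stratum - k)
        for k in range(n_stratum_mutated, upper + 1)
    )
    return numer / denom
```

In this framing each signature stratum is compared with all remaining samples, and the test would be repeated separately for SNVs, amplifications and deep deletions.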

Nanopore data analysis

For Nanopore sequence data, base calling and read alignment were performed using Guppy v.3 and Minimap2, respectively^51,52. Reads that were likely to be derived from mouse were filtered by first aligning to a concatenated hg19 and mm10 reference, removing reads with alignments to mm10 and re-aligning the remaining reads to hg19. Signal artefact regions as well as alignments with mapping quality of less than 60 were excluded from the final alignments. Alignments were then phased using the PEPPER-Margin-DeepVariant pipeline, after which WhatsHap was used to tag reads in the filtered alignments using phasing information^53,54. SV calling was performed using Sniffles (v.1.0.12) and cuteSV (v.1.0.11) with 5 read support, and subsequently merged using SURVIVOR for a union set of predicted variants^55,56. Alignments and variants were visualized using IGV, Ribbon and the karyoploteR R package^57,58,59.

HGSC and TNBC meta-cohort signature analysis

Signature analysis was performed according to a previous study¹⁰. The MMCTM model was run on the sample SNV and SV count matrices. The number of signatures to estimate in the HGSC and TNBC integrated cohort was chosen by running the above fitting procedure for k = 2–16 for both SNV and SV signatures with the number of restarts set to 500, in which k is the number of signatures. We performed this step on approximately half the mutations in each sample, then computed the average per-mutation log likelihood on the other held-out half of the mutations. The elbow curve method on log-likelihood values was used to select the final number of signatures to fit to the entire dataset.

To estimate MMCTM parameters on the full dataset, α hyper-parameters were set to 0.1. The model was initially fit to the data 1,500 times. Each restart was run for a maximum of 1,000 iterations or until the relative difference in predictive log likelihood on the training data was less than 10⁻⁴ between iterations. The restarts with the best predictive log likelihoods for SNVs and SVs were selected as seeds for the final fitting step. The model was again fit to the data 1,500 times. The model parameters for each restart were set to the parameters of the optimal models from the previous step described above, then run for a maximum of 1,000 iterations or until the relative difference in predictive log likelihood on the training data was less than 10⁻⁵ between iterations. The restart with the best mean rank of the SNV and SV predictive log likelihoods from this round was selected as the final model.

MMCTM estimated SNV signatures were matched to COSMIC signatures by solving the linear sum assignment problem for cosine distances between the MMCTM and COSMIC signatures v.3 (minus tobacco smoking-associated COSMIC SBS4) using the clue R package⁶⁰.
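
The matching step can be illustrated with a small Python sketch. It is not the clue implementation: for clarity it solves the assignment problem by brute force over permutations, which is only feasible for a handful of signatures, whereas a dedicated solver would be used in practice. The signature names and vectors in the usage note are toy values.

```python
from itertools import permutations
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def match_signatures(estimated, reference):
    """Assign each estimated signature to a distinct reference signature so
    that the summed cosine distance is minimal.

    estimated, reference: dicts mapping name -> probability vector.
    Requires len(estimated) <= len(reference). Brute force, for illustration.
    """
    est_names = list(estimated)
    ref_names = list(reference)
    best_cost, best = float("inf"), None
    for perm in permutations(ref_names, len(est_names)):
        cost = sum(
            cosine_distance(estimated[e], reference[r])
            for e, r in zip(est_names, perm)
        )
        if cost < best_cost:
            best_cost, best = cost, dict(zip(est_names, perm))
    return best
```

For example, with `estimated = {"S1": [0.7, 0.2, 0.1], "S2": [0.1, 0.1, 0.8]}` and three toy reference vectors, each estimated signature is paired with the reference vector it most resembles, and no reference is used twice.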

Samples were clustered by first applying UMAP⁶¹ to the normalized signature probabilities for the HRD SNV signature and all SV signatures with n_neighbors = 20 and min_dist = 0 to produce two-dimensional sample embeddings. Next, HDBSCAN⁶² was run on the sample embeddings with min_samples = 5, min_cluster_size = 5 and cluster_selection_epsilon = 0.75 to produce the sample clusters (strata).

Survival analysis

For each patient, the number of days between diagnosis and death or last follow-up was collected. Patients were segregated into groups, and a Kaplan–Meier curve was fitted for each group. Each cancer type was analysed separately and in two distinct grouping schemes. First, patients were split into HRD and 'Other' groups, in which the HRD group included patients whose cancers were identified as being in either the HRD-Dup or HRD-Del groups, and the 'Other' group included all other patients. Next, patients were grouped according to their assigned signature types: HRD-Dup, HRD-Del, TD or FBI.

DLP+ WGS quantification and analysis

Single-cell copy number, SNV and SV calls were generated using a previously described approach³, except that BWA-MEM⁶³ was used to align DLP+ reads to the hg19 reference genome. The genome was segregated into 500-kb bins, and GC-corrected read counts were calculated for each bin. These read counts were then input into HMMCopy⁶⁴ to produce integer copy number states for each bin.

DLP+ data filtering

Cells were retained for further analysis if the cell quality was at least 0.75 (ref. ³), and they passed both the S-phase and the contamination filters. The contamination filter uses FastQ Screen⁶⁵ to tag reads as matching human, mouse or salmon genomes. If more than 5% of reads in a cell are tagged as matching the mouse or salmon genomes, then the cell is flagged as contaminated. The S-phase filter uses the cell-cycle state Random Forest classifier from ref. ³ and removes cells for which S-phase is the most probable state. The HGSC and TNBC cells were also filtered to remove small numbers of contaminating diploid cells.
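
The three per-cell filters can be summarized in a short Python sketch. The dictionary keys are hypothetical, and the sketch assumes that the 5% contamination threshold applies to mouse and salmon reads combined, which is one reading of the text.

```python
def passes_filters(cell, min_quality=0.75, max_contam=0.05):
    """Return True if a cell passes the quality, contamination and S-phase filters.

    cell: dict with hypothetical keys
        quality       classifier quality score in [0, 1]
        total_reads   number of reads screened
        mouse_reads   reads tagged as mouse
        salmon_reads  reads tagged as salmon
        phase_probs   dict mapping cell-cycle state -> probability
    """
    # Quality filter: retain cells with quality of at least 0.75.
    if cell["quality"] < min_quality:
        return False
    # Contamination filter (assumption: mouse and salmon reads are pooled).
    contam = (cell["mouse_reads"] + cell["salmon_reads"]) / cell["total_reads"]
    if contam > max_contam:
        return False
    # S-phase filter: drop the cell if S is the most probable state.
    phase_probs = cell["phase_probs"]
    most_probable = max(phase_probs, key=phase_probs.get)
    return most_probable != "S"
```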

Finally, cell filtering was performed to remove putative early and late S-phase cells that passed the initial S-phase filter. This involved two steps. First, we built a cell phylogeny with sitka³⁷, manually identified the minimal phylogeny branches in which the cycling cells had clustered and removed the cells in these branches. Second, we clustered cells according to their copy number profiles and removed manually identified clusters of cycling cells.

We removed potentially problematic genome bins from our copy number results that had a mappability score of 0.99 or below, or that were contained in the ENCODE hg19 blacklist⁶⁶.

To detect SNVs and SVs in each dataset, reads from all cells in a DLP+ library were merged to form 'pseudobulk' libraries. SNV calling was performed on these libraries individually using MutationSeq⁴⁶ (probability threshold = 0.9) and Strelka (score > 20) (ref. ⁴⁷). Only SNVs that were detected by both methods were retained. For each dataset, the union of SNVs was aggregated, then for each cell and each SNV, the sequencing reads of that cell were searched for evidence of that SNV. SV calling was performed in a similar manner, by forming pseudobulk libraries, then running LUMPY⁵⁰ and deStruct⁴⁹ on each pseudobulk library, and retaining events that were detected by both methods. LUMPY and deStruct predictions were considered matched if the breakpoints matched in orientation and the positions involved were each no more than 200 nucleotides apart on the genome. Only deStruct predictions with a matching LUMPY prediction were retained. Sparse per-cell breakpoint read counts were extracted from deStruct using the cell identity of read evidence for each predicted breakpoint. SNV and SV calls were further post-processed according to a previous study¹². When performing pseudobulk analysis on groups of cells, a breakpoint is considered present in a clone if at least one cell that constitutes the clone contains evidence of the breakpoint. A subsampling experiment determined that this approach has 80% power to recover breakpoints at a cumulative coverage of 5× (100–150 cells) (see Supplementary Note).
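
The rule used to match LUMPY and deStruct predictions can be sketched as follows. The tuple representation of a breakpoint and the function names are hypothetical; the sketch also allows the two ends of a prediction to be reported in either order, which is an assumption not stated in the text.

```python
def breakends_match(a, b, max_dist=200):
    """a, b: (chromosome, position, strand) tuples for one end of a breakpoint."""
    return a[0] == b[0] and a[2] == b[2] and abs(a[1] - b[1]) <= max_dist


def predictions_match(p, q, max_dist=200):
    """p, q: breakpoints given as a pair of ends.

    Both ends must agree in chromosome and orientation and lie within
    max_dist nucleotides of the corresponding end in the other prediction.
    """
    direct = breakends_match(p[0], q[0], max_dist) and breakends_match(p[1], q[1], max_dist)
    swapped = breakends_match(p[0], q[1], max_dist) and breakends_match(p[1], q[0], max_dist)
    return direct or swapped


def retain_destruct(destruct_calls, lumpy_calls, max_dist=200):
    """Keep only deStruct predictions that have a matching LUMPY prediction."""
    return [
        d for d in destruct_calls
        if any(predictions_match(d, l, max_dist) for l in lumpy_calls)
    ]
```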

Analysis of mutation signatures in DLP+ data

Mutation signature probabilities were fit to DLP+ pseudobulk-derived SNV and SV counts for each patient using the MMCTM method and pre-computed mutation signatures from the HGSC and TNBC meta-cohort. Inference was performed as per the bulk sequencing data, until the relative difference in predictive log likelihood was < 10⁻⁶ between iterations.

Identifying clones in DLP+ WGS by clustering copy number profiles

For most datasets, clones were detected by first using UMAP on per-cell GC-corrected read count profiles, producing a two-dimensional embedding of the cell profiles. We then ran HDBSCAN on the two-dimensional embedding from UMAP to detect clusters of cells with similar copy number profiles.

UMAP was run with min_dist = 0.0, and metric = “correlation”, whereas HDBSCAN was run with approx_min_span_tree = False, cluster_selection_epsilon = 0.2, and gen_min_span_tree = True. Dataset specific UMAP and HDBSCAN parameter settings are listed in Supplementary Table 3.

Calculating cell ploidy

Cell ploidy was calculated by taking the most common copy number state. Copy number states were those determined by HMMCopy.
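
In code, this definition reduces to taking the mode of the per-bin states. The following Python sketch is illustrative; the tie-breaking rule (the lowest state wins) is an assumption, because the text does not say how ties are handled.

```python
from collections import Counter


def cell_ploidy(bin_states):
    """Most common integer copy number state across the bins of one cell.

    bin_states: iterable of integer copy number states (for example from HMMCopy).
    Ties are broken in favour of the lowest state (assumption).
    """
    counts = Counter(bin_states)
    top = max(counts.values())
    return min(state for state, n in counts.items() if n == top)
```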

Identifying missegregated chromosomes

The approach taken to identify putative chromosome missegregation events is similar to a previous one³. Cells were split into groups corresponding to their clones. Clone copy number profiles were generated for each clone. Cells with ploidy not equal to the clone consensus profile were normalized to match the clone ploidy. Cell copy number profiles were compared to the clone copy number profile for the matching clone to which the cell belongs. The result was assignment of an offset value for each genomic bin in each cell, that represented the copy number difference between the cell and the clone-level consensus profile. For each chromosome in each cell, if a particular copy number difference (that is, −1, 1, and so on) represented at least 75% of the chromosome, then we labelled that chromosome as having a missegregation event.
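
The final labelling step can be sketched in Python as below. The input is assumed to be the per-bin offsets already computed against the ploidy-normalized clone consensus; the data layout and function name are hypothetical.

```python
from collections import Counter


def missegregated_chromosomes(cell_offsets, min_fraction=0.75):
    """Flag chromosomes in one cell that carry a putative missegregation event.

    cell_offsets: dict mapping chromosome -> list of per-bin copy number
                  differences (cell minus clone consensus).
    Returns a dict mapping each flagged chromosome to its offset (e.g. -1 or 1).
    """
    events = {}
    for chrom, offsets in cell_offsets.items():
        nonzero = Counter(o for o in offsets if o != 0)
        if not nonzero:
            continue
        # With a threshold above 50%, at most one offset value can qualify.
        offset, n_bins = nonzero.most_common(1)[0]
        if n_bins / len(offsets) >= min_fraction:
            events[chrom] = offset
    return events
```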

Identifying CNA segments

Gain and loss segments in each cell were found by comparing the copy number state in each 500-kb bin to that cell’s ploidy. A copy number higher than ploidy was labelled as a gain, and a copy number lower than ploidy was labelled as a loss. Gain and loss segments are a set of consecutive bins with the same gain–loss label. Segments ≤ 1.5 × 10⁶ bp were excluded to reduce segments potentially resulting from noise in the HMMCopy copy number states.
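
A minimal Python sketch of this segmentation is given below. It operates on the consecutive bins of a single chromosome in a single cell; the function names and the returned tuple layout are illustrative choices, not those of the analysis code.

```python
from itertools import groupby

BIN_SIZE = 500_000  # bp, as used for the DLP+ copy number bins


def label(state, ploidy):
    """Classify one bin relative to the cell ploidy."""
    if state > ploidy:
        return "gain"
    if state < ploidy:
        return "loss"
    return "neutral"


def cna_segments(bin_states, ploidy, min_length_bp=1_500_000):
    """Return (label, start_bin, end_bin_exclusive) for each gain or loss segment.

    A segment is a run of consecutive bins sharing the same gain/loss label.
    Segments of min_length_bp or shorter are excluded.
    """
    segments = []
    start = 0
    for lab, run in groupby(label(s, ploidy) for s in bin_states):
        n = len(list(run))
        if lab != "neutral" and n * BIN_SIZE > min_length_bp:
            segments.append((lab, start, start + n))
        start += n
    return segments
```

With 500-kb bins, the 1.5-Mb cut-off means that a segment must span at least four bins to be retained.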

Computing serriform variability scores in CNA breakpoints

For each dataset, consensus copy number profiles were generated for each clone. Copy number segments were identified as above for each consensus profile. Copy number segments were then identified for single-cell copy number profiles. The copy number profiles of each cell were normalized so that the adjusted cell ploidy matched the ploidy of the clone to which the cell belonged using the following formula: cell_state = cell_state/cell_ploidy × clone_ploidy

Cell copy number segments were matched to segments in the clone copy number profile as follows: for each segment in the clone copy number profile, inspect the copy number states of the adjacent segments. If the segment state was less than both adjacent states, then only cell segments whose state was less than both of the two adjacent clone segment states could be matched to that segment. If the clone segment state was higher than both adjacent states, only cell segments whose state was higher than both of the adjacent clone segment states could be matched to that segment. If the clone segment state was in between the two adjacent states, only cell segments whose state was in between the two adjacent clone segment states could be matched to that segment. Finally, each cell segment was matched to the compatible clone segment that it overlapped the most, in which compatibility means that the cell segment state met the criteria described above and the cell belonged to the relevant clone.

Next, clone segment breakpoints were aggregated across all clones. For each breakpoint, matched cell breakpoints were identified. Stable cell breakpoints (that is, those cell breakpoints that matched the clone-level breakpoint position) and unstable cell breakpoints (all other cell breakpoints) were queried for their raw GC-corrected read count values up to five bins to the left and right of the breakpoint position. Breakpoint noise values were computed as the mean absolute value of the difference between these values and the integer copy number state inferred by HMMCopy. For each clone-level breakpoint event, cells were removed if their breakpoint noise values were higher than a threshold value, which was computed as the mean noise value of the stable cell breakpoints.

Serration scores for each event were calculated by first computing the frequency of cell-specific breakpoint positions. Each cell breakpoint position was considered 'rare' if it occurred in less than 5% of cells with the considered event. The final serration score was computed as the fraction of event cells whose breakpoint position was considered 'rare'.
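
The score itself is simple to compute once cell breakpoints have been matched to a clone-level event. A Python sketch under that assumption follows; the input layout is hypothetical, and the noise-based cell filtering described above is assumed to have been applied already.

```python
from collections import Counter


def serration_score(cell_breakpoints, rare_fraction=0.05):
    """Fraction of cells whose breakpoint position is rare for one event.

    cell_breakpoints: dict mapping cell id -> breakpoint position (bin index)
                      for cells carrying the event.
    A position is rare if fewer than rare_fraction of the event cells share it.
    """
    n_cells = len(cell_breakpoints)
    if n_cells == 0:
        raise ValueError("no cells carry this event")
    freq = Counter(cell_breakpoints.values())
    rare_cells = sum(
        1 for pos in cell_breakpoints.values()
        if freq[pos] / n_cells < rare_fraction
    )
    return rare_cells / n_cells
```

A score of 0 therefore indicates that every cell shares a common breakpoint position, whereas values approaching 1 indicate that most cells have their own, slightly shifted, breakpoint.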

For comparing serration rates between cases, breakpoints with at least 100 cells, and whose adjacent copy number segments were in total at least 20 Mbp (40 genomic bins) were retained. This was done to retain only those breakpoints for which serration could be reliably computed. As a result, SA605 was not included in comparisons as this case had fewer than 100 cells. A zero-inflated generalized linear mixed model with a beta response that accounted for case-specific and signature-type effects was fit to determine the effect of mutation signature type on serration scores.

Comparison of HLAMP copy number variance

HLAMPs were identified by first selecting 500-kb genomic bins in which at least 10 cells have a raw copy number (adjusted per-bin read counts) of at least 10. Copy number variance for each bin was calculated using the raw copy number that was adjusted for cell ploidy and cell clone by first dividing the copy number by the cell ploidy, then subtracting the mean clone raw copy number. The cell ploidy is the most common HMMCopy copy number state as described above, and the mean clone copy number is computed for each bin in each clone across all cells in that clone. Mean HLAMP copy number variance was calculated for each dataset across all HLAMP bins, and these values were compared between signature type dataset groups.

Clustering HLAMP genomic features

To explore plausible mechanistic origins of oncogenic HLAMPs we extracted genomic features proximal to the locus of interest. We took a region 15 Mbp either side of the locus of interest and pulled out copy number and SV features. We extracted the following features: entropy of haplotype-specific states; total number of SVs identified; proportion of SVs of each type (fold-back inversions, duplications, deletions and translocations); number of chromosomes involved in translocations; ratio of copy numbers between the bin containing the oncogene and the average copy number across the chromosome; average copy number state; average size of segments; average number of segments; and average minor allele copy number. All averages are across cells. We then performed hierarchical clustering on a scaled matrix of all features, using the silhouette width to determine the appropriate number of clusters.

HSCN analysis

See the Supplementary Note for a detailed discussion of our method, SIGNALS, for HSCN analysis. This includes validation of the method and benchmarking against other methods. In brief, SIGNALS uses haplotype blocks genotyped in single cells and implements a hidden Markov model (HMM) based on a Beta-Binomial likelihood to infer the most probable haplotype-specific state. We used default parameters for all datasets apart from SA1292, in which we increased the self-transition probability from 0.95 to 0.999 to mitigate the noisier copy number data in this sample.

Pseudobulk HSCN profiles

In numerous places in this study we construct 'pseudobulk' haplotype-specific or total copy number profiles either across all cells in a sample or subsets of cells that share some features of interest. To do this, we group the cells of interest and then compute an average profile by taking the median values of copy number and BAF and the mode of the haplotype-specific state. The function 'consensuscopynumber' provided in SIGNALS was used for this.

Comparing segmentation profiles across cells

To facilitate comparisons of genomic profiles across cells, we inferred a set of disjoint segments from the consensus copy number profiles of clusters. For each clone or cluster we generated a consensus segmentation profile, and then used the 'disjoin_ranges' function from plyranges⁶⁷ to generate a non-overlapping disjoint segmentation profile. Each segment was then genotyped in each cell by taking a consensus across the bins within each segment, producing a consistent set of genomic segments and states that could be compared across cells.

Identification of parallel copy number events

The set of genotyped disjoint segmentation profiles was used to calculate the number of parallel CNAs. Parallel CNAs were defined as genomic regions greater than 4 Mbp in which gain or loss of both the maternal and paternal haplotype was observed in more than 1% of cells. Copy number breakpoints of segments do not need to match for events to be included.

Phylogenetic analysis

We computed phylogenetic trees using sitka as previously described³⁷, using the consensus tree from the posterior distribution for downstream analysis. For visualization, clades with a high fraction of singletons (nodes with a single descendant) were removed. To remove nodes, nodes were ordered by the fraction of descendants that were singletons, and nodes were removed iteratively until a maximum of 3% of cells in the tree were removed. Trees were visualized using ggtree⁶⁸ and functionality in SIGNALS. Phylogenetic distances were computed as the mean pairwise distances between phylogenetic tips (cells) using the cophenetic function in APE⁶⁹. Distances represent the number of copy number change points between two cells on the phylogeny.

Event rates inferred from single-cell phylogenies

To compute the rates of gains and losses of whole chromosomes, chromosome arms and segmental aneuploidies we enumerated the number of events from our single-cell phylogenies using parsimony-based ancestral state reconstruction. We used the genotyped disjoint segmentation profiles for this.

We first defined states for each segment in each cell relative to the most common state across all cells. For each segment, cells can have one of two possible states for each class of interest: (gain, not gained), (loss, not lost). By casting the problem as reconstructing the ancestral states within the phylogeny, we can then compute the number of transitions between these states that most parsimoniously explains the phylogenetic tree. We used a simple transition matrix in which transitions between states incur a cost of 1. Ancestral state reconstruction then amounts to finding the reconstruction that minimizes this cost. The event frequency per sample is then calculated by dividing the parsimony score (number of events) by the number of cells. We used castor v.1.6.6 in R to perform the ancestral state reconstruction⁷⁰. The unit of this quantity is the number of events per cell division, assuming no cell death. It is possible (perhaps likely) that many cells acquire segmental gains or losses but then die; we never sample such cells, and our phylogenetic tree reconstructs ancestral relationships between cells that survive and that we sample. It is challenging to decouple the death rate of cells from the true event rate per cell division⁷¹; thus, our event rate is an effective event rate; that is, the event rate scaled by the (unknown) death rate of cells. To contrast the rates across different types of events, we classified segments as whole chromosomes, chromosome arms or segmental aneuploidies.
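
To make the parsimony calculation concrete, the following Python sketch scores a single binary character (for example, gained versus not gained for one segment) on a rooted tree using Sankoff-style dynamic programming with unit transition costs. It is not the castor implementation used in the study; the nested-tuple tree encoding and the function names are hypothetical, and the recursion is suitable only for small illustrative trees.

```python
def parsimony_events(tree, leaf_state):
    """Minimum number of 0<->1 transitions needed to explain the leaf states.

    tree:       a leaf name (str) or a tuple of subtrees (any arity).
    leaf_state: dict mapping leaf name -> 0 or 1.
    """
    inf = float("inf")

    def cost(node):
        # Returns [cost if node is in state 0, cost if node is in state 1].
        if isinstance(node, str):
            s = leaf_state[node]
            return [0 if s == 0 else inf, 0 if s == 1 else inf]
        total = [0, 0]
        for child in node:
            c = cost(child)
            for parent_state in (0, 1):
                # Either the child keeps the parent state, or it switches at cost 1.
                total[parent_state] += min(c[parent_state], c[1 - parent_state] + 1)
        return total

    return min(cost(tree))


def count_leaves(tree):
    """Number of cells (leaves) in the tree."""
    return 1 if isinstance(tree, str) else sum(count_leaves(c) for c in tree)


def event_rate(tree, leaf_state):
    """Parsimony score divided by the number of cells, as described in the text."""
    return parsimony_events(tree, leaf_state) / count_leaves(tree)
```

For the tree `((("a", "b"), "c"), ("d", "e"))` with only cells a and b carrying a gain, one event on the branch leading to (a, b) explains the data, giving a rate of 1/5 events per cell.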

Calculation of copy number distance

The copy number distance calculates the number of segments that need to be modified to transform one copy number profile into another²⁰. We use this measure to compute cell-to-cell variation in our dataset. To compute this measure, we modified the code provided in a previous study²⁵ to take into account whole-genome doubling of cells (https://github.com/raphael-group/WCND). We did this as follows: given two copy number profiles (integer copy number states of individual haplotypes in bins across the genome) CNP_A and CNP_B, we computed the following distances:

$${d}_{1}=f({{\rm{CNP}}}_{{\rm{A}}},{{\rm{CNP}}}_{{\rm{B}}})$$

$${d}_{2}=f(2\times {{\rm{CNP}}}_{{\rm{A}}},{{\rm{CNP}}}_{{\rm{B}}})$$

$${d}_{3}=f({{\rm{CNP}}}_{{\rm{A}}},2\times {{\rm{CNP}}}_{{\rm{B}}})$$

in which 2× refers to doubling the copy number state across the whole genome. We then took the copy number distance to be

$$d=\min ({d}_{1},{d}_{2},{d}_{3}).$$

If the minimum was d₂ or d₃, we increased d by 1 (that is, counting WGD as an additional event). Calculating all pairwise comparisons is computationally expensive, so for each dataset we subsampled 250 cells and calculated all pairwise distances for these 250 cells.
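The whole-genome doubling adjustment above can be sketched in Python as follows. The base distance `l1_distance` is a hypothetical stand-in for f (the study used a modified version of the WCND code, which is not reproduced here), and the tie-breaking rule and all function names are assumptions for illustration.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Sequence[int]
Distance = Callable[[Profile, Profile], int]


def l1_distance(a: Profile, b: Profile) -> int:
    # Hypothetical stand-in for f: sum of absolute per-bin differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def doubled(profile: Profile) -> List[int]:
    # "2x": double the copy number state in every bin of the genome.
    return [2 * c for c in profile]


def wgd_aware_distance(a: Profile, b: Profile, f: Distance = l1_distance) -> int:
    d1 = f(a, b)
    d2 = f(doubled(a), b)
    d3 = f(a, doubled(b))
    d = min(d1, d2, d3)
    # Assumption: on ties the undoubled distance d1 wins, so no WGD event is added.
    if d < d1:
        d += 1  # count the whole-genome doubling as one additional event
    return d


def subsampled_pairwise(profiles: Dict[str, Profile], n_cells: int = 250,
                        seed: int = 0,
                        f: Distance = l1_distance) -> Dict[Tuple[str, str], int]:
    # Subsample at most n_cells cells, then compute all pairwise distances.
    ids = sorted(profiles)
    if len(ids) > n_cells:
        ids = random.Random(seed).sample(ids, n_cells)
    return {(i, j): wgd_aware_distance(profiles[i], profiles[j], f)
            for i, j in itertools.combinations(ids, 2)}


# A profile and its exact doubling differ by a single (WGD) event.
print(wgd_aware_distance([1, 1, 2, 1], [2, 2, 4, 2]))
```

With 250 subsampled cells, the pairwise step evaluates 250 × 249 / 2 = 31,125 cell pairs per dataset.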

10X scRNA-seq processing

CellRanger software (v.3.1.0) was used to perform read alignment, barcode filtering and quantification of unique molecular identifiers (UMIs) using the 10X GRCh38 transcriptome (v.3.0.0) for FASTQ inputs. CellRanger filtered matrices were loaded into individual Seurat objects using the Seurat R package (v.4.1.0) (refs. ⁷²,⁷³). The resulting gene-by-cell matrix was normalized and scaled for each sample. Cells retained for analysis had a minimum of 500 expressed genes and 1,000 UMI counts and less than 25% mitochondrial gene expression. Genes expressed in fewer than three cells were removed. Cell-cycle phase was assigned using the Seurat⁷³ CellCycleScoring function. Scrublet⁷⁴ (v.0.2.3) was used to identify and remove cells predicted to be doublets. We then applied the standard Seurat processing pipeline using default parameters apart from using the first 20 principal component analysis (PCA) dimensions for nearest neighbour and UMAP calculations.
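The cell and gene filters stated above can be written as simple threshold checks. The sketch below is plain Python for illustration only (the study used Seurat in R); the data layout, the function names and the use of the "MT-" gene symbol prefix to identify mitochondrial genes are assumptions, and the order in which cell and gene filters were applied is not specified in the text.

```python
from typing import Dict, List, Set

Counts = Dict[str, int]  # gene symbol -> UMI count for one cell


def passes_cell_qc(counts: Counts, min_genes: int = 500, min_umis: int = 1000,
                   max_pct_mito: float = 25.0) -> bool:
    n_umis = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Assumption: mitochondrial genes are identified by the "MT-" symbol prefix.
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / n_umis if n_umis else 0.0
    return n_genes >= min_genes and n_umis >= min_umis and pct_mito < max_pct_mito


def genes_to_keep(cells: List[Counts], min_cells: int = 3) -> Set[str]:
    # Keep genes expressed (count > 0) in at least min_cells cells.
    n_expressing: Dict[str, int] = {}
    for counts in cells:
        for gene, c in counts.items():
            if c > 0:
                n_expressing[gene] = n_expressing.get(gene, 0) + 1
    return {gene for gene, n in n_expressing.items() if n >= min_cells}


# Hypothetical cell with 600 expressed genes and 1,200 UMIs, none mitochondrial.
example_cell = {f"G{i}": 2 for i in range(600)}
print(passes_cell_qc(example_cell))
```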

Allelic imbalance in scRNA-seq

We called heterozygous SNPs in the scRNA-seq data using cellSNP v.1.2.2 (ref. ⁷⁵). As input, we used the same set of heterozygous SNPs identified in the scDNA-seq and the corresponding normal sample for each sample. The liftover script provided in cellSNP was used to lift over SNPs from hg19 to hg38. After genotyping, we phased the SNPs using the phasing information computed from the haplotype-specific inference in the scDNA-seq. As SNP counts are much more sparse in scRNA-seq than in scDNA-seq (around two orders of magnitude lower), we aggregated counts across segments (minimum size = 10 Mbp), computing the BAF for each segment. We then generated a cell by segment BAF matrix and incorporated this into our gene expression Seurat objects. We applied an additional filtering criterion here, removing cells with fewer than 200 SNP counts. Functionality to map scDNA-seq to scRNA-seq and call allelic imbalance is provided in SIGNALS.
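The segment-level aggregation described above can be sketched as follows, taking the B-allele frequency (BAF) of a segment to be the phased B-allele count divided by the total count, summed over all SNPs in that segment. This is an illustrative Python sketch, not the SIGNALS implementation; the input layout and function names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# One entry per SNP in one cell: (segment id, phased B-allele count, total count).
SnpCount = Tuple[str, int, int]


def segment_baf(snps: Sequence[SnpCount]) -> Dict[str, float]:
    totals: Dict[str, List[int]] = {}
    for segment, b_count, total in snps:
        acc = totals.setdefault(segment, [0, 0])
        acc[0] += b_count
        acc[1] += total
    # Segments without any covered SNP have no defined BAF and are omitted.
    return {seg: b / t for seg, (b, t) in totals.items() if t > 0}


def enough_snp_counts(snps: Sequence[SnpCount], min_counts: int = 200) -> bool:
    # Cells with fewer than min_counts SNP counts in total are removed.
    return sum(total for _, _, total in snps) >= min_counts


# Hypothetical sparse cell: two SNPs on one segment, one SNP on another.
cell = [("2q", 3, 4), ("2q", 1, 4), ("1p", 0, 2)]
print(segment_baf(cell), enough_snp_counts(cell))
```

Pooling counts before dividing, rather than averaging per-SNP ratios, weights each SNP by its coverage, which matters when most SNPs have zero or one read in a given cell.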

Differential expression analysis

Differential expression analysis between gene expression clusters was computed using the Wilcoxon rank sum test with the presto R package. Gene expression clusters were computed using the FindClusters function in Seurat. Only cells in G1 phase were included. To compare gene expression variability for oncogenes, we took the absolute maximum log-transformed fold change for each sample for each oncogene and contrasted this value in cases in which oncogene copy number was determined to be fixed or variable from DLP+ single-cell sequencing of the same samples. 'Variable' oncogenes were defined as those with a ratio of at least 2 between the maximum and minimum clone-level copy number, and 'non-variable' oncogenes as those with a ratio of less than 2.
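The variable versus non-variable classification amounts to a ratio test on clone-level copy numbers. A minimal Python sketch is given below; the function name is hypothetical, and the handling of a clone with zero copies is an assumption because the text does not address that case.

```python
from typing import Sequence


def is_variable(clone_copy_numbers: Sequence[float], min_ratio: float = 2.0) -> bool:
    lowest = min(clone_copy_numbers)
    highest = max(clone_copy_numbers)
    if lowest == 0:
        # Assumption: with a zero-copy clone the ratio is undefined; any nonzero
        # maximum is treated as variable here.
        return highest > 0
    return highest / lowest >= min_ratio


# Hypothetical oncogene with clone-level copy numbers 2, 4 and 3: ratio 4 / 2 = 2.
print(is_variable([2, 4, 3]))
```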

Nearest neighbour gene expression analysis

To assess transcriptional convergence of losses of alleles, we made use of the shared nearest neighbour graph computed using Seurat. This was done for chr. 2q in sample SA906b. For a given cell, an enrichment score was defined as the observed fraction of nearest neighbours with the same allelic state divided by the expected fraction. Here, the expected fraction of neighbours with the same allelic state was defined as the global fraction of cells in each state. Hence, a positive enrichment score indicates an overrepresentation of cells in the allelic state of interest among a cell's nearest neighbours, a negative score indicates an underrepresentation and a score of 0 would reflect a perfectly mixed neighbourhood of cells with different allelic states. To mitigate the influence of other technical or biological variability, for this analysis we only included cells in G1 phase, and removed cells with greater than 7.5% mitochondrial gene expression as we found that this was variable between gene expression clusters.
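A minimal Python sketch of the enrichment score is given below. Because the text describes positive and negative scores with 0 as the neutral value, the sketch takes the log2 of the observed-to-expected ratio; the log scale, the handling of an observed fraction of zero and the function name are assumptions, not details confirmed by the text.

```python
import math
from typing import Sequence


def enrichment_score(neighbour_states: Sequence[str], all_states: Sequence[str],
                     state: str) -> float:
    # Observed: fraction of this cell's nearest neighbours in the state of interest.
    observed = sum(s == state for s in neighbour_states) / len(neighbour_states)
    # Expected: global fraction of cells in that state.
    expected = sum(s == state for s in all_states) / len(all_states)
    if observed == 0:
        # Assumption: report minus infinity rather than adding a pseudocount.
        return float("-inf")
    return math.log2(observed / expected)


# Hypothetical data: 25% of all cells carry the loss, but half of the neighbours do.
all_cells = ["loss"] * 25 + ["retained"] * 75
print(enrichment_score(["loss", "loss", "retained", "retained"], all_cells, "loss"))
```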

Statistical tests

The statistical tests used were two-tailed unequal-variance t-tests unless otherwise specified: log-rank tests were used for comparing survival curves; Wilcoxon rank sum two-tailed tests were used for comparing segment lengths, segment counts, missegregations and ploidy percentages, copy variances, bin counts, gene copy number distributions, gene expression log-transformed fold changes, parallel copy number counts and breakpoint counts; and hypergeometric tests were used to identify enrichment of gene mutations. P values from multiple comparisons were corrected using the Benjamini–Hochberg method⁷⁶.
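For reference, the Benjamini–Hochberg adjustment can be sketched in a few lines of Python. This is the standard step-up procedure written out for illustration; the function name is hypothetical, and it is not the code used in this study.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is the smallest false discovery rate at which the corresponding test would be called significant.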

Box plot statistics

All box plots indicate the median, first and third quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
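The box plot elements can be computed as in the following Python sketch. The quartile convention (linear interpolation, via the `inclusive` method of `statistics.quantiles`) is an assumption, as plotting libraries differ in how hinges are computed; the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, object]:
    # First quartile, median and third quartile (the hinges and centre line).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers: most extreme data points no farther than 1.5 x IQR from a hinge.
    lower = min(v for v in values if v >= q1 - 1.5 * iqr)
    upper = max(v for v in values if v <= q3 + 1.5 * iqr)
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower, "upper_whisker": upper, "outliers": outliers}


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```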

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data are available for general research use. Processed data including somatic mutation data for bulk WGS, total (and allele-specific) copy number profiles for DLP+ data and filtered count matrices for scRNA-seq data are available for download at https://zenodo.org/record/6998936. Raw scRNA-seq data are available for download at https://ega-archive.org/studies/EGAS00001006343. Raw single-cell sequencing data generated for this study are available from https://ega-archive.org/studies/EGAS00001006343, and previously published single-cell sequencing data used in this study are available at https://ega-archive.org/studies/EGAS00001004448 and https://ega-archive.org/studies/EGAS00001003190. Somatic mutation calls from bulk WGS for 16 patients with TNBC for whom the IRB consent does not include public deposition of raw sequencing data are available at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003038.v1.p1, and raw sequencing data can be provided upon request under material transfer agreement to shahs3@mskcc.org. Bulk WGS BAM files from patients under IRB consent protocols for public release of raw data are available for download at https://ega-archive.org/studies/EGAS00001006343, http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003036.v1.p1 and https://ega-archive.org/datasets/EGAD00001003268 (for previously published data¹²) or by request under material transfer agreement to shahs3@mskcc.org and saparicio@bccrc.ca.

Code availability

MMCTM method: https://github.com/shahcompbio/MultiModalMuSig.jl. DLP+ single-cell WGS pipeline: https://github.com/shahcompbio/single_cell_pipeline. Bulk WGS pipeline: https://github.com/shahcompbio/wgs. SIGNALS processing pipeline: https://github.com/marcjwilliams1/hscn_pipeline. SIGNALS: https://github.com/shahcompbio/signals; v.0.7.2 archived at https://doi.org/10.5281/zenodo.6642342.

References

Umbreit, N. T. et al. Mechanisms generating cancer genome complexity from a single cell division error. Science 368, eaba0712 (2020).
Minussi, D. C. et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 592, 302–308 (2021).
Laks, E. et al. Clonal decomposition and DNA replication states defined by scaled single-cell genome sequencing. Cell 179, 1207–1221 (2019).
Hakem, R. DNA-damage repair; the good, the bad, and the ugly. EMBO J. 27, 589–605 (2008).
Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
Macintyre, G. et al. Copy number signatures and mutational processes in ovarian carcinoma. Nat. Genet. 50, 1262–1270 (2018).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Shiraishi, Y., Tremmel, G., Miyano, S. & Stephens, M. A simple model-based approach to inferring and visualizing cancer mutation signatures. PLoS Genet. 11, e1005657 (2015).
Funnell, T. et al. Integrated structural variation and point mutation signatures in cancer genomes using correlated topic models. PLoS Comput. Biol. 15, e1006799 (2019).
Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
Wang, Y. et al. Genomic consequences of aberrant DNA repair mechanisms stratify ovarian cancer histotypes. Nat. Genet. 49, 856–865 (2017).
Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
Staaf, J. et al. Whole-genome sequencing of triple-negative breast cancers in a population-based clinical study. Nat. Med. 25, 1526–1533 (2019).
Zahn, H. et al. Scalable whole-genome single-cell library preparation without preamplification. Nat. Methods 14, 167–173 (2017).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Nguyen, L., Martens, J. W. M., Van Hoeck, A. & Cuppen, E. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).
Salehi, S. et al. Clonal fitness inferred from time-series modelling of single-cell cancer genomes. Nature 595, 585–590 (2021).
Burleigh, A. et al. A co-culture genome-wide RNAi screen with mammary epithelial cells reveals transmembrane signals required for growth and differentiation. Breast Cancer Res. 17, 4 (2015).
Schwarz, R. F. et al. Phylogenetic quantification of intra-tumour heterogeneity. PLoS Comput. Biol. 10, e1003535 (2014).
Wu, C.-Y. et al. Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nat. Biotechnol. 39, 1259–1269 (2021).
Zaccaria, S. & Raphael, B. J. Characterizing allele- and haplotype-specific copy numbers in single cells with CHISEL. Nat. Biotechnol. 39, 207–214 (2021).
Wang, Y. et al. The negative interplay between Aurora A/B and BRCA1/2 controls cancer cell growth and tumorigenesis via distinct regulation of cell cycle progression, cytokinesis, and tetraploidy. Mol. Cancer 13, 94 (2014).
Lee, H. et al. Mitotic checkpoint inactivation fosters transformation in cells lacking the breast cancer susceptibility gene, Brca2. Mol. Cell 4, 1–10 (1999).
Zeira, R. & Raphael, B. J. Copy number evolution with weighted aberrations in cancer. Bioinformatics 36, i344–i352 (2020).
Sanders, A. D. et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol. 38, 343–354 (2020).
Zakov, S., Kinsella, M. & Bafna, V. An algorithmic approach for breakage-fusion-bridge detection in tumor genomes. Proc. Natl Acad. Sci. USA 110, 5546–5551 (2013).
Gisselsson, D. et al. Chromosomal breakage-fusion-bridge events cause genetic intratumor heterogeneity. Proc. Natl Acad. Sci. USA 97, 5357–5362 (2000).
Watkins, T. B. K. et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132 (2020).
Zhang, A. W. et al. Interfaces of malignant and immunologic clonal dynamics in ovarian cancer. Cell 173, 1755–1769 (2018).
McPherson, A. et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 48, 758–767 (2016).
Patch, A.-M. et al. Whole-genome characterization of chemoresistant ovarian cancer. Nature 521, 489–494 (2015).
Eirew, P. et al. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature 518, 422–426 (2015).
Shah, S. P. et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486, 395–399 (2012).
Savage, P. et al. Chemogenomic profiling of breast cancer patient-derived xenografts reveals targetable vulnerabilities for difficult-to-treat tumors. Commun. Biol. 3, 310 (2020).
Branton, D. et al. The potential and challenges of nanopore sequencing. Nat. Biotechnol. 26, 1146–1153 (2008).
Salehi, S. et al. Cancer phylogenetic tree inference at scale from 1000s of single cell genomes. Preprint at bioRxiv https://doi.org/10.1101/2020.05.06.058180 (2021).
Verhaak, R. G. W., Bafna, V. & Mischel, P. S. Extrachromosomal oncogene amplification in tumour pathogenesis and evolution. Nat. Rev. Cancer 19, 283–288 (2019).
Hadi, K. et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell 183, 197–210 (2020).
Annab, L. A. et al. Establishment and characterization of a breast cell strain containing a BRCA1 185delAG mutation. Gynecol. Oncol. 77, 121–128 (2000).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Létourneau, I. J. et al. Derivation and characterization of matched cell lines from primary and recurrent serous ovarian cancer. BMC Cancer 12, 379 (2012).
O’Flanagan, C. H. et al. Dissociation of solid tumor tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses. Genome Biol. 20, 210 (2019).
Ding, J. et al. Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data. Bioinformatics 28, 167–175 (2012).
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
Chakravarty, D. et al. OncoKB: annotation of the oncogenic effect and treatment implications of somatic mutations in cancer. J. Clin. Oncol. 34, 11583–11583 (2016).
McPherson, A., Shah, S. & Cenk Sahinalp, S. deStruct: accurate rearrangement detection using breakpoint specific realignment. Preprint at bioRxiv https://doi.org/10.1101/117523 (2017).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).
Martin, M. et al. WhatsHap: fast and accurate read-based phasing. Preprint at bioRxiv https://doi.org/10.1101/085050 (2016).
Sedlazeck, F. J. et al. Tools for annotation and comparison of structural variation. F1000Res. 6, 1795 (2017).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
Nattestad, M., Aboukhalil, R., Chin, C.-S. & Schatz, M. C. Ribbon: intuitive visualization for complex genomic variation. Bioinformatics 37, 413–415 (2021).
Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).
Robinson, P. & Jtel, T. Z. in Computational Exome and Genome Analysis (eds Robinson, P. N., Piro, R. M. & Jäger, M.) 233–245 (2017).
Hornik, K. A CLUE for CLUster Ensembles. J. Stat. Softw. 14, 1–25 (2005).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for dimension reduction. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.03426 (2020).
McInnes, L. & Healy, J. Accelerated hierarchical density based clustering. In IEEE International Conference on Data Mining Workshops (ICDMW) 33–42 (IEEE, 2017).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://doi.org/10.48550/arXiv.1303.3997 (2013).
Shah, S. P. et al. Integrating copy number polymorphisms into array CGH analysis using a robust HMM. Bioinformatics 22, e431–e439 (2006).
Wingett, S. W. & Andrews, S. FastQ Screen: a tool for multi-genome mapping and quality control. F1000Res. 7, 1338 (2018).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Lee, S., Cook, D. & Lawrence, M. plyranges: a grammar of genomic data transformation. Genome Biol. 20, 4 (2019).
Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinformatics 69, e96 (2020).
Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
Louca, S. & Doebeli, M. Efficient comparative phylogenetics on large trees. Bioinformatics 34, 1053–1055 (2018).
Werner, B. et al. Measuring single cell divisions in human tissues from multi-region sequencing data. Nat. Commun. 11, 1035 (2020).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: Computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291 (2019).
Huang, X. & Huang, Y. Cellsnp-lite: an efficient tool for genotyping single cells. Bioinformatics 37, 4569–4571 (2021).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).

Acknowledgements

This project was supported by the BC Cancer Foundation at BC Cancer and Cycle for Survival supporting the Memorial Sloan Kettering Cancer Center. S.P.S. holds the Nicholls Biondi Chair in Computational Oncology and is a Susan G. Komen Scholar (GC233085). S.A. holds the Nan and Lorraine Robertson Chair in Breast Cancer and is a Canada Research Chair in Molecular Oncology (950–230610). Additional funding was provided by a Terry Fox Research Institute grant (1082), a Canadian Cancer Society Research Institute Impact program grant (705617), a CIHR grant (FDN-148429), Breast Cancer Research Foundation awards (BCRF-18-180, BCRF-19-180 and BCRF-20-180), a MSK Cancer Center Support Grant/Core Grant (P30 CA008748), National Institutes of Health grants (1RM1 HG011014-01 and P50 CA247749-01), a CCSRI grant (705636), the Cancer Research UK Grand Challenge Program and the Canada Foundation for Innovation (40044) to S.A. and S.P.S. M.J.W. is supported by a National Cancer Institute Pathway to Independence award (K99CA256508).

Author information

These authors contributed equally: Tyler Funnell, Ciara H. O’Flanagan, Marc J. Williams

Authors and Affiliations

Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
Tyler Funnell & Adam C. Weiner
Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Tyler Funnell, Marc J. Williams, Andrew McPherson, Sohrab Salehi, Ignacio Vázquez-García, Hongyu Shi, Emily Leventhal, Jamie L. P. Lim, Diljot Grewal, Douglas Abrams, Eliyahu Havasov, Samantha Leung, Viktoria Bojilova, Nicole Rusk, Florian Uhlitz, Nicholas Ceglia, Adam C. Weiner & Sohrab P. Shah
Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
Ciara H. O’Flanagan, Steven McKinney, Farhia Kabeer, Hakwoo Lee, Tehmina Masud, Peter Eirew, Damian Yap, Allen W. Zhang, Beixi Wang, Jazmine Brimhall, Justina Biele, Jerome Ting, Vinci Au, Michael Van Vliet, Yi Fei Liu, Sean Beatty, Daniel Lai, Jenifer Pham, Elena Zaikova, J. Maxwell Douglas, Yangguang Li, Hong Xu, Teresa Ruiz de Algara, So Ra Lee, Viviana Cerda Llanos, David G. Huntsman, Emma Laks, Austin Smith, Daniel Lai, Andrew Roth & Samuel Aparicio
Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
Farhia Kabeer, Hakwoo Lee, Daniel Lai, Spencer D. Martin, David G. Huntsman, Andrew Roth & Samuel Aparicio
Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
Richard A. Moore
GYN Medical Oncology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Dmitriy Zamarin
Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Britta Weigelt & Jorge S. Reis-Filho
Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Sarah H. Kim & Arnaud Da Cruz Paula
Department of Gynecology and Obstetrics, University of British Columbia, Vancouver, British Columbia, Canada
Jessica N. McAlpine
Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
Gregory J. Hannon, Georgia Battistoni, Dario Bressan, Ian G. Cannell, Hannah Casbolt, Cristina Jauset, Tatjana Kovačević, Claire M. Mulvey, Fiona Nugent, Marta Paez Ribes, Isabella Pearson, Fatime Qosaj, Kirsty Sawicka, Sophia A. Wild, Elena Williams, Shankar Balasubramanian, Maximilian Lee, Maurizio Callari, Wendy Greenwood, Giulia Lerda, Simon Tavare & Eyal Fisher
UBC Data Science Institute, University of British Columbia, Vancouver, British Columbia, Canada
Andrew Roth
Department of Chemistry, University of Cambridge, Cambridge, UK
Shankar Balasubramanian & Maximilian Lee
School of Clinical Medicine, University of Cambridge, Cambridge, UK
Shankar Balasubramanian & Maximilian Lee
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Bernd Bodenmiller, Marcel Burger, Laura Kuett, Sandra Tietscher & Jonas Windhager
McGovern Institute, Departments of Biological Engineering and Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
Edward S. Boyden, Shahar Alon, Yi Cui, Amauche Emenari, Daniel R. Goodwin, Emmanouil D. Karagiannis, Anubhav Sinha & Asmamaw T. Wassie
Department of Oncology and Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Carlos Caldas, Alejandra Bruna, Yaniv Eyal-Lubling, Oscar M. Rueda & Abigail Shea
Súil Interactive, Dublin, Ireland
Owen Harris, Robby Becker, Flaminia Grimaldo, Suvi Harris & Sara Lisa Vogl
Department of Oncology and Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
Johanna A. Joyce & Spencer S. Watson
Herbert and Florence Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
Simon Tavare, Khanh N. Dinh & Russell Kunes
New York Genome Center, New York, NY, USA
Simon Tavare, Khanh N. Dinh & Russell Kunes
Institute of Astronomy, University of Cambridge, Cambridge, UK
Nicholas A. Walton, Mohammed Al Sa’d, Nick Chornay, Ali Dariush, Eduardo A. González-Solares, Carlos González-Fernández, Aybüke Küpcü Yoldaş & Neil Miller
Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA
Xiaowei Zhuang, Jean Fan, Hsuan Lee, Leonardo A. Sepúlveda, Chenglong Xia & Pu Zheng
Department of Physics, Harvard University, Cambridge, MA, USA
Xiaowei Zhuang, Jean Fan, Hsuan Lee, Leonardo A. Sepúlveda, Chenglong Xia & Pu Zheng
Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
Xiaowei Zhuang, Jean Fan, Hsuan Lee, Leonardo A. Sepúlveda, Chenglong Xia & Pu Zheng

Authors

Tyler Funnell
Ciara H. O’Flanagan
Marc J. Williams
Andrew McPherson
Steven McKinney
Farhia Kabeer
Hakwoo Lee
Sohrab Salehi
Ignacio Vázquez-García
Hongyu Shi
Emily Leventhal
Tehmina Masud
Peter Eirew
Damian Yap
Allen W. Zhang
Jamie L. P. Lim
Beixi Wang
Jazmine Brimhall
Justina Biele
Jerome Ting
Vinci Au
Michael Van Vliet
Yi Fei Liu
Sean Beatty
Daniel Lai
Jenifer Pham
Diljot Grewal
Douglas Abrams
Eliyahu Havasov
Samantha Leung
Viktoria Bojilova
Richard A. Moore
Nicole Rusk
Florian Uhlitz
Nicholas Ceglia
Adam C. Weiner
Elena Zaikova
J. Maxwell Douglas
Dmitriy Zamarin
Britta Weigelt
Sarah H. Kim
Arnaud Da Cruz Paula
Jorge S. Reis-Filho
Spencer D. Martin
Yangguang Li
Hong Xu
Teresa Ruiz de Algara
So Ra Lee
Viviana Cerda Llanos
David G. Huntsman
Jessica N. McAlpine
Sohrab P. Shah
Samuel Aparicio

Consortia

IMAXT Consortium

Gregory J. Hannon, Georgia Battistoni, Dario Bressan, Ian G. Cannell, Hannah Casbolt, Cristina Jauset, Tatjana Kovačević, Claire M. Mulvey, Fiona Nugent, Marta Paez Ribes, Isabella Pearson, Fatime Qosaj, Kirsty Sawicka, Sophia A. Wild, Elena Williams, Samuel Aparicio, Emma Laks, Yangguang Li, Ciara H. O’Flanagan, Austin Smith, Teresa Ruiz de Algara, Daniel Lai, Andrew Roth, Shankar Balasubramanian, Maximilian Lee, Bernd Bodenmiller, Marcel Burger, Laura Kuett, Sandra Tietscher, Jonas Windhager, Edward S. Boyden, Shahar Alon, Yi Cui, Amauche Emenari, Daniel R. Goodwin, Emmanouil D. Karagiannis, Anubhav Sinha, Asmamaw T. Wassie, Carlos Caldas, Alejandra Bruna, Maurizio Callari, Wendy Greenwood, Giulia Lerda, Yaniv Eyal-Lubling, Oscar M. Rueda, Abigail Shea, Owen Harris, Robby Becker, Flaminia Grimaldo, Suvi Harris, Sara Lisa Vogl, Johanna A. Joyce, Spencer S. Watson, Sohrab P. Shah, Andrew McPherson, Ignacio Vázquez-García, Simon Tavare, Khanh N. Dinh, Eyal Fisher, Russell Kunes, Nicholas A. Walton, Mohammed Al Sa’d, Nick Chornay, Ali Dariush, Eduardo A. González-Solares, Carlos González-Fernández, Aybüke Küpcü Yoldaş, Neil Miller, Xiaowei Zhuang, Jean Fan, Hsuan Lee, Leonardo A. Sepúlveda, Chenglong Xia & Pu Zheng

Contributions

S.P.S. and S.A.: project conception and oversight. T.F., C.H.O. and M.J.W.: led implementation of the study and performed all experiments and data analysis. A.M., S.M., S.S., I.V.-G., E.L., H.S., S.B., D.L., J.P., D.G., D.A., E.H., S.L., V.B., F.U., N.G., A.C.W., E.Z. and J.M.D.: computational biology and statistical analysis. F.K., H.L., T.M., P.E., D.Y., A.W.Z., J.L.P.L., B.W., J. Brimhall, J. Biele, J.T., V.A., M.V.V., Y.F.L., H.X., T.R.d.A., S.R.L., J.N.M., D.G.H., Y.L. and V.C.L.: single-cell sequencing, in vitro experiments, PDX generation and sample processing. S.D.M.: histopathology analysis. R.A.M., D.Z., B.W., S.H.K., A.D.C.P. and J.S.R.-F.: bulk WGS and sample procurement. N.R., S.P.S., S.A., T.F., C.H.O. and M.J.W.: wrote the manuscript.

Corresponding authors

Correspondence to Marc J. Williams, Sohrab P. Shah or Samuel Aparicio.

Ethics declarations

Competing interests

B.W. reports ad hoc membership of the advisory board of Repare Therapeutics, outside the scope of this study. J.S.R.-F. reports receiving personal or consultancy fees from Goldman Sachs, REPARE Therapeutics, Paige.AI and Eli Lilly, membership of the scientific advisory boards of VolitionRx, REPARE Therapeutics and Paige.AI, membership of the Board of Directors of Grupo Oncoclinicas and ad hoc membership of the scientific advisory boards of Roche Tissue Diagnostics, Ventana Medical Systems, Novartis, Genentech and InVicro, outside the scope of this study. D.G.H. is the Chief Medical Officer of Imagia Canexia Health, outside the scope of this study. S.P.S. and S.A. are shareholders and consultants of Canexia Health and shareholders of Imagia Canexia Health, outside the scope of this study. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Matthew Ellis, Nancy Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Study overview.

Experimental and cohort design. Single-cell genomes, transcriptomes and long-read sequencing libraries were generated from isogenic 184-hTERT cell lines (WT, or with TP53, BRCA1 or BRCA2 mutations) (a) or PDX tissue from patients with TNBC and patients with HGSC from a meta-cohort with assigned SV or SNV mutational signatures (b). Single-cell and long-read sequencing (c) was used to examine mutational processes and haplotype-specific genomic diversity, including HLAMPs or rearrangements (d), parallel events (e) and SSVs (f) at single-cell resolution and within clonal and subclonal populations. Generated using Biorender.com.

Extended Data Fig. 2 Sanger sequencing of cell lines and tumour histology.

a,b) Verification of CRISPR–Cas9 induced genotypes of 184-hTERT cell lines. a) Sanger sequencing of TOPO cloned BRCA1 and BRCA2 regions. b) Western blotting for p53, BRCA1 and BRCA2 proteins for 184-hTERT cell lines and including an additional BRCA2−/− clone, 112.72 and TP53−/− clone, SA1101. GAPDH and vinculin loading controls were performed on the same blot as p53, BRCA1 or BRCA2 probes. Blots shown are representative of n = 3 (WT), n = 3 (BRCA1) and n = 6 (BRCA2) independent experiments. For source blots, refer to Supplementary Fig. 1. c) Histology of HGSC PDXs in the dataset. Scale bars 300 µm and 50 µm as indicated. Images are representative of two cores stained from each PDX tissue.

Extended Data Fig. 3 Validation of the SIGNALS method in an ovarian cancer cell line.

a,b) HSCN from 2 individual cells from the OV2295 cell line. c) Total copy number heat maps and HSCN heat map for 1084 cells. Each row is an individual cell. Rows are ordered according to a UMAP + HDBSCAN clustering, with clusters annotated on the left hand side. d) Distribution of VAFs as a function of haplotype-specific state. e,f,g) VAFs in clones where the dominant allele switches between A and B. Each point is the VAF of a mutation, with lines connecting the same mutation in different clones.

Extended Data Fig. 4 DLP summary statistics.

Summary of DLP+ sequencing statistics of data for 184-hTERT cell lines a–d) and HGSC and TNBC tumours e–h). Number of cells for each box plot as indicated in panels a) and e). Shared y axis labels shown at left. The legend for e–h) indicates the number of samples for each cancer and signature type. All box plots indicate the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).

Extended Data Fig. 5 Haplotype-specific analysis reveals breakage–fusion–bridge processes and parallel losses.

a) Diagram of BFBCs. b) Heat maps of the copy number of each homologue in SA1188. c–f) HSCN and structural variation in clusters B, I, F and the small subpopulation with PIK3CA amplification. Here we plot the copy number for each homologous chromosome in purple for homologue B and green for homologue A. g) Parallel copy number losses in SA906b: total copy number (top) and HSCN (bottom) heat maps for chr2 in SA906b. h) Two individual cells from g). i) UMAP dimensionality reduction plots of scRNA-seq data generated from SA906b; colours indicate the density of loss of chr 2q A vs. B haplotype. j) Enrichment of the haplotype-specific state on chr 2q of nearest neighbour cells (# cells with loss of A = 175, # of cells with loss of B = 34, # Balanced = 2066). All box plots indicate the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).

Extended Data Fig. 6 SNV and SV signatures.

a) SNV and b) SV mutation signatures estimated from HGSC and TNBC bulk tumour mutation catalogues using the MMCTM method. The x axis in a) is the 96-channel (i.e. A[C>A]A, …, T[T>G]T) SNV types. SV types are DEL: interstitial deletions, DUP: tandem duplications, INV: inversions, FBI: fold-back inversions, TR: translocations. SV signature labels are S-Dup: small duplications, M-Dup: medium duplications, L-Dup: large duplications, S-Del: small deletions, L-Del: large deletions, Clust-FBI: clustered fold-back inversions, Clust-SV: clustered other structural variants, Tr: translocations, FBI/Inv: fold-back inversions and inversions.

Extended Data Fig. 7 Meta-cohort signature analysis of 139 TNBC and 170 HGSC bulk whole genomes.

a) Heat map representing individual patients as columns, annotation tracks (top) including cancer type and mutation status of key genes (strata with adjusted p-values ≤ 0.1 shown as coloured bars on left), standardized signature probabilities of SNVs and SVs (middle) and event counts (bottom). b) Signature type (see stratum annotation track) proportions by cancer type. c) SNV and SV count distributions per signature type (number of samples shown below each violin, data points shown left of violins). Kaplan–Meier survival probability of HGSCs faceted by d) HRD and e) more granular signatures (p-values computed using the log-rank test, p = 0.0038 for d) and p = 0.0022 for e)). All box plots indicate the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).

Extended Data Fig. 8 Summary, quality control and features of single-cell WGS of tumours.

a) UMAP of meta-cohort signature probabilities. Lines connect DLP-pseudobulk to their bulk data counterpart. b) Correlation of proportion of the genome that is LOH between DLP-pseudobulk (horizontal) and matched bulk WGS (vertical). Correlation coefficient (R) and p-value (p) derived from a linear regression in inset, shaded area shows the 95% CI of the linear regression. c) VAF distributions (horizontal) for somatic mutations called in single cells as a function of haplotype-specific state (vertical), coded as integer copy level allele A | integer copy level allele B. Data from all DLP samples are included. d) Heat map showing total copy number (left) and HSCN (right) of single cells from a TNBC HRD-Dup case (SA501). e) Chromosomal gains and losses across different ploidy states and mutational signature grouping. Total counts (black), gains (red), and losses (blue) shown. f) Relationship between gain/loss ratios and number of gained or lost segments for representative datasets from each signature type (left) and all HRD-Dup, TD, or FBI cases (right). g) Differences in copy number segmental gain and loss counts (n = 12 FBI, n = 8 HRD-Dup, n = 3 TD), comparing ploidy-relative case-level consensus copy number profiles (green) and mean cell-level changes relative to clone copy number profiles (purple). h) HSCN distance distributions for all PDX samples. Distribution is over n = 1,000 sampled pairwise HSCN distances. Horizontal black line shows the mean value of the distribution. i) HSCN distance distributions as a function of signature type, each dataset is summarized as the mean of the distributions on the left. P-values indicate per group comparisons using the two-sided Wilcoxon test (n = 12 FBI, n = 8 HRD-Dup, n = 3 TD). j) Number of parallel copy number segments (n, size of circle) and the proportion of segments containing parallel events (f, colour of circle) across all datasets as a function of clonality. Clonal: CCF > 80%, Subclonal: 20% < CCF ≤ 80%, Rare: 1% < CCF ≤ 20%. k) Proportion of segments with parallel CNA in HRD-Dup vs FBI, * = p < 0.05, ns = p > 0.05, two-sided Wilcoxon test (n = 12 FBI, n = 8 HRD-Dup). Exact p-values from left to right, p = 0.85, p = 0.031, p = 0.1. All box plots represent the median, 1st and 3rd quartiles (hinges), and the most extreme data points no farther than 1.5× the IQR from the hinge (whiskers).
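The clonality bins quoted in panel j) amount to a simple threshold rule on the cancer cell fraction (CCF). The sketch below is only an illustration of that rule, not the authors' code: the function name `classify_clonality` is hypothetical, CCF is assumed to be given as a fraction between 0 and 1, and returning `None` for CCF ≤ 1% is an assumption, since the caption names no category for that range.

```python
from typing import Optional


def classify_clonality(ccf: float) -> Optional[str]:
    """Bin a cancer cell fraction (0..1) using the thresholds in the caption.

    Clonal:    CCF > 80%
    Subclonal: 20% < CCF <= 80%
    Rare:      1% < CCF <= 20%
    Values at or below 1% get None (assumption: the caption gives no label).
    """
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must be a fraction between 0 and 1")
    if ccf > 0.80:
        return "Clonal"
    if ccf > 0.20:
        return "Subclonal"
    if ccf > 0.01:
        return "Rare"
    return None
```

Note that the upper bound of each bin is inclusive, so a CCF of exactly 80% is Subclonal and a CCF of exactly 20% is Rare.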

Extended Data Fig. 9 Genomic features of HLAMPs and long-read sequencing validation.

a) Each column is a HLAMP that amplifies an oncogene. Each row is a feature extracted from a region 15 Mb either side of the amplification. Complexity = entropy of haplotype-specific states. #SV = total number of structural variants identified, followed by the proportion of SVs of each type: fold-back inversions, duplications, deletions and translocations. #chr = number of chromosomes involved in translocation. bin/chr ratio = ratio of the copy number of the bin containing the oncogene to the average copy number across the chromosome. Ratio is the copy number ratio between the clone with the maximum copy number state and the clone with the minimum copy number state. b–c) HLAMPs involving multiple chromosomes. Left plot shows copy number profiles from pseudobulk clones derived from DLP; lines indicate rearrangement breakpoints. Right plot shows example long reads from Oxford Nanopore Technologies that support inter-chromosomal translocations. Example reads and their mapping to chromosomes of interest (top right), long-read coverage of genomic region and alignment of all supporting reads (bottom right). b) SA1184 MYC amplification. c) SA1181 chr5q amplification. d) Long-read support for inter-chromosomal alterations involving chromosomes 3 and 6 in SA1096; DLP clone-level plots shown in Fig. 4j.
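The "Complexity" feature in panel a) is described only as the entropy of haplotype-specific states. As a rough illustration, the sketch below computes a Shannon entropy over a list of state labels. Several things here are assumptions rather than details from the caption: the use of Shannon entropy with a base-2 logarithm, the encoding of states as strings such as "2|1" (copies of allele A | copies of allele B), and the function name `state_entropy`.

```python
from collections import Counter
from math import log2
from typing import Iterable


def state_entropy(states: Iterable[str]) -> float:
    """Shannon entropy (bits) of the observed haplotype-specific states.

    `states` holds one label per genomic bin in the region of interest,
    e.g. "2|1". Base-2 entropy is an assumption; the caption does not
    specify the exact formula.
    """
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over the distinct states observed.
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

Under this definition a region in a single state scores 0, and the score grows as bins are spread more evenly over more distinct states.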

Supplementary information

Supplementary Information

SIGNALS validation and comparison and additional methods.

Reporting Summary

Peer Review File

Supplementary Figure 1

Full gel scans from Extended Data Fig. 2.

Supplementary Table 1

List of induced mutations and CRISPR guides used in 184-hTERT cells. TP53, BRCA1 or BRCA2 mutations in tab 1 were induced by transduction with a CRISPR/Cas9 nuclease and one of the guides listed in tab 2.

Supplementary Table 2

Per-cell level DLP+ sequencing statistics for 184-hTERT cell lines, HGSC, and TNBC tumours.

Supplementary Table 3

Dataset level DLP+ sequencing statistics, UMAP & HDBSCAN parameters for 184-hTERT cell lines, HGSC, and TNBC tumours.

Supplementary Table 4

Bulk sequencing sample cancer type, signature type strata, gene mutation status, SNV and SV mutation counts.

Supplementary Table 5

Gene mutation enrichment in bulk sequencing meta-cohort mutation signature strata.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Funnell, T., O’Flanagan, C.H., Williams, M.J. et al. Single-cell genomic variation induced by mutational processes in cancer. Nature 612, 106–115 (2022). https://doi.org/10.1038/s41586-022-05249-0

Received: 03 June 2021
Accepted: 17 August 2022
Published: 26 October 2022
Issue Date: 01 December 2022
DOI: https://doi.org/10.1038/s41586-022-05249-0
