Functional analysis of structural variants in single cells using Strand-seq

Jeong, Hyobin; Grimes, Karen; Rauwolf, Kerstin K.; Bruch, Peter-Martin; Rausch, Tobias; Hasenfeld, Patrick; Benito, Eva; Roider, Tobias; Sabarinathan, Radhakrishnan; Porubsky, David; Herbst, Sophie A.; Erarslan-Uysal, Büşra; Jann, Johann-Christoph; Marschall, Tobias; Nowak, Daniel; Bourquin, Jean-Pierre; Kulozik, Andreas E.; Dietrich, Sascha; Bornhauser, Beat; Sanders, Ashley D.; Korbel, Jan O.

doi:10.1038/s41587-022-01551-4


Article
Open access
Published: 24 November 2022


Nature Biotechnology volume 41, pages 832–844 (2023)


Abstract

Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation, enrichment of a primitive cell state and informed successful targeting of the subclone in cell culture, using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.


Main

The mutational landscapes of numerous cancers were recently cataloged^1,2, revealing that somatic SVs represent around 55% of driver mutations^2,3. Somatic mutational processes generate a broad spectrum of SVs from simple (for example, deletions and inversions) to complex classes (for example, chromothripsis)^4,5,6,7,8, and these SVs are important drivers of malignancy, metastasis and relapse^9,10,11,12. However, with the exception of focal deletions and amplifications, somatic SVs have proven difficult to characterize functionally in cancer genomic surveys^1,2,3,13. Studies integrating transcriptome and whole genome sequencing (WGS) data have inferred SV functional outcomes^13,14,15,16, but these typically require large cohorts and do not account for intratumor heterogeneity (ITH)³. Instead, SV effects can be measured directly by reading both genotype and molecular phenotype in the same cell, using single-cell multiomics^17,18,19,20,21. Several such methods have been developed^17,18,19,20, but these do not presently account for small (<10 Mb) somatic copy number alterations (SCNAs), balanced SVs and complex rearrangement events like chromothripsis^4,5,7,22, which has limited efforts to functionally characterize the most common class of driver mutations in cancer.

To address this, we developed scNOVA (single-cell nucleosome occupancy and genetic variation analysis), a method enabling functional characterization of the full spectrum of somatic SV classes. scNOVA uses Strand-seq²³ in two ways: (1) it uses the DNA fragmentation pattern resulting from micrococcal nuclease (MNase) digestion²³ to directly measure nucleosome occupancy (NO) and indirectly infer patterns of gene activity, and (2) it couples this ‘molecular phenotype’ with SVs discovered by single-cell tri-channel processing (scTRIP, which jointly models read-orientation, read depth and haplotype-phase²⁴) in the same cell. MNase digests the linker DNA between nucleosomes, leaving nucleosome-protected DNA intact, to enable genome-wide inference of NO by measuring sequence read counts^25,26,27,28. Previous work has shown that active enhancers and transcribed genes exhibit reduced NO^25,26,27,28,29,30. However, the relationships between NO and SV landscapes in cancer remain unexplored. scNOVA addresses this by integrating SVs and NO along the genome of a cell, to functionally characterize SVs in heterogeneous samples.

Results

NO classifies cell types and predicts gene activity changes

Strand-seq data reveals NO

We hypothesized that NO patterns derived from MNase fragmentation during Strand-seq library preparation could represent a readout to allow functional characterization of SVs (Fig. 1a and Extended Data Fig. 1). To test this, we evaluated whether Strand-seq data revealed nucleosome positioning through comparison with bulk MNase-seq data. We used the NA12878 lymphoblastoid cell line (LCL), which has both datatypes available, and pooled 95 Strand-seq libraries (sequenced to a median of 540,379 mapped nonduplicate reads per single cell³¹; Supplementary Table 1), into a ‘pseudobulk’ track, allowing direct comparison with the corresponding MNase-seq dataset (sequenced to 19-fold genomic coverage³²). We measured NO along the genome (Methods) and found Strand-seq and MNase-seq were highly concordant in terms of uniformity of coverage and inferred nucleosome positions at DNase I hypersensitive sites (Spearman’s r = 0.68) (Fig. 1b,c). Nucleosome positioning near the binding site of CTCF^26,28 (a key chromatin organizer) closely matched between both assays (Fig. 1d and Supplementary Fig. 1), and estimated nucleosome repeat lengths²⁸ were highly concordant (Supplementary Fig. 1). In addition, both assays measured NO in all 15 chromatin states identified by the Roadmap Epigenome Consortium³³. Among these chromatin states, Strand-seq and MNase-seq revealed the highest NO signals on average for the polycomb-repressed state and the bivalent enhancer state, whereas the lowest average NO signals were consistently seen for the active transcription start site (TSS) state (Extended Data Fig. 2). This indicates that Strand-seq enables direct measurement of NO to reveal a ‘molecular readout’. We thus developed the scNOVA framework, which harnesses Strand-seq to measure NO genome-wide and couples this with SVs discovered in the same sequenced cell (Fig. 1a).
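The pseudobulk comparison described above can be illustrated with a minimal sketch. It assumes that each single-cell library has already been summarized as read counts over a shared set of genomic bins; the function names and the toy counts are hypothetical and are not taken from the scNOVA code base. Per-cell counts are summed into one pseudobulk track, and Spearman's r is then computed against a bulk track as the Pearson correlation of the rank vectors.

```python
from typing import List, Sequence


def pool_pseudobulk(cells: Sequence[Sequence[int]]) -> List[int]:
    """Sum per-bin read counts across single-cell libraries."""
    n_bins = len(cells[0])
    if any(len(c) != n_bins for c in cells):
        raise ValueError("all libraries must use the same bins")
    return [sum(c[i] for c in cells) for i in range(n_bins)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's r: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / (vx * vy) ** 0.5


# Toy data (invented): three single-cell libraries over four bins,
# and a bulk MNase-seq-like track over the same bins.
cells = [[3, 0, 5, 1], [2, 1, 4, 0], [4, 0, 6, 2]]
bulk = [90, 20, 160, 25]
pooled = pool_pseudobulk(cells)
r = spearman(pooled, bulk)
```

Rank-based correlation is used here because it is insensitive to the large difference in sequencing depth between a pooled single-cell track and a bulk dataset.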

**Fig. 1: Haplotype-aware single-cell multiomics to functionally characterize SVs.**

As Strand-seq resolves its measurements by haplotype³¹, we considered that haplotype-specific differences in NO (haplotype-specific NO) resulting from random monoallelic expression, germline SNPs and local effects of SVs could be harnessed for scNOVA. To assess the utility of haplotype-resolved NO, we phased 24,652,658 of 49,205,197 (50.1%) of the NA12878 Strand-seq read fragments, and pooled these reads to generate pseudobulk NO tracks for each chromosomal haplotype (denoted ‘H1’ and ‘H2’, respectively; Fig. 1b). Using the female-derived NA12878 cell line, we compared haplotype-specific NO to haplotype-resolved gene expression measurements from bulk RNA-seq data³⁴ (Methods). We identified a significant increase of NO in gene bodies mapping to H1 compared with H2 across the X chromosome (adjusted P = 0.0012; Wilcoxon rank sum test), suggesting that H1 represents the inactive X chromosome. These data were consistent with haplotype-resolved gene expression measurements at loci subject to X-inactivation³⁵, whereas genes escaping X-inactivation did not exhibit haplotype-specific NO (Fig. 1e,f and Supplementary Fig. 3). We also investigated whether Strand-seq data is informative of haplotype-specific NO at cis-regulatory elements (CREs), and identified a 1.4-fold enrichment for allele-specific CRE binding on the X chromosome (P = 0.015; hypergeometric test; based on 718 CREs with haplotype-specific NO genome-wide; 10% false discovery rate (FDR)) (Supplementary Fig. 2). Moreover, CREs with haplotype-specific NO were significantly over-represented near genes showing allele-specific expression in the genome (P < 0.0018, hypergeometric test; Supplementary Fig. 2). These data suggest that haplotype-specific NO, a signal directly obtained from Strand-seq datasets, reflects biological gene regulation patterns in the genome.

Cell-typing

Since NO within gene bodies reflects gene activity in MNase-seq data²⁸, we hypothesized that Strand-seq based NO patterns could be used to infer gene expression. To investigate this, we tested whether NO globally reflects cellular gene expression patterns in the retinal pigment epithelium-1 (RPE-1) cell line, for which we previously generated both Strand-seq and RNA-seq data²⁴. To profile NO globally, we pooled 33 million read fragments (including phased and nonphased reads) from 79 Strand-seq libraries into pseudobulk NO tracks. We identified an inverse correlation between NO at gene bodies and gene expression (P < 2.2 × 10^–16; Spearman’s r of up to −0.24; Fig. 1g and Supplementary Fig. 4), where highly expressed genes showed significantly lower NO within their gene bodies (and vice versa). We next explored the utility of NO for cell-type inference (‘cell-typing’), based on the activity of lineage-specific genes, by implementing a multivariate dimensionality reduction framework. We performed in silico mixing of Strand-seq libraries from different LCLs and RPE cell lines, and built a classifier that separates distinct cell types by partial least squares discriminant analysis (PLS-DA). We used a training set of 179 mixed libraries, and initially considered 19,629 features, which reflect ENSEMBL³⁶ genes with sufficient read coverage (Methods). After feature selection, 1,738 features were retained. We then used a nonoverlapping set of 123 cells to assess performance, all of which scNOVA classified accurately (area under the curve (AUC) = 1; Extended Data Fig. 3). Our framework also discriminated between cells from three related RPE cell lines derived from the same donor, which exhibit distinct SV landscapes^24,37 (AUC = 0.96; Fig. 1h) indicating that scNOVA enables accurate cell-typing.

Gene activity changes between cell populations

Having established that scNOVA can use the expression of lineage-specific genes for cell-typing, we evaluated if it could predict gene expression differences between defined cell populations, such as subclones bearing distinct SVs. We devised a module that integrates deep convolutional neural networks and negative binomial generalized linear models (Supplementary Figs. 5 and 6), to measure differential gene activity between two defined cell populations. To benchmark this module, we mixed Strand-seq libraries from different cell lines in silico, creating ‘pseudoclones’, and evaluated the predicted changes in gene activity between defined pseudoclones (each composed of cells from one cell line) by analyzing NO at gene bodies (Supplementary Fig. 7 and Extended Data Fig. 4). We first compared RPE-1 to the HG01573 LCL line, and defined the ground truth of expression using RNA-seq. We found that the differential gene activity score of scNOVA (Methods) was highly predictive of the ten most differentially expressed genes, where analyses of pseudoclones comprising 156 RPE-1 and 46 HG01573 libraries revealed an AUC of 0.93 (we observed a similar performance when analyzing the 50 most differentially expressed genes; Fig. 1i). Gene activity changes inferred included well-known markers of epithelial (for example, EGFR, VCAN) and lymphoid (for example, CD74, CD100) cell types (Supplementary Table 2). The scNOVA predictions were informative also when we simulated minor subclones present with clonal frequency (CF) = 20%, CF = 5% and CF = 1.3%, resulting in AUCs of 0.92, 0.79 and 0.68, respectively (Extended Data Fig. 4). We obtained similar results when applying scNOVA to pseudoclones derived from different (genetically related) RPE cell lines (Supplementary Fig. 7). These benchmarking exercises suggest that scNOVA can accurately infer gene activity changes between defined cell populations, suggesting that this framework can be used to functionally characterize subclonal SVs.
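The AUC values used in this benchmark can be understood through a small sketch. It assumes that every gene has a differential activity score and a binary ground-truth label from RNA-seq (differentially expressed or not); the scores and labels below are invented for illustration. The AUC is computed as the probability that a randomly chosen true positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six genes; label 1 marks genes that are
# differentially expressed according to the RNA-seq ground truth.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of genes, which is acceptable for a top-ten or top-fifty gene benchmark; a rank-based formulation would be preferable for genome-wide evaluations.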

Functional outcomes of SVs in cell lines

To test this, we set out to investigate the functional outcomes of somatic SV landscapes in a panel of LCL samples³⁸ (N = 25) from the 1000 Genomes Project³⁹ (1KGP). Single-cell SV discovery in 1,372 Strand-seq libraries generated for this panel (Supplementary Table 1) discovered 205 somatic SVs, with 24 of 25 (96%) LCLs showing at least one SV subclone, a sevenfold increase compared to a previous report⁴⁰ (Supplementary Table 3 and Supplementary Data). Of all the cell lines, 13 (52%) contained an SV subclone above 10% CF. This included the widely used NA12878 cell line^34,39, in which we discovered a subclonal 500 kb deletion at 19q13.12 (CF = 21%) that was mutually exclusive with two 22q11.2 deletions seen at CFs of 21% and 57%, respectively (Supplementary Figs. 9 and 10). The 22q11.2 SVs mapped to the well-known site of IGL recombination occurring during normal B cell development⁴¹. We hence focused on the 19q13.12 event, which resulted in the loss of a copy of ZNF382, a tumor suppressor and repressor of c-Myc⁴². Application of scNOVA measured significantly increased activity of ERCC6, a target gene of the c-Myc/Max transcription factor (TF) dimer⁴³, and decreased activity of PIEZO2 and TRAPPC9, in cells harboring this deletion (10% FDR; Supplementary Table 2).

To validate these findings, we reanalyzed Fluidigm and Smart-seq single-cell RNA-seq (scRNA-seq) datasets generated for NA12878 (refs. ^44,45). We employed several established tools for SCNAs discovery from scRNA-seq data^46,47,48 (Supplementary Table 4), all of which failed to discover any of the SV subclones seen in this cell line (Supplementary Table 4). Yet, upon directly inputting the respective SV breakpoint coordinates into the CONICSmat tool⁴⁶, we succeeded in identifying the 19q13.12 deletion (denoted ‘19q-Del’) through ‘targeted SCNA recalling’. We next pursued differential gene expression analyses by scRNA-seq, comparing 19q-Del cells to unaffected (‘19q-Ref’) cells, and verified overexpression of ERCC6 in 19q-Del cells (10% FDR; Supplementary Fig. 10). For PIEZO2 and TRAPPC9, the scRNA-seq-based expression trends were consistent with scNOVA (Supplementary Fig. 10), but did not reach the FDR threshold. A search for the over-represented TF targets amongst the differentially active genes identified c-Myc and Max as the most over-represented TFs in 19q-Del cells (10% FDR; Supplementary Fig. 10). These results indicate that scNOVA can functionally characterize SVs inaccessible to scRNA-seq-based SCNA discovery.

We next focused on NA20509, the LCL with the most abundant SV subclone (85% CF). Somatic SVs in NA20509 arose primarily through the breakage-fusion-bridge cycle (BFB) process^24,49 involving a 49 Mb terminal duplication on 5q, and a 2.5 Mb inverted duplication on 17p with an adjacent terminal deletion (terDel) (Fig. 2a). The 5q and 17p segments became fused into a derivative chromosome of around 115 Mb (Supplementary Fig. 13), which probably stabilized the BFB. We searched for global gene activity changes in this ‘17p-BFB’ subclone compared with the nonrearranged cells (‘17p-Ref’) and identified 18 dysregulated genes (Fig. 2b). Testing for gene set over-representation⁵⁰ (Methods) revealed an enrichment of the target genes of c-Myc/Max heterodimers (10% FDR; Fig. 2c), that is, the same TFs we observed in the 19q-Del subclone in NA12878. Consistent with this, we identified somatic copy-number gain of MAP2K3, which encodes a gene activating c-Myc/Max⁵¹, resulting from the BFB (Fig. 2a).
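Gene set over-representation of TF targets among a short list of dysregulated genes is commonly assessed with a hypergeometric tail probability. The sketch below assumes that model; the exact procedure used in the study is specified in its Methods, and all counts here are hypothetical toy values. It asks how likely it is to see at least the observed number of TF targets when the dysregulated genes are drawn at random from the tested background.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_targets: int,
                           n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background: genes tested in total
    n_targets:    background genes annotated as targets of the TF
    n_selected:   dysregulated genes (the list under test)
    n_overlap:    dysregulated genes that are also TF targets
    """
    total = comb(n_background, n_selected)
    upper = min(n_targets, n_selected)
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, 5 of them TF targets,
# 4 dysregulated genes of which 3 are TF targets.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
```

In a real analysis one such P value would be computed per TF and then corrected for multiple testing, for example at the 10% FDR level used throughout the text.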

**Fig. 2: Linking subclonal SVs to their functional consequences in LCLs.**

We performed several orthogonal analyses to validate these findings. First, we verified all somatic SVs using deep WGS data generated for the 1KGP sample panel⁵² (Supplementary Fig. 13). Second, we analyzed RNA-seq data³⁸ for this LCL panel, which revealed that NA20509 exhibits the highest MAP2K3 expression and the highest c-Myc/Max target expression (Supplementary Fig. 14 and Fig. 2d). Third, we followed the 17p-BFB subclone in culture, by subjecting early (p4) and late passage (p8) cells to Strand-seq, which revealed outgrowth of the 17p-BFB subclone (CF = 23% at p4, CF = 100% at p8; P < 0.00001, Fisher’s exact test; Fig. 2e), suggesting these cells have a proliferative advantage. Quantitative real-time PCR experiments verified this clonal outgrowth pattern (Fig. 2f).

Since the functional impact of SVs on clonal expansion is unexplored in LCLs, we more deeply characterized the molecular phenotypes of 17p-BFB cells by pursuing RNA-seq in p4 and p8 cultures. We observed increased MAP2K3 expression (1.39-fold, 10% FDR) at p8, consistent with MAP2K3 dysregulation as a result of copy-number gain in the 17p-BFB subclone (Fig. 2g and Supplementary Note). Pathway-level analysis showed deregulation of c-Myc/Max target genes following clonal expansion (P = 0.036; Wilcoxon rank sum test; Fig. 2h and Supplementary Fig. 14). Collectively, these data link the outgrowth of SV subclones to the deregulation of c-Myc/Max targets, which could represent a common driver of clonal expansion in LCLs.

Local effects of copy-balanced driver SVs in leukemia

To deconvolute the effects of driver SVs in patients, we applied scNOVA to analyze the local consequences of balanced SVs, which are widespread in leukemia^3,53. We analyzed primary cells from a patient with acute myeloid leukemia (AML) (32-year-old male; patient-ID = AML_1) bearing a balanced t(8;21) translocation that results in RUNX1-RUNX1T1 gene fusion⁵⁴. We sorted CD34⁺ cells from AML_1 (Supplementary Fig. 15), and sequenced 42 Strand-seq libraries. SV discovery revealed a 46,XY,t(8;21)(q22;q22) karyotype (Fig. 3a, Supplementary Fig. 16 and Supplementary Table 3) consistent with clinical diagnosis. We fine-mapped the translocation breakpoint to intron 1 of RUNX1T1 and intron 5 of RUNX1 (Supplementary Fig. 17), and subsequently identified haplotype-specific NO at 11 genes, genome-wide (10% FDR; Supplementary Table 2). This included RUNX1T1, which showed reduced NO on the derivative (H2) haplotype (Fig. 3b), consistent with increased gene activity mediated as a local effect of the translocation⁵⁵. The remaining genes did not reside near a detected somatic SV, suggesting other factors (such as germline SNPs; Supplementary Fig. 17) may have affected their NO.

**Fig. 3: Haplotype-specific NO analysis shows local effects of a copy-neutral driver SV in AML.**

To systematically investigate potential local effects, we used a sliding window (Methods) to measure NO on both sides of the translocation breakpoint. We observed decreased NO, suggesting increased chromatin accessibility, from the breakpoint junction up to the respective nearest topologically associating domain (TAD) boundaries (Fig. 3c). This signal was most pronounced in an enhancer-rich region around 0.8 to 1.1 Mb upstream of RUNX1 originating from chromosome 21 (P < 0.003; likelihood ratio test, adjusted using permutations; Fig. 3c), found to physically interact with the RUNX1 promoter in CD34⁺ cells⁵⁶. Within this segment, we identified two CREs with significantly reduced NO (10% FDR; Exact test) (Fig. 3d and Supplementary Table 5), which may foster RUNX1-RUNX1T1 expression. Chromosome-wide analysis showed haplotype-specific NO patterns were restricted to the fused TAD (Fig. 3e,f), in line with these patterns resulting from the translocation.
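The sliding-window idea can be sketched as follows. This is only an illustration of scanning haplotype-resolved NO along a chromosome; it does not reproduce the likelihood ratio test with permutation-based adjustment reported above. The bin layout, window size, step and pseudocount are assumptions, and the counts are invented. For each window, the sketch reports the log2 ratio of read counts on the rearranged haplotype (H2) over the unaffected haplotype (H1), so that negative values indicate reduced NO on the derivative chromosome.

```python
from math import log2
from typing import List, Sequence, Tuple


def sliding_log2_ratio(h1: Sequence[float], h2: Sequence[float],
                       window: int, step: int = 1,
                       pseudocount: float = 0.5) -> List[Tuple[int, float]]:
    """Return (start_bin, log2(H2 / H1)) for each window of bins.

    The pseudocount guards against division by zero in empty windows.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotype tracks must have the same length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    out = []
    for start in range(0, len(h1) - window + 1, step):
        a = sum(h1[start:start + window]) + pseudocount
        b = sum(h2[start:start + window]) + pseudocount
        out.append((start, log2(b / a)))
    return out


# Invented per-bin read counts; bins 2 and 3 mimic a region with
# reduced NO on the rearranged haplotype (H2).
h1_counts = [10, 10, 10, 10, 10, 10]
h2_counts = [10, 10, 5, 5, 10, 10]
profile = sliding_log2_ratio(h1_counts, h2_counts, window=2, step=2,
                             pseudocount=0.0)
```

Windows bounded by TAD boundaries, as in the analysis above, could be handled by running the scan separately on each side of the breakpoint and truncating the tracks at the nearest boundary.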

We also revisited Strand-seq datasets with previously reported copy-neutral SVs, including the BM510 cell line in which copy-neutral interchromosomal SVs resulted in TP53–NTRK3 gene fusion²⁴. In agreement with the oncogenic role of TP53–NTRK3 (ref. ²⁴), scNOVA identified NTRK3 upregulation as the only significant local effect (10% FDR), consistent with allele-specific TP53–NTRK3 expression measured on the rearranged haplotype (Extended Data Fig. 5). Second, we revisited a 2.6 Mb inversion mapping to 14q32 in a T-cell acute lymphoblastic leukemia (T-ALL) patient-derived xenograft (T-ALL_P1)²⁴. scNOVA discovered downregulation of BCL11B, a known haploinsufficient T-ALL tumor suppressor⁵⁷, as a significant local effect of this balanced inversion, supporting allele-specific silencing of BCL11B on the rearranged haplotype as measured by RNA-seq²⁴ (Extended Data Fig. 6). These data collectively show that scNOVA allows linking balanced SVs to their local functional consequences—a functionality not provided by any previous single-cell multiomic method²⁰.

Dissecting functional effects of heterogeneous somatic SVs

We next set out to functionally dissect a leukemia sample with unknown genetic drivers, by characterizing B-cells from a 61-year-old patient with chronic lymphocytic leukemia (CLL) (CLL_24)⁵⁸. Analysis of 86 Strand-seq libraries revealed an unprecedented level of somatic SVs, with 11 different karyotypes represented by 13 SVs occurring in subclones with CFs of 1–5% (Supplementary Table 3). This vastly exceeds intrapatient diversity estimates for CLLs from the Pan-Cancer Analysis of Whole Genomes (PCAWG), where maximally three subclones were reported⁵⁹, highlighting how Strand-seq provides access to SVs escaping discovery by WGS^3,24. Chromosome 10q showed especially pronounced subclonal heterogeneity; we identified seven partially overlapping deletions ranging from 2 to 31 Mb in size, and residing proximal to the fragile site FRA10B⁶⁰ (Fig. 4a and Supplementary Fig. 18). These SVs clustered into a 1.4 Mb ‘minimal segment’ at 10q24.32, arising independently from both haplotypes (Fig. 4b). While previous studies reported somatic 10q24.32 deletions in 1–4% of CLLs^61,62,63, molecular analysis of this recurrent somatic SV has so far been lacking.

**Fig. 4: Deconvoluting consequences of subclonal SV heterogeneity in a CLL primary sample.**

We first compared all cells bearing a 10q24.32 deletion (‘10q-Del’, N = 11) to cells lacking such SV (‘10q-Ref’, N = 75), hence disregarding the fine-scale subclonal structure of CLL_24, and predicted 115 dysregulated genes (Fig. 4c and Supplementary Table 2). Next, we performed molecular phenotype analysis using MsigDB⁶⁴ (Methods), which revealed that 10q-Del cells exhibit increased activity in several leukemia-relevant signaling pathways, including Wnt, c-Met (a pathway promoted by Wnt signaling⁶⁵), B cell receptor (BCR) signaling, phosphatidylinositol (3,4,5)-trisphosphate (PIP3) signaling and the CREB pathway (10% FDR; Fig. 4d). RNA-seq data available for 178 CLLs⁶² and stratified by 10q24.32 status, revealed upregulation of Wnt and c-Met signaling—but not of BCR, PIP3 and CREB signaling—in CLLs exhibiting 10q24.32 deletions (10% FDR; CLLs with 10q-Del: N = 4; 10q-Ref: N = 174; Fig. 4e and Supplementary Fig. 24). These data therefore suggest a link between 10q24.32 deletion and the promotion of Wnt signaling.

We further tested whether the different 10q-Del events seen in CLL_24 subclones have led to distinct functional outcomes, focusing on three subclones represented by at least two cells: ‘SCa,’ showing one interstitial deletion directly at the minimal segment; ‘SCb,’ harboring a terDel, with the breakpoint located at the minimal segment boundary and ‘SCc,’ containing two interstitial deletions, at the minimal segment and at 10q23.31 (Fig. 4b and Supplementary Table 3). Molecular phenotype analysis of each subclone identified 109, 206 and 266 differentially active genes, respectively (Supplementary Table 2), with the most pronounced levels of Wnt upregulation in SCb and SCc (Fig. 4f). SCb showed the highest activation of c-Met, BCR and PIP3 signaling, whereas CREB signaling was highest in SCc (Supplementary Fig. 21). This suggests that deletion location and length at 10q24.32 affect their molecular consequences, and furthermore illustrates the ability of scNOVA to predict molecular differences in subclones represented by as few as two cells.

To more deeply characterize the CLL_24 subclones, we generated CITE-seq (cellular indexing of transcriptomes and epitopes by single-cell sequencing) data, which couples scRNA-seq with protein surface marker measurements⁶⁶. Again, we attempted SCNA discovery in the scRNA-seq data, which failed to detect any SCNAs, or subclones, in CLL_24 (Supplementary Table 4). However, targeted SCNA recalling⁴⁶ identified 82 CITE-seq cells harboring the greater than 31 Mb 10q-terDel of SCb (‘10q-terDel’), whereas the deletions in SCa (2.2 Mb) and SCc (2.1 Mb and 1.9 Mb, respectively) escaped detection (Extended Data Fig. 7 and Supplementary Notes). Having recovered the SCb subclone in the CITE-seq data, we performed single-cell gene set enrichment analysis⁶⁷ (Methods), which verified that all pathways inferred by scNOVA (Wnt, c-Met, BCR, PIP3 and CREB) are upregulated in 10q-terDel cells (Fig. 4d,g). A gene regulatory network analysis⁶⁸ comparing 10q-terDel with 10q-Ref cells identified 43 differentially active TFs (FDR 10%; Fig. 4h) and a functional enrichment analysis⁶⁹ showed over-representation of Wnt signaling, BCR signaling and the PD-1 checkpoint pathway (Supplementary Table 16 and Fig. 4h); the PD-1 checkpoint pathway has been linked to immune resistance and transformation of CLL to aggressive lymphoma^70,71. Since somatic lesions mediating PD-1 expression in CLL have remained elusive, we used the CITE-seq data to analyze PD-1 protein expression, which demonstrated upregulation of PD-1 in 10q-terDel-containing cells as the only significant hit at the protein level (Fig. 4i). Notably, NFATC1, a TF predicted to be differentially active by both scNOVA and CITE-seq, regulates Wnt⁷², PIP3 (refs. ^73,74), CREB⁷⁵ and BCR signaling⁷⁶ as well as PD-1 expression⁷⁷, and thus may contribute to global pathway dysregulation in CLL_24. Our analysis reveals subtle pathway activities of somatic deletions present at low CF (Fig. 4f,j), and collectively implicates 10q24.32 deletions in dysregulated Wnt signaling, a crucial pathway for CLL pathogenesis⁷⁸.

Functional characterization of subclonal chromothripsis

While chromothripsis is a widespread mutational process in cancer^3,4,22, this process is not ascertained by previous single-cell multiomic methods, and its molecular outcomes remain largely elusive^3,79. We previously discovered a subclonal chromothripsis event²⁴ in T-ALL_P1 that affects most of 6q (denoted ‘6q-CT’; CF = 30%) (Fig. 5a and Supplementary Table 3); however, the consequences of this complex rearrangement were uncharacterized. Using scNOVA, we identified 12 genes with differential NO between 6q-CT and 6q-Ref cells (denoted the ‘CT gene signature’; 10% FDR; Fig. 5a,b and Supplementary Table 2). A closer analysis showed 27 TF genes overlapping the chromothriptic region (Fig. 5a). Gene set over-representation testing using the target genes of these TFs revealed that c-Myb, product of the MYB oncogene, was significantly enriched among the genes included in the CT gene signature (10% FDR; adjusted P = 0.00015; Fig. 5b,c and Supplementary Table 6). The MYB gene is located within a region that was duplicated (and inverted) as a result of 6q-CT, suggesting a potential dosage effect (Fig. 5a). Corroborating these predictions, we performed RNA-seq in a panel of 13 T-ALLs, amongst which T-ALL_P1 showed the highest expression of c-Myb targets (Fig. 5d and Supplementary Table 7). We also verified that MYB is allele-specifically expressed from the SV-affected haplotype (P = 0.0317; likelihood ratio test; Supplementary Fig. 30), which together nominates MYB as a candidate driver gene dysregulated as a consequence of 6q-CT.

**Fig. 5: scNOVA identifies functional effects of a subclonal chromothripsis event.**

To more deeply characterize this sample, we generated scRNA-seq data for T-ALL_P1 (5,504 cells; Fig. 6a). Since scRNA-seq-based SCNAs discovery^46,47,48 missed the 6q-CT event (Supplementary Table 4), we again performed targeted SCNA recalling (Supplementary Notes) generating confident calls for 838 (around 15%) cells in the scRNA-seq dataset (the remaining 4,666 cells lacked a confident assignment; ‘NA’). Out of these 838 cells, 729 were predicted to harbor the 6q-CT event, and 109 were called 6q-Ref. Unsupervised clustering⁸⁰ of the scRNA-seq data stratified by 6q status (Methods) revealed that 6q-CT cells (as predicted through targeted recalling) were enriched in two expression clusters (clusters 3 and 7; P = 3.43 × 10^–5 and 1.15 × 10^–3; FDR-adjusted Fisher’s exact test; Fig. 6d and Supplementary Fig. 34), in line with a distinctive expression profile. To corroborate this, we applied UCell⁸¹ to assign cells into ‘6q-CT’ or ‘6q-Ref’ based on the CT gene signature, which confirmed enrichment of 6q-CT in clusters 3 and 7 (Fig. 6c,d; P = 3.39 × 10^–38 and P = 2.15 × 10^–4; FDR-adjusted Fisher’s exact test). Trajectory analysis⁸² showed the 6q-CT cells (as defined by UCell) were enriched for DNearly (double-negative early; P = 2.78 × 10^–13), DNQ (double-negative quiescent; P = 1.27 × 10^–5) and DPP (double-positive proliferating; P = 1.88 × 10^–7) T cells (FDR-corrected Fisher’s exact tests; Fig. 6b and Supplementary Fig. 35), and depleted of mature CD4⁺ T cells (P = 1.45 × 10^–11, Supplementary Fig. 35). This suggests a potential differentiation block at the progenitor stage as a result of 6q-CT and, more generally, that 6q-CT cells bear a distinctive molecular phenotype as a result of the chromothriptic rearrangements.

**Fig. 6: Targeting the chromothriptic subclone in cell culture.**

Having identified c-Myb pathway activation as a consequence of 6q-CT in T-ALL_P1, we hypothesized this molecular phenotype could guide drug targeting in cell culture. We selected NOTCH1 as a suitable candidate for targeting this subclone because this c-Myb target (1) was inferred by scNOVA to be highly upregulated in 6q-CT cells (Fig. 5b) and (2) is targetable by different compounds and strategies⁸³. We treated T-ALL_P1 cell cultures with the CB-103 pan-NOTCH small-molecule inhibitor (targeting the Notch1 intracellular domain (N1-ICD)^84,85) or a vehicle control for 8 h and 24 h (Methods). Using scRNA-seq (3,663 single cells) to analyze drug response patterns, we inferred 6q-CT and 6q-Ref cells at each timepoint by transferring the cell annotation labels from the untreated (reference) sample with Seurat⁸⁰ (Fig. 6c and Supplementary Fig. 37). After 24 h in culture, vehicle-treated T-ALL_P1 cells showed a 45% relative increase in the 6q-CT subclone compared to 8 h (CF of 17.1% to 24.6%; P = 0.0180; FDR-adjusted Fisher’s exact test), indicating that 6q-CT cells expanded clonally. By contrast, upon CB-103 treatment, the CF of the 6q-CT subclone was reduced at 24 h (to CF = 15.5%; P = 0.0064; Fig. 6e and Supplementary Fig. 38), indicating that 6q-CT cells were preferentially lost with N1-ICD inhibition. Additionally, we observed specific depletion of the REACTOME N1-ICD gene set only in 6q-CT cells after 24 h of CB-103 treatment, consistent with specific subclone targeting (P = 0.0096; FDR-adjusted Wilcoxon rank sum test; Fig. 6f and Supplementary Fig. 39). These results highlight the potential of scNOVA to functionally characterize highly complex classes of DNA rearrangement (that is, chromothripsis events), and to clinically target subclones bearing complex cancer driver SVs.

Discussion

The functional characterization of SVs is of critical importance for precision oncology^1,2,3. Our method characterizes a wide spectrum of SV classes²⁴, and couples these with NO analysis to link somatic SVs to local or global gene activity changes. Accounting for balanced SVs, scNOVA allows the investigation of copy-number stable (that is, euploid) malignancies previously inaccessible to single-cell multiomics^3,20 (Supplementary Table 12). Strand-seq derived SCNA calls were far better resolved compared to scRNA-seq based calls (Supplementary Table 4), suggesting a more limited utility of scRNA-seq data for discovering SCNA drivers in cancer, with the exception of malignancies displaying extremely high levels of chromosomal instability with particularly large-scale SCNAs^3,86.

We uncovered unprecedented karyotypic diversity in a CLL sample, comprising distinct deletions at 10q24.32, which we link to leukemia-related signaling pathways, particularly Wnt signaling. Read depth based profiling of SCNAs is prone to underreport such subclonal structural diversity³. Enrichment of cases bearing 10q24.32 deletions amongst relapsed/refractory and high-risk CLL⁸⁷ suggests a potential role of Wnt pathway dysregulation mediated through 10q24.32 in disease progression. Whether the FRA10B fragile site is involved in the formation of these deletions remains to be seen and requires larger cohorts. Interestingly, CLL_24 exhibits a SNP (rs118137427; 3.7% allele frequency in Europeans) within FRA10B associated with the acquisition of 10q-terDel in normal blood⁸⁸. Based on the PCAWG resource comprising 94 CLLs², rs118137427 is seen in 2 out of 4 (50%) CLLs with 10q24.32 deletions, but in only 6 of 90 (6.7%) CLLs with 10q-Ref (P = 0.035; Fisher’s exact test), suggesting a possible link between SNPs at FRA10B and ITH in leukemia that warrants future investigation.
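The association between rs118137427 and 10q24.32 deletions can be checked from the counts given above. The following is a minimal sketch of a two-sided Fisher's exact test using only the Python standard library; the function name is illustrative, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about how the reported P value was obtained.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell
        return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)

    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Rows: CLLs with 10q24.32 deletion (2 carriers, 2 non-carriers),
#       CLLs with 10q-Ref (6 carriers, 84 non-carriers)
p_value = fisher_exact_two_sided(2, 2, 6, 84)
print(round(p_value, 3))  # 0.035
```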

Our framework functionally characterizes complex rearrangements previously inaccessible to single-cell multiomics³. Complex somatic SVs are prevalent in cancer and linked with aggressive tumor phenotypes^2,3,22, underlining the significant potential of scNOVA for the comprehensive functional characterization of cancer cells. Since scNOVA does not require coupling distinct experimental modalities in each individual cell, it overcomes important methodological challenges²⁰, including data sparseness and higher costs from generating data for more than one modality^20,89. Additionally, the coverage achieved by Strand-seq enables the analysis of haplotype-specific NO along the entire genome (Supplementary Fig. 41), providing advantages over classical allele-specific analyses that are restricted to regionally phased SNPs¹⁵.

Nonetheless, important challenges remain, and the full spectrum of mutations arising in an individual cell is likely to remain inaccessible to a single method in the foreseeable future. Strand-seq does not capture SVs smaller than 200 kb, which more rarely act as cancer drivers². Additionally, while scNOVA infers differentially active genes, it does not span the same dynamic expression range as scRNA-seq (Supplementary Table 12). This suggests that pairing scNOVA with targeted SCNA recalling by scRNA-seq can provide added value by allowing variants outside the detection range of other methods to be characterized. Finally, Strand-seq requires dividing cells for BrdU labeling²³ (Fig. 1a), and is therefore not applicable to nondividing cells or fixed samples. However, it can be used for dividing cells in organoids, primary fresh frozen progenitor cells, cells in regenerating tissues and cancer samples amenable to culture. Our study used cell lines for benchmarking followed by proof-of-principle application in patient samples. Generalization of these analyses to larger cohorts will allow systematic investigation of the roles subclonal SVs play in leukaemogenesis.

We foresee a wide variety of potential future applications. Our framework offers potential for studies on the determinants and consequences of chromosomal instability in cancer, and may promote research into the interplay of genetic and nongenetic cancer determinants²⁰. It likewise could be used to advance surveys of precancerous lesions^3,90. Additionally, scNOVA may offer value in precision oncology by exposing subclonal driver alterations along with their targetable functional outcomes, to target cancer subclones in patients. Furthermore, SVs can accidentally arise in key model cell lines, as we demonstrate for widely used LCLs, and the features of scNOVA are ideally suited to functionally characterize unwanted heterogeneity in such samples. Unwanted somatic SVs also arise as a by-product of CRISPR-Cas9 genome editing, which generates micronuclei and chromosome bridges in human primary cells, structures that initiate the formation of chromothripsis⁹¹. scNOVA could promote the safety of therapeutically relevant genome editing in the future, by enabling the simultaneous detection and functional characterization of such potentially pathogenic editing outcomes.

In summary, scNOVA moves directly from SV landscapes to their functional consequences in heterogeneous cell populations. By making a broad spectrum of somatic SVs accessible for functional characterization genome-wide, this single-cell multiomic framework serves as a foundation for deciphering the impact of somatic rearrangement processes in cancer.

Methods

Strand-seq library preparation

NA20509 Strand-seq libraries were prepared as previously described⁹⁴. Strand-seq libraries of primary leukemia samples were generated as follows: peripheral blood mononuclear cells of a previously untreated female CLL patient (routine diagnostics: IGHV unmutated, no TP53 mutation, no detected alteration in 6q21, 8q24, 11q22.3, 12q13, 13q14 and 17p13) were isolated after obtaining informed consent. Cells were isolated and cultured using previously established protocols⁹⁵. CLL cells were cultured at 1 × 10⁶ cells ml^–1 in Roswell Park Memorial Institute (RPMI) medium (Gibco by Life Technologies), supplemented with 10% human serum (PAN BIOTECH), 1% Pen/Strep (Gibco by Life Technologies) and 1% Glutamine (Gibco by Life Technologies). Cells were stimulated with 1 µg ml^–1 Resiquimod (Enzo) and 50 ng ml^–1 IL-2 (Sigma). BrdU (40 µM; Sigma) was incorporated for 90 h and 120 h, respectively, to perform nontemplate strand labeling. Single nuclei from each timepoint were sorted into 96-well plates using a BD FACSMelody cell sorter, followed by Strand-seq library preparation (described below). In the case of the AML sample, frozen primary mononuclear cells from a bone marrow aspirate were thawed and stained with CD34-APC (clone 581; Biolegend), CD38-PeCy7 (clone HB7; eBioscience), CD45Ra-FITC (clone HI100; eBioscience), CD90-PE (clone 5E10; eBioscience) and LIVE/DEAD Fixable Near-IR Dead Cell Stain (Thermofisher). Single, viable, CD34⁺ cells (Supplementary Fig. 15) were sorted using a BD FACSAria Fusion Cell Sorter into ice-cold serum-free expansion medium (SFEM) supplemented with 100 ng ml^–1 SCF and Flt3 (Stem Cell Technologies), 20 ng ml^–1 IL-3, IL-6, G-CSF and TPO (Stem Cell Technologies). Cells were plated in Corning Costar Ultra-Low Attachment 96-well flat-bottom plates (Sigma) at 1 × 10⁵ cells ml^–1 in warm medium as above. At 24 h after culture, 40 µM BrdU was added. Nuclei were isolated after 43 h total culture time, and BrdU-incorporating nuclei sorted into 96-well plates followed by Strand-seq library preparation. All Strand-seq libraries were automatically prepared using a Biomek FXP liquid handling robotic system, as described previously^23,96. Libraries were sequenced on an Illumina NextSeq 500 sequencing platform (MID-mode, 75 base pair (bp) paired-end sequencing protocol).

Strand-seq data preprocessing

Reads from Strand-seq libraries (FASTQ) were aligned to the hg38 assembly using BWA⁹⁷, as previously described²⁴. Low-quality reads (MAPQ < 10), supplementary reads and duplicated reads were removed. Single-cell library selection was performed as described previously²⁴. The single-cell footprints of different SV classes were discovered by applying the scTRIP principle to Strand-seq data, using the MosaiCatcher computational pipeline with default settings²⁴.

Coupling NO measurements and SV discovery in the same cell with scNOVA

We developed scNOVA as a computational framework for coupling discovered somatic SVs with analyses of NO profiles in the same cell. The scNOVA workflow covers a set of different operations from single-cell SV discovery (using the previously described scTRIP method²⁴) to NO profiling at CREs, and gene as well as pathway dysregulation inference based on NO at gene bodies, and can be used in a haplotype-aware or -unaware manner (Extended Data Fig. 1). To maximize reusability, interoperability and reproducibility, we combined all scNOVA modules into a coherent workflow using Snakemake. Alternatively, these modules can be executed individually.

Data analysis and operational definition utilized for NO

We operationally defined NO closely following definitions from a previous study²⁸: NO maps were calculated by counting how many reads from the Strand-seq libraries (which typically comprise mono-nucleosomal fragments around 140–180 bp in size; see Supplementary Table 1 and Supplementary Fig. 1) covered a given bp based on aligning reads to the GRCh38 (hg38) genome assembly with BWA⁹⁷. Genomic regions with unusual (such as artificially high) coverage were considered artifacts, and were automatically excluded (‘blacklisted’) by our Strand-seq analysis workflow as previously described²⁴. No further peak calling or smoothing was conducted, and no assumptions on the length of the nucleosomal DNA were made to derive NO maps, as nucleosome boundaries were determined on both sides of the nucleosome by paired-end sequencing²⁸. For the calculation of NO around bound CTCF binding sites (downloaded from ENCODE³⁴), the averaged profile was scaled²⁸ to yield an NO equal to 1 at position –2,000 bp from the center of the bound CTCF site.
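The operational definition above (per-base fragment coverage, averaged around bound CTCF sites and scaled to 1 at position –2,000 bp) can be illustrated with a minimal sketch. This is not the scNOVA implementation: the function names, the single-chromosome coordinate system and the 0-based half-open fragment intervals are assumptions made for illustration.

```python
def coverage_profile(fragments, center, flank):
    """Count fragments covering each base in [center - flank, center + flank]."""
    profile = [0] * (2 * flank + 1)
    window_start = center - flank
    for start, end in fragments:  # 0-based, half-open paired-end fragments
        lo = max(start, window_start)
        hi = min(end, center + flank + 1)
        for pos in range(lo, hi):
            profile[pos - window_start] += 1
    return profile

def scaled_average_profile(fragments, sites, flank=2000):
    """Average coverage around all sites, scaled so that position -flank equals 1."""
    profiles = [coverage_profile(fragments, site, flank) for site in sites]
    mean = [sum(column) / len(profiles) for column in zip(*profiles)]
    anchor = mean[0]  # index 0 corresponds to position -flank
    if anchor == 0:
        raise ValueError("no coverage at the scaling position")
    return [value / anchor for value in mean]

# Toy example with a 3 bp flank instead of 2,000 bp
toy = scaled_average_profile([(0, 10), (5, 15)], sites=[8], flank=3)
print(toy)
```

No smoothing or peak calling is applied, in line with the description above; the only transformation is the division by the value at the leftmost position of the window.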

Cell type classification

We generated feature sets from the NO at the body of genes (defined as the region from the TSS to the transcription termination site, which includes exons and introns) at the single-cell level. When several sequencing batches from the same samples were available, we applied batch correction to the NO count matrix using ComBat-seq⁹⁸. NO in gene body regions was normalized by segmental copy number status, and by library size to obtain reads per million, which we transformed into log₂ scale. This feature set was used for the unsupervised dimension reduction plot (Extended Data Fig. 3) and for training of a supervised classification model based on PLS-DA⁹⁹.
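As a rough illustration of the normalization described above, the sketch below rescales gene body counts by copy number, converts them to reads per million and applies a log₂ transform. The exact formula is not given in the text, so the diploid rescaling (count × 2 / copy number), the order of the two normalizations and the pseudocount of 1 are assumptions, and the gene names are hypothetical.

```python
import math

def normalize_gene_body_no(counts, copy_numbers, pseudocount=1.0):
    """Return log2 reads-per-million NO values, corrected for copy number.

    counts: gene -> read count in the gene body for one cell
    copy_numbers: gene -> segmental copy number (assumed 2 when absent)
    """
    library_size = sum(counts.values())
    normalized = {}
    for gene, count in counts.items():
        copy_number = copy_numbers.get(gene, 2)
        if copy_number <= 0:
            continue  # homozygously deleted segments carry no signal
        diploid_equivalent = count * 2 / copy_number  # assumed correction
        rpm = diploid_equivalent / library_size * 1e6
        normalized[gene] = math.log2(rpm + pseudocount)
    return normalized

cell_counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
cell_copy_numbers = {"GENE_B": 3}  # GENE_B lies in a single-copy gain
print(normalize_gene_body_no(cell_counts, cell_copy_numbers))
```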

Haplotype-phasing of single-cell NO tracks

As previously described, Strand-seq directly resolves its underlying sequence reads onto haplotypes ranging from telomere to telomere³¹ (chromosome-length haplotyping). scNOVA phases NO profiles onto a chromosomal homolog using the StrandPhaseR algorithm³¹, which is employed wherever the template strand segregation pattern of a chromosome enables unambiguous haplotype-phasing, that is, for Watson/Crick (WC) or Crick/Watson (CW) template state configurations in Strand-seq libraries^31,96. Haplotype-specific analyses pursued by scNOVA employ phased reads (normalized by locus copy number), whereas the inference of gene activity changes uses both phased reads (from chromosomes with a WC or CW configuration) and unphased reads (from chromosomes with a CC or WW configuration^31,96).
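The rule for which chromosomes contribute phased reads can be written down compactly. The sketch below assumes that each chromosome of a cell has already been assigned one of the four template strand states named above; the function name and input layout are illustrative only.

```python
PHASEABLE_STATES = {"WC", "CW"}  # one Watson and one Crick template strand

def split_by_phaseability(strand_states):
    """Split the chromosomes of one cell into phaseable and non-phaseable sets."""
    phased, unphased = [], []
    for chrom, state in strand_states.items():
        (phased if state in PHASEABLE_STATES else unphased).append(chrom)
    return phased, unphased

cell = {"chr1": "WC", "chr2": "WW", "chr3": "CW", "chr4": "CC"}
phased, unphased = split_by_phaseability(cell)
print(phased)    # ['chr1', 'chr3']
print(unphased)  # ['chr2', 'chr4']
```

Haplotype-specific analyses would then draw only on the first list, whereas the inference of gene activity changes would use reads from both.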

Inference of haplotype-specific NO and identification of local effects of SVs

To dissect local effects of SVs, the scNOVA framework performs a genome-wide haplotype-specific NO analysis at gene bodies in pseudobulk, which yields a haplotype-specific NO matrix. Using this matrix, scNOVA then scans up to ±1 Mb around each somatic SV breakpoint to infer local effects of these breakpoints on haplotype-specific gene activity, using FDR-adjusted Wilcoxon rank sum tests. Once a local effect on gene activity is identified, scNOVA additionally provides the option to locally scan for CREs exhibiting haplotype-specific NO. To do so, user-provided CRE positions from the cell type of interest are used by scNOVA to calculate haplotype-specific NO at CREs, and the Exact test (10% FDR) is used for significance testing.
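The breakpoint-centered scan can be sketched as a simple interval query. The example below only selects candidate genes within ±1 Mb of a breakpoint; the subsequent FDR-adjusted Wilcoxon rank sum testing is not shown. The tuple layout, the gene names and the choice to count any overlap between gene body and window are assumptions.

```python
def genes_near_breakpoint(genes, chrom, breakpoint, window=1_000_000):
    """Return names of genes whose body overlaps the window around a breakpoint.

    genes: iterable of (name, chromosome, start, end) tuples
    """
    lo, hi = breakpoint - window, breakpoint + window
    return [
        name
        for name, gene_chrom, start, end in genes
        if gene_chrom == chrom and start <= hi and end >= lo
    ]

# Hypothetical gene annotations
annotations = [
    ("GENE_1", "chr6", 500_000, 600_000),
    ("GENE_2", "chr6", 3_000_000, 3_100_000),
    ("GENE_3", "chr7", 500_000, 600_000),
]
print(genes_near_breakpoint(annotations, "chr6", 1_200_000))  # ['GENE_1']
```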

Inference of genome-wide changes in gene activity

This haplotype-unaware module of scNOVA considers all reads, whether phased or not, to infer gene activity alterations via analysis of differential patterns of NO along gene bodies. scNOVA obtains gene loci from ENSEMBL (GRCh38.81), converted into BED format (Genebody_hg38.81.bed). Strand-seq reads falling within the start and end position of genes (Genebody_hg38.81.bed) were identified with the deepTools multiBamSummary function¹⁰⁰, using the following parameters: [multiBamSummary BED-file --BED Genebody_hg38.81.bed --bamfiles Input.bam --extendReads --outRawCounts output.tab -out output.npz]. The scNOVA gene dysregulation inference module contains two steps: Step 1 filters out genes unlikely to be expressed (‘not expressed’, NEs), whereas Step 2 infers dysregulated (that is, differentially expressed) genes between subclones using a generalized linear model.

In Step 1, scNOVA first aims to infer gene expression ‘On’ and ‘Off’ states¹⁰¹ from NO, by analyzing NO as well as gene context-specific sequence features along gene bodies using deep convolutional neural networks¹⁰² (CNNs).

By default, scNOVA operates with the model trained with a pseudobulk of 80 cells, to estimate the probability of each gene to represent an NE in each clone. Genes likely to be unexpressed (NE status probability ≥0.9) across clones are filtered out in Step 1, and all remaining genes used in Step 2.
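The Step 1 filter can be expressed in a few lines. The sketch below assumes that "across clones" means a gene is removed only when its NE probability is at least 0.9 in every clone; the input layout (gene, then clone, then probability) and the gene and clone names are hypothetical, and the probabilities would come from the CNN rather than being set by hand.

```python
def filter_not_expressed(ne_probabilities, threshold=0.9):
    """Keep genes that are not confidently 'not expressed' in all clones.

    ne_probabilities: gene -> {clone: probability that the gene is an NE}
    """
    return [
        gene
        for gene, per_clone in ne_probabilities.items()
        if not all(p >= threshold for p in per_clone.values())
    ]

cnn_output = {
    "GENE_1": {"clone_A": 0.95, "clone_B": 0.99},  # NE in both clones: removed
    "GENE_2": {"clone_A": 0.95, "clone_B": 0.20},  # likely expressed in clone_B
    "GENE_3": {"clone_A": 0.10, "clone_B": 0.30},
}
print(filter_not_expressed(cnn_output))  # ['GENE_2', 'GENE_3']
```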

In Step 2, scNOVA by default employs negative binomial generalized linear models, available in the DESeq2 algorithm¹⁰³, to infer genes with differential activity between individual cells or clones. As an input, scNOVA computes single-cell count tables of gene body NO. When running this step with subclones, all individual cells of the subclone are considered ‘replicates’ in DESeq2 terminology¹⁰³. Subclones (or cells) are compared in a pairwise manner using a two-sided Wald test to infer genome-wide alterations in gene activity. Based on this, we defined the differential gene activity score as the sign of the fold change (FC) in NO at gene bodies, multiplied by –log₁₀ P values. Genes with significantly altered activity were identified using a 10% FDR threshold. Additionally, to facilitate the analysis of small CF subclones, scNOVA provides an alternative mode which employs PLS-DA⁹⁹ to identify discriminatory feature sets as gene sets showing altered activity. To do this, scNOVA builds a PLS-DA⁹⁹ discriminant model to classify cells in a given subclone 1 and subclone 2 based on single-cell count tables of gene body NO as feature sets. This model provides a variable importance of projection (VIP) and significance compared with a null distribution in the form of a P value for each gene analyzed. Similar to the default setting, genes with altered activity were identified using a 10% FDR cutoff when using PLS-DA for inferring changes in gene activity between subclones. Benchmarking both modes (see Extended Data Fig. 4) suggested that, whereas both DESeq2 and PLS-DA offer acceptable performance, the alternative mode (PLS-DA) outperforms the default setting when the subclonal CF is below 10%, whereas the default mode (DESeq2) generated superior results for CF values of 10% or greater.
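Three pieces of Step 2 lend themselves to a short worked example: the differential gene activity score, the 10% FDR cutoff and the choice between the two modes. The sketch below takes fold changes and P values as given (it does not run DESeq2 or PLS-DA), assumes Benjamini-Hochberg as the FDR procedure, and turns the benchmarking observation about a 10% CF into a simple rule of thumb; all function names are illustrative.

```python
import math

def activity_score(log2_fold_change, p_value):
    """Sign of the NO fold change multiplied by -log10 of the P value."""
    sign = (log2_fold_change > 0) - (log2_fold_change < 0)
    return sign * -math.log10(p_value)  # p_value must be greater than 0

def benjamini_hochberg(p_values):
    """Return FDR-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def choose_mode(clonal_fraction):
    """Rule of thumb from the benchmark: PLS-DA below 10% CF, DESeq2 otherwise."""
    return "PLS-DA" if clonal_fraction < 0.10 else "DESeq2"

p_values = [0.001, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q < 0.1]
print(significant)                         # [0, 1, 2]
print(choose_mode(0.05), choose_mode(0.25))  # PLS-DA DESeq2
```

Because the score is defined on NO rather than on expression, a negative score (lower NO in the gene body) points to higher inferred gene activity, consistent with the signature definition used later for UCell scoring.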

Genes with altered somatic copy number were masked (removed) when investigating gene activity changes based on NO at gene bodies, since differences in copy number status could confound differential NO measurements.

Molecular phenotype analysis in gene sets

This module of scNOVA uses defined gene sets, obtained from public resources, to identify over-represented sets of functionally related genes changing in activity between subclones (or individual cells). Two types of analyses are enabled by this module: (1) gene set over-representation analysis, which can be used to investigate, for example, the enrichment of targets of an important TF among genes showing a change in activity according to gene body analysis of NO; (2) joint modeling of NO across predefined gene sets, using pathway definitions from MSigDB⁶⁴. Throughout the manuscript, we applied an FDR of 10% (P_adj. < 0.1) as a significance threshold.

In the case of gene set over-representation analysis, we collected TF target genes from database entries (EnrichR⁵⁰) as well as by reviewing the literature. When reviewing the literature, we created curated lists of target genes for TFs based on published genome-wide studies using the following strict criteria: (1) target genes show evidence of binding of the TF of interest by chromatin immunoprecipitation followed by sequencing (ChIP–seq); (2) the same genes must additionally show differential expression when the TF of interest is experimentally silenced (our curated target gene lists are available in Supplementary Table 7). For each TF, the significance of overlap between its target gene set and genes exhibiting differential NO was computed using hypergeometric tests, followed by controlling the FDR at 10%.
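The overlap test for a single TF can be sketched with the hypergeometric distribution. The example below computes the upper-tail probability of observing at least the given overlap; the parameter names and the choice of background (all genes tested for differential NO) are assumptions, and multiple-testing correction across TFs is not shown.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_targets, n_hits, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null.

    n_background: number of genes tested
    n_targets:    TF target genes within the background
    n_hits:       genes with differential NO
    n_overlap:    TF targets among the genes with differential NO
    """
    upper = min(n_targets, n_hits)
    favorable = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_hits)

# Toy numbers: 10 genes tested, 4 are targets, 3 are hits, 2 overlap
print(hypergeom_enrichment_p(10, 4, 3, 2))  # 40/120, approximately 0.333
```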

To jointly model differential NO across all genes of predefined pathways, scNOVA first generates a single-cell gene body NO table from Strand-seq read count data, and normalizes these read counts using the median-of-ratios method from DESeq2 (ref. ¹⁰³). For the members of each biological pathway gene set from MSigDB⁶⁴, scNOVA then computes mean normalized NO values in each single cell, as a proxy for pathway-level NO. Lowly variable genes (s.d. <80%) are removed. Pathway-level NO is compared between cells with and without SVs using linear mixed model fitting followed by likelihood ratio testing, and controlling the FDR at 10%. In the linear mixed model, scNOVA treats SV status as a fixed effect and Strand-seq library batch as a random effect.
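The normalization and averaging steps can be illustrated as follows. This is a simplified re-implementation of the median-of-ratios idea, not a call into DESeq2; the variability filter and the linear mixed model are omitted. Genes with a zero count in any cell are skipped when estimating size factors, which is an assumption that would need revisiting for sparse single-cell tables, and all names are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> per-cell counts."""
    n_cells = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_cells)]
    for row in counts.values():
        if min(row) <= 0:
            continue  # no finite geometric mean for genes with a zero count
        log_geo_mean = sum(math.log(c) for c in row) / n_cells
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(ratios)) for ratios in log_ratios]

def pathway_level_no(counts, pathway_genes):
    """Mean normalized NO over pathway members, one value per cell."""
    factors = size_factors(counts)
    members = [g for g in pathway_genes if g in counts]
    return [
        sum(counts[g][j] / factors[j] for g in members) / len(members)
        for j in range(len(factors))
    ]

# The second cell was sequenced exactly twice as deeply as the first
table = {"GENE_A": [10, 20], "GENE_B": [30, 60], "GENE_C": [5, 10]}
print(size_factors(table))
print(pathway_level_no(table, ["GENE_A", "GENE_B"]))
```

In this toy table the two cells differ only in depth, so their pathway-level values coincide after normalization.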

Quantitative real-time PCR

NA20509 was ordered from Coriell and taken into culture at passage four. The late passage was grown until passage eight in a time span of 8 weeks. HG01505 was taken into culture at passage five and was grown until passage nine within a total time span of 6 weeks. DNA, RNA and protein were isolated with the NucleoSpin TriPrep Mini kit (740966.50) according to the manufacturer’s protocol. qPCR was performed on genomic DNA. PCR primers for MAP2K3 and TP53 were obtained from Sigma. qPCR was performed using BD SYBR Green PCR Master Mix (4309155) with a final primer concentration of 300 nM each and 10 ng input gDNA. A GAPDH control region was used as a normalizer. The primer sequences for DNA qPCR are provided in Supplementary Table 17.

Drug treatment with CB-103

Primary human T-ALL cells were recovered from cryopreserved bone marrow aspirates of patients enrolled in the ALL-BFM 2009 study. Patient-derived xenografts were generated as previously described by intrafemoral injection of 1 million viable primary ALL cells in NSG mice¹⁰⁴. Patient-derived xenograft (T-ALL_P1)²⁴ cells were frozen until processing. Human hTERT-immortalized primary bone marrow mesenchymal stroma cells (MSC; provided by D. Campana) were cultured in Roswell Park Memorial Institute (RPMI) 1640 medium supplemented with 10% heat-inactivated fetal bovine serum, l-glutamine (2 mM), penicillin/streptomycin (100 IU ml^–1) and hydrocortisone (1 μM). MSCs were seeded in 24-well plates at a concentration of 500,000 cells per well in 1 ml Aim V medium. After 24 h, T-ALL cells were added at a concentration of 1.5 million cells per well in 1 ml Aim V. CB-103 (MedChemExpress, HY-135145) or DMSO (vehicle) as control was added after an additional 24 h at a concentration of 10 μM. After 8 h and 24 h, cells were trypsinized, collected and frozen in 90% fetal bovine serum/10% DMSO.

Single-cell RNA sequencing and data processing

For scRNA-seq library preparation, cryopreserved cells were thawed rapidly at 37 °C and resuspended in 10 ml warm RPMI medium with 100 μg ml^–1 DNase I. Cells were centrifuged for 5 min at 300g, and resuspended in ice-cold PBS with 2% fetal bovine serum and 5 mM EDTA. Cells were stained on ice with anti-murine-CD45-PE (mCD45) (clone 30-F11; BioLegend; 1:20) in the dark for 30 min. 4,6-Diamidino-2-phenylindole (DAPI; 1:100) was added and cells were incubated in the dark for 5 min before sorting. Triple negative cells (DAPI^–mCD45^–GFP^–) were sorted (Supplementary Fig. 32) using a BD FACSAria Fusion cell sorter into ice-cold 0.03% bovine serum albumin (BSA) in PBS. All isolated cells were used immediately for scRNA-seq libraries, which were generated as per the standard 10x Genomics Chromium 3′ (v.3.1 Chemistry) protocol. Completed libraries were sequenced on a NextSeq5000 sequencer (HIGH-mode, 75 bp paired-end).

Sequenced transcripts were aligned to both human and mouse genomes (GRCh38 and mm10) and quantified into count matrices using the Cell Ranger mkfastq and count workflows (10x Genomics, v.3.1.0, default parameters). The R package Seurat⁸⁰ (v.4.0.3) was used for quality control of single cells and unsupervised clustering of the data. Briefly, human cells were separated from multiplets/mouse contamination based on greater than 97% of their reads aligning to GRCh38. Further filtering for high-quality cells accepted only those with more than 200 but fewer than 20,000 total RNA counts, and a percentage of mitochondrial reads less than 10% for the untreated data, and less than 40% for the drug-treated samples. Finally, remaining mouse transcripts were removed before further analysis.
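The quality control thresholds above can be summarized as a single predicate. The analysis itself was carried out with Seurat in R; the Python sketch below only restates the cutoffs, and the field names of the per-cell record are hypothetical.

```python
def passes_qc(cell, treated):
    """Apply the human-read, total-count and mitochondrial-read cutoffs.

    cell: dict with 'pct_human_reads', 'total_counts' and 'pct_mito'
    treated: True for drug-treated samples, False for untreated data
    """
    mito_cutoff = 40.0 if treated else 10.0
    return (
        cell["pct_human_reads"] > 97.0
        and 200 < cell["total_counts"] < 20_000
        and cell["pct_mito"] < mito_cutoff
    )

example = {"pct_human_reads": 99.2, "total_counts": 5_400, "pct_mito": 18.0}
print(passes_qc(example, treated=False))  # False: mitochondrial reads >= 10%
print(passes_qc(example, treated=True))   # True: the cutoff is 40% after treatment
```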

In the untreated data, normalization, scaling and regression of mitochondrial read percentage was carried out using the scTransform package¹⁰⁵. Dimensionality reduction and differential expression analysis of identified clusters was performed as standard using Seurat. Trajectory analysis was performed using Monocle3 (ref. ¹⁰⁶). In the drug treatment data, individual Seurat objects that had been quality controlled as above were normalized by scTransform^105,107 and then integrated to correct for batch effects and allow for comparative analysis. To re-annotate clusters from the untreated data in the drug treatment data, the TransferData() function from Seurat⁸⁰ was used to project labels from our reference (that is, untreated data) onto the integrated drug treatment data. Single-cell gene set enrichment analysis was performed using the R package ‘escape’⁶⁷.

Cellular indexing of transcriptomes and epitopes by single-cell sequencing

A peripheral blood-derived sample (CLL_24) was recovered from cryopreservation as previously described¹⁰⁸ to reach viability above 90%. Then, 5 × 10⁵ viable cells were stained with a premixed cocktail of oligonucleotide-conjugated antibodies (Supplementary Table 14) and incubated at 4 °C for 30 min. The dilution used for each antibody is provided in Supplementary Table 14. Cells were washed three times with ice-cold washing buffer. Subsequently, preparation of bead-cell suspensions, synthesis of complementary DNA and construction of single-cell gene expression and antibody-derived tag (ADT) libraries were performed using a Chromium single cell v.3.1 3ʹ kit (10x Genomics) according to the manufacturer’s instructions. Then, 3′ gene expression and ADT libraries were pooled in a ratio of 3:1, aiming for 40,000 reads (gene expression) and 15,000 reads (ADT) per cell, respectively. Sequencing was performed on a NextSeq 500 (Illumina). After sequencing, cellranger mkfastq (Cell Ranger, 10x Genomics, v.6.1.1) was used to demultiplex raw base-call files into FASTQ files, which were then aligned to the human reference genome (hg38) and counted with the cellranger count command. Default settings were used unless otherwise indicated. Single-cell gene set enrichment analysis was performed using the R package ‘escape’⁶⁷.

Single-cell gene signature scoring using UCell

The activity of the scNOVA-identified gene set from T-ALL_P1 in scRNA-seq data was profiled using the UCell package⁸¹. Briefly, signature genes considered were those with either increased (implying decreased expression) or decreased (implying increased expression) nucleosome occupancy (see Fig. 5b), or genes encoding TFs whose targets showed differential nucleosome occupancy (see Fig. 5c). The following gene set was used for T-ALL_P1: ‘PRKCB–’, ‘RPS6KA2–’, ‘FAM120B–’, ‘FAM86C1+’, ‘FBXO22+’, ‘RHOH+’, ‘SLC9A7+’, ‘NASP+’, ‘NOTCH1+’, ‘MRPL48+’, ‘MFSD9+’, ‘MVB12B+’, ‘MYB+’ (with ‘+’ for upregulated, and ‘–’ for downregulated). The score per single cell for the entire directional gene set was calculated using the AddModuleScore_UCell() function. Cells were considered to be ‘active’ for the signature genes if their respective UCell score was greater than or equal to the median UCell score of the entire dataset, plus the s.d. Similarly, for T-cell cell-type labeling, marker gene sets for T-cell subsets were obtained from Park et al.¹⁰⁹ and single cells were scored for their activity in each gene set. Cells were labeled by their best-fit cell type, that is the cell-type whose gene set gave the highest UCell score.
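The two decision rules described above (the ‘active’ threshold and best-fit cell-type labeling) can be sketched independently of UCell itself. The example below takes precomputed scores as input; the use of the sample standard deviation is an assumption, since the text does not specify the estimator, and the cell and subset names are hypothetical.

```python
from statistics import median, stdev

def active_cells(scores):
    """Return cells whose score is at least median + s.d. of all scores."""
    values = list(scores.values())
    threshold = median(values) + stdev(values)  # sample s.d. (assumed)
    return [cell for cell, score in scores.items() if score >= threshold]

def best_fit_label(scores_by_cell_type):
    """Label a cell with the cell type whose gene set scored highest."""
    return max(scores_by_cell_type, key=scores_by_cell_type.get)

signature_scores = {"cell_1": 0.1, "cell_2": 0.2, "cell_3": 0.3, "cell_4": 0.9}
print(active_cells(signature_scores))  # ['cell_4']

subset_scores = {"subset_A": 0.2, "subset_B": 0.5, "subset_C": 0.1}
print(best_fit_label(subset_scores))   # subset_B
```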

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Sequencing data from this study can be retrieved from the European Genome-phenome Archive (EGA) and the European Nucleotide Archive (ENA). LCL data are available under the following accessions: Strand-seq (PRJEB39750, PRJEB55038); RNA-seq (ERP123231); WGS (PRJEB37677). C11 cell line data are available under the accession PRJEB55012. Leukemia patient data and human primary cells derived data were deposited in the European Genome-phenome Archive (EGA) under the following accession numbers: skin fibroblast (EGAS00001006498); cord blood (EGAS00001006567). T-ALL Strand-seq and scRNA-seq (EGAS00001003365), CLL Strand-seq (EGAS00001004925), AML Strand-seq (EGAS00001004903), T-ALL bulk RNA-seq (EGAS00001003248), CLL bulk RNA-seq (EGAS00001005746), CLL CITE-seq (EGAS00001004925). Access to human patient data is governed by the EGA Data Access Committee.

Code availability

The computational code of our analytical framework scNOVA is available open source at https://github.com/jeongdo801/scNOVA, with no restrictions on reuse. Other software used: Mosaicatcher (https://github.com/friendsofstrandseq/mosaicatcher-pipeline), StrandPhaseR (https://github.com/daewoooo/StrandPhaseR), InferCNV (https://github.com/broadinstitute/inferCNV/), HoneyBADGER (https://jef.works/HoneyBADGER/), CONICSmat (https://github.com/diazlab/CONICS), NucTools (https://homeveg.github.io/nuctools), Delly2 (https://github.com/dellytools/delly), BWA (v.0.7.15), STAR (v.2.7.9a), SAMtools (v.1.3.1), biobambam2 (v.2.0.76), deeptools (v.2.5.1), perl (v.5.16.3), Python (v.3.7.4), cuDNN (v.7.6.4.38), CUDA (v.10.1.243), TensorFlow (v.1.15.0), scikit-learn (v.0.21.3), matplotlib (v.3.1.1), R v.4.0.0, DESeq2, FlowJo and BD FACSDiva.

References

Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Cosenza, M. R., Rodriguez-Martin, B. & Korbel, J. O. Structural variation in cancer: role, prevalence, and mechanisms. Annu. Rev. Genomics Hum. Genet. 23, 123–152 (2022).
Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
Rausch, T. et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell 148, 59–71 (2012).
Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
Umbreit, N. T. et al. Mechanisms generating cancer genome complexity from a single cell division error. Science 368, eaba0712 (2020).
Hadi, K. et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell 183, e32 (2020).
Minussi, D. C. et al. Breast tumours maintain a reservoir of subclonal diversity during expansion. Nature 592, 302–308 (2021).
Watkins, T. B. K. et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132 (2020).
Viswanathan, S. R. et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell 174, e19 (2018).
McPherson, A. et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat. Genet. 48, 758–767 (2016).
Weischenfeldt, J. et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 49, 65–74 (2016).
Liu, Y. et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat. Genet. 52, 811–818 (2020).
PCAWG Transcriptome Core Group. et al. Genomic basis for RNA alterations in cancer. Nature 578, 129–136 (2020).
Northcott, P. A. et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature 511, 428–434 (2014).
Dey, S. S., Kester, L., Spanjaard, B., Bienko, M. & van Oudenaarden, A. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 33, 285–289 (2015).
Macaulay, I. C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).
Yin, Y. et al. High-throughput single-cell sequencing with linear amplification. Mol. Cell 76, e10 (2019).
Nam, A. S., Chaligne, R. & Landau, D. A. Integrating genetic and non-genetic determinants of cancer evolution by single-cell multi-omics. Nat. Rev. Genet. 22, 3–18 (2020).
Nam, A. S. et al. Somatic mutations and cell identity linked by genotyping of transcriptomes. Nature 571, 355–360 (2019).
Cortés-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 52, 331–341 (2020).
Sanders, A. D., Falconer, E., Hills, M., Spierings, D. C. J. & Lansdorp, P. M. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs. Nat. Protoc. 12, 1151–1176 (2017).
Sanders, A. D. et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol. 38, 343–354 (2020).
Schones, D. E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008).
Lai, B. et al. Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562, 281–285 (2018).
Struhl, K. & Segal, E. Determinants of nucleosome positioning. Nat. Struct. Mol. Biol. 20, 267–273 (2013).
Teif, V. B. et al. Genome-wide nucleosome positioning during embryonic stem cell development. Nat. Struct. Mol. Biol. 19, 1185–1192 (2012).
Lam, F. H., Steger, D. J. & O’Shea, E. K. Chromatin decouples promoter threshold from dynamic range. Nature 453, 246–250 (2008).
Shivaswamy, S. et al. Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation. PLoS Biol. 6, e65 (2008).
Porubský, D. et al. Direct chromosome-length haplotyping by single-cell sequencing. Genome Res. 26, 1565–1574 (2016).
Kundaje, A. et al. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. 22, 1735–1747 (2012).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Loda, A., Collombet, S. & Heard, E. Gene regulation in time and space during X-chromosome inactivation. Nat. Rev. Mol. Cell Biol. 23, 231–249 (2022).
Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res. 48, D682–D688 (2020).
Mardin, B. R. et al. A cell-based model system links chromothripsis with hyperploidy. Mol. Syst. Biol. 11, 828 (2015).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Shirley, M. D. et al. Chromosomal variation in lymphoblastoid cell lines. Hum. Mutat. 33, 1075–1086 (2012).
Mraz, M. et al. The origin of deletion 22q11 in chronic lymphocytic leukemia is related to the rearrangement of immunoglobulin lambda light chain locus. Leuk. Res. 37, 802–808 (2013).
Dang, S. et al. Dynamic expression of ZNF382 and its tumor-suppressor role in hepatitis B virus-related hepatocellular carcinogenesis. Oncogene 38, 4804–4819 (2019).
Li, Z. et al. A global transcriptional regulatory role for c-Myc in Burkitt’s lymphoma cells. Proc. Natl Acad. Sci. USA 100, 8164–8169 (2003).
Marinov, G. K. et al. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 24, 496–510 (2014).
Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
Müller, S., Cho, A., Liu, S. J., Lim, D. A. & Diaz, A. CONICS integrates scRNA-seq with DNA sequencing to map gene expression to tumor sub-clones. Bioinformatics 34, 3217–3219 (2018).
Fan, J. et al. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Res. 28, 1217–1227 (2018).
Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
McClintock, B. The stability of broken ends of chromosomes in Zea mays. Genetics 26, 234–282 (1941).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Yang, X. et al. Discovery of the first chemical tools to regulate MKK3-mediated MYC activation in cancer. Bioorg. Med. Chem. 45, 116324 (2021).
Byrska-Bishop, M. et al. High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Zhang, Y. & Rowley, J. D. Chromatin structural elements and chromosomal translocations in leukemia. DNA Repair 5, 1282–1297 (2006).
Erickson, P. et al. Identification of breakpoints in t(8;21) acute myelogenous leukemia and isolation of a fusion transcript, AML1/ETO, with similarity to Drosophila segmentation gene, runt. Blood 80, 1825–1831 (1992).
Xiao, Z. et al. Molecular characterization of genomic AML1-ETO fusions in childhood leukemia. Leukemia 15, 1906–1913 (2001).
Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
Gutierrez, A. et al. The BCL11B tumor suppressor is mutated across the major molecular subtypes of T-cell acute lymphoblastic leukemia. Blood 118, 4169–4173 (2011).
Döhner, H. et al. Genomic aberrations and survival in chronic lymphocytic leukemia. N. Engl. J. Med. 343, 1910–1916 (2000).
Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254.e39 (2021).
Hewett, D. R. et al. FRA10B structure reveals common elements in repeat expansion and chromosomal fragile site genesis. Mol. Cell 1, 773–781 (1998).
Edelmann, J. et al. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations. Blood 120, 4783–4794 (2012).
Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).
Malek, S. N. The biology and clinical significance of acquired genomic copy number aberrations and recurrent gene mutations in chronic lymphocytic leukemia. Oncogene 32, 2805–2817 (2013).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Boon, E. M. J., van der Neut, R., van de Wetering, M., Clevers, H. & Pals, S. T. Wnt signaling regulates expression of the receptor tyrosine kinase met in colorectal cancer. Cancer Res. 62, 5126–5128 (2002).
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
Borcherding, N. et al. Mapping the immune environment in clear cell renal carcinoma by single-cell genomics. Commun. Biol. 4, 122 (2021).
Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).
Kamburov, A. & Herwig, R. ConsensusPathDB 2022: molecular interactions update as a resource for network biology. Nucleic Acids Res. 50, D587–D595 (2022).
Böttcher, M. et al. Control of PD-L1 expression in CLL-cells by stromal triggering of the Notch-c-Myc-EZH2 oncogenic signaling axis. J. Immunother. Cancer 9, e001889 (2021).
Wang, Y. et al. Distinct immune signatures in chronic lymphocytic leukemia and Richter syndrome. Blood Cancer J. 11, 86 (2021).
Fromigué, O., Haÿ, E., Barbara, A. & Marie, P. J. Essential role of nuclear factor of activated T cells (NFAT)-mediated Wnt signaling in osteoblast differentiation induced by strontium ranelate. J. Biol. Chem. 285, 25251–25258 (2010).
Moon, J. B. et al. Akt induces osteoclast differentiation through regulating the GSK3β/NFATc1 signaling cascade. J. Immunol. 188, 163–169 (2012).
Nurieva, R. I. et al. A costimulation-initiated signaling pathway regulates NFATc1 transcription in T lymphocytes. J. Immunol. 179, 1096–1103 (2007).
Park, H.-J., Baek, K., Baek, J.-H. & Kim, H.-R. The cooperation of CREB and NFAT is required for PTHrP-induced RANKL expression in mouse osteoblastic cells. J. Cell. Physiol. 230, 667–679 (2015).
Li, L. et al. B-cell receptor-mediated NFATc1 activation induces IL-10/STAT3/PD-L1 signaling in diffuse large B-cell lymphoma. Blood 132, 1805–1817 (2018).
Oestreich, K. J., Yoon, H., Ahmed, R. & Boss, J. M. NFATc1 regulates PD-1 expression upon T cell activation. J. Immunol. 181, 4832–4839 (2008).
Staal, F. J. T., Famili, F., Garcia Perez, L. & Pike-Overzet, K. Aberrant Wnt signaling in leukemia. Cancers (Basel) 8, 78 (2016).
Korbel, J. O. & Campbell, P. J. Criteria for inference of chromothripsis in cancer genomes. Cell 152, 1226–1236 (2013).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Andreatta, M. & Carmona, S. J. UCell: robust and scalable single-cell gene signature scoring. Comput. Struct. Biotechnol. J. 19, 3796–3798 (2021).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
Majumder, S. et al. Targeting notch in oncology: the path forward. Nat. Rev. Drug Discov. 20, 125–144 (2021).
Study of CB-103 in adult patients with advanced or metastatic solid tumours and haematological malignancies. https://clinicaltrials.gov/ct2/show/NCT03422679 (2017).
Lehal, R. et al. Pharmacological disruption of the Notch transcription factor complex. Proc. Natl Acad. Sci. USA 117, 16292–16301 (2020).
Drews, R. M. et al. A pan-cancer compendium of chromosomal instability. Nature 606, 976–983 (2022).
Edelmann, J. et al. Genomic alterations in high-risk chronic lymphocytic leukemia frequently affect cell cycle key regulators and NOTCH1-regulated transcription. Haematologica 105, 1379–1390 (2020).
Loh, P.-R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
Zhu, C., Preissl, S. & Ren, B. Single-cell multimodal omics: the power of many. Nat. Methods 17, 11–14 (2020).
Forsberg, L. A., Gisselsson, D. & Dumanski, J. P. Mosaicism in health and disease - clones picking up speed. Nat. Rev. Genet. 18, 128–142 (2017).
Leibowitz, M. L. et al. Chromothripsis as an on-target consequence of CRISPR-Cas9 genome editing. Nat. Genet. 53, 895–905 (2021).
Gaffney, D. J. et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012).
Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017, bax028 (2017).
Porubsky, D. et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005.e26 (2022).
Dietrich, S. et al. Drug-perturbation-based stratification of blood cancer. J. Clin. Invest. 128, 427–445 (2018).
Falconer, E. et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods 9, 1107–1112 (2012).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2, lqaa078 (2020).
Boulesteix, A.-L. & Strimmer, K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief. Bioinform. 8, 32–44 (2007).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Ulz, P. et al. Inferring expressed genes by whole-genome sequencing of plasma DNA. Nat. Genet. 48, 1273–1278 (2016).
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Schmitz, M. et al. Xenografts of highly resistant leukemia recapitulate the clonal composition of the leukemogenic compartment. Blood 118, 1854–1864 (2011).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, e21 (2019).
Roider, T. et al. Dissecting intratumour heterogeneity of nodal B-cell lymphomas at the transcriptional, genetic and drug-response levels. Nat. Cell Biol. 22, 896–906 (2020).
Park, J.-E. et al. A cell atlas of human thymic development defines T cell repertoire formation. Science 367, eaay3224 (2020).
Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
Nagel, S. et al. Activation of TLX3 and NKX2-5 in t(5;14)(q35;q32) T-cell acute lymphoblastic leukemia by remote 3’-BCL11B enhancers and coregulation by PU.1 and HMGA1. Cancer Res. 67, 1461–1471 (2007).
Xaus, J. et al. The expression of MHC class II genes in macrophages is cell cycle dependent. J. Immunol. 165, 6364–6371 (2000).


Acknowledgements

We thank A. Krebs, J. Zaugg, K. Rippe and I. Cortés-Ciriano for providing thoughtful feedback on the development of scNOVA. We also thank M. Paulsen (Flow Cytometry Core Facility) for assistance in cell sorting, B. Raeder for assisting in Strand-seq library preparation, and the EMBL Genomics Core Facility for assisting in single-cell automation (J. Zimmermann and V. Benes) and scRNA-seq library preparation (L. Villacorta). Finally, we thank W. Höps for assistance with single-cell analysis, as well as M. Happich and P. Richter-Pechanska for assistance with RNA-seq analysis. Principal funding came from the European Research Council (ERC Consolidator grant no. 773026, to J.O.K.). Funding also came from an ERC Starting Grant (grant no. 336045) to J.O.K., the National Institutes of Health (grant no. 2U24HG007497-05) to J.O.K. and T.M., the Baden-Württemberg Stiftung (for supporting the projects ‘Epigenetics in T-ALL’ and ‘SV_Surveillance’) to J.O.K. and A.E.K., a Volkswagen Foundation grant (VW - 95826) to J.O.K. and the German Federal Ministry of Education and Research (grant no. 031A537B; de.NBI project) to J.O.K. H.J. and A.D.S. acknowledge fellowships through the Alexander von Humboldt Foundation. We thank the Human Genome Structural Variation Consortium for providing early access to deep bulk RNA-seq data from several LCLs (generated using funds provided by NHGRI Grant 2U24HG007497-05). D.N. is an endowed Professor of the German José-Carreras-Foundation (DJCLSH03/01). J.C.J. was funded by a Gerok position of the ‘Deutsche Forschungsgemeinschaft’ (DFG) (NO 817/5-2, FOR2033, NICHEM). K.K.R. received postdoctoral funding from the Deutsche Krebshilfe (Mildred-Scheel-Fellowship).

Author information

Hyobin Jeong
Present address: Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea
David Porubsky
Present address: Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
These authors contributed equally: Hyobin Jeong, Karen Grimes
These authors jointly supervised this work: Ashley D. Sanders, Jan O. Korbel.

Authors and Affiliations

Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
Hyobin Jeong, Karen Grimes, Tobias Rausch, Patrick Hasenfeld, Eva Benito, Tobias Roider, Ashley D. Sanders & Jan O. Korbel
Faculty of Biosciences, EMBL and Heidelberg University, Heidelberg, Germany
Karen Grimes
Division of Pediatric Oncology, University Children’s Hospital, Zürich, Switzerland
Kerstin K. Rauwolf, Jean-Pierre Bourquin & Beat Bornhauser
Department of Hematology, Oncology and Rheumatology, Heidelberg University Hospital, Heidelberg, Germany
Peter-Martin Bruch, Tobias Roider, Sophie A. Herbst & Sascha Dietrich
Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
Peter-Martin Bruch, Tobias Rausch, Tobias Roider, Sophie A. Herbst, Büşra Erarslan-Uysal, Andreas E. Kulozik, Sascha Dietrich & Jan O. Korbel
Department of Hematology and Oncology, University Hospital Düsseldorf, Düsseldorf, Germany
Peter-Martin Bruch & Sascha Dietrich
National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
Radhakrishnan Sabarinathan
Center for Bioinformatics, Saarland University, Saarbrücken, Germany
David Porubsky
Max Planck Institute for Informatics, Saarbrücken, Germany
David Porubsky
Department of Pediatric Oncology, Hematology, and Immunology, University of Heidelberg and Hopp Children’s Cancer Center, Heidelberg, Germany
Büşra Erarslan-Uysal & Andreas E. Kulozik
Department of Hematology and Oncology, Medical Faculty Mannheim of the Heidelberg University, Heidelberg, Germany
Johann-Christoph Jann & Daniel Nowak
Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
Tobias Marschall
Department of Translational Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
Sascha Dietrich
Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
Ashley D. Sanders
Berlin Institute of Health (BIH), Berlin, Germany
Ashley D. Sanders
Charité-Universitätsmedizin, Berlin, Germany
Ashley D. Sanders
Bridging Research Division on Mechanisms of Genomic Variation and Data Science, German Cancer Research Center (DKFZ), Heidelberg, Germany
Jan O. Korbel

Authors

Hyobin Jeong
Karen Grimes
Kerstin K. Rauwolf
Peter-Martin Bruch
Tobias Rausch
Patrick Hasenfeld
Eva Benito
Tobias Roider
Radhakrishnan Sabarinathan
David Porubsky
Sophie A. Herbst
Büşra Erarslan-Uysal
Johann-Christoph Jann
Tobias Marschall
Daniel Nowak
Jean-Pierre Bourquin
Andreas E. Kulozik
Sascha Dietrich
Beat Bornhauser
Ashley D. Sanders
Jan O. Korbel

Contributions

H.J., K.G., A.D.S. and J.O.K. designed the study (including conceptualization of haplotype-specific NO analysis, cell-type classification and altered gene activity using Strand-seq data). H.J., K.G., A.D.S. and J.O.K. developed the scNOVA computational method. H.J., K.G., A.D.S. and J.O.K. performed single-cell SV discovery. A.D.S. and P.H. performed LCL Strand-seq experiments; K.G., P.-M.B. and S.D. performed CLL Strand-seq experiments; K.G., J.-C.J. and D.N. performed AML Strand-seq experiments and K.G., K.K.R. and P.H. performed T-ALL Strand-seq experiments. T.R. carried out WGS-based SV discovery and verification. H.J., D.P. and T.M. performed haplotype-phasing. LCL scRNA-seq analysis was carried out by H.J.; CLL scRNA-seq analysis by K.G., H.J. and T.R. and T-ALL scRNA-seq analysis by K.G., H.J. and K.K.R. K.K.R., K.G. and B.B. performed drug treatment experiments. K.G., P.H. and E.B. analyzed LCL clonal expansion. LCL RNA-seq analysis was carried out by H.J.; CLL RNA-seq analysis by H.J., S.H., P.-M.B. and S.D. and T-ALL RNA-seq analysis by H.J., B.E.-U. and A.E.K. R.S. and J.O.K. performed PCAWG SV driver spectrum analysis. The manuscript was written by H.J., K.G., A.D.S. and J.O.K., with additional contributions from all authors.

Corresponding authors

Correspondence to Ashley D. Sanders or Jan O. Korbel.

Ethics declarations

Competing interests

The following authors have previously disclosed a patent application (no. EP19169090) that is relevant to this manuscript: A.D.S., J.O.K., T.M. and D.P. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Jonas Demeulemeester, Elisa Oricchio, Peter Park and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of components of the scNOVA computational workflow.

scNOVA employs single-cell tri-channel processing (scTRIP), as realized in the MosaiCatcher pipeline, to perform haplotype-aware somatic SV discovery²⁴. Modules of scNOVA enable single-cell multiomics of these somatic SVs, including inference of haplotype-specific NO to investigate local (cis) effects of SVs, and inference of altered gene/pathway activity to investigate global (trans) effects of SVs detectable between genetically distinct subclones. To infer alterations in gene activity, scNOVA integrates deep convolutional neural network (CNN)-based machine learning and negative binomial generalized linear models. The framework dissects intra-sample genetic heterogeneity at single-cell resolution, measures the local haplotype-specific impact of somatic SVs, can be used to explore global gene dysregulation in SV-containing cells, can discriminate between genetically distinct subclones, and can uncover shared functional consequences of heterogeneous SVs affecting the same chromosomal interval.

Extended Data Fig. 2 Read depth of Strand-seq and MNase-seq data stratified into 15 chromatin states defined by the Roadmap Epigenomics Consortium³³.

Fifteen chromatin states based on the NA12878 cell line were used in this genome-wide analysis. The plots show Strand-seq data from NA12878 (n = 95 cells) (a), and publicly available MNase-seq data from NA12878, NA19193, and NA19238 (n = 1 sample each) (b-d). The bulk MNase-seq experiment for NA12878 was performed using single-end SOLiD sequencing reads, and those for NA19193 and NA19238 using paired-end Illumina reads. The x-axis of each box plot indicates reads per kilobase per million (RPKM) measured for each genomic segment annotated by one of the 15 chromatin states. Abbreviations for chromatin states³³ are: TssA - Active TSS, TssAFlnk - Flanking Active TSS, TxFlnk - Transcription at gene 5’ and 3’, Tx - Strong transcription, TxWk - Weak transcription, EnhG - Genic enhancers, Enh - Enhancers, ZNF/Rpts - ZNF genes & repeats, Het - Heterochromatin, TssBiv - Bivalent/Poised TSS, BivFlnk - Flanking Bivalent TSS/Enh, EnhBiv - Bivalent Enhancer, ReprPC - Repressed PolyComb, ReprPCWk - Weak Repressed PolyComb, Quies - Quiescent/Low. Boxplots were defined by minima = 25th percentile - 1.5× interquartile range (IQR), maxima = 75th percentile + 1.5× IQR, center = median, and bounds of box = 25th and 75th percentile. Both Strand-seq and MNase-seq assays measured NO in all fifteen chromatin states. Among these chromatin states, Strand-seq and MNase-seq revealed the highest NO signals on average for the polycomb repressed state and the bivalent enhancer state, whereas the lowest average NO signals were consistently seen for the active transcription start site (TSS) state.
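The RPKM normalization named in this caption can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and are not taken from the scNOVA code base; the formula is the standard definition, assuming the per-million term refers to the total number of mapped reads in the library.

```python
def rpkm(read_count, segment_length_bp, total_mapped_reads):
    """Reads per kilobase of segment per million mapped reads (standard definition)."""
    if segment_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("segment length and library size must be positive")
    # (reads / (length / 1e3)) / (library / 1e6) == reads * 1e9 / (length * library)
    return read_count * 1e9 / (segment_length_bp * total_mapped_reads)


# Hypothetical example: 50 reads in a 2 kb segment, library of 400,000 mapped reads.
example_value = rpkm(50, 2_000, 400_000)
print(example_value)
```

Normalizing by both segment length and library size is what makes segments of different sizes, and cells of different sequencing depth, comparable on one axis.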

Extended Data Fig. 3 Utility of NO for cell-typing.

(a) Cell-typing based on NO at gene bodies (AUC = 1). Epi1: RPE-1 replicate 1 (79 cells); Epi2: replicate 2 (77 cells); LCL1: HG01573 (46 cells); LCL2: HG02018 (50 cells); LCL3: NA19036 (50 cells); LV: latent variable. (b) UMAP visualization of Strand-seq libraries based on NO at gene bodies (normalized by segmental ploidy status²⁴). (c) We also explored dimensionality reduction of Strand-seq libraries based on DNA motif accessibility. Using the chromVAR package¹¹⁰, single-cell NO profiles for 2 kb DNase I hypersensitive sites (DHSs) were transformed into a deviation Z-score, which measures how likely a certain motif accessibility would occur when randomly sampling sets of peaks with similar GC content and read depth. For each single cell, the deviation Z-score was calculated for 870 human TF motifs from the cisBP database¹¹¹. These dimensionality reduction plots suggest that batch effects within the same cell type (three individuals in LCL, and two batches in RPE-1 sequenced separately) are minimal, and far smaller than the cell-type-dependent variability. (d) UMAP using scMNase-seq²⁶, including 45 NIH3T3 cells and 272 murine naive T cells, based on NO at gene bodies. (e) UMAP of RPE-1 (the original commercially available cell line) and its transformed derivative cell lines³⁷ (BM510 and C7). Two biological replicates were sequenced for each cell line. (f) Receiver operating characteristic (ROC) curve for the PLS-DA-based classifier. The AUC for classifying each cell line was 0.9614, 0.9694, and 0.9892 for RPE-1, BM510, and C7, respectively. (g-h) Cell-typing for LCL, RPE-1, skin fibroblast, AML, T-ALL, and umbilical cord blood cells (g), and ROC curve depicting classification performance (overall AUC = 0.998) (h). (i-j) Cell-typing in five RPE-1-derived cell lines³⁷ (RPE-1, BM510, C7, C29, and C11) (i), and ROC curve depicting classification performance (overall AUC = 0.9648) (j).
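For readers less familiar with the AUC values quoted in this caption, the following sketch shows how an area under the ROC curve can be computed from classifier scores. It is not the PLS-DA classifier itself; the function name, labels and scores are hypothetical. It uses the rank-based definition: the AUC equals the probability that a randomly chosen positive cell scores higher than a randomly chosen negative cell, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one negative cell (0.8) outranks one positive cell (0.4).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 3 of 4 pairs ordered correctly -> 0.75
```

An AUC of 1, as reported in panel (a), corresponds to every positive cell scoring above every negative cell.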

Extended Data Fig. 4 In silico downsampling experiments.

We performed in silico cell mixing of RPE-1 and HG01573 cells to simulate application of scNOVA to different cell fractions (CFs). In this analysis, six different CFs were considered (20, 10, 5, 3.3, 2, and 1.3%). For each in silico cell mixing experiment, a total of 150 single cells were randomly subsampled for the major pseudo-clone (containing RPE-1 cells) and the minor pseudo-clone (HG01573 cells), by controlling the minor pseudo-clone CF at 20, 10, 5, 3.3, 2, and 1.3%, respectively. AUC, area under the curve. DEGs, differentially expressed genes. For each CF, we performed random subsampling of single-cell libraries 10 times, and depicted the respective mean AUC in the plot. Two analysis modes are depicted: default (dashed lines, CNN with negative binomial generalized linear model) and alternative (solid lines, CNN with PLS-DA). When the CF is larger than 10%, the default mode performs better, whereas for CFs smaller than 10%, the alternative mode outperforms the default mode.
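The subsampling scheme described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the cell identifiers and the rule of rounding the minor-clone size to the nearest whole cell are hypothetical. The total of 150 cells, the six CFs and the 10 repetitions are taken from the caption.

```python
import random


def mix_cells(major_cells, minor_cells, minor_cf_percent, total=150, seed=None):
    """Draw `total` cells so that the minor pseudo-clone makes up ~minor_cf_percent."""
    rng = random.Random(seed)
    # Assumption: the minor-clone size is rounded to the nearest whole cell.
    n_minor = round(total * minor_cf_percent / 100)
    n_major = total - n_minor
    if n_minor > len(minor_cells) or n_major > len(major_cells):
        raise ValueError("not enough cells available for the requested mixture")
    # Sampling without replacement within each pseudo-clone.
    return rng.sample(major_cells, n_major), rng.sample(minor_cells, n_minor)


# Hypothetical library identifiers standing in for RPE-1 and HG01573 cells.
rpe1 = [f"RPE1_cell{i}" for i in range(156)]
hg01573 = [f"HG01573_cell{i}" for i in range(46)]

for cf in (20, 10, 5, 3.3, 2, 1.3):
    # Ten random repetitions per CF, as in the caption.
    for repeat in range(10):
        major, minor = mix_cells(rpe1, hg01573, cf, seed=repeat)
    print(cf, len(major), len(minor))
```

Averaging a performance metric over the ten repetitions per CF, as the caption describes, reduces the influence of any single random draw.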

Extended Data Fig. 5 Haplotype-specific NO analysis in RPE-1 and BM510.

(a, b) Genome-wide analysis of haplotype-specific NO at gene bodies in RPE-1 (a) and BM510 (b). For each chromosomal karyogram, the y-axis indicates the significance of haplotype-specific NO for each gene (-log10 p.adjust). All significant genes are indicated as red dots (FDR 10%; two-sided Wilcoxon rank sum test followed by Benjamini-Hochberg multiple testing correction; derived from n = 33 cells and n = 79 cells for RPE-1 and BM510, respectively). Boxplots were defined by minima = 25th percentile - 1.5× interquartile range (IQR), maxima = 75th percentile + 1.5× IQR, center = median, and bounds of box = 25th and 75th percentile. NTRK3 (identified in BM510) is the only significant gene adjacent to an SV breakpoint. Haplotype-resolved RNA expression at the NTRK3 locus is depicted using bar graphs in the right panel (two-sided likelihood ratio test followed by Benjamini-Hochberg multiple testing correction; n = 2 biological replicates; data are presented as mean values +/− SEM). (c-d) Haplotype-specific NO analysis at CREs. The browser track depicts the haplotype-resolved NO of the not rearranged (Ref) homolog in red, and the SV homolog in blue. scNOVA identified two CREs with significant haplotype-specific NO, including an intergenic CRE spanning chr15:87527100-87528100 (p.adjust = 0.029, log2-fold change = −2.01) (c) and an intronic CRE at chr15:88246388-88247388 (p.adjust = 0.076, log2-fold change = −1.39) (d).
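The multiple testing step named in this caption (Benjamini-Hochberg correction, with genes called at FDR 10%) can be sketched as below. This is a generic implementation of the standard procedure, not code from scNOVA; the function name, gene labels and raw P values are hypothetical, and the raw P values are assumed to come from a per-gene two-sided Wilcoxon rank sum test.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene raw P values.
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
raw_p = [0.001, 0.5, 0.03, 0.2]
adj_p = benjamini_hochberg(raw_p)

# Genes passing the 10% FDR threshold used in the caption.
significant = [g for g, p in zip(genes, adj_p) if p < 0.10]
print(significant)
```

With these hypothetical inputs, gene_A and gene_C pass the 10% threshold (adjusted P of about 0.004 and 0.06), while gene_B and gene_D do not.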

Extended Data Fig. 6 Haplotype-specific NO analysis in T-ALL_P1.

(a) For each chromosomal karyogram, the y-axis indicates the significance of haplotype-specific NO at each gene (-log10 p.adjust). Genes with haplotype-specific NO are indicated using red dots (FDR 10%). An inset depicts haplotype-specific NO (two-sided Wilcoxon rank-sum test and Benjamini-Hochberg multiple testing correction; n = 56 cells) and RNA expression at the BCL11B gene locus (two-sided likelihood ratio test and Benjamini-Hochberg multiple testing correction; n = 2 biological replicates), which has a nearby somatic SV (within 1 Megabase) and represents the only predicted local SV effect. (b) We did not detect haplotype-specific NO for TCL1A (two-sided Wilcoxon rank-sum test and Benjamini-Hochberg multiple testing correction; n = 56 cells), a small gene of 4,229 bp, in spite of its haplotype-specific gene expression²⁴ (two-sided likelihood ratio test and Benjamini-Hochberg multiple testing correction; n = 2 biological replicates). Boxplots were defined by minima = 25th percentile - 1.5× interquartile range (IQR), maxima = 75th percentile + 1.5× IQR, center = median, and bounds of box = 25th and 75th percentiles. For bar graphs, data are presented as mean values +/− SEM (a, b). (c) Simulation analysis revealed the minimum gene length (7,219 bp) needed to robustly detect haplotype-specific NO at gene bodies, a length met by 80% of genes in the genome (Supplementary Notes). (d) Inversion breakpoints and rearranged TADs. Known 3’ BCL11B enhancers¹¹² are depicted in orange. In the non-rearranged haplotype, they are located proximal to BCL11B, but in the inverted haplotype they are located far from BCL11B and proximal to TCL1A, within a different TAD. (e) scNOVA identified an intergenic CRE near BCL11B with haplotype-specific NO. The browser track depicts the haplotype-resolved NO of the non-rearranged (Ref) homolog in red and the SV homolog in blue. (f) The known 3’ BCL11B enhancers do not show significant haplotype-specific NO, but the inversion physically relocates them far from BCL11B. A representative CRE is shown from among the four CREs overlapping known 3’ BCL11B enhancers.
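The boxplot convention stated in this legend can be written out as a short sketch. It applies the legend's formula literally (minima = 25th percentile - 1.5× IQR, maxima = 75th percentile + 1.5× IQR); the quartile interpolation method and the example values are assumptions made for illustration, and plotting software may instead draw whiskers to the most extreme data point inside these limits.

```python
import statistics


def boxplot_summary(values):
    """Compute the boxplot elements as defined in the figure legend."""
    # Quartiles using the 'inclusive' method; the interpolation rule
    # is an assumption, since the legend does not specify one.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "minima": q1 - 1.5 * iqr,   # lower whisker limit
        "q1": q1,                   # lower bound of box
        "center": median,           # center line
        "q3": q3,                   # upper bound of box
        "maxima": q3 + 1.5 * iqr,   # upper whisker limit
    }


# Hypothetical per-cell values (illustrative only, not from the paper).
example = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = boxplot_summary(example)
print(summary)
```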

Extended Data Fig. 7 Inference of SCNAs using CITE-seq data from the CLL_24 sample.

(a) InferCNV⁴⁸ analysis of 3,919 high-quality CLL cells and 540 control cells (cells sequenced by CITE-seq not originating from the B-cell lineage; see Supplementary Fig. 25), profiled by CITE-seq. This analysis did not discover any subclones in CLL_24. (Note that the high variability observed on the 6p-arm, seen not only in CLL cells but also in control cells, likely arose from the presence of MHC genes at this locus, whose expression is cell-cycle dependent¹¹³.) (b) CONICSmat-based targeted SCNA re-calling of the 10q-terDel (previously discovered in SCb; see Fig. 4b) using the high-resolution breakpoints derived from Strand-seq. Use of these SV breakpoints allowed CONICSmat to confidently call the 10q-terDel in 82 single cells from the CITE-seq data.

Supplementary information

Supplementary Information

Supplementary Figs. 1–41, descriptions for Tables 1–17, notes for methodological details and Discussion.

Reporting Summary

Supplementary Table

Supplementary Tables 1–17.

Supplementary Data

Snapshots of somatic SV events in LCLs.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jeong, H., Grimes, K., Rauwolf, K.K. et al. Functional analysis of structural variants in single cells using Strand-seq. Nat Biotechnol 41, 832–844 (2023). https://doi.org/10.1038/s41587-022-01551-4

Received: 29 October 2021
Accepted: 07 October 2022
Published: 24 November 2022
Issue Date: June 2023
DOI: https://doi.org/10.1038/s41587-022-01551-4

This article is cited by

Scrambling the genome in cancer: causes and consequences of complex chromosome rearrangements
- Ksenia Krupina
- Alexander Goginashvili
- Don W. Cleveland
Nature Reviews Genetics (2024)
Long-read whole-genome analysis of human single cells
- Joanna Hård
- Jeff E. Mold
- Adam Ameur
Nature Communications (2023)
Unintended CRISPR-Cas9 editing outcomes: a review of the detection and prevalence of structural variants generated by gene-editing in human cells
- John Murray Topp Hunt
- Christopher Allan Samson
- Hilary M. Sheppard
Human Genetics (2023)