Detection of early seeding of Richter transformation in chronic lymphocytic leukemia

Richter transformation (RT) is a paradigmatic evolution of chronic lymphocytic leukemia (CLL) into a very aggressive large B cell lymphoma conferring a dismal prognosis. The mechanisms driving RT remain largely unknown. We characterized the whole genome, epigenome and transcriptome, combined with single-cell DNA/RNA-sequencing analyses and functional experiments, of 19 cases of CLL developing RT. Studying 54 longitudinal samples covering up to 19 years of disease course, we uncovered minute subclones carrying genomic, immunogenetic and transcriptomic features of RT cells already at CLL diagnosis, which were dormant for up to 19 years before transformation. We also identified new driver alterations, discovered a new mutational signature (SBS-RT), recognized an oxidative phosphorylation (OXPHOS)high–B cell receptor (BCR)low-signaling transcriptional axis in RT and showed that OXPHOS inhibition reduces the proliferation of RT cells. These findings demonstrate the early seeding of subclones driving advanced stages of cancer evolution and uncover potential therapeutic targets for RT.

C lonal evolution 1 drives cancer initiation, progression and relapse due to the stepwise acquisition and/or selection of fitter subclones 2,3 . The understanding of tumor evolution is hampered by the analysis of bulk tumor cell populations at low resolution and at single or limited time points of the disease course in most studies 4 . A better knowledge of this process might translate into anticipation-based treatment strategies 5 . RT in CLL represents a paradigmatic model of cancer evolution occurring rarely in treatment-naive patients with CLL but found in 4-20% of patients after chemoimmunotherapy (CIT) and targeted therapies 6 . RT sometimes occurs within the first months after treatment initiation [7][8][9] , suggesting selection of pre-existing subclones 10 . Nonetheless, the genomic/epigenomic mechanisms driving RT after CIT [11][12][13][14][15][16][17] or targeted agents [18][19][20][21] are not well known. The aims of the present study were to reconstruct the evolutionary history of RT and to reveal the molecular processes underlying this transformation. plasmablastic lymphoma (RT-PBL; n = 1) or prolymphocytic leukemia (RT-PLL; n = 1). Nontumor samples were available in 12 patients. RT occurred simultaneously with CLL at diagnosis (n = 3) or after up to 19 years following different lines of treatment with CIT (n = 6) and targeted therapies (n = 10; BCR inhibitors, ibrutinib n = 6; duvelisib n = 2; idelalisib n = 1; and BCL2 inhibitor, venetoclax n = 1). All instances of RT were clonally related to CLL, 15 tumors had unmutated IGHV (U-CLL) and 4 had mutated IGHV (M-CLL). Whole-genome sequencing (WGS) data were integrated with bulk epigenetic and transcriptomic analyses as well as single-cell DNA and RNA sequencing (Fig. 1a, Extended Data Fig. 1 and Supplementary Tables 1 and 2).
The WGS and epigenome of CLL and RT revealed a concordant increased complexity from CLL diagnosis to relapse and RT (Fig. 1b, Extended Data Fig. 2a and Supplementary Tables 3-8). The RT genomes carried a median of 1.8 mutations per megabase, 18 copy number alterations (CNAs) and 37 structural variants (SVs) that contrasted with 1.1 mutations per megabase, 4 CNAs and 5 SVs observed at CLL diagnosis. No major differences were seen among RT occurring after different therapies ( Fig. 1b and Extended Data Fig. 2b). We discovered new driver genes and mechanisms in RT, expanding previous observations [12][13][14][15][16][17][18][21][22][23][24]  The main alterations involved cell-cycle regulators (17 of 19, 89%), chromatin modifiers (79%), MYC (74%), nuclear factor (NF)-κB (74%) and NOTCH (32%) pathways. These aberrations were simultaneously present in most cases but alterations in MYC and NOTCH pathways only co-occurred in 2 of 19 cases (Fig. 1c). Aberrations in genes such as TP53, NOTCH1, BIRC3, EGR2 and NFKBIE were usually present and clonally dominant after the first CLL sample, whereas others were only detected at RT or during the disease course (for example CDKN2A/B, CDKN1A/B, ARID1A, CREBBP, TRAF3 and TNFAIP3) (Fig. 1c). New alterations included deletions of CDKN1A and CDKN1B in five cases of RT associated with downregulation of their expression, one immunoglobulin (IG)-CDK6 translocation and one CCND2 mutation already present at CLL diagnosis, and CCND3-IG and MYCN-IG translocations acquired at RT in two different cases (Fig. 1d,e, Extended Data Fig. 3a,b and Supplementary Table 11). Most chromatin remodelers were affected by deletions with reduced gene expression. New alterations in this group were deletions of ARID4B and truncations of CREBBP 25 and SMARCA4 (ref. 16 ) by translocations and chromoplexy ( Fig. 1f and Extended Data Fig. 3c-e). We also identified recurrent IRF4 alterations in RT, which have been linked to increased MYC levels in CLL 26 . BTK/PLCG2 or BCL2 mutations were not detected in any RT after treatment with BCR or BCL2 inhibitors, respectively. Notably, the two cases of M-CLL developing RT after targeted therapies carried the IGLV3-21 R110 mutation, which triggers cell-autonomous BCR signaling 27 (Fig. 1c).
In addition to the high frequency of CNAs previously identified in RT 11,14 , we observed a high number of complex structural alterations (Fig. 1c). Chromothripsis was found in eight RT tumors targeting CDKN2A/B and the new CDKN1B in five and one cases, respectively, and MYC, MGA, SPEN, TNFAIP3 and chromatin remodeling genes in additional cases ( Fig. 1g and Extended Data Fig. 3f-j).
Altogether, our analyses expand the catalog of driver genes, pathways and mechanisms involved in RT and recognize a similar distribution of these alterations in RT after different therapies, suggesting that treatment-specific pressure is not a major determinant of the driver genomic landscape of these tumors.

New mutational processes in RT.
To understand the increased mutational burden of RT, we explored the mutational processes re-shaping the genome of CLL and RT. An unsupervised analysis showed that the mutational profile of RT was notably different from M-CLL and U-CLL before therapy (ICGC-CLL cohort, n = 147) 28 or at post-treatment relapse (independent cohort of 27 CLL post-treatment samples) (Fig. 2a). We identified 11 mutational signatures distributed genome-wide and 2 in clustered mutations (Extended Data Fig. 4 and Supplementary . Among the former, we extracted a new signature characterized by (T>A)A and, to a lesser extent, (T>C/G)A mutations not recognized previously in any cancer type, including CLL and DLBCL [28][29][30][31][32][33] . We named this single-base substitution signature, SBS-RT (Fig. 2b). SBS-RT was present in the RT sample of 7 of 18 patients, 1 of 6 after CIT and 6 of 10 after multiple therapies, including targeted agents and detected in all subtypes of transformation (RT-DLBCL, RT-PBL and RT-PLL) ( Fig. 2c and Supplementary Table 15). It was also present in CLL samples before RT in patients 12 and 3,299 but was not identified in the reanalysis of our ICGC-CLL or post-treatment CLL cohorts. None of the patients in these two additional cohorts had evidence of RT (median follow-up 9.  Table 16).
Among the remaining ten genome-wide signatures, five were previously identified in CLL and DLBCL (SBS1 and SBS5 (clock-like), SBS8 (unknown etiology), SBS9 (attributed to polymerase eta) and SBS18 (possibly damage by reactive oxygen species)); three had been only found in DLBCL (SBS2 and SBS13 (APOBEC enzymes) and SBS17b (unknown)); and two have been recently described related to treatments with melphalan 34 or ganciclovir 35 , which were named here as SBS-melphalan and SBS-ganciclovir, respectively (Fig. 2b,c and Extended Data Fig. 4). SBS-melphalan was found in three RT cases, two had received melphalan as a conditioning of their allogenic stem-cell transplant 1.9 and 4.2 years before RT, respectively. SBS-ganciclovir was found in the RT sample of one patient that had received valganciclovir (prodrug of ganciclovir) due to cytomegalovirus reactivation (Fig. 2c,d and Extended Data Fig. 1a). Notably, all cases with the new SBS-RT at time of RT had been treated with the alkylating agents bendamustine (n = 5) or chlorambucil (n = 2) during their CLL history at a median of 2.9 years (range 0.7 to 6.8) before RT. Contrarily, RT cases lacking the SBS-RT had never received these drugs (Fig. 2c,d and Extended Data Fig. 1a).
RT subclones also acquired kataegis, mainly within the immunoglobulin loci, attributed to activation-induced cytidine deaminase (AID) activity (SBS84 and SBS85) 29,32 ( Fig. 2i and Extended Data Fig. 4). These kataegis led to the acquisition of mutations in the rearranged V(D)J gene in five RT cases (one after CIT and four targeted therapies) (Fig. 2i, Extended Data Fig. 5i,j and Supplementary Table  19). This canonical AID activity in RT is concordant with the acquisition of SBS9 mutations in two RT samples (4,686 (CIT) and 3,495 (targeted therapies)) and SVs mediated by aberrant class-switch recombination or somatic hypermutation in six RT (one before therapy, two CIT and three new agents), which targeted MYC, MYCN, TRAF3 and CCND3 ( Fig. 1c and Supplementary Table 2).
SBS-RT mutations were found in CLL samples before the transformation in patient 3,299 although it was only present in the RT subclone (Fig. 2c,e). SBS-RT was also found in two different subclones in case 12 and 19. We speculated that these secondary subclones with SBS-RT (named 'RT-like' subclones) could correspond to the single-cell expansion of a 'transformed' cell that could have been missed by the routine analysis (Fig. 2e). The reanalysis of flow cytometry data available for case 12 detected two cell populations at time point (T) 4 differing in size and surface markers (likely CLL and RT-like subclones), whereas at T5 we detected an additional population of large cells (RT subclone, 0.2% cells) that expanded at T6, substituting the previous large cell population (RT-like subclone) ( Fig. 2j and Extended Data Fig. 5k-m). WGS analysis showed that the RT-like and RT subclones diverged from a cell carrying a deletion of CDKN2A/B and truncation of CREBBP, each acquiring more than 2,100 specific mutations (Fig. 2e,j).
Altogether, these findings show that RT may arise simultaneously from different subclones and that such subclones can be detectable time before their final expansion and clinical manifestation. The identification of mutations in RT associated with early-in-time CLL therapies demonstrates that RT emerges from the clonal expansion of a single cell previously exposed to these therapies.

Dormant seeds of RT at CLL diagnosis.
The WGS-based subclonal phylogeny of the nine patients with fully characterized longitudinal samples predicted that the RT subclone was present at low cancer cell fraction (CCF) in the preceding CLL samples in five (56%) patients and only detected at time of transformation in the remaining four (44%) (Fig. 3a). Indeed, the RT subclone was detected at time of CLL diagnosis in three of five patients, remained stable at a minute size (<1%) for 6-19 years of natural and treatment-influenced CLL course and expanded at the moment of clinical manifestations (patients 12, 19 and 63) (Fig. 3a). In the other two patients, the RT subclone was also detected in the first CLL sample analyzed but rapidly expanded driving the RT 0.6 and 3.5 years later in patients 3,034 and 3,299 (RT-PLL), respectively ( Fig. 3a and Extended Data Fig. 6).
We next performed single-cell DNA sequencing (scDNA-seq) of 32 genes in 16 (Fig. 3b), the RT subclone (subclone 5) at transformation (T6) carried CDKN2A/B and TP53 (p.G245D) alterations, whereas the main CLL subclones driving the relapse after therapy at T4 and T5 harbored a different TP53 mutation (p.I195T; subclones 3 and 4). The WGS predicted the presence of all these subclones at CLL diagnosis (T1). Using scDNA-seq we identified two small populations accounting for 0.1% of cells carrying the TP53 p.I195T and p.G245D mutations, respectively, at T1, which were also detected at relapse 7.2 years later (T3). The subclone carrying TP53 p.I195T expanded to dominate the second relapse after 3.7 years at T4 and T5 but was substituted by the subclone carrying TP53 p.G245D at T6 in the RT 14.4 years after diagnosis. All these subclones carried the SF3B1 and NOTCH1 mutations of the initial CLL subclone ( Fig. 3c and Supplementary Table 20). The scDNA-seq of the three additional cases also corroborated the phylogenies and most of the dynamics inferred from WGS (Extended Data Fig. 6a). These results suggest that CLL evolution to RT is characterized by an early driver diversification probably generated before diagnosis, consistent with the early immunogenetic and DNA methylation diversification previously reported in CLL 37-39 and that RT may emerge by a selection of pre-existing subclones carrying potent driver mutations rather than a de novo acquisition of leading clones.
As we identified five cases of RT carrying specific mutations in the immunoglobulin genes by WGS (Fig. 2i), we analyzed whether these immunoglobulin-based RT subclones were already present at CLL diagnosis using high-coverage NGS in patients 12 and 3,495 (Supplementary Table 21). Focusing on patient 3,495, for which the lack of germline material precluded our phylogenetic analyses, the RT occurring after treatment with ibrutinib harbored two new V(D)J mutations generating an unproductive IGH gene. NGS identified 0.002% sequences carrying the same two mutations at CLL diagnosis 1.72 years before (Fig. 3d). We also observed the expansion of additional unproductive subclones accounting for 11.8% of all sequences at time of RT, suggesting that BCR-independent subclones may have a proliferative advantage under therapy with BCR inhibitors (Fig. 3d). Similar results were found in patient 12 in which the V(D)J sequence of RT carrying a new mutation was already identified at CLL diagnosis 19.5 years before at DNA and RNA level (Fig. 3e). As the immunogenetic features represent a faithful imprint of the B cell of origin, the early identification of the same immunogenetic subclone provides further evidence for an early seeding of RT.
We finally tracked RT subclones during the disease course using single-cell RNA sequencing (scRNA-seq) of 19 longitudinal samples of five patients (24,800 tumor cells passing filters, mean of 1,305 cells per sample; Fig. 1a and Supplementary Table 22). As expected, RT and CLL cells had remarkably different gene expression profiles ( Fig. 3f and Extended Data Fig. 7a-d). The transcriptome of CLL cells was dominated by three main clusters identified across patients and characterized by different expression of CXCR4, CD27 and MIR155HG, respectively, which may represent the recirculation of CLL cells between peripheral blood and lymph nodes 40-42 (Fig. 3f,g and Extended Data Fig. 7a-d). Contrarily, RT intraclonal heterogeneity was mainly related to distinct proliferative capacities with a cluster of cells showing high MKI67 and PCNA expression as well as high S and G2M cell-cycle phase scores. The remaining RT clusters were characterized by the expression of different marker genes among patients, including CCND2, MIR155HG and TP53INP1 (  Fig. 7a-i). The presence and dynamics of these RT subclones according to their transcriptomic profile recapitulated the findings obtained by WGS, scDNA-seq and immunoglobulin analyses in all five patients, suggesting that they captured the same cells. Indeed, using scRNA-seq we could identify the CNAs involved in simple and complex structural alterations found at time of RT by WGS already in the dormant RT cells at CLL diagnosis and subsequent time points before their final expansion ( Fig. 3j and Extended Data Fig. 8). These findings suggest an early acquisition of SVs, including chromothripsis and transcriptomic identity in RT.
To validate our observations, we reanalyzed the longitudinal scRNA-seq dataset from Penter et al. 43 consisting of nine patients with CLL, one of which developed RT. In this case, we identified RT cells in the CLL sample collected 1.6 years before the RT (Extended Data Fig. 7j). Overall, our integrative analyses uncovered a widespread early seeding of RT cells up to 19 years before their expansion and clinical manifestation.
OXPHOS high -BCR low transcriptional axis of RT. To understand the transcriptomic evolution from CLL to RT and its epigenomic regulation, we integrated genome-wide profiles of DNA methylation, chromatin activation (H3K27ac) and chromatin accessibility (ATAC-seq) with bulk RNA-seq and scRNA-seq of multiple longitudinal samples of six patients treated with BCR inhibitors (Fig.  1a). The DNA methylome of RT mainly reflected the naive and memory-like B cell derivation of their CLL counterpart, whereas chromatin activation and accessibility were remarkably different upon transformation (Fig. 4a). We identified 150 regions with increased H3K27ac and 426 regions that gained accessibility in RT (  Table 7). The top TF was TEAD4, which          *T2  T1  *T4  *T3  T2  T1  T2  T1  T2  T1  T2  T1  T2  T1  T5 Table 6).
Genes upregulated in RT involved pathways that seem independent of BCR signaling such as Wnt (WNT5A and others) 50 , Toll-like receptors (TLR9 among others) 51 and a number of cyclin-dependent kinases. Downregulated genes included, among others, CXCR4, HLA-A/B and chromatin remodelers also targeted by genetic alterations in some cases ( Fig. 4d and Extended Data Fig. 10b,c). Gene sets modulated by gene expression in RT were in harmony with the identified chromatin-based changes and included upregulation of E2F targets, G2M checkpoints, MYC targets, MTORC1 signaling, OXPHOS, mitochondrial translation, glycolysis, reactive oxygen species and DNA repair pathways, among others. In addition, RT showed downmodulation of BCR signaling (Fig. 4g,h, Extended Data Fig. 10d and Supplementary Table 11). The OXPHOS high -BCR low pattern observed by bulk RNA-seq in RT was further refined using scRNA-seq: two of five tumors had OXPHOS high -BCR low (12 and 63, although the latter showed some intercluster variability), the two M-CLL carrying IGLV3-21 R110 had RT with BCR expression similar to CLL and were OXPHOS high -BCR normal (365) or OXPHOS normal -BCR normal (19) and the RT-PLL (3,299) was OXPHOS low -BCR low (Fig. 4i, Extended Data Fig. 10e-j and Supplementary Table 23). In addition, the scRNA-seq analysis showed that the OXPHOS/ BCR profiles of RT were already identified in the early dormant RT cells, suggesting that they might represent an intrinsic characteristic of RT cells rather than being modulated by BCR inhibitors (Fig. 4j and Extended Data Fig. 10g-j). To expand these observations, we measured the expression of OXPHOS and BCR pathways in the scRNA-seq dataset from Penter et al. 43 . Case CLL9, which   developed RT in the absence of any therapy, showed a remarkably higher OXPHOS and slightly lower BCR expression at time of RT compared to CLL ( Fig. 4k and Extended Data Fig. 10k,l). Overall, the epigenome and transcriptome of RT converge to an OXPHOS high -BCR low axis reminiscent of that observed in the de novo DLBCL subtype characterized by high OXPHOS (DLBCL-OXPHOS) and insensitive to BCR inhibition 52-54 . This axis might explain the selection and rapid expansion of small RT subclones under therapy with BCR inhibitors.
OXPHOS and BCR activity in RT. We next validated experimentally the OXPHOS and BCR activity of RT in samples of patients 12, 19 and 63. Respirometry assays confirmed that OXPHOS high RT cells (patients 12 and 63) had a 3.5-fold higher oxygen consumption at routine respiration and fivefold higher electron transfer system capacity (ETC) compared to CLL. In addition, OXPHOS normal RT (patient 19) showed a routine oxygen consumption similar to CLL, although also had a relatively higher ETC than its CLL counterpart (Fig. 5a, Supplementary Fig. 3a 19) and all of them were higher than their respective CLL. OXPHOS inhibition resulted in a marked decrease in proliferation in OXPHOS high RT (mean 49.1%), which contrasted with that observed in OXPHOS normal RT (2.2% decrease) and CLL (23.2% decrease) ( Fig. 5c and Supplementary  Fig. 4d). Overall, these results confirm the role of OXPHOS high phenotype in high proliferation of RT and suggest its potential therapeutic value in RT as proposed for other neoplasms 53-57 .

Discussion
The genome of RT is characterized by a compendium of driver alterations in cell cycle, MYC, NOTCH and NF-κB pathways, frequently targeted in single catastrophic events and by the footprints of early-in-time, treatment-related, mutational processes, including the new SBS-RT potentially associated with bendamustine and chlorambucil exposure. A very early diversification of CLL leads to emergence of RT cells with fully assembled genomic, immunogenetic and transcriptomic profiles already at CLL diagnosis up to 19 years before the clonal explosion associated with the clinical transformation. RT cells have a notable shift in chromatin configuration and transcriptional program that converges into activation of the OXPHOS pathway and downregulation of BCR signaling, the latter potentially compensated by activating Toll-like, MYC and MAPK pathways 17,51,58,59 . The rapid expansion of RT subclones under treatment with BCR inhibitors is consistent with its low BCR signaling, except when carrying the IGLV3-21 R110 and further supported by the increased number of subclones carrying unproductive immunoglobulin genes and the development of RT with plasmablastic differentiation, a cell type independent of BCR signaling 60 . Finally, we also uncovered that OXPHOS inhibition reduced the proliferation of RT cells in vitro, a finding worth exploring in future therapeutic strategies 55,57 .
In conclusion, our comprehensive characterization of CLL evolution toward RT has revealed new genomic drivers and epigenomic reconfiguration with very early emergence of subclones driving late stages of cancer evolution, which may set the basis for developing single-cell-based predictive strategies. Furthermore, this study also identifies new RT-specific therapeutic targets and suggests that early intervention to eradicate dormant RT subclones may prevent the future development of this lethal complication of CLL.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/ s41591-022-01927-8. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http:// creativecommons.org/licenses/by/4.0/.

NATuRE MEDiCiNE
Methods Consent and sample processing. Written informed consent was obtained from all patients. The study was approved by the Hospital Clinic of Barcelona Ethics Committee. Tumor DNA was extracted from tumor cells purified from fresh/cryopreserved mononuclear cells, frozen lymph nodes or formalin-fixed paraffin-embedded (FFPE) tissue (n = 1, CLL sample of patient 1,669). Germline DNA was obtained from the non-tumoral purified cell fraction in 12 cases. In two patients (1,523 and 4,675) who had received allogeneic stem-cell transplant before RT, germline DNA of the donor was also collected. All  A specific flow cytometry analysis was conducted on peripheral blood samples of patient 12, which were stained with the Lymphocyte Screening Tube according to EuroFlow protocols (https://www.euroflow.org/protocols). At least 100,000 cells were acquired in a FACSCanto II instrument. Analysis was conducted using the Infinicyt 2.0 software. The sequential gating analysis was as follows: singlet identification in a FSC-W versus FSC-H plot; leukocyte identification in SSC-A versus CD45 (V500-C) plot and FSC-A versus SSC-A; lymphocytes identified as SSC-A low and CD45 high and back-gated in FSC-A versus SSC-A to exclude monocytes; in the lymphocyte gate, T cells were identified as CD3 + cells in SSC-A versus CD3 (APC) followed by sequentially distinguishing TCRγδ + T cells, CD4 T cells and CD8 T cells; after excluding T cells, B cells were selected in a SSC-A versus CD19 (PE-Cy7), followed by inspection of CD19 (PECy7) versus CD20 (PacB), CD5 (PerCPCy5.5) versus CD20 (PacB) and CD20 (PacB) versus CD38 (APC-H7) plots to evaluate the expression of these B cell markers and the assignation of κ and λ expression in a plot of IgK (PE) versus IgL (FITC); after excluding B cells, natural killer cells were identified in a SSC-A versus CD56 (PE) plot followed by SSC-A versus CD38 (APC-H7) plot. General considerations. Overall, 12 patients had a complete dataset (germline, CLL and RT samples), 6 patients lacked germline DNA and 1 patient had only the RT sample (case 4,676). We conducted tumor versus normal analyses in cases with a complete dataset. For the six patients lacking the germline sample, we used the CLL samples as 'normal' to identify SNV acquired at RT for mutational signature analyses. In addition, tumor-only analyses were conducted in these CLL and RT samples, as well as in the patient with only a RT sample available, to identify driver gene mutations and genome-wide CNAs (Supplementary Table 1).
Tumor-only CNA calling. CNAs were extracted using CNVkit (v.0.9.3) 80 . CNAs <500 kb, with an absolute log 2 copy ratio (log 2 CR) < 0.3 or located within any of the immunoglobulin loci were removed. CNAs were classified as gains if log 2 CR > 0.3, deletions if log 2 CR < −0.3, high-copy gains if log 2 CR > 1.1 and homozygous deletions if log 2 CR < −1.1. The log 2 CR cutoff was set to 0.15 for two samples with low tumor cell content (102-01-01TD and 4690-03-01BD). To avoid a high segmentation of the CNA profile, CNAs belonging to the same class were merged if they were separated by <1 Mb and had an absolute log 2 CR difference <0.25.

Array-based CNA calling in FFPE.
CNAs were examined in the FFPE CLL sample using the Oncoscan CNV FFPE Assay kit (Thermo Fisher Scientific, cat. no. 902695) and analyzed using Nexus 9.0 software (Biodiscovery).
Tumor versus normal SV calling. SVs were extracted using SMuFin (v.0.9.4) 68 , BRASS (v.6.0.5) 81 , SvABA (v.7.0.2) 70 and DELLY2 (v.0.8.1) 82 . SVs identified were intersected considering a window of 300 bp around break points. We kept for downstream analyses the SVs identified by at least two programs if at least one of the algorithms called the alteration with high quality (MAPQ ≥ 90 for BRASS, MAPQ = 60 for SvABA and DELLY2). In addition, IgCaller (v.1.2) 62 was used to call SVs within any of the immunoglobulin loci. All SVs were visually inspected using IGV 63 . SVs were categorized into simple or complex events. Chromothripsis 83 was defined as ≥7 oscillating changes between two or three copy number states or the presence of >7 SV break points occurring in a single chromosome and supported by additional criteria 83,84 . Chromoplexy was determined by the presence of ≥3 chained chromosomal rearrangements, where chains were identified using a window of 50 kb 85,86 . Cycles of templated insertions were defined as copy number gains in ≥3 chromosomes linked by SVs 87 . Breakage-fusion bridge cycles were defined as patterns of focal copy number increases and fold-back inversions, together with telomeric deletions. Chains of rearrangements having >2 SVs and not fulfilling any of the previous criteria were classified as 'other complex events' . Chromothripsis and 'other complex events' were subcategorized according to the number of chromosomes involved. The longitudinal nature of our dataset allowed us to refine the obtained classification based on the presence of the involved alterations in each time point analyzed.
Patients who underwent allogenic stem-cell transplant. In these patients, we conducted tumor versus patient's germline and tumor versus donor's germline variant calling in parallel. Only the intersection of variants identified was considered.

WGS-based subclonal reconstruction.
A Markov chain Monte Carlo sampler for a Dirichlet process mixture model was used to infer putative subclones, to assign mutations to subclones and to estimate the subclone frequencies in each sample from the SNV read counts, copy number states and tumor purities (Supplementary Table 17) 78,88 . Clusters with <100 mutations were excluded. The phylogenetic relationships between subclones were identified following the 'pigeonhole principle' , which was relaxed using a case-specific 'tolerated error' 88 . Clusters not assigned to the reconstructed phylogenetic tree were excluded. Fish plots were generated using the TimeScape R package (v.1.6.0). The CCF of indels was calculated integrating read counts, CNAs and tumor purity 89 . Driver indels subjected to validation by scDNA-seq and/or relevant to the tumor phylogeny were manually assigned to subclones. Similarly, driver CNAs relevant to the phylogeny were manually assigned. Seven SNVs found in TP53/ATM overlapping with CNAs were manually assigned to the most likely subclone as they were not automatically assigned by the Dirichlet process and were subjected to scDNA-seq (Supplementary Table 9).
Mutational signatures. We studied mutational signatures acting genome-wide and in localized regions (inter-mutation distance ≤1Kb) 29,32 . We integrated the mutations identified in this CLL/RT cohort together with those of 147 CLL treatment-naive samples (ICGC-CLL) 28 and 27 new CLL collected at relapse post-treatment (mean coverage 31.5×; Supplementary Table 15). The WGS of these two additional cohorts was (re-)analyzed using our current bioinformatic pipeline (Supplementary Table 12). Mutational signatures were analyzed for SNVs or single-base substitutions (SBSs) according to their 5′ and 3′ flanking bases following three steps 30 :  . 4a); (2) APOBEC signatures (SBS2 and SBS13) were favored to be assigned to one of the signatures extracted by HDP and SignatureAnalyzer although it was not the best EM solution probably because they were only found in one sample, which impaired a clean extraction of the signatures (Extended Data Fig. 4f); and (3) one signature extracted by HDP and SignatureAnalyzer was directly assigned to the mutational signature associated with ganciclovir treatment 35 (cosine similarity 0.987 and 0.993, respectively) (Extended Data Fig. 4). The new SBS-RT extracted by HDP was considered for downstream analyses as it had less background noise than the one extracted by SignatureAnalyzer, favoring a higher specificity during the fitting step. Similarly, the SBS-ganciclovir extracted by HDP was used in downstream analyses (Extended Data Fig. 4). We also performed a detailed review to remove signatures susceptible of being originated due to sequencing artifacts (Supplementary Table 13). 3. Fitting: we used a fitting approach (MutationalPatterns, v.3.0.1) to measure the contribution of each mutational signature in each sample. Based on (1) the de novo identification of the therapy-related SBS-ganciclovir and (2) that two patients received melphalan before RT, the mutational signature associated with melphalan therapy 34 was also included in this step.  Table 15). To assess the contribution of each signature to each subclone we followed the same fitting strategy but (1) considered only the signatures that were present in the corresponding sample and (2) removed the final step of adding SBS9 in M-CLL to avoid its addition in multiple subclones with low evidence.

Genomic locations and strand bias.
We assessed the contribution of SBS-RT to coding SNVs in RT subclones (also including cases in which the CLL sample was used as a 'germline') by calculating the probability that a given mutation was caused by SBS-RT. To perform this calculation, we considered the signatures present in the subclone/sample and their signature profile 92 . The reference epigenomes of CLL 44 were used to explore the contribution of the mutational processes in different regulatory regions. We simplified the described chromatin states in four categories: heterochromatin (H3K9me3_Repressed, Heterochromatin Low_Signal), polycomb (Posied_Promoter, H3K27me3_Repressed), enhancer/promoter (Active_Promoter, Strong_Enhancer1, Weak_Promoter, Weak_Enhancer, Strong_Enhancer) and transcription (Transcription_Transition, Weak_Transcription, Transcription_ Elongation). We also mapped the activity of mutational processes in early/late replication regions of the genome considering peaks/valleys of early/late replication as those regions of ≥1 kb with absolute replication timing >0.5 (ref. 93 ). All SNVs of the CLL and RT subclones were classified in any of the four chromatin states and early/late replication regions before fitting mutational signatures. A cutoff of 0.005 was used to remove the less-contributing signature during the fitting step. We also generated replication and transcriptional strand bias profiles of the RT-specific mutations using the MutationalPatterns R package 34 . The replication strand was annotated based on the left/right replication direction of the timing transition regions 94 67 . Allele positions lacking mutations by WGS were used to model the background sequencing noise, which was unified according to the trinucleotide context of each possible mutation. Mutations of interest were annotated as high confidence when their frequency was above the background noise with a probability of 95%.   Table 6). R and core Bioconductor packages, including minfi (v.1.34.0) 96 , were used to integrate and normalize DNA methylation data 49 . We removed non-CpG probes, CpGs representing single nucleotide polymorphisms, CpGs with individual-specific methylation previously reported in B cells, CpGs in sex chromosomes and CpGs with a detection P value >0.01 in >10% of the samples. The data were normalized using the SWAN algorithm and CpGs were annotated using the IlluminaHumanMethylationEPICanno.ilm10b4. hg19 package (v.0.6). Tumor cell content of each sample was inferred from DNA methylation 49 and samples with a tumor cell content <60% were excluded. After all filtering criteria, we retained 33 samples (NBCs, n = 2; GCs, n = 1; MBCs, n = 3; tPCs, n = 1; CLL controls, n = 12; CLL/RT samples, n = 14 (six patients); Supplementary Table 6).

Differential analyses, CLL epitypes and epiCMIT.
We compared the DNA methylation status of each CpG to the mean of such CpGs in NBCs to calculate the number of hyper-and hypomethylation changes per CLL/RT sample. Changes in each sample were defined based on a minimum difference of 0.25 methylation. To perform a differential analysis between CLL and RT, we compared the DNA methylation of each CpG in each CLL sample (first available time point used) versus their respective RT sample. Differentially methylated CpGs were considered as those showing a minimum difference of 0.25 in at least four of the five longitudinal cases of RT versus CLL analyzed (Supplementary Table 6). The epigenetic subtypes (epitypes) and epiCMIT score for each CLL and RT sample were calculated 49 .
Read mapping and initial data processing. FASTQ files were aligned to the reference genome (GRCh38) using BWA-ALN (v.0.7.7, parameter: -q 5) 61 , duplicated reads were marked using Picard tools (v.2.8.1) and low-quality and duplicated reads were removed using SAMtools (v.1.3.1, parameters: -b -F 4 -q 5 -b -F 1,024) 67 . PhantomPeakQualTools (v.1.1.0) were used to generate wiggle plots and for extracting the predominant insert-size. Peaks were called using MACS2 (v.2.1.1.20160309, parameters for H3K27ac: -g hs -q 0.05 -keep-dup all -nomodel -extsize insert-size; parameters for ATAC-seq: -g hs -q 0.05-keep-dup all -f BAM -nomodel -shift −96 -extsize 200; no input control) 97 . Peaks with q values <1 × 10 −3 were included for downstream analyses. For each mark separately, a set of consensus peaks, including regions within chromosomes 1-22 and present in published healthy B cells 44 and CLL samples was generated by merging the locations of the separate peaks per individual sample. For ChIP-seq, the numbers of reads per sample per consensus peak were calculated using the genomecov function (bedtools, v.2.25.0). For ATAC-seq, the number of Tn5 transposase insertions per sample per consensus peak was calculated by first determining the estimated insertion sites (shifting the start of the first mate 4 bp downstream) before using the genomecov function. Variance stabilizing transformation (VST) values were calculated for all consensus peaks using DESeq2 (v.1.28.1) 98 , which were then corrected for the consensus SPOT score (the percentage of reads that fall within the consensus peaks) using the ComBat function (sva R package, v.3.36.0). To that purpose, the cell condition (tumor and different healthy B cell subtypes) was assigned to each sample and samples were clustered in 20 bins of 5% according to their consensus SPOT score. The bins on the extremes, which contained fewer than five samples, were joined with their neighboring bins to ensure that each bin contained five samples or more. PCA was generated using the corrected VST values of peaks that were present in more than one sample.
Detection of differential epigenetic regions and RT-specific changes. We first determined the regions with stable epigenetic profiles in the healthy B cell counterparts (NBCs and MBCs) by applying a threshold of s.d. < 0.8 with respect to the mean value. For all these NBC/MBC stable regions, we then calculated the log 2 FC between the mean of VST-corrected healthy B cell values and each of the tumor samples. Due to the data distribution variability, we applied slightly different thresholds of log 2 FC for each case (Supplementary Tables 7 and 8). To identify regions changing in RT for each case individually, we selected the regions that presented substantial epigenetic changes as compared to the normal counterpart and to the previous CLL (absolute log 2 FC > 1). The ATAC-seq RT-specific signature encompassed differential regions common in two or more cases of RT, whereas the H3K27ac RT-specific signature included differential regions common in three or more cases. Potential protein-coding target genes were assigned to each of the RT-specific regions using two strategies. To identify close target genes, we took the overlap with the regions of genes of interest adding 2 kb upstream of their transcription start site. To identify distant target genes, we used Hi-C data from the GM12878 cell line and selected all genes located within the same topologically associated domain as the region of interest. We only considered DEGs identified by bulk RNA-seq (Supplementary Tables 7 and 8).
Transcription factor analysis. Enrichment for TF-binding sites was analyzed in chromatin accessible regions within the RT-specific active chromatin regions. Accessible peaks were determined as regions with presence of ATAC peaks in two or more RT cases. Enrichment analysis of known TF-binding motifs was performed using the AME tool (MEME suite) considering the non-redundant Homo sapiens 2020 Jaspar database and applying one-tailed Wilcoxon rank-sum tests with the maximum score of the sequence, a 0.01 FDR cutoff and a background formed by reference GRCh38 sequences extracted from the consensus ATAC-seq peaks (91,671 regions). We then established the occupancy of these motifs in RT and CLL by calculating the percentage of the target RT-specific active regions and of the regions with increased H3K27ac in CLL, respectively, which contained these motifs. Finally, we selected TFs presenting an occupancy difference between RT and CLL ≥ 10% and overexpressed in RT (bulk RNA-seq, log 2 FC > 0, adjusted P value <0.01). Non-ribosomal reads were trimmed using Trimmomatic (v.0.38) 95 . Gene-level counts (GRCh38.p13, Ensembl release 100) were calculated using kallisto (v.0.46.1) 100 and tximport (v.1.14.2). A paired DEA was conducted using DESeq2 (v.1.26.0) 98 . Adjusted P value <0.01 and absolute log 2 (fold change) > 1 were used to identify DEGs. Gene set enrichment analysis (GSEA) was conducted using a pre-ranked gene list ordered by −log 10 (P) × (sign of fold change) using the 'GSEA' function (clusterProfiler R package, v.3.14.3). We focused on C2 (curated) and Hallmark gene sets from the Molecular Signatures Database (v.7.4) with a minimal size of 10 and maximal size of 250. Gene ontology (GO) GSEA was conducted using the pre-ranked gene list as input of the 'gseGO' function (clusterProfiler) focusing on biological processes. Redundancy in the output list of GO terms was removed using the 'simplify' function (cutoff of 0.35).  101 . Genotypes were encoded as zero for wild-type, one for heterozygous mutation, two for homozygous mutation and three for missing data. ∞SCITE was used to find the mutation tree that best fitted the genotypes observed and to assign cells into subclones. ∞SCITE was run using a global sequencing error rate (false-positive rate) of 1% 102 , an estimated rate of non-mutated sites called as homozygous mutations of 0% and a patient-specific estimated rate of the allele dropout rate (false-negative rate). For each patient, the estimated rate of missed heterozygous mutations (dropout of the mutated allele) and the estimated rate of heterozygous mutations called as homozygous mutations (dropout of the normal allele) were calculated from germline single-nucleotide polymorphisms reported in gnomAD with a population frequency >1% and called as mutated in at least 75% of cells with a VAF per read count between 47% and 53% according to Tapestri Insights. Patient-specific allele dropout rates were calculated for all patients except for patient 365, which did not have any heterozygous polymorphisms fulfilling the previous criteria. In this case, we used an allele dropout rate of 0.07, which is within the range measured in the other cases. We ran ∞SCITE with and without considering NOTCH1 mutations and manually curated the result of patient 3,299 carrying an RPS15 mutation due to the high allele dropout rate observed in these genes ( Supplementary Fig. 2). We ran ∞SCITE for each patient combining all time points and obtained time-point-specific subclone sizes by counting the cells assigned to each subclone in each sample 102 . Only cells uniquely assigned to one subclone were considered. Cells genotyped as wild-type for all selected mutations were considered as non-tumoral cells and were removed.

Single-cell
Single-cell RNA-seq. Data generation. scRNA-seq was performed on longitudinal samples of five patients using three different approaches: 1. Smart-seq2: full-length scRNA-seq libraries were prepared for samples of patient 63 using the Smart-seq2 protocol 103  Dealing with confounders. We observed batch effects between 10x Genomics experiments. To avoid batch effects within samples of the same patient, we focused on the BCLLATLAS_10 experiment for patients 12, 19 and 3,299. Conversely, as we did not obtain a clear signal-to-noise separation in the HTO demultiplexing of case 365, we analyzed the cells obtained with BCLLATLAS_29. We also found some cell neighborhoods that harbored a high percentage of mitochondrial expression and a low number of detected genes. In such cases, we were more stringent with the thresholds or fetched and eliminated these clusters with FindClusters. We also excluded some clusters of doublets that expressed markers of microenvironment cells (erythroblasts, T cells or natural killer cells). Finally, for patient 3,299 in which one sample was obtained from peripheral blood (PB), whereas the others were obtained from bone marrow (BM), we focused solely on the BM samples to avoid misinterpretations. For patient 365, the CLL and RT time points were sampled from PB and lymph nodes, respectively. As the same RT sample profiled with bulk RNA-seq clustered with other RT samples from PB, we analyzed them jointly. After all the filtering, we recomputed the highly variable genes and PCAs.  110  Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Sequencing data are available from the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) under accession no. EGAS00001006327. scRNA-seq expression matrices, Seurat objects and corresponding metadata are available at Zenodo (https://doi.org/10.5281/zenodo.6631966). Fig. 1 | Cohort studied and types of Richter transformation. a. Representation of the disease course of the patients included in the study. Each sample analyzed, treatment and date of RT are depicted. Patients labeled in gray lacked germline DNA. Patient 4676 also lacked DNA from the previous CLL sample. Patients are grouped based on the last line of therapy received before RT in three groups: patients developing RT before any treatment, after chemo(immuno)therapy, and after targeted therapy. The type of transformation (RT-DLBCL, diffuse large B cell lymphoma type; RT-PLL, prolymphocytic transformation; RT-PBL, plasmablastic transformation) and IGHV mutational status are also shown. Additional molecular studies conducted in each case are also depicted. Extended Data Fig. 2 | Genetic and epigenetic changes from CLL to RT, CNA profiles, and landscape of driver alterations. a. Number of somatic genetic alterations and epigenetic changes compared to normal counterparts along the course of the disease. Cases/time points with no grid lines correspond to unavailable data. b. Mutational burden, number of CNAs and number of SVs found in RT stratified according to the last therapy prior transformation. Targeted, targeted therapies. center line, median; box limits, upper/lower quartiles; whiskers, 1.5×interquartile range; points, individual samples. c. Copy number landscape of the studied cohort grouped by patient. The diagnosis, IGHV mutational status, last therapy prior RT, and total number of CNAs are indicated for each time point. d. Aggregated copy number profile of RT vs CLL. The first CLL samples (time point 1, T1) were considered. The plot shows the percentage of samples with gains (up) and losses (down). Among recurrent alterations found either in CLL or RT samples (n ≥ 5), deletions of 9p (PTPRD and CDKN2A/B) and deletions of 15q (MGA) were enriched in RT whereas deletions of ATM (11q), TP53 (17p), and 13q14 were found at similar frequencies in CLL and RT. e. Oncoprint of putative driver alterations. Samples, grouped by patient (patient id at the top), are represented by columns while genes in rows. Novel drivers in RT are labeled in blue. Genes are grouped according to their biological function or if they were previously described as potential driver genes in CLL and/or mature B cell lymphomas. Metadata including the type of therapy before RT, number of treatment lines before each sample, the spatial/longitudinal nature of the CLL/RT samples analyzed, IGHV mutational status, and diagnosis is detailed in the upper rows. In the main plot, mutations (SNVs and indels) are depicted with horizontal rectangles, CNAs using the background color of each cell, and SVs with vertical rectangles. The transparency of the color of mutations and CNAs indicates the cancer cell fraction (CCF). For patients lacking the germline sample (patient id indicated in gray), the CCF of the alterations could not be inferred and a CCF of 100% was used for illustrative purposes. Fig. 4 | Extraction and assignment of mutational signatures. a-d. Signatures extracted by the Hierarchical Dirichlet Process (HDP) (a), SignatureAnalyzer (b), SigProfiler (c), and sigfit (d). COSMIC signatures needed to reconstruct the extracted signatures are shown together with their contribution (in percentage). The cosine similarities between the extracted and reconstructed signatures are shown in brackets. e. Workflow of the mutational signature analysis. f. The 96-mutation profile of the RT sample of patient 839 (time point 2), which had marked evidence of APOBEC activity (SBS2 and SBS13). g. Comparison of the SBS-ganciclovir extracted by HDP and SignatureAnalyzer. Based on the high cosine similarity (0.996), we considered that both signatures represented the same mutational process and selected the one extracted by HDP for downstream analyses. h. Comparison of the SBS-ganciclovir extracted by HDP and the ganciclovir signature reported by de Kanter et al. 35 . i. Comparison of the SBS-RT extracted by HDP and SignatureAnalyzer. Based on the high cosine similarity (0.941), we considered that both signatures represented the same mutational process and selected the one extracted by HDP for downstream analyses. j. Pairwise comparisons of the SBS-RT with known signatures from COSMIC and Kucab et al. 33 . k. Decomposition of the SBS-RT in "n" known signatures using an expectation maximization approach. The low cosine similarity (<0.85) between SBS-RT and the best reconstituted signature obtained using any combination of known signatures suggests that SBS-RT represents a novel mutational signature.  Supplementary Table 20. The mutation tree inferred from scDNA-seq data is shown at the bottom-right part. b-c. Subclonal architecture and dynamics of six cases with longitudinal samples (b) and two cases with spatial samples (c) analyzed by WGS.