Recent advances in cancer characterization have consistently revealed marked heterogeneity, impeding the completion of integrated molecular and clinical maps for each malignancy. Here, we focus on chronic lymphocytic leukemia (CLL), a B cell neoplasm with variable natural history that is conventionally categorized into two subtypes distinguished by extent of somatic mutations in the heavy-chain variable region of immunoglobulin genes (IGHV). To build the ‘CLL map,’ we integrated genomic, transcriptomic and epigenomic data from 1,148 patients. We identified 202 candidate genetic drivers of CLL (109 new) and refined the characterization of IGHV subtypes, which revealed distinct genomic landscapes and leukemogenic trajectories. Newly discovered gene expression subtypes further subcategorized this neoplasm and proved to be independent prognostic factors. Clinical outcomes were associated with a combination of genetic, epigenetic and gene expression features, further advancing our prognostic paradigm. Overall, this work reveals fresh insights into CLL oncogenesis and prognostication.
The molecular data used in this study are publicly available and are included in the following patient cohorts (Table 1, Supplementary Tables 1 and 2 and Extended Data Fig. 1a): DFCI, Dana-Farber Cancer Institute; GCLLSG, German CLL Study Group; ICGC, International Cancer Genome Consortium; MDACC, MD Anderson Cancer Center; NHLBI, National Heart Lung and Blood Institute; UCSD, University of California San Diego. Sequencing, expression and genotyping data are available at EGA (http://www.ebi.ac.uk/ega/), which is hosted at the European Bioinformatics Institute, under accession number EGAS00000000092 (ICGC cohort) and in dbGaP under accession numbers phs001473.v2.p1 (MDACC, NHLBI), phs000922.v2.p1 (GCLLSG), phs001431.v2.p1 (DFCI, UCSD), phs001091.v1.p1 (MDACC), phs000435.v3.p1 (DFCI), phs002297.v2.p1 (NHLBI) and phs000879.v1.p1 (DFCI) and GEO accession number GSE143673 (GCLLSG). 450K array data are available at EGA under accession number EGAD00010001975 (ICGC). The project data portal is available at https://cllmap.org.
Terra methods used in the study can be found at https://app.terra.bio/#workspaces/broad-firecloud-wupo1/CLLmap_Methods_Apr2021. Source code used in the study can be found at https://github.com/getzlab/CLLmap. The RFcaller pipeline is available at https://github.com/xa-lab/RFcaller. The new epiCMIT suitable for Illumina arrays and NGS approaches as well as the CLL epitype classifier can be found at https://github.com/Duran-FerrerM/CLLmap-epigenetics.
Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525 (2015).
Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519 (2015).
Gruber, M. et al. Growth dynamics in naturally progressing chronic lymphocytic leukaemia. Nature 570, 474–479 (2019).
Dvinge, H. et al. Sample processing obscures cancer-specific alterations in leukemic transcriptomes. Proc. Natl Acad. Sci. USA 111, 16802–16807 (2014).
Ferreira, P. G. et al. Transcriptome characterization by RNA sequencing identifies a major molecular and clinical subdivision in chronic lymphocytic leukemia. Genome Res. 24, 212–226 (2014).
Oakes, C. C. et al. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia. Nat. Genet. 48, 253–264 (2016).
Kulis, M. et al. Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia. Nat. Genet. 44, 1236–1242 (2012).
Beekman, R. et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat. Med. 24, 868–880 (2018).
Bloehdorn, J. et al. Multi-platform profiling characterizes molecular subgroups and resistance networks in chronic lymphocytic leukemia. Nat. Commun. 12, 5395 (2021).
Landau, D. A. et al. The evolutionary landscape of chronic lymphocytic leukemia treated with ibrutinib targeted therapy. Nat. Commun. 8, 2185 (2017).
Kasar, S. et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat. Commun. 6, 8866 (2015).
Burger, J. A. et al. Safety and activity of ibrutinib plus rituximab for patients with high-risk chronic lymphocytic leukaemia: a single-arm, phase 2 study. Lancet Oncol. 15, 1090–1099 (2014).
Burger, J. A. et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat. Commun. 7, 11589 (2016).
Shuai, S. et al. The U1 spliceosomal RNA is recurrently mutated in multiple cancers. Nature 574, 712–716 (2019).
Minici, C. et al. Distinct homotypic B-cell receptor interactions shape the outcome of chronic lymphocytic leukaemia. Nat. Commun. 8, 15746 (2017).
Maity, P. C. et al. IGLV3-21*01 is an inherited risk factor for CLL through the acquisition of a single-point mutation enabling autonomous BCR signaling. Proc. Natl Acad. Sci. USA 117, 4320–4327 (2020).
Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495 (2014).
Kleinstern, G. et al. Tumor mutational load predicts time to first treatment in chronic lymphocytic leukemia (CLL) and monoclonal B‐cell lymphocytosis beyond the CLL international prognostic index. Am. J. Hematol. 95, 906–917 (2020).
Leeksma, A. C. et al. Clonal diversity predicts adverse outcome in chronic lymphocytic leukemia. Leukemia 33, 390–402 (2019).
Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).
Kamburov, A. et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA 112, E5486–E5495 (2015).
Dziembowski, A., Lorentzen, E., Conti, E. & Séraphin, B. A single subunit, Dis3, is essentially responsible for yeast exosome core activity. Nat. Struct. Mol. Biol. 14, 15–22 (2007).
Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016).
Amblar, M., Barbas, A., Fialho, A. M. & Arraiano, C. M. Characterization of the functional domains of Escherichia coli RNase II. J. Mol. Biol. 360, 921–933 (2006).
Papamichos-Chronakis, M., Watanabe, S., Rando, O. J. & Peterson, C. L. Global regulation of H2A.Z localization by the INO80 chromatin-remodeling enzyme is essential for genome integrity. Cell 144, 200–213 (2011).
McKinney, M. et al. The genetic basis of hepatosplenic T-cell lymphoma. Cancer Discov. 7, 369–379 (2017).
López, C. et al. Genomic and transcriptomic changes complement each other in the pathogenesis of sporadic Burkitt lymphoma. Nat. Commun. 10, 1459 (2019).
Weber, J. et al. PiggyBac transposon tools for recessive screening identify B-cell lymphoma drivers in mice. Nat. Commun. 10, 1415 (2019).
Edelmann, J. et al. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations. Blood 120, 4783–4794 (2012).
Setlur, S. R. et al. Comparison of familial and sporadic chronic lymphocytic leukaemia using high resolution array comparative genomic hybridization. Br. J. Haematol. 151, 336–345 (2010).
Stilgenbauer, S. et al. Incidence and clinical significance of 6q deletions in B cell chronic lymphocytic leukemia. Leukemia 13, 1331–1334 (1999).
Boultwood, J. et al. Narrowing and genomic annotation of the commonly deleted region of the 5q− syndrome. Blood 99, 4638–4641 (2002).
Schneider, R. K. et al. Rps14 haploinsufficiency causes a block in erythroid differentiation mediated by S100A8 and S100A9. Nat. Med. 22, 288–297 (2016).
Ciccia, A. et al. Treacher Collins syndrome TCOF1 protein cooperates with NBS1 in the DNA damage response. Proc. Natl Acad. Sci. USA 111, 18631–18636 (2014).
Nowinski, S. M. et al. Mitochondrial uncoupling links lipid catabolism to Akt inhibition and resistance to tumorigenesis. Nat. Commun. 6, 8137 (2015).
Aguilar, E. et al. UCP2 deficiency increases colon tumorigenesis by promoting lipid synthesis and depleting NADPH for antioxidant defenses. Cell Rep. 28, 2306–2316.e5 (2019).
Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
Burns, A. et al. Whole-genome sequencing of chronic lymphocytic leukaemia reveals distinct differences in the mutational landscape between IgHVmut and IgHVunmut subgroups. Leukemia 32, 332–342 (2018).
Zhang, Q., Lenardo, M. J. & Baltimore, D. 30 Years of NF-κB: a blossoming of relevance to human pathobiology. Cell 168, 37–57 (2017).
Gandhi, V. & Plunkett, W. Cellular and clinical pharmacology of fludarabine. Clin. Pharmacokinet. 41, 93–103 (2002).
Sellmann, L. et al. Trisomy 19 is associated with trisomy 12 and mutated IGHV genes in B‐chronic lymphocytic leukaemia. Br. J. Haematol. 138, 217–220 (2007).
Shilatifard, A. The COMPASS family of histone H3K4 methylases: mechanisms of regulation in development and disease pathogenesis. Annu. Rev. Biochem. 81, 65–95 (2012).
Kleinstern, G. et al. Tumor mutational load predicts time to first treatment in chronic lymphocytic leukemia (CLL) and monoclonal B-cell lymphocytosis beyond the CLL international prognostic index. Am. J. Hematol. 95, 906–917 (2020).
Nadeu, F. et al. IgCaller for reconstructing immunoglobulin gene rearrangements and oncogenic translocations from whole-genome sequencing in lymphoid neoplasms. Nat. Commun. 11, 3390 (2020).
Hodson, D. J. et al. Deletion of the RNA-binding proteins ZFP36L1 and ZFP36L2 leads to perturbed thymic development and T lymphoblastic leukemia. Nat. Immunol. 11, 717–724 (2010).
Oppezzo, P. et al. Chronic lymphocytic leukemia B cells expressing AID display dissociation between class switch recombination and somatic hypermutation. Blood 101, 4029–4032 (2003).
Roco, J. A. et al. Class-switch recombination occurs infrequently in germinal centers. Immunity 51, 337–350.e7 (2019).
Leshchiner, I. et al. Comprehensive analysis of tumour initiation, spatial and temporal progression under multiple lines of treatment. Preprint at bioRxiv https://doi.org/10.1101/508127 (2019).
Tausch, E. et al. Prognostic and predictive impact of genetic markers in patients with CLL treated with obinutuzumab and venetoclax. Blood 135, 2402–2412 (2020).
Burger, J. A. et al. Long-term efficacy and safety of first-line ibrutinib treatment for patients with CLL/SLL: 5 years of follow-up from the phase 3 RESONATE-2 study. Leukemia 34, 787–798 (2020).
Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066–1081 (2020).
Sellmann, L. et al. Trisomy 19 is associated with trisomy 12 and mutated IGHV genes in B-chronic lymphocytic leukaemia. Br. J. Haematol. 138, 217–220 (2007).
Nadeu, F. et al. IGLV3-21R110 identifies an aggressive biological subtype of chronic lymphocytic leukemia with intermediate epigenetics. Blood 137, 2395–2946 (2021).
Agathangelidis, A. et al. Higher-order connections between stereotyped subsets: implications for improved patient classification in CLL. Blood 137, 1365–1376 (2021).
Dobbelstein, M., Strano, S., Roth, J. & Blandino, G. p73-induced apoptosis: a question of compartments and cooperation. Biochem. Biophys. Res. Commun. 331, 688–693 (2005).
Chinnadurai, G., Vijayalingam, S. & Rashmi, R. BIK, the founding member of the BH3-only family proteins: mechanisms of cell death and role in cancer and pathogenic processes. Oncogene 27, S20–S29 (2008).
Wang, W. et al. MAPK4 overexpression promotes tumor progression via noncanonical activation of AKT/mTOR signaling. J. Clin. Invest. 129, 1015–1029 (2019).
Herling, C. D. et al. Time-to-progression after front-line fludarabine, cyclophosphamide, and rituximab chemoimmunotherapy for chronic lymphocytic leukaemia: a retrospective, multicohort study. Lancet Oncol. 20, 1576–1586 (2019).
Bilban, M. et al. Deregulated expression of fat and muscle genes in B-cell chronic lymphocytic leukemia with high lipoprotein lipase expression. Leukemia 20, 1080–1088 (2006).
Dietrich, S. et al. Drug-perturbation-based stratification of blood cancer. J. Clin. Invest. 128, 427–445 (2018).
Stilgenbauer, S. et al. Gene mutations and treatment outcome in chronic lymphocytic leukemia: results from the CLL8 trial. Blood 123, 3247–3254 (2014).
Stilgenbauer, S. et al. Alemtuzumab combined with dexamethasone, followed by alemtuzumab maintenance or Allo-SCT in ‘ultra high-risk’ CLL: Final results from the CLL2O phase II study. Blood 124, 1991–1991 (2014).
Wang, L. et al. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N. Engl. J. Med. 365, 2497–2506 (2011).
Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).
Javed, N. et al. Detecting sample swaps in diverse NGS data types using linkage disequilibrium. Nat. Commun. 11, 3697 (2020).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013).
Cibulskis, K. et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics 27, 2601–2602 (2011).
Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature 470, 214–220 (2011).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Benjamin, D. et al. Calling somatic SNVs and indels with Mutect2. Preprint at bioRxiv 861054 https://doi.org/10.1101/861054 (2019).
Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Taylor-Weiner, A. et al. DeTiN: overcoming tumor-in-normal contamination. Nat. Methods 15, 531–534 (2018).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Olshen, A. B., Venkatraman, E. S., Lucito, R. & Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572 (2004).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Morton, L. M. et al. Radiation-related genomic profile of papillary thyroid carcinoma after the Chernobyl accident. Science https://doi.org/10.1126/science.abg2538 (2021).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
Bass, A. J. et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat. Genet. 43, 964–968 (2011).
Drier, Y. et al. Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability. Genome Res. 23, 228–235 (2013).
Bolotin, D. A. et al. MiXCR: software for comprehensive adaptive immunity profiling. Nat. Methods 12, 380–381 (2015).
Brochet, X., Lefranc, M.-P. & Giudicelli, V. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res. 36, W503–W508 (2008).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Graubert, A., Aguet, F., Ravi, A., Ardlie, K. G. & Getz, G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics 37, 3048–3050 (2021).
Robertson, A. G. et al. Comprehensive molecular characterization of muscle-invasive bladder cancer. Cell 171, 540–556 (2017).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
Pandit, B. et al. Gain-of-function RAF1 mutations cause Noonan and LEOPARD syndromes with hypertrophic cardiomyopathy. Nat. Genet. 39, 1007–1012 (2007).
Rommel, C. et al. Activated Ras displaces 14-3-3 protein from the amino terminus of c-Raf-1. Oncogene 12, 609–619 (1996).
Dhillon, A. S., Meikle, S., Yazici, Z., Eulitz, M. & Kolch, W. Regulation of Raf-1 activation and signalling by dephosphorylation. EMBO J. 21, 64–71 (2002).
Provost, P. et al. Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J. 21, 5864–5874 (2002).
Loenarz, C. et al. Hydroxylation of the eukaryotic ribosomal decoding center affects translational accuracy. Proc. Natl Acad. Sci. USA 111, 4019–4024 (2014).
Qiu, W., Zhou, B., Darwish, D., Shao, J. & Yen, Y. Characterization of enzymatic properties of human ribonucleotide reductase holoenzyme reconstituted in vitro from hRRM1, hRRM2, and p53R2 subunits. Biochem. Biophys. Res. Commun. 340, 428–434 (2006).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
We thank W. Zhang, S. Gohil, I. Leshchiner, D. Livitz, D. Rosebrock, J. Gribben, K. R. Rai, M. J. Keating, J. M. Hess, N. J. Haradhvala, A. Mohammed and A. Gnirke for helpful discussions. We thank C. Patterson, S. Pollock, K. Slowik, O. Olive, C. J. Shaughnessy and H. Lyon for assistance in data collection and organization. We thank the patients, their families and the investigators of the clinical trials for providing samples and clinical data. This study was supported by National Institutes of Health (NIH)/National Cancer Institute (NCI) grant P01 CA206978 (to C.J.W. and G.G.) and the Broad/IBM Cancer Resistance Research Project (G.G. and L.P.). B.A.K. was supported by a long-term EMBO fellowship (ALTF 14-2018). C.K.H. was supported by the NHLBI Training Program in Molecular Hematology (T32HL116324). F.N. acknowledges funding by the American Association for Cancer Research (2021 AACR-Amgen Fellowship in Clinical/Translational Cancer Research, 21-40-11-NADE), the European Hematology Association (EHA Junior Research Grant 2021, RG-202012-00245), and the Lady Tata Memorial Trust (International Award for Research in Leukaemia 2021-2022, LADY_TATA_21_3223). S.S. and E.T. were supported by the Deutsche Forschungsgemeinschaft (SFB1074, subproject B1, B2 and B10). A.W. and C. Sun were supported by the Intramural Research Program at NIH/NHLBI. J.A.B. was supported by MD Anderson’s Moon Shot Program in CLL and the CLL Global Research Foundation and in part by MDACC Support Grant CA016672. S.L. was supported by the NCI Research Specialist Award (R50CA251956). J.R.B. was supported by NIH grant R01 CA 213442, NIH/NCI grant P01 CA206978 and the Melton Family Foundation. X.S.P. acknowledges funding by the Spanish Ministerio de Economía y Competitividad (grants SAF2017-87811-R and PID2020-117185RB-I00). A.D.-N. was supported by the Department of Education of the Basque Government (PRE_2017_1_0100) and P.B.-M. by a fellowship from the Spanish Ministerio de Economía y Competitividad. This study was supported by “la Caixa” Foundation (CLLEvolution-LCF/PR/HR17/52150017, Health Research 2017 Program “HR17-0022” to E.C.), the European Research Council under the European Union’s Horizon 2020 research and innovation program (Project BCLLATLAS, grant agreement 810287) (to J.I.M.-S. and E.C.), the Accelerator award CRUK/AIRC/AECC joint funder-partnership (to J.I.M.-S.), Generalitat de Catalunya Suport Grups de Recerca AGAUR 2017-SGR-1142 (to E.C.) and 2017-SGR-736 (to J.I.M.-S.), CERCA Programme/Generalitat de Catalunya. E.C. is an Academia Researcher of Catalan Institution for Research and Advanced Studies.
The authors declare the following conflicts related to the CLLmap project: C.J.W. receives research support from Pharmacyclics. E.C. has been a consultant for Illumina. G.G. receives research funds from IBM and Pharmacyclics; and is an inventor on patent applications related to SignatureAnalyzer-GPU. S.S. reports honoraria for consultancy, advisory board membership, speaker honoraria, research grants and travel support from AbbVie, Amgen, AstraZeneca, Celgene, Gilead, GSK, Hoffmann La-Roche, Janssen, Novartis. C.J.W., G.G., B.A.K., Z.L. and C.K.H. are inventors on a patent “Compositions, panels, and methods for characterizing chronic lymphocytic leukemia” (PCT/US21/45144). The following conflicts are unrelated to the CLLmap project: F.N. has received honoraria from Janssen for speaking at educational activities. E.T. declares research support by AbbVie and Roche; Advisory Boards and Speakers Bureau for Janssen, AbbVie and Roche. A.W. received research funding from Pharmacyclics, Acerta, Merck, Verastem, Genmab, Nurix. J.R.B. has served as a consultant for AbbVie, Acerta/AstraZeneca, Beigene, Bristol-Myers Squibb/Juno/Celgene, Catapult, Genentech/Roche, Janssen, MEI Pharma, Morphosys AG, Novartis, Pfizer, Rigel; received research funding from Gilead, Loxo/Lilly, Verastem/SecuraBio, Sun, TG Therapeutics; and served on the data safety monitoring committee for Invectys. J.A.B. received research support from AstraZeneca, BeiGene, Gilead, and Pharmacyclics; travel and speaker honoraria from Janssen. X.S.P. is a cofounder of and holds an equity stake in DREAMgenics. C.J.W. holds equity in BioNTech, Inc. E.C. has been a consultant for Takeda and NanoString Technologies; has received honoraria from Janssen and Roche for speaking at educational activities; and is an inventor on a Lymphoma and Leukemia Molecular Profiling Project patent “Method for subtyping lymphoma subtypes by means of expression profiling” (PCT/US2014/64161). G.G. is an inventor on patent applications related to MSMuTect, MSMutSig, MSIDetect and POLYSOLVER; and is a founder and consultant of and holds privately held equity in Scorpion Therapeutics. The other authors declare no competing interests.
Peer review information
Nature Genetics thanks Ingo Ringshausen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Dataset description and representative driver gene maps.
a. Full dataset (n = 1,148), with contributions by cohort and data type delineated (see Supplementary Table 1). b. Numbers of samples with genomic, epigenomic, and transcriptomic data. c. 3D protein structures of representative genes identified by CLUMPS in pan-CLL analysis (n = 984, see Supplementary Table 5). Mutated residues - red labels. A peptide from RAF1 (designated at bottom-center, in complex with 14-3-3 zeta) shows clustered mutations around S259, whose phosphorylation regulates RAF1 activity and is a cancer mutational hotspot (ref. 91) that, when mutated, perturbs the interaction with 14-3-3 zeta and upregulates RAF1 kinase activity (refs. 92,93). In DICER1, mutations occur in the RNase III domain (purple), including the cancer hotspot residue E1813 (refs. 21,94). This region is critical for Mg2+ binding and is required for ribonuclease activity to process microRNAs and mediate post-transcriptional gene regulation (ref. 95). RPS23 mutations are clustered in a conserved loop of the ribosomal decoding center, surrounding P62, whose post-translational hydroxylation affects translation termination accuracy (ref. 96). These RPS23 mutations have a median CCF > 80% (Extended Data Fig. 6d; Supplementary Table 3). d. Individual mutation maps of selected novel, putative driver genes. Mutation subtype and position are shown. e. Selected genes identified by CLUMPS in IGHV subtypes; mutated residues - red. Although BRAF was not identified as a potential M-CLL driver via MutSig2CV (see Extended Data Fig. 3, Methods), CLUMPS revealed three mutated sites clustered in the kinase domain (purple) that are cancer hotspots (ref. 25), thus confirming BRAF as a shared driver (left). Mutated residues in BRAF in U-CLL (bottom) are shown for comparison, revealing a greater number of clustered mutations relative to M-CLL. In U-CLL, novel mutations were found in RRM1 (right). Somatic alterations were clustered in the N-terminal ATP-binding site (purple) and therefore have potential to impact enzymatic activity (ref. 97).
Extended Data Fig. 2 CLL biological pathways affected by candidate driver genes.
a. Schema of CLL pathways containing previously identified (black) and novel (magenta) putative driver genes (see Supplementary Table 6). Novel drivers cluster in central processes driving CLL (for example, DNA damage, chromatin modification, RNA processing; refs. 1,2), but also highlight new pathways not previously implicated by driver genes (for example, cytoskeleton and extracellular matrix, proteostasis, metabolism). Asterisks - mutated genes discovered by CLUMPS. b. Stacked barplot ranked by the number of candidate driver genes per CLL pathway. Magenta bars show the number of newly identified drivers in each pathway.
Extended Data Fig. 3 Candidate driver alterations discovered in IGHV subtypes.
a-b. Landscape of putative driver genes and sCNAs in M-CLL (a, n = 512) and U-CLL (b, n = 459) with associated frequencies (rows, barplots). Header tracks annotate cohort, IGHV status (purple, M-CLL; orange, U-CLL), disease type (blue, CLL; yellow, MBL), epitype (blue, n-CLL; yellow, i-CLL; red, m-CLL), datatype (white, WES; yellow, WGS; blue, both); prior treatment, U1 and IGLV3-21R110 mutations are annotated in black; magenta label - novel alterations; asterisks - discovery by CLUMPS.
Extended Data Fig. 4 Chromosomal gains and losses identified in IGHV subtypes.
a-b. Recurrent copy-number gains (left) and losses (right) by GISTIC analysis showing arm-level (left per plot) and focal events (right per plot) in M-CLL (a, n = 512) and U-CLL (b, n = 459). Chromosomes are labeled along the vertical axis; dashed line - significance at q = 0.1. Blacklisted regions are colored gray. All arm-level events are labeled with cytoband arm and frequency in cohort. Focal events are annotated by cytoband, frequency, number of genes encompassed in peak (bracketed), and genes of interest. Red/blue font: novel focal events with frequency >2%. Black font: previously identified events (see Supplementary Table 7).
Extended Data Fig. 5 Landscape of driver alterations and chromosomal aberrations in IGHV subtypes.
a. The genomic landscape of CLL IGHV subtypes. Driver genes, U1 and IGLV3-21R110 mutations are labeled according to their genomic location (outside ring, numbered by chromosome). The tracks show the frequency and locations of driver genes in M-CLL (purple) vs. U-CLL (orange) (track 1; outermost), focal sCNAs (track 2; gains, red; losses, blue), and density of SV breakpoints of deletions (track 3) and translocations (track 4) (M-CLL n = 88; U-CLL n = 87; WGS, windows of 1-Mb). Innermost plot highlights translocations in which either one or both breakpoints are recurrent in at least 3 cases (windows of 1-Mb considered to define recurrence) in M-CLL (purple) and U-CLL (orange). Deletions, inversions, and tandem duplications where both breakpoints were found in at least 2 cases and did not overlap with a driver sCNA are shown (Note: only a focal deletion in SP140 in two U-CLL cases met this criterion). b. Schema of recurrent IG-BCL2 translocation and IGH-ZFP36L1 deletion in the WGS cohort. All 5 BCL2 translocations were in M-CLL with immunoglobulin (IG) breakpoints in J or D genes, suggesting mediation by aberrant V(D)J recombination. In contrast, 4 U-CLL cases carried IGH-ZFP36L1 truncating deletions, which were all clonal (CCF = 1). Breakpoints in IGH class-switch regions suggested mediation by aberrant class-switch recombination (CSR). c. Immunoglobulin (IG) SVs in 177 WGS and 984 WES. In WES, 9 of 10 BCL2 translocations were in M-CLL and mediated by aberrant V(D)J recombination in IGH (n = 7) or IGK (n = 2). The sole BCL2 translocation in U-CLL was due to aberrant CSR. One CSR-mediated IGH-ZFP36L1 deletion was observed in a case with unclassified IGHV status due to presence of two populations (one M-CLL, one U-CLL; the latter was more prevalent). Of note, in WES, U-CLLs carry a higher number of non-recurrent IG events than M-CLL.
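The recurrence rule in panel a (a breakpoint region counts as recurrent when at least 3 cases have a breakpoint in the same 1-Mb window) can be illustrated with a minimal sketch. The function name, the input tuple layout and the use of fixed, non-overlapping windows are assumptions made for illustration; the caption does not specify how windows were tiled.

```python
from collections import defaultdict

WINDOW = 1_000_000  # 1-Mb windows, as stated in the caption


def recurrent_windows(breakpoints, min_cases=3, window=WINDOW):
    """Return the (chrom, window_index) bins hit in at least `min_cases` cases.

    breakpoints: iterable of (case_id, chrom, position) tuples (hypothetical layout).
    Assumption: fixed, non-overlapping windows starting at position 0.
    """
    cases_per_window = defaultdict(set)
    for case_id, chrom, pos in breakpoints:
        # A set ensures each case is counted once per window,
        # even if it carries several breakpoints there.
        cases_per_window[(chrom, pos // window)].add(case_id)
    return {w for w, cases in cases_per_window.items() if len(cases) >= min_cases}


# Toy, invented coordinates purely for illustration
toy = [
    ("A", "chr18", 60_800_000),
    ("B", "chr18", 60_950_000),
    ("C", "chr18", 60_100_000),
    ("A", "chr18", 60_500_000),   # second hit in the same case, not double-counted
    ("D", "chr14", 105_000_000),
]
hits = recurrent_windows(toy)
```

Counting distinct cases rather than raw breakpoints keeps a single complex rearrangement from registering as recurrent on its own.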
Extended Data Fig. 6 Mutational mechanisms and cancer cell fractions of candidate drivers.
a. Eight mutational signatures were identified in 177 WGS, but 3 signatures corresponded to known artifacts and were therefore excluded (see Supplementary Note 2). Boxplots demonstrating mutation contribution for each of the 5 signatures are labeled with single-base substitution (SBS) number and identity (per COSMIC v3.1). b. Comparison of the normalized signature intensity of the mutational signatures in U-CLL (orange, n = 87) vs. M-CLL (purple, n = 88). The nc-AID and c-AID 1 signatures were enriched in M-CLL, whereas the aging signature was more prevalent in U-CLL. Although not significant, there was a trend of increased mutations due to the c-AID 2 signature in U-CLL. All p-values were calculated with Wilcoxon rank-sum test, two-sided. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. c. Proportions of clustered mutations contributed by the two c-AID related signatures (SBS84, c-AID 1 vs. SBS85, c-AID 2) for each IGHV subtype (M-CLL, purple; U-CLL, orange). d. Mean cancer cell fraction (CCF) for each non-silent mutation across all candidate driver genes identified in WES samples (n = 984). Color of dots depicts the IGHV subtype (M-CLL, purple; U-CLL, orange). The horizontal red line is the threshold for clonality (CCF > 85%). Magenta labels - newly identified putative driver genes. The number of non-silent mutations per driver gene is shown at the bottom. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range.
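The boxplot convention and the clonality threshold used in these panels can be made concrete with a short sketch. The function names are hypothetical, and the quartile interpolation method (Python's "inclusive" method) is an assumption; the plotting software used for the figures may compute quartiles slightly differently.

```python
from statistics import median, quantiles

CLONAL_CCF = 0.85  # clonality threshold stated in the caption (CCF > 85%)


def box_summary(values):
    """Median, quartiles, 1.5x IQR whiskers and outliers for one box.

    Assumption: quartiles use the 'inclusive' interpolation method.
    Requires at least two values.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme data points within the fences
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


def is_clonal(mean_ccf):
    """True when the mean cancer cell fraction exceeds the 85% threshold."""
    return mean_ccf > CLONAL_CCF


summary = box_summary([1, 2, 3, 4, 5, 100])
```

In this toy input, the value 100 lies beyond the upper fence and would be drawn as an individual point rather than extending the whisker.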
Extended Data Fig. 7 Development and validation of epitype assignment and epiCMIT in RRBS data.
a. Consensus clustering matrices for K = 3 groups for paired-end (n = 136; 153 CpGs in consensus matrix) and single-end (n = 388; 32 CpGs) RRBS data. (d). b. Empirical cumulative distribution functions (CDFs) for consensus matrices with K = 2 to K = 7. c. Relative change under the CDF for K = 2 to K = 7. d. Heatmaps of the CpGs used for consensus clustering in (a). Each sample (columns) is annotated by tracks: epitype max probability, IGHV status (M-CLL, purple; U-CLL, orange), IGHV percent identity, and presence of IGLV3-21R110 mutation (black). e. The development of the new epiCMIT methodology for RRBS data. The genome was segmented into Chromatin Hidden Markov Model (CHMM; ref. 24) states using ChIP-seq data to get repressed chromatin regions, where differential DNA methylation analysis was performed in high coverage whole-genome bisulfite sequencing (WGBS) data between the cells with the lowest and highest accumulated cell divisions in the B cell lineage, namely the hematopoietic precursor cells (HPC) and bone-marrow plasma cells (bmPC). Only CpGs showing extensive differences were retained and constituted the epiCMIT-hyper CpGs or epiCMIT-hypo CpGs depending on whether they gain or lose DNA methylation from <0.1 to ≥0.5 and from >0.9 to ≤0.5 from HPC to bmPC, respectively. EpiCMIT-hyper and epiCMIT-hypo scores were calculated according to the available epiCMIT-CpGs per sample, and the higher score in each sample was then selected. f. epiCMIT values on the same samples profiled twice with different platforms. Approach 1 - profiled with Illumina-450K (green); approach 2 - profiled with RRBS-PE (violet). In samples profiled with Illumina 450K, the original epiCMIT-CpGs were used (ref. 52). In samples profiled with RRBS, epiCMIT was calculated with all available epiCMIT-CpGs for the new catalog (e, Methods). P-value by Pearson correlation test, two-sided; Error band − 95% confidence intervals of the Pearson correlation coefficient.
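The CpG selection thresholds and the "take the higher score" rule described in panel e can be sketched as follows. The thresholds come from the caption; the function names and the scoring formulas (mean methylation for the hyper score, one minus mean methylation for the hypo score) are assumptions for illustration and should be checked against the released classifier at https://github.com/Duran-FerrerM/CLLmap-epigenetics.

```python
from statistics import mean


def classify_cpg(beta_hpc, beta_bmpc):
    """Label a CpG from its methylation in HPC and bmPC (caption thresholds)."""
    if beta_hpc < 0.1 and beta_bmpc >= 0.5:
        return "hyper"  # gains methylation from HPC to bmPC
    if beta_hpc > 0.9 and beta_bmpc <= 0.5:
        return "hypo"   # loses methylation from HPC to bmPC
    return None         # not an epiCMIT CpG


def epicmit(sample_betas, cpg_classes):
    """Return the higher of the hyper and hypo scores for one sample.

    sample_betas: {cpg_id: beta} for the CpGs available in this sample.
    cpg_classes:  {cpg_id: "hyper" | "hypo"}.
    Assumption: hyper score = mean beta; hypo score = 1 - mean beta.
    """
    hyper = [b for c, b in sample_betas.items() if cpg_classes.get(c) == "hyper"]
    hypo = [b for c, b in sample_betas.items() if cpg_classes.get(c) == "hypo"]
    scores = []
    if hyper:
        scores.append(mean(hyper))
    if hypo:
        scores.append(1 - mean(hypo))
    return max(scores) if scores else None


# Toy values, invented for illustration
classes = {"cg_a": "hyper", "cg_b": "hyper", "cg_c": "hypo"}
score = epicmit({"cg_a": 0.6, "cg_b": 0.8, "cg_c": 0.1}, classes)
```

Scoring only the CpGs present in each sample is what lets the same catalog be applied to platforms with different coverage, such as arrays and RRBS.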
Extended Data Fig. 8 Identification of expression clusters with associated biologic features.
a. Cohort representation in each expression cluster. b. Consensus matrix for RNA expression profiles of 603 treatment-naive CLLs by repeated hierarchical clustering with 80% resampling and varying cutoffs for number of clusters, which serves as the input to the BayesNMF procedure (Methods). c. Uniform manifold approximation and projection (UMAP) showing clustering of ECs (n = 603; EC-u clusters (top), EC-m and EC-o (middle), EC-i (bottom)). Analysis was performed using the marker genes identified by BayesNMF. d. UMAP of H3K27ac profiles (n = 104)8 denoting EC designation where available (colored points, n = 73) and IGHV status. e. Comparison of the percent IGHV identity among ECs. Dotted line: 98% threshold defining M-CLL and U-CLL. P-values by two-sided t-tests. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. f. Comparison of the percent IGHV identity between samples with concordant IGHV status and ECs (for example, M-CLLs in EC-m clusters) and discordant samples (for example, M-CLLs in EC-u clusters). IGHV-mutated cases, left; IGHV-unmutated cases, right. P-values by two-sided t-tests. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. g. Percentage of cases carrying stereotyped immunoglobulin genes within each EC. Red horizontal line: percentage of stereotyped cases in the whole cohort. h. Fraction of cases classified in each CLL stereotype subset according to their EC. i. Percentage of IGHV (left) and IG(K/L)V (right) gene usage within each EC. IGKV genes from proximal and distal clusters were merged for simplification. All p-values were calculated using Chi-squared tests corrected by the Benjamini–Hochberg procedure (q-values, q). q < 0.1; *, q < 0.05; **, q < 0.001; ***, q < 0.0001. j-k. Heatmaps showing upregulated (j) and downregulated (k) H3K27ac levels of EC marker genes and 2,000 bp upstream to capture regulatory regions (Methods).
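For readers unfamiliar with the q-values in panels g to i, the Benjamini–Hochberg correction can be sketched as follows. This is the standard step-up procedure written in plain NumPy, not the authors' code, and the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                          # ascending p-values
    scaled = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Enforce monotonicity: each q is the minimum of the scaled values at or above its rank.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n, dtype=float)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q
```

Each q-value is then compared against the thresholds listed in the caption (for example, q < 0.05).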
Extended Data Fig. 9 EC differential gene expression, pathway activity, and classifier.
a. Differentially expressed genes per EC (red) using discovery set (n = 603); EC marker genes by BayesNMF (blue). Significant up- or downregulation of H3K27ac levels is directionally marked with triangles (ChIP-seq available for n = 73; n = 1 for EC-o and EC-i, thus unevaluable). b. EC gene set enrichment analysis (GSEA). Diamond denotes the EC compared to all others (circles). c. Confusion matrix for the EC classifier on the test set (“Dominance” defined in Methods). d. Confidence in correctly classified samples (n = 95) is greater than for incorrectly classified samples (n = 25; two-sided t-test). “Prediction margin” defined in Methods. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. e. Receiver operating characteristic (ROC) curve showing the tradeoff between sensitivity and specificity for the range of cutoffs that can be applied based on the “prediction margin”, where samples under the cutoff are excluded from performance evaluation. AUC, area under curve. f. Precision-recall (PR) curves for EC classification performance on the test set (n = 120), using the selected model (see Methods). The weighted average of AUC is 0.88. g. Performance metrics for models trained with differing numbers of input genes, demonstrating accuracy even with smaller gene sets. Metrics: Accuracy, overall; Average, weighted average across ECs (Methods). Nc, Ntot - number of genes (see Methods). h. EC distributions by BayesNMF compared to classifier predictions on the discovery cohort (n = 603), an extension cohort not included in discovery (n = 105), and an external CLL cohort (n = 136)61. i. IGHV status distributions per EC in discovery (n = 603) and external (n = 136) cohorts. The difference in IGHV-mutated samples per EC is 2-10% (p > 0.05, Fisher’s exact test, Methods). j. Stability of the ECs over time in longitudinally sampled CLL samples3. Sample timepoints (x-axis); years between first and last sample (above curve).
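The margin-based exclusion described in panel e can be illustrated with a short Python sketch. It takes per-sample prediction margins as given, since their definition is in Methods and not reproduced here, and it reports accuracy on the retained samples at each cutoff rather than the ROC curve itself. The function name and inputs are hypothetical.

```python
import numpy as np

def accuracy_by_margin_cutoff(margins, is_correct, cutoffs):
    """For each cutoff, drop samples with margin below it and score the rest.

    Returns a list of (cutoff, n_retained, accuracy) tuples.
    """
    margins = np.asarray(margins, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    rows = []
    for cutoff in cutoffs:
        keep = margins >= cutoff                 # samples under the cutoff are excluded
        n_kept = int(keep.sum())
        acc = float(is_correct[keep].mean()) if n_kept else float("nan")
        rows.append((float(cutoff), n_kept, acc))
    return rows
```

Raising the cutoff trades the number of samples that receive a call against accuracy among the samples that do.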
Supplementary Notes 1–4 and Supplementary Figures 1–3.
Supplementary Tables 1–15.
About this article
Cite this article
Knisbacher, B.A., Lin, Z., Hahn, C.K. et al. Molecular map of chronic lymphocytic leukemia and its impact on outcome. Nat Genet 54, 1664–1674 (2022). https://doi.org/10.1038/s41588-022-01140-w