Molecular map of chronic lymphocytic leukemia and its impact on outcome

Abstract

Recent advances in cancer characterization have consistently revealed marked heterogeneity, impeding the completion of integrated molecular and clinical maps for each malignancy. Here, we focus on chronic lymphocytic leukemia (CLL), a B cell neoplasm with variable natural history that is conventionally categorized into two subtypes distinguished by extent of somatic mutations in the heavy-chain variable region of immunoglobulin genes (IGHV). To build the ‘CLL map,’ we integrated genomic, transcriptomic and epigenomic data from 1,148 patients. We identified 202 candidate genetic drivers of CLL (109 new) and refined the characterization of IGHV subtypes, which revealed distinct genomic landscapes and leukemogenic trajectories. Newly discovered gene expression subtypes further subcategorized this neoplasm and proved to be independent prognostic factors. Clinical outcomes were associated with a combination of genetic, epigenetic and gene expression features, further advancing our prognostic paradigm. Overall, this work reveals fresh insights into CLL oncogenesis and prognostication.

Fig. 1: Increased power enables CLL driver gene detection.
Fig. 2: M-CLL and U-CLL have unique genomic landscapes.
Fig. 3: CLL subtypes based on epigenetic and transcriptomic features.
Fig. 4: ECs and integrated analysis predict clinical outcome.

Data availability

The molecular data used in this study are publicly available and are included in the following patient cohorts (Table 1, Supplementary Tables 1 and 2 and Extended Data Fig. 1a): DFCI, Dana-Farber Cancer Institute; GCLLSG, German CLL Study Group; ICGC, International Cancer Genome Consortium; MDACC, MD Anderson Cancer Center; NHLBI, National Heart Lung and Blood Institute; UCSD, University of California San Diego. Sequencing, expression and genotyping data are available at EGA (http://www.ebi.ac.uk/ega/), which is hosted at the European Bioinformatics Institute, under accession number EGAS00000000092 (ICGC cohort); in dbGaP under accession numbers phs001473.v2.p1 (MDACC, NHLBI), phs000922.v2.p1 (GCLLSG), phs001431.v2.p1 (DFCI, UCSD), phs001091.v1.p1 (MDACC), phs000435.v3.p1 (DFCI), phs002297.v2.p1 (NHLBI) and phs000879.v1.p1 (DFCI); and in GEO under accession number GSE143673 (GCLLSG). 450K array data are available at EGA under accession number EGAD00010001975 (ICGC). The project data portal is available at https://cllmap.org.
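
For scripted retrieval, the accessions listed in this statement can be kept in a small lookup table. The sketch below is illustrative only: the dictionary layout and the helper name `accessions_for` are assumptions and are not part of the CLLmap code. The accession strings and cohort labels are copied from the statement above.

```python
# Accessions copied from the data availability statement.
# The table structure and the helper name are hypothetical.
ACCESSIONS = {
    "EGAS00000000092": {"repository": "EGA", "cohorts": ["ICGC"]},
    "EGAD00010001975": {"repository": "EGA", "cohorts": ["ICGC"]},  # 450K array data
    "phs001473.v2.p1": {"repository": "dbGaP", "cohorts": ["MDACC", "NHLBI"]},
    "phs000922.v2.p1": {"repository": "dbGaP", "cohorts": ["GCLLSG"]},
    "phs001431.v2.p1": {"repository": "dbGaP", "cohorts": ["DFCI", "UCSD"]},
    "phs001091.v1.p1": {"repository": "dbGaP", "cohorts": ["MDACC"]},
    "phs000435.v3.p1": {"repository": "dbGaP", "cohorts": ["DFCI"]},
    "phs002297.v2.p1": {"repository": "dbGaP", "cohorts": ["NHLBI"]},
    "phs000879.v1.p1": {"repository": "dbGaP", "cohorts": ["DFCI"]},
    "GSE143673": {"repository": "GEO", "cohorts": ["GCLLSG"]},
}


def accessions_for(cohort):
    """Return sorted (repository, accession) pairs that include the given cohort."""
    return sorted(
        (info["repository"], accession)
        for accession, info in ACCESSIONS.items()
        if cohort in info["cohorts"]
    )
```

Calling `accessions_for("UCSD")`, for example, returns the single dbGaP study that the statement associates with that cohort. Access to the controlled datasets themselves still goes through each repository's own application process.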

Code availability

Terra methods used in the study can be found at https://app.terra.bio/#workspaces/broad-firecloud-wupo1/CLLmap_Methods_Apr2021. Source code used in the study can be found at https://github.com/getzlab/CLLmap. The RFcaller pipeline is available at https://github.com/xa-lab/RFcaller. The new epiCMIT suitable for Illumina arrays and NGS approaches as well as the CLL epitype classifier can be found at https://github.com/Duran-FerrerM/CLLmap-epigenetics.

References

  1. Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525 (2015).

  2. Puente, X. S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519 (2015).

  3. Gruber, M. et al. Growth dynamics in naturally progressing chronic lymphocytic leukaemia. Nature 570, 474–479 (2019).

  4. Dvinge, H. et al. Sample processing obscures cancer-specific alterations in leukemic transcriptomes. Proc. Natl Acad. Sci. USA 111, 16802–16807 (2014).

  5. Ferreira, P. G. et al. Transcriptome characterization by RNA sequencing identifies a major molecular and clinical subdivision in chronic lymphocytic leukemia. Genome Res. 24, 212–226 (2014).

  6. Oakes, C. C. et al. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia. Nat. Genet. 48, 253–264 (2016).

  7. Kulis, M. et al. Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia. Nat. Genet. 44, 1236–1242 (2012).

  8. Beekman, R. et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat. Med. 24, 868–880 (2018).

  9. Bloehdorn, J. et al. Multi-platform profiling characterizes molecular subgroups and resistance networks in chronic lymphocytic leukemia. Nat. Commun. 12, 5395 (2021).

  10. Landau, D. A. et al. The evolutionary landscape of chronic lymphocytic leukemia treated with ibrutinib targeted therapy. Nat. Commun. 8, 2185 (2017).

  11. Kasar, S. et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat. Commun. 6, 8866 (2015).

  12. Burger, J. A. et al. Safety and activity of ibrutinib plus rituximab for patients with high-risk chronic lymphocytic leukaemia: a single-arm, phase 2 study. Lancet Oncol. 15, 1090–1099 (2014).

  13. Burger, J. A. et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat. Commun. 7, 11589 (2016).

  14. Shuai, S. et al. The U1 spliceosomal RNA is recurrently mutated in multiple cancers. Nature 574, 712–716 (2019).

  15. Minici, C. et al. Distinct homotypic B-cell receptor interactions shape the outcome of chronic lymphocytic leukaemia. Nat. Commun. 8, 15746 (2017).

  16. Maity, P. C. et al. IGLV3-21*01 is an inherited risk factor for CLL through the acquisition of a single-point mutation enabling autonomous BCR signaling. Proc. Natl Acad. Sci. USA 117, 4320–4327 (2020).

  17. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495 (2014).

  18. Kleinstern, G. et al. Tumor mutational load predicts time to first treatment in chronic lymphocytic leukemia (CLL) and monoclonal B‐cell lymphocytosis beyond the CLL international prognostic index. Am. J. Hematol. 95, 906–917 (2020).

  19. Leeksma, A. C. et al. Clonal diversity predicts adverse outcome in chronic lymphocytic leukemia. Leukemia 33, 390–402 (2019).

  20. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).

  21. Kamburov, A. et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA 112, E5486–E5495 (2015).

  22. Dziembowski, A., Lorentzen, E., Conti, E. & Séraphin, B. A single subunit, Dis3, is essentially responsible for yeast exosome core activity. Nat. Struct. Mol. Biol. 14, 15–22 (2007).

  23. Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).

  24. Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016).

  25. Amblar, M., Barbas, A., Fialho, A. M. & Arraiano, C. M. Characterization of the functional domains of Escherichia coli RNase II. J. Mol. Biol. 360, 921–933 (2006).

  26. Papamichos-Chronakis, M., Watanabe, S., Rando, O. J. & Peterson, C. L. Global regulation of H2A.Z localization by the INO80 chromatin-remodeling enzyme is essential for genome integrity. Cell 144, 200–213 (2011).

  27. McKinney, M. et al. The genetic basis of hepatosplenic T-cell lymphoma. Cancer Discov. 7, 369–379 (2017).

  28. López, C. et al. Genomic and transcriptomic changes complement each other in the pathogenesis of sporadic Burkitt lymphoma. Nat. Commun. 10, 1459 (2019).

  29. Weber, J. et al. PiggyBac transposon tools for recessive screening identify B-cell lymphoma drivers in mice. Nat. Commun. 10, 1415 (2019).

  30. Edelmann, J. et al. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations. Blood 120, 4783–4794 (2012).

  31. Setlur, S. R. et al. Comparison of familial and sporadic chronic lymphocytic leukaemia using high resolution array comparative genomic hybridization. Br. J. Haematol. 151, 336–345 (2010).

  32. Stilgenbauer, S. et al. Incidence and clinical significance of 6q deletions in B cell chronic lymphocytic leukemia. Leukemia 13, 1331–1334 (1999).

  33. Boultwood, J. et al. Narrowing and genomic annotation of the commonly deleted region of the 5q− syndrome. Blood 99, 4638–4641 (2002).

  34. Schneider, R. K. et al. Rps14 haploinsufficiency causes a block in erythroid differentiation mediated by S100A8 and S100A9. Nat. Med. 22, 288–297 (2016).

  35. Ciccia, A. et al. Treacher Collins syndrome TCOF1 protein cooperates with NBS1 in the DNA damage response. Proc. Natl Acad. Sci. USA 111, 18631–18636 (2014).

  36. Nowinski, S. M. et al. Mitochondrial uncoupling links lipid catabolism to Akt inhibition and resistance to tumorigenesis. Nat. Commun. 6, 8137 (2015).

  37. Aguilar, E. et al. UCP2 Deficiency increases colon tumorigenesis by promoting lipid synthesis and depleting NADPH for antioxidant defenses. Cell Rep. 28, 2306–2316.e5 (2019).

  38. Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).

  39. Burns, A. et al. Whole-genome sequencing of chronic lymphocytic leukaemia reveals distinct differences in the mutational landscape between IgHVmut and IgHVunmut subgroups. Leukemia 32, 332–342 (2018).

  40. Zhang, Q., Lenardo, M. J. & Baltimore, D. 30 Years of NF-κB: a blossoming of relevance to human pathobiology. Cell 168, 37–57 (2017).

  41. Gandhi, V. & Plunkett, W. Cellular and clinical pharmacology of fludarabine. Clin. Pharmacokinet. 41, 93–103 (2002).

  42. Sellmann, L. et al. Trisomy 19 is associated with trisomy 12 and mutated IGHV genes in B‐chronic lymphocytic leukaemia. Br. J. Haematol. 138, 217–220 (2007).

  43. Shilatifard, A. The COMPASS family of histone H3K4 methylases: mechanisms of regulation in development and disease pathogenesis. Annu. Rev. Biochem. 81, 65–95 (2012).

  44. Kleinstern, G. et al. Tumor mutational load predicts time to first treatment in chronic lymphocytic leukemia (CLL) and monoclonal B-cell lymphocytosis beyond the CLL international prognostic index. Am. J. Hematol. 95, 906–917 (2020).

  45. Nadeu, F. et al. IgCaller for reconstructing immunoglobulin gene rearrangements and oncogenic translocations from whole-genome sequencing in lymphoid neoplasms. Nat. Commun. 11, 3390 (2020).

  46. Hodson, D. J. et al. Deletion of the RNA-binding proteins ZFP36L1 and ZFP36L2 leads to perturbed thymic development and T lymphoblastic leukemia. Nat. Immunol. 11, 717–724 (2010).

  47. Oppezzo, P. et al. Chronic lymphocytic leukemia B cells expressing AID display dissociation between class switch recombination and somatic hypermutation. Blood 101, 4029–4032 (2003).

  48. Roco, J. A. et al. Class-switch recombination occurs infrequently in germinal centers. Immunity 51, 337–350.e7 (2019).

  49. Leshchiner, I. et al. Comprehensive analysis of tumour initiation, spatial and temporal progression under multiple lines of treatment. Preprint at bioRxiv https://doi.org/10.1101/508127 (2019).

  50. Tausch, E. et al. Prognostic and predictive impact of genetic markers in patients with CLL treated with obinutuzumab and venetoclax. Blood 135, 2402–2412 (2020).

  51. Burger, J. A. et al. Long-term efficacy and safety of first-line ibrutinib treatment for patients with CLL/SLL: 5 years of follow-up from the phase 3 RESONATE-2 study. Leukemia 34, 787–798 (2020).

  52. Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066–1081 (2020).

  53. Sellmann, L. et al. Trisomy 19 is associated with trisomy 12 and mutated IGHV genes in B-chronic lymphocytic leukaemia. Br. J. Haematol. 138, 217–220 (2007).

  54. Nadeu, F. et al. IGLV3-21R110 identifies an aggressive biological subtype of chronic lymphocytic leukemia with intermediate epigenetics. Blood 137, 2395–2946 (2021).

  55. Agathangelidis, A. et al. Higher-order connections between stereotyped subsets: implications for improved patient classification in CLL. Blood 137, 1365–1376 (2021).

  56. Dobbelstein, M., Strano, S., Roth, J. & Blandino, G. p73-induced apoptosis: a question of compartments and cooperation. Biochem. Biophys. Res. Commun. 331, 688–693 (2005).

  57. Chinnadurai, G., Vijayalingam, S. & Rashmi, R. BIK, the founding member of the BH3-only family proteins: mechanisms of cell death and role in cancer and pathogenic processes. Oncogene 27, S20–S29 (2008).

  58. Wang, W. et al. MAPK4 overexpression promotes tumor progression via noncanonical activation of AKT/mTOR signaling. J. Clin. Invest. 129, 1015–1029 (2019).

  59. Herling, C. D. et al. Time-to-progression after front-line fludarabine, cyclophosphamide, and rituximab chemoimmunotherapy for chronic lymphocytic leukaemia: a retrospective, multicohort study. Lancet Oncol. 20, 1576–1586 (2019).

  60. Bilban, M. et al. Deregulated expression of fat and muscle genes in B-cell chronic lymphocytic leukemia with high lipoprotein lipase expression. Leukemia 20, 1080–1088 (2006).

  61. Dietrich, S. et al. Drug-perturbation-based stratification of blood cancer. J. Clin. Invest. 128, 427–445 (2018).

  62. Stilgenbauer, S. et al. Gene mutations and treatment outcome in chronic lymphocytic leukemia: results from the CLL8 trial. Blood 123, 3247–3254 (2014).

  63. Stilgenbauer, S. et al. Alemtuzumab combined with dexamethasone, followed by alemtuzumab maintenance or Allo-SCT in ‘ultra high-risk’ CLL: Final results from the CLL2O phase II study. Blood 124, 1991–1991 (2014).

  64. Wang, L. et al. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N. Engl. J. Med. 365, 2497–2506 (2011).

  65. Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).

  66. Javed, N. et al. Detecting sample swaps in diverse NGS data types using linkage disequilibrium. Nat. Commun. 11, 3697 (2020).

  67. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  68. Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013).

  69. Cibulskis, K. et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics 27, 2601–2602 (2011).

  70. Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature 470, 214–220 (2011).

  71. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  72. Benjamin, D. et al. Calling somatic SNVs and indels with Mutect2. Preprint at bioRxiv 861054 https://doi.org/10.1101/861054 (2019).

  73. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).

  74. Taylor-Weiner, A. et al. DeTiN: overcoming tumor-in-normal contamination. Nat. Methods 15, 531–534 (2018).

  75. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  76. Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).

  77. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  78. Olshen, A. B., Venkatraman, E. S., Lucito, R. & Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572 (2004).

  79. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).

  80. Morton, L. M. et al. Radiation-related genomic profile of papillary thyroid carcinoma after the Chernobyl accident. Science https://doi.org/10.1126/science.abg2538 (2021).

  81. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).

  82. Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).

  83. Bass, A. J. et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat. Genet. 43, 964–968 (2011).

  84. Drier, Y. et al. Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability. Genome Res. 23, 228–235 (2013).

  85. Bolotin, D. A. et al. MiXCR: software for comprehensive adaptive immunity profiling. Nat. Methods 12, 380–381 (2015).

  86. Brochet, X., Lefranc, M.-P. & Giudicelli, V. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res. 36, W503–W508 (2008).

  87. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  88. Graubert, A., Aguet, F., Ravi, A., Ardlie, K. G. & Getz, G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics 37, 3048–3050 (2021).

  89. Robertson, A. G. et al. Comprehensive molecular characterization of muscle-invasive bladder cancer. Cell 171, 540–556 (2017).

  90. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).

  91. Pandit, B. et al. Gain-of-function RAF1 mutations cause Noonan and LEOPARD syndromes with hypertrophic cardiomyopathy. Nat. Genet. 39, 1007–1012 (2007).

  92. Rommel, C. et al. Activated Ras displaces 14-3-3 protein from the amino terminus of c-Raf-1. Oncogene 12, 609–619 (1996).

  93. Dhillon, A. S., Meikle, S., Yazici, Z., Eulitz, M. & Kolch, W. Regulation of Raf-1 activation and signalling by dephosphorylation. EMBO J. 21, 64–71 (2002).

  94. Provost, P. et al. Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J. 21, 5864–5874 (2002).

  95. Loenarz, C. et al. Hydroxylation of the eukaryotic ribosomal decoding center affects translational accuracy. Proc. Natl Acad. Sci. USA 111, 4019–4024 (2014).

  96. Qiu, W., Zhou, B., Darwish, D., Shao, J. & Yen, Y. Characterization of enzymatic properties of human ribonucleotide reductase holoenzyme reconstituted in vitro from hRRM1, hRRM2, and p53R2 subunits. Biochem. Biophys. Res. Commun. 340, 428–434 (2006).

  97. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).

Acknowledgements

We thank W. Zhang, S. Gohil, I. Leshchiner, D. Livitz, D. Rosebrock, J. Gribben, K. R. Rai, M. J. Keating, J. M. Hess, N. J. Haradhvala, A. Mohammed and A. Gnirke for helpful discussions. We thank C. Patterson, S. Pollock, K. Slowik, O. Olive, C. J. Shaughnessy and H. Lyon for assistance in data collection and organization. We thank the patients, their families and the investigators of the clinical trials for providing samples and clinical data. This study was supported by National Institutes of Health (NIH)/National Cancer Institute (NCI) grant P01 CA206978 (to C.J.W. and G.G.) and the Broad/IBM Cancer Resistance Research Project (G.G. and L.P.). B.A.K. was supported by a long-term EMBO fellowship (ALTF 14-2018). C.K.H. was supported by the NHLBI Training Program in Molecular Hematology (T32HL116324). F.N. acknowledges funding by the American Association for Cancer Research (2021 AACR-Amgen Fellowship in Clinical/Translational Cancer Research, 21-40-11-NADE), the European Hematology Association (EHA Junior Research Grant 2021, RG-202012-00245), and the Lady Tata Memorial Trust (International Award for Research in Leukaemia 2021-2022, LADY_TATA_21_3223). S.S. and E.T. were supported by the Deutsche Forschungsgemeinschaft (SFB1074, subproject B1, B2 and B10). A.W. and C. Sun were supported by the Intramural Research Program at NIH/NHLBI. J.A.B. was supported by MD Anderson’s Moon Shot Program in CLL and the CLL Global Research Foundation and in part by MDACC Support Grant CA016672. S.L. was supported by the NCI Research Specialist Award (R50CA251956). J.R.B. was supported by NIH grant R01 CA 213442, NIH/NCI grant P01 CA206978 and the Melton Family Foundation. X.S.P. acknowledges funding by the Spanish Ministerio de Economía y Competitividad (grants SAF2017-87811-R and PID2020-117185RB-I00). A.D.-N. was supported by the Department of Education of the Basque Government (PRE_2017_1_0100) and P.B.-M. by a fellowship by the Spanish Ministerio de Economía y Competitividad. This study was supported by “la Caixa” Foundation (CLLEvolution-LCF/PR/HR17/52150017, Health Research 2017 Program “HR17-0022” to E.C.), the European Research Council under the European Union’s Horizon 2020 research and innovation program (Project BCLLATLAS, grant agreement 810287) (to J.I.M.-S. and E.C.), the Accelerator award CRUK/AIRC/AECC joint funder-partnership (to J.I.M.-S.), Generalitat de Catalunya Suport Grups de Recerca AGAUR 2017-SGR-1142 (to E.C.) and 2017-SGR-736 (to J.I.M.-S.), CERCA Programme/Generalitat de Catalunya. E.C. is an Academia Researcher of Catalan Institution for Research and Advanced Studies.

Author information

Contributions

C.J.W., G.G., E.C. and S.S. conceived the study. E.T., J.D., S.M.F., C. Sun, M.S., L.Z.R., C. Schneider, L.P., J.A.B., A.W., T.J.K., J.R.B., M.H. and S.S. collected and contributed samples and annotations. B.A.K., Z.L., C.K.H., K.E.S., A.T.-W. and J.G.-A. assembled the data. B.A.K., Z.L., F.N., M.D.-F., K.E.S., A.B.-M., A.T.-W., P.B.-M., A.D.-N., A.D., S.A., H.K., F.A. and C. Stewart wrote analytic pipelines. B.A.K., Z.L., F.N., M.D.-F., K.E.S., A.B.-M., P.B.-M., A.D.-N., S.A. and C. Stewart performed the analysis. B.A.K., Z.L., C.K.H., F.N., M.D.-F., K.E.S., E.T., J.D., A.B.-M., P.B.-M., A.D.-N., S.L.-T., A.M., F.A., C. Stewart, J.R.B., D.S.N., J.I.M-S., X.S.P., S.S., C.J.W., E.C. and G.G. contributed to study design and interpreted the data. S.L. performed targeted sequencing. C.K.H., B.A.K., Z.L., C.J.W. and G.G. prepared the manuscript with input from all authors.

Corresponding authors

Correspondence to Catherine J. Wu or Gad Getz.

Ethics declarations

Competing interests

The authors declare the following conflicts related to the CLLmap project: C.J.W. receives research support from Pharmacyclics. E.C. has been a consultant for Illumina. G.G. receives research funds from IBM and Pharmacyclics; and is an inventor on patent applications related to SignatureAnalyzer-GPU. S.S. reports honoraria for consultancy, advisory board membership, speaker honoraria, research grants and travel support from AbbVie, Amgen, AstraZeneca, Celgene, Gilead, GSK, Hoffmann La-Roche, Janssen, Novartis. C.J.W., G.G., B.A.K., Z.L. and C.K.H. are inventors on a patent “Compositions, panels, and methods for characterizing chronic lymphocytic leukemia” (PCT/US21/45144). The following conflicts are unrelated to the CLLmap project: F.N. has received honoraria from Janssen for speaking at educational activities. E.T. declares research support by AbbVie and Roche; Advisory Boards and Speakers Bureau for Janssen, AbbVie and Roche. A.W. received research funding from Pharmacyclics, Acerta, Merck, Verastem, Genmab, Nurix. J.R.B. has served as a consultant for AbbVie, Acerta/AstraZeneca, Beigene, Bristol-Myers Squibb/Juno/Celgene, Catapult, Genentech/Roche, Janssen, MEI Pharma, Morphosys AG, Novartis, Pfizer, Rigel; received research funding from Gilead, Loxo/Lilly, Verastem/SecuraBio, Sun, TG Therapeutics; and served on the data safety monitoring committee for Invectys. J.A.B. received research support from AstraZeneca, BeiGene, Gilead, and Pharmacyclics; travel and speaker honoraria from Janssen. X.S.P. is a cofounder of and holds an equity stake in DREAMgenics. C.J.W. holds equity in BioNTech, Inc. E.C. has been a consultant for Takeda and NanoString Technologies; has received honoraria from Janssen and Roche for speaking at educational activities; and is an inventor on a Lymphoma and Leukemia Molecular Profiling Project patent “Method for subtyping lymphoma subtypes by means of expression profiling” (PCT/US2014/64161). G.G. is an inventor on patent applications related to MSMuTect, MSMutSig, MSIDetect and POLYSOLVER; and is a founder and consultant of and holds privately held equity in Scorpion Therapeutics. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Ingo Ringshausen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Dataset description and representative driver gene maps.

a. Full dataset (n = 1,148), with contributions by cohort and data type delineated (see Supplementary Table 1). b. Numbers of samples with genomic, epigenomic, and transcriptomic data. c. 3D protein structures of representative genes identified by CLUMPS in pan-CLL analysis (n = 984, see Supplementary Table 5). Mutated residues - red labels. A peptide from RAF1 (designated at bottom-center, in complex with 14-3-3 zeta) shows clustered mutations around S259, whose phosphorylation regulates RAF1 activity and is a cancer mutational hotspot (ref. 91) that, when mutated, perturbs the interaction with 14-3-3 zeta and upregulates RAF1 kinase activity (refs. 92,93). In DICER1, mutations occur in the RNase III domain (purple), including the cancer hotspot residue E1813 (refs. 21,94). This region is critical for Mg²⁺ binding and is required for ribonuclease activity to process microRNAs and mediate post-transcriptional gene regulation (ref. 95). RPS23 mutations are clustered in a conserved loop of the ribosomal decoding center, surrounding P62, whose post-translational hydroxylation affects translation termination accuracy (ref. 96). These RPS23 mutations have a median CCF > 80% (Extended Data Fig. 6d; Supplementary Table 3). d. Individual mutation maps of selected novel, putative driver genes. Mutation subtype and position are shown. e. Selected genes identified by CLUMPS in IGHV subtypes; mutated residues - red. Although BRAF was not identified as a potential M-CLL driver via MutSig2CV (see Extended Data Fig. 3, Methods), CLUMPS revealed three mutated sites clustered in the kinase domain (purple) that are cancer hotspots (ref. 25), thus confirming BRAF as a shared driver (left). Mutated residues in BRAF in U-CLL (bottom) are shown for comparison, revealing a greater number of clustered mutations relative to M-CLL. In U-CLL, novel mutations were found in RRM1 (right). Somatic alterations were clustered in the N-terminal ATP-binding site (purple) and therefore have potential to impact enzymatic activity (ref. 97).

Extended Data Fig. 2 CLL biological pathways affected by candidate driver genes.

a. Schema of CLL pathways containing previously identified (black) and novel (magenta) putative driver genes (see Supplementary Table 6). Novel drivers cluster in central processes driving CLL (for example, DNA damage, chromatin modification, RNA processing)1,2, but also highlight new pathways not previously implicated by driver genes (for example, cytoskeleton and extracellular matrix, proteostasis, metabolism). Asterisks - mutated genes discovered by CLUMPS. b. Stacked barplot ranked by the number of candidate driver genes per CLL pathway. Magenta bars show the number of newly identified drivers in each pathway.

Extended Data Fig. 3 Candidate driver alterations discovered in IGHV subtypes.

a-b. Landscape of putative driver genes and sCNAs in M-CLL (a, n = 512) and U-CLL (b, n = 459) with associated frequencies (rows, barplots). Header tracks annotate cohort, IGHV status (purple, M-CLL; orange, U-CLL), disease type (blue, CLL; yellow, MBL), epitype (blue, n-CLL; yellow, i-CLL; red, m-CLL), datatype (white, WES; yellow, WGS; blue, both); prior treatment, U1 and IGLV3-21R110 mutations are annotated in black; magenta label - novel alterations; asterisks - discovery by CLUMPS.

Extended Data Fig. 4 Chromosomal gains and losses identified in IGHV subtypes.

a-b. Recurrent copy-number gains (left) and losses (right) by GISTIC analysis showing arm-level (left per plot) and focal events (right per plot) in M-CLL (a, n = 512) and U-CLL (b, n = 459). Chromosomes are labeled along the vertical axis; dashed line - significance at q = 0.1. Blacklisted regions are colored gray. All arm-level events are labeled with cytoband arm and frequency in cohort. Focal events are annotated by cytoband, frequency, number of genes encompassed in peak (bracketed), and genes of interest. Red/blue font: novel focal events with frequency >2%. Black font: previously identified events (see Supplementary Table 7).

Extended Data Fig. 5 Landscape of driver alterations and chromosomal aberrations in IGHV subtypes.

a. The genomic landscape of CLL IGHV subtypes. Driver genes, U1 and IGLV3-21R110 mutations are labeled according to their genomic location (outside ring, numbered by chromosome). The tracks show the frequency and locations of driver genes in M-CLL (purple) vs. U-CLL (orange) (track 1; outermost), focal sCNAs (track 2; gains, red; losses, blue), and density of SV breakpoints of deletions (track 3) and translocations (track 4) (M-CLL n = 88; U-CLL n = 87; WGS, windows of 1-Mb). Innermost plot highlights translocations in which either one or both breakpoints are recurrent in at least 3 cases (windows of 1-Mb considered to define recurrence) in M-CLL (purple) and U-CLL (orange). Deletions, inversions, and tandem duplications where both breakpoints were found in at least 2 cases and did not overlap with a driver sCNA are shown (note: only a focal deletion in SP140 in two U-CLL cases met this criterion). b. Schema of recurrent IG-BCL2 translocation and IGH-ZFP36L1 deletion in the WGS cohort. All 5 BCL2 translocations were in M-CLL with immunoglobulin (IG) breakpoints in J or D genes, suggesting mediation by aberrant V(D)J recombination. In contrast, 4 U-CLL cases carried IGH-ZFP36L1 truncating deletions, which were all clonal (CCF = 1). Breakpoints in IGH class-switch regions suggested mediation by aberrant class-switch recombination (CSR). c. Immunoglobulin (IG) SVs in 177 WGS and 984 WES. In WES, 9 of 10 BCL2 translocations were in M-CLL and mediated by aberrant V(D)J recombination in IGH (n = 7) or IGK (n = 2). The sole BCL2 translocation in U-CLL was due to aberrant CSR. One CSR-mediated IGH-ZFP36L1 deletion was observed in a case with unclassified IGHV status due to presence of two populations (one M-CLL, one U-CLL; the latter was more prevalent). Of note, in WES, U-CLLs carry a higher number of non-recurrent IG events than M-CLL.

Extended Data Fig. 6 Mutational mechanisms and cancer cell fractions of candidate drivers.

a. Eight mutational signatures were identified in 177 WGS, but 3 signatures corresponded to known artifacts and were therefore excluded (see Supplementary Note 2). Boxplots demonstrating mutation contribution for each of the 5 signatures are labeled with single-base substitution (SBS) number and identity (per COSMIC v3.1). b. Comparison of the normalized signature intensity of the mutational signatures in U-CLL (orange, n = 87) vs. M-CLL (purple, n = 88). The nc-AID and c-AID 1 signatures were enriched in M-CLL, whereas the aging signature was more prevalent in U-CLL. Although not significant, there was a trend of increased mutations due to the c-AID 2 signature in U-CLL. All p-values were calculated with Wilcoxon rank-sum test, two-sided. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. c. Proportions of clustered mutations contributed by the two c-AID related signatures (SBS84, c-AID 1 vs. SBS85, c-AID 2) for each IGHV subtype (M-CLL, purple; U-CLL, orange). d. Mean cancer cell fraction (CCF) for each non-silent mutation across all candidate driver genes identified in WES samples (n = 984). Color of dots depicts the IGHV subtype (M-CLL, purple; U-CLL, orange). The horizontal red line is the threshold for clonality (CCF > 85%). Magenta labels - newly identified putative driver genes. The number of non-silent mutations per driver gene is shown at the bottom. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range.

Extended Data Fig. 7 Development and validation of epitype assignment and epiCMIT in RRBS data.

a. Consensus clustering matrices for K = 3 groups for paired-end (n = 136; 153 CpGs in consensus matrix) and single-end (n = 388; 32 CpGs) RRBS data (see d). b. Empirical cumulative distribution functions (CDFs) for consensus matrices with K = 2 to K = 7. c. Relative change in area under the CDF for K = 2 to K = 7. d. Heatmaps of the CpGs used for consensus clustering in (a). Each sample (columns) is annotated by tracks: epitype max probability, IGHV status (M-CLL, purple; U-CLL, orange), IGHV percent identity, and presence of IGLV3-21R110 mutation (black). e. The development of the new epiCMIT methodology for RRBS data. The genome was segmented into Chromatin Hidden Markov Model (CHMM)24 states using ChIP-seq data to get repressed chromatin regions, where differential DNA methylation analysis was performed on high-coverage whole-genome bisulfite sequencing (WGBS) data between the cells with the lowest and highest accumulated cell divisions in the B cell lineage, namely the hematopoietic precursor cells (HPC) and bone-marrow plasma cells (bmPC). Only CpGs showing extensive differences were retained; these constituted the epiCMIT-hyper CpGs or epiCMIT-hypo CpGs depending on whether they gained DNA methylation (from <0.1 to ≥0.5) or lost it (from >0.9 to ≤0.5) from HPC to bmPC, respectively. EpiCMIT-hyper and epiCMIT-hypo scores were calculated according to the available epiCMIT-CpGs per sample, and the higher score in each sample was then selected. f. epiCMIT values on the same samples profiled twice with different platforms. Approach 1 - profiled with Illumina-450K (green); approach 2 - profiled with RRBS-PE (violet). In samples profiled with Illumina 450K, the original epiCMIT-CpGs were used52. In samples profiled with RRBS, epiCMIT was calculated with all available epiCMIT-CpGs for the new catalog (e, Methods). P-value by Pearson correlation test, two-sided; error band, 95% confidence interval of the Pearson correlation coefficient.
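The CpG selection and scoring described in panel e can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the score definitions (mean methylation for epiCMIT-hyper, one minus mean methylation for epiCMIT-hypo) are assumptions that the legend does not state. Only the thresholds and the "take the higher score" step come from the legend.

```python
from statistics import mean


def select_epicmit_cpgs(hpc, bmpc):
    """Split CpGs into hyper and hypo sets using the legend's thresholds.

    hpc, bmpc: dicts mapping a CpG identifier to its methylation
    fraction (0..1) in HPC and bmPC, respectively.
    """
    shared = [c for c in hpc if c in bmpc]
    # Gain of methylation: from <0.1 in HPC to >=0.5 in bmPC.
    hyper = {c for c in shared if hpc[c] < 0.1 and bmpc[c] >= 0.5}
    # Loss of methylation: from >0.9 in HPC to <=0.5 in bmPC.
    hypo = {c for c in shared if hpc[c] > 0.9 and bmpc[c] <= 0.5}
    return hyper, hypo


def epicmit(sample, hyper, hypo):
    """Return (score, label) for one sample, using only the CpGs it covers.

    Assumed scoring (not stated in the legend):
    hyper score = mean methylation over covered hyper CpGs,
    hypo score = 1 - mean methylation over covered hypo CpGs.
    The higher of the two is reported.
    """
    hyper_vals = [sample[c] for c in hyper if c in sample]
    hypo_vals = [sample[c] for c in hypo if c in sample]
    scores = {}
    if hyper_vals:
        scores["hyper"] = mean(hyper_vals)
    if hypo_vals:
        scores["hypo"] = 1.0 - mean(hypo_vals)
    if not scores:
        raise ValueError("sample covers no epiCMIT CpGs")
    label = max(scores, key=scores.get)
    return scores[label], label
```

Restricting each score to the CpGs a sample actually covers reflects the legend's note that scores were calculated "according to the available epiCMIT-CpGs per sample", which matters for RRBS data where coverage varies between samples.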

Extended Data Fig. 8 Identification of expression clusters with associated biologic features.

a. Cohort representation in each expression cluster. b. Consensus matrix for RNA expression profiles of 603 treatment-naive CLLs by repeated hierarchical clustering with 80% resampling and varying cutoffs for number of clusters, which is inputted to the BayesNMF procedure (Methods). c. Uniform manifold approximation and projection (UMAP) showing clustering of ECs (n = 603; EC-u clusters (top), EC-m and EC-o (middle), EC-i (bottom)). Analysis was performed using the marker genes identified by BayesNMF. d. UMAP of H3K27ac profiles (n = 104)8 denoting EC designation where available (colored points, n = 73) and IGHV status. e. Comparison of the percent IGHV identity among ECs. Dotted line: 98% threshold defining M-CLL and U-CLL. P-values by two-sided t-tests. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. f. Comparison of the percent IGHV identity between those samples with concordant IGHV status and ECs (for example, M-CLLs in EC-m clusters) versus the discordant samples (for example, M-CLLs in EC-u clusters). IGHV-mutated cases - left; IGHV unmutated samples - right. P-values by two-sided t-tests. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. g. Percentage of cases carrying stereotyped immunoglobulin genes within each EC. Red horizontal line: percentage of stereotyped cases in the whole cohort. h. Fraction of cases classified in each CLL stereotype subset according to their EC. i. Percentage of IGHV (left) and IG(K/L)V (right) gene usage within each EC. IGKV genes from proximal and distal clusters were merged for simplification. All p-values were calculated using Chi-squared tests corrected by the Benjamini–Hochberg procedure (q-values, q). q < 0.1; *, q < 0.05; **, q < 0.001; ***, q < 0.0001. j-k. Heatmaps showing upregulated (j) and downregulated (k) H3K27ac levels of EC marker genes and 2,000 bp upstream to capture regulatory regions (Methods).
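The q-values in panel i come from the Benjamini–Hochberg procedure. As a reminder of how that adjustment works, here is a minimal sketch in plain Python. The function name is hypothetical and this is the textbook step-up procedure, not code from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then
    a running minimum is taken from the largest rank downwards so that
    q-values never decrease as p-values increase.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, the four p-values 0.01, 0.04, 0.03 and 0.005 give q-values 0.02, 0.04, 0.04 and 0.02, so all four would pass a q < 0.05 threshold.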

Extended Data Fig. 9 EC differential gene expression, pathway activity, and classifier.

a. Differentially expressed genes per EC (red) using discovery set (n = 603); EC marker genes by BayesNMF (blue). Significant up- or downregulation of H3K27ac levels are directionally marked with triangles (ChIP-seq available for n = 73; n = 1 for EC-o and EC-i, thus unevaluable). b. EC gene set enrichment analysis (GSEA). Diamond denotes the EC compared to all others (circles). c. Confusion matrix for the EC classifier on the test set (“Dominance” defined in Methods). d. Confidence in correctly classified samples (n = 95) is greater than for incorrectly classified samples (n = 25; two-sided t-test). “Prediction margin” defined in Methods. Boxplots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. e. Receiver operating characteristic (ROC) curve showing the tradeoff between sensitivity and specificity for the range of cutoffs that can be applied based on the “prediction margin”, where samples under the cutoff are excluded from performance evaluation. AUC, area under curve. f. Precision-recall (PR) curves for EC classification performance on the test set (n = 120), using the selected model (see Methods). The weighted average of AUC is 0.88. g. Performance metrics for models trained with differing numbers of input genes, demonstrating accuracy even with smaller gene sets. Metrics: Accuracy, overall; Average, weighted average across ECs (Methods). Nc, Ntot - number of genes (see Methods). h. EC distributions by BayesNMF compared to classifier predictions on the discovery cohort (n = 603), an extension cohort not included in discovery (n = 105), and an external CLL cohort (n = 136)61. i. IGHV status distributions per EC in discovery (n = 603) and external (n = 136) cohorts. The difference in IGHV-mutated samples per EC is 2-10% (p > 0.05, Fisher’s Exact, Methods). j. Stability of the ECs over time in longitudinally sampled CLL samples3. Sample timepoints (x-axis); years between first and last sample (above curve).

Supplementary information

Supplementary Information

Supplementary Notes 1-4 and Supplementary Figures 1–3.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–15.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Knisbacher, B.A., Lin, Z., Hahn, C.K. et al. Molecular map of chronic lymphocytic leukemia and its impact on outcome. Nat Genet 54, 1664–1674 (2022). https://doi.org/10.1038/s41588-022-01140-w

  • DOI: https://doi.org/10.1038/s41588-022-01140-w
