To gain insights into the molecular alterations that cause CLL, we performed whole-genome sequencing of four cases representative of different forms of the disease: two cases, CLL1 and CLL2, with no mutations in the immunoglobulin genes (IGHV-unmutated) and two cases, CLL3 and CLL4, with mutations in these genes (IGHV-mutated) (Supplementary Table 1 and Supplementary Information). We used a combination of whole-genome sequencing and exome sequencing, as well as long-insert paired-end libraries, to detect variants in chromosomal structure (Supplementary Fig. 1 and Supplementary Tables 2–5). We obtained more than 99.7% concordance between whole-genome sequencing calls and genotyping data, indicating that the coverage and parameters used were sufficient to detect most of the sequence variants in these samples (Supplementary Information). We detected about 1,000 somatic mutations per tumour in non-repetitive regions (Fig. 1a, Supplementary Fig. 2 and Supplementary Table 6). These numbers of somatic mutations were lower than the numbers in melanoma and lung carcinoma5,6, but in agreement with previous estimates of less than one mutation per megabase (Mb) for leukaemias7. The most common substitution was the transition G>A/C>T, usually occurring in a CpG context (Fig. 1b and Supplementary Fig. 2). We also detected marked differences in the mutation pattern between CLL samples and these differences were associated with tumour subtype (Fig. 1b). Thus, IGHV-mutated cases showed a higher proportion of A>C/T>G mutations than cases with unmutated IGHV (16 ± 0.2% versus 6.2 ± 0.1%). The base preceding the adenine in A to C transversions showed an over-representation of thymine, when compared to the prevalence expected from its representation in non-repetitive sequences in the wild-type genome (P < 0.001, Fig. 1c), and there were fewer A to C substitutions at GpA dinucleotides than would be expected by chance (P < 0.001). 
These differences between CLL subtypes might reflect the molecular mechanisms implicated in their respective development. The pattern and context of the A>C/T>G mutations in IGHV-mutated cases are consistent with their being introduced by the error-prone polymerase η during somatic hypermutation in immunoglobulin genes8. This indicates that polymerase η could contribute to the high frequency of A>C/T>G transversions in IGHV-mutated cases. It also extends the differences observed between these two CLL subtypes to the genomic level.
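
The six substitution classes used in this comparison can be reproduced by collapsing each substitution with its reverse complement. The following Python sketch is illustrative only: the function names and the input format (a list of reference/alternate base pairs) are assumptions, not part of the study's pipeline, and the class labels are sorted alphabetically, so G>A/C>T appears as C>T/G>A.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a substitution and its reverse complement into one class label."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return "/".join(sorted((forward, reverse)))


def spectrum(calls):
    """Return the fraction of calls falling in each substitution class.

    `calls` is an iterable of (ref, alt) pairs (hypothetical input format).
    """
    counts = Counter(substitution_class(ref, alt) for ref, alt in calls)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical toy calls, not data from the study.
toy_calls = [("G", "A"), ("C", "T"), ("A", "C"), ("T", "G")]
toy_spectrum = spectrum(toy_calls)
```

With the toy input, G>A and C>T fall into one class and A>C and T>G into another, so each class accounts for half of the calls.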

Figure 1: Profile of somatic mutations in four CLL genomes.

a, Distribution of somatic alterations. For each tumour genome, copy number (solid lines), density of mutations per 5-Mb window (bars) and protein-coding mutations (dots) are shown. The shaded rectangle indicates the location of the 13q14 deletion that was present in three of the four CLL cases. Chromosome numbers are listed below the four profiles. b, Frequency of substitutions in each CLL tumour for the six possible classes of mutation. c, Distribution of the four possible NpA dinucleotides for the A to C transversion in each tumour genome, compared with the expected distribution across the genome. The total number of A to C substitutions per case is indicated at the top (**, P < 0.001).
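
For panel c, the base preceding the mutated adenine has to be read on the strand that carries the A: a T>G call on the reference strand is an A>C change on the opposite strand. The sketch below shows one way to recover the NpA context under that convention. The function name, the 0-based coordinates and the plain-string reference are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def preceding_base_for_a_to_c(sequence: str, pos: int, ref: str, alt: str):
    """Return N of the NpA dinucleotide for an A>C (or T>G) call, else None.

    `sequence` is the reference (plus) strand and `pos` is 0-based.
    """
    if (ref, alt) == ("A", "C"):
        # The A is on the plus strand: take the base immediately 5' of it.
        return sequence[pos - 1] if pos > 0 else None
    if (ref, alt) == ("T", "G"):
        # The A is on the minus strand: its 5' neighbour is the complement
        # of the base that follows the T on the plus strand.
        if pos + 1 < len(sequence):
            return COMPLEMENT[sequence[pos + 1]]
        return None
    return None
```

Counting the returned bases over all A>C/T>G calls gives the observed NpA distribution, which can then be compared with the dinucleotide frequencies of the non-repetitive genome.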

We classified the somatic mutations into three different classes according to their potential functional effect (Supplementary Information). We also searched for small insertions and deletions (indels) in coding regions: we found and validated five somatic indels, which caused frameshifts in protein-coding regions (Supplementary Table 7). We identified 46 mutations that changed the protein-coding sequences of 45 genes in the four patients analysed (Supplementary Table 7). None of these nucleotide substitutions had been previously linked to CLL and among the five indel mutations, only one, in NOTCH1 (p.P2515Rfs*4), had been previously found in various lymphoid malignancies, including CLL9,10. To determine whether any of these 45 genes was mutated in more than one CLL case, we analysed an initial validation set of 169 CLL patients. We focused on the 26 genes that are expressed at the RNA level in CLL cells (Supplementary Table 7) because mutations in expressed genes are more likely to have a biological effect than those in non-expressed genes. We used a pooled-sequencing strategy that led us to identify four genes with at least one additional mutation in the validation series: these were NOTCH1, MYD88, XPO1 and KLHL6 (Table 1 and Supplementary Information).
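
The selection logic of this validation step (restrict the candidates to expressed genes, then keep those mutated in at least one further patient) can be sketched as follows. This is a simplified illustration: the function name and input format are hypothetical, the gene and patient identifiers in the example are placeholders, and the deconvolution of pooled sequencing reads is not modelled.

```python
from collections import defaultdict


def recurrent_genes(discovery_genes, expressed_genes, validation_calls):
    """Count validation patients carrying a mutation in each expressed candidate.

    `validation_calls` is an iterable of (patient_id, gene) pairs
    (hypothetical input format).
    """
    candidates = set(discovery_genes) & set(expressed_genes)
    patients_by_gene = defaultdict(set)
    for patient_id, gene in validation_calls:
        if gene in candidates:
            patients_by_gene[gene].add(patient_id)
    return {gene: len(patients) for gene, patients in patients_by_gene.items()}


# Placeholder identifiers, not data from the study.
example = recurrent_genes(
    discovery_genes=["GENE_A", "GENE_B", "GENE_C"],
    expressed_genes=["GENE_A", "GENE_C"],
    validation_calls=[("P1", "GENE_A"), ("P2", "GENE_A"), ("P3", "GENE_B")],
)
```

In the example, GENE_B is dropped because it is not expressed and GENE_C because no validation patient carries a mutation in it.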

Table 1 Genes recurrently mutated in chronic lymphocytic leukaemia

Analysis of additional CLL cases revealed that the deletion of a CT dinucleotide in NOTCH1 (p.P2515Rfs*4) was present in 29 of 255 patients, and two additional mutations in the same region were also found (p.Q2503* and p.F2482Ffs*2) (Fig. 2a, b). Accordingly, NOTCH1 is mutated in 31 of 255 CLL patients (12%; Supplementary Table 8). These mutations generate a premature stop codon, resulting in a NOTCH1 protein lacking the C-terminal domain, which contains a PEST sequence (a sequence rich in proline, glutamic acid, serine and threonine) (Fig. 2a). Removal of this region results in the accumulation of an active protein isoform in the mutated CLL cells (Fig. 2c and Supplementary Fig. 3). NOTCH1 is constitutively expressed in CLL11, but the NOTCH1 mutations identified herein generate a more stable and active isoform of the protein. Gene expression analysis of ten NOTCH1-mutated and 49 unmutated CLL cases revealed a high number of differentially expressed genes (n = 542, false discovery rate <0.05; Supplementary Table 9). Likewise, in a gene-set analysis, we found that there was significant differential expression of the NOTCH1 signalling pathway12 and two metabolic pathways (oxidative phosphorylation and glycolysis/gluconeogenesis). This is consistent with the NOTCH1-mediated activation of multiple biosynthetic routes in T acute lymphoblastic leukaemia13. When the differential expression of individual genes from the NOTCH1 pathway was analysed, 23 of the 46 genes assigned to this pathway12 showed a significant differential expression (P < 0.05) in NOTCH1-mutated CLL (Fig. 2d). NOTCH1-mutated patients had a more advanced clinical stage at diagnosis, more adverse biological features and an overall survival that was significantly shorter than that of patients with unmutated NOTCH1 (10-yr overall survival: 21% versus 56%, P = 0.03; Fig. 2e, f). NOTCH1-mutated CLL also underwent transformation into diffuse large B-cell lymphoma more frequently than NOTCH1-unmutated CLL (7 of 31 cases, 23%, versus 3 of 224 cases, 1.3%; P < 0.001). The same IGHV clonal rearrangement and NOTCH1 mutation were found in the CLL and corresponding transformed diffuse large B-cell lymphoma of the four cases studied, indicating a clonal relationship between the two components.
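
The counts reported in this paragraph can be checked with a few lines of Python. The text does not state which statistical test produced the P value for transformation, so the two-sided Fisher's exact test used here is an assumption, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are at most as likely as the observed one.
    return sum(
        p for p in (prob(k) for k in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# 29 patients with the CT deletion plus 2 with other mutations, out of 255.
mutated_fraction = (29 + 2) / 255

# Transformation: 7 of 31 NOTCH1-mutated versus 3 of 224 unmutated cases.
p_value = fisher_exact_two_sided(7, 31 - 7, 3, 224 - 3)
```

Under this assumed test, the transformation comparison falls below the P < 0.001 threshold quoted in the text, and the mutated fraction rounds to 12%.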

Figure 2: Mutational and functional analysis of NOTCH1 in CLL.

a, Schematic representation of human NOTCH1, showing the main domains and locations of the three different somatic mutations identified in CLL. NEC, NOTCH1 extracellular subunit; NTM, NOTCH1 transmembrane subunit; ICN, intracellular domain of NOTCH1; LNR, Lin-12 NOTCH repeats; RAM, RAM domain; ANK, ankyrin repeat domain; PEST, PEST domain. b, Electropherogram showing the heterozygous CT deletion recurrently identified in CLL. c, Western blot showing NOTCH1 protein levels in CLL cases with or without the NOTCH1 p.P2515Rfs*4 mutation, and in Jurkat cells as a control. The arrow indicates the band corresponding to the NTM; the large arrowhead indicates the smaller band corresponding to the mutant form. d, Heat map showing the 23 genes of the NOTCH1 pathway that are differentially expressed in NOTCH1-mutated versus non-mutated CLL. e, Distribution of disease stage (Binet), ZAP-70 expression status, CD38 expression status and IGHV mutational status (UM, unmutated IGHV) in patients with or without mutations in NOTCH1 (*, P < 0.02; **, P < 0.01). f, Actuarial probability of overall survival of CLL patients with mutated or unmutated NOTCH1 (*, P = 0.03).

A recurrent mutation (p.L265P) in the MYD88 gene (Fig. 3a, b) was also identified in 9 of 310 CLL patients (2.9%). During revision of this manuscript, the same mutation was identified in different lymphomas14, highlighting its relevance in the pathogenesis of lymphoid neoplasias. MyD88 participates in the signalling pathways of interleukin-1 and Toll-like receptors during the immune response15. MyD88 immunoprecipitation from CLL cells with the p.L265P mutation resulted in the co-immunoprecipitation of large amounts of IRAK1, in contrast to cells lacking this mutation (Fig. 3c). Other effectors of this signalling pathway, including STAT3, IκBα and NF-κB p65 subunit, showed higher phosphorylation in MYD88-mutated than in unmutated CLL cells (Fig. 3d, e) and there was an increased DNA-binding activity of NF-κB in MYD88-mutated cells (Supplementary Fig. 4). These data support the hypothesis that the MYD88 p.L265P mutation constitutes an activating mutation of this novel proto-oncogene14,16. Stimulation of interleukin-1 receptor or Toll-like receptors in MYD88-mutated CLL cells induced the secretion of 5-fold to 150-fold higher levels of interleukin 1 receptor antagonist (IL1RN, also known as IL1RA), interleukin 6 and chemokine (C-C motif) ligands 2, 3 and 4 (CCL2, CCL3 and CCL4), when compared to the secretion of these cytokines by MYD88-unmutated CLL cells. Cytokine secretion was elevated in MYD88-mutated cells in response to stimulation of at least four of the eight TLRs tested. No response was observed in lymphocytes carrying the inactivating MYD88 mutation E52DEL (Fig. 3f and Supplementary Fig. 5). The high production of these cytokines has been implicated in the recruitment of macrophages and T lymphocytes by CLL cells, creating a favourable niche for their survival17. Moreover, activation of Toll-like receptors in CLL cells promotes the proliferation of tumour cells and protects them from spontaneous apoptosis18. Patients with MYD88-mutated CLL were diagnosed at a younger age than those with wild-type MYD88 (median 43 yr, range 38–63, versus median 63 yr, range 27–94; P < 0.001) and the disease presented with a more advanced clinical stage (Fig. 3g), although no differences were observed in progression or survival rates. Notably, almost all patients with the MYD88 p.L265P mutation (seven of the eight evaluated) belonged to the IGHV-mutated group.

Figure 3: Mutational and functional analysis of MYD88 in CLL.

a, Multiple sequence alignment of MyD88 around the mutated residue (arrow) in different species. Cons., degree of conservation. b, Electropherogram showing the recurrent heterozygous p.L265P MYD88 mutation (arrow) detected in CLL. c, Cell extracts from a MYD88-mutated CLL (L265P) and a MYD88-unmutated CLL (WT) were immunoprecipitated with anti-MyD88 antibody. The immunoprecipitated and unbound fractions were analysed by western blot using anti-IRAK1 and anti-MyD88 antibodies. d, Western blot analysis of phosphorylated STAT3 (p-STAT3 (Tyr 705)) and total STAT3 in cell extracts from MYD88-mutated or unmutated CLL tumour cells. β-Actin was used as a control to show equal loading. e, Western blots showing phosphorylated IκBα (p-IκBα), total IκBα, phosphorylated p65 subunit of NF-κB (p-p65) and total p65 subunit of NF-κB in cell extracts from MYD88-mutated or unmutated CLL tumour cells. f, Heat map representing the cytokine levels secreted by B cells from eight individuals after Toll-like receptor stimulation. Only the five cytokines that showed the most significant differences between MYD88-mutated and MYD88-unmutated CLL are shown. ‘E52DEL’ indicates B cells from two patients with an inactivating MYD88 mutation, ‘WT’ corresponds to tumour cells from CLL patients without MYD88 mutation and ‘L265P’ indicates tumour cells from patients carrying a mutated MYD88. The stimulation experiments for each of the Toll-like receptors (TLRs) are represented in different colours. NS, no stimulus. g, Distribution of disease stage (Binet), ZAP-70 expression status, CD38 expression status and IGHV mutational status (UM, unmutated IGHV) in patients according to the presence or absence of p.L265P MYD88 mutation (*, P < 0.03).

We also identified four cases with mutations in the same codon of the exportin 1 gene (XPO1; p.E571K and p.E571G). Exportin 1 is implicated in the nuclear export of proteins and mRNAs in yeast, including members of the MAP kinase pathway19. The fact that the same residue is mutated in four CLL cases and is part of a highly conserved region (Supplementary Fig. 6) indicates that the mutation affects XPO1 activity. Notably, all four cases with mutations in XPO1 belonged to the IGHV-unmutated subtype and two of them also had the p.P2515Rfs*4 mutation in NOTCH1, indicating that both mutations could have synergistic effects in CLL development.

We identified three patients carrying a total of six mutations (F49L/L65P, L90F and L58P/T64A/Q81P) in the gene encoding kelch-like protein 6 (KLHL6), which is implicated in the formation of the germinal centre during B cell maturation20. All six mutations were clustered between residues 49 and 90 (Supplementary Fig. 7). The presence of several point mutations in cis, located near the transcriptional start site of a gene that is highly expressed in the germinal centre, is a characteristic feature of somatic hypermutation. In fact, all three patients had CLL with mutated IGHV. Although somatic hypermutation occurs mainly in IGHV regions, other proto-oncogenes, including BCL6, MYC and PIM1, are mutated by somatic hypermutation in different lymphomas21. However, only BCL6 has been previously shown to be hypermutated by this mechanism in CLL21. Our data show that KLHL6 is probably also a target of somatic hypermutation in IGHV-mutated patients, although its precise contribution to the oncogenic process in CLL remains to be determined.

In addition to these four genes, we identified a series of large genomic alterations that were previously reported2. They included the deletion, in three cases, of the 13q14 region22, and a 40-Mb deletion in chromosome 6q14–q22 (Fig. 1a, Supplementary Fig. 1 and Supplementary Table 5). Finally, in one patient we detected a p.P281R mutation in the cyclin D2 gene (CCND2), which resulted in the accumulation of cyclin D2 in tumour cells (Supplementary Fig. 8). This finding, together with the high conservation of this residue and the identification of mutations in the equivalent residue of cyclin D1 (CCND1) in endometrial cancer23, indicates that this CCND2 mutation could be a driver contributing to the development of CLL in this patient. The finding illustrates the putative relevance of non-recurrent mutations for the pathogenesis of CLL.

The International Cancer Genome Consortium project was founded on the concept that sequencing of cancer genomes could reshape our understanding of cancer biology, with direct implications for clinical translation24. Our study of four CLL genomes underscores this transformative potential, although additional studies will be necessary to translate these findings to the clinic. We have identified four recurrently mutated genes and provided novel insights into the mechanisms by which leukaemic cells recruit, instruct and coordinate a tumour microenvironment. Currently, the biological identification of different subgroups of CLL is based on markers such as IGHV mutational status, cytogenetics, ZAP-70 expression or CD38 expression, which are not fundamental agents in the leukaemic process. The classification of patients based on genomic drivers of the disease is conceptually appealing, as shown by our demonstration that NOTCH1 and MYD88 mutations identify distinct subgroups of patients with particular clinical and biological features. Furthermore, we provide functional evidence that both NOTCH1 and MYD88 mutations are activating events and potential therapeutic targets. The potential to personalize therapeutic choices for patients on the basis of the genomic architecture of their cancers is the long-term aspiration for studies such as this, combining whole-genome sequencing, functional studies and clinical analysis of patients with cancer.

Methods Summary

Four patients with CLL, who had given informed consent for sample collection and analysis, were studied. Tumour samples were obtained before treatment and tumour cells were separated from non-tumour cells by immunomagnetic depletion of T cells, natural killer cells, monocytes and granulocytes (Supplementary Information). Tumour cell purity was ≥98% as assessed by flow cytometry. Normal blood cells from the same patient were obtained after treatment, resulting in no detectable, or less than 0.05%, tumour cell contamination, as assessed by flow cytometry. Additional samples from 363 patients were obtained for clinical validation. Protocols for long-insert and short-insert library construction and for massively parallel paired-end sequencing have been described elsewhere (ref. 25 and Supplementary Information). Genotyping and copy number analysis were performed using the Affymetrix SNP6.0, Agilent 1M and Illumina OmniQuad arrays on the same cases used for whole-genome sequencing. For the validation of candidate genes in a set of 169 additional CLL patients, we used a combination of PCR amplification and Illumina sequencing in pooled samples, resulting in efficient identification of germline and somatic mutations (Supplementary Information). Sequencing data were aligned to the human reference genome (GRCh37) using Burrows–Wheeler alignment (BWA)26 and somatic substitutions were identified using Sidrón, a probabilistic binomial model that uses genotyping data to calibrate sequencing error per sample. Functional analyses of the identified mutations were performed using cryopreserved primary tumour cells. For gene expression analysis, RNA was purified from tumour cells and analysed using the HU133 plus 2.0 GeneChip (Affymetrix). For immunoprecipitation and western blotting, CLL cell extracts were prepared and proteins were detected using the indicated antibodies (Supplementary Information). For Toll-like receptor stimulation of CLL cells, the Human TLR1–9 agonist kit (InvivoGen) was used.
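
The general idea behind a binomial model for calling somatic substitutions can be sketched as follows. This is not the Sidrón implementation, whose details are given in the Supplementary Information: the function names, the fixed per-base error rate and both thresholds are assumptions chosen only to illustrate the principle that a variant should be unlikely to arise from sequencing error in the tumour while remaining unsupported in the matched normal sample.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def looks_somatic(
    tumour_alt: int,
    tumour_depth: int,
    normal_alt: int,
    normal_depth: int,
    error_rate: float = 0.01,  # assumed per-base error rate
    alpha_tumour: float = 1e-6,  # assumed threshold for the tumour
    alpha_normal: float = 0.05,  # assumed threshold for the normal
) -> bool:
    """Flag a site whose variant reads exceed error expectations in tumour only."""
    p_tumour = binomial_tail(tumour_alt, tumour_depth, error_rate)
    p_normal = binomial_tail(normal_alt, normal_depth, error_rate)
    return p_tumour < alpha_tumour and p_normal >= alpha_normal
```

For example, 12 variant reads out of 30 in the tumour with none in the normal sample would be flagged, whereas a site with similar variant support in both samples would be treated as germline. In the study, the error rate is calibrated per sample from genotyping data rather than fixed as it is here.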