Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.
Non-Hodgkin lymphomas (NHLs) are cancers of B, T or natural killer lymphocytes. The two most common types of NHL, follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL), together comprise 60% of new B-cell NHL diagnoses each year in North America1. FL is an indolent and typically incurable disease characterized by clinical and genetic heterogeneity. DLBCL is aggressive and likewise heterogeneous, comprising at least two distinct subtypes that respond differently to standard treatments. Both FL and the germinal centre B-cell (GCB) cell of origin (COO) subtype of DLBCL derive from germinal centre B cells, whereas the activated B-cell (ABC) variety, which has a more aggressive clinical course, is thought to originate from B cells that have exited, or are poised to exit, the germinal centre2. Current knowledge of the specific genetic events leading to DLBCL and FL is limited to the presence of a few recurrent genetic abnormalities2. For example, 85–90% of FL and 30–40% of GCB DLBCL cases3,4 harbour t(14;18)(q32;q21), which results in deregulated expression of the BCL2 oncoprotein. Other genetic abnormalities unique to GCB DLBCL include amplification of the c-REL gene and of the miR-17-92 microRNA cluster5. In contrast to GCB cases, 24% of ABC DLBCLs harbour structural alterations or inactivating mutations affecting PRDM1, which is involved in differentiation of GCB cells into antibody-secreting plasma cells6. ABC-specific mutations also affect genes regulating NF-κB signalling7,8,9, with TNFAIP3 (also known as A20) and MYD88 (ref. 10) the most abundantly mutated in 24% and 39% of cases, respectively. To enhance our understanding of the genetic architecture of B-cell NHL, we undertook a study to (1) identify somatic mutations and (2) determine the prevalence, expression and focal recurrence of mutations in FL and DLBCL. 
Using strategies and techniques applied to cancer genome and transcriptome characterization by ourselves and others11,12,13, we sequenced tumour DNA and/or RNA from 117 tumour samples and 10 cell lines (Supplementary Tables 1 and 2) and identified 651 genes (Supplementary Figure 1) with evidence of somatic mutation in B-cell NHL. After validation, we showed that 109 genes were somatically mutated in two or more NHL cases. We further characterized the frequency and nature of mutations within MLL2 and MEF2B, which were among the most frequently mutated genes with no previously known role in lymphoma.
Identification of recurrently mutated genes
We sequenced the genomes or exomes of 14 NHL cases, all with matched constitutional DNA sequenced to comparable depths (Supplementary Tables 1 and 2). After screening for single nucleotide variants followed by subtraction of known polymorphisms and visual inspection of the sequence read alignments, we identified 717 non-synonymous variants (coding single nucleotide variants; cSNVs) affecting 651 genes (Supplementary Figure 1 and Supplementary Methods). We identified between 20 and 135 cSNVs in each of these genomes. Only 25 of the 651 genes with cSNVs were represented in the cancer gene census (December 2010 release)14.
We performed RNA sequencing (RNA-seq) on these 14 NHL cases and an expanded set of 113 samples comprising 83 DLBCL, 12 FL and 8 B-cell NHL cases with other histologies and 10 DLBCL-derived cell lines (Supplementary Table 2). We analysed these data to identify novel fusion transcripts (Supplementary Table 3) and cSNVs (Fig. 1). We identified 240 genes with at least one cSNV in a genome/exome or an RNA-seq ‘mutation hot spot’ (see later), and with cSNVs in at least three cases in total (Supplementary Table 4). We selected cSNVs from each of these 240 genes for re-sequencing to confirm their somatic status. We did not re-sequence genes with previously documented mutations in lymphoma (for example, CD79B, BCL2). We confirmed the somatic status of 543 cSNVs in 317 genes, with 109 genes having at least two confirmed somatic mutations (Supplementary Tables 4 and 5). Of the successfully re-sequenced cSNVs predicted from the genomes, 171 (94.5%) were confirmed somatic, 7 were false calls and 3 were present in the germ line. These 109 recurrently mutated genes were significantly enriched for genes implicated in lymphocyte activation (P = 8.3 × 10−4; for example, STAT6, BCL10), lymphocyte differentiation (P = 3.5 × 10−3; for example, CARD11), and regulation of apoptosis (P = 1.9 × 10−3; for example, BTG1, BTG2). Also significantly enriched were genes linked to transcriptional regulation (P = 5.4 × 10−4; for example, TP53) and genes involved in methylation (P = 2.2 × 10−4) and acetylation (P = 1.2 × 10−2), including histone methyltransferase (HMT) and acetyltransferase (HAT) enzymes known previously to be mutated in lymphoma (for example, EZH2 (ref. 13) and CREBBP (ref. 15); Supplementary Methods).
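The recurrence criterion used above (a gene counts as recurrently mutated when confirmed somatic mutations are seen in two or more cases) amounts to counting distinct cases per gene. The following is a minimal sketch under that assumption; the function name, the input layout and the toy calls are illustrative and are not the study's pipeline or data. The last lines reproduce the validation rate quoted in the text from its three counts.

```python
from collections import defaultdict

def recurrently_mutated(calls, min_cases=2):
    """Return {gene: number of distinct cases} for genes mutated in >= min_cases cases.

    `calls` is an iterable of (case_id, gene) pairs, one per confirmed somatic cSNV.
    """
    cases_by_gene = defaultdict(set)
    for case_id, gene in calls:
        # A set is used so that several mutations in one case count once.
        cases_by_gene[gene].add(case_id)
    return {g: len(c) for g, c in cases_by_gene.items() if len(c) >= min_cases}

# Toy, made-up calls (hypothetical gene and case names).
toy_calls = [
    ("case1", "GENE_A"), ("case2", "GENE_A"), ("case2", "GENE_A"),
    ("case1", "GENE_B"),
    ("case3", "GENE_C"), ("case4", "GENE_C"), ("case5", "GENE_C"),
]
recurrent = recurrently_mutated(toy_calls)

# Validation rate for genome-predicted cSNVs, using the counts given in the text.
confirmed, false_calls, germline = 171, 7, 3
validation_rate = confirmed / (confirmed + false_calls + germline)
```

Raising `min_cases` to 3 would mirror the stricter threshold used when selecting the 240 candidate genes for re-sequencing.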
Mutation hot spots can result from mutations at sites under strong selective pressure and we have previously identified such sites using RNA-seq data13. We searched our RNA-seq data for genes with mutation hot spots, and identified 10 genes that were not mutated in the 14 genomes (PIM1, FOXO1, CCND3, TP53, IRF4, BTG2, CD79B, BCL7A, IKZF3 and B2M), of which five (FOXO1, CCND3, BTG2, IKZF3 and B2M) were not previously known targets of point mutation in NHL (Supplementary Table 6 and Supplementary Methods). FOXO1, BCL7A and B2M had hot spots affecting their start codons. The effect of a FOXO1 start codon mutation, which was observed in three cases, was further studied using a cell line in which the initiating ATG was mutated to TTG. Western blots probed with a FOXO1 antibody revealed a band with a reduced molecular weight, indicative of a FOXO1 amino-terminal truncation (Supplementary Figure 2), consistent with use of the next in-frame ATG for translation initiation. A second hot spot in FOXO1 at T24 was mutated in two cases. T24 is reportedly phosphorylated by AKT subsequent to B-cell receptor (BCR) stimulation16 inducing FOXO1 nuclear export.
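The interpretation of the FOXO1 start codon mutation rests on translation initiating at the next in-frame ATG. A minimal sketch of that reasoning follows. It assumes, as a simplification, that the first in-frame ATG is used and ignores sequence context around the codon; the sequence is made up and is not the FOXO1 coding sequence.

```python
def first_in_frame_atg(cds, start=0):
    """Return the index of the first in-frame ATG at or after `start`, or None."""
    for i in range(start, len(cds) - 2, 3):
        if cds[i:i + 3] == "ATG":
            return i
    return None

# Made-up coding sequence (NOT FOXO1): ATG GCC GAA ATG AAG CTT TAA
wild_type = "ATGGCCGAAATGAAGCTTTAA"
mutant = "T" + wild_type[1:]  # initiating ATG mutated to TTG

wt_start = first_in_frame_atg(wild_type)
mut_start = first_in_frame_atg(mutant)

# Number of amino-terminal codons lost if translation starts at the next ATG.
lost_codons = (mut_start - wt_start) // 3
```

In this toy case the mutant would lose three amino-terminal residues, which is the kind of truncation that would appear as a lower molecular weight band on a western blot.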
We analysed the RNA-seq data to determine whether any of the somatic mutations in the 109 recurrently mutated genes showed evidence for allelic imbalance with expression favouring one allele. Out of 380 expressed heterozygous mutant alleles, we observed preferential expression of the mutant allele for 16.8% (64/380) and preferential expression of the wild type for 27.8% (106/380; Supplementary Table 7). Seven genes showed evidence for significant preferential expression of the mutant allele in at least two cases: BCL2, CARD11, CD79B, EZH2, IRF4, MEF2B and TP53 (Supplementary Methods). In 27 out of 43 cases with BCL2 cSNVs, expression favoured the mutant allele, consistent with the previously described hypothesis that the translocated (and hence transcriptionally deregulated) allele of BCL2 is targeted by somatic hypermutation17. Examples of mutations at known oncogenic hot spot sites such as F123I in CARD11 (ref. 18) showed allelic imbalance favouring the mutant allele in some cases. Similarly, we noted expression favouring two novel hot spot mutations in MEF2B (Y69 and D83) and two sites in EZH2 not previously reported as mutated in lymphoma (A682G and A692V).
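Allelic imbalance at a heterozygous site can be assessed by asking whether the RNA-seq read counts for the two alleles depart from the 1:1 ratio expected under balanced expression. The sketch below uses an exact two-sided binomial test for this purpose. This is an assumption made for illustration: the test actually used is described in the Supplementary Methods, and the read counts here are hypothetical.

```python
from math import comb

def binomial_two_sided_p(mutant_reads, total_reads, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely than the
    observed count of mutant-allele reads.
    """
    def pmf(k):
        return comb(total_reads, k) * p**k * (1 - p) ** (total_reads - k)

    observed = pmf(mutant_reads)
    tail = sum(
        pmf(k)
        for k in range(total_reads + 1)
        if pmf(k) <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, tail)

# Hypothetical read counts at a heterozygous mutation.
p_balanced = binomial_two_sided_p(10, 20)  # both alleles equally represented
p_skewed = binomial_two_sided_p(18, 20)    # reads favour the mutant allele
```

A small p-value together with a mutant-read fraction above 0.5 would indicate expression favouring the mutant allele; a fraction below 0.5 would indicate expression favouring the wild type.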
We sought to distinguish new cancer-related mutations from passenger mutations using the approach proposed previously19. We reasoned that this would reveal genes with strong selection signatures, and mutations in such genes would be good candidate cancer drivers. We identified 26 genes with significant evidence for positive selection (false discovery rate = 0.03, Supplementary Methods), with either selective pressure for acquiring non-synonymous point mutations or truncating/nonsense mutations (Supplementary Methods; Table 1 and Supplementary Table 8). Included were known lymphoma oncogenes (BCL2, CD79B (ref. 9), CARD11 (ref. 18), MYD88 (ref. 10) and EZH2 (ref. 13)), all of which showed signatures indicative of selection for non-synonymous variants.
Evidence for selection of inactivating changes
We expected tumour suppressor genes to show strong selection for the acquisition of nonsense mutations. In our analysis, the eight most significant genes included seven with strong selective pressure for nonsense mutations, including the known tumour suppressor genes TP53 and TNFRSF14 (ref. 20; Table 1). CREBBP, recently reported as commonly inactivated in DLBCL15, also showed some evidence for acquisition of nonsense mutations and cSNVs (Supplementary Figure 3 and Supplementary Table 9). We also observed enrichment for nonsense mutations in BCL10, a positive regulator of NF-κB, in which oncogenic truncated products have been described in lymphomas21. The remaining strongly significant genes (BTG1, GNA13, SGK1 and MLL2) had no reported role in lymphoma. GNA13 was affected by mutations in 22 cases including multiple nonsense mutations. GNA13 encodes the alpha subunit of a heterotrimeric G protein that couples to G-protein-coupled receptors and modulates RhoA activity22. Substitutions at some of the mutated residues have been reported to impair its function23,24, including a T203A mutation, which also showed allelic imbalance favouring the mutant allele (Supplementary Table 7). GNA13 protein was reduced or absent on western blots in cell lines harbouring either a nonsense mutation, a stop codon deletion, a frameshifting deletion, or changes affecting splice sites (Supplementary Methods and Supplementary Figure 4).
SGK1 encodes a phosphatidylinositol-3-OH kinase (PI(3)K)-regulated kinase with functions including regulation of FOXO transcription factors25, regulation of NF-κB by phosphorylating IκB kinase26, and negative regulation of NOTCH signalling27. SGK1 also resides within a region of chromosome 6 commonly deleted in DLBCL (Fig. 1)5. The mechanism by which SGK1 and GNA13 inactivation may contribute to lymphoma is unclear, but the strong degree of apparent selection towards their inactivation and their overall high mutation frequency (each mutated in 18 of 106 DLBCL cases) suggests that their loss contributes to B-cell NHL. Certain genes are known to be mutated more commonly in GCB DLBCLs (for example, TP53 (ref. 28) and EZH2 (ref. 13)). Here, both SGK1 and GNA13 mutations were found only in GCB cases (P = 1.93 × 10−3 and 2.28 × 10−4, Fisher’s exact test; n = 15 and 18, respectively) (Fig. 2). Two additional genes (MEF2B and TNFRSF14) with no previously described role in DLBCL showed a similar restriction to GCB cases (Fig. 2).
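The restriction of mutations to GCB cases is tested with Fisher's exact test on a 2 x 2 table of mutation status against subtype. A minimal sketch of the two-sided test follows, written with the standard library only. The counts in the example are hypothetical, because the per-subtype totals behind the reported P values are not given in this text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: mutated / unmutated. Columns: GCB / ABC.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(
        prob(x)
        for x in range(lo, hi + 1)
        if prob(x) <= p_obs * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, total)

# Hypothetical cohort: 8 mutated cases, all GCB; 12 unmutated GCB; 20 unmutated ABC.
p_restricted = fisher_exact_two_sided(8, 0, 12, 20)

# Hypothetical cohort with no association between mutation and subtype.
p_none = fisher_exact_two_sided(5, 5, 5, 5)
```

With these made-up counts the first table gives a p-value of roughly 0.003, whereas the balanced table gives 1.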
Inactivating MLL2 mutations
MLL2 showed the most significant evidence for selection and the largest number of nonsense SNVs. Our RNA-seq analysis indicated that 26.0% (33/127) of cases carried at least one MLL2 cSNV. To address the possibility that variable RNA-seq coverage of MLL2 failed to capture some mutations, we PCR-amplified the entire MLL2 locus (∼36 kilobases) in 89 cases (35 primary FLs, 17 DLBCL cell lines, and 37 DLBCLs). Of these cases 58 were among the RNA-seq cohort. Illumina amplicon re-sequencing (Supplementary Methods) revealed 78 mutations, confirming the RNA-seq mutations in the overlapping cases and identifying 33 additional mutations. We confirmed the somatic status of 46 variants using Sanger sequencing (Supplementary Table 10), and showed that 20 of the 33 additional mutations were insertions or deletions (indels). Three SNVs at splice sites were also detected, as were 10 new cSNVs that had not been detected by RNA-seq.
The somatic mutations were distributed across MLL2 (Fig. 3a). Of these, 37% (n = 29/78) were nonsense mutations, 46% (n = 36/78) were indels that altered the reading frame, 8% (n = 6/78) were point mutations at splice sites and 9% (n = 7/78) were non-synonymous amino acid substitutions (Table 2). Four of the somatic splice site mutations had effects on MLL2 transcript length and structure. For example, two heterozygous splice site mutations resulted in the use of a novel splice donor site and an intron retention event.
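The percentages above, and the 91% of MLL2 mutations described elsewhere in the text as inactivating, can be recomputed from the four class counts. The sketch below does this arithmetic. Treating nonsense, frameshift and splice site mutations together as the inactivating set is our reading of how the 91% figure arises, not a definition stated in the text.

```python
# Counts of MLL2 somatic mutations by class, as given in the text (n = 78).
counts = {
    "nonsense": 29,
    "frameshift_indel": 36,
    "splice_site": 6,
    "amino_acid_substitution": 7,
}
total = sum(counts.values())

# Percentage of all mutations in each class, rounded to the nearest integer.
percent = {name: round(100 * n / total) for name, n in counts.items()}

# Assumption: every class except amino acid substitutions is counted as inactivating.
inactivating = total - counts["amino_acid_substitution"]
percent_inactivating = round(100 * inactivating / total)
```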
Approximately half of the NHL cases we sequenced had two MLL2 mutations (Supplementary Table 10). We used bacterial artificial chromosome (BAC) clone sequencing in eight FL cases to show that in all eight cases the mutations were in trans, affecting both MLL2 alleles. This observation is consistent with the notion that there is a complete, or near-complete, loss of MLL2 in the tumour cells of such patients.
With the exception of two primary FL cases and two DLBCL cell lines (Pfeiffer and SU-DHL-9), the majority of MLL2 mutations seemed to be heterozygous. Analysis of Affymetrix 500k SNP array data from two FL cases with apparent homozygous mutations revealed that both tumours showed copy number neutral loss of heterozygosity (LOH) for the region of chromosome 12 containing MLL2 (Supplementary Methods). Thus, in addition to bi-allelic mutation, LOH is a second, albeit less common, mechanism by which MLL2 function is lost.
MLL2 was the most frequently mutated gene in FL, and among the most frequently mutated genes in DLBCL (Fig. 2). We confirmed MLL2 mutations in 31 of 35 FL patients (89%), in 12 of 37 DLBCL patients (32%), in 10 of 17 DLBCL cell lines (59%) and in none of the eight normal centroblast samples we sequenced. Our analysis predicted that the majority of the somatic mutations observed in MLL2 were inactivating (91% disrupted the reading frame or were truncating point mutations), indicating to us that MLL2 is a tumour suppressor of significance in NHL.
Recurrent point mutations in MEF2B
Our selective pressure analysis also revealed genes with stronger pressure for acquisition of amino acid substitutions than for nonsense mutations. One such gene was MEF2B, which had not previously been linked to lymphoma. We found that 20 (15.7%) cases had MEF2B cSNVs and 4 (3.1%) cases had MEF2C cSNVs. All cSNVs detected by RNA-seq affected either the MADS box or MEF2 domains. To determine the frequency and scope of MEF2B mutations, we Sanger-sequenced exons 2 and 3 in 261 primary FL samples; 259 DLBCL primary tumours; 17 cell lines; 35 cases of assorted NHL (IBL, composite FL and PBMCL); and eight non-malignant centroblast samples. We also used a capture strategy (Supplementary Methods) to sequence the entire MEF2B coding region in the 261 FL samples, revealing six additional variants outside exons 2 and 3. We thus identified 69 cases (34 DLBCL, 12.67%; and 35 FL, 15.33%) with MEF2B cSNVs or indels, failing to observe novel variants in other NHL and non-malignant samples. Of the variants 55 (80%) affected residues within the MADS box and MEF2 domains encoded by exons 2 and 3 (Supplementary Table 11; Fig. 3b). Each patient generally had a single MEF2B variant and we observed relatively few (eight in total, 10.7%) truncation-inducing SNVs or indels. Non-synonymous SNVs were by far the most common type of change observed, with 59.4% of detected variants affecting K4, Y69, N81 or D83. In 12 cases MEF2B mutations were shown to be somatic, including representative mutations at each of K4, Y69, N81 and D83 (Supplementary Table 12). We did not detect mutations in ABC cases, indicating that somatic mutations in MEF2B have a role unique to the development of GCB DLBCL and FL (Fig. 2).
In our study of genome, transcriptome and exome sequences from 127 B-cell NHL cases, we identified 109 genes with clear evidence of somatic mutation in multiple individuals. Significant selection seems to act on at least 26 of these for the acquisition of either nonsense or missense mutations. To the best of our knowledge, the majority of these genes had not previously been associated with any cancer type. We observed an enrichment of somatic mutations affecting genes involved in transcriptional regulation and, more specifically, chromatin modification.
MLL2 emerged from our analysis as a major tumour suppressor locus in NHL. It is one of six human H3K4-specific methyltransferases29, all of which share homology with the Drosophila trithorax gene. Trimethylated H3K4 (H3K4me3) is an epigenetic mark associated with the promoters of actively transcribed genes. By laying down this mark, MLLs are responsible for the transcriptional regulation of developmental genes including the homeobox (Hox) gene family30 which collectively control segment specificity and cell fate in the developing embryo31,32. Each MLL family member is thought to target different subsets of Hox genes33 and in addition, MLL2 is known to regulate the transcription of a diverse set of genes34. Recently, MLL2 mutations were reported in a small-cell lung cancer cell line35 and in renal carcinoma36, but the frequency of nonsense mutations affecting MLL2 in these cancers was not established in these reports. Inactivating mutations were reported recently in MLL2 or MLL3 in 16% of medulloblastoma patients37, further implicating MLL2 as a cancer gene.
Our data link MLL2 somatic mutations to B-cell NHL. The reported mutations are likely to be inactivating and in eight of the cases with multiple mutations, we confirmed that both alleles were affected, presumably resulting in essentially complete loss of MLL2 function. The high prevalence of MLL2 mutations in FL (89%) equals the frequency of the t(14;18)(q32;q21) translocation, which is considered the most prevalent genetic abnormality in FL3. In DLBCL tumour samples and cell lines, MLL2 mutation frequencies were 32% and 59%, respectively, also exceeding the prevalence of the most frequent cytogenetic abnormalities, such as the various translocations involving 3q27, which occur in 25–30% of DLBCLs and are enriched in ABC cases38. Importantly, we found MLL2 mutated in both DLBCL subtypes (Fig. 2). Our analyses thus indicate that MLL2 acts as a central tumour suppressor in FL and both DLBCL subtypes.
The MEF2 gene family encodes four related transcription factors that recruit histone-modifying enzymes including histone deacetylases (HDACs) and HATs in a calcium-regulated manner. Although truncating variants were detected in our analysis of MEF2 gene family members, our analysis suggests that, in contrast to MLL2, MEF2 family members tend to selectively acquire non-synonymous amino acid substitutions. In the case of MEF2B, 59.4% of all the cSNVs were found at four sites within the protein (K4, Y69, N81 and D83), and all four of these sites were confirmed to be targets of somatic mutation. D83 is affected in 39% of the MEF2B alterations, resulting in replacement of the charged aspartate with any of alanine, glycine or valine. Although we cannot yet predict the consequences of these substitutions on protein function, it seems likely that their effect would have an impact on the ability of MEF2B to facilitate gene expression and thus have a role in promoting the malignant transformation of germinal centre B cells to lymphoma (Supplementary Discussion).
MEF2B mutations can be linked to CREBBP and EP300 mutations, and to recurrent Y641 mutations in EZH2 (ref. 13). One target of CREBBP/EP300 HAT activity is H3K27, which is methylated by EZH2 to repress transcription. There is evidence that the action of EZH2 antagonizes that of CREBBP/EP300 (ref. 39). One function of MEF2 is to recruit either HDACs or CREBBP/EP300 to target genes40, and it has been suggested that HDACs compete with CREBBP/EP300 for the same binding site on MEF2 (ref. 41). Under normal Ca2+ levels, MEF2 is bound by type IIa HDACs, which maintain the tails of histone proteins in a deacetylated repressive chromatin state42. Increased cytoplasmic Ca2+ levels induce the nuclear export of HDACs, enabling the recruitment of HATs such as CREBBP/EP300, facilitating transcription at MEF2 target genes. Mutation of CREBBP, EP300 or MEF2B may have an impact on the expression of MEF2 target genes owing to reduced acetylation of nucleosomes near these genes (Supplementary Figure 5; Supplementary Discussion). In light of the recent finding that heterozygous EZH2 Y641 mutations enhance overall H3K27 trimethylation activity of PRC2 (refs 43, 44), it is possible that mutation of both MLL2 and EZH2 could cooperate in reducing the expression of some of the same target genes. Our data indicate that (1) post-translational modification of histones is of key importance in germinal centre B cells and (2) deregulated histone modification due to these mutations is likely to result in reduced acetylation and enhanced methylation, and acts as a core driver event in the development of NHL (Supplementary Figure 5).
All samples analysed contained at least 50% tumour cells. Genomes, exomes and transcriptomes were sequenced using a combination of Illumina GAIIx and HiSeq 2000 instruments to read lengths of between 36 and 100 nucleotides. Exome capture was performed using the Agilent SureSelect Target Enrichment System Protocol (Version 1.0, September 2009). Alignment was accomplished using BWA45 and variants were identified using SNVmix46. Variants were manually reviewed in IGV and were confirmed (where applicable) by PCR followed by either Sanger sequencing or Illumina re-sequencing. Structural rearrangements in genomes and transcriptomes were identified using ABySS47. Gene expression values used for subtype assignment were calculated as reads per kilobase gene model per million mapped reads (RPKM) values48 and subtypes were assigned using an adaptation of the method developed for data from Affymetrix expression arrays49 trained with samples previously classified by this standard approach.
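The RPKM values used for subtype assignment normalise a gene's read count by both gene model length and sequencing depth. A minimal sketch of the standard definition follows; the function name and the example numbers are illustrative and are not taken from the study.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length in bp * mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads on a 2,000 bp gene model in a library
# with 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
```

Because both length and depth appear in the denominator, doubling either one while holding the read count fixed halves the RPKM value.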
Sequence Read Archive
The SRA accession number for the submission of the data not included in previous publications is SRP001599, which is linked to the dbGAP study accession phs000235.v2.p1.
This study was funded in part by the National Cancer Institute Office of Cancer Genomics (Contract No. HHSN261200800001E), the Terry Fox Foundation (grant 019001, Biology of Cancer: Insights from Genomic Analyses of Lymphoid Neoplasms) and Genome Canada/Genome British Columbia Grant Competition III (Project Title: High Resolution Analysis of Follicular Lymphoma Genomes) to J.M.C., R.D.G. and M.A.M. We acknowledge support from NIH grants P50CA130805-01 “SPORE in Lymphoma, Tissue Resource Core (PI Fisher)” and 1U01CA114778 “Molecular Signatures to Improve Diagnosis and Outcome in Lymphoma (PI Chan)”. A.J.M. is a Career Development Program Fellow of the Leukemia and Lymphoma Society. N.A.J. was a research fellow of the Terry Fox Foundation (award NCIC 019005) and the Michael Smith Foundation for Health Research (ST-PDF-01793). M.A.M. is a Terry Fox Young Investigator and a Michael Smith Senior Research Scholar. R.D.M. is a Vanier Scholar (CIHR) and holds a MSFHR senior graduate studentship. M.M.-L. acknowledges support from a Postdoctoral Fellowship from the Spanish Ministry of Education, under the “Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I+D+i 2008-2011”. D.W.S. was supported by the Terry Fox Foundation Strategic Health Research Training Program in Cancer Research at Canadian Institutes of Health Research (Grant No. TGT-53912). J.J.S. acknowledges funding from The Canadian Cancer Society and the Canadian Institutes of Health Research. R.G. is supported by a UBC Four Year Fellowship. I.M.M. acknowledges the Canadian Foundation for Innovation for a Leaders Opportunity Fund. The laboratory work for this study was undertaken at the Genome Sciences Centre, British Columbia Cancer Research Centre and the Centre for Translational and Applied Genomics, a program of the Provincial Health Services Authority Laboratories. The authors would like to thank C. Greenman for supplying his software and also acknowledge D. Gerhard and S. Aparicio for discussions and guidance. Special thanks to C. Suragh, R. Roscoe, A. Troussard and A. Drobnies for expert project management assistance, and to the Library Construction, Sequencing and Bioinformatics teams at the Genome Sciences Centre. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.
The table displays clinical details of the exome and genome cases.
The table displays sample and mutation overview for all cases.
The table displays recurrence of cSNVs and validation summary.
The table displays details of validated and putative cSNVs.
The table displays information on skewed expression.
The table displays information on selective pressure.
The table displays information on all confirmed MLL2 mutations.
The table displays information on cell line cSNVs.
The table displays primers used.