Follicular lymphoma (FL) is the second most common form of non-Hodgkin lymphoma (NHL) and often follows an indolent disease course with slow progression.1, 2 However, patients may develop resistant disease; in addition, transformation to a more aggressive subtype of lymphoma occurs at a rate of 2–3% per year.3, 4, 5, 6 FL arises from germinal center B cells, and the most common molecular defect is t(14;18)(q32;q21), found in 85–90% of cases, which results in the IGH-BCL2 fusion gene and overexpression of the anti-apoptotic oncogene BCL2.7 However, t(14;18) is also observed in healthy individuals and in FL patients who are in long-term remission,2 suggesting the existence of additional genomic alterations that impact disease course. Recently, massively parallel exome or whole genome sequencing of FL tumors and follow-up validation studies have discovered that point mutations in genes involved in epigenetic regulation and chromatin modification, including MLL2, EZH2, CREBBP, EP300 and MEF2B, dominate the FL landscape.8, 9, 10, 11, 12 Mutations in JAK-STAT pathway genes and B-cell receptor (BCR)/NF-κB signaling genes have also been identified, but at lower frequencies.11

Whole genome and exome sequencing studies to date have primarily focused on identifying small site mutations that are recurrent in FL tumorigenesis8, 9 or involved in tumor clonal evolution.11, 13 A comprehensive genomic and transcriptomic survey of multiple mutation types of different sizes in FL patients with detailed clinical annotations and long-term follow-up has not been accomplished. To gain insight into genetic biomarkers that may predict clinical features, we performed exome and whole genome mate-pair sequencing of fresh frozen tumor and paired peripheral blood DNAs, and transcriptome sequencing of tumor RNAs, from eight FL patients. This comprehensive genomic and transcriptomic profiling enables the detection of not only small site mutations (single nucleotide variants or SNVs; and small insertions and deletions or INDELs), but also large structural mutations such as copy number variants (CNVs), translocations and inversions in FL tumors. The patients sequenced were either below the median age of FL onset (n=7, median 54.5 years old) or had a family history of FL (n=1) (Table 1). All samples were obtained at each patient’s initial diagnosis. These patients were clinically diverse, and included patients who had Grade 1 or 2 disease (n=4), classified as ‘indolent’; and patients with Grade 3a disease (n=2) or who subsequently had pathologic transformation (n=2), classified as ‘aggressive’ (Table 1). An overview of all SNVs, small INDELs, large structural variants (SVs) and CNVs in individual patients is shown in Figures 1a–c, which highlight the genetic heterogeneity of these eight newly diagnosed FL tumors. Our analysis of SNVs and INDELs revealed mutations in previously reported genes including MLL2, CREBBP, TNFRSF14 and histone cluster genes (HIST1H2AM and HIST1H2BD) (Figure 1c and Supplementary File S1), although the mutation frequencies of some genes were lower in our samples than previously reported.
For example, MLL2 and CREBBP were reported to be mutated in >80% and >60% of FL cases, respectively,9, 11, 12 whereas we detected MLL2 mutations in three out of eight cases (38%) and CREBBP mutations in two out of eight cases (25%). In addition to the previously reported genes, we also identified novel recurrent mutations in cysteine-rich PAK1 inhibitor (CRIPAK) in two out of eight tumors. In a secondary analysis performed by Sanger sequencing in a second cohort of FL and diffuse large B-cell lymphoma (DLBCL) tumors, we identified CRIPAK mutations in 55% of FL (11 out of 20) and 38.7% of DLBCL (12 out of 31) tumors (Supplementary File S6, Figure A). Bioinformatics analysis showed that the coding region of CRIPAK is highly enriched for the post-SET protein functional domain, which is found in histone lysine methyltransferase genes, including MLL2 and EZH2, that are known to be important in lymphomagenesis. Interestingly, CRIPAK is part of the same regulatory network of 31 genes that includes previously identified lymphoma genes such as MLL2, EZH2, CREBBP, EP300, TP53, MYC, STAT6 and BCL2, according to the shortest path algorithm in MetaCore (Philadelphia, Pennsylvania) (Supplementary File S4, Supplementary File S6 Figure A). The connectivity between the 31 genes in this network is highly significant: none of 10 000 simulations, each calculating the average pair-wise shortest distance among proteins chosen at random from the protein–protein interaction database, produced a set as tightly connected as the observed network, giving an empirical P-value of 0 (that is, P<0.0001) (Supplementary File S6 Figure A).
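The simulation-based significance test described above can be sketched in a few lines. The following is a minimal illustration, not the MetaCore procedure itself: the toy network, the node labels (g1 to g4, n0 to n19), the function names and the penalty assigned to unreachable pairs are all assumptions made for this example. It computes the average pair-wise shortest distance for a gene set, repeats the calculation for randomly drawn sets of the same size, and reports the fraction of random sets that are at least as tightly connected. It also returns the commonly used (hits + 1)/(n + 1) estimate, which avoids reporting an empirical P-value of exactly 0.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_length(graph, source, target):
    """Breadth-first search hop count between two nodes; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def mean_pairwise_distance(graph, nodes, unreachable_penalty):
    """Average shortest distance over all unordered pairs in `nodes`."""
    distances = []
    for a, b in combinations(nodes, 2):
        d = shortest_path_length(graph, a, b)
        # Assumption: unreachable pairs get a fixed large penalty.
        distances.append(unreachable_penalty if d is None else d)
    return sum(distances) / len(distances)


def empirical_p_value(graph, gene_set, n_sim=10_000, seed=0):
    """Compare the observed mean distance with random sets of equal size."""
    rng = random.Random(seed)
    all_nodes = sorted(graph)
    penalty = len(all_nodes)
    observed = mean_pairwise_distance(graph, gene_set, penalty)
    hits = 0
    for _ in range(n_sim):
        sample = rng.sample(all_nodes, len(gene_set))
        if mean_pairwise_distance(graph, sample, penalty) <= observed:
            hits += 1
    # Raw fraction, plus the (hits + 1) / (n_sim + 1) estimate that is never 0.
    return observed, hits / n_sim, (hits + 1) / (n_sim + 1)


def toy_network():
    """Hypothetical network: four fully connected nodes plus a 20-node chain."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    core = ["g1", "g2", "g3", "g4"]
    for a, b in combinations(core, 2):
        link(a, b)
    chain = ["g4"] + [f"n{i}" for i in range(20)]
    for a, b in zip(chain, chain[1:]):
        link(a, b)
    return graph, core


if __name__ == "__main__":
    network, genes = toy_network()
    obs, p_raw, p_adj = empirical_p_value(network, genes, n_sim=1000, seed=1)
    print(f"observed mean distance = {obs:.2f}")
    print(f"raw empirical P = {p_raw:.4f}, adjusted P = {p_adj:.4f}")
```

With 10 000 simulations and no random set at least as tightly connected as the observed one, the raw fraction is 0, which is more informatively stated as P<0.0001.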

Table 1 Patient clinical characteristics
Figure 1

The mutation landscape of eight FL tumors. (a) Circos plots of tumor-specific structural variants (SVs) identified from whole genome mate-pair sequencing data and somatic copy number variants (CNVs) detected in the exome sequencing data. The case number of each patient is indicated in the center of each Circos plot. SVs are displayed as blue (with inversion) or green (without inversion) lines connecting the two break points within the same chromosome or across two different chromosomes. The thickness of each line correlates with the number of discordant read pairs supporting the SV. The common t(14;18) translocations are labelled. The two outer rings of the Circos plots display the CNVs observed in mate-pair and exome sequencing data, respectively. The green outward dots and lines are amplifications and the red inward dots and lines are deletions. Note that only the CNVs supported by both mate-pair and exome sequencing data are discussed in this manuscript. (b) Number of genes per tumor located within somatic CNV regions from the exome sequencing data (upper panel); number of genes with somatic point mutations per tumor (including SNVs and small INDELs) (middle panel); and number of genes involved in large SVs per tumor (lower panel). (c) SNV and small INDEL mutation status of previously reported genes and CRIPAK; the observed SVs; and several clinical factors of the eight FL tumors.

Recurrent structural variants identified in the FL tumors included the well-known IGH-BCL2 translocation, t(14;18), in six out of eight cases (75%) and chromosome 1q amplifications in four out of eight tumors (50%) (Figure 1a, Supplementary Files S3 and S7). Other nonrecurrent large CNVs involving entire chromosomes or chromosome arms, as well as other inter- and intra-chromosomal structural variants, were also detected in individual tumors (Figure 1a, Supplementary Files S3 and S7). In addition, we identified six fusion transcripts from the transcriptome sequencing data in three out of eight cases (38%) (Supplementary File S5, Supplementary File S6 Figure B). Copy number gains in 1q21-q31 have been reported in ~50% of FL14 and ~20% of DLBCL15 using non-sequencing-based technologies including array comparative genomic hybridization and fluorescence in situ hybridization. The functional significance of the CNV gains in this region is still unknown. However, the prominent amplification of the region may indicate a potential role for this variant in lymphomagenesis.

Although our sample size was small, we found that SNVs and INDELs in MLL2, CREBBP, TNFRSF14, CRIPAK and histone cluster genes, as well as t(14;18), did not distinguish the indolent tumors from the aggressive tumors. However, gains in chr1q (4/4, 100%) and the presence of RNA fusion transcripts (3/4, 75%) were specific to aggressive tumors and were not detected in any of the indolent tumors. In addition, we found that the Grade 3a tumors and the subsequently transformed tumors had higher numbers of genes with point mutations (SNVs and small INDELs) (40±7.6 vs 26±2.6; aggressive vs indolent), higher numbers of genes impacted by copy number alterations (1060±263.7 vs 233±133.9; aggressive vs indolent) and higher numbers of large SVs (24±10 vs 6±1.6; aggressive vs indolent) (Figure 1b). Taken together, our comprehensive analysis of eight FL tumors reveals genetic diversity among newly diagnosed FL patients, identifies novel and recurrent mutations in CRIPAK, and suggests that high tumor complexity and DNA instability may be indicators of more aggressive disease.
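To make the caveat about sample size concrete, the reported group counts can be placed in 2×2 tables and examined with a two-sided Fisher's exact test. This is an illustrative sketch only and is not an analysis reported by the authors; the function name and the table layout are assumptions made for this example. The counts are those given above: chr1q gains in 4 of 4 aggressive and 0 of 4 indolent tumors, and fusion transcripts in 3 of 4 aggressive and 0 of 4 indolent tumors.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: aggressive, indolent. Columns: feature present, feature absent.
    p_chr1q = fisher_exact_two_sided(4, 0, 0, 4)
    p_fusion = fisher_exact_two_sided(3, 1, 0, 4)
    print(f"chr1q gain:         P = {p_chr1q:.3f}")
    print(f"fusion transcripts: P = {p_fusion:.3f}")
```

With only four tumors per group, even a complete separation between groups is the most extreme table possible, so the smallest attainable two-sided P-value is 2/70; associations of this kind are best treated as hypotheses for validation in larger cohorts.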