Alternative splicing discriminates molecular subtypes and has prognostic impact in diffuse large B-cell lymphoma

Effect of alternative splicing (AS) on diffuse large B-cell lymphoma (DLBCL) pathogenesis and survival has not been systematically addressed. Here, we compared differentially expressed genes and exons in association with survival after chemoimmunotherapy, and between germinal center B-cell like (GCB) and activated B-cell like (ABC) DLBCLs. Genome-wide exon array-based screen was performed from samples of 38 clinically high-risk patients who were treated in a Nordic phase II study with dose-dense chemoimmunotherapy and central nervous system prophylaxis. The exon expression profile separated the patients according to molecular subgroups and survival better than the gene expression profile. Pathway analyses revealed enrichment of AS genes in inflammation and adhesion-related processes, and in signal transduction, such as phosphatidylinositol signaling system and adenosine triphosphate binding cassette transporters. Altogether, 49% of AS-related exons were protein coding, and domain prediction showed 28% of such exons to include a functional domain, such as transmembrane helix domain or phosphorylation sites. Validation in an independent cohort of 92 DLBCL samples subjected to RNA-sequencing confirmed differential exon usage of selected genes and association of AS with molecular subtypes and survival. The results indicate that AS events are able to discriminate GCB and ABC DLBCLs and have prognostic impact in DLBCL.


INTRODUCTION
Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid neoplasm in adults, comprising 30-40% of all malignant lymphomas. It is an aggressive disease, where ∼ 60% of patients can be cured with a combination of rituximab and anthracyclinebased CHOP or CHOP-like chemoimmunotherapy. 1,2 However, responses to treatment are still largely unpredictable, and survival of the patients, who experience disease relapse or have primary refractory disease, is dismal. Thus, there is a growing need to further understand the molecular mechanisms underlying the disease that would not only provide information to predict the survival, but also enable the design of more targeted therapeutic strategies.
Based on gene expression profiling, three molecular subtypes showing germinal center B-cell (GCB), activated B-cell (ABC) and primary mediastinal B-cell lymphoma signatures have been identified. These subtypes differ in their genotypic, phenotypic and clinical features. [3][4][5][6] During the past few years, studies applying next-generation sequencing techniques have provided further insights into the heterogeneity and pathogenesis of DLBCL. [7][8][9] Especially, a number of genetic alterations have been shown to be characteristic of GCB or ABC subtypes. However, despite the rapidly growing number of genetic aberrations reported in DLBCL, association of these findings with treatment outcome remains to be shown.
Alternative splicing (AS) is a common regulatory mechanism generating multiple RNA transcripts from a single gene and allowing enormous functional diversity in protein isoforms. The vast majority of the human genes, evaluates ranging from 70 to 95%, are alternatively spliced. 10 Growing evidence suggests that AS is closely associated with cancer pathogenesis and progression. [11][12][13] Alternatively spliced signatures derived from gene expression profiling have been shown to be more reliable for diagnostic purposes than signatures derived from expression profiling, for example in prostate cancer. 14 Furthermore, isoformlevel expression profiles can discriminate cancer cell lines from noncancer cells better than gene-level expression profiles. 15 In DLBCL, the role of AS remains largely unexplored. Previous studies have mainly focused on AS isoforms of individual genes, such as FOXP1, where contradictory findings concerning its prognostic role can partly be explained by smaller and potentially oncogenic FOXP1 isoforms primarily expressed in ABC-type DLBCL. 16 Other previously reported genes with AS in DLBCL include a developmentally regulated B-lymphoid phosphatase, PTPRO, regulating G 0 /G 1 arrest 17 and a leukocyte homing and hyaluronidase receptor CD44. [18][19][20][21] In this study, we have evaluated genome-wide gene and exon expression profiles in DLBCL. Differential exon usage was found to be a common event in DLBCL. Splicing events affected pivotal genes involved in DLBCL signaling, and were able to discriminate the patients according to cell of origin (COO) and survival.

MATERIALS AND METHODS Patients
Prospectively collected discovery cohort consisted of 38 DLBCL patients o65 years old with clinically high-risk (age-adjusted International Prognostic Index Score 2-3) disease (Table 1). The patients were treated  in a Nordic phase II NLG-LBC-04 protocol with six courses of dose-dense  R-CHOEP-14 chemoimmunotherapy followed by systemic central nervous  system prophylaxis with high-dose methotrexate and high-dose cytarabine. 22 Further information on the patient cohort is provided in the Supplementary Materials and methods. The protocol and sampling were approved by Institutional Review Boards, National Medical Agencies and Ethics Committees in Finland, Denmark, Sweden and Norway, and the trial was registered at ClinicalTrials.gov, identifier number NCT01502982. All patients gave informed consent.

Samples and gene/exon expression analysis
Total RNA was extracted with Qiagen (Hilden, Germany) AllPrep DNA/RNA/ Protein Mini kit and examined using Affymetrix Human Exon 1.0 ST arrays according to the manufacturer's instructions (Affymetrix, Santa Clara, CA, USA). Hybridization protocols and raw expression microarray data are available at ArrayExpress archive (http://www.ebi.ac.uk/arrayexpress/experi ments/E-MEXP-3488). Gene-and exon-level expression data for the NLG-LBC-04 discovery cohort were quantified by MEAP (Multiple Exon Array Preprocessing) algorithm 23 with MEAP probe annotation version 70.
In the CGCI validation cohort, adaptors and low-quality bases were trimmed from the samples before alignment. The reads were aligned with TopHat 2.0.8b using the hg19 human reference genome with Ensembl gene annotation version 70. HTSeq was then used to obtain the exon-and gene-level read counts. Differential expression at exon and gene levels was estimated using DEXSeq (version 1.7.14) and DESeq2 (version 1.4.5), respectively (https://bioconductor.org/packages/3.5/bioc/).

Molecular subgroup prediction
Samples with exon array data were classified into GCB, ABC and nonclassified subgroups using the gene predictor from Lymphochip data as previously described. 24 More details are provided in the Supplementary Materials and methods.
Pathway and GO analysis KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway functional enrichment analyses were performed with the Pathway-Express (Intelligent Systems and Bioinformatics Laboratory, Detroit, MI, USA). Only enrichment results with false discovery rate o0.05 were considered significant. Gene annotation and gene ontology (GO) term enrichment for biological processes was performed with Gene Set Enrichment Analysis (GSEA) 25 for the 289 genes with differentially expressed exons (DEEs) log2 fold change 41.5. This resulted into altogether 38 separate ontology terms with false discovery rate o0.05. To reduce the number of GO terms, we regrouped the related ontologies into 9 larger groups.

Statistical analyses
Data were analyzed using IBM SPSS Statistics 22.0 (IBM, Armonk, NY, USA). P-values of o 0.05 were considered statistically significant and all P-values are two tailed. The χ 2 test was used to assess the differences in the frequency of the prognostic factors. Cox univariate and multivariate analyses were performed to study the prognostic value of the factors. Kaplan-Meier method was used to estimate survival rates and their differences were compared with log-rank test. Overall survival (OS) was determined from the date of study entry or diagnosis until the last followup or death from any cause. Disease-specific survival was calculated as a period between the registration date and the date of death due to lymphoma. Progression-free survival (PFS) was measured as the period between the date of registration or diagnosis and progression or death from any cause. OS, disease-specific survival and PFS were reported in months. A web-based cutoff finder tool at http://molpath.charite.de/cutoff analysis 26 was used to determine the most prognostic cutoff level for survival outcomes.

Exon-specific domain identification
DEEs were annotated by their genomic locations (5′ untranslated region (UTR), 3′ UTR, coding, noncoding and unknown). Domain analysis was done for all coding DEEs by translating coding exonic regions into peptide sequences using Ensembl API (version 70) and fetching domain information (Pfam, Smart, SignalP and TMHMM) for all peptide sequences with InterProScan (version 5). 27 Phosphorylation sites within peptide sequence of each exon were searched against all known phosphorylation motifs downloaded from the PhosphoSitePlus. 28

RESULTS
Clinical characteristics of the patients Baseline characteristics of the discovery cohort of 38 patients treated in the NLG-LBC-04 protocol 22 are presented in Table 1. Median age of the patients was 55 years (range 21-65 years). During the median follow-up time of 66 months, 9 patients had relapsed and a total of 9 died. Four of the deaths were not lymphoma related. The estimated 5-year OS, disease-specific survival and PFS rates were 76%, 82% and 74%, respectively. In the CGCI validation cohort of 92 patients, the median age was 61 years (range 17-75 years). During the follow-up time of 59 months, 24 patients had relapsed and 21 died. Of the deaths, three were not lymphoma related. The estimated 5-year OS, disease-specific survival and PFS rates were 79%, 80% and 74%, respectively. The main differences between the discovery and validation cohort was that the patients in the discovery cohort were younger and had clinically higher risk disease ( Table 1).

Identification of splicing events
Global mRNA and AS variations with prognostic impact were identified by comparing gene and exon expression profiles between the patients who experienced relapse (n = 9, poor prognosis group) or remained in long-term remission (424 months; 29 n = 29, good prognosis group) after chemoimmunotherapy ( Figure 1). No significant differences were observed in baseline characteristics between the groups (Table 2). Using the gene-level expression data, 220 differentially expressed genes (DEGs) between the groups were identified (Supplementary  Table S1). Of these, 59% were suppressed and 41% upregulated in patients with poor prognosis. The most upregulated genes included not only many nonprotein coding genes such as microRNAs and long noncoding RNAs but also MME, which is coding for CD10 and is included in the COO classification, 24 and TRAF4, which has been shown to downregulate the innate immunity signaling. 30 Innate immunity was also affected because of suppression of other genes in the pathway such as HLA-DQB1, and especially genes involved in T cell-mediated immune responses (FOS, IL2RB) or B-cell maturation (IL7 and KLF3). The pathways significantly enriched among the DEGs included immune responserelated processes, such as antigen processing and presentation, JAK/STAT (Janus kinase/signal transducer and activator of transcription) signaling, circadian rhythm and hematopoietic cell lineage (Table 3 and Supplementary Table S2).
To identify AS events between the good and poor prognosis groups, we applied similar criteria to exon-level expression data with additional filtering of the DEGs from the gene-level analysis ( Figure 1). As a result, 8785 unique exonic regions from 3888 genes with differential expression were found (Supplementary  Table S3). Among the genes with DEEs were genes with known alternative isoforms associated with cancer, including RUNX1, 31   TERT 32,33 and VEGFA, 34 as well as CD44 and PTPRO that have previously been shown to be alternatively spliced in DLBCL. 17,19,20,35 In the case of CD44, differential exon inclusion was seen particularly in the variable region of the gene (exons v2-v10), involved in variant isoforms of CD44 that have been shown to substantiate to poor prognosis in DLBCL in the prerituximab era. 18,21 In the PTPRO gene, the DEE was localized in the catalytic domain region, present also in the lymphoid-predominant truncated isoform, shown to modulate SYK phosphorylation and B-cell receptor activity. 36 Other genes with the most suppressed exon expression in the poor prognosis group included EPN1 and ARAP1, which are involved in endocytosis, and HLA-DQA1, CD4, EBI3, ICAM1 and UBA5, which are related to the immune system. Differential splicing was also observed in many splicing factors and regulators, such as NOVA1, TRA2B, SF3B1, ESRP1, RBFOX1 and CELF5, suggesting that the splicing mechanism might be affected through modulation of splicing factors themselves. In line with the pathways significantly enriched among the DEGs, the AS genes were significantly enriched for immune response-related pathways, such as antigen processing and presentation, and leukocyte transendothelial migration. In addition, the AS genes were enriched for the pathways related to adhesion, extracellular matrix interactions and signal transduction such as phosphatidylinositol signaling system and adenosine triphosphate (ATP)-binding cassette transporters (Table 3). Interestingly, when the patient groups with poor and favorable outcomes were clustered according to the DEGs and DEEs, the expression profile of the most variable exons showed better separation of the groups than the profile of the most variable genes (Figure 2a). This indicates that AS plays an important role in DLBCL pathogenesis. Gene-and exon-level profiles according to COO subtypes The observed differences in gene and exon levels between the patient groups with different outcomes raised the question of whether a similar phenomenon occurs when the patients are divided according to their molecular subtypes. When the gene expression-based COO classification was performed using previously described gene predictor, 24 19 of the cases were predicted as GCB, 12 as ABC and 7 as other DLBCL subtypes (Supplementary Figure S1). In this cohort, no significant association between the molecular subtypes and survival was found (not shown). With similar criteria as previously used in the screen for the outcomerelated DEGs and genes with DEEs, 1012 unique DEGs (Supplementary Table S4) and 20 386 exonic regions from 6726 unique genes (Supplementary Table S5) were discovered to be differentially expressed between the GCB and the ABC DLBCLs. As expected, the DEGs included many of the genes used in the 'Wright' subtype classification (SERPINA9, MYBL1, FUT8, IRF4), as well as other genes (FOXP1, MAPK10) shown to have subtype preferential expression profile. 37,38 On the contrary, the DEEs corresponded for only two genes (CCND2, DDB1) used in the subtype classification. In the pathway analysis, the genes from differential gene expression and exon usage between GCB and ABC subtypes were enriched in, for instance, phosphatidylinositol signaling system, regulation of actin cytoskeleton, adherens junction, focal adhesion and pathways in cancer (Supplementary  Table S6). Altogether, the number of DEGs and DEEs was higher in the COO classification than in the survival-related profiles. Clustering of the patient groups with molecular subtypes according to the DEGs or DEEs was able to separate the ABC and GCB subtypes (Figure 2b).
Functional relevance of the alternative splicing To analyze in more detail the functional effect of the DEEs identified from the comparison of the groups with poor and favorable outcomes, we studied the association of the exonic regions with their corresponding protein domains using our exonspecific functional analysis pipeline. Altogether, 49% (4351) of the AS-related exons were protein coding, whereas 12% were mapped to 5′ UTR, 7% to the 3′ UTR and 32% to other noncoding regions (Figure 3a and Supplementary Table S7). Domain prediction showed 28% of translated exons (1193) to include a functional protein domain. In addition, 18% of the coding exons were predicted to include phosphorylation sites, of which 53% were serines, 25% treonines and 22% tyrosines (not shown).
In order to define the biological functions potentially affected by AS, we performed GO annotations for biological processes with the genes with most differentially expressed exons (log2 fold change ⩾ |1.5|, n = 289). To reduce the redundancy among several categories, related ontologies were regrouped into larger groups (Figure 3b). From the separate ontologies, the highest number of genes was annotated to signal transduction (for example, SMAD5 and BCAR1). After regrouping, different cellular metabolic processes formed the most commonly implicated processes (for example, RAD17 and TRAF1). Transport and localization annotation included, for example, EPN1 that is involved in receptor-mediated endocytosis 39 and ATP-binding cassette transporters ABCA2, ABCA3 and ABCA7 that are mostly known for their role in evolving drug resistance. 40 Closer examination of ATPbinding cassette transporters revealed that many of the affected exons included a predicted functional protein domain or phosphorylation sites (not shown), suggesting that AS events might affect drug resistance by regulating the function and activity of the ATP-binding cassette transporters.

Validation of exon inclusion events
To validate the outcome-associated DEEs, we used data from a publicly available CGCI (RNAseq cohort of 92 patients). 8 Of the 3888 AS genes, 547 were validated in the CGCI cohort. Thirtyseven DEEs (33 unique genes), corresponding to 17 protein coding transcripts, were exactly in the same location with the expression difference in the same direction (Supplementary Table S8). According to Cox univariate analysis, 29 out of the 37 DEEs were associated with PFS (P ⩽ 0.05) and 20 with OS in the CGCI cohort (Table 4). In Cox multivariate analyses with International Prognostic Index score, 22 DEEs retained independence for prediction of PFS (Supplementary Table S9). These included several interesting genes, such as BCAR1 (coding for p130Cas, a docking protein involved in many intracellular signaling pathways), APH1A (anterior pharynx defective-1α, a component of the γ-secretase complex that cleaves integral membrane proteins such as Notch receptors and β-amyloid precursor protein), KCNH6 (member of a potassium voltage-gated channel family, shown to be associated with drug sensitivity) and CUL3 (cullin 3, a core component of an E3 ubiquitin ligase complex). Alternative splicing in DLBCL S-K Leivonen et al Some of the validated survival-associated DEEs that targeted protein coding regions could be linked to their protein-level functions. For example, the first coding exon of APH1A overexpressed in patients with poor prognosis corresponded to the entire first APH-1 transmembrane domain (Figure 4a). Similarly, the exon with differential expression in the KCNH6 gene mapped to the amino acids 226-367 including three transmembrane helix regions, and exon skipping was associated with better survival (Figure 4b). This suggests that differential coding exon usage could alter the protein functions and thus result in drug resistance and disease progression. On the contrary, ABCB1 represents a gene, where exon 2 skipping occurring in the promoter region is an unfavorable event and correlates with poor survival (Figure 4c).

DISCUSSION
There has been an increasing interest in studying whole genomelevel alterations in DLBCL during the recent years. 9 The contribution of these studies in the identification of key elements in DLBCL pathogenesis has been crucial, yet the impact of AS on DLBCL pathogenesis and survival has to date remained largely unexplored. In this work, we have performed microarray-based exon and whole transcriptome profiling on freshly frozen lymphoma tissue collected prospectively from DLBCL patients treated homogenously in a Nordic phase II study. 22 We show association of AS genes with molecular subtypes and survival. Interestingly, exon-level profile can separate clinically high-risk DLBCL patients into subgroups with poor and favorable survival more accurately than gene-level profile. We observe that many of the genes are differentially expressed only at the exon level but not at the gene level and would have been missed if only gene-level analysis or conventional 3′ oligonucleotide microarrays had been used. Finally, we validate molecular subtype and survival association of the key DEGs and DEEs in an independent DLBCL cohort. Our observations provide important basis for understanding the mechanisms of DLBCL pathogenesis and prognosis, and development of novel therapeutic strategies.
Approximately half of the AS events target noncoding regions. A significant proportion of the alterations affected 5′ and 3′ UTRs, suggesting modulation of epigenetic regulatory pathways in DLBCL. Although alternative exon usage in the coding regions can generate different protein isoforms, splicing in the noncoding 3′ UTRs can compromise microRNA-dependent gene regulation and change the composition of translational regulatory elements, or result in differential promoter usage in 5′ UTRs that in turn can further alter protein expression. 41 Moreover, the upstream promoter site may be selectively affected and differential promoter usage produce N-terminal splice variants. 42 Our results further suggest that the splicing mechanism might be affected through modulation of splicing factors themselves, providing yet another regulatory level for gene expression.
A novel finding from the pathway analysis of the DEGs was the enrichment of the genes in the circadian rhythm pathway. Disturbances of the mammalian clock genes have been linked to tumorigenesis. In DLBCL, circadian genes CEBPA and its downstream target PER2 are highly deregulated, 43 and BMAL1 (ARNTL) is epigenetically inactivated. 44 Furthermore, genetic variants of CRY2 have been associated with a risk of non-Hodgkin's lymphoma. In our exon array results, the expression of the circadian clock genes PER1 and ARNTL was reduced in relapsed patients and thus associated with dismal prognosis.
ATP-binding cassette transporters, in turn, represent a novel group of genes we found to be regulated at the exon level and thereby potentially affected by AS. To date, ATP-binding cassette transporters have been mostly recognized from their contribution to drug resistance. The highly conserved ATP-binding cassette domains of these transporters provide the ATP-powered translocation of many substrates across the membranes, whereas the transmembrane domains creating the translocation pathway are more variable. According to our domain prediction analysis of splicing-associated ATP-binding cassette transporters, many of the affected domains included either a transmembrane domain or an ATP-binding cassette domain that might directly have an effect to the proper function of the transporter. Most of the AS-associated ATP-binding cassette transporters belonged to ABCA family. These included genes encoding ABCA2 and ABCA3 that have been shown to contribute to drug resistance in T-cell acute lymphoblastic leukemia. 45 In addition, increased expression of ABCA3 has been linked to enhanced exosomal evasion of humoral immunotherapy. 46 We validated the DEGs and DEEs in an independent patient cohort consisting of RNAseq data. APH1A was one of the interesting examples of the AS genes that could be validated by RNAseq and linked to protein-level functions. It encodes an essential component of the multi-transmembrane γ-secretase complex that is required for the cleavage and activation of integral membrane proteins, including Notch. 47 Considering the increasing evidence of the deregulated Notch signaling in cancer progression, γ-secretases playing an important role in Notch activation and that APH1A is critically required for γ-secretase activity, our finding showing that alternative exon usage of APH1A gene has prognostic impact in DLBCL may have therapeutic implications. Thus, future studies should be directed to enhance our understanding of the function of APH1A isoforms and their relation to Notch signaling in DLBCL that may further improve opportunities for the design of selective lymphoma therapeutics.
KCNH6, also known as HERG2, encodes a pore-forming subunit of a voltage-gated potassium channel. Potassium channels comprise the largest family of ion channels encoded by genes with phenotypic diversity generated through alternative splicing, variable association of channel subunits and posttranslational modifications. Recent data indicate that blocking of the ion channel activity can impair cancer cell growth. 48 Interestingly, HERG potassium channels are constitutively active in acute myeloid leukemia, 49,50 and have favorable prognostic impact on survival. In our study, no differences were observed in HERG2 gene expression levels between poor and good prognosis patients. However, low expression of exon 5, which corresponds to the first transmembrane and ion transport domains, was associated with favorable survival. The result suggests that these functional domains mediate prosurvival signals that can be overcome by exon 5 skipping. ABCB1 in turn represents a gene where exon 2 skipping occurring in the promoter region is an unfavorable event correlating with lymphoma progression.
Taken together, the results presented herein are promising and novel, and suggest a significant role for alternative exon usage in the molecular profiles of DLBCL. The importance of studying connections between splicing and DLBCL is emphasized by the possibility that some AS isoforms drive progression and may represent attractive therapeutic targets even if total gene expression is not affected. Moreover, unique or sets of isoforms may be used as biomarkers for disease progression.