Gene expression derived from alternative promoters improves prognostic stratification in multiple myeloma

Clinical and genetic risk factors are currently used in multiple myeloma (MM) to stratify patients and to design specific therapies. However, these systems do not capture the heterogeneity of the disease supporting the development of new prognostic factors. In this study, we identified active promoters and alternative active promoters in 6 different B cell subpopulations, including bone-marrow plasma cells, and 32 MM patient samples, using RNA-seq data. We find that expression initiated at both regular and alternative promoters was specific of each B cell subpopulation or MM plasma cells, showing a remarkable level of consistency with chromatin-based promoter definition. Interestingly, using 595 MM patient samples from the CoMMpass dataset, we observed that the expression derived from some alternative promoters was associated with lower progression-free and overall survival in MM patients independently of genetic alterations. Altogether, our results define cancer-specific alternative active promoters as new transcriptomic features that can provide a new avenue for prognostic stratification possibilities in patients with MM.


Introduction
Multiple myeloma (MM) is a hematological malignancy characterized by an abnormal accumulation of clonal plasma cells (PC) in the bone marrow. In recent years, the survival of MM patients has increased significantly [1] but, regrettably, MM is still considered an incurable disease [2]. Given the underlying heterogeneity of MM, and in spite of the clinical value of genetic alterations [3], the discovery of novel biomarkers to further improve its prognostic stratification remains challenging. Several studies have attempted to address this issue using different high-throughput- technologies [3]. While the study of the transcriptome using RNA-seq is very common in cancer research, it is not currently applied as a risk stratification tool in patients with MM [4].
Recently, Demircioğlu et al. [5] presented a novel approach to study transcriptome regulation in cancer cells, defining gene active promoters (AP) from RNA-seq data and showing that expression derived from alternative active promoters (AAP) can be used as a biomarker to improve the stratification of cancer patients. Inspired by this approach, we have exploited RNA-seq data from B-cells and MM patients to investigate the possible role of AAP as a new prognostic biomarker for improving the survival stratification of MM patients.

Promoter activity definition
We used the R package proActive to identify promoters within the annotated genes in Gencode v27. Moreover, we used STAR v 2.6.1a and our strand-specific RNA-seq (ssRNA-seq) data from PC of MM patient samples (n = 32)

Survival analyses
Survival analyses were performed with the IA14 release data of the multiple myeloma research foundation (MMRF) CoMMpass Study dataset (595 samples). Please be referred to Supplemental Methods for further information.

Results and discussion
Based on the strategy described by Demircioğlu et al. [5], we initially defined the AP in different B-cell subpopulations including normal PCs from healthy donors and PCs from MM patients taking the Gencode v27 annotation as reference [7]. We identified 115,496 possible promoters and defined their activity using our ssRNA-seq data by quantifying the expression that is initiated at each promoter for each B cell subpopulation and PC, therefore identifying AP in each specific subpopulations (Supplemental data 1). Using this promoter activity, principal component analysis (PCA) revealed robust segregation of PC from the rest of B-cells including a clear distinction between normal and MM PC (Fig. 1A). Furthermore, differential analyses revealed the presence of AP specific for each B-cell subpopulation and MM PC (Fig. 1B). These results are consistent with our previous results using global gene expression [6,7].
The accuracy of promoter activity was validated using our previously described chromatin data generated from the same populations of B cells and PC including H3K4me3 and H3K27Ac ChIP-seq data, histone marks associated with AP [6,8]. RNA-based estimates from patient samples accurately reflect the promoter activity, as 50-75% of the APs detected in each cell subpopulation show chromatin states indicative for AP (Fig. 1C). Focusing on MM, we observed that some of these promoters were de novo AP (Fig. 1D, E), i.e., AP associated with genes expressed only in MM and not in normal B-cells, such as PRDM5, IGF1, or BMP6, among others, as we recently described [6]. These results indicate that expression and chromatin-based estimates are consistent and therefore, RNA-seq can provide, in addition to gene expression, an important estimation of the chromatin states and promoters location from which gene transcription is taking place in each cell type.
The presence of AAP in cancer has recently been associated with the transcription of different gene isoforms [5]. Interestingly, we observed that the expression of AAP was also specific to MM and B-cell subsets and that it also correlated with epigenomic data (Fig. 1C, F, Supplemental data 2). Such is the case of ARID5B, a gene that shows a shift in promoter usage throughout B cell differentiation and MM (Fig. 1G-I).
The role of the transcriptional profile in the prognosis of MM has been clearly demonstrated and specific transcriptional signatures have been associated with specific subgroups of MM patients [9]. Based on our previous analysis, we hypothesized that expression of AAP could represent a new prognostic factor in MM, as different isoforms of the same genes may contribute to distinct clinical impact. To address this question we took advantage of the RNA-seq data of the CoMMpass study that includes 595 MM patient samples acquired at diagnosis. Following the AAP selection criteria defined by Demircioğlu et al. [5], we identified 1539 AAP within MM patient samples. We divided the cohort of MM patients into training and test datasets. Using the training data, we performed a univariate coxph analysis for each AAP in both PFS and OS (Supplemental methods). The expression derived from 6 AAP within MM patient samples was significantly associated with PFS, and 18 AAP with OS. To avoid bias derived from differentially expressed genes, we kept AAP whose associated gene expression did not show statistical significance for PFS and/or OS, which yielded a final list of 3 AAP significantly correlated with PFS and 10 AAP with OS of MM patients (Supplemental data 3). As an example, while SLAMF7 expression was not associated with PFS or OS, a higher activity of AAP1 of SLAMF7 was associated with improved PFS and OS in MM patients ( Fig. 2A-D). These results indicate that assessment of AAP may contribute to identifying new prognostic factors.
Next, using multivariate Cox regression model and stepwise model selection with Bayesian information criterion (BIC) [10], we evaluated the hazard prediction power, for both PFS and OS, of the combination between AAP and risk genetic markers in MM patients [11][12][13][14][15]: ISS, t(4;14), t(14;16), t(14;20), del17p, deletion of CDKN2C, del1p, amp1q and mutations of TP53. Interestingly, we detected that AAP significantly over-perform the information provided by genetic alterations in terms of PFS and OS. In the multivariate analysis, we identified ISS stage and AAP of REEP5 and SLAMF7 genes (Supplemental Fig. 1, Supplemental data 4) with independent prognostic value for PFS, and provided a predictive model whose cumulative activity discriminated 3 different risk groups (Fig. 2E, F). This was also validated in the test cohort (Fig. 2G). Regarding OS, we identified the AAP of five genes, RWDD1, SLAMF7, ACSS1, BTNG3A1, and RPL30 (Supplemental Fig. 1, Supplemental data 4) that together with the ISS stage and amplification of 1q discriminated patients with different prognosis (Fig. 2H; Supplemental Fig. 2A). Again, these factors generated a risk assessment model with four distinct risk groups for OS (Fig. 2I) in the training cohort that was validated in the test cohort ( Fig. 2J; Supplemental Fig. 2B). Finally, we also performed an ANOVA test to compare the models derived from genetic risk factors only or combining genetic risk factors with AAP, finding a significant improvement for the combination of both risk factors for PFS (p-value = 2e-5) and OS (p-value = 4e-10).
In summary, in this study, we demonstrate that RNA-seq data can be exploited in a non-conventional way to identify C PFS by  AP and AAP in MM and that the expression derived from AAP shows a greater contribution as a survival risk biomarker than high-risk genetic classifiers used currently in the clinical outcome of MM patients.