Molecular-Based Score inspired on metabolic signature improves prognostic stratification for myelodysplastic syndrome

Deregulated cellular energetics is formally incorporated as an emerging hallmark of cancer, however little is known about its processes in myelodysplastic syndromes (MDS). Using transcriptomic data of CD34+ cells from 159 MDS patients and 17 healthy donors, we selected 37 genes involved in cellular energetics and interrogated about its clinical and prognostic functions. Based on the low expression of ACLY, ANPEP, and PANK1, as well as high expression of PKM and SLC25A5, we constructed our Molecular-Based Score (MBS), that efficiently discriminated patients at three risks groups: favourable risk (n = 28; 3-year overall survival (OS): 100%); intermediate (n = 60; 76% [62–93%]) and adverse (n = 71; 35% [17–61%]). Adverse MBS risk was independently associated with inferior OS (HR = 10.1 [95% CI 1.26–81]; P = 0.029) in multivariable analysis using age, gender and the revised international prognostic score system as confounders. Transcriptional signature revealed that Favourable- and intermediate-risk patients presented enriched molecular programs related to mature myeloid progenitors, cell cycle progression, and oxidative phosphorylation, indicating that this cells differs in their origin, metabolic state, and cell cycle regulation, in comparison to the adverse-risk. Our study provides the first evidence that cellular energetics is transcriptionally deregulated in MDS CD34+ cells and establishes a new useful prognostic score based on the expression of five genes.

Patients classified as adverse by MBS had significantly decreased platelets counts (median for FR:250 × 10 3 /µL; IR: 157 × 10 3 /µL and AR: 109 × 10 3 /µL; P = 0.001) and absolute neutrophil counts (FR:2.5 × 10 3 /µL; IR: 2.3 × 10 3 / µL and AR: 1.3 × 10 3 /µL; P = 0.003), while presented higher percentages of bone marrow blasts (FR: 2.5%; IR: 3% and AR: 8.5%; P < 0.001). MBS risk categories were differently distributed across World Health Organization (WHO) MDS entities and IPSS-R classification (both P < 0.001). According to recurrently mutated genes, MBS-AR showed lower frequency of mutations in SF3B1 (FR:50%; IR: 32% and AR: 15%; P < 0.001), and higher frequency of mutations in RUNX1 (FR: 0; IR: 2% and AR: 13%; P = 0.03) (Table 3). Collectively, these data suggest a link between MBS and pathophysiology of MDS. MBS Receiving-operating characteristics concordance statistic (ROC C-statistic) was 0.70 (95% CI 0.62-0.78; Table 4), representing a 20% improvement in OS prediction when compared with IPSS-R (Δ-AUC, 0.13; 95%CI 0.02-0.22; P = 0.01). According to IPSS-R risk stratification, MBS www.nature.com/scientificreports/ retained its prognostic prediction function when analysed in IPSS-R very-low-and low-risk patients (Fig. 3A) and was widely distributed across all risk categories (Fig. 3B). For non-low IPSS-R patients (i.e., intermediate, high, and very-high), MBS-favourable patients presented a distinctive superior outcome (Supplemental Fig. 3). Of note, none of favourable MBS patients classified as non-low IPSS-R deceased, while 4 of 6 low risk IPSS-R classified as adverse by MBS died with median survival of 18.4 months (Supplemental Table 3). www.nature.com/scientificreports/ Internal validation. Based on the unique characteristics of this cohort, mainly by microarray-based transcriptomic data from CD34+ cells, we decided to internally validate our data using the bootstrap resampling procedure. The bootstrap results are depicted in Table 5, and, for all time-points, the procedure yielded a mean 95%CI virtually identical to its original match. In addition, the pairwise hypothesis test showed a strong significance (P < 0.001) for the difference across the distributions' means for all comparisons. The procedure showed the stability of MBS prediction for 2-and 3-years OS and reinforce the validity of its prediction in a new, but similar, patient collective.  Table 1). For specific comparisons, favourable MBS patients were positively enriched with a transcriptional program of megakaryocytic-erythroid progenitor (MEP) 16 and negative enrichment with leukemic stem cell signature 17 compared with adverse patients (Fig. 4D). In accordance with the previous observations, favourable patients presented a positive enrichment with mitochondria metabolism 18 and downregulated genes in hematopoietic stem cell 19 (Fig. 4E). Adverse MBS patients presented negative enrichment with MEP and downregulated genes in leukemic stem cell (Fig. 4F). Applying stringent statistical criteria (upregulation: log2 fold change > 1.5; downregulation < -1.5, all P < 0.05), we identified differentially expressed genes (DEG) for the following comparisons: favourable vs intermediate (8 upregulated and 16 downregulated), favourable vs adverse (10 upregulated and 129 downregulated) and intermediate vs adverse (5 upregulated and 42 downregulated) ( Fig. 4G-I). Unsupervised hierarchical clustering of transcriptomic data clearly segregated favourable and adverse patients with distinctive DEG signature (Fig. 4J). Taken together, these results suggest that MBS risk categories can efficiently stratify differential transcriptional programs, especially related to cellular energetics and hematopoietic progenitor differentiation.

Discussion
Here, we described a new prognostic scoring system for patients with MDS based on gene-expression of five metabolic enzymes in CD34+ cells, useful to distinguish patients at three risk categories. Regardless of the wide clinical application of IPSS-R 20 for risk assessment in MDS, refining its prognostic function with additional clinical information 21 , flow-cytometry 22 , or mutations 23 has been of great interest, whereas gene expression analysis it has been underexplored for this purpose. Our proposed MBS efficiently discriminate very-low and low IPSS-R in three risk categories, as well as identified a subset of very favourable prognosis among non-low IPSS-R patients. As far as we know, only two gene expression-based risk scores were published for MDS patients 4,5 , and because both of them have used bulk of bone marrow mononuclear cells, its translation to MDS biology is limited. We Table 2. Genes associated with overall survival in Cox Proportional Hazard Model. IC95% confidence interval of 95%. Genes highlighted in bold were independently associated with overall survival and selected to Molecular Based Score. 1 Hazard ratios (HRs) > 1 or < 1 indicate that higher or lower gene expression predicts increased risk of death, respectively. www.nature.com/scientificreports/ have demonstrated that deregulated gene expression in at least two of selected genes is capable to independently predict poorer OS in MDS, with superior prediction capacity than IPSS-R. The high degree of molecular complexity in MDS represents a challenge to properly define the contribution of all alterations to the pathophysiology of these diseases. Moreover, the majority of MDS biomarkers is still based on mutational profiling 24,25 . Despite the limitation in implementing molecular investigations in clinical setting, particularly in low-and middle-income countries, several initiatives had efficiently established molecular tests validated for risk assessment for other myeloid neoplasms 26,27 .
The strong prognostic function of the MBS across the spectrum of MDS entities and risk categories indicates that perturbations caused by driver molecular alterations might result in metabolic reprogramming and that the MBS is capable to efficiently capture these downstream consequences. Based on Molecular-Based Score classification, we were able to identify patients with differential transcriptional programs that reflect an increased mitochondrial respiration capacity, protein synthesis and, molecular signature related to more mature hematopoietic progenitors in MBS favourable-and intermediate-risk comparing with adverse-risk. Stemness-related transcriptional signature is recognized as a relevant predictor of inferior survival in acute myeloid leukaemia 26 . Moreover, more mature hematopoietic progenitors, such as multipotent and myeloid progenitors, show increased baseline oxygen consumption, mitochondrial ATP production, and respiratory capacity than HSC 28   www.nature.com/scientificreports/ mitochondrial respiration capacity and cell cycle progression. As a consequence, this delayed haematopoiesis could result in more severe cytopenia in peripheral blood and accumulation of blasts in the bone marrow. Using advanced stage MDS patients, it has already been demonstrated that CD34 + CD123 + primitive stem cell is responsible for clonal maintenance and expansion. This compartment has distinctive metabolic characteristics, with activation of protein synthesis machinery and increased oxidative phosphorylation, in comparison to CD34 + CD123 − counterparts 13 . Conversely, in our study, we demonstrated that lower MBS risk was associated with increased oxidative phosphorylation and protein biosynthesis signatures. We may hypothesize that metabolic reprogramming in CD123 + cells occurs to a different extent for non-advanced stage MDS patients. Indeed, the IL3RA is not differentially expressed among MBS risk categories (Supplemental Fig. 2). As we used transcriptomic from CD34+ bulk cells, the molecular signatures that we observed are probably related to other more frequent subsets of cells. In addition, ectopic expression of SF3B1 mutations in breast cells was associated Table 3. Baseline characteristics of patients included for Molecular-Based Score. IPSS-R: Revised International Prognostic Score System; RA: refractory anemia; RCMD: refractory cytopenia with multilineage dysplasia; RCMD-RS: refractory cytopenia with multilineage dysplasia with ring sideroblasts; RARS: refractory anemia with ring sideroblasts; RARS-T: refractory anemia with ring sideroblasts and thrombocytosis; RAEB: refractory anemia with excess blasts. AML-MDS: acute myeloid leukemia with myelodysplastic alterations; NA: Not available. www.nature.com/scientificreports/ with disrupted mitochondrial respiration capacity 14 . SF3B1 mutated MDS is considered as having a good prognosis and was recently proposed as a specific disease subtype 29 . Favourable MBS-risk was associated with SF3B1 mutation (Table 3) and as having an oxidative phosphorylation signature. Then, we propose that disruption of mitochondrial complex III mediated by mutant SF3B1 could be dependent on the cellular context, and the metabolic consequences of SF3B1 mutations in CD34 + of MDS patients still of major importance. Ideally, validation of a new prognostic model should determine its capacity in a new data-set scenario. However, external validation is not feasible in most situations. The cohort used in this manuscript shows some unique Table 4. Overall survival for IPSS-R and molecular based score (MBS). IC95% confidence interval of 95%; IPSS-R International Prognostic Score System-Revised. 1 Log-rank test.   www.nature.com/scientificreports/ www.nature.com/scientificreports/ characteristics, such as: 1) transcriptomic data from microarray of CD34 + cells, 2) and availability of clinical and demographic data, such as survival, gender, haematological parameters and risk stratification, as well as mutation data. To overcome the impossibility of external validation, we considered internal validation using bootstrap resampling method to evaluate both predictive accuracy and to check overfitting. Of note, this procedure is aligned with the best analytical rigor and was widely used in clinical studies with singular characteristics [30][31][32][33] . Independent external cohorts' validations and evaluations in the context of response to different therapies would reinforce the clinical relevance of the proposed score. The proposition of more efficient and less toxic new therapies is dependent on the ability to exploit a specific weakness that is inherited preferentially in the neoplastic stem cell population. The identification of the MBS for MDS patients contributes to the knowledge of disease pathobiology and provides novelty data according to altered cellular metabolism of the MDS-initiating cell.

Methods
Clinical and molecular data. Patients' features, mutational status and CD34 + cells transcriptome data from 159 MDS patients and 17 healthy donors are publicly available at Gene Expression Omnibus (GEO-NCBI; GSE58831) 34 . Briefly, classification of MDS was updated at sample collection and made according to World Health Organization criteria 35 , while risk stratification determined by IPSS-R 20 . All patients and healthy controls were from Europe and the centres included: Oxford and Bournemouth (UK), Duisburg (Germany), Stockholm (Sweden) and Pavia (Italy). Baseline features for entire cohort are included in Table 3.
Expression of 37 genes that codify to enzymes related to glycolysis, mitochondrial tricarboxylic acid cycle and oxidative phosphorylation transcriptionally regulated and previously listed as a phenotypic modifiers across different cancer types [36][37][38] were selected to interrogate its differential gene expression and predictive outcome function (Table 1).

Transcriptomic analysis. Diagnosis CD34 + cells were enriched from mononuclear cells using CD34
MicroBeads (Miltenyi Biotec, Germany). For each sample, total RNA was extracted using TRIZOL (Invitrogen, UK) and 50 ng were amplified and labelled using Two-Cycle cDNA Synthesis and the Two-Cycle Target Labelling and Control Reagent kits (Affymetrix, USA). Ten µg of cRNA was hybridized to Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays (Affymetrix, USA), covering 47 000 transcripts. Normalized gene expression was calculated using a multichip analysis approach 39 . Mutation data were obtained by targeted gene sequencing, using Illumina Platform, designed to cover 111 genes implicated in myeloid neoplasms pathobiology 40 .
The quantile normalized gene expression was used for a ranking using limma-voom package at Galaxy (https ://usega laxy.org/) comparing MBS groups (i.e. favourable versus intermediate, favourable versus adverse, and intermediate versus adverse). Pre ranked gene set enrichment analysis (GSEA) was performed using GSEA 4.0.3 software 41 . The gene sets curated by MSigDB hallmark, reactome, hematopoietic progenitors, mitochondrial, and apoptosis were selected for comparisons. Volcano plots computing differentially expressed across MBS entities were constructed correlating the Log 2 -adjusted P value and Log 2 -Fold-Change in GraphPad Prism 8.0 (GraphPad Software, USA). Heat map was constructed to represent top differentially expressed genes in MBS risk groups using the online available tool Morpheus (https ://softw are.broad insti tute.org/morph eus).

Statistical considerations.
Descriptive analyses were performed for patient baseline features. Fisher's exact test or Chi-square test, as appropriate, was used to compare categorical variables. Non-parametric Mann-Whitney test was used to compare continuous variables.
In order to optimize the cut off selection for gene expression, we opted to use "cutpointr" package and automatically determined the critical points for each 37 genes using receiver operating characteristic curve analysis 42 and the C-index 43 pre-selected for our score (Table 1). After dichotomization, we evaluated the predictive capacity of each gene (Table 2) in a univariate and multivariate way by Proportional Hazard Cox regression analysis using the "Cox_HR" function of "SurvivalAnalysis" package 44,45 . Genes (n = 11) significantly associated with survival in univariate analysis were individually considered in multivariate analysis using age, gender, and IPSS-R stratification as cofounders. Five genes independently predicted OS and were selected for MBS estimation.
MBS was calculated by computing 1 for every molecular risk factor, e.g. high expression of ANPEP and PKM, and low expression of ACLY, PANK1 and SLC25A5, varying from 0 (summing zero molecular risk factor) to 5 (summing all five molecular risk factor). MBS risk groups were determined by Kaplan-Meyer inspection 46 , and were defined as MBS-Favourable for patients without molecular risk factor, MBS-Intermediate for patients with one molecular risk factor and as MBS-Adverse with two or more molecular risk factors.
To determine the predictive capacity for MBS, a receiver operating characteristic (ROC) curve and the respective concordance statistics (C-statistics) were performed. The respective area under the curve (AUC) were derived from an R implementation of DeLong's algorithm 47 . To determine if MBS predictive capacity is superior to IPSS-R, we calculated differences between AUC (Δ-AUC) as Δ-AUC = AUC MBS − AUC IPSS-R . For this purpose, we performed 10,000 bootstrap resampling procedure and calculated the Δ-AUC for each interaction. Positive values represent that MBS performed better than IPSS-R 48 .
The bootstrap resampling procedure performed 1,000 resampling of the original cohort and calculated all clinical endpoints in two different time points (2-year, and 3-year) for three MBS-categories (favourable-, intermediate-and adverse-risk MBS). The procedure also estimated their respective 95% confidence interval (CI) computing the bias-corrected and accelerated bootstrap interval.
Proportional hazards (PH) assumption for each continuous variable of interest was tested. Linearity assumption for all continuous variables was examined in logistic and PH models using restricted cubic spline estimates of the relationship between the continuous variable and log relative hazard/risk. All P values were two sided www.nature.com/scientificreports/ with a significance level of 0.05. All calculations were performed using Stata Statistic/Data Analysis version 12 (Stata Corporation, USA), Statistical Package for Social Sciences 19 (SPSS 19) and R 3.5.2 (The CRAN project, www.r-proje ct.org) software.