Introduction

Myelodysplastic syndromes (MDS) are a heterogeneous group of clonal myeloid neoplasms characterized by bone marrow failure, abnormal cell morphology, and an increased risk of evolution to acute myeloid leukaemia1. Recent efforts to uncover the molecular heterogeneity of MDS, mainly through new sequencing technologies, have allowed the comprehensive identification of driver mutations and altered gene expression recurrently found in a recognizable fraction of patients2,3. Deregulated gene expression is prognostically useful in haematological neoplasms, but remains underexplored in MDS4,5. Moreover, very few data, if any, are available on deregulated gene expression in MDS-initiating cells.

Cancer cells preferentially upregulate glucose uptake and glycolysis, increasing the yield of intermediate glycolytic metabolites; as a consequence, glycolysis is uncoupled from the mitochondrial tricarboxylic acid (TCA) cycle and oxidative phosphorylation (OXPHOS)6,7. This phenomenon, known as the Warburg effect, results in reduced mitochondrial oxidative metabolism6,8,9, and deregulated cellular energetics is formally recognized as an emerging hallmark of cancer10,11. Yet, beyond the question of how glucose metabolism influences cellular functions, studies are still necessary to define whether the upregulation of anaerobic glycolysis is a true cancer cell-specific deviation or is related to normal stem/progenitor cell maintenance and self-renewal mechanisms12.

The in-depth evaluation of MDS-initiating cell metabolism provided by Stevens et al. demonstrated that the CD123+ hematopoietic progenitor compartment is the clonal reservoir for MDS maintenance and evolution13. These CD123+ stem cells have distinctive metabolic properties, and the upregulation of protein synthesis and RNA translation, together with increased oxidative phosphorylation, was directly linked to MDS stem cell self-renewal and survival13. Mutations in the SF3B1 gene, which define a subset of MDS with favourable prognosis, result in a reprogramming of mitochondrial metabolism, with decreased cellular respiration capacity, in a process mediated by mis-splicing and downregulation of the UQCC1 gene14. Therefore, the identification of metabolic vulnerabilities in MDS-initiating cells represents a promising strategy to better understand the pathophysiology of MDS and to propose new therapeutic targets for patients.

Our rationale was to design a prognostic score interrogating the clinical and prognostic importance of transcriptionally-regulated enzymes involved in cellular energetics mechanisms of glycolysis, tricarboxylic acid cycle, and oxidative phosphorylation, and to depict the molecular process mediated by our proposed score.

Results

CD34+ cells from MDS show differential gene expression for cellular energetics-related genes

To examine the differential expression of cellular energetics-related genes, we selected 37 genes (Table 1) and normalized their expression values from the microarray data of the GSE58831 cohort15. The cohort comprised 159 MDS patients and 17 healthy donors. Nineteen of the pre-selected genes were differentially expressed between CD34+ cells from MDS patients and healthy donors (6 downregulated and 13 upregulated; Fig. 1, all P < 0.05).

Table 1 Cellular energetics-related genes selected for the study.
Figure 1

Expression of glycolysis and tricarboxylic acid cycle genes in CD34+ cells from healthy donors (HD) and patients with myelodysplastic syndromes (MDS). Microarray-based gene expression analysis in 17 HD and 159 MDS patients for the genes used in the Molecular-Based Score (MBS) (A) and for genes differentially expressed between HD and MDS (B). Horizontal lines indicate medians and the P values are indicated. Notes: *P < 0.05, **P < 0.01, ***P < 0.001; Mann–Whitney test.

Molecular-Based Score efficiently discriminates MDS patients at differential risk and is associated with clinical and molecular characteristics

To interrogate the prognostic capacity of each selected gene, we dichotomized gene expression into high or low expression according to the receiver operating characteristic (ROC) curve and the C-index. Fifteen genes were associated with prognosis in univariate analysis, while multivariate analyses identified the expression of 5 genes as independent prognostic factors: ACLY (HR: 0.48; 95% CI 0.24–0.96; P = 0.04), ANPEP (HR: 2.16; 95% CI 1.08–4.31; P = 0.02), PANK1 (HR: 0.43; 95% CI 0.19–0.98; P = 0.04), PKM (HR: 2.01; 95% CI 1.02–3.93; P = 0.04), and SLC25A5 (HR: 0.49; 95% CI 0.27–0.99; P = 0.05) (Table 2). The Molecular-Based Score (MBS) was calculated by adding 1 point for every gene whose expression level constituted a risk factor. The MBS varied from 0 to 5 and was stratified as: MBS Favourable-Risk = 0 (MBS-FR; 18% [28/159]); MBS Intermediate-Risk = 1 (MBS-IR; 38% [60/159]); and MBS Adverse-Risk ≥ 2 (MBS-AR; 44% [71/159]).

Table 2 Genes associated with overall survival in Cox Proportional Hazard Model.

The Molecular-Based Score efficiently discriminated patients in different risk groups: MBS-FR (3-year overall survival (OS): 100%; median time [MT]: not reached); MBS-IR (3-year OS: 76% [95% CI 62–93%]; MT: 67.6 months [95% CI 48.3–86.8]) and MBS-AR (3-year OS: 35% [95% CI 17–61%]; MT: 31.7 months [95% CI 21.2–42.1]) (Fig. 2A,B). The univariate HRs for IR versus FR and AR versus IR were 8.99 (95% CI 1.19–68.1; P = 0.02) and 20.1 (95% CI 2.71–149; P = 0.003), respectively (Supplemental Fig. 1). After multivariable adjustment, MBS-AR was the most significant covariate as measured by the Wald chi-square statistic and was independently associated with inferior OS (HR = 10.1 [95% CI 1.26–81]; P = 0.029) (Fig. 2C,D). We also identified increased age as an independent prognostic covariate in our model (HR = 1.03 [95% CI 1–1.87]; P = 0.034), representing a 3% increase in the risk of death per year of age at diagnosis (Fig. 2D).

Figure 2

Survival analyses of the Molecular-Based Score (MBS) for overall survival (OS) in myelodysplastic syndromes. (A) Kaplan–Meier curves for the three MBS risk categories. (B) The MBS was built from the gene expression of ACLY, ANPEP, PANK1, PKM and SLC25A5 and efficiently identifies three risk groups. (C) Significance (χ2-statistic) of each covariate for the prediction of OS in the multivariate model; higher values represent increased predictive capacity; df: degrees of freedom. (D) Forest plot of the multivariable analysis, which identified adverse-risk MBS and age as independent predictors of OS. A hazard ratio (HR) > 1 indicates that increasing values of a continuous variable, or the first level of a categorical variable, are associated with poorer outcome. HRs and their respective 95% confidence intervals (95% CI) are indicated by a black square and a line, respectively. IPSS-R non-low patients included intermediate, high and very-high risk patients.

Patients classified as adverse by the MBS had significantly lower platelet counts (median for FR: 250 × 10³/µL; IR: 157 × 10³/µL and AR: 109 × 10³/µL; P = 0.001) and absolute neutrophil counts (FR: 2.5 × 10³/µL; IR: 2.3 × 10³/µL and AR: 1.3 × 10³/µL; P = 0.003), and presented higher percentages of bone marrow blasts (FR: 2.5%; IR: 3% and AR: 8.5%; P < 0.001). MBS risk categories were differently distributed across World Health Organization (WHO) MDS entities and IPSS-R classification (both P < 0.001). Regarding recurrently mutated genes, MBS-AR showed a lower frequency of mutations in SF3B1 (FR: 50%; IR: 32% and AR: 15%; P < 0.001) and a higher frequency of mutations in RUNX1 (FR: 0; IR: 2% and AR: 13%; P = 0.03) (Table 3). Collectively, these data suggest a link between the MBS and the pathophysiology of MDS. The receiver operating characteristic concordance statistic (ROC C-statistic) of the MBS was 0.70 (95% CI 0.62–0.78; Table 4), representing a 20% improvement in OS prediction when compared with IPSS-R (Δ-AUC, 0.13; 95% CI 0.02–0.22; P = 0.01). Within the IPSS-R risk stratification, the MBS retained its prognostic function when analysed in IPSS-R very-low- and low-risk patients (Fig. 3A) and was widely distributed across all risk categories (Fig. 3B). Among non-low IPSS-R patients (i.e., intermediate, high, and very-high), MBS-favourable patients presented a distinctly superior outcome (Supplemental Fig. 3). Of note, none of the MBS-favourable patients classified as non-low IPSS-R died, while 4 of 6 IPSS-R low-risk patients classified as adverse by the MBS died, with a median survival of 18.4 months (Supplemental Table 3).

Table 3 Baseline characteristics of patients included for Molecular-Based Score.
Table 4 Overall survival for IPSS-R and molecular based score (MBS).
Figure 3

Molecular-Based Score (MBS) prognostic prediction in IPSS-R very-low- and low-risk patients, and distribution across all IPSS-R risk categories. (A) Kaplan–Meier curves of the MBS for overall survival (OS) in IPSS-R very-low- and low-risk myelodysplastic syndromes. (B) Distribution of the MBS across all IPSS-R risk categories.

Internal validation

Given the unique characteristics of this cohort, chiefly the availability of microarray-based transcriptomic data from CD34+ cells, we decided to internally validate our data using the bootstrap resampling procedure. The bootstrap results are depicted in Table 5; for all time points, the procedure yielded a mean and 95% CI virtually identical to the original estimates. In addition, the pairwise hypothesis tests showed strong significance (P < 0.001) for the difference between the distributions’ means in all comparisons. The procedure showed the stability of the MBS prediction for 2- and 3-year OS and reinforces the validity of its prediction in a new, but similar, patient population.

Table 5 Bootstrap (R = 1000) for 2-year and 3-year OS.

Molecular-Based Score categories are associated with differential gene expression signatures

To further understand the potential mechanisms by which MBS entities regulate hematopoietic progenitor-associated transcriptional programs, we comprehensively compared the transcriptomic signatures among MBS risk categories. Gene set enrichment analysis (GSEA) revealed that increasing MBS risk (i.e. favourable versus (vs) intermediate; favourable vs adverse; and intermediate vs adverse) was consistently characterized by upregulation of genes related to oxidative phosphorylation, to the control circuits of cell cycle progression (e.g. G2M_checkpoint and E2F_Targets), and to fatty-acid metabolism (Fig. 4A–C; Supplemental Table 1). In specific comparisons, favourable MBS patients showed positive enrichment for a megakaryocytic-erythroid progenitor (MEP) transcriptional program16 and negative enrichment for a leukemic stem cell signature17 compared with adverse patients (Fig. 4D). In accordance with these observations, favourable patients presented positive enrichment for mitochondrial metabolism18 and for genes downregulated in hematopoietic stem cells19 (Fig. 4E). Adverse MBS patients presented negative enrichment for MEP and for genes downregulated in leukemic stem cells (Fig. 4F).

Figure 4

Molecular-Based Score (MBS) entities are associated with differential transcriptomic programs. (A–C) Gene set enrichment analysis (GSEA) with compiled modules from the Hallmark collection of the Molecular Signatures Database. * indicates a GSEA for the Reactome database. Specific comparisons are indicated in the figure. False discovery rate (FDR) < 0.25, |normalized enrichment score (NES)| > 1.5. (D–F) Representative enrichment graphs from ranked GSEA analysis. Specific comparisons are indicated in the figure. (G–I) Volcano plots depicting the extent (x-axis) and significance (y-axis) of differential gene expression for each gene comparing favourable versus (vs) intermediate, favourable vs adverse, and intermediate vs adverse MBS categories, respectively. (J) Heat map summarizing expression of the top 200 differentially expressed genes across MBS entities. Colour intensity represents the z-score within each row. Expression values for genes represented by multiple probes reflect the median across-array intensity, and the gene expression profiles were clustered using the K-means algorithm. The heat map was constructed using Morpheus (https://software.broadinstitute.org/morpheus).

Applying stringent statistical criteria (upregulation: log2 fold change > 1.5; downregulation: log2 fold change < -1.5; all P < 0.05), we identified differentially expressed genes (DEG) for the following comparisons: favourable vs intermediate (8 upregulated and 16 downregulated), favourable vs adverse (10 upregulated and 129 downregulated) and intermediate vs adverse (5 upregulated and 42 downregulated) (Fig. 4G–I). Unsupervised hierarchical clustering of transcriptomic data clearly segregated favourable and adverse patients, each with a distinctive DEG signature (Fig. 4J). Taken together, these results suggest that MBS risk categories efficiently stratify differential transcriptional programs, especially those related to cellular energetics and hematopoietic progenitor differentiation.
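The DEG thresholds stated above can be illustrated with a minimal sketch. The gene names, fold changes and P values below are hypothetical placeholders, not data from the cohort, and the helper names (`GeneStat`, `classify_deg`) are introduced only for illustration.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float   # log2 fold change between two MBS groups
    p_value: float


def classify_deg(stat: GeneStat, fc_cutoff: float = 1.5, alpha: float = 0.05) -> str:
    """Apply the thresholds from the text: |log2 FC| > 1.5 and P < 0.05."""
    if stat.p_value >= alpha:
        return "not significant"
    if stat.log2_fc > fc_cutoff:
        return "upregulated"
    if stat.log2_fc < -fc_cutoff:
        return "downregulated"
    return "not significant"


# Hypothetical values, for illustration only.
stats = [
    GeneStat("GENE_A", 2.1, 0.001),
    GeneStat("GENE_B", -1.8, 0.02),
    GeneStat("GENE_C", 1.9, 0.20),   # large change, but P >= 0.05
    GeneStat("GENE_D", 0.4, 0.001),  # significant, but small change
]
calls = {s.gene: classify_deg(s) for s in stats}
print(calls)
```

Both conditions must hold for a gene to be counted, which is why GENE_C and GENE_D are not called as DEG in this toy example.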

Discussion

Here, we describe a new prognostic scoring system for patients with MDS based on the expression of five metabolic enzyme genes in CD34+ cells, which distinguishes patients in three risk categories. Despite the wide clinical application of IPSS-R20 for risk assessment in MDS, refining its prognostic function with additional clinical information21, flow cytometry22, or mutations23 has been of great interest, whereas gene expression analysis has been underexplored for this purpose. Our proposed MBS efficiently discriminated very-low and low IPSS-R patients into three risk categories, and identified a subset with very favourable prognosis among non-low IPSS-R patients. To our knowledge, only two gene expression-based risk scores have been published for MDS patients4,5, and because both used bulk bone marrow mononuclear cells, their translation to MDS biology is limited. We have demonstrated that deregulated expression of at least two of the selected genes independently predicts poorer OS in MDS, with a prediction capacity superior to that of IPSS-R.

The high degree of molecular complexity in MDS makes it challenging to properly define the contribution of each alteration to the pathophysiology of these diseases. Moreover, the majority of MDS biomarkers are still based on mutational profiling24,25. Despite the limitations in implementing molecular investigations in the clinical setting, particularly in low- and middle-income countries, several initiatives have efficiently established molecular tests validated for risk assessment in other myeloid neoplasms26,27.

The strong prognostic function of the MBS across the spectrum of MDS entities and risk categories indicates that perturbations caused by driver molecular alterations might result in metabolic reprogramming, and that the MBS is capable of efficiently capturing these downstream consequences. Based on the Molecular-Based Score classification, we were able to identify patients with differential transcriptional programs that reflect increased mitochondrial respiration capacity, increased protein synthesis, and a molecular signature related to more mature hematopoietic progenitors in MBS favourable- and intermediate-risk patients compared with adverse-risk patients. A stemness-related transcriptional signature is recognized as a relevant predictor of inferior survival in acute myeloid leukaemia26. Moreover, more mature hematopoietic progenitors, such as multipotent and myeloid progenitors, show higher baseline oxygen consumption, mitochondrial ATP production, and respiratory capacity than HSC28. Therefore, it is conceivable that high MBS risk patients have CD34+ cells in a more undifferentiated state, related to their reduced mitochondrial respiration capacity and cell cycle progression. As a consequence, this delayed haematopoiesis could result in more severe cytopenia in peripheral blood and accumulation of blasts in the bone marrow.

In patients with advanced-stage MDS, it has already been demonstrated that the CD34+CD123+ primitive stem cell compartment is responsible for clonal maintenance and expansion. This compartment has distinctive metabolic characteristics, with activation of the protein synthesis machinery and increased oxidative phosphorylation, in comparison with its CD34+CD123− counterparts13. Conversely, in our study, we demonstrated that lower MBS risk was associated with increased oxidative phosphorylation and protein biosynthesis signatures. We may hypothesize that metabolic reprogramming in CD123+ cells occurs to a different extent in non-advanced-stage MDS patients. Indeed, IL3RA is not differentially expressed among MBS risk categories (Supplemental Fig. 2). As we used transcriptomic data from bulk CD34+ cells, the molecular signatures that we observed are probably related to other, more frequent cell subsets. In addition, ectopic expression of SF3B1 mutations in breast cells was associated with disrupted mitochondrial respiration capacity14. SF3B1-mutated MDS is considered to have a good prognosis and was recently proposed as a specific disease subtype29. Favourable MBS risk was associated with SF3B1 mutation (Table 3) and with an oxidative phosphorylation signature. We therefore propose that the disruption of mitochondrial complex III mediated by mutant SF3B1 could depend on the cellular context, and that the metabolic consequences of SF3B1 mutations in CD34+ cells of MDS patients remain of major importance.

Ideally, validation of a new prognostic model should determine its performance in a new data set. However, external validation is not feasible in most situations. The cohort used in this manuscript shows some unique characteristics, such as: (1) transcriptomic data from microarrays of CD34+ cells, and (2) availability of clinical and demographic data, such as survival, gender, haematological parameters and risk stratification, as well as mutation data. To overcome the impossibility of external validation, we performed internal validation using the bootstrap resampling method, both to evaluate predictive accuracy and to check for overfitting. Of note, this procedure is analytically rigorous and has been widely used in clinical studies with singular characteristics30,31,32,33. Validation in independent external cohorts and evaluation in the context of response to different therapies would reinforce the clinical relevance of the proposed score.

The development of more efficient and less toxic therapies depends on the ability to exploit a specific weakness that is preferentially inherent to the neoplastic stem cell population. The identification of the MBS for MDS patients contributes to the knowledge of disease pathobiology and provides novel data on the altered cellular metabolism of the MDS-initiating cell.

Methods

Clinical and molecular data

Patients’ features, mutational status and CD34+ cell transcriptome data from 159 MDS patients and 17 healthy donors are publicly available at Gene Expression Omnibus (GEO-NCBI; GSE58831)34. Briefly, the classification of MDS was updated at sample collection and made according to World Health Organization criteria35, while risk stratification was determined by IPSS-R20. All patients and healthy controls were from Europe and the centres included: Oxford and Bournemouth (UK), Duisburg (Germany), Stockholm (Sweden) and Pavia (Italy). Baseline features for the entire cohort are included in Table 3.

We selected 37 genes that encode transcriptionally regulated enzymes related to glycolysis, the mitochondrial tricarboxylic acid cycle and oxidative phosphorylation, and that were previously listed as phenotypic modifiers across different cancer types36,37,38, to interrogate their differential expression and their outcome prediction function (Table 1).

Transcriptomic analysis

CD34+ cells obtained at diagnosis were enriched from mononuclear cells using CD34 MicroBeads (Miltenyi Biotec, Germany). For each sample, total RNA was extracted using TRIZOL (Invitrogen, UK) and 50 ng were amplified and labelled using the Two-Cycle cDNA Synthesis and the Two-Cycle Target Labelling and Control Reagent kits (Affymetrix, USA). Ten µg of cRNA was hybridized to Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays (Affymetrix, USA), covering 47 000 transcripts. Normalized gene expression was calculated using a multichip analysis approach39. Mutation data were obtained by targeted gene sequencing on an Illumina platform, with a panel designed to cover 111 genes implicated in the pathobiology of myeloid neoplasms40.

The quantile-normalized gene expression was used to rank genes with the limma-voom package at Galaxy (https://usegalaxy.org/), comparing MBS groups (i.e. favourable versus intermediate, favourable versus adverse, and intermediate versus adverse). Pre-ranked gene set enrichment analysis (GSEA) was performed using GSEA 4.0.3 software41. The MSigDB-curated gene sets for hallmark, Reactome, hematopoietic progenitors, mitochondria, and apoptosis were selected for the comparisons. Volcano plots of differentially expressed genes across MBS entities were constructed by plotting the Log2-adjusted P value against the Log2 fold change in GraphPad Prism 8.0 (GraphPad Software, USA). A heat map representing the top differentially expressed genes in MBS risk groups was constructed using the online tool Morpheus (https://software.broadinstitute.org/morpheus).

Statistical considerations

Descriptive analyses were performed for patient baseline features. Fisher’s exact test or Chi-square test, as appropriate, was used to compare categorical variables. Non-parametric Mann–Whitney test was used to compare continuous variables.

To optimize cut-off selection for gene expression, we used the “cutpointr” package to automatically determine the critical point for each of the 37 genes pre-selected for our score (Table 1), using receiver operating characteristic curve analysis42 and the C-index43. After dichotomization, we evaluated the predictive capacity of each gene (Table 2) in univariate and multivariate Cox proportional hazards regression analyses using the “Cox_HR” function of the “SurvivalAnalysis” package44,45. Genes (n = 11) significantly associated with survival in univariate analysis were individually considered in multivariate analysis using age, gender, and IPSS-R stratification as confounders. Five genes independently predicted OS and were selected for MBS estimation.
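As an illustration of ROC-based dichotomization, the following is a minimal sketch that picks the expression threshold maximizing Youden's J (sensitivity + specificity − 1) for a binary death indicator. The choice of Youden's J is an assumption made here for illustration: the text does not state which metric was optimized, and this sketch is not the “cutpointr” implementation. The function name and the expression values are hypothetical.

```python
def youden_cutpoint(values, events):
    """Return (cutpoint, J) maximizing Youden's J.

    A sample is called 'high' when value >= cutpoint; events are 1 (death)
    or 0 (alive/censored). Censoring times are ignored in this sketch.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v >= cut and e == 1)
        fp = sum(1 for v, e in zip(values, events) if v >= cut and e == 0)
        j = tp / n_pos - fp / n_neg  # sensitivity - (1 - specificity)
        if j > best_j:  # ties keep the lowest candidate threshold
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical normalized expression values and outcomes.
expression = [5.1, 5.4, 6.0, 6.3, 7.2, 7.8, 8.1, 8.5]
died = [0, 0, 1, 0, 0, 1, 1, 1]
cutpoint, j_stat = youden_cutpoint(expression, died)
high_expression = [v >= cutpoint for v in expression]
print(cutpoint, j_stat)
```

For a gene whose low expression is the risk factor, the same search can be run on the negated values; the direction handled here ("high" predicts the event) is only one of the two cases.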

The MBS was calculated by adding 1 point for every molecular risk factor, i.e. high expression of ANPEP and PKM, and low expression of ACLY, PANK1 and SLC25A5, and varied from 0 (no molecular risk factors) to 5 (all five molecular risk factors). MBS risk groups were determined by inspection of the Kaplan–Meier curves46, and were defined as MBS-Favourable for patients without molecular risk factors, MBS-Intermediate for patients with one molecular risk factor, and MBS-Adverse for patients with two or more molecular risk factors.
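The scoring rule described above can be sketched as follows. The function names and the example patient are hypothetical; the input is assumed to be the already dichotomized (high/low) expression status of each of the five genes.

```python
# Direction of risk for each gene, as stated in the text.
RISK_WHEN_HIGH = ("ANPEP", "PKM")
RISK_WHEN_LOW = ("ACLY", "PANK1", "SLC25A5")


def mbs_score(is_high: dict) -> int:
    """Count molecular risk factors (0 to 5) from high/low expression calls."""
    score = sum(1 for gene in RISK_WHEN_HIGH if is_high[gene])
    score += sum(1 for gene in RISK_WHEN_LOW if not is_high[gene])
    return score


def mbs_category(score: int) -> str:
    """Map the score to the three risk groups defined in the text."""
    if not 0 <= score <= 5:
        raise ValueError("MBS must be between 0 and 5")
    if score == 0:
        return "MBS-Favourable"
    if score == 1:
        return "MBS-Intermediate"
    return "MBS-Adverse"


# Hypothetical patient: high ANPEP and low PANK1 are the two risk factors.
patient = {"ANPEP": True, "PKM": False, "ACLY": True, "PANK1": False, "SLC25A5": True}
score = mbs_score(patient)
print(score, mbs_category(score))
```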

To determine the predictive capacity of the MBS, a receiver operating characteristic (ROC) curve and the respective concordance statistic (C-statistic) were computed. The respective areas under the curve (AUC) were derived from an R implementation of DeLong’s algorithm47. To determine whether the predictive capacity of the MBS is superior to that of IPSS-R, we calculated the difference between AUCs (Δ-AUC) as Δ-AUC = AUC_MBS − AUC_IPSS-R. For this purpose, we performed 10,000 bootstrap resamples and calculated the Δ-AUC for each iteration. Positive values indicate that the MBS performed better than IPSS-R48.
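The Δ-AUC bootstrap can be illustrated with a minimal sketch under simplifying assumptions: the outcome is treated as a binary label (censoring is ignored), the AUC is the empirical Mann–Whitney estimate, and the interval is a simple percentile interval. The function names and the toy scores are hypothetical and do not reproduce the R implementation cited in the text.

```python
import random


def auc(scores, labels):
    """Empirical AUC: probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_delta_auc(score_a, score_b, labels, n_boot=10_000, seed=0):
    """Return (mean delta, (2.5th, 97.5th percentile)) of AUC_a - AUC_b."""
    if len(set(labels)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that lack one outcome class
        a = [score_a[i] for i in idx]
        b = [score_b[i] for i in idx]
        deltas.append(auc(a, y) - auc(b, y))
    deltas.sort()
    lower = deltas[int(0.025 * n_boot)]
    upper = deltas[int(0.975 * n_boot) - 1]
    return sum(deltas) / n_boot, (lower, upper)


# Toy data: score_a separates the outcomes perfectly, score_b does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
score_a = [0, 1, 1, 2, 3, 3, 4, 5]
score_b = [2, 0, 3, 1, 1, 3, 0, 2]
mean_delta, (ci_low, ci_high) = bootstrap_delta_auc(score_a, score_b, labels, n_boot=2000)
print(mean_delta, ci_low, ci_high)
```

Resampling patients (rather than the two scores independently) keeps the pairing between the two scores, which is what makes the difference in AUC comparable within each iteration.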

The bootstrap resampling procedure drew 1,000 resamples of the original cohort and calculated all clinical endpoints at two time points (2 years and 3 years) for the three MBS categories (favourable-, intermediate- and adverse-risk MBS). The procedure also estimated the respective 95% confidence intervals (CI) by computing the bias-corrected and accelerated bootstrap interval.
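A minimal sketch of bootstrapping a survival estimate at fixed time points is given below. It uses a plain Kaplan–Meier estimator and, as a simplifying assumption, a percentile interval rather than the bias-corrected and accelerated interval used in the study. Function names and follow-up data are hypothetical.

```python
import random


def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of S(t); events are 1 (death) or 0 (censored)."""
    surv = 1.0
    death_times = sorted({u for u, e in zip(times, events) if e == 1 and u <= t})
    for u in death_times:
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, e in zip(times, events) if x == u and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def bootstrap_km(times, events, t, n_boot=1000, seed=0):
    """Return (mean, (2.5th, 97.5th percentile)) of S(t) over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        estimates.append(
            km_survival_at([times[i] for i in idx], [events[i] for i in idx], t)
        )
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return sum(estimates) / n_boot, (lower, upper)


# Hypothetical follow-up (months) and death indicators for one MBS category.
months = [6, 12, 18, 25, 30, 33, 40, 48, 55, 60]
died = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
for horizon in (24, 36):  # 2-year and 3-year OS
    mean_s, (lo, hi) = bootstrap_km(months, died, horizon)
    print(horizon, round(mean_s, 3), round(lo, 3), round(hi, 3))
```

In practice the procedure would be run separately within each MBS category, so that each risk group receives its own bootstrap distribution at each time point.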

The proportional hazards (PH) assumption was tested for each continuous variable of interest. The linearity assumption for all continuous variables was examined in logistic and PH models using restricted cubic spline estimates of the relationship between the continuous variable and the log relative hazard/risk. All P values were two-sided with a significance level of 0.05. All calculations were performed using Stata Statistics/Data Analysis version 12 (Stata Corporation, USA), Statistical Package for Social Sciences 19 (SPSS 19) and R 3.5.2 (The CRAN project, www.r-project.org) software.