Myelodysplastic syndromes (MDS) are clonal stem cell disorders characterized by peripheral blood cytopenia due to ineffective hematopoiesis and by their potential to evolve into acute myeloid leukemia (AML)1. The ineffective hematopoiesis is associated with elevated apoptosis of hematopoietic clones, a phenomenon that may be related to signaling mediated by TRAIL ligands/receptors2, FAS molecules3 and/or myelosuppressive TGFβ4. Excessive apoptosis occurs in the early stages of MDS and declines as MDS progresses toward the leukemic stage. About one third of MDS patients progress to AML, which is probably associated with expansion and evolution of subclones with certain mutations at the stem cell level5,6. The genetic profiles of individual patients are heterogeneous, and an individual patient’s unique genetic makeup may affect their clinical phenotype, prognosis, and response to therapy. An MDS patient’s genome frequently harbors somatic mutations that may contribute to disease progression; however, no single gene appears to be sufficient to elicit disease, and the majority of genes mutated in MDS are mutated in fewer than 5% of cases, highlighting the complexity of the disease7,8. This heterogeneity has led to the development of risk-based stratification systems, and treatment options such as hypomethylating agents (HMA) or allogeneic stem cell transplantation (SCT) are selected systematically according to risk group9,10,11.

The use of HMA, such as azacitidine (AZA) and decitabine, has become a standard treatment for higher-risk MDS patients, as HMA confer a survival benefit over conventional care12,13, but many issues, including the optimum dose and duration of treatment, remain to be addressed. Importantly, it is still not clear which patients will respond to HMA. A number of predictive genomic markers of the response to HMA treatment have been reported among the frequent genomic aberrations in MDS genomes. For example, mutations in genes involved in DNA methylation (DNMT3A, TET2, IDH1, and IDH2) and genes encoding epigenetic regulators (ASXL1, EZH2, and TET2) have been proposed to predict AZA sensitivity14,15,16, but the clinical relevance of these associations remains controversial17,18,19. A DNA methylation-based scoring system using a suite of 10 genes has been proposed, but while the score was correlated with patient survival, it could not predict clinical responses to demethylating agents20. The expression of a number of other genes, such as BCL2L1021, FAS22, and PI-PLCbeta123, may also be indicative of HMA activity. Moreover, associations between repressed expression of anti-DNMT1 miRNAs and HMA resistance have been noted24. All of these findings have contributed substantially to a better understanding of individual variability in HMA responses among MDS patients; however, mutation, methylation, or expression information on a limited number of genes is insufficient to explain disease heterogeneity with respect to the different responses to HMA treatment. In this regard, genome-wide profiling of mRNA expression may facilitate comprehensive prediction of drug sensitivity.

One important usage of HMA is to serve as a bridge to SCT by diminishing marrow blasts to an acceptable level prior to transplantation25. Several retrospective studies have shown that pre-SCT HMA could be a feasible alternative to induction chemotherapy26,27,28 even for patients with excessive blasts or AML29. As the influence of HMA response on transplant outcomes is still controversial, we previously completed a retrospective analysis of 98 patients who received HMA for higher-risk MDS with > 5% marrow blasts30. Our study showed that continued marrow response at the time of SCT was an independent positive predictor of overall and disease-free survival after transplantation. In light of our findings and those of another study in which a positive correlation between marrow clearance and genomic mutations was reported31, we aimed to identify the molecular markers that could predict a patient’s marrow response to HMA.

In this study, we performed RNAseq-based gene expression profiling of advanced MDS patients with excess blasts (EB) before they received AZA treatment. Comparison of gene expression profiles between 14 responders and 9 non-responders revealed that a number of molecular functions (e.g., apoptosis and cellular respiration) were relatively activated in responders. Functional scores of these molecular pathways were also correlated with patient survival in an independent MDS cohort, which suggests their potential prognostic value.


Sequencing data from enrolled patients

A total of 23 patients who received AZA for MDS with excess blasts 1 (MDS-EB-1; n = 9) and MDS-EB-2 (n = 14) before SCT were enrolled in the study (Table 1). The non-responder group consisted of 9 cases showing primary resistance, which included 8 cases with disease progression to AML (n = 6) or MDS-EB-2 (n = 2) and 1 case with stable disease without any hematological improvement (SD-HI) after 5 cycles of AZA treatment. The responder group consisted of cases with complete remission (CR) (n = 7) or marrow CR (mCR) with or without any hematological improvement (n = 7). The patients’ bone marrow derived mononuclear cells were subjected to transcriptome sequencing by RNAseq. The sequencing-related information is presented in Supplementary Table 1.

Table 1 Clinicopathological features of the study patients.

Differential-expression-based identification of marker genes and molecular functions

We first selected differentially expressed genes (DEGs) between responders and non-responders (n = 300; P < 0.01, unadjusted t-test on ComBat-adjusted FPKM data). Hierarchical clustering of the 300 DEGs was able to segregate responders and non-responders, with the exception of one outlier (Fig. 1a). These findings suggest that the baseline mRNA expression of MDS patients may have the potential to predict sensitivity to AZA. The list of DEGs is presented in Supplementary Table 2. To identify the molecular functions associated with the DEGs, we performed Fisher's exact test with Gene Ontology (GO) categories (MSigDB C5 category) and found significant associations in 22 GO categories (Bonferroni corrected P < 0.05). To summarize the identified GO categories while taking into account their redundancy, we measured the extent of overlap for all possible pairs of the 22 GO categories. Of note, all possible pairs of the 22 GO categories showed significant overlap between DEGs (P values of Fisher's exact test in the range of 1.2E−25 to 5.4E−55), which suggested that the gene–gene co-expression networks were highly interconnected with respect to GO functional categories. The 22 GO categories are illustrated in a network diagram (Fig. 1b). Examination of the 22 GO functional annotations indicated that the DEGs between AZA responders and non-responders largely represented 'cellular respiration' and 'mitochondria'. The most significant pairing between GO categories was for the 'electron transport chain' and 'respiratory chain' (P = 5.4E−55; Fisher's exact test), with overlapping DEGs NDUFS7, COX8A, UBA52, UQCR11, NDUFA2, NDUFB7, NDUFB8, NDUFA7, NDUFB2, COX4I1, NDUFA13, ETFB, NDUFA1, NDUFB9, NDUFA11, and UBC. Oxidative stress and mitochondrial dysfunction have been postulated to play a role in the development of MDS32. Although the association between sensitivity to AZA and mitochondrial dysfunction has not been clearly defined, a number of reports suggest that increased oxidative phosphorylation relative to glycolysis may indicate metabolic plasticity of tumor cells and is associated with resistance to chemotherapy in colorectal cancers33 and to targeted therapeutics in melanomas34.
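As a minimal sketch of the enrichment step described above, the following Python snippet computes a one-sided Fisher's exact test P value (the hypergeometric upper tail) for the overlap between a DEG list and one GO category, and then applies a Bonferroni correction across categories. The function names, the gene-universe size, and the category counts are illustrative assumptions and are not taken from this study; the text does not state whether the original analysis used a one- or two-sided test.

```python
from math import comb


def enrichment_p_value(n_universe, n_category, n_deg, n_overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of observing at least `n_overlap` category genes among
    `n_deg` DEGs drawn without replacement from `n_universe` genes,
    of which `n_category` belong to the GO category.
    """
    max_overlap = min(n_category, n_deg)
    total = comb(n_universe, n_deg)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_deg - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical counts: (category size, DEGs falling in the category).
    categories = {"hypothetical_category_A": (120, 16),
                  "hypothetical_category_B": (200, 4)}
    n_universe, n_deg = 20000, 300  # universe size is an assumption
    raw = [enrichment_p_value(n_universe, size, n_deg, hit)
           for size, hit in categories.values()]
    for name, p_adj in zip(categories, bonferroni(raw)):
        print(name, p_adj)
```

The same function can be reused for the pairwise overlap between two GO categories by treating one category's DEGs as the "drawn" set.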

Figure 1

Hierarchical clustering and functional enrichment analysis of DEGs. (a) The 300 DEGs (P < 0.01; t-test, unadjusted) are shown in a heatmap with gene- and sample-wise dendrograms by hierarchical clustering. Non-responders (red; n = 9) are largely segregated from responders (green; n = 14) except for an outlier. (b) Twenty-two GO categories with significant enrichment (Bonferroni corrected P < 0.05; Fisher's exact test) with DEGs are shown as nodes in a network. All the pairs of nodes showed significant overlap of DEGs and were connected by edges in the network. The GO annotations of 22 nodes converge on the function of ‘cellular respiration' and 'mitochondria'. The node size corresponds to the gene number of the corresponding GO categories.

Table 2 GSEA results of molecular functions enriched in AZA responders and non-responders.

Enrichment analyses

To further identify subtle but functionally coordinated mRNA expression changes that cannot be captured by DEG analyses, we performed gene set enrichment analyses (GSEA) to identify molecular functions whose gene members are differentially expressed between AZA responders and non-responders. Table 2 lists the 10 most significant GO categories relatively up-regulated in responders compared to non-responders, as well as the 10 most significant GO categories up-regulated in non-responders compared to responders. To summarize the GO categories according to the overlap of gene members, we also calculated the significance of the overlap of gene members; the 20 GO categories are shown in a network diagram (Fig. 2a). In the network, a main component composed of eight GO categories largely representing oxidative phosphorylation and the electron transport chain was found to be relatively up-regulated in responders compared to non-responders (green nodes in Fig. 2a), which is consistent with the results of the DEG analyses. In addition, GO categories of 'leukocyte apoptotic process' and 'regulation of transcription elongation' were evident, suggesting that these functions are relatively activated in responders. With respect to GO categories relatively up-regulated in non-responders, 'dendrite development/morphogenesis' (3 GO categories) and 'Wnt protein binding/brain morphogenesis' (3 GO categories) were found to be major functional categories (red nodes in Fig. 2a).

Figure 2

GSEA analysis. (a) The top 20 significant gene sets relatively up- and down-regulated (green and red, n = 10 and 10, respectively) in responders compared to non-responders are shown as nodes in a network. The node size is proportional to the number of genes in the gene set. Edges in the network represent the significant overlap of the leading edge genes between two nodes. (b) An enrichment plot of the leukocyte apoptotic process is shown as a snapshot of the GSEA analysis. A heatmap showing the expression of genes belonging to the set is shown below with the annotation of genes ordered. Yellow indicates the leading-edge gene subsets. (c) A similar plot of the gene set of Wnt protein binding.

Two examples of GO categories ('leukocyte apoptotic process' and ‘Wnt protein binding’, which are up- and down-regulated, respectively, in responders) were selected to show their enrichment plots together with their expression heat maps and leading-edge genes (Fig. 2b,c). Ineffective hematopoiesis is one of the cardinal features of MDS and is thought to be due to increased apoptosis of myeloid progenitors; the acquisition of proliferative capacity by clonal progenitors has been considered the key event in the progression of MDS to acute myeloid leukemia8. Thus, it is reasonable to assume that elevated expression of apoptosis-related genes at baseline may predict a favorable prognosis and responsiveness to AZA. In our study, we found no significant differences between responders and non-responders in the expression of FAS (Fig. 2b; P = 0.48; t-test, unadjusted) or of BCL2L10 and PLCB1 (P = 0.69 and 0.42; t-test, unadjusted), both of which have previously been proposed as expression markers of AZA sensitivity21,23. Among the genes of the leukocyte apoptotic process set, only two, BAX and IFNG, were expressed at significantly different levels in responders and non-responders (P = 0.03 and 0.04, respectively; t-test, unadjusted). Wnt/β-catenin signaling has been presumed to be involved in the ineffective hematopoiesis of patients with myeloid neoplasms with 5q deletions, and inhibition of this signaling has been presumed to have therapeutic implications in MDS35. Our results further suggest that the mRNA expression levels of Wnt/β-catenin pathway genes, including a number of frizzled receptors, may indicate responsiveness to AZA, with similar implications in MDS-derived stromal cells35. We postulate that up-regulated expression of Wnt pathway genes at baseline in non-responders may confer cellular resistance to AZA treatment. Other GO categories that showed substantial differences in expression with respect to HMA responses are shown in Supplementary Fig. 1, which presents enrichment plots and heatmaps of two metabolic functions (‘Nucleoside monophosphate metabolic process’ and ‘Oxidoreductase activity acting on NADPH’).

In addition to the DEG and GSEA analyses, we employed DeMAND, a network-based systems biology approach36, to identify the genes related to the mechanism of action (MoA) of AZA. In this process, the level of perturbation was measured for individual genes and their interacting partners in regulatory networks by comparing the gene expression profiles of responders and non-responders following drug perturbation. The 20 most significant candidate MoA genes are listed in Table 3. The modeling revealed UBC and PFDN2 as the candidate MoA genes whose interactions with their partners were the most perturbed in the context of the regulatory network.

Table 3 MoA genes of AZA responders and non-responders revealed by DeMAND analysis.

Validation of azacitidine sensitivity signatures in independent cohorts

To confirm the impact of the identified gene markers on prognosis and HMA response, we applied the gene-set-level scores (i.e., the mean expression of leading edge genes of the 20 GO terms in Table 2) to independent data sets. We obtained publicly available gene expression profiles of 123 MDS patients with overall survival data (GSE58831)37 and of 32 patients with an AZA treatment history (13 responders and 19 non-responders; GSE77750)38,39. For survival (GSE58831), MDS patients whose gene expression profiles resembled those of AZA responders in our study (‘responder-like’) showed better survival (green in Fig. 3a,b) than those whose gene expression resembled that of non-responders (‘non-responder-like’, red in Fig. 3a,b; log-rank test P = 0.017). This result suggests that the baseline expression profile of AZA responders may be associated with a favorable prognosis in MDS patients, highlighting the prognostic implication of the genes identified in our study. For AZA responsiveness (GSE77750), clustering using the identified markers segregated the 32 MDS patients into ‘responder-like’ and ‘non-responder-like’ cases with a modest prediction accuracy of 0.687 (Fig. 3c). Thus, our gene markers may have both prognostic and predictive implications for AZA treatment of MDS patients.
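The gene-set-level score used above is defined as the mean expression of the leading edge genes of a GO term. A minimal Python sketch of that calculation is given below; the function name, the data layout (nested dictionaries), and the placeholder gene and sample identifiers are assumptions for illustration only and do not reproduce the study's data or its clustering step.

```python
from statistics import mean


def gene_set_scores(expression, gene_sets):
    """Mean expression of each gene set's member genes, per sample.

    expression: {sample_id: {gene: expression_value}}
    gene_sets:  {set_name: [leading-edge genes]}
    Genes missing from a sample's profile are skipped; a set with no
    measured genes receives NaN for that sample.
    """
    scores = {}
    for sample, profile in expression.items():
        scores[sample] = {}
        for name, genes in gene_sets.items():
            values = [profile[g] for g in genes if g in profile]
            scores[sample][name] = mean(values) if values else float("nan")
    return scores


if __name__ == "__main__":
    # Placeholder identifiers and values, not data from this study.
    expression = {
        "patient_1": {"GENE_A": 5.2, "GENE_B": 6.0, "GENE_C": 1.1},
        "patient_2": {"GENE_A": 2.3, "GENE_B": 2.9, "GENE_C": 4.8},
    }
    gene_sets = {"set_up_in_responders": ["GENE_A", "GENE_B"],
                 "set_up_in_non_responders": ["GENE_C"]}
    for sample, per_set in gene_set_scores(expression, gene_sets).items():
        print(sample, per_set)
```

The resulting sample-by-gene-set matrix is the kind of input that could then be passed to a clustering routine to assign 'responder-like' and 'non-responder-like' labels.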

Figure 3

Survival analysis of independent MDS cohorts. (a) Clustering of the 123 MDS patients (GSE58831) by gene-set-level scores of marker gene expression is shown as a heatmap. Column-wise green/red bars below indicate the patients whose expression profiles resemble those of azacitidine responders/non-responders, respectively (‘responder-like’ and ‘non-responder-like’). (b) MDS patients segregated according to expression profiles showed significant survival differences (P = 0.017, log-rank test). The green and red lines indicate the patients whose scores resemble those of azacitidine responders and non-responders (green and red bars below in Fig. 3a), respectively. (c) Clustering of marker gene expression segregated the 32 MDS patients (GSE77750) into responder-like and non-responder-like groups. Responder-like patients were enriched with AZA responders (green, top) while non-responder-like patients were enriched with non-responders (here, progressed/red and stable/orange were considered as AZA non-responders).

We also evaluated the prognostic value of the MoA genes in an independent cohort (GSE58831). Although the difference was not statistically significant, the expression of MoA genes segregated the MDS patients into groups with a substantial difference in overall survival, suggesting that the MoA genes, in spite of their small number, may be useful as prognostic markers (Supplementary Fig. 2a,b).
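The survival comparisons in this section rely on the log-rank test. As a rough illustration of how such a comparison between 'responder-like' and 'non-responder-like' groups can be computed, the sketch below implements the standard two-group log-rank statistic with a chi-square P value on one degree of freedom. The function name and the toy follow-up times are hypothetical and are not derived from GSE58831; the software used for the original survival analyses is not stated in this section.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times; events_*: 1 = death observed, 0 = censored.
    Returns (chi-square statistic, P value for 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = risk_a + risk_b
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_a += deaths_a
        expected_a += deaths * risk_a / n
        if n > 1:  # hypergeometric variance of deaths in group A
            variance += (deaths * (risk_a / n) * (risk_b / n)
                         * (n - deaths) / (n - 1))

    if variance == 0:
        raise ValueError("log-rank variance is zero; test is undefined")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Toy follow-up data (months); purely illustrative.
    chi2, p = logrank_test([6, 9, 14, 20], [1, 1, 1, 0],
                           [18, 25, 31, 40], [1, 0, 1, 0])
    print(chi2, p)
```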


Although HMA response in MDS is known to be associated with better survival in both transplant and non-transplant settings12,30,40, it has not been possible to clearly identify which patients will respond to drug therapy. In this study, we performed next-generation transcriptome sequencing to identify gene-expression-based predictive markers that can discriminate responders from non-responders. For prediction, we first identified a set of DEGs that can segregate the AZA responders and non-responders. The DEGs identified in this study were also able to group the patients from an independent MDS cohort into subgroups with distinct drug responses and overall survival, indicating that the selected genes may potentially identify AZA responders and MDS patients with favorable or unfavorable prognosis. Given that our study was based on baseline gene expression data obtained prior to therapy, our findings may potentially support a clinical decision to select AZA as a bridging treatment for MDS patients with excess blasts.

The expression of previously proposed single genetic markers of favorable treatment outcomes, such as FAS, BCL2L10, and PI-PLCbeta1, was not able to segregate AZA responders and non-responders in our study. One possible explanation for this discrepancy is that our study only included MDS-EB patients and that the definition of a responder in this study was confined to bone marrow response and did not include either partial response or stable disease with hematological improvement. Characterizing MDS at the level of single-gene markers is complicated by tumor heterogeneity; in this regard, it has been proposed that multi-gene signatures or gene-set-level classifiers may be more clinically relevant than single-gene markers. Although the repressed expression of the apoptosis-related molecule FAS by aberrant DNA methylation may be indicative of AZA sensitivity, the baseline expression of FAS was not significantly different between responders and non-responders in this study. However, we note that ‘leukocyte apoptotic process’ was identified as one of the significant GO terms showing differential expression in a set of genes in our study. Within this gene set, which is one of the top 20 GSEA gene sets, only two genes, BAX and IFNG, showed significant differential expression (P < 0.05, t-test, unadjusted, Fig. 2b), and both were up-regulated in responders compared with non-responders. The pro-apoptotic protein encoded by BAX is transcriptionally activated in MDS patients with more favorable survival outcomes41 and IFNG is known to trigger apoptosis in undifferentiated progenitor cells such as hematopoietic stem cells42. Although the potential roles of these genes in the context of marrow response to AZA require further validation in an independent cohort, the molecular pathways and marker genes identified in our functional-gene-set-level differential analyses may have clinical utility. Supplementary Fig. 3 shows that the differential expression of BAX and IFNG was modest: they ranked 429th and 442nd, respectively, in the list of all genes ordered by differential expression. In addition, 'Wnt protein binding' was identified as one of the molecular functions relatively up-regulated in non-responders at baseline. The Wnt pathway represents one of the key molecular pathways in carcinogenic processes43 and we assume that this pathway may provide survival signals to the tumor clones. In the survival analyses, we found that the gene-set-level scores of the 20 GO terms can segregate the patients according to survival outcomes. Notably, gene expression profiles characteristic of AZA marrow responders identified in this study were significantly linked to better survival in an independent MDS cohort, which suggests that the genes identified in our study may have potential clinical utility as general prognostic markers. However, we acknowledge that the results must be interpreted with caution given the substantial differences between our cohort and the validation cohorts regarding the cells examined, the patients’ characteristics, and the clinical outcomes of interest. We examined mononuclear cells in this study whereas CD34+ cells were used in the two validation cohorts (GSE58831, GSE77750), although a recent study by Shiozawa et al. showed concordance of expression levels for selected genes between bone marrow CD34+ cells and mononuclear cells from MDS patients44. In addition, only advanced MDS patients with excess blasts who were treated with HMA were included in our study, whereas the validation cohorts comprised MDS patients of all subtypes with no information on treatment (GSE58831) or AZA-treated AML patients in addition to MDS patients (GSE77750). Indeed, when we restricted the validation cohort (GSE58831) to EB patients, the prognostic impact of marker gene expression was not significant (Supplementary Fig. 2c,d), probably due to the small sample size. Given these discrepancies between the studies and the small sample size, our results should be confirmed in further investigations.

The DeMAND algorithm was employed as a systems biology approach to identify network-level perturbation and to reveal key MoA genes. Through this process, UBC and PFDN2 were designated as potential MoA proteins whose interactions were substantially perturbed by AZA treatment. Of interest, both genes have roles in protein metabolism: ubiquitination (UBC) and protein chaperoning (PFDN2). It has been previously reported that ubiquitination is implicated in histone modification45 and that the ubiquitin–proteasome pathway is associated with the stability of the DNMT1 protein, findings which indicate a potential molecular link between ubiquitination and the demethylating agent AZA46. Therefore, we speculate that UBC may have a role in the regulation of gene expression associated with AZA treatment.

Taken together, the markers of AZA sensitivity identified in this study may predict drug responsiveness prior to AZA treatment and may have general prognostic implications.


Patient selection

To identify molecular markers in MDS genomes associated with marrow responses following pre-transplant bridging treatment with AZA, we screened patients who received the standard schedule of AZA (75 mg/m²/day for 7 consecutive days) for MDS with > 5% marrow blasts and who showed extremes of treatment response (marrow complete remission versus primary treatment failure). Response to treatment was assessed using the modified International Working Group response criteria47. Cases achieving CR or mCR with or without hematological improvement were categorized as responders, while non-responders consisted of those who experienced primary treatment failure, defined as either primary disease progression or stable disease without hematological improvement (SD-HI)48. A minimum of 4 cycles of AZA was administered before a response of SD-HI was assessed, whereas CR, mCR, or disease progression could be assessed before the 4th cycle. The patients had received no treatment for their MDS with excess blasts before AZA. Patients with bone marrow samples available for research purposes were enrolled in the final study population. This study was approved by the Institutional Review Board of the Seoul St. Mary’s Hematology Hospital at the Catholic University of Korea, and complied with the tenets of the Declaration of Helsinki. Informed consent was obtained from all subjects.

mRNA sequencing

Bone marrow samples (10 mL) were mixed with 0.3 mL of heparin to prevent coagulation, then diluted with 20 mL of phosphate‐buffered saline (PBS, Welgene, Daegu, Korea). The cells were then fractionated on a Lymphoprep density gradient (Axis‐Shield, Oslo, Norway) by centrifugation at 600 g for 10 min. Interface mononuclear cells were isolated and washed with PBS. An erythrocyte (RBC) lysis buffer (0.154 M NH4Cl, 10 mM KHCO3, 0.1 mM EDTA; Sigma Aldrich, St. Louis, MO) was then added to lyse contaminating RBCs. RNA was isolated using the standard Tri-Reagent (Sigma Chemicals) protocol. mRNA libraries were prepared using a TruSeq stranded kit (Illumina, REF. RS-122-2101 ~ 2) according to the manufacturer’s recommendations. Sequencing was performed on the NextSeq500 platform (Illumina, REF. SY-415-1001). Paired-end 75 bp sequencing reads were generated, and the sequencing information is presented in Supplementary Table 1. Raw sequencing reads in FASTQ files were mapped and aligned by TopHat (version 2.1.1)49. Transcript-level quantification was completed with Cufflinks50 using annotated transcripts of the hg19 GTF (UCSC, TCGA.hg19.June2011.gaf). The expression of each gene was represented in terms of fragments per kilobase of transcript per million mapped reads (FPKM) and used for the subsequent analyses. RNA sequencing data have been submitted to the Sequence Read Archive (SRA) under accession number PRJNA650236.
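For readers unfamiliar with the FPKM unit, the following Python sketch shows the basic definition: fragments mapped to a transcript, normalized by transcript length in kilobases and by sequencing depth in millions of mapped fragments. This is a simplified illustration only; Cufflinks estimates abundances with a statistical model rather than this direct formula, and the function name and example numbers are assumptions, not values from this study.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical example: 500 fragments on a 2,000 bp transcript
    # in a library of 25 million mapped fragments.
    print(fpkm(500, 2000, 25_000_000))
```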


Gene set enrichment analysis51 (GSEA, version 2.0) was used to identify differential expression of genes associated with specific molecular functions between drug responders and non-responders52. For molecular functions, we used Gene Ontology gene sets as available in MSigDB (MSigDB, C5:GO terms, version 6.0).
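To make the enrichment score and the leading-edge gene subsets referred to in this work more concrete, the sketch below implements the weighted running-sum statistic that underlies GSEA for a single gene set. It is a simplified illustration under stated assumptions: the function name, the toy ranking, and the gene identifiers are hypothetical, and the permutation-based significance estimation and multiple-testing correction performed by the GSEA software are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score and leading-edge genes for one gene set.

    ranked_genes: genes sorted by descending ranking metric
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of member genes
    weight:       exponent on |score| for hits (0 gives the classic KS form)
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, genes")

    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(scores, in_set)]
    norm = sum(hit_weights)
    if norm == 0:
        raise ValueError("all member genes have zero weight")

    running, best, best_idx = 0.0, 0.0, -1
    for i, hit in enumerate(in_set):
        # Step up at member genes, step down at non-members.
        running += hit_weights[i] / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_idx = running, i

    if best >= 0:   # members at or before the peak
        window = zip(ranked_genes[:best_idx + 1], in_set[:best_idx + 1])
    else:           # members after the trough
        window = zip(ranked_genes[best_idx:], in_set[best_idx:])
    leading_edge = [g for g, hit in window if hit]
    return best, leading_edge


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]   # placeholder identifiers
    metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]     # e.g. a t-statistic
    print(enrichment_score(genes, metric, {"g1", "g2"}))
```

A positive score indicates that set members concentrate at the top of the ranking (here, genes higher in responders if the metric is responder minus non-responder), and a negative score indicates the opposite.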