Abstract
Subtyping of acute myeloid leukaemia (AML) is predominantly based on recurrent genetic abnormalities, but recent literature indicates that transcriptomic phenotyping holds immense potential to further refine AML classification. Here we integrated five AML transcriptomic datasets with corresponding genetic information to provide an overview (n = 1224) of the transcriptomic AML landscape. Consensus clustering identified 17 robust patient clusters which improved identification of CEBPA-mutated patients with favourable outcomes, and uncovered transcriptomic subtypes for KMT2A rearrangements (2), NPM1 mutations (5), and AML with myelodysplasia-related changes (AML-MRC) (5). Transcriptomic subtypes of KMT2A, NPM1 and AML-MRC showed distinct mutational profiles, cell type differentiation arrests and immune properties, suggesting differences in underlying disease biology. Moreover, our transcriptomic clusters show differences in ex-vivo drug responses, even when corrected for differentiation arrest, and capture differences in drug response better than genetic classification. In conclusion, our findings underscore the importance of transcriptomics in AML subtyping and offer a basis for future research and personalised treatment strategies. Our transcriptomic compendium is publicly available and we supply an R package to project clusters to new transcriptomic studies.
Introduction
In acute myeloid leukaemia (AML), recurrent genetic abnormalities (RGAs) have been identified through systematic genomic studies [1,2,3,4,5]. Based on these RGAs, the World Health Organization (WHO 2022) and International Consensus Classification (ICC 2022) define several AML subtypes, as well as a heterogeneous subtype of AML with myelodysplasia-related changes (AML-MRC) [6, 7]. RGAs are essential for risk-stratification and are increasingly targeted with drugs [8, 9].
AML subclassification is genetics-based, but transcriptomics holds immense potential to refine AML classification further [1,2,3, 10,11,12]. Transcriptomic studies have led to the discovery of CEBPA-mutated AML [13, 14], and NPM1-mutated AML subtypes with different cell differentiation arrests and ex-vivo drug responses [15, 16]. Similar stratification would be beneficial for AML-MRC, given its heterogeneity [17, 18]. Still, a comprehensive examination of AML subtypes defined by gene expression has yet to be performed. Furthermore, the differentiation arrest state is known to modify drug response in AML [19], and failing to account for this effect when comparing drug responses could skew conclusions.
Therefore, we integrated five mRNAseq datasets with corresponding genetic aberration data and annotated cases according to WHO and ICC 2022 standards. We outline AML’s transcriptomic landscape and define transcriptional subtypes with distinct gene expression profiles, genetic aberrations, and cell type arrests. We relate the clusters to ex-vivo drug responses independently of differentiation arrest and show that they capture differences in response better than genetic classification. We provide all harmonised data and a transcriptional cluster predictor for future research. Our study underscores the importance of incorporating transcriptomic data in AML classification.
Methods
Transcriptomic data
We acquired transcriptomics data of primary AML patients from blood or bone marrow from BEAT [3, 20] (n = 425), TARGET [2] (n = 145), TCGA [1] (n = 150), Leucegene [11, 21,22,23] (n = 399), and our in-house LUMC [24] dataset (n = 95). Data statements and methods for transcriptome sequencing are available in the referenced studies.
We acquired quantified gene expression for BEAT, TARGET and TCGA from https://portal.gdc.cancer.gov (release 36) and implemented the same pipeline for Leucegene and LUMC to harmonise quantification. In short, FASTQ files were aligned to the GRCh38 reference genome [26] and quantified with STAR [25], using Gencode v36 [27] as the gene annotation index, which included 60,660 genes.
Gene expression count data were corrected with ComBat-seq [28] for the variables “cohort”, “sex”, and “tissue”. For the batch correction we split Leucegene into Leucegene_stranded and Leucegene_unstranded, since different sequencing libraries were used. We removed 8057 genes that were not detected in all cohorts, leaving 52,603 genes. Finally, we removed genes detected in fewer than 200 samples or with fewer than 300 counts, leaving 41,862 genes for our final dataset. We normalised the corrected count data using the geometric mean and variance stabilising transformation (VST) [29] and quantified the remaining cohort-specific variation using kBET [30].
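The final filtering step can be illustrated with a minimal Python sketch. It assumes a hypothetical dictionary that maps gene names to per-sample counts, interprets "detected" as a count above zero, and applies the count threshold to the total across samples. Neither interpretation is stated in the text, and the function name `filter_genes` is illustrative only.

```python
def filter_genes(counts, min_samples=200, min_total=300):
    """Keep genes detected in at least `min_samples` samples and with at
    least `min_total` counts summed over all samples.

    ASSUMPTIONS (not from the paper): "detected" means count > 0, and the
    count threshold applies to the total across samples.
    """
    kept = {}
    for gene, values in counts.items():
        n_detected = sum(1 for v in values if v > 0)
        total = sum(values)
        if n_detected >= min_samples and total >= min_total:
            kept[gene] = values
    return kept


# Toy data with lowered thresholds so the effect is visible.
toy_counts = {
    "GENE_A": [10, 0, 25, 40],  # detected in 3 samples, total 75
    "GENE_B": [0, 0, 0, 500],   # detected in 1 sample only
    "GENE_C": [1, 1, 1, 1],     # detected everywhere, but total is 4
}
filtered = filter_genes(toy_counts, min_samples=2, min_total=50)
print(sorted(filtered))
```

With the toy thresholds only GENE_A passes both criteria.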
Genetic and patient data
We acquired genetic data for the samples from the referenced studies in the form of mutation calls, fusion calls and cytogenetics data, together with clinical data on sex, age, blast percentage, and survival. Fusions blacklisted by Arriba [31] were removed from the fusion calling data.
We harmonised the data by standardising the annotation of gene names, fusion genes, and karyotyping. Using genetic data, we subclassified samples according to the WHO 2022 [6] and ICC 2022 [7]. Samples with all genetic data available in which we found no RGA were annotated as “No RGA found”. Samples in which we found no RGA but for which genetic data were missing were classified as “Inconclusive”.
Clustering
We employed consensus clustering [32, 33] on the batch-adjusted gene expression. First, we created a weighted nearest neighbour graph [34] using the 2000 most variable genes (MVGs). MVGs were selected via the median absolute deviation from samples with a blast percentage over 70% to minimise tumour microenvironment effects. Using the Leiden algorithm [35] – with seed and resolution varied per iteration – we generated 300 cluster assignments from the graph for each n_clusters ranging from 10 to 20, totalling 3300 assignments.
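As a minimal sketch of the MVG selection step, the following Python code ranks genes by median absolute deviation (MAD) computed only on samples with a blast percentage above 70%. The data layout (a dictionary of gene to per-sample values and a parallel list of blast percentages) and the function names are assumptions made for illustration, not the authors' implementation.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of absolute deviations from the median."""
    values = list(values)
    med = median(values)
    return median(abs(v - med) for v in values)


def most_variable_genes(expression, blast_pct, n_genes=2000, min_blast=70):
    """Rank genes by MAD over high-blast samples and return the top `n_genes`.

    expression: dict gene -> list of expression values (one per sample)
    blast_pct:  list of blast percentages (None where missing); hypothetical layout
    """
    keep = [i for i, b in enumerate(blast_pct) if b is not None and b > min_blast]
    mad = {
        gene: median_absolute_deviation(vals[i] for i in keep)
        for gene, vals in expression.items()
    }
    ranked = sorted(mad, key=mad.get, reverse=True)
    return ranked[:n_genes]


# Toy data: the fourth sample has a low blast percentage and is ignored.
toy_expression = {
    "G1": [1.0, 5.0, 9.0, 100.0],
    "G2": [4.0, 4.1, 4.2, 0.0],
    "G3": [2.0, 6.0, 2.0, 6.0],
}
toy_blast = [80, 90, 75, 40]
top_genes = most_variable_genes(toy_expression, toy_blast, n_genes=2)
print(top_genes)
```

Restricting the MAD calculation to high-blast samples means that the outlying values in the fourth toy sample do not influence the ranking.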
From these 3300 assignments we created a consensus matrix with values ranging between 0 and 1 based on pairwise co-clustering. We then converted this matrix into a distance matrix (1 - consensus matrix) and conducted Ward.D2 hierarchical clustering. The final cluster count was determined based on the individual separation of WHO classes and clusters displaying differential traits.
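The consensus and distance matrices described above can be sketched in a few lines of Python. The example assumes each clustering run is given as a list of cluster labels per sample; the function names are illustrative, and the subsequent Ward.D2 hierarchical clustering on the distance matrix is not shown.

```python
def consensus_matrix(assignments):
    """Fraction of runs in which each pair of samples shares a cluster.

    assignments: list of runs, each a list of cluster labels (one per sample).
    Labels only need to be comparable within a run.
    """
    n_runs = len(assignments)
    n = len(assignments[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in assignments:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    return [[value / n_runs for value in row] for row in matrix]


def to_distance(consensus):
    """Convert co-clustering frequencies into distances (1 - consensus)."""
    return [[1.0 - value for value in row] for row in consensus]


# Four toy clustering runs over four samples.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
]
cons = consensus_matrix(runs)
dist = to_distance(cons)
print(cons[0][1], cons[0][3], dist[0][1])
```

Because only label equality within a run is compared, runs with different label names or different numbers of clusters can be pooled into one matrix.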
Cluster stability
To evaluate per sample clustering stability, we devised a stability score. We constructed a consensus matrix for each n_clusters (300 assignments) and subtracted each co-clustering value from 1 if it was below 0.5. Then, we took the mean of all values per sample as the stability score, which ranged from 0.5 to 1, with higher scores indicating less clustering ambiguity. To investigate correlation between cluster stability and blast percentage we performed a Spearman correlation test. Additionally, we generated tSNEs using 100 to 2500 MVGs to visually assess cluster stability.
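A minimal sketch of the stability score for a single consensus matrix is given below. Co-clustering values below 0.5 are replaced by one minus the value, so that pairs that are consistently separated count as unambiguous just like pairs that are consistently together. Excluding the diagonal (each sample with itself) is an assumption, as is the function name; the text does not specify either.

```python
def stability_scores(consensus):
    """Per-sample stability: mean of folded co-clustering values.

    Values below 0.5 are folded to 1 - value, so the score lies in [0.5, 1].
    ASSUMPTION: the diagonal (self co-clustering) is excluded.
    """
    n = len(consensus)
    scores = []
    for i in range(n):
        folded = [
            (1.0 - c) if c < 0.5 else c
            for j, c in enumerate(consensus[i])
            if j != i
        ]
        scores.append(sum(folded) / len(folded))
    return scores


# Toy consensus matrix: sample 0 is unambiguous, samples 1 and 2 share an
# ambiguous pair (co-clustered in half of the runs).
toy_consensus = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
scores = stability_scores(toy_consensus)
print(scores)
```

In the setting described above, one such matrix would be built per n_clusters value from its 300 assignments.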
Cluster prediction
To predict cluster assignments we trained a one-vs-rest support vector machine (SVM) per cluster. As input we used the uncorrected gene expression of the 2000 MVGs used for clustering. To select hyperparameters and evaluate performance we used 5 × 5 nested cross-validation.
To improve predictions we included a reject option based on a minimum distance to the decision boundary. We determined this distance by looping over candidate minimum values for the predictions of the inner folds. We selected the minimum value that gave the highest Kappa for the accepted inner-fold samples and an accuracy < 0.5 for the rejected inner-fold samples.
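The threshold search for the reject option could look roughly like the following Python sketch. It assumes that each inner-fold prediction comes with a single distance to the decision boundary, that Kappa refers to Cohen's kappa, and that candidate thresholds which reject no samples are skipped. All function and variable names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case with a single class; treated as 0 here
    return (observed - expected) / (1.0 - expected)


def select_reject_threshold(y_true, y_pred, distances, candidates,
                            max_rejected_accuracy=0.5):
    """Pick the threshold with the highest kappa on accepted samples, subject
    to accuracy on rejected samples staying below `max_rejected_accuracy`."""
    best_threshold, best_kappa = None, float("-inf")
    for threshold in candidates:
        accepted = [i for i, d in enumerate(distances) if d >= threshold]
        rejected = [i for i, d in enumerate(distances) if d < threshold]
        if not accepted or not rejected:
            continue  # ASSUMPTION: skip thresholds that reject nothing or everything
        rejected_acc = accuracy([y_true[i] for i in rejected],
                                [y_pred[i] for i in rejected])
        if rejected_acc >= max_rejected_accuracy:
            continue
        kappa = cohen_kappa([y_true[i] for i in accepted],
                            [y_pred[i] for i in accepted])
        if kappa > best_kappa:
            best_threshold, best_kappa = threshold, kappa
    return best_threshold, best_kappa


# Toy inner-fold predictions with their distances to the decision boundary.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
distances = [1.2, 0.1, 0.9, 0.8, 1.5, 0.2]
threshold, kappa = select_reject_threshold(
    y_true, y_pred, distances, candidates=[0.15, 0.5, 1.0]
)
print(threshold, kappa)
```

In the toy data the two misclassified samples lie closest to the boundary, so the middle candidate rejects exactly those and leaves a perfectly classified accepted set.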
The final model was trained on the whole dataset, using 5-fold cross-validation to select hyperparameters and the minimum decision boundary distance.
Differential gene expression analysis
Differential gene expression analysis between the clusters was performed with DESeq2 [29] on the corrected gene counts. We performed one-versus-rest analyses to identify differentially expressed genes in one cluster compared to all others. We annotated genes as transcription factors or coding for cell surface proteins using public databases [36, 37].
Aberration enrichment analysis
To test whether aberrations occurred more often in one cluster than in the others, we first removed aberrations found in only one cohort or occurring in fewer than 1% of the samples. We also included high MECOM expression in the analysis (VST expression >6, based on the tail of a MECOM expression density plot). We tested for enrichment per aberration by performing a one-sided Fisher’s exact test for one cluster versus all others and adjusted p-values using the Benjamini–Hochberg (BH) procedure. We considered aberrations with a false discovery rate (FDR) < 0.05 enriched.
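The enrichment test can be sketched with the Python standard library alone. The code below computes a one-sided Fisher's exact p-value from a 2 × 2 table (cluster versus rest, aberration present versus absent) through the hypergeometric distribution, then applies the BH adjustment. The function names and the toy tables are illustrative assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment.

    Table layout: a = in cluster, aberration present; b = in cluster, absent;
    c = other clusters, present; d = other clusters, absent.
    Returns P(X >= a) under the hypergeometric null.
    """
    cluster_size = a + b
    n_present = a + c
    n_total = a + b + c + d
    upper = min(cluster_size, n_present)
    numerator = sum(
        comb(cluster_size, k) * comb(n_total - cluster_size, n_present - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_present)


def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy 2x2 tables for three hypothetical aberrations in one cluster.
tables = [(3, 1, 1, 3), (8, 2, 5, 85), (1, 9, 10, 80)]
p_values = [fisher_exact_greater(*t) for t in tables]
fdr = benjamini_hochberg(p_values)
enriched = [q < 0.05 for q in fdr]
print(p_values, fdr, enriched)
```

The one-sided alternative only flags aberrations that are over-represented in the cluster, which matches the enrichment question posed above.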
Survival analysis
We performed survival analysis using right-censored overall survival data by generating Kaplan–Meier (KM) curves on BEAT and TCGA survival data, comparing different groups of patients with the log-rank test. We also performed Cox regression using BEAT, TARGET and TCGA survival data for different patient groups and included cohort, sex and age as covariates to analyse hazard ratios.
Expression based score
We created cell type scores to assess the differentiation arrest of AML samples, using the mean expression of 30 marker genes for six haematological cells [38]. Additionally, we created immune phenotype scores for cytolytic infiltration and HLA I and HLA II antigen presenting cells using the mean expression of marker genes [39].
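A score of this kind reduces to the mean expression of a marker set within a sample, as in the minimal Python sketch below. The marker names are placeholders rather than the published marker genes, and skipping markers that are absent from the expression data is an assumption.

```python
def signature_score(expression, marker_genes):
    """Mean expression of the marker genes in one sample.

    expression:   dict gene -> normalised expression for a single sample
    marker_genes: list of marker gene names (placeholders in this example)
    ASSUMPTION: markers missing from `expression` are skipped.
    """
    values = [expression[g] for g in marker_genes if g in expression]
    if not values:
        raise ValueError("none of the marker genes are present")
    return sum(values) / len(values)


# Toy sample with placeholder marker names for two cell types.
sample_expression = {
    "HSC_MARKER_1": 9.0,
    "HSC_MARKER_2": 7.0,
    "MONO_MARKER_1": 3.0,
    "MONO_MARKER_2": 5.0,
}
signatures = {
    "HSC-like": ["HSC_MARKER_1", "HSC_MARKER_2"],
    "Monocyte-like": ["MONO_MARKER_1", "MONO_MARKER_2"],
}
cell_type_scores = {
    name: signature_score(sample_expression, genes)
    for name, genes in signatures.items()
}
print(cell_type_scores)
```

The same function would apply to the immune phenotype scores, with the marker lists swapped.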
Drug response analysis
To analyse drug response differences, we used ex-vivo drug response data of 331 BEAT [3] samples, quantified as area under the curve (AUC). We excluded drugs with fewer than 200 samples or missing data for any cluster, leaving 103 of the 123 drugs. We used a Kruskal–Wallis test for each drug with the AUC as response and clusters as groups to compare drug response across clusters. Drugs with a significant difference (FDR < 0.05) were analysed with one-sided Wilcoxon tests to identify clusters with low AUCs. Additionally, we performed a Kruskal–Wallis test for each drug with the ICC 2022 diagnosis as groups, to compare with clusters as groups.
Multivariate linear models (LM) were evaluated per drug to test if clusters were sensitive to a drug when adjusted for cell type, with AUC as response and cluster membership (one-versus-rest) and the six cell type scores as explanatory variables. Similarly, we fitted LMs with cluster membership and ICC 2022 diagnoses as explanatory variables. We considered clusters sensitive to a drug if the cluster membership’s FDR was below 0.05 and the LM coefficient was negative. All p-values were corrected using BH.
Results
Multi-cohort AML gene expression compendium
We collected 1224 RNAseq samples from adult (BEAT, TCGA, Leucegene, LUMC) and paediatric (TARGET) cohorts with corresponding genetic and clinical data (Fig. 1A). We quantified gene expression with the same pipeline and corrected counts for cohort, sex and source tissue (Supplementary Fig. 1). Classification of samples by their genetic data according to the WHO (Fig. 1C) and ICC was successful for 97% of the samples. In line with previous reports, frequencies of the AML subtypes were similar for the adult cohorts but different between paediatric and adult cohorts (Supplementary Table 1), confirming that our dataset is representative of the AML landscape [5, 40].
Transcriptomics define 17 AML clusters
Next, we assigned AML cases to 17 transcriptional clusters using consensus clustering (Supplementary Figs. 2, 3). We named the clusters based on genetic diagnoses (Fig. 1B, Supplementary Figs. 4, 5). As expected, the distribution over the clusters was different for paediatric and adult cohorts, exemplified by the large percentage of paediatric samples in the KMT2AT clusters (26%), and adult samples in the NPM1T (93%) and AML-MRCT (94%) clusters (Supplementary Table 1, Supplementary Fig. 6). However, samples of identical AML genetic subtypes from adult and paediatric cohorts clustered together, indicating that the 17 clusters capture differences in AML biology.
We examined clustering robustness using the stability score (Supplementary Fig. 7). Median clustering stability was high (0.97–1.00), with AML-MRC(3)T showing the lowest stability. A Spearman correlation test revealed a significant but weak correlation (rho = 0.18, p-value < 0.001) between blast percentage and clustering robustness, although blast percentage varied greatly within clusters. tSNEs generated using different numbers of MVGs (Supplementary Fig. 8) were stable from 500 to 2500 features. These results show that clustering was robust and only weakly influenced by blast percentage.
We developed a transcriptional cluster predictor using uncorrected counts as input (accuracy = 0.90), demonstrating the persistence of expression patterns. The quality of the predictor was further improved (accuracy = 0.95) by including a reject option (10% rejected) (Supplementary Fig. 9).
Next, we tested for enrichment of mutations, fusions and cytogenetic aberrations (n = 102) (Fig. 1E, p-values and frequencies in Supplementary Table 2). Four transcriptomic clusters each corresponded to a single genetic AML subtype: RUNX1::RUNX1T1T (RUNX1::RUNX1T1: 94%, FDR < 0.001), CBFB::MYH11T (CBFB::MYH11: 95%, FDR < 0.001), PML::RARAT (PML::RARA: 100%, FDR < 0.001), and NUP98T (NUP98::NSD1: 45%, FDR < 0.001). Risk-stratification for survival based on transcriptional subtypes performed similarly to genetics (Supplementary Figs. 10 and 11). We identified no enrichment for BCR::ABL1 and DEK::NUP214, possibly due to their limited occurrence. For KMT2A rearrangements, CEBPA mutations, NPM1 mutations, and AML-MRC, we found evidence that transcriptomics can refine subtyping, as described below.
Transcriptome analysis identifies two KMT2A-related clusters
The WHO classification defines a single KMT2A-rearranged subtype (KMT2A-r), while the ICC recognises KMT2A::MLLT3 and other KMT2A fusions as distinct [6, 7]. We identified two KMT2A fusion clusters. KMT2A(1)T was significantly enriched for KMT2A::MLLT3 (31%, FDR < 0.001), KMT2A::MLLT10 (19%, FDR < 0.001) and any KMT2A fusion (67%, FDR < 0.001), while KMT2A(2)T was enriched for KMT2A::MLLT4 (67%, FDR < 0.001) and high MECOM expression (80%, FDR < 0.001) (Fig. 2A, B). Interestingly, we found cases with NPM1 mutations and trisomy 8/8q localised in KMT2A(1)T, suggesting that these aberrations can lead to KMT2A fusion-like gene expression.
The genes LAMP5 and ADCY9 showed high expression in KMT2A(1)T and low expression in KMT2A(2)T (Fig. 2B), and both have been shown to contribute to AML pathogenesis [41, 42]. Additionally, the transcription factor (TF) ETV2 was highly expressed in KMT2A(1)T, while the TF ERG displayed high expression in KMT2A(2)T. The cell type scores revealed KMT2A(1)T to have a significantly higher promonocyte-like score (FDR < 0.001), while KMT2A(2)T was more haematopoietic stem cell (HSC)-like (FDR < 0.001) (Fig. 2B, Supplementary Fig. 12). FAB annotations showed similar results for KMT2A(1)T, which had a high M5 (monocytic leukaemia) fraction (90%), while KMT2A(2)T was more mixed (Fig. 2C). Overall, we found that gene expression-based separation of KMT2A-r did not align with the ICC 2022 classification.
The CEBPAT cluster indicates a favourable prognosis
As acknowledged in the ELN2022, patients with a CEBPA bZIP in-frame mutation have a favourable prognosis [8, 43]. We identified a transcriptional CEBPAT cluster significantly enriched for mutated CEBPA cases (72%, FDR < 0.001), with 42% of the samples having a CEBPA bZIP indel, either as single mutation or combined with an N-terminal frameshift mutation (Fig. 3A, B). The remaining samples contained other mutations in the bZIP area or N-terminal region or had no detectable CEBPA mutation. Of note, a single CEBPA bZIP indel case resided outside the CEBPAT cluster. This patient had an IDH1-R132 mutation with a VAF = 0.47, while the CEBPA bZIP in-frame mutation had a VAF = 0.21. This finding suggests that the IDH1-R132 mutation dominates the expression pattern, placing this case in cluster AML-MRC(1)T. Conversely, all CEBPAT cluster patients showed similar favourable outcomes (log-rank test: p-value = 0.80), irrespective of whether the CEBPA bZIP in-frame mutation was detected (Fig. 3C). The CEBPAT cluster thus marks patients with a favourable outcome regardless of CEBPA mutation detection, and these patients can be identified by the CEBPAT expression profile.
Gene expression profiling identifies five transcriptional NPM1-related clusters
The 2022 WHO and ICC classifications include one subtype of NPM1-mutated AML [6, 7]. However, we identified five clusters enriched for mutated NPM1 (Fig. 4). We observed elevated expression of HOXA3, HOXB5, and MEIS1 (Fig. 4A), which have previously been associated with NPM1 mutations [44]. Interestingly, samples lacking an NPM1 mutation generally also exhibited high expression of these genes, suggesting that alternative aberrations disrupt these genes and lead to NPM1 mutated-like AML.
NPM1(1)T exhibited the highest percentage (95%, FDR < 0.001) of NPM1 mutated samples and was significantly enriched for IDH1-R132 (25%, FDR < 0.001), IDH2-R140 (37%, FDR < 0.001), and TET2 (33%, FDR < 0.001) co-mutations (Fig. 4A). NPM1(2)T samples were enriched for FLT3-ITD mutations (84%, FDR < 0.001), but FLT3-ITD was also enriched in NPM1(1)T, NPM1(3)T and NPM1(4)T (42–43%, all FDR < 0.001). Additionally, NPM1(4)T and NPM1(5)T had a significantly lower variant allele frequency for mutated NPM1 (Fig. 4B). We found two NPM1::MLF1 cases in our compendium, which both clustered in NPM1(3)T. NPM1::MLF1 has been shown to localise in the cytoplasm [45], like mutated NPM1, possibly leading to a similar expression profile as NPM1-mutated cases.
Each of the NPM1-related clusters exhibited unique marker genes (Fig. 4A). For instance, FTO expression was high in NPM1(1)T. Additionally, LYRM1, ADAM8, and DNAJC13 were elevated in NPM1(2)T, NPM1(4)T, and NPM1(5)T, respectively. NPM1(3)T had a less distinct expression pattern, suggesting a more heterogeneous cluster. Also, we observed differential expression of TFs (RUNX1, PRDM16, SPI1) [46, 47] – even in samples lacking the NPM1 mutation – and TF expression aligned with cell differentiation stages.
NPM1(1)T and NPM1(2)T displayed a HSC-like expression pattern, NPM1(3)T was mixed, whereas NPM1(4)T and NPM1(5)T were more differentiated (Fig. 4A, Supplementary Fig. 12). FAB annotations showed additional differences, with NPM1(5)T containing fewer M4 (myelomonocytic leukaemia) but more M5 (monocytic leukaemia) cases than NPM1(4)T (Fig. 4C,D). Using scores for HLA I and HLA II antigen-presenting cells [39] we found NPM1(1)T to have significantly lower HLA I (FDR < 0.001) and HLA II (FDR < 0.001) scores than the other clusters (Fig. 4E,F). NPM1(1)T and NPM1(5)T patients were significantly older (FDR < 0.05), while NPM1(3)T patients were younger (FDR < 0.01) (Supplementary Fig. 6F). Our findings emphasise the existence of distinct NPM1-related subsets, highlighting the limitations of relying solely on genetic classification.
Gene expression profiling identifies five transcriptional AML-MRC-related clusters
The ICC 2022 divides AML-MRC into three groups based on TP53 mutations, myelodysplasia-related gene mutations, and cytogenetic abnormalities [7]. Our study identified five gene expression-based AML-MRC related clusters (Fig. 5), with varying fractions of TP53 mutations, MRC gene mutations, and cytogenetic abnormalities.
Despite sharing these aberrations, each cluster had unique characteristics (Fig. 5A). AML-MRC(1)T was characterised by IDH1-R132 (49%, FDR < 0.001) and IDH2-R172 (27%, FDR < 0.001) mutations. A DNMT3A and IDH1/2 mutated subtype has been reported [5], but 41% of the AML-MRC(1)T cases lacked DNMT3A mutations. AML-MRC(2)T, AML-MRC(3)T, and AML-MRC(4)T were all enriched (FDR < 0.001) for TP53 mutations, cytogenetic abnormalities and high MECOM expression, and AML-MRC(3)T also contained a large fraction of mutated MRC genes (65%, FDR < 0.001). AML-MRC(5)T stood out with the highest fraction of cases with mutated MRC genes (81%, FDR < 0.001) and the lowest fraction of TP53 mutations (6%) and cytogenetic abnormalities (34%).
We found marker genes for all clusters (Fig. 5A). For instance, high SRSF12 marked AML-MRC(1)T, and LINC00865 marked AML-MRC(5)T. AML-MRC(2)T presented high glycophorin genes and UROD expression, suggesting an association with acute erythroid leukaemia [48,49,50]. Distinct cell differentiation scores further highlighted differences (Fig. 5A–C, Supplementary Fig. 12). For example, AML-MRC(1)T showed high progenitor-like scores, with 65% M1 (minimal maturation) cases, and AML-MRC(3)T showed a more differentiated pattern, with 54% M2 (significant maturation) cases. AML-MRC(2)T was the only cluster with M6 (erythroid leukaemia) – in line with high expression of erythrocyte cell markers – and M7 (megakaryocytic leukaemia) cases. Additionally, high cytolytic cell infiltration has been reported for AML-MRC cases [39]. Using the same score (Fig. 5D) we found that the cytolytic infiltration was significantly (FDR < 0.05) lower for AML-MRC(1)T and higher for AML-MRC(2)T compared to the other AML-MRC clusters. Our results demonstrate that different AML-MRC transcriptomic clusters can be identified, showing genetic enrichments that do not necessarily align with the ICC 2022 classification.
AML clusters exhibit cell type-independent differences in ex-vivo drug responses
Finally, we assessed the drug sensitivity of the transcriptional AML subtypes. Using ex-vivo drug response data, we discovered 101 drug-cluster combinations with significantly higher sensitivity (lower AUC; FDR < 0.05), of which 21 combinations remained statistically significant (FDR < 0.05) when adjusting for cell differentiation status (Fig. 6, Supplementary Fig. 13, Supplementary Table 3).
The ex-vivo drug responses between NPM1-related clusters were often divergent, exemplified by venetoclax and selumetinib (Fig. 6A, B). NPM1(1)T, NPM1(2)T and NPM1(3)T mostly responded positively to tyrosine kinase inhibitors and CDK inhibitors. NPM1(4)T and NPM1(5)T samples were more sensitive to PI3K and MAPK kinase inhibitors. We also found drugs to which only one cluster was responsive, exemplified by axitinib for NPM1(4)T, where this effect remained significant (FDR < 0.05) when controlled for cell type scores (Fig. 6C).
Several drugs demonstrated favourable ex-vivo responses in KMT2A(1)T compared to the other clusters, exemplified by idelalisib. For KMT2A(2)T we found no drugs with a significant response, but testing was limited due to small cluster size. The AML-MRC clusters were strongly resistant to most drugs. Still, specific drugs were more effective for AML-MRC(1)T, AML-MRC(4)T, and AML-MRC(5)T (Fig. 6A, B), suggesting potential for targeted treatments in this diverse, high-risk patient group.
Next, we examined if transcriptional clusters provide insights beyond genetic classifications. Comparing the AUCs of each drug between groups, we found 71 drugs with significantly different (FDR < 0.05) median AUCs between the clusters, while only 21 drugs were significantly different between ICC 2022 classes (Fig. 6D). Additionally, 57 of the 101 cluster-drug combinations remained significant (FDR < 0.05) when cluster membership and ICC 2022 diagnosis were included in a LM, suggesting that the transcriptional clusters offer information beyond genetic classification.
Overall, our findings offer novel opportunities for targeted therapy in AML. We observed effective drug responses even after adjusting for differentiation status, possibly allowing gene expression-based subtypes to guide treatment strategies.
Discussion
This study presents an overview of transcriptomics in AML and provides a framework for transcriptional subtyping. We integrate multiple cohorts to identify 17 robust transcriptional subtypes that subclassify ~75% of our datasets’ patients. We make the harmonised data and a cluster predictor publicly available, facilitating future research.
For the CEBPAT cluster, we show that patients without a detected CEBPA bZIP indel mutation still have similar favourable survival. Patients without the canonical CEBPA mutation in the CEBPAT cluster could be explained by undetected mutations, given the complexities of CEBPA sequencing. Also, CEBPA hypermethylation has been described to lead to a similar expression profile [51]. The use of the CEBPAT gene signature for risk stratification could be a relevant alternative to detect these favourable-outcome patients.
KMT2A(1)T mainly featured KMT2A with the fusion partners MLLT3, MLLT10 and MLLT1 – all TFs in the super elongation complex whose perturbation leads to disrupted hematopoietic lineage commitment [52]. In contrast, KMT2A(2)T featured KMT2A::MLLT4, which is thought to cause leukaemia by promoting self-association [53]. Interestingly, MLLT3, MLLT10 and MLLT1 all fuse a specific region of KMT2A, but MLLT4 shows less specificity [54]. Collectively, these results suggest that two types of oncogenic mechanisms involving KMT2A fusions exist that may be marked with unique gene expression patterns.
We identified five NPM1-related clusters, further underpinning findings of transcriptional heterogeneity among NPM1-mutated patients [15, 16, 55], but also providing additional insight into co-mutations and detailed subtypes. We observed several samples from NPM1-related clusters that lacked the NPM1 mutation. Several rare NPM1 fusions, like NPM1::MLF1, have been described to lead to cytoplasmic localisation of NPM1, comparable to the canonical NPM1 frameshift [45, 56]. These non-canonical aberrations could lead to an NPM1-mutated-like presentation, and similarities in survival and drug response should be explored. Additionally, NPM1(1)T was enriched for mutually exclusive IDH1/2 and TET2 co-mutations. IDH1/2 mutations lead to an aberrant alpha-ketoglutarate metabolism and are functionally complementary to TET2 loss-of-function mutations [57]. While NPM1(1)T and NPM1(5)T show significant enrichments for TET2, only NPM1(1)T shows this mutual exclusivity with IDH1/2. This suggests that only NPM1(1)T is driven by aberrant alpha-ketoglutarate metabolism, which should be further studied using metabolomics.
Similarly, cytogenetic abnormalities, AML-MRC mutations and high MECOM expression were found in all AML-MRC clusters, but led to different gene expression. A possible explanation could be clonal architecture and the differentiation state of the cell acquiring the leukaemic aberration, both known to influence the biology of the resultant leukaemia [58, 59]. To our knowledge, we are the first to show different gene expression-based subgroups in AML-MRC, with divergent drug responses. Accurate identification of these clusters requires gene expression analysis, showing the relevance of our work.
We found no additional survival differences between other clusters. However, data availability limited the survival analysis, and different treatment protocols across studies could have led to confounding. Survival differences between transcriptional subtypes should thus be further explored in one large cohort. We did, however, find marked differences in drug responses between the clusters. Ideally, new studies should test in-patient efficacy of drugs with good ex-vivo responses in transcriptional subtypes. Furthermore, transcriptional AML subtyping could aid specialists in the complex field of clinical care and lead to multidisciplinary tailored treatment advice [60].
In conclusion, the transcriptional subtypes reveal heterogeneity in AML not captured by genetic classification. Integration of transcriptomics into AML research and diagnostics could improve disease understanding and lead to more treatment options.
Data availability
The datasets generated and/or analysed during the current study are available from https://osf.io/wq7gx (https://doi.org/10.17605/OSF.IO/WQ7GX).
Code availability
All code used to generate results is available on reasonable request. The predictor is available from https://github.com/jeppeseverens/AMLmapR as an R package.
References
The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013;368:2059–74.
Farrar JE, Schuback HL, Ries RE, Wai D, Hampton OA, Trevino LR, et al. Genomic profiling of pediatric acute myeloid leukemia reveals a changing mutational landscape from disease diagnosis to relapse. Cancer Res. 2016;76:2197–205.
Tyner JW, Tognon CE, Bottomly D, Wilmot B, Kurtz SE, Savage SL, et al. Functional genomic landscape of acute myeloid leukaemia. Nature. 2018;562:526–31.
Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, Roberts ND, et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med. 2016;374:2209–21.
Tazi Y, Arango-Ossa JE, Zhou Y, Bernard E, Thomas I, Gilkes A, et al. Unified classification and risk-stratification in Acute Myeloid Leukemia. Nat Commun. 2022;13:4622.
Khoury JD, Solary E, Abla O, Akkari Y, Alaggio R, Apperley JF, et al. The 5th edition of the World Health Organization classification of haematolymphoid tumours: myeloid and histiocytic/dendritic neoplasms. Leukemia. 2022;36:1703–19.
Arber DA, Orazi A, Hasserjian RP, Borowitz MJ, Calvo KR, Kvasnicka H-M, et al. International Consensus Classification of Myeloid Neoplasms and Acute Leukemias: integrating morphologic, clinical, and genomic data. Blood. 2022;140:1200–28.
Döhner H, Wei AH, Appelbaum FR, Craddock C, DiNardo CD, Dombret H, et al. Diagnosis and management of AML in adults: 2022 recommendations from an international expert panel on behalf of the ELN. Blood. 2022;140:1345–77.
Burd A, Levine RL, Ruppert AS, Mims AS, Borate U, Stein EM, et al. Precision medicine treatment in acute myeloid leukemia using prospective genomic profiling: feasibility and preliminary efficacy of the Beat AML Master Trial. Nat Med. 2020;26:1852–8.
Valk PJM, Verhaak RGW, Beijen MA, Erpelinck CAJ, van Doorn-Khosrovani SB, van W, et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med. 2004;350:1617–28.
Lavallée V-P, Baccelli I, Krosl J, Wilhelm B, Barabé F, Gendron P, et al. The transcriptomic landscape and directed chemical interrogation of MLL-rearranged acute myeloid leukemias. Nat Genet. 2015;47:1030–7.
Mou T, Pawitan Y, Stahl M, Vesterlund M, Deng W, Jafari R, et al. The transcriptome-wide landscape of molecular subtype-specific mRNA expression profiles in acute myeloid leukemia. Am J Hematol. 2021;96:580–8.
Wouters BJ, Löwenberg B, Erpelinck-Verschueren CAJ, Van Putten WLJ, Valk PJM, Delwel R. Double CEBPA mutations, but not single CEBPA mutations, define a subgroup of acute myeloid leukemia with a distinctive gene expression profile that is uniquely associated with a favorable outcome. Blood. 2009;113:3088–91.
Taskesen E, Bullinger L, Corbacioglu A, Sanders MA, Erpelinck CAJ, Wouters BJ, et al. Prognostic impact, concurrent genetic mutations, and gene expression features of AML with CEBPA mutations in a cohort of 1182 cytogenetically normal AML patients: further evidence for CEBPA double mutant AML as a distinctive disease entity. Blood. 2011;117:2469–75.
Mer AS, Heath EM, Madani Tonekaboni SA, Dogan-Artun N, Nair SK, Murison A, et al. Biological and therapeutic implications of a unique subtype of NPM1 mutated AML. Nat Commun. 2021;12:1054.
Cheng W-Y, Li J-F, Zhu Y-M, Lin X-J, Wen L-J, Zhang F, et al. Transcriptome-based molecular subtypes and differentiation hierarchies improve the classification framework of acute myeloid leukemia. Proc Natl Acad Sci. 2022;119:e2211429119.
de Leeuw DC, Ossenkoppele GJ, Janssen JJWM. Older patients with acute myeloid leukemia deserve individualized treatment. Curr Oncol Rep. 2022;24:1387–400.
Improved relative survival in older patients with acute myeloid leukemia over a 30-year period in the Netherlands: a long haul is needed to change nothing into something. Leukemia, https://www.nature.com/articles/s41375-021-01503-y. Accessed 25 October 2022.
Zeng AGX, Bansal S, Jin L, Mitchell A, Chen WC, Abbas HA, et al. A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia. Nat Med. 2022;28:1212–23.
Bottomly D, Long N, Schultz AR, Kurtz SE, Tognon CE, Johnson K, et al. Integrative analysis of drug response and clinical outcome in acute myeloid leukemia. Cancer Cell. 2022;40:850–64.e9.
Macrae T, Sargeant T, Lemieux S, Hébert J, Deneault E, Sauvageau G. RNA-Seq reveals spliceosome and proteasome genes as most consistent transcripts in human cancer cells. PLoS One. 2013;8:e72884.
Lavallée V-P, Lemieux S, Boucher G, Gendron P, Boivin I, Armstrong RN, et al. RNA-sequencing analysis of core binding factor AML identifies recurrent ZBTB7A mutations and defines RUNX1-CBFA2T3 fusion signature. Blood. 2016;127:2498–501.
Pabst C, Bergeron A, Lavallée V-P, Yeh J, Gendron P, Norddahl GL, et al. GPR56 identifies primary human acute myeloid leukemia cells with high repopulating potential in vivo. Blood. 2016;127:2018–27.
Arindrarto W, Borràs DM, de Groen RAL, van den Berg RR, Locher IJ, van Diessen SAME, et al. Comprehensive diagnostics of acute myeloid leukemia by whole transcriptome RNA sequencing. Leukemia. 2020;35:47–61.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
Genome Reference Consortium, https://www.ncbi.nlm.nih.gov/grc. Accessed 15 August 2023.
Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, et al. GENCODE 2021. Nucleic Acids Res. 2021;49:D916–23.
Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom Bioinform. 2020;2:lqaa078.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
Büttner M, Miao Z, Wolf FA, Teichmann SA, Theis FJ. A test metric for assessing single-cell RNA-seq batch correction. Nat Methods. 2019;16:43–9.
Uhrig S, Ellermann J, Walther T, Burkhardt P, Fröhlich M, Hutter B, et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 2021;31:448–60.
Monti S, Tamayo P, Mesirov J, Golub T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn. 2003;52:91–118.
Jeub LGS, Sporns O, Fortunato S. Multiresolution consensus clustering in networks. Sci Rep. 2018;8:3259.
McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. Epub ahead of print 17 September 2020. https://doi.org/10.48550/arXiv.1802.03426.
Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9:5233.
Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, et al. The human transcription factors. Cell. 2018;172:650–65.
Bausch-Fluck D, Hofmann A, Bock T, Frei AP, Cerciello F, Jacobs A, et al. A mass spectrometric-derived cell surface protein atlas. PLoS One. 2015;10:e0121314.
van Galen P, Hovestadt V, Wadsworth MH II, Hughes TK, Griffin GK, Battaglia S, et al. Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity. Cell. 2019;176:1265–81.e24.
Dufva O, Pölönen P, Brück O, Keränen MAI, Klievink J, Mehtonen J, et al. Immunogenomic landscape of hematological malignancies. Cancer Cell. 2020;38:380–99.e13.
Patel JP, Gönen M, Figueroa ME, Fernandez H, Sun Z, Racevskis J, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012;366:1079–89.
Gracia-Maldonado G, Clark J, Mulloy JC, Kumar AR. LAMP5 - a novel target of MLL-fusion proteins is required for the propagation of leukemia. Blood. 2016;128:1512.
Milan T, Celton M, Lagacé K, Roques É, Safa-Tahar-Henni S, Bresson E, et al. Epigenetic changes in human model KMT2A leukemias highlight early events during leukemogenesis. Haematologica. 2022;107:86–99.
Wakita S, Sakaguchi M, Oh I, Kako S, Toya T, Najima Y, et al. Prognostic impact of CEBPA bZIP domain mutation in acute myeloid leukemia. Blood Adv. 2022;6:238–47.
Brunetti L, Gundry MC, Sorcini D, Guzman AG, Huang Y-H, Ramabadran R, et al. Mutant NPM1 maintains the leukemic state through HOX expression. Cancer Cell. 2018;34:499–512.e9.
Falini B, Bigerna B, Pucciarini A, Tiacci E, Mecucci C, Morris SW, et al. Aberrant subcellular expression of nucleophosmin and NPM-MLF1 fusion protein in acute myeloid leukaemia carrying t(3;5): a comparison with NPMc+ AML. Leukemia. 2006;20:368–71.
Aguilo F, Avagyan S, Labar A, Sevilla A, Lee D-F, Kumar P, et al. Prdm16 is a physiologic regulator of hematopoietic stem cells. Blood. 2011;117:5057.
Imperato MR, Cauchy P, Obier N, Bonifer C. The RUNX1–PU.1 axis in the control of hematopoiesis. Int J Hematol. 2015;101:319–29.
Hollox EJ, Louzada S. Genetic variation of glycophorins and infectious disease. Immunogenetics. 2022. https://doi.org/10.1007/s00251-022-01280-7.
Greaves MF, Sieff C, Edwards PAW. Monoclonal antiglycophorin as a probe for erythroleukemias. Blood. 1983;61:645–51.
Andersson LC, Jokinen M, Gahmberg CG. Induction of erythroid differentiation in the human leukaemia cell line K562. Nature. 1979;278:364–5.
Hollink IHIM, van den Heuvel-Eibrink MM, Arentsen-Peters STCJM, Zimmermann M, Peeters JK, Valk PJM, et al. Characterization of CEBPA mutations and promoter hypermethylation in pediatric acute myeloid leukemia. Haematologica. 2011;96:384–92.
Mohan M, Lin C, Guest E, Shilatifard A. Licensed to elongate: a molecular mechanism for MLL-based leukaemogenesis. Nat Rev Cancer. 2010;10:721–8.
Liedtke M, Ayton PM, Somervaille TCP, Smith KS, Cleary ML. Self-association mediated by the Ras association 1 domain of AF6 activates the oncogenic potential of MLL-AF6. Blood. 2010;116:63–70.
Meyer C, Larghero P, Almeida Lopes B, Burmeister T, Gröger D, Sutton R, et al. The KMT2A recombinome of acute leukemias in 2023. Leukemia. 2023;37:988–1005.
Mason EF, Kuo FC, Hasserjian RP, Seegmiller AC, Pozdnyakova O. A distinct immunophenotype identifies a subset of NPM1-mutated AML with TET2 or IDH1/2 mutations and improved outcome. Am J Hematol. 2018;93:504–10.
Martelli MP, Rossi R, Venanzi A, Meggendorfer M, Perriello VM, Martino G, et al. Novel NPM1 exon 5 mutations and gene fusions leading to aberrant cytoplasmic nucleophosmin in AML. Blood. 2021;138:2696–701.
Figueroa ME, Wahab OA, Lu C, Ward PS, Patel J, Shih A, et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell. 2010;18:553–67.
Morita K, Wang F, Jahn K, Hu T, Tanaka T, Sasaki Y, et al. Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nat Commun. 2020;11:5327.
Krivtsov AV, Figueroa ME, Sinha AU, Stubbs MC, Feng Z, Valk PJM, et al. Cell of origin determines clinically relevant subtypes of MLL-rearranged AML. Leukemia. 2013;27:852–60.
Willemsen AECAB, Krausz S, Ligtenberg MJL, Grünberg K, Groen HJM, Voest EE, et al. Molecular tumour boards and molecular diagnostics for patients with cancer in the Netherlands: experiences, challenges, and aspirations. Br J Cancer. 2019;121:34–6.
Acknowledgements
This project was funded by the Dutch Cancer Society (project number 15152) and by a strategic investment of the Leiden University Medical Center, embedded within the Leiden Oncology Center, and executed within the Leiden Center for Computational Oncology. EBA was funded by a personal grant from the Dutch Research Council (NWO; VENI: 09150161810095). The funding bodies had no role in the study design; the collection, analysis, and interpretation of data; the writing of the manuscript; or the decision to submit the manuscript for publication.
Author information
Contributions
MJR, MG, and EBA conceived and designed the project; EBA acquired funding; EBA performed project administration; MG, EBA, HV, RRB, CJMH, and PB performed oversight and management of resources (data generation, collection, transfer, infrastructure, data processing); JFS performed computational and statistical analyses; JFS, EBA, MG, EOK, and ES-L performed analyses and interpretation; JFS performed and structured data visualisation; MJR, MG, and EBA provided supervision and scientific direction; JFS wrote the manuscript; and all authors critically reviewed the manuscript and figures.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Severens, J.F., Karakaslar, E.O., van der Reijden, B.A. et al. Mapping AML heterogeneity - multi-cohort transcriptomic analysis identifies novel clusters and divergent ex-vivo drug responses. Leukemia 38, 751–761 (2024). https://doi.org/10.1038/s41375-024-02137-6
DOI: https://doi.org/10.1038/s41375-024-02137-6