Introduction

Myeloproliferative neoplasms (MPN) are a heterogeneous group of clonal disorders characterized by aberrant hematopoietic proliferation. The WHO classification 2022 [1] defines MPN according to cytomorphology, bone marrow biopsy, grading of fibrosis, blood counts, and several molecular markers. Currently, there are eight subclasses, four of which represent the classical MPNs: chronic myeloid leukemia (CML) and the BCR::ABL1-negative MPNs polycythemia vera (PV), primary myelofibrosis (PMF), and essential thrombocythemia (ET), which are characterized by mutations in the driver genes JAK2, CALR, or MPL. In the WHO classification, the major diagnostic criteria for BCR::ABL1-negative MPNs are mainly based on morphology. PV is marked by a hypercellular bone marrow and elevated hemoglobin levels. ET is characterized by megakaryocytic proliferation and increased platelet counts, while PMF shows bone marrow fibrosis in addition to megakaryocytic proliferation. However, overlaps, borderline findings, and transitions into another MPN subtype occur. ET and PV can transform to post-ET and post-PV MF, highlighting the difficulty of clearly separating these subgroups. MPN, not otherwise specified (MPN-NOS) designates cases with the aforementioned MPN features that do not meet the diagnostic criteria of any specific MPN subtype [2], underscoring the potential utility of genetic classification. Transformation from MPN chronic phase (MPN-CP) to MPN blast phase (MPN-BP) or acute leukemia represents a critical complication for MPN patients, as MPN-BP is highly refractory to treatment and associated with poor prognosis [3, 4]. Leukemic transformation rates vary among the three subgroups, with PMF having the highest at 8–23% over 10 years, followed by PV at 5–15% in the same period. In contrast, leukemic transformation is rare in ET at 1–5% in 10 years [5, 6].
Various clinical and genetic risk factors have been identified, including age, cytogenetic lesions, and specific gene mutations [7]. The widespread adoption of next-generation sequencing has uncovered the molecular complexity of MPNs, and in diagnostic settings, insufficient or incomplete clinical data sometimes complicate diagnosis. Although bone marrow histology was promoted to a major diagnostic criterion in the WHO 2016 classification, an increasing role is expected for mutations that complement morphologic diagnosis and add prognostic information [8].

Therefore, the aim of this study was to overcome these obstacles by using genetic markers for the stratification of MPN entities, including CML, PV, PMF, and ET, and hence also to provide prognostic information based on a patient's genetic profile. We further aimed to thoroughly characterize MPN cases with progression to blast phase (BP) to provide guidance for better upfront genetic risk stratification and therapeutic management.

Materials and methods

Patients’ cohorts

All patient samples (n = 554) were sent from different hematological centers to MLL Munich Leukemia Laboratory for diagnostic purposes between 2005 and 2017, and diagnosed by cytomorphology, cytogenetics and molecular genetics following WHO 2022 criteria [9]. All patients gave their informed consent for genetic analyses and the use of laboratory results for research purposes. The study adhered to the tenets of the Declaration of Helsinki and was approved by the laboratory's institutional review board and the Ethics Committee of the Bavarian Medical Association for the use of archived DNA/RNA samples and associated clinical information. The clinical data were retrieved, and the samples were collected and analysed with the endorsement of the Ethics Committee of the Bavarian Medical Association. The analysis workflow and cohorts are depicted in Fig. 1, with further information available in the Supplementary material.

Fig. 1: Overview of the analysis workflow and analysed cohorts.

A Listing of analyses performed per cohort with genome-wide characterization of MPN cases on the left, validation of the machine learning model and genetic decision tree in the middle, and characterization of MPN cases with transformation to blast phase on the right. B Overview and brief description of the analysed cohorts (left). The Venn diagram (right) indicates the patient overlap between the different cohorts. The subgroups in the different cohorts showed the following incidences: cohort 1: 106 CML, 77 ET, 75 PV, 97 PMF; cohort 2: 37 CML, 24 ET, 29 PV, 19 PMF; cohort 3: 55 MPN-NOS; cohort 4: 2 ET, 3 PV, 9 PMF, 5 MPN-NOS; cohort 5: 11 ET, 10 PV, 13 PMF. ML machine learning, Seq sequencing, WES whole exome sequencing, WGS whole genome sequencing, WTS whole transcriptome sequencing.

Cytomorphology

Cytomorphology of bone marrow and/or peripheral blood samples was performed in all cases after May-Grünwald Giemsa staining, cytochemistry with myeloperoxidase, non-specific esterase, and iron staining (Fe) [10]. The grade of fibrosis was not assessed but histopathology reports were available in 87/355 cases.

Cytogenetics

Chromosome preparations and banding analysis of bone marrow and/or peripheral blood samples were performed for 527/554 cases as previously described according to standard methods [11]. For classification of abnormalities and karyotypes, the ISCN guidelines (2016) were used [12].

Molecular genetics

Different cohorts were assessed by whole genome sequencing (WGS), whole transcriptome sequencing, or whole exome sequencing (WES), as described in detail in the Supplementary methods section. Targeted NGS panel sequencing covering 28 genes was described previously [13], and the investigated genes are listed in Supplementary Table 1. KMT2A-PTD was analyzed at MPN-BP with a quantitative PCR assay, and FLT3-ITD was analyzed by gene scan to confirm ITD variants detected by WES or WGS; both methods were described previously [14, 15].

Statistical analysis

Dichotomous variables were compared between different groups using the χ2-test. Results were considered significant at p < 0.05. Adjustment for multiple testing was not done. Statistical analyses were performed using SPSS version 19.0 (IBM Corporation, Armonk, NY); the reported p-values are two-sided.

Machine learning approach to stratify MPN patients

A support vector machine with class probability output was used to separate MPN patients. LASSO regression was used for feature selection to identify the most discriminating marker gene set. Models were built using binary mutation profiles, VAF groups for JAK2, MPL, and CALR, and frequent chromosomal aberrations as features. For each training cycle, 500 models were built with 10-fold cross-validation and their accuracy was estimated. The best-performing models (top 5%) were chosen, their selected features (= gene mutations) were assessed, and features that occurred in more than 60% of these models were kept for the next round of training. The model with the highest accuracy and the smallest number of features was selected as the final model.
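The feature-retention step of this training loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: model fitting (support vector machine, LASSO, cross-validation) is omitted, each model is represented only by its estimated accuracy and the set of features it selected, and the function name `stable_features`, its parameters, and the toy input are hypothetical.

```python
from collections import Counter


def stable_features(models, top_percent=5, min_percent=60):
    """Return features selected by more than `min_percent` of the best models.

    `models` is a list of (accuracy, selected_features) pairs, one per
    trained model. This representation is an assumption made for the sketch.
    """
    if not models:
        return set()
    # Rank models by estimated accuracy, best first.
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    # Keep the top `top_percent` of models, but always at least one.
    n_top = max(1, len(ranked) * top_percent // 100)
    top = ranked[:n_top]
    # Count in how many of the top models each feature was selected.
    counts = Counter(f for _, feats in top for f in set(feats))
    # Integer comparison avoids floating-point edge cases at the threshold.
    return {f for f, c in counts.items() if c * 100 > min_percent * n_top}


# Toy input (hypothetical): 40 models, so the top 5% are the best 2.
toy_models = [(0.90, {"JAK2", "CALR"}), (0.91, {"JAK2", "TET2"})]
toy_models += [(0.50, {"MPL"})] * 38
kept = stable_features(toy_models)
print(sorted(kept))
```

In the toy input only JAK2 is selected by both of the two best models, so it is the only feature carried into the next round. With 500 models per cycle, as in the text, the top 5% would correspond to 25 models.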

Results

The transcriptome provides further insights into the molecular mechanisms of MPN patients

Gene expression profiles of the various diagnostic entities (cohort 1) were compared and revealed 670 genes differentially expressed between at least two groups (Supplementary Fig. 1A). CML patients were characterized by elevated expression of IL1RL1, PIEZO2, HRH4, and IL5RA, whereas BEND2, TREML1, EGF, and CXCR2P1 were down-regulated (Supplementary Fig. 1B). IL1RL1 is known to be up-regulated by the BCR::ABL1 fusion protein [16]. PMF patients showed increased expression of IFI27, FAM83A, and MYCN and decreased expression of CXCL12, FAM178B, and VCAM1. Interestingly, expression of RAG1, which is only active in immune system cells, could be detected in ET and PV but not in CML and PMF patients. ET patients were further characterized by up-regulation of the transcription factor IRF4 and down-regulation of FAM83A, IFI27, and PIEZO2. FAT1, which is expressed in erythroid precursors, was significantly overexpressed in the PV cohort (Supplementary Fig. 1B). The expression of the erythroid lineage-specific marker TFRC (CD71) was also higher in PV patients, but the difference did not reach significance. The expression levels of 18 subgroup-specific genes can be used to broadly stratify MPN patients into MPN diagnostic groups (Supplementary Fig. 1C).

Most MPN patients harbor a low number of mutations and present with a normal karyotype

The mutational profiles were obtained by analyzing the WGS data of the 355 patients in cohort 1 for 73 genes and for cytogenetic aberrations. In the total cohort, the most frequently mutated gene was JAK2 (n = 179), followed by ASXL1 (n = 78), TET2 (n = 46), CALR (n = 41), DNMT3A (n = 26) and SRSF2 (n = 25). MPL was mutated in 13 patients. As expected, BCR::ABL1 was the most abundant cytogenetic aberration and was present in all CML cases (n = 106). Other common aberrations in the cohort were: gain 9p or trisomy 9 (n = 14), deletion 20q (n = 13), trisomy 8 (n = 12), deletion 13q (n = 8), deletion 5q, 7q, or 11q (n = 5 each), and gain 1q (n = 5). No copy number changes affecting CALR/MPL were detected. A complex karyotype was detected in only six cases. 177/355 (50%) cases harbored no additional gene mutation besides the driver aberrations BCR::ABL1/JAK2/CALR/MPL, and 282/355 (79%) showed a normal karyotype or no additional chromosomal aberration besides BCR::ABL1 in CML. Eighteen cases were triple-negative MPNs. The median number of mutations (without BCR::ABL1) per patient was 1 for CML, PV, and ET, while PMF patients showed a median of 3 mutations (Fig. 2).

Fig. 2: Molecular genetics characterization of MPN training cohort (n = 355 cases).

Entities are separated by colour: CML: dark gray, ET: light gray, PMF: red, PV: blue. A Stacked bar chart indicating the number of patients who carry the respective genetic abnormality per entity. Cytogenetic aberrations are indicated by a gray and mutated genes by a black typeface. B Wind rose chart illustrating number of cases showing 1–6 gene mutations separated by entity. C Wind rose chart showing the VAF distribution of the MPN driver-genes separated by entity. CN-LOH copy-neutral loss of heterozygosity.

Homozygous JAK2 mutations are a hallmark feature of more aggressive MPN phenotypes

We identified 97 cases with copy-neutral loss of heterozygosity (CN-LOH): 56 PV, 39 PMF, 1 ET, and 1 CML. As expected, 9p CN-LOH encompassing the JAK2 gene occurred most frequently (84/97; 87%), in 56 PV and 27 PMF cases but only one ET case. In 64/84 cases the mutational load of JAK2 V617F was ≥50%. Furthermore, 1p CN-LOH explained a homozygous MPL mutation in four cases. CALR was not affected by CN-LOH in our cohort. Two PV patients showed a 14q CN-LOH in addition to the 9p CN-LOH. Several other CN-LOH events were detected only in PMF patients (Fig. 2; Supplementary Table 3).

Since the JAK2 46/1 haplotype has been shown to correlate with JAK2 V617F mutation status, particularly with a high variant allele frequency (VAF ≥ 50%), we examined the four SNPs of the 46/1 GGCC-JAK2 haplotype [17]. The 46/1 haplotype was most often found in PV, with 78% of alleles carrying the haplotype, followed by PMF with 53%, ET with 41%, and CML with 30%. In the healthy population, the 46/1 haplotype has a frequency of 24% [17].

Unsupervised clustering of genetic features identifies 5 clusters of MPN patients with divergent prognosis

The analysis of genetic co-occurrence patterns revealed five main clusters that separated the cohort into groups of (i) BCR::ABL1 positive cases with a normal karyotype that harbored only few mutations (cluster c1), (ii) cases with a high frequency of cytogenetic abnormalities (cluster c2), (iii) cases with 9p CN-LOH and JAK2 mutation and associated high VAF that showed mainly a normal karyotype (cluster c3), (iv) cases that harbored mainly CALR and JAK2 mutations with low VAFs and no cytogenetic aberrations (cluster c4), (v) cases with a high frequency of ASXL1, TET2, and U2AF1 mutations and a normal karyotype with 28% of triple-negative cases (cluster c5, Fig. 3). With the exception of c1, which contained 100% CML cases, the clusters comprised a mixture of the entities with varying proportions (Fig. 3). The prognosis of patients was found to be closely linked to their diagnostic category, as evidenced by event-free survival (EFS) estimates derived from the personalized MPN risk calculator [18] and the overall survival (OS) data from our cohort. The OS data indicated that clusters c2 and c5, which can be classified as high-risk based on genetic profiles [19, 20], had poor prognoses, a trend also seen in cluster c3 (Fig. 3). As expected, the CML cluster (c1) exhibited a favorable prognosis. Patients assigned to cluster c4 also demonstrated comparably good survival outcomes (Fig. 3).

Fig. 3: Unsupervised hierarchical clustering reveals 5 genetically driven clusters in CML and MPN (cohort 1).

The upper part shows the dendrogram of the 5 clusters. The degree of similarity is given by the line length. The middle part shows wind rose plots with the relative frequency with which genetic markers occur in the corresponding group. The main features of the group as well as the occurrence of the different entities within the group are listed below. In the lower part, the predicted event-free survival (EFS) according to Grinfeld et al. [18] for the MPN clusters is given. On the right side the Kaplan–Meier curve for outcomes classified by cluster is given.

Machine learning identifies 12 molecular markers to reliably stratify MPN patients

The analysis of genetic co-occurrence patterns confirmed the presence of cross-entity genetic subgroups in MPN but did not delineate distinct prognostic groups. Therefore, we employed a machine learning (ML) model to differentiate CML, PV, ET, and PMF based solely on molecular markers, aiming to uncover clinically significant genotype-phenotype correlations (Materials and methods, Supplementary Material). The final model contained only 12 molecular markers: mutation status of ASXL1, BCR::ABL1, CALR, JAK2, SF3B1, SRSF2, TET2, TP53, and U2AF1 and the binary VAF values CALR > 35%, JAK2 > 35%, and JAK2 > 60% (Fig. 4A). Using this final model to stratify patients, we achieved an accuracy of 98.3%, with only three misclassified cases. Given that CML is a genetically defined entity and the model is more helpful in stratifying PV, PMF, and ET patients, we determined the positive predictive value (PPV) and negative predictive value (NPV) for BCR::ABL1-negative MPNs only, which exceeded 0.97 for all entities. The model also performed well on an independent validation cohort of 109 patients (cohort 2), with an accuracy of 86.2% (Fig. 4B). PPV and NPV of BCR::ABL1-negative MPNs were comparable to the test cohort, except for a higher number of false positive ET cases and an accordingly lower PPV of 0.68.
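For readers less familiar with these metrics, the per-entity PPV and NPV can be derived from a confusion matrix as sketched below. The function name `ppv_npv`, the nested-dictionary layout, and the counts are illustrative assumptions; the counts are invented for the example and are not the values shown in Fig. 4B.

```python
def ppv_npv(confusion, label):
    """Return (PPV, NPV) for one class.

    confusion[true_label][predicted_label] holds a case count.
    """
    labels = list(confusion)
    # True positives: cases of this class that were predicted as this class.
    tp = confusion[label].get(label, 0)
    # False positives: other classes predicted as this class.
    fp = sum(confusion[t].get(label, 0) for t in labels if t != label)
    # False negatives: cases of this class predicted as something else.
    fn = sum(n for pred, n in confusion[label].items() if pred != label)
    total = sum(sum(row.values()) for row in confusion.values())
    tn = total - tp - fp - fn
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical counts for illustration only.
toy = {
    "ET": {"ET": 8, "PV": 1, "PMF": 1},
    "PV": {"ET": 2, "PV": 18, "PMF": 0},
    "PMF": {"ET": 2, "PV": 0, "PMF": 8},
}
et_ppv, et_npv = ppv_npv(toy, "ET")
print(round(et_ppv, 2), round(et_npv, 2))
```

In this toy matrix, four non-ET cases are predicted as ET, which lowers the PPV for ET while leaving its NPV high. This is the same pattern the text describes for the false positive ET cases in the validation cohort.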

Fig. 4: Final machine learning classification model.

A Spider plot illustrating the final model and features with their weight for classification to a specific entity. CML is given in black, ET in light grey, PMF in red, and PV in blue. B Confusion matrix showing the results of the predicted classification (predicted) and true diagnosis by morphology (true), as well as the PPV and NPV. Left table: test cohort within the training cohort (cohort 1), right table: validation cohort (cohort 2). C Decision tree based on the features of the model. The bar charts represent the model classification (x-axis) within the genetic based decision tree for the validation cohort (cohort 2). PPV positive predictive value, NPV negative predictive value, VAF variant allele frequency.

A simple decision tree can be used to classify MPN patients

To enhance its clinical utility, we translated the logic of our final model into a straightforward decision tree. A diagnosis of CML is determined by the presence of a BCR::ABL1 translocation. In its absence, mutations in ASXL1, SF3B1, SRSF2, U2AF1, or TP53 direct classification towards PMF. Homozygous JAK2 mutations or mutations in JAK2 exon 12 suggest PV, while heterozygous mutations in JAK2, CALR, or MPL indicate ET. Notably, the advanced ML model and the simplified decision tree yielded comparable classification accuracy in an independent validation cohort, particularly for CML, PMF, and PV diagnoses. ET cases, however, exhibited a less definitive genetic signature, with the majority being classified into the heterozygous JAK2/CALR/MPL group by the decision tree. Thus, the decision tree effectively replicates the ML model's classification logic with high concordance (Fig. 4C).
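The rules described above can be written down as a short function. This is a sketch of one reading of the text, not the authors' implementation: the order of the checks follows the order of the description, the caller is assumed to supply JAK2 zygosity and exon 12 status directly, and the "unclassified" fallback for cases without any of the listed markers is an assumption. The name `classify_mpn` and its parameters are hypothetical.

```python
HIGH_RISK_GENES = {"ASXL1", "SF3B1", "SRSF2", "U2AF1", "TP53"}
DRIVER_GENES = {"JAK2", "CALR", "MPL"}


def classify_mpn(bcr_abl1, mutated_genes, jak2_homozygous=False, jak2_exon12=False):
    """Assign a genetic group following the decision tree described in the text.

    bcr_abl1        -- True if a BCR::ABL1 translocation is present
    mutated_genes   -- iterable of mutated gene symbols
    jak2_homozygous -- True if the JAK2 mutation is homozygous
    jak2_exon12     -- True if a JAK2 exon 12 mutation is present
    """
    mutated = set(mutated_genes)
    if bcr_abl1:
        return "CML"
    # Any splicing, chromatin modifier, or TP53 mutation points to PMF.
    if mutated & HIGH_RISK_GENES:
        return "PMF"
    if jak2_homozygous or jak2_exon12:
        return "PV"
    # Remaining cases with a driver mutation are treated as heterozygous.
    if mutated & DRIVER_GENES:
        return "ET"
    # Assumption: the text does not say how marker-negative cases are handled.
    return "unclassified"


print(classify_mpn(False, ["JAK2", "SRSF2"]))
print(classify_mpn(False, ["JAK2"], jak2_homozygous=True))
print(classify_mpn(False, ["CALR"]))
```

Because the high-risk check comes before the JAK2 zygosity check, a case with both a homozygous JAK2 mutation and an SRSF2 mutation is assigned to the PMF group in this sketch.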

The identified molecular markers can be used to acquire diagnostic and prognostic information for MPN-NOS cases

Cases that present clinical, laboratory, morphological, and molecular features of MPN, but do not meet the diagnostic criteria of any specific MPN type, pose a diagnostic challenge. However, our ML model as well as the decision tree were able to genetically stratify MPN-NOS patients (cohort 3, 55 MPN-NOS cases). The model assigned 27% of the patients to PMF, 60% to PV, and 13% to ET (Fig. 5A). By applying the decision tree to the MPN-NOS cases, a similar distribution was observed (Fig. 5B). For 268/410 cases, OS data were available that could be used to compare the survival of the newly grouped MPN-NOS patients to the reference MPN cohort (cohort 1). Relative to other MPN subgroups, MPN-NOS patients typically exhibit an intermediate to poor prognosis (Fig. 5C). However, a more nuanced stratification emerges when these cases are categorized according to genetic characteristics (Fig. 5C), revealing that their OS rates closely mirror those of the corresponding MPN subgroups. The results indicate that MPN-NOS reflects a mixture of genetic profiles of different MPN entities, allowing for a genetically based classification of MPN-NOS patients and for the identification of high-risk patients, even in the absence of other diagnostic criteria.

Fig. 5: Classification of MPN-NOS (cohort 3).

Classification based on A machine learning model and B the proposed decision tree. The bars represent the proportion of cases of the validation cohort, as well as MPN-NOS cases, that were classified by the model as CML, PV, ET or PMF and their distribution within the genetic classification. C Kaplan–Meier curve for outcomes classified by WHO entity (left) and genetic profiles (right).

The clonal evolution from MPN-CP to MPN-BP is evident in their divergent genetic profiles

In MPNs, the acquisition of high-risk mutations is believed to be the driving force behind clonal evolution from chronic phase to BP. Our WES analysis of 19 paired MPN-CP and MPN-BP samples (cohort 4) revealed that more than 1000 genes are altered at both time points, with only a small proportion of these genes (~10%) being recurrently mutated in ≥2 cases. On average, 9 mutations per patient were lost during MPN progression and 12 were gained, whereas the mutational load of most variants changed only moderately (VAF change < 15%), indicating that the majority of variants are patient-specific, potentially germline, and not associated with disease progression. The most frequently mutated genes besides JAK2 (89%) and MPL (11%) at MPN-CP were SRSF2 (68%), followed by TET2 (47%), ASXL1 (26%), RUNX1, DNMT3A, ZRSR2 (16% each), SETBP1, and IDH2 (11% each, Fig. 6A). In our cohort, no patient showed CALR as the mutated driver gene. 90% (n = 17) showed a normal karyotype at MPN-CP. Only two patients showed aberrations in CP: one in chromosome 9 (trisomy 9) and the other deletions in chromosomes 13 (del(13q)) and 20 (del(20q)). 9p CN-LOH (JAK2 gene) was detected by WES in six cases, 1p CN-LOH (MPL gene) in two cases, and 11q CN-LOH (CBL gene) in one patient.
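The bookkeeping behind the lost, gained, and moderately changed variants can be sketched as follows. The sketch assumes that the 15% threshold refers to an absolute difference in VAF percentage points, which the text does not state explicitly. The function name `compare_paired_variants`, the category labels, and the variant identifiers are hypothetical.

```python
def compare_paired_variants(cp_vaf, bp_vaf, moderate_change=15.0):
    """Classify variants of one patient by their change between CP and BP.

    cp_vaf, bp_vaf -- dicts mapping a variant identifier to its VAF in percent
    moderate_change -- assumed absolute VAF difference (percentage points)
                       below which a change counts as moderate
    """
    result = {"lost": [], "gained": [], "moderate": [], "shifted": []}
    for variant in sorted(set(cp_vaf) | set(bp_vaf)):
        if variant not in bp_vaf:
            result["lost"].append(variant)       # present at CP only
        elif variant not in cp_vaf:
            result["gained"].append(variant)     # present at BP only
        elif abs(bp_vaf[variant] - cp_vaf[variant]) < moderate_change:
            result["moderate"].append(variant)   # VAF roughly unchanged
        else:
            result["shifted"].append(variant)    # clone expanded or shrank
    return result


# Hypothetical VAFs for a single patient, for illustration only.
cp = {"var_a": 48.0, "var_b": 45.0, "var_c": 20.0}
bp = {"var_b": 47.0, "var_c": 60.0, "var_d": 35.0}
summary = compare_paired_variants(cp, bp)
print(summary)
```

Variants in the "moderate" category with a VAF near 50% at both time points are the ones the text describes as potentially germline.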

Fig. 6: Genetic profiling of paired MPN-CP/MPN-BP samples.

A The bar chart at the top indicates the absolute mutation count per patient. The waterfall plot of paired samples illustrates gene mutations at MPN-CP and MPN-BP state, separated for the three groups marked by loss of the MPN driver-gene mutation during progression (left panel), reduction of MPN-driver clone (middle panel) and stable or increasing MPN-driver clone (right panel). Each patient is given in two columns with left showing MPN-CP and right MPN-BP. Karyotype information (lower panel; light grey: based on chromosome banding analyses; dark grey: CNV aberrations based on WES data) and gene mutation frequencies (right panel) are given. B Bar chart displaying the relative frequency of myeloid gene mutations in MPN-CP and MPN-BP. The colored bar below the plot displays the functional annotation of the genes. C Representative plots summarizing clonal evolution of the three groups (loss, reduction, increase of MPN-driver clone). The top of the graph reflects the clones at MPN-CP, while the lower part reflects the clones at MPN-BP. The indicated genes and aberrations show the characteristics of the group, and the width of each clone illustrates its approximate size. A clone marked in red increases in size during progression, a blue clone decreases, and a grey clone remains stable. Gene names at the bottom of the graph illustrate gained gene mutations or aberrations. CN-LOH copy-neutral loss of heterozygosity.

Analysis of the most frequently mutated genes at MPN-BP showed a slight shift in the mutation pattern. Due to the loss of some JAK2 mutations (n = 5) during progression to BP, the most frequently mutated genes were JAK2 (63%) and SRSF2 (63%). RUNX1 was detected more often than ASXL1 (47% and 32%, respectively), and TP53 mutations appeared (n = 4, 21%) (Fig. 6A). All patients harbored at least one mutation in a transcription factor and/or a splicing and chromatin modifying gene at both time points, but the number of patients carrying a mutation in a RAS/signaling pathway related gene increased from 58% (11/19) to 74% (14/19) during progression (Fig. 6B). 47% (9/19) of patients gained cytogenetic aberrations in MPN-BP: 56% (5/9) gained a complex karyotype, 33% (3/9) lost chromosome 7, and one patient gained a chromosome 19. Thus, nine patients retained a normal karyotype and one patient retained trisomy 9 without additional karyotypic changes (Fig. 6A). One patient gained a 9p CN-LOH during progression, while three patients lost their 9p (n = 2) or 1p CN-LOH (n = 1), which also affected the allele frequencies of the corresponding gene mutations.

Application of our classification model and decision tree to the 19 patients (PMF: 9, ET: 2, PV: 3, MPN-NOS: 5) assigned 18 of them to the PMF group, representing the high-risk group for transformation to BP [6], marked by mutations in ASXL1, SF3B1, SRSF2, U2AF1, or TP53. Only one patient, diagnosed as ET in chronic phase, was assigned to the heterozygous JAK2-mutated ET group.

Blast phase can be driven by different mechanisms

To further analyze the driver-gene mutations of BP, we investigated the clonal evolution of the 19 patients in more detail (Supplementary Fig. 2). First, we addressed the JAK2 and MPL mutation status at MPN-CP versus MPN-BP. In 6/19 patients (32%) the MPN-CP defining clone, characterized by a JAK2 or MPL mutation, was completely or nearly lost. In another three patients the JAK2 mutated clone was diminished. In the remaining 10 cases the JAK2 or MPL clone was stable or even increasing. Interestingly, one patient lost the 9p CN-LOH and the homozygous JAK2 mutation status, but kept a heterozygous JAK2 mutation during progression. These observations indicate that the dominant MPN-CP clone does not necessarily drive the transformation to BP (Fig. 6C, Supplementary Fig. 2). Second, we investigated the recurrently gained gene mutations in more detail. Here, we found that in addition to the aforementioned RAS gene mutations, FLT3-ITD (n = 1), CEBPA mutations (n = 2), and KMT2A-PTD (n = 2) were also acquired. RUNX1 mutations were gained even more frequently: while four patients carried a RUNX1 mutation from the beginning of the disease, five patients acquired a RUNX1 mutation during progression to BP. Four patients gained a mutation in TP53 and an accompanying complex karyotype including a TP53 deletion, resulting in bi-allelic TP53 mutation status (Fig. 6A). Last, we examined the mutational background as a risk for disease progression and compared genetic data with an MPN control cohort (cohort 5) that did not show disease progression to BP during follow-up (median observation time: 86 months). We focused on the nine most frequently mutated genes in our cohort. Notably, 14/19 patients harbored mutations in either SRSF2 (n = 5) or TET2 (n = 2) or in both genes (n = 7) at MPN-CP. SRSF2 was significantly more often mutated in MPN patients progressing to MPN-BP than in the control cohort (13/19, 68% versus 3/34, 8%; p < 0.001).
In addition, TET2 and RUNX1 were more often mutated in the MPN-CP/MPN-BP cohort than in the control cases (TET2: 9/19 (six single hit), 47% versus 9/34 (five single hit), 26%; RUNX1: 4/19, 21% versus 1/34, 2%, respectively), but the differences did not reach statistical significance due to the small sample size.
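As a rough check of the group comparisons for SRSF2 and TET2, the χ2-test named in the statistical methods can be reproduced with the standard library alone. The sketch assumes a Pearson χ2-test without continuity correction; the text does not state which variant was run in SPSS, so the exact p-values may differ from those of the study. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Uses 1 degree of freedom and no continuity correction (an assumption).
    Returns (statistic, two-sided p-value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: progressing cohort (n = 19) and control cohort (n = 34).
# Columns: mutated, not mutated. Counts are taken from the text.
srsf2_stat, srsf2_p = chi2_2x2(13, 19 - 13, 3, 34 - 3)
tet2_stat, tet2_p = chi2_2x2(9, 19 - 9, 9, 34 - 9)
print(f"SRSF2: chi2 = {srsf2_stat:.2f}, p = {srsf2_p:.2g}")
print(f"TET2:  chi2 = {tet2_stat:.2f}, p = {tet2_p:.2g}")
```

Under these assumptions the SRSF2 comparison falls below p = 0.001 and the TET2 comparison stays above p = 0.05, in line with the text. For tables with very small expected counts, an exact test would usually be preferred over the χ2 approximation.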

Of note, the five paired transcriptome profiles analysed showed an AML-specific expression pattern, regardless of whether the leukemic clone originated from a JAK2/CALR/MPL mutated clone or a clone without an MPN-driving gene mutation (Supplementary Fig. 3).

Discussion

We genetically characterized 355 individuals with classic MPN, contributing to the growing genetic knowledge of MPNs by uncovering genetically distinct subgroups with varying cytogenetic abnormalities, mutations, and JAK2 allelic statuses. Notably, differences in JAK2 allele status (heterozygous/homozygous) correlated with diverse EFS and OS outcomes, potentially due to additional prognostic mutations. In contrast, groups with cytogenetic aberrations and additional mutations generally had shorter EFS and poorer OS regardless of the diagnosed entity, aligning with studies on the impact of karyotype and mutation count on survival [19, 21, 22]. However, these observations have to be validated in a cohort with more comprehensively documented survival rates.

Gene expression analysis can aid in disease classification for certain conditions. Our transcriptome analysis indicates that its direct diagnostic utility in MPNs is constrained by complexity and subgroup similarity. Nonetheless, altered genes linked to specific cell types or lineages can act as markers, indirectly connecting to blood counts and diagnoses by revealing the cellular dynamics behind MPNs’ clinical features.

Next, we developed an ML model based on 12 genetic markers accessible through routine analysis, echoing recent proposals for targeted NGS use in MPN to identify non-canonical and low-burden driver variants to avoid misdiagnosis of triple-negative MPNs [23]. For triple-negative MPNs it is recommended to perform gene panel sequencing to search for the most common co-mutations, such as ASXL1, TET2, SRSF2, and EZH2 [24, 25]. This particular mutation profile is also reflected in our ML model, which incorporates splicing genes, chromatin modifiers, and TP53, in addition to the respective driver genes and their allelic load. Of note, using the allelic load as a proxy for the presence of chromosomal abnormalities has some limitations, notably the potential oversight of CN-LOH events in small clones, leading to possible misclassification. However, CN-LOH assessment is not yet standard practice, which would limit the applicability of a diagnostic approach that relies on CN-LOH information. Moreover, splicing gene and chromatin modifier mutations show a large overlap between different entities. Cases diagnosed as ET according to WHO that carry spliceosomal gene mutations may in fact represent early or pre-fibrotic PMF, as the two are very difficult to distinguish morphologically [26]. This might represent a weakness of our study, as there was limited histopathology available for PMF diagnosis. It is also likely that spliceosomal genes are functionally involved in MPN progression and are preferentially present in PMF and not in the chronic entities PV and ET, which could explain the ET patients misclassified by our ML model and the decision tree. Furthermore, homozygous JAK2 status explained the incorrect assignment of ET patients to PV, as this is the entity with the highest frequency of homozygous JAK2 mutations [27, 28]. Thus, the misclassified patients had an adverse prognosis on genetic risk stratification [20], which could also influence the course of treatment.
The GIPSS [29] and MIPSS70+ [30] reveal two therapy-relevant scenarios in PMF as a progressing MPN disease. Low-risk patients in this situation are treated using a watch and wait approach, whereas high-risk patients are prepared for an allogeneic stem cell transplant. Overall, our model demonstrated high accuracy in classifying MPN subtypes and effectively stratified MPN-NOS cases into prognostically significant subgroups. Hence, our model is suitable to deliver valuable diagnostic insights and assist clinicians in making informed decisions, even in challenging cases. To enhance clinical utility, we translated the ML model into a user-friendly decision tree.

Up to 20% of MPN cases progress to BP and multiple progression factors have been identified. As shown in other studies, TP53 mutations play a significant role in leukemic transformation [21, 31], and SRSF2 mutations are found in all entities as an adverse risk factor [20]. However, to the best of our knowledge, the simultaneous and comprehensive exome-wide assessment of SNVs, CNVs, and CN-LOH events before and after progression has not yet been performed in a cohort of substantial size. We analyzed 19 paired MPN-CP/MPN-BP samples, and all but one were classified in chronic phase into the group with mutations in ASXL1, SF3B1, SRSF2, U2AF1, or TP53, representing high-risk patients. Thus, the samples did not show entity-specific but rather prognosis-specific assignment. The clonal hierarchy of MPN is heterogeneous, as exemplified in this study by the fact that independent clones in addition to the MPN clone can also drive leukemic transformation [32]. This is in line with a recent study of clonal dynamics in MPN based on single-cell genotype data, which showed that the emergence of new dominating clones underlies progression to MPN-BP in most instances [33]. In our study, 138 PV/ET/PMF cases of cohort 1 showed additional mutations, of which 86/138 (62%) presented with the MPN-driver mutation within the dominant clone. However, no correlation with the progressing clone was observable in the MPN-CP/MPN-BP cohort, in which 8/19 (42%) cases showed a dominant MPN-driver clone at MPN-CP.

In summary, our study identifies 12 genetic markers sufficient for classifying chronic phase MPN patients according to WHO groups. We found that mutations in SRSF2, TET2, and RUNX1 mark the transition to BP, with TP53 mutations also prevalent during this shift. Our model, incorporating these markers, can determine patients’ risk of transformation, highlighting that ET patients, typically considered low-risk, may actually be high-risk genetically. Consequently, expanding genetic analysis beyond JAK2, CALR, and MPL at diagnosis is crucial for accurate MPN classification, early high-risk patient identification, and timely intervention.