Introduction

Primary myelofibrosis (PMF) is an aggressive myeloid malignancy with an estimated median survival of 6 years [1]. Patients with PMF are also at risk for impaired quality of life as a result of frequent red blood cell transfusion requirements, markedly enlarged spleen and liver, severe constitutional symptoms, cachexia, and consequences of portal hypertension, such as ascites, edema, and recurrent gastrointestinal bleeding. Currently employed treatment modalities in PMF (e.g., JAK2 inhibitors, hydroxyurea, immunomodulatory drugs, androgen preparations, corticosteroids, involved-field radiation, and splenectomy), with the exception of allogeneic hematopoietic stem cell transplant (alloSCT), do not modify the natural history of the disease, and their value is limited to symptom palliation [2]. Therefore, alloSCT currently remains the treatment of choice in PMF if the goal of therapy is to prolong life. Unfortunately, alloSCT is associated with a substantial risk of treatment-related mortality and morbidity, and its implementation requires personalized assessment of the risk-benefit ratio [3].

Beginning in 2009, international collaborations have produced a series of robust prognostic models in PMF to assist with treatment decision-making and help identify candidates in whom the risk of alloSCT, or other treatment with serious side effects, is justified. The prototype risk models in this regard were initially based on clinically derived variables only [4, 5], while cytogenetic and mutation information was incorporated in more recent iterations, including the mutation-enhanced international prognostic scoring systems for transplant-age patients (MIPSS70 and MIPSS70-plus) [6]. The latter included previously acknowledged but further refined clinical risk factors (hemoglobin <10 g/dl, platelets <100 × 10⁹/l, leukocytes >25 × 10⁹/l, circulating blasts ≥2%, constitutional symptoms, and grade ≥2 bone marrow fibrosis) and recently highlighted genetic predictors of shortened survival (unfavorable karyotype, absence of CALR type 1/like mutation, and presence and number of high-molecular risk mutations, including ASXL1, SRSF2, EZH2, and IDH1/2); MIPSS70-plus features four risk categories with 5-year survival rates of 7–91% (http://www.mipss70score.it/) [6]. In the current study, we took advantage of the recently revised three-tiered cytogenetic risk stratification in PMF [7], the two-tiered risk stratification according to driver mutational status [8], and the growing list of high risk mutations, including ASXL1 [9], SRSF2 [10], and U2AF1 Q157 [11], in order to recalibrate the inter-independent survival effect of genetic risk factors and provide a new risk model that is exclusively based on mutations and karyotype: the genetically inspired prognostic scoring system (GIPSS).

Methods

The current study was approved by the institutional review boards of the Mayo Clinic, Rochester, MN, USA and the University of Florence, Florence, Italy. All patients provided informed written consent for the study sample collection, as well as permission for its use in research. Inclusion in the current study required availability of an archived peripheral blood or bone marrow sample collected at the time of diagnosis (Florence cohort) or first referral (Mayo cohort). Diagnoses of PMF and leukemic transformation were made according to the World Health Organization criteria [12]. Cytogenetic analysis and reporting were done according to the International System for Human Cytogenetic Nomenclature criteria [13]. Driver and other mutations were detected by targeted amplicon next generation or direct sequencing, as previously described [6]. Type 1/like and type 2/like CALR variant designations were as previously described [14, 15, 16]. High-molecular risk mutations included in the current report were selected based on previous reports of prognostic relevance and included ASXL1, SRSF2, EZH2, IDH1/2, and U2AF1 [17, 18]; furthermore, in order to secure optimal sample size and statistical validity, the current study required a minimum of 500 informative cases for a specific mutation to be included in the analysis.

Statistical analyses considered clinical and laboratory parameters obtained at time of diagnosis (University of Florence cohort) or time of diagnosis or first referral (Mayo Clinic cohort), which coincided, in all instances, with time of sample collection for mutation analysis. Differences in the distribution of continuous variables between categories were analyzed by either the Mann–Whitney (for comparison of two groups) or the Kruskal–Wallis (for comparison of three or more groups) test. Patient groups with nominal variables were compared by the chi-square test. Overall survival was computed from the date of diagnosis or first referral (i.e., the date of sample collection) to the date of death (uncensored) or last contact (censored). Patients receiving alloSCT were censored at the time of their transplantation. Date of leukemic transformation replaced date of death, as the uncensored variable, for estimating leukemia-free survival. Overall and leukemia-free survival curves were prepared by the Kaplan–Meier method and compared by the log-rank test. A Cox proportional hazards regression model was used for multivariable analysis. P-values of <0.05 were considered significant. Covariates for the multivariable model were selected based on previous knowledge of their prognostic significance; a step-wise method was used with a backward elimination probability threshold of 0.1.
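The product-limit (Kaplan–Meier) estimate underlying the survival curves can be illustrated with a minimal Python sketch. This is not the authors' implementation (all study calculations were done in JMP Pro); the function name `kaplan_meier` and the toy data are hypothetical and serve only to show how censored observations, such as patients censored at last contact or at alloSCT, leave the risk set without lowering the survival estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up durations (e.g., years from sample collection)
    events : True if death was observed, False if censored
             (last contact, or transplantation in the study's convention)
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Survival drops only when deaths occur at this time point.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort: the patient at t=2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

In the toy cohort the censored patient reduces the number at risk from 3 to 2 before the next death, which is why the estimate at t=3 falls by half rather than by a third.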

A bootstrap resampling technique, employing 100 bootstrap samplings, was used for internal validation of risk discrimination by the newly developed GIPSS risk model. Additional model validation was accomplished by applying GIPSS to the Mayo and Florence cohorts, separately, as well as to transplant-age patients only (≤70 years old). The relative quality of the GIPSS model, in comparison to the clinically based dynamic international prognostic scoring system (DIPSS) [5] and the more recently published MIPSS70-plus [6] models, was estimated by the Akaike information criterion (AIC). In addition, logistic regression was employed to prepare receiver operating characteristic curves and area under the curve (AUC) estimates in order to compare the 10-year mortality prediction performance of GIPSS to both DIPSS and MIPSS70-plus; for the purposes of the particular logistic model, all patients surviving beyond 10 years were censored, while those who died within the particular time frame were uncensored. The JMP® Pro 13.0.0 software from SAS Institute, Cary, NC, USA, was used for all calculations.
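For reference, the AIC in its standard form is shown below. The symbols are not defined in the text and are introduced here for illustration: k is the number of estimated model parameters and L-hat is the maximized likelihood of the fitted model (for a Cox model this is commonly the partial likelihood; whether the software reports AIC on that basis is an assumption). When models are fitted to the same dataset, a lower AIC indicates a better balance between fit and complexity.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```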

Results

Baseline patient characteristics

A total of 641 patients with PMF (median age 63 years; 64% males) who were informative for both cytogenetic and mutation information were recruited from the Mayo Clinic, Rochester, MN, USA (n = 488) and the University of Florence, Florence, Italy (n = 153) (Table 1). Driver mutation distributions were 57% JAK2, 19% type 1/like CALR, 5% type 2/like CALR, 7% MPL, and 12% triple negative. DIPSS risk distributions were 13% high, 38% intermediate-2, 33% intermediate-1, and 16% low [5]. MIPSS70-plus risk distributions were 12% very high, 41% high, 20% intermediate, and 27% low [6]. Cytogenetic risk categories, according to the recently revised system [7], were very high risk (VHR) in 7%, unfavorable in 15%, and favorable in 78%. Mutational frequencies were 38% for ASXL1, 14% for SRSF2, 8% for U2AF1 Q157, 7% for EZH2, and 4% for IDH1/2. The frequencies of DIPSS component variables were 41% for age above 65 years, 41% for hemoglobin <10 g/dl, 47% for circulating blasts ≥1%, 14% for leukocyte count >25 × 10⁹/l, and 32% for constitutional symptoms; in addition, 19% displayed platelet count <100 × 10⁹/l and 30% were red cell transfusion dependent.

Table 1 Clinical and laboratory characteristics of 641 patients with primary myelofibrosis stratified by center of referral: Mayo Clinic, Rochester, MN, USA vs. University of Florence, Florence, Italy

Tables 1 and 2 provide additional information on the distribution of clinical and laboratory variables stratified by the Mayo vs. Florence patient cohorts (Table 1) and the revised cytogenetic risk stratification (Table 2). Significant differences in the characteristics of patients from the Mayo Clinic vs. those from the University of Florence were mostly attributed to differences in time point of evaluation, as mentioned earlier in the Methods section, and were best reflected in their MIPSS70-plus risk distribution (Table 1). Patients with VHR or unfavorable karyotype were more likely to display adverse clinical characteristics, including severe anemia, platelet count <100 × 10⁹/l, and increased circulating blast count, and accordingly clustered with higher risk DIPSS categories; high risk molecular mutations were also more prevalent in patients with VHR karyotype (Table 2).

Table 2 Clinical and laboratory characteristics of 641 patients with primary myelofibrosis stratified by the revised cytogenetic risk model

Univariate and multivariable analyses of genetic risk factors for overall survival and their interaction with DIPSS

After a median follow-up of 3.9 years (5.8 years for living patients), 380 (59%) deaths, 73 (11%) leukemic transformations, and 45 (7%) stem cell transplants were recorded. In univariate analysis of overall survival, the revised cytogenetic risk stratification, absence of type 1/like CALR mutation, and presence of ASXL1, SRSF2, or U2AF1 Q157 mutations were significantly associated with inferior survival (p < 0.001 in all instances; Table 3); significance was not apparent for IDH1/2 (p = 0.07) or EZH2 mutations (p = 0.2). In multivariable analysis restricted to genetic risk factors, significance was retained for VHR karyotype (HR 3.1, 95% CI 2.1–4.3), unfavorable karyotype (HR 2.1, 95% CI 1.6–2.7), absence of type 1/like CALR mutation (HR 2.1, 95% CI 1.6–2.9), or presence of ASXL1 (HR 1.8, 95% CI 1.5–2.3), SRSF2 (HR 2.4, 95% CI 1.9–3.2), or U2AF1 Q157 (HR 2.4, 95% CI 1.7–3.3) mutations; EZH2 and IDH1/2 mutations remained not significant during multivariable analysis. The addition of DIPSS risk scores in the multivariable model did not undermine the independent prognostic effect of the aforementioned mutations, while it confirmed persistence of residual significance from the clinically derived DIPSS (Table 3); HRs (95% CI values) in DIPSS-inclusive multivariable analysis were 2.5 (1.7–3.5) for VHR karyotype, 1.9 (1.4–2.5) for unfavorable karyotype, 2.0 (1.5–2.8) for absence of type 1/like CALR mutation, 1.6 (1.3–2.0) for ASXL1, 2.2 (1.7–2.8) for SRSF2, and 1.9 (1.4–2.7) for U2AF1 Q157 mutations, and 4.6 (2.8–7.4) for DIPSS high, 4.2 (2.7–6.5) for DIPSS intermediate-2, and 2.6 (1.7–4.1) for DIPSS intermediate-1 risk categories (Table 3).

Table 3 Univariate and multivariable analysis of genetic risk factors for overall and leukemia-free survival among 641 patients with primary myelofibrosis

Univariate and multivariable analysis of genetic risk factors for leukemia-free survival and their interaction with other risk factors for leukemic transformation

In univariate analysis of genetic risk factors, leukemia-free survival was predicted by karyotype (p < 0.001), SRSF2 mutation (p < 0.001), ASXL1 mutation (p < 0.001), IDH1/2 mutations (p = 0.005), and triple negative mutational status (p = 0.005) (Table 3); U2AF1 Q157 mutations had no significance (p = 0.8), while EZH2 mutations displayed borderline significance (p = 0.06). In multivariable analysis that also included other risk factors for leukemic transformation (Table 3), karyotype (HR 2.4, 95% CI 1.02–5.5 for VHR karyotype and HR 2.7, 95% CI 1.5–4.9 for unfavorable karyotype), SRSF2 mutations (HR 4.3, 95% CI 2.5–7.5), ASXL1 mutations (HR 2.1, 95% CI 1.3–3.4), platelet count <100 × 10⁹/l (HR 2.3, 95% CI 1.3–4.0), and circulating blasts ≥2% (HR 2.6, 95% CI 1.6–4.3) remained significant (Table 3).

Development of a new risk model (GIPSS) that is exclusively based on genetic risk factors

Risk points were allocated to each one of the above-mentioned inter-independent genetic risk factors based on HRs derived from multivariable analysis of genetic risk factors (see above): two points for VHR karyotype (HR 3.1) and one point each for unfavorable karyotype (HR 2.1), absence of type 1/like CALR mutation (HR 2.1), or presence of ASXL1 (HR 1.8), SRSF2 (HR 2.4), or U2AF1 Q157 (HR 2.4) mutations. The sum of risk points for each patient was calculated and used to develop a four-tiered GIPSS: low risk with zero points (n = 58), intermediate-1 risk with one point (n = 260), intermediate-2 risk with two points (n = 192), and high risk with three or more points (n = 131); the respective median (5-year) survival rates were 26.4 years (94%), 8.0 years (73%), 4.2 years (40%), and 2 years (14%) (Fig. 1); HRs (95% CI), using the low risk group as the reference, were 15.8 (8.8–31.3) for high risk, 7.1 (4.0–14.0) for intermediate-2 risk, and 3.2 (1.8–6.4) for intermediate-1 risk; the bootstrap 95% confidence limits were 7.6–35.2 for high risk, 3.4–12.7 for intermediate-2 risk, and 1.6–6.2 for intermediate-1 risk. Additional inter-risk group comparisons included HRs (95% CI) of 4.9 (3.7–6.3) for high vs. intermediate-1 risk (bootstrap 95% confidence limit 3.2–6.5), 2.2 (1.7–2.9) for high vs. intermediate-2 risk (bootstrap 95% confidence limit 1.6–3.0), and 2.2 (1.7–2.8) for intermediate-2 vs. intermediate-1 risk (bootstrap 95% confidence limit 1.8–2.8). Additional model validation was accomplished by applying GIPSS to the Mayo (n = 488) and Florence (n = 153) patient cohorts separately (Fig. 2b, c), as well as to transplant-age (age ≤70 years) patients (n = 485; Fig. 2a); the lack of significant difference between low and intermediate-1 risk GIPSS groups in the Italian patient cohort was attributed to inadequate sample size.
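The point allocation and four-tiered categorization described above can be sketched in a few lines of Python. The class and function names (`GeneticProfile`, `gipss_points`, `gipss_category`) are hypothetical and introduced only for illustration; the point values and category cut-offs are those stated in the text. The sketch assumes that karyotype risk and mutation status have already been determined upstream.

```python
from dataclasses import dataclass

# Points per revised cytogenetic risk category, as stated in the text.
KARYOTYPE_POINTS = {"favorable": 0, "unfavorable": 1, "VHR": 2}


@dataclass
class GeneticProfile:
    """Hypothetical container for the six GIPSS inputs."""
    karyotype: str          # "favorable", "unfavorable", or "VHR"
    calr_type1_like: bool   # True if a type 1/like CALR mutation is present
    asxl1: bool             # True if an ASXL1 mutation is present
    srsf2: bool             # True if an SRSF2 mutation is present
    u2af1_q157: bool        # True if a U2AF1 Q157 mutation is present


def gipss_points(p: GeneticProfile) -> int:
    """Sum the GIPSS risk points for one patient."""
    points = KARYOTYPE_POINTS[p.karyotype]
    # Absence (not presence) of type 1/like CALR scores one point.
    points += 0 if p.calr_type1_like else 1
    points += int(p.asxl1) + int(p.srsf2) + int(p.u2af1_q157)
    return points


def gipss_category(points: int) -> str:
    """Map a point total to the four-tiered GIPSS risk category."""
    if points == 0:
        return "low"
    if points == 1:
        return "intermediate-1"
    if points == 2:
        return "intermediate-2"
    return "high"  # three or more points


# Hypothetical patient: VHR karyotype, no type 1/like CALR, ASXL1 mutated.
example = GeneticProfile("VHR", False, True, False, False)
example_points = gipss_points(example)
example_category = gipss_category(example_points)
```

Note that a patient without a type 1/like CALR mutation scores at least one point, so the low risk category requires a type 1/like CALR mutation together with favorable karyotype and no ASXL1, SRSF2, or U2AF1 Q157 mutation.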

Fig. 1

Genetically inspired prognostic scoring system (GIPSS)-stratified survival data in 641 patients with primary myelofibrosis. Median survivals were 2 years for GIPSS high risk, 4.2 years for intermediate-2, 8 years for intermediate-1, and 26.4 years for low risk. The number of patients at risk for high, intermediate-2, intermediate-1, and low risk GIPSS at 5 years were 15, 61, 150, and 41; at 10 years 4, 15, 41, and 17; and at 15 years 2, 5, 16, and 10

Fig. 2

a Genetically inspired prognostic scoring system (GIPSS)-stratified survival data in 485 patients with primary myelofibrosis and age 70 years or younger, including both Mayo and Florence cohorts. b GIPSS-stratified survival data in 488 Mayo Clinic patients with primary myelofibrosis, including Mayo cohort only. c GIPSS-stratified survival data in 153 Italian patients with primary myelofibrosis, including Florence cohort only

Figure 3 displays survival curves from the current dataset stratified by GIPSS (Fig. 3a), MIPSS70-plus (Fig. 3b), and DIPSS (Fig. 3c). AIC and AUC estimates were comparable between GIPSS (AIC 4148, AUC 0.76) and MIPSS70-plus (AIC 4123, AUC 0.79) and both appeared to be superior to those of DIPSS (AIC 4204, AUC 0.74). Furthermore, as illustrated in Fig. 4, there was significant alignment of risk distribution between GIPSS and MIPSS70-plus, especially for “low” and “high” risk patients. In other words, a patient with GIPSS “high” risk disease is most likely to also be in the MIPSS70-plus “high” or “very high” risk category, whereas a patient with GIPSS “low” risk disease is almost certain to be in the MIPSS70-plus “low” risk category as well (Fig. 4). Accordingly, additional prognostic information from MIPSS70-plus might not be necessary in GIPSS “high” or “low” risk disease categories. On the other hand, a patient with GIPSS “intermediate-1” risk disease might be reclassified as MIPSS70-plus low, intermediate, or high risk disease and one with GIPSS “intermediate-2” risk disease as MIPSS70-plus very high, high, or intermediate risk disease (Fig. 4). Finally, GIPSS was shown to be effective in also predicting leukemia-free survival; HRs (95% CI) were 16.6 (4.8–104.1) for VHR, 7.0 (2.1–43.8) for high risk and 3.0 (0.9–18.6) for low risk GIPSS categories.
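As an illustration of what the AUC values above measure, the following minimal Python sketch computes the AUC as the probability that a randomly chosen patient who died within the time frame carries a higher risk score than a randomly chosen patient who did not (ties count as one half). This is a generic rank-based calculation, not the logistic regression procedure used in the study; the function name `auc` and the toy scores are hypothetical.

```python
def auc(scores, died):
    """Rank-based AUC for a risk score against a binary outcome.

    scores : numeric risk scores (e.g., GIPSS point totals)
    died   : True if the patient died within the time frame of interest
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    # Count concordant pairs; tied pairs contribute one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: higher scores mostly belong to patients who died.
toy_auc = auc([3, 2, 2, 0], [True, True, False, False])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two outcome groups.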

Fig. 3

Comparison of survival data in 641 patients with primary myelofibrosis stratified by genetically inspired prognostic scoring system (GIPSS; Fig. 3a), mutation-enhanced international prognostic scoring system (MIPSS70-plus; Fig. 3b), or dynamic international prognostic scoring system (DIPSS; Fig. 3c). *AIC Akaike information criterion, **AUC area under the curve

Fig. 4

Risk distribution among 641 patients with primary myelofibrosis according to GIPSS (genetically inspired prognostic scoring system) and MIPSS70-plus (mutation-enhanced international prognostic system including karyotype) (numbers in cells indicate percentages)

Discussion

At present, the two main clinically derived risk models in PMF, IPSS [4] and DIPSS [5], remain useful for routine patient management. However, higher level care requires additional biologic information that not only refines prognostication but might also guide the implementation of targeted therapy [19]. Towards that end, cytogenetic information was first incorporated into the DIPSS model, resulting in DIPSS-plus [20], and more recently both cytogenetic and mutation information were utilized in the development of MIPSS70-plus [6]. The latter was designed with transplant-age patients (age ≤70 years) in mind and was based on four clinical (hemoglobin <10 g/dl, leukocyte count >25 × 10⁹/l, circulating blasts ≥2%, and constitutional symptoms) and three genetic risk components (karyotype, driver mutational status, and high risk mutations). Since the publication of MIPSS70-plus in December 2017 [6], we have further refined cytogenetic risk stratification in PMF [7] and also identified U2AF1 Q157 mutation as a new independent risk factor for overall survival [11], thus providing the opportunity to develop a new risk model that is exclusively based on genetic risk factors.

GIPSS represents the first step in our aspiration to fully replace clinical variables with genetic markers for prediction of survival in PMF. Our working hypothesis, in this regard, considers clinical phenotype in PMF as a surrogate for currently known and unknown underlying genetic lesions. In the current study, the inter-independent prognostic relevance of previously recognized adverse mutations in PMF was vetted by multivariable analysis that also included driver mutational status and the revised cytogenetic risk stratification; accordingly, the study confirmed the independent prognostic relevance of VHR karyotype, unfavorable karyotype, and certain mutations, including the prognostically favorable type 1/like CALR mutation and the prognostically unfavorable ASXL1, SRSF2, and U2AF1 Q157 mutations; the respective frequencies of these prognostic biomarkers, at time of patient referral to a tertiary care center, were approximately 8, 19, 15, 38, 14, and 9% [11, 17]. As underlined in the Methods section, the current study required a minimum of 500 informative cases for a specific mutation to be included in the analysis. Accordingly, the additional prognostic contribution of other prognostically relevant but less frequent mutations, such as LNK, RUNX1, and CBL, was not addressed in the current report [18]. It should also be noted that the lack of multivariable significance for EZH2 or IDH1/IDH2 mutations in the current study should not be regarded as definitive. In other words, GIPSS should not be considered a finished product but rather a template for incorporating additional genetic information as it becomes available. In this regard, it is crucial to recognize the important prognostic interaction between karyotype and mutations; the prospect of considering additional mutations in future genetic risk models requires clear demonstration of their karyotype-independent prognostic value. For example, the presence of high risk mutations imparts little to no additional prognostic effect in patients with VHR karyotype, whereas their absence provides additional comfort in asserting the excellent prognosis associated with favorable karyotype [7].

GIPSS offers a low-complexity and practical risk model for PMF that is based exclusively on karyotype and a limited number of mutations, including ASXL1, SRSF2, U2AF1, and CALR. Application of GIPSS requires familiarity with the recently revised three-tiered cytogenetic risk stratification for PMF [7], as well as recognition of the prognostic distinction between different CALR and U2AF1 mutation variants [8, 11, 14]. With regard to the former, the new cytogenetic risk categories include “favorable” (normal karyotype or sole abnormalities of 20q−, 13q−, +9, chromosome 1 translocation/duplication, or sex chromosome abnormality including −Y), “VHR” (single or multiple abnormalities of −7, inv(3), i(17q), 12p−, 11q−, and autosomal trisomies other than +8 or +9), and “unfavorable” (all other abnormalities) karyotype [7]. Assessment of ASXL1 and SRSF2 mutations is uncomplicated, since one is simply required to document their presence or absence; we have recently reported that the type of ASXL1 mutation did not affect its prognostic relevance [9]. In contrast, determining the type of mutation is prognostically critical for both U2AF1 and CALR. U2AF1 mutations in PMF involve either the Q157 or S34 amino acid positions, but only those affecting the Q157 residue (i.e., Q157P and Q157R) are prognostically relevant [11]. Similarly, CALR mutations in PMF come in two types: type 1/like and type 2/like [14]. The type 1 CALR mutation constitutes a 52-bp deletion (p.L367fs*46) and type 2 a 5-bp TTGTC insertion (p.K385fs*47). CALR mutations other than type 1 or type 2 are categorized as type 1/like or type 2/like variants, based on structural similarities (alpha helix propensity) to the corresponding classical mutants [14, 16]. It is now well established that the favorable survival effect of CALR mutations in PMF is attributable solely to the type 1/like variant [14, 15, 21].

Taken together, one can envision a step-wise prognostication approach in PMF that starts with the simpler GIPSS model, which is based on karyotype and mutations only and reliably selects candidates for alloSCT (GIPSS high risk disease) or long-term observation with little or no therapeutic intervention (GIPSS low risk disease) (Fig. 5). In other words, for the purposes of major therapeutic decisions, additional prognostic information from MIPSS70-plus or other clinically derived prognostic models (e.g., IPSS and DIPSS) might not be necessary for GIPSS “high” or GIPSS “low” risk patients (Figs. 4 and 5). On the other hand, we favor more comprehensive risk scoring for prognostication in GIPSS intermediate-1 or intermediate-2 risk disease, which is currently provided by MIPSS70-plus (http://www.mipss70score.it/) [6]; for example, as outlined in Fig. 4, approximately 20% of patients with GIPSS intermediate-1 risk disease are reclassified as high risk according to MIPSS70-plus, which is a treatment-relevant change in risk status; whether the outcome of this particular group of patients is more in line with their GIPSS or MIPSS70-plus risk level requires further investigation. Regardless, using conventional statistical tools (e.g., AIC and AUC), we were able to demonstrate the non-inferiority of GIPSS, compared to MIPSS70-plus and other prognostic models for PMF, in its discrimination ability and prediction accuracy (Fig. 3). The fact that clinical variables in PMF currently continue to display mutation- and karyotype-independent prognostic significance is more a reflection of our incomplete knowledge regarding the genetic makeup of the underlying clonal process than of the quality of their performance. Accordingly, it is our full intention to continue recruiting additional mutations of prognostic relevance in PMF and further limit prognostic reliance on clinical variables.
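The step-wise approach outlined above can be written as a small Python sketch. It encodes only what is stated in the text (high risk: alloSCT candidacy; low risk: observation; intermediate risk: further scoring with MIPSS70-plus) and is an illustration rather than a treatment guideline. The function name `suggested_next_step` and the returned strings are hypothetical.

```python
def suggested_next_step(gipss_category: str) -> str:
    """Illustrative triage by GIPSS risk category, as described in the text."""
    if gipss_category == "high":
        # GIPSS alone is taken as sufficient to select transplant candidates.
        return "consider alloSCT candidacy"
    if gipss_category == "low":
        # Little or no therapeutic intervention.
        return "long-term observation"
    if gipss_category in ("intermediate-1", "intermediate-2"):
        # More comprehensive scoring is favored for these groups.
        return "refine risk with MIPSS70-plus"
    raise ValueError(f"unknown GIPSS category: {gipss_category!r}")


# Example: an intermediate-1 patient is referred for further risk scoring.
step = suggested_next_step("intermediate-1")
```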

Fig. 5

Proposed treatment decision tree, including timing of allogeneic stem cell transplant, based on GIPSS (genetically inspired prognostic scoring system)-based risk stratification. It is underscored that the proposed algorithm is provided in order to illustrate the potential value of GIPSS in clinical practice, and not as a definitive treatment guideline, which requires additional validation