Is the Sokal or EUTOS long-term survival (ELTS) score a better predictor of responses and outcomes in persons with chronic myeloid leukemia receiving tyrosine-kinase inhibitors?

Data from 1661 consecutive subjects with chronic-phase chronic myeloid leukemia (CML) receiving initial imatinib (n = 1379) or a 2nd-generation tyrosine-kinase inhibitor (2G-TKI; n = 282) were interrogated to determine whether the Sokal or the European Treatment and Outcome Study for CML (EUTOS) long-term survival (ELTS) score was the more accurate predictor of responses and outcomes. Both scores predicted probabilities of achieving complete cytogenetic response (CCyR), major molecular response (MMR), failure- and progression-free survivals (FFS, PFS), and survival in all subjects and in those receiving imatinib therapy. However, the ELTS score was a better predictor of MR4, MR4.5, and CML-related survival than the Sokal score. In subjects receiving 2G-TKI therapy, only the ELTS score accurately predicted probabilities of CCyR, MMR, MR4, FFS, and PFS. In propensity-score-matched analyses, subjects classified as intermediate risk by the ELTS score who received a 2G-TKI had better responses (p values ranging from < 0.001 to 0.061), FFS (p = 0.002), and PFS (p = 0.03) but not survival. Our data suggest better overall prediction accuracy for the ELTS score compared with the Sokal score in CML patients, especially those receiving 2G-TKIs. People identified as intermediate risk by the ELTS score may benefit more from initial 2G-TKI therapy in achieving surrogate endpoints but not survival, especially when a briefer interval to stopping TKI therapy is the therapy objective.


INTRODUCTION
Several risk scores have been developed to predict responses and/or outcomes of persons with chronic-phase chronic myeloid leukemia (CML). However, predictive scores are only accurate in the context of the therapy given (as opposed to prognostic scores). For example, the Sokal and Hasford scores were developed in persons receiving chemotherapy and/or interferon [1,2]. The accuracy of these scores in persons receiving TKI therapy is controversial [3][4][5][6]. In contrast, the European Treatment and Outcome Study for CML (EUTOS) and EUTOS long-term survival (ELTS) scores were developed in persons receiving predominantly imatinib [7,8]. The Sokal and ELTS scores are the most commonly used today in persons receiving TKI therapy. Several studies reported that the ELTS score is more accurate in identifying high-risk populations and is better able to predict CML-related deaths and survival in persons receiving imatinib or a 2nd-generation TKI (2G-TKI) [9][10][11][12][13][14][15][16]. The ELTS score is also a more accurate predictor of the probability of achieving a complete cytogenetic response (CCyR) and major molecular response (MMR) [11,16]. Consequently, the ELTS score is preferred in the 2020 European LeukemiaNet (ELN) recommendations [17].
Few studies have critically compared the Sokal and ELTS scores as predictors of cytogenetic and molecular responses and of other outcomes such as failure- and progression-free survivals (FFS and PFS), especially in persons receiving 2G-TKIs, which some recommend for persons with intermediate- or high-risk CML [18]. We compared prediction accuracies of the Sokal and ELTS scores on responses and outcomes in 1661 consecutive subjects with chronic-phase CML receiving imatinib or a 2G-TKI. We found better overall prediction accuracy for the ELTS score. People identified as intermediate risk by the ELTS score may benefit more from 2G-TKI therapy compared with imatinib in achieving surrogate endpoints but not CML-related survival or survival, especially when a briefer interval to stopping TKI therapy is the therapy objective.

SUBJECTS AND METHODS

Subjects
Table fragment (row label not recoverable): (3, 193); 60 (3, 193); 46 (3, 160); p < 0.001. Table note: The data are presented as the number (%) or median (range), except where otherwise noted. 2G-TKI, second-generation tyrosine kinase inhibitor; Ph + ACA, additional chromosomal aberrations in Philadelphia-positive cells; PLT, platelet; TKI, tyrosine kinase inhibitor; WBC, white blood cell.

We interrogated data from 1661 consecutive newly diagnosed subjects ≥ 18 years of age with chronic-phase CML receiving imatinib, dasatinib, or nilotinib therapy at Peking University People's Hospital from January 2006 to March 2021. Covariates determined at diagnosis included sex, age, comorbidities, hemoglobin concentration, WBC and platelet counts, cytogenetic analyses, and initial TKI therapy. Sokal and ELTS scores at diagnosis were calculated as described [2,8]. Therapy responses and outcomes were extracted from medical records. Physicians and patients jointly chose the initial TKI based on which TKIs were available, anticipated safety and efficacy, and economics. The initial imatinib dose was 400 mg daily; nilotinib, 300 mg twice daily; dasatinib, 100 mg daily. Dose and/or type of TKI were adjusted during therapy based on responses, adverse events, and operative ELN recommendations [17][19][20][21]. The study was approved by the Ethics Committee of Peking University People's Hospital and complied with the Declaration of Helsinki. Subjects gave written informed consent.
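The text gives no formulas for the two scores, only citations [2,8]. As a minimal sketch, the commonly published forms of the Sokal and ELTS calculations can be written as below. The coefficients and cut-offs are assumptions taken from the widely circulated versions of these scores and should be verified against references [2] and [8]. The function names and input units (spleen in cm below the costal margin, platelets in 10^9/L, peripheral blood blasts in %) are illustrative and not from the study.

```python
import math


def sokal_score(age_years, spleen_cm, platelets_e9_per_l, blasts_pct):
    """Sokal relative risk, commonly published form (assumed; verify against ref. [2])."""
    exponent = (
        0.0116 * (age_years - 43.4)
        + 0.0345 * (spleen_cm - 7.51)
        + 0.188 * ((platelets_e9_per_l / 700.0) ** 2 - 0.563)
        + 0.0887 * (blasts_pct - 2.10)
    )
    return math.exp(exponent)


def sokal_risk(score):
    """Conventional Sokal cut-offs: < 0.8 low, 0.8 to 1.2 intermediate, > 1.2 high."""
    if score < 0.8:
        return "low"
    if score <= 1.2:
        return "intermediate"
    return "high"


def elts_score(age_years, spleen_cm, platelets_e9_per_l, blasts_pct):
    """ELTS score, commonly published form (assumed; verify against ref. [8])."""
    return (
        0.0025 * (age_years / 10.0) ** 3
        + 0.0615 * spleen_cm
        + 0.1052 * blasts_pct
        + 0.4104 * (platelets_e9_per_l / 1000.0) ** -0.5
    )


def elts_risk(score):
    """Conventional ELTS cut-offs: <= 1.5680 low, <= 2.2185 intermediate, else high."""
    if score <= 1.5680:
        return "low"
    if score <= 2.2185:
        return "intermediate"
    return "high"
```

Both functions take the same four baseline covariates, so the same subject can fall into different risk groups under the two scores because of how each formula weights age and platelet count.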

Diagnosis, monitoring, responses, and outcomes
Diagnosis, monitoring, and therapy responses conformed to operative ELN recommendations [17][19][20][21]. Bone marrow cytogenetic analyses used G-banding. BCR::ABL1 transcript levels in blood were assessed by quantitative real-time polymerase chain reaction (qRT-PCR) with ABL1 as control and converted to the international scale (BCR::ABL1 IS) using our laboratory-specific conversion factor of 0.65 (Institute of Medical and Veterinary Science International Reference Laboratory, Adelaide, Australia) [22]. Response assessment was performed on the intention-to-treat population. Hematologic response was monitored every 1-2 weeks until a complete hematologic response (CHR) was achieved and every 3-6 months thereafter. Cytogenetic response was assessed at baseline and then every 3-6 months until a CCyR was achieved, and assessment was repeated at therapy failure. High-risk additional cytogenetic abnormalities (ACAs) were defined according to 2020 ELN criteria [17]. Molecular monitoring was done at baseline and every 3 months until a major molecular response (MMR) was achieved and every 3-6 months thereafter. Screening for ABL1 mutations was done in subjects with a suboptimal or warning response according to operative ELN criteria [17][19][20][21].
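The conversion to the international scale described above is a single multiplication by the laboratory-specific factor. A minimal sketch follows, assuming the raw result is expressed as a BCR::ABL1/ABL1 percentage and using the commonly cited response thresholds (MMR ≤ 0.1%, MR4 ≤ 0.01%, MR4.5 ≤ 0.0032% on the international scale). The thresholds and function names are assumptions for illustration; the text does not state them.

```python
def to_international_scale(raw_ratio_pct, conversion_factor=0.65):
    """Convert a local BCR::ABL1/ABL1 ratio (%) to the international scale.

    0.65 is the laboratory-specific conversion factor reported in the text.
    """
    return raw_ratio_pct * conversion_factor


def molecular_response(bcr_abl1_is_pct):
    """Classify depth of molecular response from a BCR::ABL1 IS value (%).

    Thresholds are the commonly cited ones and are assumed here.
    """
    if bcr_abl1_is_pct <= 0.0032:
        return "MR4.5"
    if bcr_abl1_is_pct <= 0.01:
        return "MR4"
    if bcr_abl1_is_pct <= 0.1:
        return "MMR"
    return "no MMR"
```

For example, a local ratio of 0.1% converts to 0.065% on the international scale, which this sketch classifies as MMR.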

Statistical analyses
Descriptive statistics were used to summarize covariates. Categorical variables are reported as counts and percentages and continuous variables as medians and ranges. The Pearson chi-squared test (for categorical variables) and the Mann-Whitney U test (for continuous variables) were used to compare the imatinib and 2G-TKI cohorts. Cumulative incidences of CCyR, MMR, MR4, and MR4.5 were calculated using the Fine-Gray method, which considered competing events such as death, transplant, loss to follow-up, and/or withdrawal of consent. Failure- and progression-free survivals (FFS, PFS), CML-related survival, and survival were calculated using the Kaplan-Meier estimator and compared using log-rank tests.
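To illustrate why competing events matter when estimating the cumulative incidence of a response, here is a minimal sketch of a nonparametric cumulative incidence estimator (Aalen-Johansen type). It is not the Fine-Gray method named in the text, which is regression-based; it only shows the underlying idea that a subject who has a competing event can no longer achieve the response. The event coding (0 = censored, 1 = event of interest, 2 = competing event) and the function name are assumptions for illustration.

```python
def cumulative_incidence(times, events, event_of_interest=1):
    """Nonparametric cumulative incidence in the presence of competing events.

    times  : follow-up time for each subject
    events : 0 = censored, event_of_interest = response, any other value = competing event
    Returns a list of (time, cumulative incidence) at each distinct time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for (tt, e) in data[i:] if tt == t]
        d_interest = sum(1 for e in tied if e == event_of_interest)
        d_any = sum(1 for e in tied if e != 0)
        # Increment uses the event-free probability just before time t.
        cif += event_free * d_interest / n_at_risk
        event_free *= 1.0 - d_any / n_at_risk
        curve.append((t, cif))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve
```

With four subjects followed for 1, 2, 3, and 4 months who have, respectively, a response, a competing event, censoring, and a response, the estimate rises to 0.25 at month 1 and to 0.75 at month 4.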
Potential predictive covariates for diverse responses and outcomes were tested in univariable analyses and those with p < 0.2 were included in multivariable analyses using a backward-elimination process to fit a Cox regression model. Cox regression models were built to identify independent covariates associated with responses and outcomes reported as hazard ratios (HRs) with 95% confidence intervals (CIs).
FFS was calculated from TKI-therapy start to therapy failure or censored at the last follow-up. PFS was calculated from TKI-therapy start to progression, death at any time, or censored at the last follow-up. CML-related survival was calculated from TKI-therapy start to death from CML progression or censored at the last follow-up. Survival was calculated from TKI-therapy start to death from any cause or censored at the last follow-up.
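The time-to-event definitions above map directly onto the inputs of the Kaplan-Meier estimator: a duration from TKI-therapy start and an indicator of whether the event (for FFS, therapy failure) occurred or the subject was censored at last follow-up. A minimal sketch in plain Python follows; the function name and data layout are illustrative and not taken from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : time from TKI-therapy start to event or last follow-up
    events : 1 = event occurred (e.g. therapy failure for FFS), 0 = censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for (tt, e) in data[i:] if tt == t]
        n_events = sum(tied)
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= len(tied)
        i += len(tied)
    return curve
```

Censored subjects contribute to the risk set up to their last follow-up and then drop out without lowering the curve, which is how the unequal follow-up of the two cohorts is accommodated.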
Propensity-score matching was used to explore whether the Sokal or ELTS score was a better predictor of responses and outcomes to imatinib or 2G-TKI as 1st therapy, including all covariates tested in the univariable
In multivariable analyses, both scores were significantly associated with the probabilities of CCyR (Sokal,   concentration, and higher WBC counts were significantly associated with lower probabilities of molecular responses and/or inferior outcomes (Table 2).

2G-TKI cohort
In total, 267 of 282 subjects (95%) receiving initial 2G-TKI were studied for CCyR. In total, 270 with common BCR::ABL1 transcripts were studied for MMR, MR4 [1.7, 9.9]; p = 0.002). However, the Sokal score did not accurately predict responses or outcomes. Male sex, lower hemoglobin concentration, and higher WBC counts were significantly associated with lower probabilities of molecular responses and/or worse outcomes (Table 2).
Is the Sokal or ELTS score a better predictor of responses and outcomes?
Because of significant differences in baseline covariates between the imatinib and 2G-TKI cohorts, we used propensity-score matching to balance the cohorts. In total, 1332 matched subjects were identified in the imatinib (n = 1064; 80%) and 2G-TKI (n = 268; 20%) cohorts (Table 3).
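The matching algorithm, ratio, and caliper are not given in the surviving text. As a minimal sketch of one common approach, the code below does greedy nearest-neighbor matching on propensity scores that have already been estimated (for example, by logistic regression on the baseline covariates). The `ratio` and `caliper` parameters, their default values, and the function name are hypothetical and are not the study's settings.

```python
def greedy_match(treated, controls, ratio=1, caliper=0.05):
    """Greedy nearest-neighbor propensity-score matching without replacement.

    treated, controls : dicts mapping subject id -> propensity score
    ratio             : maximum number of controls per treated subject (hypothetical)
    caliper           : maximum allowed score difference (hypothetical)
    Returns a dict mapping each matched treated id to a list of control ids.
    """
    available = dict(controls)
    matches = {}
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        chosen = []
        for _ in range(ratio):
            candidates = [
                (abs(ps - t_ps), c_id)
                for c_id, ps in available.items()
                if abs(ps - t_ps) <= caliper
            ]
            if not candidates:
                break
            _, best = min(candidates)
            chosen.append(best)
            del available[best]  # each control is used at most once
        if chosen:
            matches[t_id] = chosen
    return matches
```

Subjects with no counterpart within the caliper are left unmatched, which is consistent with the matched cohorts being smaller than the full cohorts.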
Median follow-up was 55 months (IQR, 30-85 months) in the imatinib cohort and 46 months (IQR, 20-64 months; p < 0.001) in the 2G-TKI cohort. In the low- and high-risk cohorts defined by either the Sokal or ELTS score, there were no significant differences in FFS, PFS, CML-related survival, or survival between subjects receiving initial imatinib and those receiving a 2G-TKI; only the probabilities of cytogenetic and/or molecular responses differed (Supplementary Figs. 1-2). However, in the intermediate-risk cohort defined by either the Sokal or ELTS score, subjects receiving initial 2G-TKI therapy had higher probabilities of CCyR, MMR, and MR4.5 and a better FFS compared with those receiving initial imatinib. Better MR4 and PFS were detected only with the ELTS score (p < 0.001 and p = 0.032). However, initial TKI therapy had no impact on CML-related survival or survival using either the Sokal or ELTS score (Fig. 5). We did not analyze the interval to stopping TKI therapy or the success rate of therapy-free remission.

DISCUSSION
We compared predictive accuracies of the Sokal and ELTS scores in 1661 subjects with chronic-phase CML. We found that the ELTS score was a better overall response and outcome predictor, especially in subjects receiving initial 2G-TKI therapy. Based on HRs and CIs in multivariable analyses, the ELTS score was a better discriminator between risk cohorts than the Sokal score.
Our data are consistent with some previous findings [10][11][12][13][14][15][16]. Pfirrmann and colleagues reported that the ELTS score was a better survival predictor than the Sokal score [8,14]. However, our study focused on FFS rather than survival. As such, it is more likely to be of use to physicians in choosing the best initial TKI therapy. Geelen et al. reported that the ELTS score identified significant differences in probabilities of MMR, CML-related death, and survival in subjects receiving 2G-TKIs compared with the Sokal score [11]. We found that the ELTS score predicted probabilities of CCyR, MMR, MR4, FFS, and PFS in subjects receiving initial 2G-TKI therapy but not MR4.5. However, the ELTS score was not predictive of CML-related survival or survival. Discordances between our data and those of Geelen et al. might result from the younger age of our subjects, age being an independent predictive covariate for survival in many studies [9][10][11][12][13][14][15][16]. Also, these studies may not have been comparable for therapies given after initial 2G-TKI therapy. It is not surprising that the ELTS score is a better predictor of responses and outcomes of TKI therapy, because it was derived from a dataset of subjects receiving TKI therapy, whereas the Sokal score was developed in a dataset of subjects receiving other therapies. As such, the Sokal score is best considered a prognostic rather than a predictive score, better reflecting CML biology than therapy.
We found fewer non-CML-related deaths compared with other studies [9][10][11][12][13][14][15][16]. There are several possible explanations, including the younger age of our subjects, who would be expected to be otherwise healthier, have fewer comorbidities, and therefore have fewer competing causes of death [23,24]. Also, because ours is a tertiary referral center, subject-selection biases are likely. For example, persons with substantial other health problems were less likely to travel to our center.
One potentially problematic area is defining failure. In our literature review, we found no consistent definition. We used definitions proposed in the 2020 ELN CML recommendations [17]. Because there was no consensus definition of accelerated phase, we analyzed our data including and excluding subjects in whom progression to accelerated phase was the failure event. Our conclusions were unchanged.
Several studies report that initial 2G-TKIs are associated with faster cytogenetic and molecular responses compared with imatinib and with lower rates of progression, especially in persons with Sokal intermediate- and high-risk scores [25][26][27][28][29][30][31]. However, this advantage for 2G-TKIs does not translate into better PFS, CML-related survival, or survival. 2G-TKIs are recommended for initial therapy of intermediate- and high-risk cohorts in the National Comprehensive Cancer Network (NCCN) clinical practice guidelines based on the risk of progression rather than PFS, CML-related survival, or survival [18]. This differs from the ELN 2020 recommendation, which does not suggest a TKI preference based on risk cohort [17]. Complicating the NCCN recommendation is the question of which predictive score should be used to classify someone as intermediate- or high-risk.
In our propensity-matching analyses, in subjects classified as intermediate risk using the Sokal or ELTS scores, we found that initial 2G-TKI therapy improved the proportions of CCyR and molecular responses and improved FFS compared with initial imatinib therapy but not CML-related survival or survival. In subjects classified as intermediate risk using the ELTS but not the Sokal score, initial therapy with a 2G-TKI resulted in better PFS but not better CML-related survival or survival. This finding may influence TKI-therapy decisions for physicians focused on surrogate endpoints. That 2G-TKIs had no advantage in high-risk subjects identified by either score could reflect relatively few subjects but also no favorable impact of 2G-TKIs when disease biology is highly unfavorable.
Consistent with several studies, we found that females had better molecular responses to TKI therapy than males and lower probabilities of therapy failure and transformation to accelerated and blast phases [32][33][34]. This advantage might reflect different compliance or leukemia biology or other factors [35]. Similar to previous studies, we found a lower hemoglobin concentration and higher WBC counts were associated with worse responses and/or outcomes [36][37][38][39].
Our study has limitations. First, it is retrospective. Second, we lacked a validation cohort. Third, the number of subjects receiving initial 2G-TKI therapy was only 282. Fourth, 2G-TKIs were available only after 2011, resulting in an imbalance in follow-up. Also, therapy options for subjects failing imatinib before 2011 were restricted. Fifth, use of imatinib vs. a 2G-TKI was neither random nor prespecified. As such, there are likely selection biases, which we tried to account for using propensity-score matching. We accept this is an imperfect simulation of a randomized controlled trial. Sixth, our subjects were younger than those in most other CML studies, which were done in persons of predominantly European descent, and our findings need validation in these populations. Seventh, our data are from a specialized tertiary CML center with subjects coming from all over a large country. This obviously introduces subject-selection biases. Eighth, we did not consider other 2G-TKIs approved for initial therapy, including bosutinib and radotinib. Whether our conclusions apply to these drugs is unknown. Ninth, we did not analyze the interval to stopping TKI therapy or the success rate of therapy-free remission. Last, we did not monitor adherence to TKI therapy, which may have differed for different TKIs.
In conclusion, we found better overall prediction accuracy for the ELTS score compared with the Sokal score in persons with chronic-phase CML receiving TKI therapy, especially those receiving 2G-TKIs. People identified as intermediate risk in the ELTS score may benefit from 2G-TKI therapy compared with imatinib in achieving surrogate endpoints but not in CML-related survival or survival. The interval from start to stopping TKI-therapy and success rates of therapy-free remission were not compared.