Epidemiology and prognostic nomogram for chronic eosinophilic leukemia: a population-based study using the SEER database

Chronic Eosinophilic Leukemia (CEL), a rare and intricate hematological disorder characterized by uncontrolled eosinophilic proliferation, presents clinical challenges owing to its infrequency. This study aimed to investigate epidemiology and develop a prognostic nomogram for CEL patients. Utilizing the Surveillance, Epidemiology and End Results database, CEL cases diagnosed between 2001 and 2020 were analyzed for incidence rates, clinical profiles, and survival outcomes. Patients were randomly divided into training and validation cohorts (7:3 ratio). LASSO regression analysis and Cox regression analysis were performed to screen the prognostic factors for overall survival. A nomogram was then constructed and validated to predict the 3- and 5-year overall survival probability of CEL patients by incorporating these factors. The incidence rate of CEL was very low, with an average of 0.033 per 100,000 person-years from 2001 to 2020. The incidence rate significantly increased with age and was higher in males than females. The mean age at diagnosis was 57 years. Prognostic analysis identified advanced age, specific marital statuses, and secondary CEL as independent and adverse predictors of overall survival. To facilitate personalized prognostication, a nomogram was developed incorporating these factors, demonstrating good calibration and discrimination. Risk stratification using the nomogram effectively differentiated patients into low- and high-risk groups. This study enhances our understanding of CEL, offering novel insights into its epidemiology, demographics, and prognostic determinants, while providing a possible prognostication tool for clinical use. However, further research is warranted to elucidate molecular mechanisms and optimize therapeutic strategies for CEL.


Study population and data acquisition
Data for this study were sourced from the Surveillance, Epidemiology, and End Results (SEER) Program (https:// seer.cancer.gov/), maintained by the National Cancer Institute (NCI).The data were collected using SEER*Stat software version 8.4.1 (https:// seer.cancer.gov/ seers tat/, accessed on August 1, 2023).The SEER 17 database [Incidence-SEER Research Data, 17 registries, Nov 2022 Sub (2000-2020)] was utilized to extract data of disease incidence by using the "rate session".Patients diagnosed with CEL between 2001 and 2020 were selected from the "Incidence-SEER Research Plus Data, 17 Registries, Nov 2022 Sub (2000-2020)" database using the "case listing session".Only cases with known age (censored at age 89 years) and malignant behavior were included.The inclusion criteria were as follows: the International Classification of Diseases for Oncology (ICD-O-3) histologic code (9964/3).The exclusion criteria were as follows: (1) the diagnosis confirmation was "unknown"; (2) the patient's survival time was 0 or unknown; (3) age below 20 years old.Ultimately, 487 patients with CEL were included in the final cohort.Ethical approval was not required as SEER data are publicly available and anonymized, precluding patient reidentification.A flowchart depicting the selection process is presented in Fig. 2.

Definition of variables
The analysis encompassed an array of variables, including age, sex, race, marital status, year of diagnosis, primary site, vital status, survival months, cause of death (COD) to site recode, cause-specific death classification, cause of death to site, sequence number, first malignant primary indicator, total number of in situ/malignant tumors for the patient, type of reporting source, diagnostic confirmation, surgery of the primary site, median household income inflation adj. to 2021, rural-urban continuum code, chemotherapy recode, and radiation recode.Age was dichotomized into < 60 years and 60+ years, referencing the age at diagnosis of CEL.Race categories included African American, White, and Other (comprising "Asian/Pacific Islander, " "American Indian/Alaska Native, " and "Unknown").Marital status was classified as married, single, or other (encompassing "divorced, " "separated, " "widowed, " "unmarried or domestic partner, " and "unknown").COD information was derived from the "COD to site recode" field.In the SEER database, CEL-related death was defined as "dead (attributed to this cancer diagnosis), " while CEL-unrelated death was defined as death "dead (attributable to causes other than this cancer diagnosis)." Diagnosis years were categorized into "2001-2005, " "2006-2010, " "2011-2015, " and "2016-2020." To assess the results based on the annual median household income at the county levels, patients were categorized into groups based on their annual household income: < $50,000, $50,000-$75,000, and $75,000+.The classification of residence type was based on the Rural-Urban Continuum Codes.These codes separated metropolitan counties according to the population size of their metropolitan areas, and nonmetropolitan counties by their level of urbanization and proximity to a metropolitan area.Overall survival (OS) time was calculated from the date of diagnosis to death or last follow-up.

Statistical analysis
All statistical analyses were conducted using the R program language (version 4.2.1;R Foundation for Statistical Computing, Vienna, Austria).Patients were randomly allocated to training and validation groups (7:3 ratio), with baseline characteristics compared using Student's t-test and chi-square test for continuous and categorical data, respectively.Prognostic risk factors for CEL were identified through least absolute shrinkage and selection operator (LASSO) regression, followed by univariate and multivariate Cox proportional regression analyses to determine independent prognostic factors.Hazard ratios (HR) and 95% confidence intervals (95% CIs) were reported.The Cox proportional hazards regression model was assessed for proportionality assumptions, with no violations detected.Variables with P < 0.05 in the multivariate model were considered significant.Nomograms predicting 3-and 5-year overall survival (OS) probabilities were constructed based on independent prognostic factors.Time-dependent receiver operating characteristic (ROC) curves and corresponding area under the curve (AUC) values assessed discrimination.Calibration curves and decision curve analysis (DCA) nomograms were generated to evaluate performance.Patients from both cohorts were classified into high-and low-risk groups using median nomogram points, with Kaplan-Meier analysis employed for OS estimation and log-rank tests for group comparisons.A two-sided P value < 0.05 indicated statistical significance.

Baseline characteristics of CEL patients
As depicted in Fig. 2, 487 patients were finally identified as CEL in the SEER 17 registry, Nov 2022 Sub (2000-2020) from January 2001 to December 2020.The primary site was bone marrow (n = 487, 100%).For all patients in the study cohort, 61.2% were male, 1.6 folds that of female (n = 189, 38.8%; Table 1).The average age at diagnosis was 57.0 ± 17.0 years (Range: 20-89), with 53.4% under the age of 60 (< 60) and 46.6% aged 60 or older (60+).Most CEL patients were white (73.9%),African American and Other (including Asian/Pacific Islander and American Indian/Alaska Native) occupied 15.6% and 10.5%, respectively.At diagnosis, 57.3% of patients were married, 24.6% had other marital statuses (such as divorced, separated, widowed, unmarried or domestic partner), and 18.1% were single who had never been married.For all CEL cases in this study, 89.7% were primary CEL and 10.3% were secondary CEL that were secondary to other primary malignancies.For the average household income per year, 11.9% were < $50, 000, 45.4% were $50,000-$75, 000, and 42.7% were above $75, 000.About 41.3% were treated with chemotherapy.At the time of last follow-up, 284 (58.3%) patients were alive; 42 (8.6%)deaths were attributable to CEL, 45 (9.2%) patients died of heart diseases, and an additional 116 (23.8%) patients died due to other causes such as diabetes mellitus, cerebrovascular diseases, septicemia, and so on.For all those variables, there were no statistical difference between the training cohort and validation cohort (P > 0.05).The comparison of epidemiologic characteristics was summarized in Table 1.

LASSO regression and independent prognostic factors selection
A total of 9 clinical parameters were included in the training set.According to the results of LASSO Cox regression analysis, age, sex, marital status at diagnosis, household income and sequence were identified for OS risk factors by using the minimum standard value as the criterion (Fig. 3).The Cox regression model was further used to screen the prognostic factors.All the five variables passed the preliminary proportional hazards assumption test: age (P = 0.366), sex (P = 0.355), marital status (P = 0.535), household income (P = 0.208) and sequence (P = 0.454).Univariate Cox regression analysis revealed that age, marital status at diagnosis, and sequence were significantly associated with OS (Table 2).In the multivariate Cox analysis of OS, age, marital status at diagnosis, Vol:.( 1234567890   1 Races including Asian/Pacific Islander, American Indian/Alaska Native and "unknown". 2Sequence number of "primary only" and "1st of 2 or more primaries", indicating CEL was the primary malignancy. 3Sequence number of "2nd of 2 or more primaries", "3rd of 3 or more primaries" and "4th of 4 or more primaries", indicating CEL was secondary to other primary malignancies. 4Marital status at diagnosis was single (never married). 5Marital statuses of divorced, widowed, separated, unmarried or domestic partner and unknown at diagnosis. 6Death attributable to CEL. 7 Dead of other causes such as diabetes mellitus, cerebrovascular diseases, septicemia, and so on.

Construction of prognostic nomogram
By incorporating the three independent prognostic factors including age, marital status at diagnosis, and sequence, a nomogram was constructed to predict the 3-and 5-year OS probability of CEL patients (Fig. 4).The total points were calculated by integrating scores related to age, marital status, sequence and projected to the bottom scale to predict the OS probability at 3 and 5 years.

Table 2. Univariable and multivariable cox regression analysis for overall survival of patients with CEL.
CEL chronic eosinophilic leukemia, Chemo chemotherapy, CI confidence interval, HR hazard ration, OS overall survival. 1Marital status of single (never married) at diagnosis. 2 Marital statuses of divorced, widowed, separated, unmarried or domestic partner and unknown at diagnosis. 3Consisted of "one primary only" and "1st of 2 or more primaries". 4Consisted of "2nd of 2 or more primaries", "3rd of 3 or more primaries" and "4th of 4 or more primaries", indicating CEL was secondary to other primary malignancies.www.nature.com/scientificreports/

Evaluation and validation of the nomogram
The calibration curve of the nomogram for the training cohort revealed a close match between the predicted and observed OS probability at the 3-and 5-year intervals (Fig. 5A).Additionally, validation cohort calibration plots at 3-and 5 years also showed good agreement between prediction and actual observation (Fig. 5B).Timedependent ROC analyses showed the accuracy of the nomogram models in predicting 3-and 5-year OS probability in the training set, with AUC values of 0.702 and 0.736, respectively (Fig. 6A), and the 3-year and 5-year AUC of the validation set was of 0.731 and 0.754, respectively (Fig. 6B).The DCA was employed to evaluate the

Survival analysis between the stratified risk groups
The score of each variable was generated from the nomogram and the cumulative scores were calculated for all the patients.The entire cohort was stratified into low-and high-risk subgroups according to the median risk score.Kaplan-Meier analysis of OS revealed significant differences between the low-and high -risk groups for both training set (P < 0.0001, Fig. 8A) and validation set (P < 0.0001, Fig. 8B), which underscores the exceptional capacity of the nomogram for effective risk stratification.

Discussion
CEL represents a rare and intricate hematological disorder characterized by uncontrolled eosinophilic proliferation 10 .Given its rarity, there is limited published literature on CEL, mostly comprising case reports or small case series, the incidence and clinical characteristics have not been comprehensively studied yet 9,16 .The present study identified 487 CEL patients from 2001 to 2020 using the SEER database, representing the largest cohort describing the incidence and clinical characteristics of CEL patients to date.We also developed and validated a nomogram to predict the 3-and 5-year overall survival probability of CEL patients based on the screened prognostic factors.The epidemiology of CEL remains incompletely characterized due to its rarity and clinical heterogeneity 1,2 .Available studies suggest an estimated incidence rate of approximately 0.036 per 100,000 individuals for all hypereosinophilic syndromes (HES), including CEL 17 .In this study, analysis of CEL from the SEER database between 2001 and 2020 revealed a low age-adjusted incidence rate (AIR) of 0.033 per 100,000 person-years, and the incidence rate significantly decreased in 2011-2020 compared to 2001-2010.The decreased incidence rate of CEL may be partially attributed to modifications in its diagnostic criteria.In the 2008 revision of the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia, a specific group of patients with eosinophilia and gene rearrangements involving PDGFRA, PDGFRB, or FGFR1 were excluded from the diagnosis of CEL and classified as a separate entity, namely "myeloid and lymphoid neoplasms with eosinophilia and abnormalities of PDGFRA, PDGFRB, or FGFR1 (MLNE)", which have distinct clinical and molecular features and respond well to tyrosine kinase inhibitors 18 .Therefore, this reclassification may have reduced the number of cases that were previously diagnosed as CEL based on the older criteria.And the peak incidence in 2008 observed in this study, followed by a decline, coincides with the implementation of the revised WHO classification, suggesting that the initial higher rates may have included cases that would later be reclassified under the new criteria, which supports the hypothesis that the redefinition of CEL has had a significant impact on its reported incidence.Furthermore, the advancement in diagnostic technologies and understanding of molecular genetics over the past two decades has allowed for more accurate diagnosis of myeloproliferative neoplasms (MPNs).This progress may have resulted in more cases being classified into other specific subtypes of MPNs rather than being broadly categorized as CEL.www.nature.com/scientificreports/Moreover, it was once reported that CEL exhibits a male predominance, and the median age at diagnosis was 62 years 9 .In this study, the average age of the patients was 57.0 ± 17.0 years, and the IRR of male-to-female was 1.66 (95% CI 1.39-1.98),which is in agreement with the previous research 9,19 .Furthermore, the current study demonstrated that the incidence rate increased with age, with the IRR of the 60+ age group to the < 60 age group being 3.65 (95% CI, 3.07-4.34),which has not been documented in the literature so far.The age distribution of CEL may reflect the accumulation of genetic and epigenetic alterations that lead to clonal expansion of eosinophils over time 20 .The gender difference of CEL may be influenced by hormonal factor or genetic factors that affect the susceptibility or exposure to eosinophilic stimuli 21,22 .
A previous study showed that the prognosis of CEL is poor in a cohort of 10 patients and the median survival was 22.2 months, with 5 patients developing acute transformation after median of 20 months from diagnosis 9 .However, it is extremely difficult to draw any conclusion from this study due to the small sample size.In the current study, to identify the prognostic factors for CEL, LASSO Cox regression analysis and multivariate Cox regression analysis were performed on a set of clinical variables.The study unveiled that age, marital status at diagnosis, and sequence were independently associated with overall survival.Older age emerged as a significant adverse prognostic factor, with older individuals facing substantially elevated mortality risks (HR 3.74, 95% CI 2.51-5.60),which is in line with previous studies on MPNs 23 .Marital status was also a significant predictor of survival, with the marital statuses of single and other (divorced, separated, widowed, unmarried or domestic partner) having a worse outcome than married patients.This may reflect the impact of social support and psychological factors on cancer survival [24][25][26] .Sequence was another important prognostic factor, with secondary CEL associated with poorer prognosis.This may be due to the presence of other malignancies or comorbidities that affect the treatment response and tolerance 27 .In addition, chemotherapy showed no effect on the OS of CEL patients in this study, which is in accordance with some study showed that CEL patients are usually unresponsive to conventional chemotherapy 9 .
Nomograms have been developed and proven to surpass the conventional staging systems in terms of prognostic accuracy for certain types of cancers 24 .Consequently, the integration of nomograms into clinical practice as reliable tools for predicting cancer prognosis has become increasingly prevalent 28,29 .In this study, a prognostic nomogram was constructed to predict the 3-and 5-year overall survival probability of CEL patients based on those three independent prognostic factors: age, marital status at diagnosis, and sequence.The nomogram demonstrated commendable calibration and discriminative performance in both the training and validation cohorts, indicating satisfactory accuracy and consistency of the nomogram.This study is the first to develop and validate a clinical prognostic model for CEL patients.To the best of our knowledge, this is also the largest study ever conducted.
Effective risk stratification is integral to tailoring treatment strategies and optimizing patient outcomes 30 .Utilizing the nomogram, CEL patients were stratified into low-and high-risk groups based on individual risk scores.Kaplan-Meier analysis of OS revealed substantial distinctions between these risk cohorts, underscoring the nomogram's efficacy in risk stratification.This empowers clinicians to identify patients who may benefit from more aggressive therapeutic interventions or intensified surveillance, ultimately contributing to improved patient care and outcomes.
However, there are some limitations with this study.Firstly, the SEER registry did not document other potential prognostic factors that may have a significant impact on CEL patient outcomes, such as genetic mutation, performance status, LDH level, immunophenotypic features, family history and alcohol/smoking consumption history.Secondly, detailed information about therapy was not recorded in the SEER database, making it impossible to analyze the effect of different treatment regimens.Thirdly, this is a retrospective study, which means that there may be unavoidable potential biases such as selection bias.Finally, while the nomograms of CEL were constructed and verified using the same database, they were not further validated using another independent dataset.Thus, although this study provided important insights on CEL due to the rarity and lack of large-scale multicenter prospective study of this disease, the results should still be interpreted with caution.

Conclusions
In conclusion, this study provides novel insights into the epidemiology and prognosis of CEL in the US population using the SEER database.CEL is a very rare disease with a variable clinical presentation and outcome.Age, marital status at diagnosis, and sequence were identified as independent prognostic factors for overall survival, culminating in the development of a prognostic nomogram to predict the 3-and 5-year overall survival probability of CEL patients.This nomogram may help clinicians provide personalized treatment and clinical decisions for CEL patients.To our knowledge, this study represents the largest population-based cohort investigating the epidemiology and survival outcome of CEL patients.However, more clinical research is needed to validate our findings.

)Figure 1 .
Figure 1.Age-adjusted incidence of CEL from 2001 to 2020 in SEER database.(A) Annual age-adjusted incidence of CEL; (B) Age-adjusted incidence of CEL based on age of diagnosis; (C) Annual age-adjusted incidence of CEL in male and female populations, respectively; (D) Age-adjusted incidence of CEL based on age of diagnosis in male and female patients, respectively.CEL, chronic eosinophilic leukemia.

Figure 2 .
Figure 2. Flow chart of study cohort selection using the SEER database.A flow diagram of selection of patients with CEL in this study.CEL, chronic eosinophilic leukemia; SEER, Surveillance, Epidemiology, and End Results.

Figure 3 .
Figure 3. LASSO regression model was used to select characteristic impact factors.(A) LASSO coefficients of 7 features; (B) Selection of tuning parameter (λ) for LASSO model.

Figure 4 .
Figure 4. Construction of the prognostic nomogram of CEL patients based on 3 risk factors.The total points were calculated by integrating scores related to age, marital status, sequence and projected to the bottom scale to predict the overall survival probability at 3 and 5 years.CEL, chronic eosinophilic leukemia.

Figure 5 .
Figure 5. Evaluation of the nomogram of by calibration plot.(A) The calibration curve of the training set for the observed overall survival (OS) probability and predicted OS at 3-year and 5-year.(B) The calibration curve of the validation set at 3-year and 5-year.

Figure 6 .
Figure 6.Evaluation of the nomogram of by receiver operating characteristic (ROC) plot.(A) Time-dependent ROC curve analyses of the nomograms for the 3 years and 5 years in the training set.(B) Time-dependent ROC curve analyses of the validation set.

Figure 7 .
Figure 7. Evaluation of the nomogram of by decision curve analyses.(A, B) The decision curve analyses of the nomogram for the 3 years (A) and 5 years (B) in the training set.(C, D) The decision curve analysis of the nomogram for the 3 years (C) and 5 years (D) in the validation set.

Figure 8 .
Figure 8. Kaplan-Meier curve of overall survival of CEL patients stratified by the risk stratification system in the training set (A) and validation set (B). CEL, chronic eosinophilic leukemia.

Table 1 .
Comparison of baseline characteristics of patients with CEL between the training set and validation set from SEER database.CEL chronic eosinophilic leukemia, chemo chemotherapy, COD cause of death.