Hemoglobin level, a prognostic factor for nasal extranodal natural killer/T-cell lymphoma patients from stage I to IV: A validated prognostic nomogram

Although nasal extranodal natural killer/T-cell lymphoma (nasal ENKL) shares some prognostic factors with other lymphomas, few studies have explored the prognostic value of hemoglobin. ENKL cases in stage I–IV diagnosed from 2000 to 2015 were collected from two medical centers (group A, n = 192) and randomly divided into group B (n = 155) and group C (n = 37). Although the significant factors identified by univariate analysis differed between groups A and B, multivariate Cox regression identified the same factors in both. The C-index of our model was only slightly better than that of Yang's model, but its integrated Brier score (IBS) was markedly lower than Yang's in both groups A and B. Additionally, the minimal depth from a random survival forest (RSF) confirmed that hemoglobin had better prognostic ability than age in both groups A and B. In calibration of the nomogram, the predicted 3-year and 5-year overall survival (OS) agreed well with the corresponding observed OS. In conclusion, hemoglobin is a prognostic factor for nasal ENKL patients in stage I–IV, and a validated prognostic nomogram that integrates it, whose generalization error is the smallest among the evaluated models, can be used to predict patient outcome.


Methods
Patients, Treatment and Follow-up. At the Shanxi Cancer Hospital and Institute (SCHI) and the First Affiliated Hospital of Anhui Medical University (FAHAMU), nasal ENKL cases diagnosed between 2000 and 2015 were collected according to the morphological and immunohistological criteria of the World Health Organization classification 10. The pretreatment evaluation included history taking, physical and laboratory examinations, and computed tomography. Hemoglobin levels were classified as lower or higher than 120 g/L. To minimize stage migration caused by changes in staging techniques over the 15-year period, only the computed tomography series of PET/CT scans were used to stage and evaluate the disease (n = 17). Our protocol was approved by the ethics committees of SCHI and FAHAMU.
Because tumor heterogeneity affects the robustness of predictors 11, resampling approaches were used to help ensure that our model could be applied to other cohorts of patients. Using SPSS software (version 10.01), approximately 20% of all recruited patients (group A) were randomly assigned to group C, and the remaining patients formed group B. Groups A and B were each used to develop the prognostic model in order to check its reproducibility, and groups B and C were then used to validate the model and assess its reliability. This approach resembles external validation: the model developed in group B was validated in a separate cohort of patients (group C).
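The random allocation of group A into groups B and C can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure the authors used: it assumes each patient is independently assigned to group C with probability 0.2, so the size of group C fluctuates around 20% (37 of 192 patients is about 19%). The function name, seed, and patient IDs are hypothetical.

```python
import random

def split_development_validation(patient_ids, fraction=0.2, seed=2016):
    """Randomly assign each patient to validation group C with probability
    `fraction`; the remaining patients form development group B.

    Per-case Bernoulli sampling is an assumption (it mimics an
    'approximately 20% of cases' random sample), so len(C) varies around
    fraction * n rather than being fixed.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    group_b, group_c = [], []
    for pid in patient_ids:
        (group_c if rng.random() < fraction else group_b).append(pid)
    return group_b, group_c

group_a = list(range(1, 193))  # 192 hypothetical patient IDs
group_b, group_c = split_development_validation(group_a)
print(len(group_b), len(group_c))
```

Fixing the seed is the design choice that matters here: the same seed always reproduces the same B/C split, which is what allows the development and validation cohorts to be reported and re-analyzed.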
Additionally, 10-fold cross-validation with the bootstrap method was used to test the generalization ability of the model 12. The patients in groups A and B were each randomly divided into 10 subsets of equal size, and the model was trained and validated 10 times. In each round, the pooled data of 9 subsets were used to train the model, which was then validated on the remaining subset. The prediction error averaged across the 10 rounds, summarized as the integrated Brier score (IBS), estimates the error expected when the model is generalized to an independent dataset; a lower IBS indicates a better model 13.
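The 10-fold cross-validation and the integrated Brier score can be sketched as below. This is an illustration under stated assumptions, not the authors' R code: the Brier score at each time point uses inverse-probability-of-censoring weights (the Graf et al. form), the censoring distribution is estimated by Kaplan-Meier within the evaluated fold, the integral is approximated with the trapezoidal rule on a chosen time grid, and the only model evaluated is a hypothetical reference model that predicts the training-fold Kaplan-Meier curve for every patient. All function names and the simulated data are hypothetical.

```python
import random

def kfold_partition(n, k=10, seed=1):
    """Shuffle indices 0..n-1 and deal them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def kaplan_meier(times, events):
    """Kaplan-Meier steps [(time, S(time))] for the given event indicator."""
    steps, s = [], 1.0
    for t in sorted(set(times)):
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if d:
            at_risk = sum(1 for u in times if u >= t)
            s *= 1.0 - d / at_risk
            steps.append((t, s))
    return steps

def step_value(steps, t, left=False):
    """Value of a right-continuous step function at t, or its left limit at t."""
    value = 1.0
    for u, s in steps:
        if u < t or (u == t and not left):
            value = s
        else:
            break
    return value

def brier_score(t, times, events, pred, cens):
    """IPCW Brier score at time t; pred[i](t) is patient i's predicted S(t)."""
    total = 0.0
    for ti, ei, f in zip(times, events, pred):
        s = f(t)
        if ti <= t and ei == 1:   # died by t: ideal prediction is 0
            total += s ** 2 / step_value(cens, ti, left=True)
        elif ti > t:              # still alive at t: ideal prediction is 1
            total += (1.0 - s) ** 2 / step_value(cens, t)
        # censored before t: no direct term, accounted for by the weights
    return total / len(times)

def integrated_brier_score(times, events, pred, grid):
    """Trapezoidal integral of the Brier score over grid, divided by its span."""
    cens = kaplan_meier(times, [1 - e for e in events])  # censoring KM, G(t)
    bs = [brier_score(t, times, events, pred, cens) for t in grid]
    area = sum((grid[j + 1] - grid[j]) * (bs[j] + bs[j + 1]) / 2
               for j in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])

def cv_ibs_null_model(times, events, grid, k=10):
    """k-fold CV: train on k-1 folds, score the held-out fold, average the IBS."""
    scores = []
    for test in kfold_partition(len(times), k):
        held_out = set(test)
        train = [i for i in range(len(times)) if i not in held_out]
        km = kaplan_meier([times[i] for i in train], [events[i] for i in train])
        # Reference model: every test patient gets the training-fold KM curve.
        pred = [lambda t, km=km: step_value(km, t)] * len(test)
        scores.append(integrated_brier_score([times[i] for i in test],
                                             [events[i] for i in test], pred, grid))
    return sum(scores) / len(scores)

# Hypothetical cohort: exponential survival times (months), random censoring.
rng = random.Random(7)
times = [round(rng.expovariate(1 / 40), 1) + 0.1 for _ in range(192)]
events = [1 if rng.random() < 0.7 else 0 for _ in range(192)]
grid = [float(m) for m in range(0, 61, 6)]  # 0 to 60 months
print(round(cv_ibs_null_model(times, events, grid), 3))
```

To evaluate a fitted prognostic model instead of the reference model, each entry of `pred` would hold that model's predicted survival function for one test-fold patient; the model with the lower cross-validated IBS generalizes better.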
Under the treatment policy of the two hospitals, chemotherapy (CT) was the first-line treatment, and only poor responders received radiotherapy (RT). After the publication of the study by Li et al. 14 in 2006, more patients received RT than before; however, RT was still not the first-line treatment. All patients were followed up through medical records or by telephone until the end of August 2016. Overall survival (OS) was measured from the date of diagnosis to death from any cause.
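As a minimal sketch of how OS time and event status might be derived from follow-up dates, the function below uses an assumed censoring rule that the text does not state: a patient without a recorded death on or before 31 August 2016 is censored at the earlier of the last contact date and that cutoff. The function name is hypothetical.

```python
from datetime import date

FOLLOW_UP_END = date(2016, 8, 31)  # end of follow-up stated in the text

def overall_survival(diagnosis, death=None, last_contact=None, cutoff=FOLLOW_UP_END):
    """Return (days from diagnosis, event) for overall survival.

    event = 1 when death from any cause is recorded on or before the cutoff.
    Otherwise the patient is censored (event = 0) at the earlier of the last
    contact date and the cutoff (assumed censoring rule).
    """
    if death is not None and death <= cutoff:
        return (death - diagnosis).days, 1
    end = min(last_contact or cutoff, cutoff)
    return (end - diagnosis).days, 0

print(overall_survival(date(2010, 1, 1), death=date(2011, 1, 1)))  # (365, 1)
print(overall_survival(date(2012, 3, 15), last_contact=date(2014, 3, 15)))
```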
Prognostic Model and Validation. Using the data of groups A and B separately, prognostic factors for OS were first screened by univariate analysis with the Kaplan-Meier method and the log-rank test (P < 0.05). Multivariate Cox regression was then used to select independent prognostic factors for OS from the significant ones. Finally, the factors selected in both group A and group B were used to develop our model.
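The univariate screening step (Kaplan-Meier curves compared with the log-rank test) can be illustrated with the short Python sketch below. It is a plain two-group implementation written for illustration, not the SPSS or R routine used in the study; the function names and example data are hypothetical, and the P value comes from the chi-square distribution with 1 degree of freedom. The subsequent multivariate Cox regression step is not shown.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability)."""
    curve, s = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(u >= t for u in times)
        deaths = sum(u == t and e == 1 for u, e in zip(times, events))
        s *= 1.0 - deaths / at_risk
        curve.append((t, s))
    return curve

def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n = sum(u >= t for u in times)                               # at risk
        n1 = sum(u >= t and g == 1 for u, g in zip(times, groups))    # at risk, group 1
        d = sum(u == t and e == 1 for u, e in zip(times, events))     # deaths
        d1 = sum(u == t and e == 1 and g == 1
                 for u, e, g in zip(times, events, groups))           # deaths, group 1
        o_minus_e += d1 - d * n1 / n                # observed minus expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical data: OS in months, death flag, hemoglobin < 120 g/L (1) or not (0).
times = [3, 5, 8, 12, 20, 24, 30, 36, 48, 60]
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
low_hb = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(logrank_test(times, events, low_hb))
```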
To validate the International Prognostic Index (IPI), the Korean Prognostic Index (KPI), Yang's model 9, and our model, their discriminatory ability was compared in all three groups using the C-index (mean ± SE), which is analogous to the area under the receiver operating characteristic (ROC) curve. The nomogram of our model was then built, and calibration plots were constructed between the predicted and observed survival probabilities. A better model has a C-index closer to 1 and a calibration curve closer to the line through the origin with slope 1.
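The C-index can be sketched as below, assuming Harrell's definition (the most common form for censored survival data) and a model that outputs a risk score in which a higher value means a worse prognosis. Pairs with tied survival times are skipped for simplicity, and the standard error reported in the text is not computed here. The function name is hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i died before time[j]; it is
    concordant when patient i also has the higher risk score. Pairs tied
    on risk count as 1/2. A value of 0.5 means no discrimination, 1.0 perfect.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue                  # a censored patient cannot fail first
        for j in range(n):
            if times[i] < times[j]:   # i failed first while j was still at risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data: higher score assigned to patients who die earlier.
print(harrell_c_index([6, 14, 25, 40], [1, 1, 0, 1], [3.1, 2.4, 1.0, 0.8]))
```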
Additionally, to evaluate the prognostic ability of the factors included in these models, especially age and hemoglobin, an indicator derived from a random survival forest (RSF) was used. Although RSF is less widely used than multivariate Cox regression, it can be a more accurate method for analyzing survival data. Minimal depth is an RSF indicator of the prognostic ability of each factor: it measures how close to the root of a tree the factor first splits the data, and the smaller a factor's minimal depth, the stronger its prognostic ability 15,16.
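The minimal depth idea can be illustrated on hand-built toy trees. This sketch does not grow a random survival forest (in R, packages such as randomForestSRC do that); it only shows how minimal depth is read off trees that have already been grown and averaged over the forest. The tree representation, the function names, and the rule that a variable never used in a tree receives that tree's maximum depth are assumptions made for illustration.

```python
def minimal_depths(tree, depth=0, found=None):
    """Depth of the first split on each variable in one tree (root = depth 0).

    A tree is None (terminal node) or a tuple (variable, left, right).
    Returns (dict variable -> minimal depth, maximum terminal-node depth).
    """
    if found is None:
        found = {}
    if tree is None:
        return found, depth
    var, left, right = tree
    found[var] = min(found.get(var, depth), depth)
    _, d_left = minimal_depths(left, depth + 1, found)
    _, d_right = minimal_depths(right, depth + 1, found)
    return found, max(d_left, d_right)

def forest_minimal_depth(forest, variables):
    """Average minimal depth over trees; a variable never used for splitting
    in a tree is assigned that tree's maximum depth (assumed convention)."""
    totals = dict.fromkeys(variables, 0.0)
    for tree in forest:
        found, tree_depth = minimal_depths(tree)
        for v in variables:
            totals[v] += found.get(v, tree_depth)
    return {v: totals[v] / len(forest) for v in variables}

# Two hypothetical trees; smaller average depth = stronger prognostic factor.
forest = [
    ("hemoglobin", ("ldh", None, None), ("age", None, None)),
    ("ldh", ("hemoglobin", None, None), None),
]
print(forest_minimal_depth(forest, ["hemoglobin", "ldh", "age"]))
```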
Data were analyzed with SPSS statistical software (version 10.01) and R (version 3.3.1). A two-sided P < 0.05 was considered statistically significant.
Ethics approval and informed consent. Our protocol was approved by the ethics committee of the Shanxi Cancer Hospital and Institute, and the First Affiliated Hospital of Anhui Medical University. The study was conducted in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants according to the institutional guidelines.
Data availability statement. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Results
Patient Characteristics and Treatments. The median ages of groups A (n = 192), B (n = 155), and C (n = 37) were 42.8 years (range 9-79), 42.7 years (range 9-79), and 43.0 years (range 17-74), respectively. Patient characteristics are listed in Table 1 and were well balanced between groups B and C.
RT (n = 115) was delivered with 6-MV linear accelerators (Varian), and the median dose was 50 Gy (range 8-70 Gy). Most patients (n = 75) received conventional radiotherapy (46/75 patients ≥ 50 Gy), and the others (n = 40) received intensity-modulated radiotherapy or three-dimensional conformal radiotherapy (33/40 patients ≥ 50 Gy). In groups B and C, 66/92 and 13/23 patients, respectively, received ≥ 50 Gy (χ2 = 1.981, P = 0.159), and 60 of the 75 patients who received conventional RT and 32 of the 40 who received intensity-modulated or three-dimensional conformal RT were in group B (χ2 = 0.000, P = 1.000).
Results of the analyses of groups A and B are presented in Table 2; only the significance of distant metastasis differed between the two groups. Additionally, treatment was significant for OS in both groups.
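The two χ2 values reported for the radiotherapy comparison can be reproduced with an uncorrected Pearson χ2 test on 2 × 2 tables, as in the sketch below. The counts come from the text; reading the second comparison as 60 of 75 conventional-RT patients and 32 of 40 IMRT/3D-CRT patients being in group B (hence 15 and 8 in group C) is our interpretation, which is consistent with the reported χ2 of 0.000. The function name is hypothetical.

```python
import math

def pearson_chi2_2x2(table):
    """Pearson chi-square test without continuity correction for a 2x2 table
    [[a, b], [c, d]]; returns (chi2, p) with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n   # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 df

# Rows: group B, group C; columns: >= 50 Gy, < 50 Gy.
print(pearson_chi2_2x2([[66, 26], [13, 10]]))  # chi2 ≈ 1.981, P ≈ 0.159
# Rows: group B, group C; columns: conventional RT, IMRT/3D-CRT.
print(pearson_chi2_2x2([[60, 32], [15, 8]]))   # chi2 = 0.0, P = 1.0
```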
Because our model differs from Yang's by replacing age with hemoglobin, the survival plots of age and hemoglobin were compared (Fig. 3); they indicated that hemoglobin was a better prognostic factor than age in both groups A and B. This result was also confirmed by the RSF: in both groups A and B, the minimal depth of hemoglobin was lower than that of age, and even lower than that of LDH (Fig. 2B).
In the 10-fold cross-validation of groups A and B (Fig. 2C), the IBS of our model was markedly lower than those of the other models, and that of Yang's model was the highest. These results indicate that our model had the smallest generalization error among the evaluated models.
In group A, the nomogram of our model for predicting 3-year and 5-year OS is shown in Fig. 4A. To calibrate the nomogram, the predicted 3-year and 5-year OS were separately plotted against the corresponding observed OS (Fig. 4B and C). On both plots, the calibration curve lies close to the line through the origin with slope 1, indicating good agreement between predicted and observed OS.
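A simple grouped calibration check can be sketched as follows: patients are ranked by their predicted survival at a fixed horizon (for example 36 months for 3-year OS), split into groups, and each group's mean predicted survival is compared with its Kaplan-Meier estimate at that horizon; well-calibrated points lie near the line through the origin with slope 1. This is an illustrative approximation, not necessarily the (often bootstrap-corrected) procedure behind Fig. 4B and C; the number of groups and the function names are hypothetical, and each group must contain at least one patient.

```python
def km_at(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon` (same unit as times)."""
    s = 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1 and u <= horizon}):
        at_risk = sum(u >= t for u in times)
        deaths = sum(u == t and e == 1 for u, e in zip(times, events))
        s *= 1.0 - deaths / at_risk
    return s

def calibration_points(times, events, predicted, horizon, n_groups=3):
    """Group patients by predicted survival at `horizon`; return one
    (mean predicted, Kaplan-Meier observed) pair per group."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) / n_groups
    points = []
    for g in range(n_groups):
        idx = order[round(g * size):round((g + 1) * size)]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = km_at([times[i] for i in idx], [events[i] for i in idx], horizon)
        points.append((mean_pred, observed))
    return points

# Hypothetical data: OS in months, death flag, predicted 3-year OS.
times = [4, 9, 15, 22, 30, 41, 50, 58, 60]
events = [1, 1, 1, 0, 1, 0, 0, 1, 0]
pred_3y = [0.20, 0.30, 0.35, 0.55, 0.60, 0.65, 0.80, 0.85, 0.90]
print(calibration_points(times, events, pred_3y, horizon=36))
```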

Discussion
Our study indicates that hemoglobin level is a prognostic factor for patients with nasal extranodal natural killer/T-cell lymphoma in stage I-IV, and that the validated prognostic nomogram (Ann Arbor stage, PTI, LDH, hemoglobin, and ECOG PS) can be used to predict patient outcome. Although our model only slightly improves the C-index over the other evaluated models, its generalization error is the smallest.
As a lymphoma, nasal ENKL shares some prognostic factors with other lymphomas, such as the IPI and LDH level 8,9. Several previous studies used multivariate Cox regression to relate hemoglobin to patient prognosis. However, because of their limited numbers of cases, the studies by Ma 17 and Kim 18 (n = 64 and 62, respectively) did not find a significant association. Xu et al. 19 (n = 170) grouped patients by hemoglobin level using a threshold of 100 g/L (vs. 120 g/L in our study); in their univariate analysis, hemoglobin was significant for progression-free survival (P = 0.034) but not for OS (P = 0.057). In a cohort of 321 patients, Wang et al. 20 found that hemoglobin, ECOG PS, age, LDH, and Ann Arbor stage were significant factors for both progression-free survival and OS, but they did not validate these results in another group of patients 20. Our study, which included such a validation, therefore confirms their finding that hemoglobin level is a prognostic factor for nasal ENKL patients.
Compared with multivariate Cox regression, the random survival forest is better at modeling non-linear effects and complex interactions among factors. Furthermore, it provides indicators, such as minimal depth, for evaluating the prognostic ability of each factor 21. As indicated by the minimal depth from the RSF, hemoglobin was better than age at predicting patient outcome, and this result was also confirmed by the survival plots of age and hemoglobin (Fig. 3). Therefore, age in Yang's model, which was derived from patients with stage I and II disease 9, was replaced by hemoglobin, and this substitution slightly improved the C-index of the model. Furthermore, as indicated by the IBS from 10-fold cross-validation, our model had the smallest generalization error among the evaluated models and was markedly better than Yang's model. Note that, for the model comparison, we re-fitted Yang's model to our data, so the scoring points of its factors differed from those of the original model. This re-fitting therefore narrowed the difference in C-index between Yang's model and ours.
In diffuse large B-cell lymphoma and non-Hodgkin lymphoma, hemoglobin < 120 g/L is a frequent finding at diagnosis, and interleukin-6 plays a key role in its development 8. Because both interleukin-9 and interleukin-10 have been reported to be associated with poor prognosis in nasal ENKL patients, the underlying mechanism might be that interleukins act as growth factors for tumor cells and participate in the production of erythropoietin (EPO) 22,23. Further studies are needed to confirm this hypothesis.
Besides the prognostic factors themselves, Yang's nomogram and ours differ in other ways. In Yang's model 9, Ann Arbor stage had the highest score, followed by ECOG PS, PTI, age, and LDH. In contrast, the order in our nomogram was ECOG PS, hemoglobin, LDH, PTI, and Ann Arbor stage. Additionally, Yang's model was developed for patients with stage I-II disease, whereas ours was developed for patients with stage I to IV disease.
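The ordering of factors by score in a nomogram follows from how points are assigned. The sketch below assumes the common convention that each factor's contribution (coefficient × coded value) is shifted so that its lowest level scores 0 and then rescaled so that the factor with the widest range of contributions receives 100 points. The coefficients in the example are hypothetical and are not the fitted values of either Yang's model or ours; the function name is also hypothetical.

```python
def nomogram_points(coefficients, level_values):
    """Map each factor level to a 0-100 points scale.

    coefficients: {factor: Cox coefficient beta}
    level_values: {factor: list of coded values, e.g. [0, 1]}
    Each contribution beta * x is shifted so the factor's lowest contribution
    scores 0, and all factors share one scale chosen so that the factor with
    the widest contribution range gets 100 points.
    """
    contrib = {f: [coefficients[f] * x for x in xs] for f, xs in level_values.items()}
    ranges = {f: max(c) - min(c) for f, c in contrib.items()}
    scale = 100.0 / max(ranges.values())
    return {f: [(c - min(cs)) * scale for c in cs] for f, cs in contrib.items()}

# Hypothetical coefficients for two binary factors.
print(nomogram_points({"ecog_ps": 0.9, "hemoglobin_low": 0.6},
                      {"ecog_ps": [0, 1], "hemoglobin_low": [0, 1]}))
```

Because points are proportional to each factor's effect range, the factor listed first in a nomogram's score ranking is simply the one with the largest effect range in the fitted model.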
Beyond the prognostic factors identified by previous studies, other powerful factors might include long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and other molecular markers. The potential relationship between these markers and nasal ENKL could be predicted by computational models [24][25][26]. After these markers are verified in laboratory research, our future work will focus on their prognostic ability.
Because our study had a limited number of cases, we used an alternative approach to validate our model. First, the enrolled patients (n = 192, group A) were randomly divided into group B (n = 155) and group C (n = 37). Next, to check the reproducibility of our model, groups A and B were each used separately to develop prognostic models. The C-index in all groups was then used to evaluate the discriminatory ability of the models. This approach resembles external validation, in that the model developed in group B was validated in a separate cohort of patients (group C). At least in this study, the prognostic ability of hemoglobin was consistent with the results of Wang et al. 20, suggesting that this validation method might be useful for other studies with limited numbers of cases.

Conclusions
Hemoglobin is a prognostic factor for nasal extranodal natural killer/T-cell lymphoma patients from stage I to IV, and a validated prognostic nomogram that integrates it, whose generalization error is the smallest among the evaluated models, can be used to predict patient outcome.