A SEER-based nomogram accurately predicts prognosis in Ewing’s sarcoma

Ewing's sarcoma is a high-grade malignant tumor of bone and soft tissue that most commonly occurs in children and adolescents. Although the overall prognosis of Ewing's sarcoma has improved, the 5-year survival rate has not improved significantly. This study aimed to identify risk factors independently associated with the prognosis of Ewing's sarcoma and to construct a nomogram to predict patient survival. Patients diagnosed with Ewing's sarcoma between 2004 and 2015 were identified from the Surveillance, Epidemiology, and End Results (SEER) program database and divided into training and validation cohorts. Univariate and multivariate Cox regression analyses were used to identify independent prognostic factors, and a nomogram was constructed to predict 3- and 5-year overall survival (OS) and cancer-specific survival (CSS). The nomogram was validated internally and externally using the training and validation cohorts, respectively, and its predictive capability was evaluated using the receiver operating characteristic (ROC) curve, C-index, and calibration curve and compared with that of the 7th TNM stage. A total of 1120 patients were divided into training (n = 713) and validation (n = 407) cohorts. Based on the multivariate analysis of the training cohort, a nomogram integrating age, tumor size, primary site, N stage, and M stage (all P < 0.05) was constructed. In the training cohort, the C-indexes of the nomogram for OS and CSS were 0.744 (95% CI 0.717–0.771) and 0.743 (95% CI 0.715–0.770), respectively, whereas those of the TNM stage were 0.695 (95% CI 0.666–0.724) and 0.698 (95% CI 0.669–0.727), respectively. The nomogram therefore showed higher C-indexes than the TNM stage. Furthermore, the internal and external calibration curves showed good consistency between the predicted and observed values. Age, tumor size, primary site, N stage, and M stage are independent risk factors affecting OS and CSS in Ewing’s sarcoma patients.
Compared with the 7th TNM staging system, the nomogram consisting of these factors was more accurate for risk assessment and survival prediction in patients with Ewing’s sarcoma, thus providing a novel and reliable tool for these purposes.

www.nature.com/scientificreports/

and multi-drug chemotherapy have improved the 5-year overall survival (OS) rate of patients with localized ES from approximately 10% to nearly 75% 5 . However, approximately 20-25% of ES patients have metastases at initial diagnosis, and these patients are often resistant to intensive treatment 6 . In addition, the 5-year OS of patients with metastases is only 20-45%, depending on the metastatic site 4,7 . ES survival is influenced by multiple factors, including patient age, primary tumor site, tumor size, distant metastasis, surgery, radiotherapy, and other clinically relevant prognostic factors [8][9][10][11][12] . According to previous studies, adult age, pelvic involvement, and larger tumor size are associated with poor survival in patients with ES 8,9 . In addition, lung metastasis and extrapulmonary metastasis significantly increase the risk of poor prognosis 9,10 . Surgery alone has been reported to be the best local control method for improving overall survival 11 . Studies have also shown that radiation therapy increases overall survival only in adults who have not undergone surgery; in patients undergoing surgery, radiation therapy is not associated with higher overall survival in either children or adults 12 . However, although previous studies have identified independent prognostic factors for survival in ES, no single factor can accurately predict the survival of ES patients. Therefore, a prognostic model that accurately predicts ES survival is urgently needed. Such a model is difficult to develop, however, because ES is relatively rare and large-scale studies are extremely difficult to conduct.
The Surveillance, Epidemiology, and End Results (SEER) program is a large population-based database for cancer epidemiology and health services research. It provides data from 18 geographically diverse population-based cancer registries covering almost 30% of the US population 13 . Nomograms are simple and reliable predictive tools that have been widely used to assess the prognosis of many cancers. A nomogram integrates several important factors and converts a statistical prediction model into a chart that yields a single numerical estimate of the probability of an event, such as survival or death from a given disease 14 . Therefore, nomograms have become a reliable tool for guiding clinical decision-making and predicting outcomes in many cancers.
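For a nomogram built on a Cox model, as in this study, the conversion from predictors to a single probability can be summarized as below. This is a general sketch of the usual construction under the common convention that the predictor with the widest range of effects spans 0-100 points; it is not a formula reported by the authors. Here \(\beta_j\) are the Cox coefficients (for a categorical predictor, \(\beta_j x_j\) denotes the coefficient of the patient's category, with the reference category at 0), \(v\) ranges over the possible values of each predictor, and \(S_0(t)\) is the baseline survival at time \(t\).

```latex
\begin{aligned}
\mathrm{LP}(x) &= \sum_{j} \beta_j x_j, \qquad S(t \mid x) = S_0(t)^{\exp\left(\mathrm{LP}(x)\right)},\\[4pt]
R &= \max_{k}\Bigl(\max_{v} \beta_k v - \min_{v} \beta_k v\Bigr),\\[4pt]
\mathrm{Points}_j(x_j) &= 100 \cdot \frac{\beta_j x_j - \min_{v} \beta_j v}{R}, \qquad
\mathrm{Total}(x) = \sum_{j} \mathrm{Points}_j(x_j),\\[4pt]
\mathrm{LP}(x) &= \sum_{j} \min_{v} \beta_j v + \frac{R}{100}\,\mathrm{Total}(x).
\end{aligned}
```

The last line is what the Total Points axis encodes: each total corresponds to one linear predictor and therefore to one predicted \(S(t)\) at 3 or 5 years.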
This study aimed to determine the risk factors independently associated with ES prognosis and to construct a nomogram to predict patient survival. Towards this goal, we evaluated ES patients registered in the SEER database from 2004 to 2015 and constructed a nomogram based on the clinicopathological data of these patients.

Materials and methods
Data source and patients. All patient data were extracted from the US SEER database using SEER*Stat software. Study variables, including the year of diagnosis, age, race, gender, tumor location, tumor size, T stage, N stage, M stage, survival time, cause of death, and survival status, were collected. Given that juveniles and patients aged ≤ 10 years at diagnosis have been reported to have a lower risk of death 15 , patients were categorized by age at diagnosis into three groups: 0-17 years, 18-59 years, and ≥ 60 years. Race was classified as black, white, or other (American Indian/Alaska Native and Asian/Pacific Islander). Tumor-related factors, including tumor location and size, were also investigated. However, because the data did not specify the exact location of the tumor within the bone, primary sites were classified, based on previous studies [16][17][18] , as extremity bones (long and short bones of the extremities), axial bones or skull (spine, pelvis, ribs, mandible, and skull), or other sites (anterior mediastinum, posterior mediastinum, abdomen, peritoneum, and other soft tissues).
Tumor size was recorded as a continuous variable and categorized, based on previous studies 18-20 , into three groups: ≤ 5 cm, 6-10 cm, and > 10 cm. T stage was classified as T0, T1, T2, T3, or Tx; N stage as N0 (no regional lymph node metastasis), N1 (regional lymph node metastasis), or Nx; and M stage as M0 (no distant metastasis) or M1 (distant metastasis).
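The variable coding above can be sketched as a few small functions. This is an illustration under stated assumptions, not the authors' code: the function names and the free-text site labels are hypothetical (real SEER records use coded site fields), sizes are assumed to be already in centimetres, and because the published categories leave sizes strictly between 5 and 6 cm unassigned, the sketch assumes they fall into the middle group.

```python
"""Sketch of the age, tumor size, and primary site coding described above.

Function names and site labels are hypothetical. Cut points follow the text;
sizes in (5, 6) cm are ASSUMED to belong to the "6-10 cm" group.
"""

AXIAL_OR_SKULL = {"spine", "pelvis", "ribs", "mandible", "skull"}
EXTREMITY = {"long bones of extremities", "short bones of extremities"}


def age_group(age_years: int) -> str:
    """Age at diagnosis (whole years) -> one of the three study age groups."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 17:
        return "0-17"
    if age_years <= 59:
        return "18-59"
    return ">=60"


def size_group(size_cm: float) -> str:
    """Tumor size in cm -> one of the three size categories."""
    if size_cm <= 0:
        raise ValueError("size must be positive")
    if size_cm <= 5:
        return "<=5 cm"
    if size_cm <= 10:  # assumption: (5, 10] cm is reported as "6-10 cm"
        return "6-10 cm"
    return ">10 cm"


def site_group(site: str) -> str:
    """Free-text primary site (hypothetical labels) -> site category."""
    s = site.strip().lower()
    if s in EXTREMITY:
        return "extremity"
    if s in AXIAL_OR_SKULL:
        return "axial/skull"
    return "other"  # mediastinum, abdomen, peritoneum, other soft tissues
```

Keeping the cut points in one place makes it easy to check that every patient lands in exactly one category before the Cox model is fitted.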
The study endpoints were the 3- and 5-year rates of overall survival (OS) and cancer-specific survival (CSS).
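The two endpoints differ only in which deaths count as events. A minimal sketch, assuming a hypothetical record layout (the field names are illustrative and do not mirror SEER variable names): for OS, death from any cause is an event; for CSS, only death attributed to the cancer is an event, and deaths from other causes are censored.

```python
"""Sketch: event indicators for the OS and CSS endpoints.

The Record fields are hypothetical. OS counts death from any cause; CSS
counts only cancer-attributed death and censors deaths from other causes.
"""
from dataclasses import dataclass


@dataclass
class Record:
    survival_months: float
    alive: bool
    cancer_death: bool  # hypothetical flag: death attributed to this cancer

    def __post_init__(self) -> None:
        if self.alive and self.cancer_death:
            raise ValueError("a living patient cannot have a cancer death")


def os_event(r: Record) -> int:
    """1 = death from any cause, 0 = censored (alive at last follow-up)."""
    return 0 if r.alive else 1


def css_event(r: Record) -> int:
    """1 = cancer-specific death, 0 = censored (alive or other-cause death)."""
    return 1 if r.cancer_death else 0


def event_by(r: Record, horizon_months: float, event_fn) -> bool:
    """True if the endpoint event occurred on or before the horizon (e.g. 36 or 60)."""
    return bool(event_fn(r)) and r.survival_months <= horizon_months
```

For example, a patient who died of an unrelated cause at 30 months is an OS event by 36 months but is censored for CSS.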

Statistical analysis.
Categorical data were expressed as frequencies and percentages. The chi-square test, performed using SPSS version 22.0 (IBM Corp., Armonk, NY, USA), was used to compare the demographic and clinical characteristics of the training and validation cohorts. Survival curves were generated using the Kaplan-Meier method and stratified by clinicopathological variables. Univariate and multivariate Cox regression analyses were used to identify risk factors independently associated with OS and CSS; these analyses were performed with the "coxph" function in the "survival" package, and hazard ratios with 95% confidence intervals (CIs) were calculated. Variables that were significant in the multivariate analysis were then integrated into a nomogram predicting 3- and 5-year OS and CSS, which was constructed with the "nomogram" and "plot" functions in the "rms" package. To use the nomogram, a vertical line is drawn from the patient's value of each variable up to the Points axis (range 0-100) to obtain that variable's score; the scores of all selected variables are then summed, and the total on the Total Points axis is projected down onto the 3- and 5-year survival axes to read the individual patient's predicted survival probability. The prognostic performance of the nomogram was assessed with Harrell's concordance index (C-index), receiver operating characteristic (ROC) curves, and calibration curves, which, together with the survival curves, were generated using the "rms", "foreign", and "survival" packages in R. Agreement between predicted and observed survival was presented as calibration curves. All software packages used in this study were obtained from the website. P values < 0.05 were considered statistically significant.
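The points-reading procedure for the nomogram can be sketched in a few lines. Every number here is a hypothetical placeholder, not a coefficient or baseline survival fitted in this study, and the scaling assumes the common convention that the predictor with the widest range of effects spans 0-100 points. The sketch shows why summing points is equivalent to summing Cox coefficients: the Total Points value maps back to a linear predictor, and survival follows as \(S_0(t)^{\exp(\mathrm{LP})}\).

```python
"""Sketch: how nomogram points map to a predicted survival probability.

ALL numbers are hypothetical placeholders, NOT the values fitted in this
study. The predictor with the widest range of effects is scaled to 0-100.
"""
import math

# Hypothetical Cox coefficients (log hazard ratios) for each category.
COEFS = {
    "age": {"0-17": 0.0, "18-59": 0.5, ">=60": 1.2},
    "m_stage": {"M0": 0.0, "M1": 1.0},
    "size": {"<=5 cm": 0.0, "6-10 cm": 0.3, ">10 cm": 0.6},
}


def points_tables(coefs, max_points=100.0):
    """Return per-category points and the points-per-unit-LP scale factor."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    scale = max_points / widest
    tables = {
        var: {cat: (b - min(lv.values())) * scale for cat, b in lv.items()}
        for var, lv in coefs.items()
    }
    return tables, scale


def total_points(patient, tables):
    """Sum the points read off each variable's axis."""
    return sum(tables[var][cat] for var, cat in patient.items())


def survival_from_points(total, coefs, scale, baseline_surv):
    """Total Points -> linear predictor -> S(t) = S0(t) ** exp(LP).

    baseline_surv is a hypothetical S0(t) for a patient with LP = 0.
    """
    lp = sum(min(lv.values()) for lv in coefs.values()) + total / scale
    return baseline_surv ** math.exp(lp)
```

With these placeholder coefficients, age has the widest effect range, so a patient aged ≥ 60 years with M1 disease and a tumor > 10 cm scores 100 + 83.3 + 50 ≈ 233.3 points, which maps back to a linear predictor of 1.2 + 1.0 + 0.6 = 2.8.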
Ethics approval and consent to participate. The studies involving human participants were reviewed and approved by the medical ethics committee of The First Affiliated Hospital of Nanchang University. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Results
Patient characteristics and survival. Among the 1120 patients, 40 (3.57%), 989 (88.30%), and 91 (8.13%) were black, white, and of other races (American Indian/Alaska Native and Asian/Pacific Islander), respectively. There were 692 (61.79%) male and 428 (38.21%) female patients. Patient characteristics did not differ significantly between the training and validation cohorts (P > 0.05). Table 1 summarizes the baseline clinicodemographic characteristics of the patients. Kaplan-Meier survival curves showed that patients aged 0-17 years had a better prognosis than those aged 18-59 years and ≥ 60 years (P < 0.001) (Fig. 1). Race, gender, primary site, tumor size, T stage, N stage, and M stage were also identified as prognostic factors (see Supplementary Figs. S2 and S3 online).
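The age-stratified curves in Fig. 1 are Kaplan-Meier (product-limit) estimates, one per age group. The study produced them in R; the stand-alone sketch below only illustrates the computation for a single group, with `events` coded 1 for death and 0 for censored. It does not compute the P value used to compare groups.

```python
"""Minimal Kaplan-Meier (product-limit) estimator, for illustration only."""


def kaplan_meier(times, events):
    """Return a list of (time, S(time)) at each distinct event time."""
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; subjects censored at t are
        # still counted as at risk for deaths at t (standard convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Read the step function S(t), e.g. at 36 or 60 months."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

Running the estimator separately on the 0-17, 18-59, and ≥ 60 year groups and reading `survival_at(curve, 36)` and `survival_at(curve, 60)` gives the 3- and 5-year survival for each group.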

Construction and validation of the nomogram.
Age, primary tumor site, tumor size, N stage, and M stage were used as prognostic predictors to construct the nomogram (Fig. 2). Age contributed most to the prognostic estimate, followed by M stage and tumor size, whereas primary site contributed least. In the internal validation, the nomogram had C-indexes of 0.744 (95% CI 0.717-0.771) and 0.743 (95% CI 0.715-0.770) for predicting OS and CSS, respectively; the corresponding values in the external validation were 0.803 (95% CI 0.748-0.858) and 0.804 (95% CI 0.747-0.861) (Table 4). In both the internal and external validations, the nomogram had higher C-indexes than the TNM stage. The TNM stage had C-indexes of 0.695 (95% CI 0.666-0.724) and 0.698 (95% CI 0.669-0.727) for predicting OS and CSS, respectively, in the internal validation and 0.714 (95% CI 0.636-0.792) and 0.732 (95% CI 0.656-0.808), respectively, in the external validation (Table 4). These results suggest that the nomogram predicts patient survival more accurately than the TNM stage.
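To make the C-index comparison concrete, here is an O(n²) sketch of Harrell's C-index. It is an illustration, not the R implementation used in the study: a pair is usable when one patient had the event and the other was still under observation at a later time, and it is concordant when the model assigned the higher risk to the patient who died first. Pairs with tied times are skipped, a simplification that fuller implementations treat more carefully. The risk score could be, for example, nomogram total points or an ordinal TNM stage; a coarse score with few levels produces many tied predictions, each counted as half-concordant, which can pull its C-index toward 0.5.

```python
"""Harrell's concordance index (C-index), O(n^2) sketch for illustration."""


def harrell_c_index(times, events, risk_scores):
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    usable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            # j must still be under observation after i's event time.
            if times[j] > times[i]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 1.0 means the risk ordering agrees with every usable pair, 0.5 corresponds to random ordering, and values such as 0.744 versus 0.695 mean that the nomogram ranks more patient pairs correctly than the TNM stage.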
In the internal validation, the areas under the curve (AUCs) of the nomogram for predicting 3- and 5-year OS were 0.788 and 0.771, respectively, compared with 0.742 and 0.722 for the TNM stage. In the external validation, the corresponding AUCs were 0.734 and 0.756 for the nomogram and 0.663 and 0.654 for the TNM stage. These results indicate that the nomogram better predicts the 3- and 5-year prognosis of ES patients (Fig. 3). Furthermore, the internal and external calibration curves showed good agreement between the predicted and observed values (Fig. 4).
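The study does not state which time-dependent ROC estimator was used. As a rough illustration of what a 3- or 5-year AUC measures, the naive sketch below treats patients with an event on or before the horizon as cases and patients observed beyond it as controls, and returns the probability that a random case receives a higher risk score than a random control. Patients censored before the horizon are simply dropped here, which can bias the estimate; dedicated methods (for example, inverse-probability-of-censoring weighting) correct for this.

```python
"""Naive time-dependent AUC at a fixed horizon, for illustration only."""


def naive_auc_at(times, events, risk_scores, horizon):
    # Cases: event on or before the horizon. Controls: observed beyond it.
    # Patients censored before the horizon are dropped (naive simplification).
    cases = [r for t, e, r in zip(times, events, risk_scores) if e and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                wins += 1.0
            elif rc == rn:
                wins += 0.5  # ties count as half
    # Probability that a random case is ranked riskier than a random control.
    return wins / (len(cases) * len(controls))
```

Calling it with `horizon=36` and `horizon=60` (months) gives the 3- and 5-year analogues of the AUCs reported above.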

Discussion
ES is a rare and aggressive malignant tumor and the second most common primary bone malignancy in children and adolescents, after osteosarcoma 3,4 . ES has a poor prognosis, with survival rates of 70%-80% for localized disease and only 30% for metastatic disease 5,6 . Although treatment advances have improved ES survival, accurate prognostic prediction remains challenging. Furthermore, despite being the second most common primary bone malignancy in children and young adults, ES is extremely rare, occurring in only 2 per 1 million children and adolescents worldwide, which makes large-scale research challenging. In this study, we used the large SEER database to establish a predictive nomogram for the prognosis of ES patients. The nomogram integrates routinely available information, namely age, primary site, tumor size, N stage, and M stage, to predict OS and CSS, and was developed in a large cohort of 1120 patients with ES. Consistent with previous reports 8,9,11 , age was identified as an independent risk factor for poor prognosis in this study. Patients aged ≥ 60 years had lower OS and CSS than those aged 18-59 years and 0-17 years; the younger the patient, the better the prognosis. This could be because older adult patients are more likely to develop metastatic disease and to receive lower doses of chemotherapy owing to poorer tolerance 8,21 . Moreover, older adult patients may have multiple comorbidities, such as diabetes, hypertension, and heart disease, that worsen their prognosis. Tumor size and axial primary site were also identified as independent prognostic risk factors. Multivariate analysis showed that OS and CSS were significantly lower in patients with larger tumors (> 10 cm) and axial primary tumors, consistent with previous studies. For instance, Duchman et al. 18 found that ES patients with metastatic disease, an axial tumor site, and tumor diameter > 10 cm had significantly lower 10-year CSS. Axial primary tumors and larger tumors are also often associated with metastatic disease 22 , and metastatic disease has a well-established direct impact on OS and CSS 23 .

Table 4. Accuracy of the prediction score of the nomogram and TNM stage for estimating prognosis of patients. OS, Overall Survival; CSS, Cancer-specific Survival; CI, Confidence Interval.

ES is an aggressive tumor with a high rate of local recurrence and distant metastasis, and these metastases are often resistant to intensive treatment. Moreover, axial primary tumors are usually closer to large blood vessels, which may increase the likelihood of distant metastasis. In addition, axial primary tumors often cause no obvious symptoms, delaying diagnosis and increasing the risk of distant metastasis 17,18 . In contrast, gender, race, and T stage were not independent prognostic factors in the current study. Overall, patients aged ≥ 60 years and those with larger tumors (> 10 cm), axial primary tumors, advanced N stage, or metastatic disease at diagnosis are more likely to have poor OS and CSS.
Nomograms have been shown to predict survival in many tumor types and have been reported to be more accurate than the 7th AJCC staging system 14 . The nomogram in this study included age, tumor size, primary site, N stage, and M stage, and its accuracy in predicting 3- and 5-year OS and CSS was evaluated by comparing predicted and observed survival. In the training cohort, the C-indexes for OS and CSS were 0.744 and 0.743, respectively; in the validation cohort, they were 0.803 and 0.804, respectively, supporting the reliability of the nomogram. The ROC curves also indicated that the nomogram predicts OS and CSS better than the 7th TNM staging system, and the internal and external calibration curves showed good agreement between predicted and observed values. Collectively, these findings indicate that the nomogram can help clinicians evaluate patient prognosis and determine the need for further chemotherapy after surgery to improve patient outcomes.
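A calibration curve compares predicted and observed survival within groups of patients. The sketch below is a simplified illustration, not the procedure behind Fig. 4: patients are sorted by predicted survival at one horizon and split into equal-sized groups (`n_groups` is an assumption; published curves often use more groups and bootstrap correction), and each group's mean predicted survival is paired with its Kaplan-Meier observed survival. Points close to the diagonal, where the two values match, indicate good calibration.

```python
"""Grouped calibration check at one horizon, sketched for illustration."""


def km_survival_at(times, events, horizon):
    """Kaplan-Meier estimate of S(horizon) for one group of patients."""
    s = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        s *= 1.0 - deaths / at_risk
    return s


def calibration_table(pred_surv, times, events, horizon, n_groups=3):
    """Return (mean predicted, observed KM) pairs, lowest predicted group first."""
    n = len(pred_surv)
    if n < n_groups:
        raise ValueError("fewer patients than groups")
    order = sorted(range(n), key=lambda i: pred_surv[i])
    rows = []
    for g in range(n_groups):
        # Equal-sized slices of the patients ordered by predicted survival.
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(pred_surv[i] for i in idx) / len(idx)
        observed = km_survival_at([times[i] for i in idx],
                                  [events[i] for i in idx], horizon)
        rows.append((mean_pred, observed))
    return rows
```

Plotting `mean_pred` against `observed` for each group at 36 and 60 months gives a coarse version of the 3- and 5-year calibration curves.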
Our study has some limitations. First, TNM staging was based on the 7th edition of the AJCC staging system, which is no longer the current edition; this may limit the relevance of the comparison. Second, our nomogram was constructed only from SEER data, in which some patient information is inevitably missing; this may reduce the number of eligible cases and introduce selection bias. Finally, the SEER database contains no information on blood biomarkers such as serum lactate dehydrogenase (LDH), alkaline phosphatase, and carcinoembryonic antigen (CEA), so these could not be included in the analysis. Some previous studies have shown that incorporating blood biomarkers, such as hemoglobin, neutrophils, and LDH, can improve the predictive capability of a nomogram 24 . We will include these data in future research to improve the nomogram. Despite these limitations, the data in this study can serve as a reference for developing a globally applicable predictive model of ES prognosis.

Conclusions
Age, tumor size, primary site, N stage, and M stage are independent risk factors affecting OS and CSS in ES patients. Compared with the 7th TNM staging system, the nomogram consisting of these factors was more accurate for risk assessment and survival prediction in patients with ES, thus providing a novel and reliable tool for these purposes.