Development and validation of a nomogram to predict the prognosis of patients with gastric cardia cancer

Our goal was to develop a prognostic nomogram to predict overall survival (OS) and cancer-specific survival (CSS) in patients with gastric cardia cancer (GCC). Patients diagnosed with GCC from 2004 to 2015 were screened from the Surveillance, Epidemiology, and End Results (SEER) database. A nomogram predicting 3- and 5-year OS and CSS was developed from the variables associated with OS and CSS in multivariate Cox regression models. The predictive performance of the nomogram was evaluated using the concordance index (C-index), calibration curves and decision curve analysis (DCA), and the nomogram was calibrated for 3- and 5-year OS and CSS. A total of 7,332 GCC patients were identified and randomized into a training cohort (5,132, 70%) and a validation cohort (2,200, 30%). Multivariate Cox regression analysis showed that age, marital status, race, SEER stage, grade, T stage, N stage, M stage, tumor size, and surgery were independent risk factors for OS and CSS in GCC patients. Based on the multivariate Cox regression results, we constructed prognostic nomograms of OS and CSS. In the training cohort, the C-index for the OS nomogram was 0.714 (95% CI = 0.705–0.723), and the C-index for the CSS nomogram was 0.759 (95% CI = 0.746–0.772). In the validation cohort, the C-index for the OS nomogram was 0.734 (95% CI = 0.721–0.747), while the C-index for the CSS nomogram was 0.780 (95% CI = 0.759–0.801). Our nomogram showed better predictive performance than the nomogram based on TNM stage. In addition, in the training and validation cohorts, the calibration curves of the nomogram showed good consistency between the predicted and actual 3- and 5-year OS and CSS rates. The nomogram can effectively predict OS and CSS in GCC patients, which may help clinicians personalize prognostic assessments and clinical decisions.

Scientific Reports | (2020) 10:14143 | https://doi.org/10.1038/s41598-020-71146-z | www.nature.com/scientificreports/

[...] of cancer patients 10,11. However, evaluating cancer prognosis based on TNM stage alone has limitations, because it does not account for other clinicopathological factors such as age, sex, and race. The nomogram is a statistics-based tool that calculates the probability of clinical events by considering the weighted value of each factor 12,13. In recent years, nomograms have been widely used to predict the survival rate of various cancers 11,14,15. The purpose of this study is to develop an effective prognostic nomogram to predict the overall survival (OS) and cancer-specific survival (CSS) of patients with GCC to help clinicians provide personalized treatment recommendations.

Results
Demographic and pathologic characteristics. A total of 7,332 patients were included in the study and randomly assigned to two cohorts: the training cohort (n = 5,132) and the validation cohort (n = 2,200). A flow chart of the patient inclusion process is shown in Fig. 1. The demographic and pathologic characteristics of GCC patients are shown in Table 1. In our cohort, the highest incidence of GCC was in patients over 60 years old (64.7%), and the majority of patients were male (79.6%), white (87.1%), and married (66.8%). The most common GCC classifications were adenocarcinoma (83.0%), regional (42.0%), grade III (51.4%) and M0 stage (78.3%). In addition, 62.8% of patients underwent surgery, and 64.9% received chemotherapy.
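The 7:3 random assignment described above can be illustrated with a minimal sketch. It assumes patients are identified by integer IDs and uses an arbitrary seed; the function name `split_cohort` and the seed are hypothetical and are not taken from the study.

```python
import random

def split_cohort(patient_ids, validation_fraction=0.3, seed=0):
    """Randomly split patient IDs into (training, validation) lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    n_validation = round(len(ids) * validation_fraction)
    return ids[n_validation:], ids[:n_validation]

# 7,332 placeholder IDs standing in for the included patients
training, validation = split_cohort(range(7332))
print(len(training), len(validation))  # 5132 2200
```

With 7,332 patients, a 30% validation fraction rounds to 2,200, leaving 5,132 for training, which matches the cohort sizes reported here.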

Identification of prognostic factors of OS and CSS.
To identify the prognostic factors, we performed univariate and multivariate Cox regression analyses in the training cohort. According to the univariate Cox analysis, age at diagnosis, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, and surgery were significantly associated with OS, while sex, age, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, and surgery were significantly associated with CSS. These significant variables were then entered into the multivariate Cox analysis. Multivariate Cox analysis showed that age, marital status, race, SEER stage, grade, T stage, N stage, M stage, tumor size and surgery were independent prognostic factors for OS (Table 2). Regarding CSS, eleven variables, including sex, age, marital status, race, SEER stage, grade, T stage, N stage, M stage, tumor size and surgery, were identified as independent prognostic factors (Table 3).

Nomogram construction and performance assessment. We developed two nomograms for OS and CSS: one was based on the results of multivariate Cox analysis (Fig. 2), and the other was based on TNM stage (Supplementary Fig. S1). Each variable was assigned points according to its HR. Then, by summing the points of all variables and locating the total on the total points scale, the probability of 3- and 5-year OS and CSS can be obtained. In the nomogram of OS, surgery contributed the most to the survival outcome, while M stage contributed the most in the nomogram of CSS. [...] (Table 4 and Fig. 3A, B). To assess whether the predicted survival time was consistent with the actual survival time, the C-index was used to verify the nomogram in the training cohort. For OS or CSS, the C-index of the nomograms (OS, C-index = 0.714; CSS, C-index = 0.759) was greater than that of the TNM stage (OS, C-index = 0.651; CSS, C-index = 0.7696). Similar results were found in the validation cohort (Table 5).
This similarity of the results indicates that the model established by the nomogram was accurate.
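To make the C-index comparison concrete, the following is a minimal pure-Python sketch of Harrell's concordance index. It is an illustration under simplifying assumptions (pairs with tied survival times are ignored, and the toy data are invented), not the routine used in this study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    shorter observed time also has the higher predicted risk.

    A pair (i, j) is comparable when times[i] < times[j] and patient i had
    the event (events[i] == 1). Tied risk scores count as half concordant.
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Invented toy data: follow-up in months, event indicator (1 = death), risk score
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
print(c_index)  # 0.8 (4 of 5 comparable pairs are ranked correctly)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a higher C-index for one model over another on the same patients indicates better discrimination.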
In addition, DCA, which calculates the net benefit, was used to evaluate the clinical utility of the nomogram. The results showed that across a broad range of threshold probabilities for OS (10-50%), the net clinical benefit of the nomograms was greater than that of the TNM stage (Fig. 3C,D). The CIC results showed that across a broad range of thresholds for OS (20-70%), the number of patients classified as positive by the nomograms and the number of true positives were greater than those of the TNM stage (Fig. 4). Moreover, we plotted calibration curves for the 3- and 5-year OS and CSS nomograms in the training cohort (Fig. 5) and the validation cohort (Supplementary Fig. S2), and these curves were very close to the ideal curve. This showed good consistency between the prediction of the nomogram and the actual observed outcomes in the training and validation cohorts.
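The net benefit underlying DCA can be sketched as follows. This is a simplified illustration that assumes a binary outcome at a fixed time point and ignores censoring, which a survival-based DCA would need to handle; the data and names are invented for the example.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold).

    Patients with predicted_risk >= threshold are classified as positive.
    outcome uses 1 for an event and 0 for no event (no censoring assumed).
    """
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

# Invented toy data for illustration only
toy_risk = [0.8, 0.6, 0.3, 0.1]
toy_outcome = [1, 0, 1, 0]

nb_model = net_benefit(toy_risk, toy_outcome, threshold=0.2)
# "Treat all" strategy: every patient is classified as positive
nb_treat_all = net_benefit([1.0] * 4, toy_outcome, threshold=0.2)
print(nb_model, nb_treat_all)
```

A decision curve plots this quantity over a range of thresholds for each model, alongside the "treat all" and "treat none" (net benefit 0) strategies. The CIC is built from the same two counts, the number classified as positive and the number of true positives, at each threshold.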

Discussion
GCC is a malignant tumor at the junction of the stomach and esophagus that mostly occurs in middle-aged (over 40 years old) and elderly people and that accounts for approximately 10% of all digestive system tumors 16,17. After the onset of the disease, patients often have clinical symptoms such as upper gastrointestinal bleeding, dysphagia, and stomach discomfort. The prognosis of patients with GCC is poor 18, so it is very important to develop an effective system to predict the prognosis of these patients. In this study, we first developed prognostic nomograms of OS and CSS in patients with GCC. We used the SEER database to conduct Cox regression analysis on a large cohort of GCC patients to identify independent risk factors for OS and CSS. We constructed two prognostic nomograms: one based on multivariate Cox analysis and the other based on TNM stage. Examination of the C-index, ROC curve, DCA curve and CIC showed that our nomogram had better prognostic ability than that of the TNM stage. In addition, we verified and calibrated the nomogram and evaluated the accuracy of the OS and CSS nomograms at 3 and 5 years. The results show that the predictions of the nomogram are in good agreement with the actual observed results and are supported by the calibration curve, ROC curve analysis and C-index value. The C-index of the nomograms is more than 0.7, indicating sufficient discrimination ability. The DCA results show that the nomogram we developed has good practical clinical value.
Recently, some nomograms containing various input variables have been developed to predict the prognosis of different digestive tract tumors 19–22. Kim et al. 19 developed and validated a nomogram that predicted the risk of lymph node metastasis in patients with early gastric cancer and could be used to avoid unnecessary gastrectomy after endoscopic dissection. By analyzing 9,026 patients with metastatic esophageal cancer between 2010 and 2015, Zhu et al. 20 found that the nomogram was better at predicting distant metastasis of esophageal cancer than traditional TNM staging. Similarly, the nomogram developed by Xue et al. 21 that includes nutritional and immune parameters can effectively predict the overall and postoperative survival rate and relapse-free survival of gastric cancer patients after radical gastrectomy, and its prediction accuracy and discrimination ability are better than those of TNM staging.
In the current reports on GCC, it is remarkable that Liu et al. 22 developed a nomogram for predicting the total survival rate of GCC based on radiology and clinical predictors. However, the nomogram for predicting the OS of GCC based on radiology and clinical predictors is not available, and the nomogram for CSS is applicable for a limited population, since it is only suitable for patients undergoing preoperative radiotherapy.
TNM staging was determined according to the results of laboratory examination and postoperative pathological examination. For example, Gong et al. 23 found that TNM stage was associated with the prognosis of high gastric cancer, and T stage was an independent factor for lymph node metastasis. Zhu et al. 24 found that TNM staging was associated with patients with adenocarcinoma of the esophagogastric junction, but only N staging was an independent risk factor for prognosis. TNM staging is a common method to predict the prognosis of GCC patients, but TNM staging has limitations and cannot provide clinicians with personalized prognosis prediction.
In this study, we successfully constructed a practical nomogram based on thirteen factors: sex, age, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, surgery, and chemotherapy, and its prediction power was better than that of traditional TNM staging.
Our research still has some limitations. First, the lack of treatment information (chemotherapy regimens, surgical methods, etc.) in the SEER database may alter our results. Second, our study is a retrospective study with inevitable selection bias. Third, due to the lack of external verification, we are concerned about the generality of our model and may need further research to prove our results.

Conclusions
The present study demonstrated that the nomogram is a better prognostic determinant than TNM staging systems in GCC patients. The nomogram we developed accurately and reliably predicted the 3- and 5-year OS and CSS of GCC. This model could enable clinicians to more precisely estimate the survival of GCC patients.

Methods
In total, 21,860 patients were initially identified, and the inclusion and exclusion criteria were then applied. The exclusion criteria in our study were as follows: more than one primary tumor (n = 5,437); unknown survival time (n = 32); unknown T stage and T0 (n = 4,300); unknown N stage (n = 569); unknown M stage (n = 92); unknown tumor size (n = 3,843); unknown surgery status (n = 6); and unknown marital status (n = 249). In total, 7,332 GCC patients were included in this analysis.
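As a quick consistency check on the patient selection numbers, the short sketch below subtracts the listed exclusions from the initial count. It assumes the exclusion categories were applied sequentially, so that no patient is counted in more than one category.

```python
# Exclusion counts as listed in the text; treated as mutually exclusive (assumption)
exclusions = {
    "more than one primary tumor": 5437,
    "unknown survival time": 32,
    "unknown T stage and T0": 4300,
    "unknown N stage": 569,
    "unknown M stage": 92,
    "unknown tumor size": 3843,
    "unknown surgery status": 6,
    "unknown marital status": 249,
}

initial_patients = 21860
total_excluded = sum(exclusions.values())
remaining_patients = initial_patients - total_excluded
print(total_excluded, remaining_patients)  # 14528 7332
```

The remainder of 7,332 agrees with the number of patients reported as included.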
Data regarding sex, age, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, surgery, chemotherapy, vital status, and survival time were extracted from the SEER database (2004-2015) for further analysis. OS duration was defined as the time from diagnosis until death or last follow-up. CSS duration was defined as the time from diagnosis until death because of GCC or last follow-up.
Statistical analyses. Cases were randomly divided into training and validation cohorts (ratio 7:3). Univariate and multivariate Cox regression models were applied to calculate the hazard ratio (HR) and 95% confidence interval (CI) to assess the independent contributions of each factor to OS and CSS. Variables with P < 0.05 in the univariate Cox proportional hazard model were further analyzed in the multivariate Cox proportional hazard model. Based on multivariate Cox analysis, a nomogram was developed to predict the 3- and 5-year OS and CSS rates. For comparison, we built another nomogram based on TNM stage. Then, we used MedCalc software (version 15.2.0) to generate the receiver operating characteristic (ROC) curve for the two nomograms and determined the area under the curve (AUC). The performance of the nomogram was assessed by the C-index and the calibration curve (1,000 bootstrap resamples). The C-index has a range from 0.5 to 1.0, with 0.5 indicating random chance and 1.0 indicating perfect discrimination. Decision curve analysis (DCA) and a clinical impact curve (CIC) were employed to evaluate the net benefit of the nomogram in a clinical context.
The above statistical analyses were conducted using SPSS software version 24.0 (SPSS, Chicago, USA) and the statistical software package R version 3.5.3 (https://www.r-project.org/). All tests were two-sided. A P value ≤ 0.05 (two-sided) was considered statistically significant.
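The step from a multivariate Cox model to nomogram points can be illustrated with a simplified sketch. It assumes binary indicator variables, scales the largest absolute coefficient to 100 points, and converts a linear predictor to a survival probability through the Cox relation S(t | x) = S0(t)^exp(lp). The coefficients and baseline survival below are hypothetical and are not values from this study, and the scaling is a common convention rather than the exact procedure of the software used here.

```python
import math

# Hypothetical log hazard ratios for binary indicators (NOT values from this study)
coefficients = {"M1 stage": 0.9, "no surgery": 1.2, "grade III": 0.3}

def nomogram_points(coefs, max_points=100.0):
    """Scale each coefficient so the largest absolute effect gets max_points."""
    largest = max(abs(beta) for beta in coefs.values())
    return {name: max_points * abs(beta) / largest for name, beta in coefs.items()}

def survival_probability(baseline_survival, linear_predictor):
    """Cox model relation: S(t | x) = S0(t) ** exp(linear predictor)."""
    return baseline_survival ** math.exp(linear_predictor)

points = nomogram_points(coefficients)

# A hypothetical patient with M1 disease and no surgery, grade below III
patient = {"M1 stage": 1, "no surgery": 1, "grade III": 0}
total_points = sum(points[name] * value for name, value in patient.items())
linear_predictor = sum(coefficients[name] * value for name, value in patient.items())

# 0.6 is an assumed baseline 3-year survival for the reference patient
print(total_points, survival_probability(0.6, linear_predictor))
```

In this scheme the variable with the largest coefficient spans the full points axis, which is why the variable contributing most to the outcome appears with the longest scale in a nomogram.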