Introduction

Gastric cancer is the 6th most common cancer and the 3rd leading cause of tumor-related death worldwide1. In 2019, approximately 27,510 new cases of gastric cancer, resulting in 11,140 deaths, were reported in the United States1. Anatomically, gastric cancer can be divided into gastric cardia cancer (GCC) and non-cardia gastric cancer (NGCC). In recent decades, although the overall incidence of gastric cancer has declined worldwide2, the incidence of GCC has increased3. This may be due to the increased incidence of gastroesophageal reflux disease and obesity4. According to previous reports, GCC and NGCC differ significantly in incidence and prognosis, suggesting that they are distinct tumor entities5,6,7. GCC has a poor prognosis and is a serious threat to human health8.

At present, the prognosis of GCC is mainly predicted by the TNM staging system9. The TNM classification proposed by the American Joint Committee on Cancer (AJCC) is the most widely used staging system; it predicts the survival of cancer patients mainly from the extent of tumor invasion (T), regional lymph node involvement (N) and distant metastasis (M)10,11. However, evaluating prognosis from TNM stage alone has limitations, because it does not account for other clinicopathological factors such as age, sex and race.

The nomogram is a statistics-based tool that calculates the probability of a clinical event by assigning a weight to each contributing factor12,13. In recent years, nomograms have been widely used to predict survival in various cancers11,14,15. The purpose of this study was to develop an effective prognostic nomogram to predict the overall survival (OS) and cancer-specific survival (CSS) of patients with GCC and thereby help clinicians provide personalized treatment recommendations.

Results

Demographic and pathologic characteristics

A total of 7,332 patients were included in the study, and they were randomly assigned to two different cohorts: the training cohort (n = 5,132) and the validation cohort (n = 2,200). A flow chart showing the process of including patients in the study is depicted in Fig. 1. The demographic and pathologic characteristics of GCC patients are shown in Table 1. In our cohort, the highest incidence of GCC was in patients over 60 (64.7%) years old, and the majority of patients were male (79.6%), white (87.1%), and married (66.8%). The most common GCC classifications were adenocarcinoma (83.0%), regional (42.0%), grade III (51.4%) and M0 stage (78.3%). In addition, 62.8% of patients received surgery, and 64.9% received chemotherapy.
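As an illustration of the random 7:3 allocation described in the statistical methods, the following minimal Python sketch splits 7,332 patient identifiers into a training and a validation cohort. The function name `split_cohort`, the fixed seed and the use of integer identifiers are assumptions made for this example; it is not the code used in this study.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Integer IDs 0..7331 stand in for the 7,332 included patients.
train, valid = split_cohort(range(7332))
print(len(train), len(valid))  # 5132 2200
```

Rounding 0.7 × 7,332 = 5,132.4 to the nearest integer reproduces the cohort sizes reported above.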

Figure 1

Schematic overview for patient identification.

Table 1 Baseline demographic and clinical characteristics of gastric cardia cancer (GCC) patients in our study.

Identification of prognostic factors of OS and CSS

To identify the prognostic factors, we performed univariate and multivariate Cox regression analyses in the training cohort. In the univariate Cox analysis, age at diagnosis, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, and surgery were significantly associated with OS, while sex, age, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, and surgery were significantly associated with CSS. These significant variables were then entered into the multivariate Cox analysis. For OS, ten variables (age, marital status, race, SEER stage, grade, T stage, N stage, M stage, tumor size and surgery) were identified as independent prognostic factors (Table 2). For CSS, eleven variables (sex, age, marital status, race, SEER stage, grade, T stage, N stage, M stage, tumor size and surgery) were identified as independent prognostic factors (Table 3).

Table 2 Univariate and multivariate analysis of overall survival (OS) rates in the training cohort.
Table 3 Univariate and multivariate analysis of cancer-specific survival (CSS) rates in the training cohort.

Nomogram construction and performance assessment

For both OS and CSS, we developed two nomograms: one based on the results of multivariate Cox analysis (Fig. 2) and the other based on TNM stage (Supplementary Fig. S1). Each variable was assigned points according to its hazard ratio (HR). The points for all variables are then summed, and the total is located on the total points scale to read off the probability of 3- and 5-year OS and CSS. In the OS nomogram, surgery contributed the most to the survival outcome, whereas M stage contributed the most in the CSS nomogram.
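The point assignment can be sketched as follows. A common convention in nomogram software is to scale the absolute log hazard ratio of each variable so that the variable with the largest effect receives 100 points; whether this exact convention was applied here is an assumption. The hazard ratios below are hypothetical values chosen for illustration and are not taken from Tables 2 or 3, and the function name `nomogram_points` is likewise illustrative.

```python
import math

# Hypothetical hazard ratios for illustration only; NOT the values in Tables 2-3.
hazard_ratios = {
    "surgery: none vs. performed": 2.8,
    "M stage: M1 vs. M0": 2.1,
    "age: >60 vs. <=60": 1.3,
}

def nomogram_points(hrs, max_points=100.0):
    """Scale |ln(HR)| so that the strongest factor receives max_points."""
    betas = {name: abs(math.log(hr)) for name, hr in hrs.items()}
    largest = max(betas.values())
    return {name: max_points * beta / largest for name, beta in betas.items()}

points = nomogram_points(hazard_ratios)
for name, value in points.items():
    print(f"{name}: {value:.1f} points")

# A patient's total is the sum of the points of the categories that apply;
# here the patient carries all three adverse categories.
total_points = sum(points.values())
print(f"total: {total_points:.1f} points")
```

The total is then mapped to a predicted survival probability through the baseline survival function of the fitted Cox model, which this sketch does not reproduce.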

Figure 2

Nomograms predicting the 3- and 5-year overall survival (OS) and cancer-specific survival (CSS) rates of GCC patients. (A) OS rate; (B) CSS rate.

Analysis of the time-dependent receiver operating characteristic (ROC) curves for OS and CSS showed that the areas under the curve (AUCs) of the nomograms (OS: 0.770, 95% CI = 0.758–0.782; CSS: 0.700, 95% CI = 0.687–0.713) were significantly larger than those of TNM stage (OS: 0.721, 95% CI = 0.709–0.734; CSS: 0.663, 95% CI = 0.650–0.676) in the training cohort (Table 4 and Fig. 3A, B). To assess whether the predicted survival was consistent with the observed survival, the concordance index (C-index) of the nomograms was calculated in the training cohort. For OS or CSS, the C-index of the nomograms (OS, C-index = 0.714; CSS, C-index = 0.759) was greater than that of the TNM stage (OS, C-index = 0.651; CSS, C-index = 0.7696). Similar results were found in the validation cohort (Table 5). The agreement between the two cohorts indicates that the nomogram-based model was accurate.
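For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index in plain Python. A pair of patients is comparable when the patient with the shorter follow-up time experienced the event; the pair is concordant when that patient also has the higher predicted risk. The toy data and the function name `harrell_c_index` are assumptions for illustration and are unrelated to the study cohort.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i had the event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (illustrative): follow-up in months, 1 = death, 0 = censored.
times = [5, 12, 20, 30, 42]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]

c_index = harrell_c_index(times, events, risk)
print(c_index)  # 0.875 (7 of 8 comparable pairs are concordant)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the nomogram and TNM stage are compared.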

Table 4 Comparison of area under the curve (AUC) between the nomogram and TNM stage in gastric cardia cancer (GCC) patients.
Figure 3

Receiver operating characteristic (ROC) and decision curve analysis (DCA) curves assessing the predictive value of the two nomograms in GCC prognosis. (A) ROC curve for overall survival (OS); (B) ROC curve for cancer-specific survival (CSS); (C) DCA for overall survival (OS); (D) DCA for cancer-specific survival (CSS).

Table 5 Comparison of C-indexes between the nomogram and TNM stage in gastric cardia cancer (GCC) patients.

In addition, decision curve analysis (DCA), which calculates the net benefit, was used to evaluate the clinical utility of the nomograms. Across a broad range of threshold probabilities for OS (10–50%), the clinical net benefit of the nomograms was greater than that of TNM stage (Fig. 3C,D). The clinical impact curves (CICs) showed that, across a broad range of threshold probabilities for OS (20–70%), the nomograms identified more true positives among the patients classified as positive than TNM stage did (Fig. 4). Moreover, the calibration curves of the 3- and 5-year OS and CSS nomograms in the training cohort (Fig. 5) and the validation cohort (Supplementary Fig. S2) were very close to the ideal curve, showing good agreement between the nomogram predictions and the observed outcomes in both cohorts.
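The net benefit underlying a decision curve can be sketched for a binary outcome as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The example below is a simplified illustration: it treats the outcome as a fixed binary event and ignores censoring, which a survival DCA has to handle. The toy data and the function name `net_benefit` are assumptions and do not come from the study.

```python
def net_benefit(y_true, predicted_risk, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    pairs = list(zip(y_true, predicted_risk))
    true_pos = sum(1 for y, p in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

# Toy data (illustrative): 1 = event occurred, with model-predicted risks.
y = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
risk = [0.80, 0.60, 0.35, 0.40, 0.20, 0.10, 0.55, 0.15, 0.05, 0.32]

nb_model = net_benefit(y, risk, threshold=0.30)
nb_treat_all = net_benefit(y, [1.0] * len(y), threshold=0.30)
print(round(nb_model, 3), round(nb_treat_all, 3))  # 0.171 0.143
```

A decision curve plots this quantity over a range of thresholds for each model, alongside the "treat all" and "treat none" (net benefit 0) strategies. The counts of patients above the threshold and of true positives among them are the two quantities a clinical impact curve displays.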

Figure 4

Clinical impact curves (CICs) assessing the predictive value of the two nomograms in GCC prognosis. (A,B) All-variable nomogram. (C,D) TNM stage nomogram.

Figure 5

Calibration plots of the nomograms for predicting 3- and 5-year overall survival (OS) and cancer-specific survival (CSS) in the training cohort. (A) 3-year OS; (B) 5-year OS; (C) 3-year CSS; (D) 5-year CSS.

Discussion

GCC is a malignant tumor at the junction of the stomach and esophagus that mostly occurs in middle-aged and elderly people over 40 years old and accounts for approximately 10% of all digestive system tumors16,17. After onset, patients often present with clinical symptoms such as upper gastrointestinal bleeding, dysphagia, and stomach discomfort. The prognosis of patients with GCC is poor18, so it is very important to develop an effective system to predict the prognosis of these patients.

In this study, we first developed prognostic nomograms for OS and CSS in patients with GCC. Using the SEER database, we performed Cox regression analysis on a large cohort of GCC patients to identify independent risk factors for OS and CSS. We constructed two prognostic nomograms: one based on multivariate Cox analysis and the other based on TNM stage. Assessment with the C-index, ROC curves, DCA and CIC showed that our nomogram had better prognostic ability than TNM stage. In addition, we validated and calibrated the nomogram and evaluated the accuracy of the OS and CSS nomograms at 3 and 5 years. The predictions of the nomogram were in good agreement with the observed outcomes, as supported by the calibration curves, ROC curve analysis and C-index values. The C-index of the nomograms exceeded 0.7, indicating sufficient discriminative ability. The DCA results show that the nomogram we developed has good clinical utility.

Recently, some nomograms containing various input variables have been developed to predict the prognosis of different digestive tract tumors19,20,21,22. Kim et al.19 developed and validated a nomogram that predicted the risk of lymph node metastasis in patients with early gastric cancer and could be used to avoid unnecessary gastrectomy after endoscopic dissection. By analyzing 9,026 patients with metastatic esophageal cancer between 2010 and 2015, Zhu et al.20 found that the nomogram was better at predicting distant metastasis of esophageal cancer than traditional TNM staging. Similarly, the nomogram developed by Xue et al.21 that includes nutritional and immune parameters can effectively predict the overall and postoperative survival rate and relapse-free survival of gastric cancer patients after radical gastrectomy, and its prediction accuracy and discrimination ability are better than those of TNM staging.

In the current reports on GCC, it is remarkable that Liu et al.22 developed a nomogram for predicting the total survival rate of GCC based on radiology and clinical predictors. However, the nomogram for predicting the OS of GCC based on radiology and clinical predictors is not available, and the nomogram for CSS is applicable for a limited population, since it is only suitable for patients undergoing preoperative radiotherapy.

TNM staging is determined according to the results of laboratory examination and postoperative pathological examination, and its association with prognosis has been reported previously. For example, Gong et al.23 found that TNM stage was associated with the prognosis of high gastric cancer, and T stage was an independent factor for lymph node metastasis. Zhu et al.24 found that TNM staging was associated with the prognosis of patients with adenocarcinoma of the esophagogastric junction, but only N staging was an independent risk factor for prognosis. TNM staging is a common method to predict the prognosis of GCC patients, but it has limitations and cannot provide clinicians with personalized prognosis prediction. In this study, we successfully constructed a practical nomogram based on thirteen factors: sex, age, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, surgery, and chemotherapy, and its predictive power was better than that of traditional TNM staging.

Our research still has some limitations. First, the lack of detailed treatment information (chemotherapy regimens, surgical methods, etc.) in the SEER database may have affected our results. Second, our study is retrospective and therefore subject to inevitable selection bias. Third, because the model has not been externally validated, its generalizability is uncertain, and further research is needed to confirm our results.

Conclusions

The present study demonstrated that the nomogram is a better prognostic determinant than the TNM staging system in GCC patients. The nomogram we developed accurately and reliably predicted the 3- and 5-year OS and CSS of GCC patients. This model could enable clinicians to estimate the survival of GCC patients more precisely.

Patients and methods

Patient selection

The SEER database is an open public database that provides cancer data (e.g., primary site, tumor size, tumor stage, treatment regimen, pathological type, time of death, and cause of death) from the population-based registries of 18 sites that cover approximately 28% of the US population25. The National Cancer Institute's SEER*Stat software version 8.3.6 (https://seer.cancer.gov/seerstat/) (SEER 18 Regs Custom Data (with additional treatment fields), Nov 2018 Sub (2004–2016 varying) database) was used in this study.

The International Classification of Diseases for Oncology (ICD-O) site code C16.0 was used to identify patients diagnosed with GCC between 2004 and 2015. In total, 21,860 patients were initially identified. Patients were then excluded for the following reasons: more than one primary tumor (n = 5,437); unknown survival time (n = 32); unknown T stage or T0 (n = 4,300); unknown N stage (n = 569); unknown M stage (n = 92); unknown tumor size (n = 3,843); unknown surgery status (n = 6); and unknown marital status (n = 249). After these exclusions, 7,332 GCC patients were included in the analysis.
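The patient flow can be checked arithmetically. The short Python sketch below sums the exclusion counts listed above and subtracts them from the initial cohort; it assumes the exclusions were applied sequentially, so that each patient is counted under one reason only. The dictionary keys are descriptive labels chosen for this example.

```python
identified = 21860  # patients with ICD-O site code C16.0, 2004-2015

# Exclusion counts as reported in the text (assumed mutually exclusive).
exclusions = {
    "more than one primary tumor": 5437,
    "unknown survival time": 32,
    "unknown T stage or T0": 4300,
    "unknown N stage": 569,
    "unknown M stage": 92,
    "unknown tumor size": 3843,
    "unknown surgery status": 6,
    "unknown marital status": 249,
}

excluded = sum(exclusions.values())
included = identified - excluded
print(excluded, included)  # 14528 7332
```

The result matches the 7,332 patients reported for the final cohort.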

Data regarding sex, age, marital status, race, histological type, SEER stage, grade, T stage, N stage, M stage, tumor size, surgery, chemotherapy, vital status, and survival time were extracted from the SEER database (2004–2015) for further analysis. OS duration was defined as the time from diagnosis until death or last follow-up. CSS duration was defined as the time from diagnosis until death because of GCC or last follow-up.
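The two endpoint definitions differ only in which deaths count as events. The following minimal sketch shows one way to code the event indicators for a survival analysis: for OS, death from any cause is an event; for CSS, only death from GCC is an event, and deaths from other causes are treated as censored at the time of death. The function name `code_events` and the field values `"dead"`, `"alive"` and `"GCC"` are hypothetical and do not correspond to actual SEER variable codes.

```python
def code_events(vital_status, cause_of_death):
    """Return (os_event, css_event) indicators: 1 = event, 0 = censored."""
    died = vital_status == "dead"
    os_event = 1 if died else 0
    # For CSS, deaths from other causes are censored rather than counted.
    css_event = 1 if died and cause_of_death == "GCC" else 0
    return os_event, css_event

# Hypothetical records: (vital status, cause of death)
records = [("dead", "GCC"), ("dead", "other"), ("alive", None)]
coded = [code_events(status, cause) for status, cause in records]
print(coded)  # [(1, 1), (1, 0), (0, 0)]
```

In both cases the survival time runs from diagnosis to death or last follow-up, as defined above.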

Statistical analyses

Cases were randomly divided into training and validation cohorts (ratio 7:3). Univariate and multivariate Cox regression models were applied to calculate the hazard ratio (HR) and 95% confidence interval (CI) and to assess the independent contribution of each factor to OS and CSS. Variables with P < 0.05 in the univariate Cox proportional hazards model were further analyzed in the multivariate Cox proportional hazards model. Based on the multivariate Cox analysis, a nomogram was developed to predict the 3- and 5-year OS and CSS rates. For comparison, we built another nomogram based on TNM stage. We then used MedCalc software (version 15.2.0) to generate the receiver operating characteristic (ROC) curves for the two nomograms and determined the area under the curve (AUC). The performance of the nomogram was assessed by the C-index and the calibration curve (1,000 bootstrap resamples). The C-index ranges from 0.5 to 1.0, with 0.5 indicating random chance and 1.0 indicating perfect discrimination. Decision curve analysis (DCA) and a clinical impact curve (CIC) were employed to evaluate the net benefit of the nomogram in a clinical context.

The above statistical analyses were conducted using SPSS software version 24.0 (SPSS, Chicago, USA) and the statistical software package R version 3.5.3 (https://www.r-project.org/). All tests were two-sided, and a P value ≤ 0.05 was considered statistically significant.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.