Prognostic performance of three lymph node staging schemes for patients with Siewert type II adenocarcinoma of esophagogastric junction

The prognostic performance of different lymph node staging schemes for adenocarcinoma of the esophagogastric junction (AEG) remains controversial. The objective of the present study was to compare the prognostic efficacy of the number of lymph node metastases (LNMs), the positive lymph node ratio (LNR) and the log odds of positive lymph nodes (LODDS). Patients diagnosed with Siewert type II AEG were identified from the Surveillance, Epidemiology, and End Results database. Harrell's C-index, Schemper's proportion of explained variation (PEV), the Akaike information criterion (AIC) and restricted cubic spline analyses were used to assess the predictive accuracy of LNM, LNR and LODDS. A total of 1302 patients with post-surgery Siewert type II AEG were included. LNM, LNR and LODDS all showed significant prognostic value in the multivariate Cox regression analyses. LODDS showed higher predictive accuracy than LNM and LNR, with a relatively higher C-index, a higher Schemper's PEV and a lower AIC. For patients with no nodes involved, LODDS still showed significant discriminatory utility. LODDS showed more accurate prognostic performance than LNM and LNR for post-surgery Siewert type II AEG, and it could help to detect survival heterogeneity among patients with no positive lymph nodes.

The incidence of adenocarcinoma of the esophagogastric junction (AEG) has increased dramatically in both Western and Asian countries over the past several decades 1-4, which might be caused by the increasing prevalence of gastroesophageal reflux disease, obesity and smoking 5-8. The definition and classification of AEG remain controversial; Siewert's classification 9,10 is commonly used and was adopted by the seventh edition of the Union for International Cancer Control (UICC) and American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system 11. AEG is defined as adenocarcinoma whose epicenter is in the distal thoracic esophagus, the esophagogastric junction (EGJ), or within the proximal 5 cm of the stomach (cardia) and that extends into the EGJ or distal thoracic esophagus; Siewert type II AEG arises from the cardiac epithelium 11. In the AJCC 7th edition, AEG is staged by the same criteria as esophageal cancer, and the absolute number of lymph node metastases (LNMs) is currently used for the N stage. However, it has been reported that the current nodal staging criteria can be influenced by the total number of lymph nodes retrieved and might cause stage migration 12,13.
Recently, several different lymph node staging schemes have been proposed for esophageal cancer 14,15 and gastric cancer 16-18, including the number of LNMs, the positive lymph node ratio (LNR) and the log odds of positive lymph nodes (LODDS). LNR is defined as the ratio of the number of metastatic lymph nodes to the total number of examined lymph nodes 12, and LODDS is defined as the natural logarithm of the ratio of the probability of a lymph node being positive to the probability of a lymph node being negative when a single lymph node is retrieved 19. However, the three lymph node staging schemes have not been compared specifically for Siewert type II AEG, and it is important to find the optimal prognostic indicator to provide evidence and

Materials and Methods
Study population. Patients were selected using the SEER*Stat software (version 8.3.2) from the latest version of the SEER database (SEER 18, 1973-2013), which was released in April 2016 and based on the November 2015 submission 20. Patients were eligible for inclusion if they were at least 18 years old and diagnosed with microscopically confirmed primary AEG with no distant metastasis from 1988 to 2013. The histology types were restricted according to the International Classification of Diseases for Oncology, 3rd edition (ICD-O-3), with codes of 8050, 8140-8147, 8160-8162, 8180-8221, 8250-8507, 8514-8551, 8571-8574, 8576, and 8940-8941 and tumor site codes of 160-162. The type of follow-up expected was restricted to 'active follow-up', and patients were required to have survived at least two months after the operation. The schema discriminator 'CS Site-Specific Factor 25' was used to distinguish AEG from gastric cancer, with codes of 020, 040, 060 and 982. The 5-year overall survival (OS) was selected as the outcome of interest for this study. Survival time was defined as the time between diagnosis and death, the last contact or the cutoff date of December 31, 2013. The detailed selection codes for the SEER database are shown in Supplementary File 1.
A second round of selection was conducted, and the detailed exclusion process is shown in Supplementary Figure 1. Tumor extension information for AEG was not available before 2004, and patients diagnosed with AEG after 2008 were excluded to ensure 5-year follow-up. Thus, patients diagnosed between 2004 and 2008 were finally included. In addition, patients who did not receive surgery, were diagnosed by exfoliative cytology, had incomplete survival dates or had distant metastasis were excluded. Tumors with mixed histopathologic types were also excluded, because the mixed type was classified as squamous cell carcinoma according to the seventh edition UICC/AJCC TNM staging system. Although the SEER database did not provide detailed information on the Siewert classification of AEG, the combination of "Primary Site" code 160 (Cardia) and "CS site-specific factor 25" code 982 (EsophagusGEJunction) allowed us to identify Siewert type II AEG 21. No institutional review board approval was required because SEER is a publicly available database.
Node staging schemes. LNM was defined identically to the lymph node staging classification of the seventh edition of the UICC/AJCC TNM staging system 22, which is based on the number of positive lymph nodes: 0 (LNM-N0), 1-2 (LNM-N1), 3-6 (LNM-N2), ≥7 (LNM-N3). LNR was defined as the ratio of the number of positive lymph nodes to the total number of retrieved lymph nodes, which ranges from 0 to 1. The LODDS value was calculated by the formula log_e[(pN + 0.5)/(nN + 0.5)] 23, where pN is the number of positive lymph nodes and nN is the number of negative lymph nodes, obtained by subtracting pN from the total number of examined lymph nodes. A value of 0.5 was added to pN and nN to avoid singularities caused by null observations 19.

Statistical analysis. Spearman rank tests and scatter plots were used to assess the relationship between LODDS and LNM or LNR and to elucidate the distribution characteristics. Univariate analysis was first conducted to evaluate the prognostic performance of these lymph node schemes, and multivariate analysis was then performed on the factors that were statistically significant in the univariate analysis. Restricted cubic splines were then plotted to display the association between the log hazard ratio and LNM, LNR and LODDS 24. The optimal cut-off values for LNR and LODDS were determined by the X-tile software (http://www.tissuearray.org/rimmlab) using the minimal P value approach 25. The discrimination efficacy of the prognostic schemes was assessed by Harrell's C-index, which ranges from 0.5 (denoting random splitting) to 1 (perfect prediction) 26. The bootstrap technique with 1000 repetitions was used for validation and to calculate the 95% confidence interval (CI). Schemper's proportion of explained variation (PEV), generally known as R², can be used to measure predictive accuracy and explained variation for a specific predictor or model 27, and a higher PEV value represents better prognostic accuracy.
In addition, Akaike information criterion (AIC) was also applied to further evaluate the predictive efficacy for different lymph node staging schemes, and smaller AIC values represent more accurate prognostic stratification 28 .
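As a minimal illustration of two of the accuracy measures described above, the sketch below computes Harrell's C-index by direct pair counting and the AIC from a fitted log-likelihood. It is not the code used in this study (the analyses were run in SPSS, Stata and R); the function names and the toy data are hypothetical, and tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the patient with the
    higher risk score has the shorter observed survival time.

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- higher value means higher predicted risk
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: pairs tied on time are skipped
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [3.0, 2.0, 2.5, 1.0]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

In the toy data, five pairs are usable and four of them are concordant, so the C-index is 0.8. For a Cox model, the log-likelihood passed to `aic` would be the maximized partial log-likelihood.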
All analyses were conducted using SPSS 19.0 (IBM SPSS Inc., United States), Stata software (version 12.0; StataCorp, College Station, TX, USA), and R software version 3.2.2 (The R Foundation for Statistical Computing) with the rms and Hmisc statistical packages. Statistical significance was set at P < 0.05 unless otherwise specified; all P values presented were 2-sided.
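The three node staging schemes defined in the Methods can be sketched in a few lines. This is only an illustrative sketch under the definitions given above (LNM categories of 0, 1-2, 3-6 and ≥7 positive nodes; LNR as positive over examined nodes; LODDS as log_e[(pN + 0.5)/(nN + 0.5)]); the function names are hypothetical and do not come from the study's own code.

```python
import math


def _check_counts(positive, examined):
    """Basic sanity checks shared by the helpers below."""
    if examined <= 0:
        raise ValueError("at least one lymph node must be examined")
    if positive < 0 or positive > examined:
        raise ValueError("positive nodes must lie between 0 and examined")


def lnm_category(positive):
    """N category by number of positive nodes (UICC/AJCC 7th edition)."""
    if positive < 0:
        raise ValueError("positive nodes cannot be negative")
    if positive == 0:
        return "LNM-N0"
    if positive <= 2:
        return "LNM-N1"
    if positive <= 6:
        return "LNM-N2"
    return "LNM-N3"


def lnr(positive, examined):
    """Positive lymph node ratio, between 0 and 1."""
    _check_counts(positive, examined)
    return positive / examined


def lodds(positive, examined):
    """Log odds of positive nodes: ln[(pN + 0.5) / (nN + 0.5)]."""
    _check_counts(positive, examined)
    negative = examined - positive
    return math.log((positive + 0.5) / (negative + 0.5))


# Hypothetical patient: 3 positive nodes out of 15 examined.
example = (lnm_category(3), lnr(3, 15), lodds(3, 15))
```

For this hypothetical patient the category is LNM-N2, the LNR is 0.2, and the LODDS is ln(3.5/12.5). Unlike LNR, LODDS stays finite and keeps varying with the examined count when no node, or every node, is positive.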

Results
Characteristics of patients and three lymph node staging schemes. A total of 1302 patients with resected Siewert type II AEG were finally included from the SEER database. The median survival time was 31 months and the overall 5-year survival rate was 36.25%. The characteristics and demographics of the included patients are shown in Table 1. Univariate Cox regression analysis suggested that age, marital status, tumor size, tumor differentiation grade, T stage, LNM, LNR and LODDS were potential prognostic factors for AEG, whereas gender, race, total lymph nodes retrieved and post-surgery radiation were not significantly associated with prognosis.
The cut-off points of LNR and LODDS were determined with the X-tile software using the minimal P value approach. According to the X-tile results (Supplementary Figure 2a and b), LNR was classified into LNR1 (LNR = 0), LNR2 (0 < LNR ≤ 0.125), LNR3 (0.125 < LNR ≤ 0.425) and LNR4 (0.425 < LNR ≤ 1.000), and LODDS was classified into LODDS1 […] 23.03% (LODDS3), and 10.15% (LODDS4), respectively (P < 0.001). The Kaplan-Meier survival curves and the log hazard ratio cubic spline analyses for the three lymph node staging schemes are shown in Fig. 1. The three lymph node schemes all showed significant discrimination efficacy and good prognostic performance for patients with resected Siewert type II AEG (log rank P < 0.001) (Fig. 1a,b and c). In addition, the cubic splines of log hazard ratios revealed a non-linear increasing trend in the risk of death as LNM, LNR or LODDS increased (Fig. 1d,e and f). Owing to the relatively small sample sizes of patients with no positive lymph nodes (N = 466) and with all lymph nodes involved (N = 42), cubic spline analysis was not conducted in these subgroups. As shown in Fig. 2, the scatter plots suggested a significant positive association between LODDS and LNM (Spearman rank test P < 0.001) and between LODDS and LNR (Spearman rank test P < 0.001), and LODDS had a higher correlation coefficient with LNR than with LNM (r = 0.943 versus r = 0.735), which indicated a closer fit between LODDS and LNR. However, for patients with no positive lymph nodes (LNR = 0) and with all lymph nodes involved (LNR = 1), the LODDS values were scattered. To further explore the prognostic value of LODDS in these patients with relatively small sample sizes, we recalculated the cut-off value of LODDS and classified it into high and low categories with the X-tile software, as shown in Supplementary Figure 2c.
For patients with no positive lymph nodes involved, the high and low intervals were −3.37 to −1.10 and −5.20 to −3.37, respectively. Kaplan-Meier survival curves were then plotted, and LODDS could distinguish survival heterogeneity among patients with no positive lymph nodes involved (Fig. 3, log rank P = 0.003). However, the prognostic value of LODDS for patients with all lymph nodes involved was not further evaluated because of the small sample size (n = 42).
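To make the node-negative intervals above more concrete: when pN = 0, the LODDS formula from the Methods reduces to ln[0.5/(n + 0.5)], where n is the total number of examined nodes, so LODDS depends only on how many nodes were retrieved. The short sketch below, with hypothetical function names, evaluates this expression and its inverse. It is a worked arithmetic check under that formula, not a reproduction of the X-tile analysis.

```python
import math


def lodds_node_negative(examined):
    """LODDS for a patient with zero positive nodes: ln[0.5 / (n + 0.5)]."""
    return math.log(0.5 / (examined + 0.5))


def examined_from_lodds(value):
    """Invert the expression above: n = 0.5 * exp(-LODDS) - 0.5."""
    return 0.5 * math.exp(-value) - 0.5


# Examined-node counts whose LODDS values match the reported endpoints.
endpoints = {n: round(lodds_node_negative(n), 2) for n in (1, 14, 90)}
for n, value in endpoints.items():
    print(n, value)
```

Under this formula, the reported endpoints of −1.10, −3.37 and −5.20 correspond to 1, 14 and 90 examined nodes, respectively. The high and low LODDS categories in node-negative patients therefore appear to separate patients with roughly 14 or fewer examined nodes from those with more, which is consistent with the observation that a higher LODDS value (fewer nodes examined) predicted poorer survival.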

Comparison of the prognostic performance of three lymph node staging schemes. Multivariate Cox regression analyses were further conducted to assess the lymph node staging schemes, as shown in Table 2. LNM, LNR and LODDS were all independent prognostic factors for post-surgery Siewert type II AEG (P < 0.001), and other independent prognostic factors included age, marital status, and T stage. According to the comparison of the three lymph node schemes in Table 3, LODDS had the highest C-index (95% CI) of 0.673 (0.654-0.692), and LODDS and LNR both showed significantly better prognostic efficacy than the LNM scheme, with P-values of 0.045 and 0.025, respectively. Although the difference between LODDS and LNR was not statistically significant, the Schemper's PEV and AIC of LODDS were 15.90% and 1502.119, which indicated that the LODDS scheme explained variation better and had predictive accuracy superior to that of the LNM and LNR schemes (Table 3). We then combined the node staging schemes with T stage to compare the modified T-LODDS-M and T-LNR-M staging systems with the AJCC 7th TNM staging system. As shown in Table 3, the T-LODDS system still had the highest C-index of 0.683 (0.664-0.702), and the T-LODDS (P = 0.005) and T-LNR (P = 0.015) systems were both still significantly better than the traditional TNM staging system. To further evaluate the stability of these prognostic factors, combined models were constructed from the factors that were significant in the multivariate analysis, including age, marital status, T stage and each of the three lymph node staging schemes. Harrell's C-index, Schemper's PEV and AIC analyses also indicated that the model with LODDS had the best predictive accuracy, with the largest C-index of 0.707 (0.689-0.725), the largest Schemper's PEV of 24.90% and the smallest AIC of 1323.464 among the three models with different node staging schemes.
In order to evaluate the impact of the total number of retrieved lymph nodes on the prognostic performance of the three node staging schemes, we classified the total number of retrieved nodes as 1-10 nodes, 11-20 nodes, and 21 or more nodes. According to the Harrell's C-index analysis shown in Table 4 […]

Subgroup and additional analysis. T stage is an independent prognostic factor for Siewert type II AEG, and we conducted a subgroup analysis stratified by T stage, as shown in Supplementary Table 1 and Supplementary Figure 3. The three staging systems all showed good prognostic capability in stage T1, T3 and T4 patients, while the results were not statistically significant in stage T2 patients. This might be caused by the relatively small sample size of stage T2 patients (n = 141). For each T stage, LODDS still had the highest C-index and the lowest AIC among the three lymph node staging schemes.
In order to further analyze the prognostic value of LODDS in different T stages, N stages and LNR categories, new cut-off values of LODDS were re-calculated by X-tile analyses in different subgroups and its prognostic […] In addition, a linear trend χ2 score was calculated to evaluate the discriminatory ability and monotonicity of gradients, and the likelihood ratio (χ2) test was conducted to assess homogeneity, as shown in Supplementary Table 3. The results indicated that LODDS had the best efficacy, with the highest scores among the three schemes.

Discussion
The current study compared the prognostic efficacy of three lymph node staging schemes (LNM, LNR and LODDS) in patients with resected Siewert type II AEG from the SEER database. Among the three schemes, LODDS showed the best predictive accuracy and discriminatory utility, and consistent results were found for the multivariate model with LODDS. Compared with the traditional UICC/AJCC TNM classification, the novel schemes (LODDS and LNR) had better prognostic efficacy, with higher C-indices, higher Schemper's PEV values and lower AIC values. Moreover, for patients with no positive lymph nodes, LODDS still showed good discriminatory efficacy and revealed survival heterogeneity among these patients, which compensated for the deficiencies of the LNR and conventional LNM schemes on this issue. For these patients, LODDS was re-categorized into high and low levels, and patients with higher LODDS values had a relatively poorer prognosis. Many studies have evaluated the prognostic value of different lymph node staging schemes 17, and the advantages of LNR and LODDS over LNM have been validated for esophageal cancer and gastric cancer 15,16, as has the advantage of LNR over LNM for AEG 29. All these studies provided evidence for the promising efficacy of new lymph node staging schemes. The theoretical foundation of LNR and LODDS is that they combine information on both positive and negative lymph nodes. It has been reported that LNR is a reliable indicator that improves node classification for esophageal cancer and gastric cancer and is less influenced by an insufficient number of retrieved lymph nodes 12,30. In addition, the transformation of LODDS by adding a value to both the numerator and the denominator gives the least biased estimator of the true log odds and avoids singularities caused by null observations 23, which supports the rationale for LODDS and its more accurate discrimination.
Although the total number of retrieved lymph nodes was not significantly associated with 5-year OS for the whole cohort in the univariate Cox regression analysis, it could affect the prognosis of patients with no positive lymph nodes and the prognostic performance of the three lymph node staging schemes. The Harrell's C-index of LNM, LNR and LODDS increased as the total number of retrieved nodes increased. LODDS was better than LNR and LNM for patients with 1-10, 11-20 and 21 or more nodes retrieved, which indicated that LODDS showed more accurate prognostic performance than LNM and LNR in all subgroups and was a relatively stable indicator. Interestingly, our study also found that marital status was an independent prognostic factor for patients with post-surgery Siewert type II AEG and that marriage was a protective factor, which is consistent with previous studies of gastric cancer 31,32 and esophageal cancer 33. According to the multivariate Cox regression analysis, divorced or separated and single (never married) patients had relatively higher mortality than married patients; thus, more spiritual and social support should be given to these patients.
Although the difference in C-index between LODDS and LNR was not statistically significant, LODDS is still recommended as a novel node staging scheme for Siewert type II AEG. Owing to its statistical properties, LODDS could detect survival heterogeneity among patients with no positive lymph nodes, and a higher LODDS value predicted poorer survival in these patients. There were 466 stage N0 Siewert type II AEG patients, accounting for 35.79% of the total. Distinguishing the high-risk population among patients with no positive lymph nodes has important clinical significance for early-stage patients, and more intensive intervention, examination or treatment should be recommended for them.
AEG was staged as an esophageal adenocarcinoma according to the seventh edition of the UICC/AJCC TNM classification, with a proclivity for proximal spread mainly via lymphatics in the submucosa of the esophagus 34; however, the staging system and classification criteria for AEG remain controversial. Hasegawa et al. 35 reported that Siewert type II and III AEGs were more appropriately staged by the gastric cancer TNM classification. However, an eighth edition staging primer for esophageal and EGJ cancer 36 suggested that Siewert type I/II EGJ cancer should be staged as esophageal cancer, and that a cancer whose center is more than 2 cm distal to the EGJ should be staged as gastric cancer. Of note, the lymph node staging criteria were similar for esophageal cancer and gastric cancer in the seventh UICC/AJCC TNM classification 22: stage N0 with 0 positive nodes, stage N1 with 1 to 2 positive nodes, stage N2 with 3 to 6 positive nodes, and stage N3 with 7 or more positive nodes. Stage N3 was divided into N3a (7-15 nodes) and N3b (16 or more nodes) for gastric cancer. Thus, regardless of which system is used, the current study provides evidence for the novel lymph node staging schemes specifically in patients with Siewert type II AEG; these schemes showed prognostic superiority over the existing lymph node staging system and provide evidence and reference for future staging criteria. The prognosis of AEG is significantly associated with the extent of nodal involvement and tumor location, which could affect lymphatic dissemination 37, and studies have evaluated the lymphadenectomy approach 38 and surgical approach for EGJ cancer 39, as well as the multidisciplinary management of AEG 34,40. However, limitations of the current analysis should be acknowledged. Information on Siewert type I and III AEG was not available from the SEER database, which limited further analysis of other subtypes.
In addition, detailed information on patients' comorbidities, surgical approach and post-surgery treatment (such as chemotherapy regimens) was not provided. Patients treated with preoperative radiation therapy were excluded to reduce its impact on survival, and post-surgery radiation was not significantly associated with the 5-year OS of AEG in the univariate analysis; however, chemotherapy information was not available, and chemotherapy might cause nodal down-staging.

Conclusion
Despite these potential limitations, our study indicates that LODDS showed more accurate prognostic performance than LNM and LNR in patients with post-surgery Siewert type II AEG, and it could help to detect survival heterogeneity for patients with no positive lymph nodes involved.
Ethical approval. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. No ethics approval was required because SEER is a publicly available database.
Informed consent. We obtained permission to access SEER research data files with the reference number 11536-Nov2015. Extraction of data from the SEER database does not require informed consent.

Table 3. Analysis of the prognostic performance of different node classifications and different models for Siewert type II esophagogastric junction adenocarcinoma. * By bootstrap method (B = 1000). AIC: Akaike information criterion. a A model with combined variables including age, marital status, T stage and LNM.

Table 4. Impact of the total number of retrieved lymph nodes on the prognostic performance of node staging schemes for Siewert type II esophagogastric junction adenocarcinoma. LNM: lymph node metastasis. LNR: positive lymph node ratio. LODDS: log odds of positive lymph nodes. Continuous LNM, LNR and LODDS were included for evaluation. AIC: Akaike information criterion.