Retrospective analysis of predictive factors for lymph node metastasis in superficial esophageal squamous cell carcinoma

This study aimed to identify the risk factors for lymph node metastasis (LNM) in superficial esophageal squamous cell carcinoma (SESCC) and to use these factors to establish a prediction model. We retrospectively analyzed data from a training set (n = 280) and a validation set (n = 240) of patients who underwent radical esophagectomy between March 2005 and April 2018. Univariate and multivariate analyses showed that tumor size, tumor invasion depth, tumor differentiation and lymphovascular invasion (LVI) were significantly correlated with LNM. Incorporating these four variables, model A achieved AUCs of 0.765 and 0.770 in predicting LNM in the training and validation sets, respectively. Adding macroscopic type to model A did not appreciably change the AUC but led to statistically significant improvements in both the integrated discrimination improvement and the net reclassification improvement. Finally, a nomogram constructed from these five variables showed good concordance indexes of 0.777 and 0.790 in the training and validation sets, respectively, and the calibration curves showed good fit. Decision curve analysis demonstrated that the nomogram was clinically useful in both sets. The LNM status can therefore be predicted with this nomogram score system, which can aid the selection of an appropriate treatment plan.


Scientific Reports | (2021) 11:16544 | https://doi.org/10.1038/s41598-021-96088-y | www.nature.com/scientificreports/

support is the radical esophagectomy to remove all potentially involved nodes. Consequently, it is critical to explore the predictive factors of LNM in SESCC patients before endoscopic resection (ER). Several studies have shown that imaging methods (EUS, CT or PET) can detect LNM of SESCC, but these methods are not precise enough to completely rule out the presence of LNM [9][10][11]. Additionally, the clinicopathological risk factors associated with LNM in SESCC are still incompletely understood 2. The purpose of this study was to determine the risk factors for LNM in SESCC patients. A nomogram was then established using these risk factors to help predict LNM and determine whether a supplementary esophageal resection is necessary after ER.

Methods
Patient selection and data collection. Data from patients with histopathologically confirmed esophageal cancer (Tis or T1 stage) who underwent esophageal resection at Zhejiang Cancer Hospital between March 2005 and April 2018 were retrospectively analyzed. The exclusion criteria were: (1) patients who received chemotherapy or radiotherapy before surgery; (2) patients with basaloid squamous cell carcinoma, adenosquamous carcinoma, sarcomatoid carcinoma, neuroendocrine carcinoma, or spindle cell carcinoma. Eligible patients with SESCC who were admitted between March 2005 and August 2012 were assigned to the training set, and those admitted between September 2012 and February 2018 were assigned to the validation set. The macroscopic type of each tumor on endoscopy was classified according to the Paris classification 12. Nonprotruding and nonexcavated superficial tumors were classified as type II (flat type); protruding and excavated superficial tumors were classified as type I and type III, respectively (types I and III were considered nonflat type). The flowchart of patient selection is summarized in Fig. 1.
Lymph node dissection. In this study, lymph node dissections were performed according to the location of the esophageal cancer 13. For upper thoracic esophageal cancer, the rate of cervical and upper mediastinal lymph node metastasis is high; thus, lymph node dissection included the neck (two-field lymph node dissection). For middle thoracic esophageal cancer, lymph node metastasis mainly occurs in the neck, the upper, middle, and lower mediastinum, and the abdominal cavity; the extent of lymph node dissection therefore included the neck and supraclavicular area (three-field lymph node dissection). For lower thoracic esophageal cancer, lymph node metastasis mainly occurs in the mediastinum and abdomen, and the rate of cervical metastasis is relatively low, so two-field lymph node dissection was performed for these patients.
Histopathologic evaluation. Surgical specimens were fixed in formaldehyde and then cut serially into slices. The intervals between the tumor tissue and adjacent normal tissues in the slices were 2-5 mm. Tumors that extended beyond the muscularis mucosa were considered to show submucosal invasion 14. We then classified the location of esophageal cancer according to the guidelines of the American Joint Committee on Cancer 15. The portion of the esophagus extending from the entrance of the thoracic cavity to the bifurcation of the trachea was considered the upper esophagus, the section from the tracheal bifurcation to the distal esophagus (above the esophagogastric junction) was regarded as the middle esophagus, and the intra-abdominal portion of the esophagus together with the esophagogastric junction constituted the lower esophagus.
Statistical analysis. Continuous variables were expressed as median (range) and compared using the Mann-Whitney test. Categorical variables were compared using the χ² test or Fisher's exact test. All variables significantly associated with LNM were candidates for stepwise multivariate logistic regression analysis. The integrated discrimination improvement (IDI) is the difference in the discrimination slopes of a prediction model with and without one variable; it indicates whether the discrimination slope of a model improves when one important parameter is added. The net reclassification improvement (NRI) is an index that attempts to quantify how well a new model correctly reclassifies subjects. Typically, this comparison is between an original model and a new model (the original model plus one additional component) 16,17. The IDI and NRI were calculated using R, version 4.0.3, with the PredictABEL package.
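The IDI and NRI definitions above can be illustrated with a minimal sketch. The study itself computed both indexes with the PredictABEL package in R; the Python below is only an illustration of the definitions, using made-up predicted probabilities. The function names, the toy data, and the single risk cutoff of 0.2 for the two-category NRI are assumptions and do not come from the paper.

```python
from statistics import mean

def idi(y, p_old, p_new):
    """IDI: discrimination slope of the new model minus that of the old model.

    The discrimination slope is the mean predicted probability among
    events minus the mean predicted probability among non-events.
    """
    def slope(p):
        events = [pi for yi, pi in zip(y, p) if yi == 1]
        nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
        return mean(events) - mean(nonevents)
    return slope(p_new) - slope(p_old)

def categorical_nri(y, p_old, p_new, cutoff):
    """Two-category NRI with one (assumed) risk cutoff.

    Net proportion of events moved up across the cutoff, plus the
    net proportion of non-events moved down across it.
    """
    def net_up(target):
        idx = [i for i, yi in enumerate(y) if yi == target]
        up = sum(p_old[i] < cutoff <= p_new[i] for i in idx)
        down = sum(p_new[i] < cutoff <= p_old[i] for i in idx)
        return (up - down) / len(idx)
    return net_up(1) - net_up(0)

# Synthetic example (not study data): 1 = LNM present, 0 = LNM absent.
y     = [1, 1, 1, 0, 0, 0, 0]
p_old = [0.30, 0.15, 0.40, 0.25, 0.10, 0.05, 0.20]  # hypothetical model without the added variable
p_new = [0.45, 0.25, 0.40, 0.15, 0.10, 0.05, 0.30]  # hypothetical model with the added variable

print(round(idi(y, p_old, p_new), 4))
print(round(categorical_nri(y, p_old, p_new, cutoff=0.2), 4))
```

A positive IDI or NRI indicates that the added variable improves discrimination or reclassification even when the AUC barely changes, which is the situation reported for macroscopic type.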
According to the results of the multivariate logistic regression analysis, we used R software (version 4.0.3) with the rms package to formulate a nomogram. The nomogram proportionally converts each regression coefficient in the logistic regression to a scale of 0-100 points 18. The points of each independent variable were summed, and the predicted probabilities were derived from the total points. The area under the curve (AUC) and the calibration curve were used to assess the predictive performance of this nomogram. The final and most important line of evidence for the use of this nomogram is its clinical utility, that is, whether it helps determine which individual patients require additional treatment or care. Decision curve analysis (DCA) offers insight into clinical outcomes on the basis of threshold probability, from which the net benefit can be derived. Net benefit is defined as the proportion of true positives minus the proportion of false positives, weighted by the relative harm of false-positive and false-negative results 19. To evaluate the clinical utility of the nomogram, DCA was performed using R with the rmda package. In all analyses, P < 0.05 was considered to indicate statistical significance. All analyses were performed using SPSS version 22.0 (SPSS Inc, Chicago, Ill) and R, version 4.0.3.
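The net benefit definition above corresponds to the standard decision-curve formula, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The following is a minimal Python sketch of that formula on made-up data; the study used the rmda package in R, and the function names and toy values here are assumptions for illustration only.

```python
def net_benefit(y, p, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y, threshold):
    """Net benefit of the reference strategy 'treat every patient'."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Synthetic example (not study data): 1 = LNM present, 0 = LNM absent.
y = [1, 1, 0, 0, 0]
p = [0.60, 0.30, 0.40, 0.10, 0.05]  # hypothetical predicted probabilities

for pt in (0.1, 0.2, 0.3):
    print(pt, round(net_benefit(y, p, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

The "treat none" strategy always has a net benefit of zero, so a model is considered useful at a given threshold when its net benefit exceeds both zero and the "treat all" value.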

Independent risk factors of LNM.
Comparisons of clinicopathological characteristics between the LNM-positive and LNM-negative groups are summarized in Table 2. In both the training and validation sets, tumor size, tumor invasion depth, tumor differentiation, LVI and macroscopic type were significantly associated with LNM in the univariate analysis (Table 2). However, age, gender, circumferential extension, tumor location and the presence of multiple lesions were not significantly correlated with LNM. Furthermore, tumor size, tumor invasion depth, tumor differentiation and LVI were identified as independent risk factors for LNM in both the training and validation sets by multivariate analysis. Interestingly, macroscopic type was not correlated with LNM in the training set (P = 0.064), whereas it was identified as a risk factor in the validation set (Table 3).
Predictive utility of macroscopic type for LNM prediction. Adding macroscopic type to model A did not appreciably change the AUC (Fig. S2). However, the IDI and NRI values showed statistically significant improvement after adding macroscopic type to model A (Table 4), suggesting that macroscopic type can also be considered a risk factor for LNM. Reclassification results for patients with and without LNM are shown in Table S1 and Table S2.
Development and validation of an LNM-predicting nomogram. Subsequently, we used ROC analysis to determine the cutoff value of tumor size, which was 2 cm in the training set and 2.5 cm in the validation set (Fig. S3). The LNM rates according to the risk factors identified by multivariate logistic analysis are summarized in Table S3 and Table S4. Patients with tumors > 2 cm (training set) or > 2.5 cm (validation set) in size, submucosal invasion, LVI, poor tumor differentiation or nonflat macroscopic type (I or III) tended to have a higher LNM rate.
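The text does not state which criterion was used to pick the cutoff from the ROC curve. One common choice is to maximize Youden's J (sensitivity + specificity - 1); the sketch below assumes that criterion purely for illustration, and the tumor sizes and LNM labels are synthetic, not study data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J for the rule 'value > cutoff'.

    Assumes larger values indicate higher risk and that both classes
    are present in `labels` (1 = positive, 0 = negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, l in zip(values, labels) if v > c and l == 1)
        tn = sum(1 for v, l in zip(values, labels) if v <= c and l == 0)
        j = tp / pos + tn / neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j

# Synthetic tumor sizes in cm and LNM status (hypothetical values).
sizes = [0.8, 1.2, 1.5, 1.8, 2.2, 2.6, 3.0, 3.4]
lnm   = [0,   0,   0,   0,   1,   0,   1,   1]

cutoff, j = youden_cutoff(sizes, lnm)
print(cutoff, round(j, 3))
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can give a different cutoff on the same data, so the criterion should be reported alongside the value.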
Finally, a nomogram for LNM prediction was formed by incorporating five variables: tumor size, tumor invasion depth, tumor differentiation, LVI and macroscopic type (Fig. 2). The nomogram was validated internally (bootstrap method) and externally (validation set). The Hosmer-Lemeshow test yielded a P value of 0.995, indicating that the model was well fitted. The nomogram showed good performance in predicting LNM risk, with an AUC (or C-statistic) of 0.777 (95% CI 0.724-0.825) (Table 4). Additionally, the calibration curve of the training set demonstrated good consistency between the predicted and observed LNM status (Fig. 3A). In the validation set, the nomogram achieved an AUC of 0.790 (95% CI 0.737-0.836) for the estimation of LNM risk (Table 4, Fig. S2B), and its calibration curve also fitted well (Fig. 3B).
The nomogram score system for LNM risk prediction and clinical use. Each predictive variable displayed in the nomogram was assigned a risk score. The detailed scores of the five variables (tumor size, tumor invasion depth, tumor differentiation, LVI and macroscopic type) in the training and validation sets are presented in Fig. 2, Table S5 and Table S6. We predicted the presence of LNM by summing the scores of these five variables; the total scores ranged from 0 to 317 in the training set and from 0 to 281 in the validation set. According to ROC curve analysis, the optimal cutoff points of the total nomogram score for LNM were 150 in the training set and 148 in the validation set (Table S7 and Table S8). As a result, patients with total scores ≤ 150 in the training set and ≤ 148 in the validation set were classified as low risk, and patients with total scores > 150 (training set) and > 148 (validation set) were classified as high risk (Table S7 and Table S8). In addition, DCA in the training and validation sets indicated that our nomogram was clinically useful (Fig. 4).
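The scoring procedure described above can be sketched as follows. The actual point values are in Table S5 and Table S6, which are not reproduced here, so the regression coefficients below are hypothetical placeholders chosen only to show the mechanics. Treating all five predictors as binary is also an assumption made for simplicity. Only the training-set cutoff of 150 is taken from the text.

```python
# Hypothetical logistic regression coefficients (NOT the study's values).
HYPOTHETICAL_COEFS = {
    "size_above_cutoff": 0.75,
    "submucosal_invasion": 1.5,
    "poor_differentiation": 1.0,
    "lvi": 2.0,
    "nonflat_type": 0.5,
}

def points_table(coefs):
    """Scale each coefficient so that the largest effect is worth 100 points."""
    max_effect = max(abs(b) for b in coefs.values())
    return {name: 100 * abs(b) / max_effect for name, b in coefs.items()}

def total_score(patient, table):
    """Sum the points of every risk factor present in the patient."""
    return sum(table[name] for name, present in patient.items() if present)

def risk_group(score, cutoff=150):
    """Scores above the cutoff are high risk; scores at or below it are low risk."""
    return "high" if score > cutoff else "low"

table = points_table(HYPOTHETICAL_COEFS)
patient = {"size_above_cutoff": False, "submucosal_invasion": True,
           "poor_differentiation": False, "lvi": True, "nonflat_type": False}
score = total_score(patient, table)
print(score, risk_group(score))
```

With real data, the points would be read from the published nomogram rather than recomputed, and the cutoff would be the one matching the data set in use (150 for the training set, 148 for the validation set).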

Discussion
Regarding the histopathological type of esophageal cancer, adenocarcinoma accounts for the majority of cases in Western countries, while esophageal squamous cell carcinoma is the predominant type in China 20. Superficial esophageal squamous cell carcinoma (SESCC) invades only the mucosa and submucosa and causes no subjective symptoms. Hence, early diagnosis was difficult for these patients, and in the past most esophageal cancers were at a locally advanced stage when the diagnosis was confirmed. However, owing to progress in flexible endoscopic procedures and the widespread use of endoscopic screening, the incidence of SESCC is increasing 21.
In patients with SESCC, LNM contributes substantially to an unfavourable prognosis 22, resulting in a significantly lower 5-year survival rate in LNM-positive patients than in LNM-negative patients 23,24. According to the guidelines for SESCC diagnosis and treatment, endoscopic resection (ER) is mainly suitable for patients at low risk of LNM whose tumors can be completely removed endoscopically 25. Because lymph nodes cannot be sampled during ER, we aimed to identify predictors of LNM so that patients are not left with residual nodal disease after ER. Our findings indicated that LNM-positive patients were statistically more likely to have larger tumors, poorer differentiation, deeper tumor invasion and LVI in both the training and validation sets. Macroscopic type was also significantly associated with LNM in the multivariate analysis of the validation set, but lost significance in the multivariate analysis of the training set.
Several previous studies reported a statistically significant correlation between LNM and tumor size in SESCC patients [26][27][28][29]. In our study, tumor size was significantly correlated with LNM across all 520 patients and was identified as an important predictor of LNM. Although SESCC comprises both mucosal and submucosal cancers, the LNM status may differ between the two. Taking mucosal infiltration as the reference, the odds ratio of submucosal infiltration for the prediction of LNM was 3.112 (95% CI 1.025-9.436) in our training set (Table 3), indicating that submucosal infiltration is a significant risk factor for LNM. The LNM rate among SESCC patients with mucosal cancer was 8.1% (5/62), while the incidence of LNM increased markedly to 29.4% (64/218) in patients with submucosal invasion (Table S3). Tumor invasion depth was also reported as a risk factor for LNM in previous studies 6,30,31, which is consistent with our results and suggests that endoscopic resection might not be appropriate for submucosal cancers 32.
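As a quick arithmetic check, the LNM rates quoted above follow directly from the reported counts (5/62 and 64/218). The same counts also give a crude, unadjusted odds ratio, computed below for illustration only. It is not the adjusted odds ratio of 3.112 from the multivariate model in Table 3, which accounts for the other covariates. The helper names are hypothetical.

```python
def rate_percent(events, total):
    """Event rate as a percentage."""
    return 100 * events / total

def crude_odds_ratio(events_exposed, n_exposed, events_ref, n_ref):
    """Unadjusted odds ratio of the exposed group versus the reference group."""
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_ref = events_ref / (n_ref - events_ref)
    return odds_exposed / odds_ref

# Training-set counts quoted in the text (Table S3).
mucosal_rate = rate_percent(5, 62)        # mucosal cancers with LNM
submucosal_rate = rate_percent(64, 218)   # submucosal cancers with LNM
crude_or = crude_odds_ratio(64, 218, 5, 62)

print(round(mucosal_rate, 1), round(submucosal_rate, 1), round(crude_or, 2))
```

The crude odds ratio is larger than the adjusted one, as expected when invasion depth is correlated with other risk factors such as LVI or tumor size.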
Like tumor invasion depth, LVI has been reported as an important risk factor for LNM in SESCC patients in several studies 31,33,34. Similarly, our data showed that LVI was significantly related to LNM in SESCC patients (Table 3). For that reason, supplementary surgery with lymph node dissection should be considered when LVI is detected in the endoscopically resected specimen. Interestingly, we also found that the LNM rates were still high even in tumors without LVI. For tumors confined to the muscularis mucosa without LVI, the LNM rates were 6.6% (4/61) and 5.8% (3/52) in the training and validation sets, respectively, while for tumors invading the submucosa without LVI, the LNM rates increased to 24.2% (46/190) and 26.5% (43/162), respectively (Table S3). Eguchi et al. 31 pointed out that the LNM rate in SESCC without LVI was 10.3% for tumors involving the muscularis mucosa and 28.6% for tumors with submucosal invasion, which is similar to our findings. The high LNM rate in SESCC without LVI may be attributable to early and skip metastasis along the abundant lymphatic channels in the mucosa and submucosa. In general, the absence of LVI is also a requirement for curative endoscopic resection.
Histological differentiation was previously reported to be a potential risk factor for LNM 26,28,35. Consistently, we found a significant association between tumor differentiation and LNM in the current study (Table 3). The macroscopic appearance of esophageal cancer seems to be related to tumor invasion depth, which might be crucial for evaluating the LNM risk 36. Interestingly, in our multivariate analysis, nonflat morphology was not correlated with LNM in the training set; in contrast, nonflat type was identified as an independent risk factor for LNM in the validation set (Table 3).
Table 3. Multivariate logistic analysis of risk factors for lymph node metastasis in training and validation sets. The bold values mean statistical significance. LVI, lymphovascular invasion; I = superficial and protruding type; II = flat type; III = superficial and excavated type.
Moreover, a nomogram was developed for LNM prediction by incorporating the five predictors (tumor size, tumor invasion depth, tumor differentiation, LVI and macroscopic type), with an AUC of 0.777 in the training set and 0.790 in the validation set (Table 4, Fig. S2).
The good accuracy and consistency of our nomogram for predicting LNM were supported by the calibration curves (Fig. 3). The cutoff values of the total nomogram score were then determined by ROC analysis to be 150 in the training set (Table S7) and 148 in the validation set (Table S8). Patients with a total score > 150 in the training set or > 148 in the validation set were considered high risk, which can guide treatment decisions. Finally, DCA was performed to assess the clinical utility of our nomogram; it showed that, at threshold probabilities > 20%, using our nomogram to predict LNM added more net benefit than either treating all patients or treating none (Fig. 4).
In summary, tumor size, tumor invasion depth, tumor differentiation, and LVI were identified as significant predictive factors for LNM in patients with SESCC. Tumor macroscopic type was also identified as a predictor of LNM on the basis of the IDI and NRI. Furthermore, a nomogram scoring system was established using these five variables, making individualized LNM prediction easier and facilitating the selection of an optimal treatment strategy for patients with SESCC. According to the nomogram scoring system, careful follow-up observation can be recommended for SESCC patients at low risk of LNM after ER, and supplementary surgery should be considered for those at high risk. DCA demonstrated that the nomogram was clinically useful. However, this was a retrospective study based on data from a single institution. Therefore, the results need to be validated using data from multiple centers, and a prospective study is required to further confirm the reliability of the nomogram. Finally, our nomogram may improve and facilitate treatment strategy selection, which may lead to early diagnosis and prompt treatment initiation for patients with SESCC.