Construction and validation of a nomogram to predict overall survival for colorectal signet ring cell carcinoma

This study aimed to construct and validate a nomogram to predict the overall survival (OS) of patients with colorectal signet ring cell carcinoma (SRCC). Potentially eligible cases were obtained from the SEER database from 2004 to 2015. The log-rank test and Cox analysis were conducted to identify the independent prognostic factors for OS. The identified prognostic factors were then integrated to construct an OS prediction nomogram. Altogether, 2904 eligible cases were identified, and the median survival time was 18 (range: 0–155) months. In multivariate analysis, age, primary site, grade, tumor size, T stage, N stage, M stage, surgery, lymph node dissection and chemotherapy were identified as independent factors for predicting OS, and these variables were incorporated into the nomogram. The C-index indicated better discriminatory ability of the nomogram than the AJCC 8th TNM staging and SEER summary stage systems (both P < 0.001). Calibration plots further showed good consistency between the nomogram prediction and actual observation. The time independent areas under the curve (tAUCs) for 3-year and 5-year OS of the nomogram were larger than those of the AJCC and SEER summary stage systems. The constructed nomogram could potentially predict the survival of individuals with colorectal SRCC.

marital status, insurance status, primary site, year of diagnosis, tumor size, grade, T, N, M stage, surgery, lymph node dissection, radiotherapy and chemotherapy. In this study, single (never married), widowed (having a domestic partner), separated and divorced cases were classified into the unmarried category 17. The primary tumor site was divided into cecum-transverse colon (appendix, cecum, ascending colon, hepatic flexure and transverse colon), descending colon-sigmoid (splenic flexure, descending colon and sigmoid colon), multiple, rectum and unknown 18. The year of diagnosis was classified as 2004-2007, 2008-2011 and 2012-2015, in line with previous studies 19,20. Tumor size and age were likewise grouped according to previous articles 21-24. The cancer stage was classified according to the AJCC 6th classification system that was adapted to SEER-derived patients diagnosed from 2004 to 2015, and the qualified patients were further regrouped in line with the AJCC 8th TNM classification system. The endpoint was overall survival (OS), defined as the duration between diagnosis and death from any cause 25.
Statistical analysis. Nomogram construction. Categorical variables were expressed as frequencies and proportions and compared by Fisher's exact test or the chi-square test. Univariate analysis of prognosis was conducted using the Kaplan-Meier (K-M) approach and the log-rank test. Variables with a P-value ≤ 0.1 in univariate analysis were entered into a multivariate backward stepwise Cox proportional hazards analysis to identify the independent risk factors. Additionally, multicollinearity was assessed by evaluating the pairwise correlations, variance inflation factors and eigenvalues.
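The Kaplan-Meier step used in the univariate analysis can be illustrated with a minimal sketch. It is not the code used in the study (which relied on SPSS and R); the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how the product-limit estimate is accumulated from death counts and the number of patients at risk.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time of each patient (e.g. months)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored cases at t are both removed before the next time.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort (not SEER data): months of follow-up and death indicator.
toy_times = [2, 3, 3, 5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

In this sketch, censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is the property that lets the estimator use incomplete follow-up.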
Then, we established a nomogram model based on the identified prognostic factors to predict 3-year and 5-year OS in the training and validation cohorts using the rms package in R. The total nomogram score for each case was calculated by summing the points assigned to each variable in the nomogram built on the training group.
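How a nomogram turns model coefficients into points can be sketched as follows. This is an illustration of the usual rescaling idea, in which the predictor with the widest range of effects spans 0 to 100 points, and not a reproduction of the rms implementation or of the published nomogram. The coefficients in `toy_betas`, the variable levels and both function names are hypothetical.

```python
def build_point_table(log_hazard_ratios):
    """Convert per-level log hazard ratios into nomogram points.

    The predictor with the widest effect range is mapped onto 0-100 points;
    every other predictor is scaled by the same factor.
    """
    max_range = max(
        max(levels.values()) - min(levels.values())
        for levels in log_hazard_ratios.values()
    )
    return {
        var: {
            level: 100.0 * (beta - min(levels.values())) / max_range
            for level, beta in levels.items()
        }
        for var, levels in log_hazard_ratios.items()
    }


def total_points(point_table, patient):
    """Sum the points of the level observed for each predictor."""
    return sum(point_table[var][level] for var, level in patient.items())


# Hypothetical coefficients, chosen only for illustration (not the study's estimates).
toy_betas = {
    "m_stage": {"M0": 0.0, "M1": 1.0},
    "surgery": {"yes": 0.0, "no": 0.8},
    "age": {"<60": 0.0, "60-74": 0.25, ">=75": 0.5},
}
points = build_point_table(toy_betas)
patient = {"m_stage": "M1", "surgery": "yes", "age": "60-74"}
score = total_points(points, patient)
```

Under these assumptions a higher total corresponds to a higher predicted hazard; a full nomogram would additionally map the total onto 3-year and 5-year survival probabilities using the baseline survival of the fitted Cox model, which is omitted here.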
Nomogram validation. The nomogram was validated by determining its discrimination and calibration abilities using the internal (training) and external (validation) sets, respectively. We used the concordance index (C-index) to evaluate the discrimination performance of the model, that is, the agreement between predicted and observed outcomes 26. A higher C-index indicates a better ability to discriminate between patients with different prognostic outcomes. We also used the rcorrp.cens function of the Hmisc package in R to compare the C-index of the nomogram with those of the existing 8th TNM classification system and the SEER summary stage system. Calibration plots of predicted against observed survival were drawn using the marginal estimate versus model, where the 45-degree line represents a perfectly calibrated model. Receiver operating characteristic (ROC) curves were also plotted to validate the nomogram score. Bootstrap resampling (1000 repetitions) was applied to obtain comparatively unbiased estimates and for internal validation. Statistical analysis was conducted using SPSS 19.0 (SPSS Inc., Chicago, USA) and R (version 3.5.1, www.r-project.org). A two-tailed P < 0.05 was deemed statistically significant.

Nomogram construction. Univariate analysis revealed that 11 indicators could affect OS (shown in Table 2).
Among them, marital status was included as an adjusted variable in the stepwise modeling. Multivariate analysis showed that age, primary site, grade, tumor size, T stage, N stage, M stage, surgery, lymph node dissection and chemotherapy were independent predictive indicators of OS (all P < 0.05). Multicollinearity diagnostics, including pairwise correlations, the variance inflation factor plot and the eigenvalue plot, suggested no severe multicollinearity (Supplemental Figs. 1 and 2). A nomogram for 3- and 5-year OS prediction was constructed from these independent factors (Fig. 2). The nomogram showed that AJCC stage made the greatest contribution to prognosis, followed by surgery, chemotherapy, number of dissected lymph nodes and age. By adding the scores of each selected variable, the likelihood of survival of an individual patient can be easily calculated.
Nomogram validation. The nomogram was validated internally and externally. In the training (internal validation) and validation (external validation) cohorts, the C-indexes of the nomogram for OS prediction were 0.743 (95% CI, 0.730-0.755) and 0.730 (95% CI, 0.710-0.751), respectively. Furthermore, the discrimination ability of the nomogram was compared with that of the SEER summary stage and TNM 8th staging classifications; the nomogram was superior to both in the training and validation sets (P < 0.001), as shown in Table 3. As shown in Fig. 3, both the internal and external calibration plots of the nomogram exhibited good consistency between the predicted and observed outcomes. Figure 4 shows the ROC curves of the training and validation cohorts. In the training cohort, the time independent areas under the curve (tAUCs) for 3- and 5-year OS were 0.830 (0.810-0.850) and 0.840 (0.818-0.862), respectively. In the validation cohort, the tAUCs for 3- and 5-year OS were 0.823 (95% CI: 0.793-0.853) and 0.810 (95% CI: 0.775-0.844), respectively; all were greater than those of the AJCC and SEER summary stage systems. Bootstrapping with 1000 resamples in the training set
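The C-index reported here can be illustrated with a minimal sketch of Harrell's concordance statistic for right-censored data. The study itself used rcorrp.cens from Hmisc in R; the Python function below is an assumed, simplified stand-in that counts only pairs in which the patient with the shorter follow-up had an observed death, and the toy values are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: share of usable pairs ranked correctly by risk.

    times       -- follow-up time of each patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output where a higher value means a worse prognosis
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i died before patient j's last follow-up.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier death had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs: C-index is undefined")
    return concordant / usable


# Hypothetical toy data (not from the study).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the nomogram and the staging systems are compared. Pairs with tied follow-up times are skipped in this sketch, which is a simplification.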

Discussion
A prognostic nomogram for 3- and 5-year OS prediction was constructed and validated in our study, based on 2904 colorectal SRCC patients from the SEER database. Internal and external validation of the nomogram demonstrated favorable calibration and discrimination. Moreover, our nomogram showed better predictive capacity than the SEER summary stage or TNM staging systems, and it could be readily applied in clinical practice to assist patient counseling and individualized therapy. Several independent prognostic factors were incorporated into the nomogram. Survival was also analyzed by colorectal SRCC stage, which showed that early-stage patients had a better prognosis than those at an advanced stage, in agreement with most previous studies 27,28. Ishihara and colleagues found that primary location might serve as an independent prognostic factor 29. Consistently, both tumor stage and primary location were identified as prognostic factors in the present work. Moreover, this study identified tumor size and pathological grade as independent prognostic factors for colorectal SRCC.
Multidisciplinary treatment of colorectal SRCC is necessary to select the best therapeutic strategy, and surgery is important for treating localized tumors 30. A population-based study that enrolled 1972 colorectal SRCC patients between 1989 and 2010 to evaluate the value of adjuvant chemotherapy suggested that adjuvant chemotherapy offers survival benefits to stage III colon SRCC patients 31. Tao Shi and coworkers also found that chemotherapy was associated with superior survival in colorectal SRCC with distant metastasis 32. The findings of this work likewise indicate that both chemotherapy and surgery play important roles in the prognosis of colorectal SRCC. Further, the surgical retrieval of at least 4 regional lymph nodes markedly improved patient survival. These factors had a marked impact on colorectal SRCC prognosis. Using our nomogram, patients with different degrees of tumor differentiation were assigned different scores and hence different predicted survival outcomes, even when they were at the same TNM stage. This difference between the prognosis estimated by our nomogram and that estimated by the TNM classification systems might explain the better ability of our nomogram to predict OS.
Previous studies have also explored nomograms in colorectal signet ring cell carcinoma 33,34. Wang et al. retrospectively evaluated the records of mucinous adenocarcinoma and SRCC patients aged ≤ 40 years 34 and created a nomogram predicting OS for risk quantitation. However, compared with our study, the number of cases enrolled in that study was small, and only patients aged ≤ 40 years were included. Our study may therefore be more comprehensive and practical.
As a statistical tool, the nomogram provides a survival probability by formula calculation 35,36. Nomograms have been shown to have superior predictive capacity compared with the TNM staging system in certain types of malignant tumors, and have been proposed as an alternative or even a new standard 37,38. In particular, a nomogram is suitable for complicated situations that lack clinical guidelines, and it is convenient and simple to use for survival prediction. In a nomogram, a vertical line is drawn from each clinicopathological parameter to the "scores" line, and the scores are then added to give the "total score". Clinicians can then offer recommendations accordingly. For instance, operation is suggested in well-differentiated populations in view of the satisfactory prognosis. In contrast, palliative chemotherapy is preferred in poorly-differentiated populations in view of the decreased life expectancy. Thus, the presently established nomogram could help to select patients with prolonged survival, who might benefit from palliative resection.
Our research has several advantages. The detailed clinicopathological data on colorectal SRCC in the SEER database allowed us to construct a precise prognostic nomogram. Moreover, the nomogram showed superior discriminative capacity for OS prediction compared with the SEER summary stage and TNM staging systems. Additionally, it uses readily available clinical parameters, which makes it convenient to apply.
Certain limitations should be noted in this population-based study. First, selection bias was inevitable due to the retrospective nature of the study. Secondly, prognostic information, including the microsatellite stability/microsatellite instability (MSS/MSI) status, the RAS/BRAF/MSI status, family history, vascular invasion and patient condition, was not available in the SEER database, and future research should focus on these aspects. Thirdly, convincing external verification was lacking in this work. Finally, our constructed nomogram, which might serve as a user-friendly approach for the decision-making of doctors, did not incorporate every prognostic factor and may not always offer accurate prognosis prediction in clinical practice.

Conclusion
In conclusion, we established and validated a nomogram to predict 3- and 5-year OS for patients with colorectal SRCC based on a large, population-based cohort. The nomogram showed excellent performance and could serve as a practical tool to predict prognosis. Nevertheless, further investigation of additional prognostic parameters is needed to optimize the nomogram, and in-depth external validation is required.