Introduction

Colorectal cancer (CRC) is the second most common cause of cancer-associated mortality in the United States and poses a major threat to global health1. In recent years, colorectal signet ring cell carcinoma (SRCC), one of the CRC subtypes2, has attracted wide attention. SRCC is widely reported to originate from undifferentiated colorectal mucosal stem cells; therefore, rapid proliferation, poor differentiation, metastasis and diffuse infiltration are frequently observed3,4. In addition, the AJCC 7th TNM classification system identifies SRCC as an independent predictor of adverse prognosis5. Nonetheless, only small case series and case reports are available for colorectal SRCC, and its clinicopathological characteristics and prognostic outcomes remain largely unexplored6. It is therefore important to estimate the prognosis of SRCC cases precisely, which may facilitate risk-based individualized treatment and the selection of optimal therapeutic strategies.

The TNM staging system is a prevalent method for predicting outcomes in cancer patients by assessing tumor size and location (T), regional lymph node metastasis (N) and distant metastasis (M)7. However, the TNM classification does not fully capture cancer biology and cannot precisely predict outcomes of colorectal SRCC8. In addition, other clinicopathological parameters, including tumor grade, tumor site, race, age and therapy, also affect prognosis in SRCC patients4,9. Hence, a novel staging classification that incorporates both tumor features and patient status is urgently needed.

As a simple, user-friendly statistical tool, the nomogram has been shown to have comparable or even superior predictive capacity relative to conventional TNM staging systems in various malignancies10,11,12. Specifically, a well-constructed nomogram weights each parameter by its prognostic contribution to calculate the probability of an outcome and combines several independent indicators into a single estimate. Of note, by assessing key prognostic factors, nomograms can estimate individual survival more accurately than the TNM staging system13,14. However, to our knowledge, no population-based nomogram exists specifically for colorectal SRCC. We therefore aimed to construct and validate a nomogram for overall survival (OS) prediction in colorectal SRCC based on the Surveillance, Epidemiology and End Results (SEER) database.

Materials and methods

Ethics statement

SEER is the largest and most authoritative cancer database in North America15; it covers nearly 30% of the US population across geographic regions that reflect the population's diversity16. To obtain data from this database, we signed the SEER Research Data Agreement (No. 19817-Nov2018) and retrieved data in line with the approved guidelines. The extracted data were publicly accessible and de-identified, and their analysis is considered non-human subjects research by the Office for Human Research Protection; thus, no institutional review board approval was required.

Study population

Eligible cases were screened using SEER*Stat software v8.3.6 (released on August 8th, 2019). We included the 18 SEER regions between 2004 and 2015. Patients meeting the following criteria were enrolled: (1) primary colorectal SRCC; and (2) SRCC diagnosed according to the third edition of the International Classification of Diseases for Oncology (ICD-O-3; coded as 8490/3). Patients meeting any of the following conditions were excluded: (1) more than one primary tumor; (2) clinical diagnosis only, or diagnosis based on autopsy or death certificate; (3) insufficient data, such as the mode of surgery or AJCC stage; (4) unspecified tumor location; (5) unavailable prognostic information. The remaining participants formed the initial SEER cohort. To establish and validate the nomogram, the enrolled cases were randomly assigned to a training set or a validation set.
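
The random assignment described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the 7:3 ratio is an assumption inferred from the cohort sizes reported in the Results (2032 and 872 of 2904), and the function name, seed and integer patient IDs are hypothetical.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2019):
    """Randomly partition patient IDs into a training and a validation set.

    train_fraction and seed are illustrative assumptions, not values
    reported in the paper.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor of the training share
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 0..2903 stand in for the 2904 eligible SEER cases.
train_ids, validation_ids = split_cohort(range(2904))
print(len(train_ids), len(validation_ids))
```

Each patient lands in exactly one set, so the validation cohort plays no part in model fitting.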

Covariates and endpoint

The following patient characteristics were analyzed: gender, age, race, marital status, insurance status, primary site, year of diagnosis, tumor size, grade, T, N and M stage, surgery, lymph node dissection, radiotherapy and chemotherapy. Single (never married), widowed (having a domestic partner), separated and divorced cases were classified as unmarried17. The primary tumor site was divided into cecum–transverse colon (such as appendix, cecum, ascending colon, transverse colon, hepatic flexure), descending colon–sigmoid (such as descending colon, sigmoid colon, splenic flexure), multiple, rectum and unknown18. The year of diagnosis was classified as 2004–2007, 2008–2011 or 2012–2015, in line with previous studies19,20. Tumor size and age were likewise grouped according to previous articles21,22,23,24. Cancer stage was classified according to the AJCC 6th classification system, which applies to SEER patients diagnosed from 2004 to 2015. The eligible patients were then restaged according to the AJCC 8th TNM classification system. The endpoint was overall survival (OS), defined as the time from diagnosis to death from any cause25.

Statistical analysis

Nomogram construction

Categorical variables were expressed as frequencies and proportions and compared using Fisher's exact test or the chi-square test. Univariate survival analysis was performed using the Kaplan–Meier (K–M) method and the log-rank test. Variables with P ≤ 0.1 in univariate analysis were entered into a multivariate backward stepwise Cox proportional hazards analysis to identify independent risk factors. Additionally, multicollinearity was assessed by evaluating pairwise correlations, variance inflation factors and eigenvalues. We then established a nomogram (based on the identified prognostic factors) to predict 3- and 5-year OS in the training and validation cohorts using the rms package in R. We determined the total nomogram score for each case from the individual variable scores on the nomogram derived from the training (modeling) group.
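
For readers unfamiliar with the Kaplan–Meier method used in the univariate step, the product-limit estimator can be sketched in a few lines. This is an illustrative sketch only: the analysis itself was run in SPSS and R, and the function name and toy follow-up data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct death time.

    times  : follow-up in months
    events : 1 if death was observed, 0 if the patient was censored
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Toy data (hypothetical): six patients, two of them censored.
toy_curve = kaplan_meier([5, 8, 12, 12, 20, 30], [1, 0, 1, 1, 0, 1])
for month, probability in toy_curve:
    print(month, round(probability, 3))
```

Censored patients contribute to the risk set until their last follow-up but never trigger a drop in the curve, which is what distinguishes this estimate from a simple proportion of survivors.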

Nomogram validation

The nomogram was validated by determining its discrimination and calibration in the training set (internal validation) and the validation set (external validation). We used the concordance index (C-index) to evaluate discrimination, that is, the agreement between predicted and observed outcomes26. A higher C-index indicates a better ability to discriminate between patients with different prognostic outcomes. We also used the rcorrp.cens function of the Hmisc package in R to compare the C-index of the nomogram with those of the existing 8th TNM classification system and the SEER summary stage system. Calibration plots comparing predicted with observed survival were generated using the marginal estimate versus model approach; the 45-degree line represents perfect agreement between predicted and observed outcomes. Receiver operating characteristic (ROC) curves were also plotted to validate the nomogram score. Bootstrap resampling (1000 repetitions) was applied to obtain relatively unbiased estimates and for internal validation. Statistical analysis was conducted using SPSS 19.0 (SPSS Inc., Chicago, USA) and R (version 3.5.1, www.r-project.org). A two-tailed P < 0.05 was considered statistically significant.
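
To make the C-index concrete, the following sketch computes Harrell's concordance index for right-censored data by direct pair counting. It is an illustration under stated assumptions, not the Hmisc implementation used in the study; the function name and the four-patient example are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the patient with the
    higher predicted risk is the one who dies first.

    A pair (i, j) is usable when patient i has an observed death that
    occurs strictly before patient j's follow-up time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: C-index is undefined")
    return concordant / usable


# Hypothetical example: the third patient is censored at 15 months.
c = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.6, 0.2],
)
print(round(c, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the reported C-indexes of the nomogram and the staging systems can be compared on one scale.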

Results

Patient characteristics

In total, 2904 eligible patients diagnosed with colorectal SRCC from 2004 to 2015 were enrolled, of whom 2032 and 872 were assigned to the training and validation cohorts, respectively. The flow chart of data selection is shown in Fig. 1. Among all subjects, 50.69% were male, and the median age was 63 years (range: 12–103 years). Most subjects were married (53.99%) and white (81.61%). The cecum–transverse colon (61.95%) was the most prevalent tumor site, followed by the rectum (19.42%), descending colon–sigmoid (15.87%) and multiple sites (1.48%). Tumor size > 5 cm (42.84%) was the most common size category. Most cases had advanced clinical stage (T3: 44.42%; T4: 45.83%; N2: 46.83%) and advanced pathological grade (grade III/IV: 80.65%). Surgery was performed in 2568 (88.43%) patients, including 1728 (59.50%) who underwent total colectomy/proctectomy. Most patients (79.72%) had more than four lymph nodes removed. More than half of the patients received chemotherapy (58.13%), whereas only 13.83% received radiotherapy. The median survival time was 18.0 months (range: 0–155 months). The 3- and 5-year OS rates were 35.6% and 28.1%, respectively. The demographic and clinicopathological features are listed in Table 1; no significant differences were found between the two cohorts.

Figure 1. Flow chart for patient selection.

Table 1 Patient demographics and pathological characteristics.

Nomogram construction

Univariate analysis revealed 11 indicators that could affect OS (Table 2). Among them, marital status was included as an adjustment variable in the stepwise modeling. Multivariate analysis showed that age, primary site, grade, tumor size, T stage, N stage, M stage, surgery, lymph node dissection and chemotherapy were independent predictors of OS (all P < 0.05). Multicollinearity diagnostics, including pairwise correlations, a variance inflation factor plot and an eigenvalue plot, suggested no severe multicollinearity (Supplemental Figs. 1 and 2). A nomogram for 3- and 5-year OS prediction was constructed from these independent factors (Fig. 2). The nomogram showed that AJCC stage made the greatest contribution to prognosis, followed by surgery, chemotherapy, number of lymph nodes dissected and age. By adding the scores of the selected variables, the survival probability of an individual patient can be easily calculated.
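
The way a nomogram turns Cox model coefficients into points, and total points into a survival probability, can be sketched as follows. This is a simplified illustration of the general technique, not a reproduction of Fig. 2: all coefficients, the variable subset and the baseline 3-year survival are hypothetical values chosen for the example, not estimates from Table 2.

```python
import math

# HYPOTHETICAL log hazard ratios; the reference level of each variable is 0.0.
COEFFICIENTS = {
    "M stage": {"M0": 0.0, "M1": 1.0},
    "chemotherapy": {"yes": 0.0, "no": 0.6},
    "grade": {"I/II": 0.0, "III/IV": 0.25},
}


def build_points_table(coefficients):
    """Rescale coefficients so the variable with the widest range spans 0-100 points."""
    widest = max(
        max(levels.values()) - min(levels.values())
        for levels in coefficients.values()
    )
    table = {}
    for variable, levels in coefficients.items():
        lowest = min(levels.values())
        table[variable] = {
            level: 100.0 * (beta - lowest) / widest
            for level, beta in levels.items()
        }
    return table, widest


def total_points(table, patient):
    """Sum the points of the patient's level for every variable."""
    return sum(table[variable][level] for variable, level in patient.items())


def survival_probability(points, widest, baseline_survival):
    """Cox relation S(t) = S0(t) ** exp(lp), with lp recovered from the points.

    baseline_survival is S0(t) for a patient scoring 0 points (hypothetical here).
    """
    linear_predictor = points * widest / 100.0
    return baseline_survival ** math.exp(linear_predictor)


table, widest = build_points_table(COEFFICIENTS)
patient = {"M stage": "M0", "chemotherapy": "no", "grade": "III/IV"}
points = total_points(table, patient)
print(points, survival_probability(points, widest, baseline_survival=0.8))
```

Because points are a linear rescaling of the coefficients, the variable with the largest effect occupies the longest axis on the nomogram, which is how the relative contributions of the predictors can be read off the figure.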

Table 2 Univariate and multivariate analyses of overall survival (OS) for patients in training set.

Figure 2. Nomogram to predict 3-year (A) and 5-year (B) overall survival (OS) of colorectal SRCC patients.

Nomogram validation

The nomogram was validated internally and externally. In the training (internal validation) and validation (external validation) cohorts, the C-indexes of the nomogram for OS prediction were 0.743 (95% CI, 0.730–0.755) and 0.730 (95% CI, 0.710–0.751), respectively. Furthermore, the discrimination of the nomogram was superior to that of the SEER summary stage and the TNM 8th staging classification in both the training and validation sets (P < 0.001; Table 3). As shown in Fig. 3, both the internal and external calibration plots showed good agreement between nomogram predictions and observed outcomes. Figure 4 shows the ROC curves for the training and validation cohorts. In the training cohort, the time-dependent areas under the curve (tAUCs) for 3- and 5-year OS were 0.830 (0.810–0.850) and 0.840 (0.818–0.862), respectively. In the validation cohort, the tAUCs for 3- and 5-year OS were 0.823 (95% CI: 0.793–0.853) and 0.810 (95% CI: 0.775–0.844), respectively; all were greater than those of the AJCC and SEER summary stage systems. Bootstrapping with 1000 resamples in the training set yielded similar discrimination: the 3- and 5-year tAUCs of the prognostic model were 0.829 (0.812–0.850) and 0.839 (0.820–0.860), respectively.
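
The bootstrap step can be illustrated with a percentile confidence interval for a discrimination statistic. The sketch below is an assumption-laden simplification: it resamples the C-index rather than the tAUC reported above, uses a synthetic 60-patient cohort, and its function names, seeds and data-generating choices are all hypothetical.

```python
import random


def c_index(rows):
    """Harrell's C for rows of (time, event, risk_score); NaN if undefined."""
    concordant, usable = 0.0, 0
    for t_i, e_i, r_i in rows:
        if e_i != 1:
            continue  # only observed deaths can anchor a pair
        for t_j, _, r_j in rows:
            if t_i < t_j:
                usable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")


def bootstrap_ci(rows, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_resamples):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        value = statistic(sample)
        if value == value:  # drop NaN results from degenerate resamples
            estimates.append(value)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Synthetic cohort (hypothetical): higher risk score means shorter survival.
rng = random.Random(1)
cohort = []
for _ in range(60):
    risk = rng.random()
    time = rng.expovariate(0.02 * (1 + 4 * risk))
    event = 1 if rng.random() < 0.7 else 0  # crude stand-in for censoring
    cohort.append((time, event, risk))

point_estimate = c_index(cohort)
lower, upper = bootstrap_ci(cohort, c_index)
print(round(point_estimate, 3), round(lower, 3), round(upper, 3))
```

Any other statistic that accepts a list of patient rows, such as a time-dependent AUC, could be passed in place of `c_index` without changing the resampling logic.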

Table 3 C-indexes for the nomogram and other stage systems in patients with colorectal signet ring cell carcinoma.

Figure 3. Calibration plots of the nomogram for predicting 3- and 5-year overall survival (OS) in the training (A,B) and validation (C,D) cohorts. The X-axis shows nomogram-predicted survival; the Y-axis shows actual OS. The 45-degree line indicates perfect calibration, where predicted probabilities are identical to actual ones. Vertical bars indicate 95% CIs.

Figure 4. Discriminatory accuracy for predicting OS, assessed by receiver operating characteristic (ROC) analysis with time-dependent areas under the curve (tAUCs): 3-year (A) and 5-year (B) in the training cohort; 3-year (C) and 5-year (D) in the validation cohort.

Discussion

In this study, we analyzed 2904 colorectal SRCC patients from the SEER database and constructed and validated a prognostic nomogram for 3- and 5-year OS prediction. Internal and external validation demonstrated favorable calibration and discrimination. Moreover, the nomogram showed greater predictive capacity than the SEER summary stage or TNM staging systems and could be readily applied in clinical practice to assist patient counseling and individualized therapy.

Several independent prognostic factors were incorporated into our nomogram. Survival analysis by colorectal SRCC stage showed that early-stage patients had a better prognosis than those at an advanced stage, consistent with previous studies27,28. Ishihara and colleagues found that primary location might serve as an independent prognostic factor29. Consistently, both tumor stage and primary location were identified as prognostic factors in the present work. Moreover, this study identified tumor size and pathological grade as independent prognostic factors for colorectal SRCC.

Multidisciplinary treatment of colorectal SRCC is necessary to select the best therapeutic strategy, and surgery is important for treating localized tumors30. A population-based study that enrolled 1972 colorectal SRCC patients between 1989 and 2010 to evaluate the value of adjuvant chemotherapy suggested that adjuvant chemotherapy offers survival benefits to stage III colon SRCC patients31. Tao Shi and coworkers also found that chemotherapy was associated with superior survival in colorectal SRCC with distant metastasis32. Our findings also confirmed that both chemotherapy and surgery play important roles in treating colorectal SRCC. Further, the surgical retrieval of at least 4 regional lymph nodes markedly improved patient survival. These factors markedly affected colorectal SRCC prognosis. With our nomogram, patients with different degrees of tumor differentiation received different scores and thus different predicted survival outcomes, even at the same TNM stage. This difference between nomogram-based and TNM-based prognosis estimates may explain why our nomogram predicted OS better than the TNM classification systems.

Previous studies have also explored nomograms in colorectal signet ring cell carcinoma33,34. Wang et al. retrospectively evaluated the records of patients aged ≤ 40 years with mucinous adenocarcinoma or SRCC and created a nomogram predicting OS for risk quantitation34. However, compared with our study, that study enrolled a small number of cases and included only patients aged ≤ 40 years. Our study may therefore be more comprehensive and practical.

As a statistical tool, the nomogram provides survival probabilities by formula calculation35,36. Nomograms have been shown to have superior predictive capacity compared with the TNM staging system in certain malignant tumors and have been considered an alternative or even a new standard37,38. In particular, nomograms are suited to complicated situations not covered by clinical guidelines, and they are convenient and simple to use for survival prediction. In a nomogram, a vertical line is drawn from each clinicopathological parameter to the “scores” line, and the scores are then added to give the “total score”. Clinicians can then offer recommendations accordingly. For instance, surgery is suggested for patients with well-differentiated tumors, given their satisfactory prognosis. In contrast, palliative chemotherapy is preferred for patients with poorly differentiated tumors, given their decreased life expectancy. Thus, the present nomogram could help identify patients with prolonged survival who might benefit from palliative resection.

Our research has several advantages. The detailed clinicopathological data on colorectal SRCC from the SEER database allowed us to construct a precise prognostic nomogram. Moreover, our nomogram showed superior discriminative capacity for OS prediction compared with the SEER summary stage and TNM staging systems. Additionally, it uses readily available clinical parameters, which makes it convenient to apply.

Certain limitations of this population-based study should be noted. First, selection bias was inevitable owing to the retrospective design. Second, some prognostic information, including microsatellite stability/microsatellite instability (MSS/MSI) status, RAS/BRAF status, family history, vascular invasion and patient condition, was not available in the SEER database, and future research should address these aspects. Third, convincing external validation was lacking in this work. Finally, although our nomogram may serve as a user-friendly aid for clinical decision-making, it does not incorporate every prognostic factor and cannot always provide accurate prognosis prediction in clinical practice.

Conclusion

In conclusion, we established and validated a nomogram to predict 3- and 5-year OS in patients with colorectal SRCC based on a large, population-based cohort. The nomogram showed excellent performance and may serve as a practical tool for predicting prognosis. Nevertheless, the nomogram should be further optimized by exploring additional prognostic parameters, and it requires in-depth external validation.