A prognostic model for colorectal cancer based on CEA and a 48-multiplex serum biomarker panel

Mortality in colorectal cancer (CRC) remains high, resulting in 860,000 deaths annually. Carcinoembryonic antigen (CEA) is widely used in clinics for CRC patient follow-up, despite carrying limited prognostic value. Thus, an obvious need exists for multivariate prognostic models. We analyzed 48 biomarkers using a multiplex immunoassay panel in preoperative serum samples from 328 CRC patients who underwent surgery at Helsinki University Hospital between 1998 and 2003. We built a multivariate prognostic forward-stepping background model based on basic clinicopathological data, and a multivariate machine-learned prognostic model based on clinicopathological data and biomarker variables, calculating the disease-free survival using the value of importance score. From the 48 analyzed biomarkers, only IL-8 emerged as a significant prognostic factor for CRC patients in univariate analysis (HR 4.88; 95% CI 2.00–11.92; p = 0.024) after correcting for multiple comparisons. We also developed a multivariate model based on all 48 biomarkers using a random survival forest analysis. Variable selection based on the minimal depth and the value of importance yielded two tentative candidate CRC prognostic markers: IL-2Ra and IL-8. A multivariate prognostic model using machine-learning technologies improves the prognostic assessment of survival among surgically treated CRC patients.


Statistics.
The endpoint for the prognostic evaluation was disease-specific survival (DSS), defined as the time from surgery until death from CRC. We used biomarkers as continuous variables in the univariate Cox regression analyses, and corrected all resulting p values for multiple testing using the false discovery rate (FDR) 11. For the multivariate survival analysis using the Cox regression model, we chose patient age, tumor location, stage and gender as background characteristics. We also calculated time-dependent receiver operating characteristic (ROC) curves and the areas under the curves (AUCs) using the timeROC package in R, as well as the integrated AUC over time from 6 to 60 months.
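The multiple-test correction described above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which the text does not name explicitly; the function name and the example p values are hypothetical and are not data from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that the adjusted
    # values stay monotone (an adjusted value never exceeds a later one).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from five univariate tests.
raw = [0.0005, 0.01, 0.03, 0.04, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.4f}  FDR-adjusted p = {q:.4f}")
```

With 48 biomarkers, the same function would be applied to the 48 univariate Cox p values, and a marker would be called significant when its adjusted value falls below the chosen level.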
In addition, we compared the distributions of the continuous variables using the Mann-Whitney U test and the Kruskal-Wallis test. Survival was estimated with the Kaplan-Meier method using dichotomized biomarker levels.
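As an illustration of the Kaplan-Meier method mentioned above, the following is a minimal product-limit sketch in plain Python. The function name and the follow-up data are hypothetical; in practice the estimate would be computed separately for the high and low biomarker groups.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months from surgery)
    events -- 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for six patients (months, event indicator).
months = [6, 12, 12, 24, 36, 60]
died = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(months, died):
    print(f"t = {t:>2} months  S(t) = {s:.3f}")
```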
For variable selection to identify tentative prognostic markers for survival in CRC, we also used random survival forest modeling. We applied a terminal node size of 19 with 5000 trees, sampled with replacement, and used the gradient-based Brier score splitting rule. Random survival forest analysis was performed using the R packages randomForestSRC (https://github.com/kogalur/randomForestSRC) and ggRandomForests (https://github.com/ehrlinger/ggRandomForests).

Survival analysis of IL-8.
We dichotomized IL-8 levels using the maximum point of the Youden index for the Kaplan-Meier survival analyses (Fig. 1). For the subgroup Cox regression analyses, IL-8 levels were analyzed as a continuous variable ([...] Fig. 2A and Table 2). Patients with left-sided disease and high IL-8 levels exhibited a poor prognosis compared to patients with low IL-8 levels (HR 2.29; 95% CI 1.48-3.55; p < 0.001; Fig. 2B and Table 2). Overall, colon cancer patients with high IL-8 levels exhibited a poor [...] Fig. 2D and Table 2). Among patients with stage I or II disease, IL-8 did not serve as a prognostic factor (HR 1.02; 95% CI 0.34-3.11; p = 0.968; Fig. 2E and Table 2), whereas patients with stage III or IV disease and a high IL-8 level exhibited a poor prognosis compared to stage III or IV patients with low IL-8 levels (HR 1.67; 95% CI 1.11-2.53; p = 0.015; Fig. 2F and Table 2). Furthermore, we completed a multivariate survival analysis using a Cox regression model for IL-8 with the background characteristics of age at diagnosis, tumor location, stage and gender. IL-8 served as an independent prognostic marker (HR 1.01; 95% CI 1.00-1.01; p = 0.012) along with stages III and IV and age at diagnosis.

Association analysis.
IL-8 levels were significantly lower in patients with left-sided disease when compared to patients with right-sided disease (Mann-Whitney U test, p = 0.005; Table 2). IL-8 levels were significantly lower among patients with colon cancer compared to rectal cancer (Mann-Whitney U test, p = 0.005; Table 2). IL-8 serum levels also differed significantly between stages I, II, III and IV (Kruskal-Wallis test, p < 0.001; Table 2).

Multivariate survival analysis.
We developed the background model based on clinical and patient characteristics (gender, tumor location, stage classification, CEA levels and age). We then compared the study model, created using random survival forest techniques, with this background model to identify potential candidate prognostic markers of CRC survival for further study. CEA was included in the background model since it is an established prognostic marker, often called the gold standard marker for CRC 12. The integrated AUC (6-60 months) for the background model was 0.812. When patients were divided into high-risk and low-risk groups by optimizing the cut-off value for the linear predictor score of the background model using the maximal Youden index, the five-year survival was 87.0% (95% CI 81.8-92.2%) for the low-risk group and 47.0% (95% CI 38.3-55.8%) for the high-risk group. For the random survival forest model, we used all of the biomarkers together with the variables applied to the background model. Given the small dataset, we used all patients for learning; therefore, only tentative results can be obtained from this analysis. Variable selection based on the value of the minimal depth above the threshold (6.74) and the value of importance above the threshold (0.025) yielded two tentative candidate CRC prognostic markers: IL-8 and IL-2Ra (Fig. 3). These thresholds are somewhat arbitrary, although the most prominent markers appear in the lower-left corner of Fig. 3. Another way to present the differences in survival according to the background model and the study model is by using Kaplan-Meier curves (Supplementary Fig. 1). The integrated AUC for the random survival forest model was 0.943 (6-60 months).
When patients were divided into high- and low-risk groups by optimizing the cut-off for the linear predictor score of the random survival forest model using the maximal Youden index, the five-year survival was 97.1% (95% CI 94.6-99.6%) for the low-risk group and 25.3% (95% CI 17.0-33.6%) for the high-risk group.
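The cut-off optimization by the maximal Youden index can be sketched as follows. This is a simplified illustration that assumes a binary outcome at a fixed time horizon and ignores censoring, which may differ from the time-dependent approach used in the analysis; the function name and the example scores are hypothetical.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    values   -- continuous marker levels or risk scores
    outcomes -- 1 if the event occurred within the horizon, 0 otherwise
    A patient is classified as high-risk when value >= cutoff.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    # Every observed value is tried as a candidate cut-off.
    for c in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, outcomes) if v >= c and y == 1)
        true_neg = sum(1 for v, y in zip(values, outcomes) if v < c and y == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical risk scores and event indicators for six patients.
scores = [5, 8, 12, 15, 20, 30]
events = [0, 0, 0, 1, 0, 1]
cutoff, j = youden_cutoff(scores, events)
print(f"cut-off = {cutoff}, Youden J = {j:.2f}")
```

Patients at or above the returned cut-off would form the high-risk group, and survival in the two groups could then be compared with Kaplan-Meier curves.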

Discussion
We investigated 48 chemokines in 328 CRC patients, and developed a multivariate learning model, thereby improving the prognostic assessment of CRC patients. Specifically, we found that IL-8 served as a significant prognostic marker for CRC survival.
In our ELISA multiplex analysis of 48 cytokines, we found 6 biomarkers with p < 0.1 in the univariate analysis. These were IL-6, IL-8, IL-2Rα, MIF, CTACK and SDF-1α, with only IL-8 reaching p < 0.05 following FDR correction. IL-6 is a proinflammatory cytokine involved in tumor growth, invasion and metastasis, known to be elevated in CRC patients with a poor prognosis 13,14. However, IL-6 did not emerge as a significant prognostic factor for CRC in our study. T-cells express IL-2Rα, which plays a role in early CRC development by suppressing T-cell activation 15,16. IL-2Rα was also elevated in our study, suggesting it may play a role in the systemic inflammatory response in CRC, resulting in a worse prognosis 17. In a study based on preoperative serum samples from 96 CRC patients, IL-2Rα served as a significant independent prognostic factor in CRC 18. Hypoxia tolerance represents one step in tumor development, and one hypoxia pathway gene known to be overexpressed in CRC tumors is MIF 19. MIF activation also plays a role in chemotherapy resistance and participates in parallel intrinsic pathways in KRAS-driven CRC, promoting cell growth and proliferation 20. We identified elevated MIF serum levels in our sample, also suggesting increased protein expression. Yet, further research, using for example immunohistochemistry and proximity ligation assays (PLAs), is necessary to conclusively determine MIF's role in CRC prognosis. Furthermore, we identified elevated CTACK levels, indicating that it plays a significant role in CRC prognosis. CTACK (cutaneous T-cell-attracting chemokine) is a C-C motif (two adjacent cysteines) chemokine participating in inflammatory and immunoregulatory processes. In addition, CTACK recruits T-cells to cutaneous sites, and elevated levels accompany Epstein-Barr virus-induced mucosal carcinoma 21. In a case-control study by Song et al. among 437 CRC patients and a random subcohort of 774 patients, CTACK carried no predictive value 22. However, elevated levels of serum CTACK appeared in patients with hepatocellular carcinoma treated with and responding to radiofrequency ablation or transarterial chemoembolization 23. The reason for this discrepancy between previous findings and ours remains unclear.
The serum levels of SDF-1α were elevated in patients with a poor prognosis in our multivariate model, indicating that it represents a significant prognostic factor in CRC. A previous study on formalin-fixed paraffin-embedded tissue samples from a cohort of 163 CRC patients found that SDF-1α overexpression functioned as a significant prognostic marker; yet, no studies have examined the serum levels 24.
We also found that high IL-8 levels are associated with an impaired prognosis, locally advanced disease and metastatic disease. This agrees with a meta-analysis by Xia et al. among 1509 CRC patients, indicating that IL-8 represents a potent indicator for CRC progression 25. Furthermore, IL-8 plays a role in cancer cell survival, proliferation and chemoresistance, and was further shown to play an active role in the CRC cell epithelial-to-mesenchymal transition (EMT) 26. EMT is a critical developmental point for cancer cells, whereby epithelial cells undergo expression changes and obtain mesenchymal properties, thereby facilitating local invasion and representing a key point for adenocarcinoma metastasis 27.
We succeeded in developing a refined and statistically advanced learning model with potent properties for clinical use. The integrated AUC of 0.943 (6-60 months) for the random survival forest model is expectedly high, since the model was evaluated on the same data used to train it. Because we could not use a test group, further refinements and validation are needed before we can make any definitive claims regarding its clinical role. Nevertheless, this model appears promising as a conventional multiplex or ELISA marker kit in CRC prognostics. No consensus yet exists on how to further develop and apply multivariate prognostic models to clinical practice 7. The difference in survival between the risk groups was larger in our learning model than in the more general background model. We also found significant biomarkers in the multivariate model, yet these did not emerge as significant factors in the univariate analysis after FDR correction.
One limitation of our study lies in the lack of detailed data on adjuvant and neoadjuvant radiation and chemotherapy. We also chose not to create predictive models with C-reactive protein (CRP), since we focused instead on the chemokines. It remains unclear whether including CRP would alter these results. Another weakness is that the concentrations of different molecules in the serum samples may decrease during long-term storage, even at -80 °C. This has not been tested in our samples, but the measurements were performed on sera thawed for the first time when assayed. In this study, we used serum for the multiplex measurements and did not compare serum and plasma samples. Thus, it is possible that some of the molecules measured derive from granules from platelets and granulocytes in the serum sample.

Conclusions
IL-8 represents a significant prognostic biomarker in colorectal cancer (CRC). Multivariate prognostic models remain promising and useful tools for the prognostication of CRC patients. Survival time analysis improved in our learning model. Further trials using our AI-based model are warranted in order to improve the prognostic stratification of CRC patients.

www.nature.com/scientificreports/

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.