Prognostic and predictive value of a lncRNA signature in patients with stage II colon cancer

The current staging method is inadequate for identifying patients with stage II colon cancer (CC) who are at high risk of recurrence. Using a systematic and comprehensive biomarker discovery and validation method, we aimed to construct a lncRNA-based signature to improve the prognostic prediction of stage II CC. We identified 1,377 differentially expressed lncRNAs by analyzing 16 paired stage II CC tumor tissues and adjacent normal mucosal tissues from the TCGA dataset. Subsequently, using univariable and stepwise multivariable Cox regression models, we trained an 11-lncRNA signature in the training cohort (n = 141), which could divide patients into high-risk and low-risk groups (AUC at 3 years = 0.801, 95% CI: 0.724–0.877; AUC at 5 years = 0.801, 95% CI: 0.718–0.885). Patients in the high-risk group had significantly poorer recurrence-free survival (RFS) than those in the low-risk group (log-rank test, P < 0.001 in the training cohort). This lncRNA-based signature was further confirmed in the validation cohort (P < 0.001). Multivariate Cox regression and stratified survival analyses showed that the prognostic value of this signature was independent of other clinicopathological risk factors (CEA, T stage, and chemotherapy). Time-dependent receiver operating characteristic (ROC) analysis demonstrated that this signature had better prognostic ability than any other clinical risk factor or any single lncRNA (all P < 0.05). A nomogram integrating the lncRNA-based signature and clinical risk factors (CEA and T stage) was constructed for clinical use and performed well in the calibration plots. Altogether, our lncRNA-based signature was an independent prognostic factor and had stronger predictive power than the currently used clinicopathological risk factors for predicting recurrence in patients with stage II CC.
Collectively, this lncRNA-based signature might facilitate individualized treatment decisions and postoperative counseling, ultimately contributing to improved survival.

www.nature.com/scientificreports/ proliferation, and metastasis, affecting the prognosis for CC patients [15][16][17] . These data indicate that the lncRNAs can be potential diagnostic and prognostic biomarkers in CC. Recent findings on lncRNAs in CC also support the development of biomarkers for the precise evaluation of cancer progression [18][19][20][21] . However, no comprehensive study on prognostic biomarkers has been carried out based on the expression profiles of lncRNAs in stage II CC patients. The combination of multiple variables rather than just a single biomarker can provide more robust and accurate information for prognosis, contributing to individualized treatment in this clinical setting 22,23 . In the current study, we conducted a systematic analysis and developed a novel lncRNA-based signature to predict individualized recurrence in stage II CC patients. We initially identified the differentially expressed lncRNAs (DElncRNAs) in paired stage II CC from The Cancer Genome Atlas colon adenocarcinoma (TCGA-COAD). Then, the DElncRNAs were subjected to univariable and step multivariable Cox regression analysis to train a lncRNA-based signature to predict recurrence-free survival (RFS) in stage II CC patients. Finally, the lncRNA signature was validated and incorporated into a prognostic nomogram. Additionally, we compared its predictive performance with other clinicopathological risk factors.

Materials and methods
Ethical statement. All procedures involving human participants were in accordance with the ethical standards of the Clinical Research Ethics Committee of Qilu Hospital, Shandong University and were performed in accordance with the Declaration of Helsinki. Written informed consent was obtained from each subject.
Patients and clinical database. The patients enrolled in this study came from the publicly available TCGA dataset and from a clinical validation set at Qilu Hospital, Shandong University. For the TCGA cohort, transcriptome profiling data and the corresponding clinicopathological data of stage II colon cancer patients were downloaded from https://portal.gdc.cancer.gov. The gene transfer format (GTF) file (Homo_sapiens.GRCh38.91.chr.gtf) from Ensembl (http://asia.ensembl.org) was used to annotate the data and to distinguish mRNAs from lncRNAs.
Patients without survival information or with a follow-up time of less than one month were excluded; as a result, 141 stage II colon cancer patients were included. Among them, 16 patients with paired tumor and adjacent normal tissues were used to screen for differentially expressed lncRNAs. The 141 stage II colon cancer patients were then used as the training set. For the clinical validation set, we collected 63 formalin-fixed paraffin-embedded (FFPE) samples of stage II CC at Qilu Hospital, Shandong University (Jinan, China) between October 2009 and September 2013 based on the following criteria: (a) pathologically confirmed colon cancer with stage II disease (T3-4, N0, M0); (b) available clinicopathological information and survival data; (c) no preoperative chemotherapy, radiotherapy, or chemoradiotherapy; (d) no other concurrent malignancy. All specimens were assessed by two pathologists according to the AJCC/UICC TNM staging system, 8th edition.
RT-qPCR analysis of lncRNA expression. We first extracted total RNA from 10-μm-thick FFPE specimens using the RNAprep pure FFPE kit (cat. no. DP439; TIANGEN Biotech, Beijing, China). All procedures involving RNA were conducted under RNase-free conditions. cDNA was synthesized from an equal amount of total RNA from each sample using the SureScript™ First-Strand cDNA Synthesis kit (cat. no. QP056; GeneCopoeia, Guangzhou, China) according to the manufacturer's instructions. lncRNA expression was assessed on a Bio-Rad CFX96 Detection System (Bio-Rad, Hercules, CA) with Blaze Taq™ SYBR Green qPCR Mix 2.0 (cat. no. QP033; GeneCopoeia, Guangzhou, China). The lncRNA expression levels were calculated using the $2^{-\Delta Ct}$ method with GAPDH as the reference gene. The resulting expression data were then log2 transformed. The primers for all lncRNAs and GAPDH were purchased from Ribobio (Guangzhou, China), and the primer information is listed in Table S1.
Study procedures. This study was performed in three stages: a discovery stage, a training stage, and a validation stage. A flowchart of the procedures is shown in Fig. 1. In the discovery stage, 16 paired tumor and adjacent normal tissues of stage II colon cancer patients from the TCGA dataset were used to screen for differentially expressed lncRNAs. In the training stage, the candidate lncRNAs were entered into a univariate Cox proportional hazards regression model to evaluate the correlation between lncRNA level and RFS in the training set. Subsequently, the lncRNAs with top statistical significance (P value ≤ 0.01) were subjected to a stepwise multivariate Cox regression model to train the lncRNA signature. A survival-related model for stage II colon cancer patients was established to predict prognosis, using the expression of the selected lncRNAs weighted by their multivariate Cox regression coefficients as follows:
$$\text{Risk score} = \sum_{i} \text{coefficient}(\text{lncRNA}_i) \times \text{expression}(\text{lncRNA}_i)$$
X-tile (version 3.6.1; Yale University School of Medicine, New Haven, CT, USA) was used to obtain the optimum cutoff value, and patients in the training set were divided into high- and low-risk groups. Kaplan-Meier curves and time-dependent ROC curves were used to examine the prognostic ability of the lncRNA-based signature. In the validation stage, we calculated the risk score of patients in the validation set using the same risk score formula obtained from the training set. We then divided the patients into a high-risk group and a low-risk group using the cutoff value from the training set. Kaplan-Meier curves and ROC curves were used to examine the prognostic performance of the lncRNA signature in the validation set.
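The risk score and grouping steps described above can be illustrated with a minimal sketch. The coefficients, expression values, and cutoff below are hypothetical placeholders, not the published values, and the choice of a strict "greater than" comparison at the cutoff is an assumption, since the text does not say how ties are handled.

```python
import numpy as np

def risk_scores(expression, coefficients):
    """Weighted sum of (log2) expression values: one score per patient."""
    expression = np.asarray(expression, dtype=float)      # shape: (patients, lncRNAs)
    coefficients = np.asarray(coefficients, dtype=float)  # shape: (lncRNAs,)
    return expression @ coefficients

def assign_risk_group(scores, cutoff):
    """Label a patient 'high' when the score exceeds the cutoff, else 'low' (assumed rule)."""
    return np.where(np.asarray(scores, dtype=float) > cutoff, "high", "low")

# Hypothetical Cox coefficients for three lncRNAs (the real signature has 11).
coef = np.array([0.5, -0.25, 1.0])
# Hypothetical log2 expression values for two patients.
expr = np.array([[2.0, 4.0, 1.0],
                 [1.0, 8.0, 0.5]])
cutoff = 0.5  # hypothetical; the study derived its cutoff with X-tile in the training set

scores = risk_scores(expr, coef)
groups = assign_risk_group(scores, cutoff)
```

In a validation setting, the same coefficients and the same cutoff would be reused unchanged, which is what keeps the validation independent of the training step.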
Statistical analysis. The Kaplan-Meier method was used to plot survival curves, and log-rank tests were used to compare the differences. Univariate and multivariate analyses of prognostic factors were performed using the Cox proportional hazards regression model. Time-dependent ROC analysis was applied to examine the prognostic ability ("survivalROC" package), and the bootstrapping method with 10,000 iterations was used to compare the differences between AUCs. A nomogram was built by using the regression coefficients of the multivariable Cox regression model to weight each variable. Calibration plots and ROC curves were used to assess the performance of the nomogram ("rms" package).
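The bootstrap comparison of AUCs can be sketched as follows. This is a simplified illustration, not the authors' procedure: it treats recurrence by a fixed time point as a binary outcome and ignores censoring, whereas the study used time-dependent ROC analysis. The z-test on the bootstrap standard error is one common way to turn resampled differences into a P value and is an assumption here, as are all data values and function names.

```python
import math
import numpy as np

def auc(marker, event):
    """Rank-based AUC: chance that a case scores higher than a control (ties count 0.5)."""
    marker = np.asarray(marker, dtype=float)
    event = np.asarray(event, dtype=bool)
    cases, controls = marker[event], marker[~event]
    diff = cases[:, None] - controls[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def bootstrap_auc_test(marker_a, marker_b, event, n_boot=10000, seed=0):
    """Compare two markers measured on the same patients by resampling patients."""
    rng = np.random.default_rng(seed)
    marker_a = np.asarray(marker_a, dtype=float)
    marker_b = np.asarray(marker_b, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = event.size
    observed = auc(marker_a, event) - auc(marker_b, event)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample patients with replacement
        e = event[idx]
        if e.all() or not e.any():
            continue                       # AUC undefined without both classes
        diffs.append(auc(marker_a[idx], e) - auc(marker_b[idx], e))
    se = np.std(diffs, ddof=1)             # bootstrap standard error of the difference
    z = observed / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return observed, p_value

# Hypothetical data: 1 = recurrence by the time point of interest, 0 = no recurrence.
event = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
signature = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1, 0.15, 0.05]
clinical = [0.5, 0.2, 0.6, 0.1, 0.4, 0.3, 0.7, 0.2, 0.5, 0.1]

observed_diff, p_value = bootstrap_auc_test(signature, clinical, event, n_boot=2000)
```

Resampling whole patients keeps the two markers paired, which matters because both AUCs are estimated on the same cohort.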

Results
Clinical characteristics of the enrolled participants. Table 1 shows the detailed clinical and pathological characteristics of the enrolled patients, which were similar between the training and validation cohorts (all P > 0.05).
Identification of DElncRNAs by analyzing the TCGA dataset. First, we retrieved the transcriptome profiling data from the TCGA-COAD database and obtained 16 normal samples and 152 tumor samples of stage II CC. Among them, 16 paired tumor and adjacent normal tissues were used to screen for DElncRNAs. As a result, 1,377 lncRNAs were identified as DElncRNAs with an absolute fold change > 2 and an FDR < 0.05 (Table S2), of which 863 were upregulated and 514 were downregulated in CC compared with adjacent normal tissue (Figure S1).
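The screening thresholds can be illustrated with a short sketch. Two assumptions are made that the text does not state: the FDR is computed with the Benjamini-Hochberg procedure, and "absolute fold change > 2" is read as |log2 fold change| > 1. The P values and fold changes are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (assumed FDR method)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def select_de(log2_fc, p_values, fc_threshold=2.0, fdr_threshold=0.05):
    """Flag features passing both thresholds and report the direction of change."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    fdr = benjamini_hochberg(p_values)
    keep = (np.abs(log2_fc) > np.log2(fc_threshold)) & (fdr < fdr_threshold)
    direction = np.where(log2_fc > 0, "up", "down")
    return keep, direction, fdr

# Hypothetical tumor-versus-normal results for four lncRNAs.
log2_fc = [2.0, -1.5, 0.5, 3.0]
p_values = [0.001, 0.02, 0.03, 0.5]

keep, direction, fdr = select_de(log2_fc, p_values)
```

Counting the "up" and "down" labels among the retained features gives the kind of upregulated/downregulated split reported above.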

Identification of the prognostic lncRNAs from the training cohort.
To single out the prognostic lncRNAs, the 1,377 DElncRNAs were submitted to univariate Cox regression analysis to examine their association with RFS in the training cohort. Of these DElncRNAs, 23 candidate lncRNAs with top statistical significance (P value ≤ 0.01) were entered into a multivariate Cox proportional hazards model using the stepwise method (Table S3). As a result, we trained an RFS-related signature consisting of 11 lncRNAs [...]. Based on this formula, the risk score of each patient in the training cohort was calculated, and the patients were stratified into two groups, a high-risk group (n = 32) and a low-risk group (n = 109), according to the cutoff threshold obtained from X-tile plots (Figure S2). Figure 3A,B show the distribution of risk scores and recurrence status, respectively, indicating that high-risk patients generally had poorer survival than low-risk ones. The heatmap showed the expression pattern of the lncRNAs in the high-risk and low-risk groups (Fig. 3C). Kaplan-Meier survival curves demonstrated that patients in the high-risk group had shorter RFS (Fig. 3D) and OS (Figure S3A) than those in the low-risk group (log-rank test, P < 0.001). Time-dependent ROC analysis at varying time points showed that the lncRNA signature had promising prognostic ability to predict recurrence in the training cohort (AUC at 3 years = 0.801, 95% CI: 0.724–0.877; AUC at 5 years = 0.801, 95% CI: 0.718–0.885).
Validation of the lncRNA signature for RFS prediction in the validation cohort. To evaluate the robustness of the lncRNA signature in identifying high-risk patients, we further examined the prognostic performance of the signature using the validation cohort. We calculated the risk score of patients in the validation [...] (Table 2). In addition, age, T stage, and preoperative CEA level were significant prognostic factors in stage II CC patients in univariable analyses (all P < 0.05).
To better assess the prognostic potential of our lncRNA signature, a stratification analysis was performed to confirm the independence of the signature in various subgroups (according to age, T stage, and preoperative CEA level). Figure 5 shows that the survival curves of the high-risk group lay below those of the low-risk group in all subgroups. In addition, log-rank tests showed that high-risk patients had poorer RFS than low-risk ones in all subgroups (Fig. 5A–F). Some stage II CC patients were treated with postoperative adjuvant chemotherapy, which could affect outcome and recurrence. To eliminate this potentially confounding effect, we also performed a stratification analysis by postoperative chemotherapy, and the results showed that high-risk patients identified by the lncRNA-based signature had poorer RFS than low-risk ones in both the chemotherapy and no-chemotherapy subgroups (Fig. 5G,H), confirming its reliable predictive ability regardless of chemotherapy status. The multivariable Cox analyses showed that preoperative CEA level and T stage were independent prognostic factors for RFS in patients with stage II CC. We then performed ROC analysis to compare the predictive ability of the lncRNA signature with that of preoperative CEA level and T stage. Figure 6 shows that the lncRNA-based signature risk score model had greater predictive power than any other risk factor (preoperative CEA level and T stage) or any single lncRNA alone (all P < 0.05), confirming the reliable predictive ability of our lncRNA signature.
[Figure caption: Time-dependent ROC curve analysis. AUCs at 3 and 5 years were used to assess prognostic accuracy, and P values were calculated using the log-rank test.]
Construction of nomogram based on the lncRNA signature. To provide clinicians with a quantitative method for predicting the probability of cancer recurrence, we constructed a nomogram that integrated the lncRNA signature and the independent clinicopathological risk factors for RFS (T stage and preoperative CEA level) (Fig. 7A). Calibration plots showed that the bias-corrected lines at 3 and 5 years were very close to the ideal 45-degree line, indicating high agreement between prediction and observation (Fig. 7B).

Discussion
In the present study, we developed and validated a novel prognostic lncRNA-based signature to predict postoperative tumor recurrence in stage II CC patients. Our results demonstrated that this lncRNA-based signature could divide patients into a high-risk group and a low-risk group with significant differences in both RFS and OS. Furthermore, the prognostic and predictive value of this lncRNA-based signature was superior to that of other clinical risk factors. When stratified by these clinical risk factors, the lncRNA-based signature maintained its strong prognostic value.
The survival of CC patients primarily depends on the stage at diagnosis [6]. Although diagnosed as locoregional disease, stage II CC contributes to 16% of CC-related deaths [24]. Moreover, it is more heterogeneous than other stages and can be divided into low-, intermediate-, and high-risk groups according to the widely recognized clinicopathologic high-risk factors of the National Comprehensive Cancer Network (NCCN) guidelines [5]. Postoperative adjuvant chemotherapy is necessary for stage III patients to prevent recurrence and improve survival [5]. For most patients with stage II disease, complete surgical resection alone is sufficient, and adjuvant chemotherapy brings adverse effects with a survival improvement of less than 5% at 5 years [7,25,26]. Therefore, it is urgently necessary to identify the minority of stage II patients with high recurrence risk who would truly benefit from adjuvant chemotherapy. In the present study, we constructed and validated a prognostic lncRNA-based signature to predict recurrence. The signature could effectively stratify patients into high-risk and low-risk groups. The identified high-risk patients could be recommended to receive adjuvant chemotherapy after surgery, which might reduce recurrence and extend life expectancy. The identified low-risk patients might be adequately treated by radical resection alone, thereby avoiding unnecessary adjuvant chemotherapy, as well as its adverse events, cost, and inconvenience.
Previous studies have reported multiple differentially expressed lncRNAs between CC and normal tissues, which play roles in the carcinogenesis and progression of CC [27,28]. In particular, ZEB1-AS1, FAM83H-AS1, LINC01296, and LINC01234 have been reported to be correlated with clinicopathological parameters and patient survival [18][19][20][29]. ZEB1-AS1 is highly expressed in CC, and a high level of ZEB1-AS1 is associated with poor survival in CC patients [18]. As a commonly aberrant lncRNA in several cancers, FAM83H-AS1 functions by regulating TGF-β signaling and leads to poor CC prognosis [19]. However, these studies focus on single lncRNAs and concern all disease stages of CC rather than stage II disease specifically. The multivariate Cox proportional hazards regression model helps to combine multiple lncRNAs into one panel, which can significantly improve prognostic efficiency over single lncRNAs. Our team developed a lncRNA-based signature consisting of 11 RFS-related lncRNAs by using the univariate and stepwise multivariate Cox method in the TCGA dataset. The signature was validated in another cohort and demonstrated to be an independent prognostic factor with better predictive ability than clinicopathological risk factors.
Among the 11 identified lncRNAs, AC090502.1, AL356652.1, AC011352.3, AC100791.2, AC123768.1, AP000911.1, FOXD3-AS1, AC022784.3, and LINC02119 were risk factors, whereas AC093895.1 and AP002358.1 were protective factors. The biological function of some lncRNAs included in our signature has been investigated previously. As a crucial regulatory effector, FOXD3-AS1 is closely associated with multiple types of cancer, including CC [30][31][32][33]. Wu and colleagues found that FOXD3-AS1 upregulation implies poor survival in CRC patients, which is consistent with our results. They also explored the underlying mechanism and demonstrated that FOXD3-AS1 can promote the progression of CC by regulating the miR-135a-5p/SIRT1 axis [30]. Guo has reported that FOXD3-AS1 is overexpressed in non-small cell lung cancer and that FOXD3-AS1 upregulation promotes tumor progression by regulating the miR-135a-5p/CDK6 axis [31]. AP002358.1 has been reported to be an essential gene in an enhancer RNA panel that is closely related to the prognosis of thyroid cancer patients and involved in tumor development. Consistent with our results, that study also suggested that AP002358.1 is a "low-risk factor" because its high level is associated with a good prognosis in thyroid cancer patients [34]. The remaining lncRNAs have not yet been studied. Therefore, further studies are required to explore the contribution and function of these lncRNAs in CC.
In the present study, the combined model consisting of the 11 lncRNAs exhibited a significant association with the survival of CC patients. Multivariate Cox analysis showed that the 11-lncRNA-based signature could predict the recurrence of CC independently of the traditional clinical parameters. Stratification analysis showed that our lncRNA signature could effectively stratify patients into high- and low-risk groups within all subgroups. Time-dependent ROC analysis demonstrated that the lncRNA signature had stronger predictive power than other clinical risk factors. Because some stage II CC patients were treated with postoperative adjuvant chemotherapy, which could affect outcome and recurrence, we examined the association between the 11-lncRNA-based signature and recurrence in both the chemotherapy and no-chemotherapy subgroups to eliminate this potentially confounding effect. The results indicated that high-risk patients identified by the lncRNA-based signature had poorer RFS than low-risk ones in both subgroups, confirming its reliable predictive ability regardless of chemotherapy status.
A prognostic nomogram is a visual tool based on the Cox proportional hazards regression model. Variables closely related to prognosis are assigned points according to their contribution to the outcome event (their regression coefficients), and the points of all variables are summed to obtain the individual event probability, thereby achieving individualized prediction of prognosis [35,36]. The prognosis and recurrence of tumors are jointly affected by genes and clinicopathological parameters. To make full use of patients' clinical information, we constructed a nomogram based on the aforementioned lncRNA-based signature and independent clinicopathological variables (T stage and preoperative CEA level) to visualize a complex mathematical formula. The calibration curves and the time-dependent ROC curve analysis showed that our nomogram had a good fit and favorable prediction accuracy, respectively. Therefore, our nomogram could serve as a useful tool for risk stratification and prognosis prediction in patients with stage II CC, facilitating individualized treatment decisions and postoperative counseling and ultimately contributing to improved survival.
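The point-assignment logic of a Cox-based nomogram can be sketched as follows. Everything numeric here is hypothetical: the coefficients, variable ranges, variable names, and baseline survival are illustrative placeholders and not the values of the published nomogram. The sketch assumes the usual convention that the variable with the largest effect span is scaled to 100 points.

```python
import math

def nomogram_points(coefs, ranges, patient):
    """Convert each variable's contribution to the linear predictor into points."""
    # Effect span of a variable = |coefficient| * (max value - min value).
    spans = {name: abs(beta) * (ranges[name][1] - ranges[name][0])
             for name, beta in coefs.items()}
    scale = 100.0 / max(spans.values())  # largest span maps to 100 points
    points = {}
    for name, beta in coefs.items():
        low, high = ranges[name]
        reference = low if beta >= 0 else high  # value carrying the lowest risk
        points[name] = beta * (patient[name] - reference) * scale
    return points, scale

def survival_probability(total_points, scale, baseline_survival):
    """Cox model: S(t | x) = S_ref(t) ** exp(linear predictor relative to the reference)."""
    linear_predictor = total_points / scale
    return baseline_survival ** math.exp(linear_predictor)

# Hypothetical coefficients and value ranges (not taken from the study).
coefs = {"lncRNA_risk_score": 1.0, "T4_stage": 0.5, "CEA_elevated": 0.4}
ranges = {"lncRNA_risk_score": (-2.0, 2.0), "T4_stage": (0, 1), "CEA_elevated": (0, 1)}
patient = {"lncRNA_risk_score": 0.0, "T4_stage": 1, "CEA_elevated": 0}

points, scale = nomogram_points(coefs, ranges, patient)
total = sum(points.values())
# Hypothetical 3-year RFS of 0.95 for a patient at the lowest-risk reference values.
rfs_3y = survival_probability(total, scale, baseline_survival=0.95)
```

Reading a printed nomogram performs the same arithmetic by hand: each variable's axis gives its points, and the total-points axis is aligned with the predicted survival probability.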
Collectively, we constructed and validated an RFS-related lncRNA-based signature that could effectively classify stage II CC patients into low- and high-risk groups for tumor recurrence. Furthermore, the signature showed reliable prognostic and predictive value for recurrence, which was superior to that of traditional clinical risk factors. However, this signature should be further validated in large-scale multicenter clinical trials.

Data availability
All data generated or analyzed during this study are included in this published article and its Supplementary Information files.