THBS2 is a Potential Prognostic Biomarker in Colorectal Cancer

Colorectal cancer is one of the most common leading causes of death worldwide. Prognostic at an early stage is a useful way that decrease and avoid mortality. Although remarkable progress has been made to investigate the underlying mechanism, the understanding of the complicated carcinogenesis process was enormously hindered by large-scale tumor heterogeneity. Here we proposed that the prognosis-related gene THBS2, responsible for cooperativity disorientation, probably contain untapped prognostic resource of colorectal cancer. We originally established Spearman correlation transition, Kaplan–Meier survival analysis and meta-analysis that combine public dataset and clinical samples to quantify the prognostic value of THBS2. THBS2 could be considered as a novel prognostic marker in colorectal cancer.

Colorectal cancer (CRC) is the third most common occurring noncutaneous carcinoma and the third leading cause of cancer-related death worldwide 1 . The patients suffering from CRC can be cured if detected at an early stage. Unfortunately, majority patients could not be accurately diagnosed until advanced stages accompanied with malignant proliferation, extensive invasion, lymph node and distant metastasis 2 . Despite advances have been made by surgical techniques and chemotherapeutic options since the last decades, the survival rate of colorectal cancer seems not to be substantially improved, and approximately 20% of patients die of recurrence and metastasis events accounting for the poor prognostic outcomes 3 . There are not marked variability in outcome exists that can predict the patients who have potential higher risk in recurrence, metastasis, resistance to chemotherapy and decreased survival by using conventional histopathologic staging. Thus, identification of novel prognostic biomarkers in colorectal cancer is vital to develop effective targeted treatment for clinical benefits.
The Gene Expression Omnibus database (GEO) 4 and The Cancer Genome Atlas (TCGA) are international public repositories that contained a large number of genomic datasets with clinical information. Several genes that have diagnostic value in colorectal cancer have been found by using these datasets [5][6][7] . While tumor stage, tumor grade, patient age at diagnosis, performance status, race or areas, ethnicity as well as the diet propensity are important factors that may lead a contradistinction phenotype. Therefore, a systematic integration of gene expression data from multiple sources meta-analyses, increase statistical power for detecting differentially expressed datasets while allowing for an assessment of heterogeneity, and may lead to more robust, reproducible and accurate predictions 8 .
Thrombospondins (THBS2) is a multifunction alglycoprotein released from various types of cell, including stromal fibroblasts, endothelial cells and immune cells 9 . THBS2 exerts its diverse biological effects such as angiogenesis, cell motility, apoptosis, cytoskeletal organization by binding with extracellular matrix (ECM) proteins and cell surface receptors [10][11][12] . THBS2 is also known to regulate the activation of transforming growth factor-β 1 (TGF-β 1) 13 . Notably, THBS2 is special in this family for their type I repeats and mainly shown as complex function in cancer research [14][15][16] . Our previous study used biopsy screening and laser capture techniques significant slightly separated rectal cancer anterior single germ of tumor budding, tumor cells and normal mucous cells in fresh colorectal cancer surgical specimens 17 . The results showed that the expression of THBS2 in tumor budding was significantly increased that may closely related with tumor metastasis and prognosis. To verify our aforementioned hypothesis, we examined the THBS2 expression levels in human colorectal cancer tissues and corresponding normal tissues in public datasets and clinical samples. And a comprehensive evaluation by meta-analysis that explored the possible correlation between THBS2 with clinical prognosis in colorectal cancer will be more accuracy.

Results
CRC datasets preparation. GSE17536 and corresponding clinical data were downloaded from the publicly available GEO datasets. This dataset include 177 patients, 96 male and 81 female, in total 55 patients were dead during the follow-up time and 122 patients survived. With the filtering condition display recurrence in 145 patients, 36 patients relapse and 109 were non recurrence. 59 recurrence-related differentially expressed genes have been extracted between the recurrence and non-recurrence groups, including 59 up-regulated genes and 3 down-regulated genes (Supplementary Table S1).  Hierarchical cluster and Neighborhood Scoring. To identify and validate the existence of different genes in recurrence and non-recurrence groups, hierarchical clustering was performed using samples of cancer based on the genes that were identified as differentially expressed in these two subtypes. In order to analyze these 59 genes in combination with biological significance and statistical significance, we mapped these 59 genes to human protein interaction network to establish the interaction among them (Fig. 1B). In the construction of the network, the degree of variation that included distribution and change fold were used to reflect the importance of genes ( Fig. 1C-E, Supplementary Table S2). The above result showed that several genes MEP1A, MFAP5, THBS2, VCAN, FBN1 and COMP owned the highest score with important degree node distribution which had interaction with many genes. Pathway analysis. DAVID was applied to analysis recurrence associated genes and enrichment on gene functions and pathways. In this study, recurrence associated genes from DEGs were mined by Disease Module to analyze the enrichment genes in KEGG pathway (Fig. 1F, Supplementary Table S3). The genes were mainly enriched in extra-cellular matrix (ECM)-receptor interaction (n = 7, p = 2.70E-07), focal adhesion (n = 7, p = 4.40E-05) and TGF-beta signaling pathway (n = 3, p = 3.70E-02). Extra-cellular matrix (ECM)-receptor interaction pathway include seven genes: COMP, COL5A1, COL5A2, COL11A1, FN1, IBSP and THBS2. Focal adhesion pathway include seven genes: COMP, COL5A1, COL5A2, COL11A1, FN1, IBSP and THBS2. TGF-beta signaling pathway include three genes: COMP, INHBA and THBS2. There exist two genes THBS2 and COMP showed important effect on three pathways which indicated these genes may have important function.
Hypergeometric Distribution. The hypergeometric distribution test was used to analysis correlation between genes with disease. PubMed comprises more than 25 million citations for biomedical literature from medicine, life science journals and online books that correlated with colorectal cancer or THBS2 is 182043 or 81, respectively. The number of review that correlated with both colorectal cancer and THBS2 is 4. Comp owned 1037 reviews, but only 2 reviews had been reported with colorectal cancer (Supplementary Table S4). The hypergeometric distribution analysis showed that THBS2 have a higher correlation with colorectal cancer when compared with COMP (p = 0.0029). These results indicated that THBS2 maybe owned significant correlation with colorectal cancer.

Pearson correlation analysis.
To better understand the biology of the tumors with high levels of THBS2, we identified gene transcripts that most closely correlate with expression of THBS2 in the GSE17536 dataset by Pearson test 18 . The majority of the THBS2-correlated genes were matricellular-extracellular matrix and focal adhesion genes (Supplementary Table S5). The average of top 20 genes excluded THBS2 showed a high risk in overall survival (p < 0.001, 2.746(1.593-4.734)) which also could be a useful biomarker in CRC. Interestingly, combination the average of top 20 genes and THBS2 showed a higher risk in overall survival (p < 0.001, 2.890(1.676-4.981)) than the average of top 20 gene without THBS2. The results indicated that THBS2 increased HR value, which might be considered as a prognostic biomarker for CRC. Moreover, THBS2 was higher expression in tumor budding, which was considered as the cells with EMT phenotype. There was a significant correlation between THBS2 expression with CDH1, CDH2 and other EMT markers (Fig. 1A,G).
Kaplan-Meier survival analysis confirmed the prognostic value of THBS2 in GEO and TCGA datasets. To demonstrate the portability and repeatability prognostic value of THBS2, we obtained a validation cohort from the GEO and TCGA datasets. Kaplan-Meier survival analysis was performed to evaluate the prognostic value of the Gene signature in twelve Affymetrix datasets retrieved from the GEO database. The log-rank test results confirmed that the THBS2 was closely related to OS in nine datasets and DFS/RFS in thirteen datasets (  (H) GSE41328 showed that THBS2 was higher expression in tumor tissue when compared with paired adjacent normal tissue (p < 0.001); (I) TCGA showed that THBS2 was higher expression in tumor tissue when compared with paired adjacent normal tissue (p < 0.001); (J) GSE32323 showed that THBS2 was higher expression in tumor tissue when compared with paired adjacent normal tissue (p = 0.05); (K) GSE31737 showed that THBS2 was higher expression in tumor tissue when compared with paired adjacent normal tissue (p < 0.001).
Scientific RepoRts | 6:33366 | DOI: 10.1038/srep33366 normal tissue (Fig. 4A). Significantly correlations with several clinicopathology parameters were observed in tumor THBS2 expression (Table 1). Mann-Whitney test analysis showed that age-related reduction in THBS2 expression can be found in the patient with age more than 60 (p = 0.048). Gender seems to be an important factor that affect the expression of THBS2, female showed higher expression of THBS2 than male (Median = 0.0498, Median = 0.0283, respectively). Moreover, increased level of THBS2 was significantly associated with advanced infiltrating depth, lymph node metastasis and TNM stage (P = 0.044, P = 0.050, P = 0.033, respectively). Meanwhile, the expression of THBS2 was partly associated with differentiation, although it didn't reach statistic significance when incorporate all of the differentiation levels into consideration (P = 0.327). There was no statistical difference between THBS2 level with location, tumor size, grade and metastases (all P > 0.05). However, there still existed some expression difference in location and distant metastasis. The expression of THBS2 in rectrum or metastasis patients showed a higher level.
At the end of this study, 43 patients died of CRC among 138 patients. The overall ten year survival rate (OS) of patients with high THBS2 was 59.7%, lower than the patients with low expression. As shown Fig. 4B, Kaplan-Meier analysis revealed that the CRC patients with higher level of THBS2 was significantly associated with a poor overall survival (median, 6.54 years) than those with lower level patients (median, 7.77 years, P = 0.018, HR (95% CI) = 2.037(1.118-3.711)). Moreover, there was a significant correlation between THBS2 mRNA expression  Fig. 4C). Although there was no statistical difference between THBS2 mRNA expression with overall survival in TNM stage III/IV, the higher level of THBS2 showed a higher risk(p = 0.168, Fig. 4D). We used the multivariate cox proportional hazards regression model to analyze the independent prognostic value of THBS2 by adjusting location, differentiation, infiltrating depth, lymph node metastasis, distant metastasis and TNM stage. After adjustment, THBS2 still showed a significant prognostic value that higher expression of THBS2 showed a higher risk in overall survival (p = 0.022, HR (95% CI) = 2.093(1.110-3.947)). We further detect THBS2 protein expression level of 100 CRC tissue samples including 10 paired cases with both normal and tumor samples by immunohistochemistry (IHC). The results were graded from 0 to 3 (G0, G1, G2, G3) depending on average number of THBS2-positive aggregates per field (Fig. 4E). As shown in Fig. 4F, THBS2 was over expressed in tumor when compared with paired normal tissues. The expression level of THBS2 showed a gradually increasing trend from distant tissues, adjacent stroma to tumor epithelium (Fig. 4G). And the THBS2 expression showed a significant positive correlation with age (P = 0.024), lymph node metastasis (P = 0.008) and distant metastasis (P = 0.046). In addition, there was a significant positive correlation between THBS2 expression and the tumor TNM grade (P = 0.001), (Table 2). However, no correlations between THBS2 expression and gender, site, tumor size, histological type, general type and differentiation were found. Consistently with the results of mRNA, Overall survival was much worse in patients with high grade THBS2 in  tumors than in patients with low or no THBS2 staining (Fig. 4H,I). Combined with TNM stage, the higher THBS2 expression was associated with the worse overall patient survival (Fig. 4J,L) in TNM stage I/II. But there was no statistical significance for overall survival between high and low grade THBS2 staining in TNM stage III/IV, even that just G3 grade THBS2 staining showed worst overall survival (Fig. 4K,M). Taken together, these results indicated that THBS2 was a prognostic factor independent of TNM stage, especially in TNM stage I/II. Multivariate Cox analysis revealed THBS2 as an independent prognostic factor after adjustment with age, sex, differentiation, lymph node metastasis, distant metastasis and TNM stage (P < 0.001, Table 2).

THBS2 promoted proliferation and metastasis in CRC.
To clarify the biological roles of THBS2 in CRC, we knocked down the expression of THBS2 in HCT116 cell line expressing high level THBS2 (Fig. 6A,B). CCK8 assay showed that THBS2 knockdown inhibited cell proliferation (Fig. 6C). Transwell assay showed that migration and invasion ability were decreased by siRNA for THBS2 (Fig. 6D). Otherwise, THBS2 silencing could also induce G1/S cell cycle arrest (Fig. 6E). Together with these results, THBS2 might promote colorectal cancer growth and metastasis.

Discussion
Cancer is one of the most common leading causes of death worldwide. Prognostic at an early stage is a useful way that decrease and avoid mortality. Although remarkable progress has been made to investigate the underlying mechanism, the understanding of the complicated carcinogenesis process was enormously hindered by large-scale tumor heterogeneity. Here we proposed that the diagnose-related genes, responsible for cooperativity disorientation, probably contain untapped prognostic resource of colorectal cancer. The current study investigated a comprehensive evaluation method application in profound prognostic value gene in colorectal cancer and attempted to evaluate a new prognostic candidate gene THBS2 underlying significant value. Multiple analysis indentified THBS2 as an independent prognostic factor for survival. High level of THBS2 was associated with poor disease-free and overall survival in a cohort of serous colorectal cancer patients that integrated data combination GEO/TCGA datasets with meta-analysis, suggesting that THBS2 may be a prognostic biomarker of colorectal cancer. The expression of THBS2 was significant correlated with lymphatic invasion and TNM stage in CRC patients. Base on the GEO dataset GSE17536, we found some specificity or universality genes between recurrence and non-recurrence patients. The significant differently expressed genes included 56 up-regulated and 3 down-regulated genes, indicated the up-regulated genes maybe the dominant factor that affected the progress of cancer recurrence. In order to analyze the biological and statistical significance of DEGS, we constructed a PPI network that included degree distribution and neighborhood score for the 59 genes. Most node distributions were between 0-6, only 4 genes' degrees were more than 6. The neighborhood score showed that MEP1A, MFAP5, THBS2, VCAN, FBN1 and COMP owned the highest score, which indicated these genes might play important functions.
To further evaluate the biological significance for DEGs, we performed KEGG pathway enrichment analysis for molecular functions significantly enriched Focal adhesion, ECM-receptor interaction and TGF-beta pathways 19,20 . Interestingly, several of these genes such as THBS2 and COMP that filtered after strict condition attracted our attention. They were specificity genes that high expression in recurrence patients when compared with non-recurrence patients. They own the same degree distribution and extremely high neighborhood score. Furthermore, they showed significant additive effect on the whole pathways. Above results suggested that THBS2 and COMP were important recurrence-related genes in CRC. In previous study, Kalmár A. et al. reported that THBS2 showed hypomethylation in both colorectal adenoma and CRC-normal adjacent tissue 21 . Lin D. et al. found that COMP and THBS2 in plasma of colon cancer patients were higher expression than non-cancer control 22 . Hoon Kim et al. indentified that THBS2 could be used for developing high specificity biomarkers sensing cancer invasion and predicting response to neoadjuvant therapy, as well as potential multi-cancer metastasis inhibiting therapeutics targeting the corresponding biological mechanism 23 . THBS2 showed partly correlation with colorectal cancer, COMP reported rarely in colorectal cancer, it was mostly associated with arthritis 24,25 . The function of these genes involved in CRC development has not been discovered.
By analyzing the associations between the expression profiles of THBS2 and COMP, clinical outcome of CRC patients and healthy volunteers in GEO datesets, although the expression of COMP correlated with prognosis outcome, it did not present significant difference expression in tumor tissue when compared with paired adjacent normal tissue from two datasets. As a universality gene in tumor and normal tissue, COMP might not be considered as a biomarker associated with colorectal cancer.
We found that THBS2 was significantly up regulated in tumor tissue when compared with paired adjacent normal tissue. The correlation between THBS2 expression with survival was significant, regardless of the data evaluated as a continuous variable or category variable. A clear separation was observed in the survival curves between patients with low-or high-risk signatures. Patients with a low-risk THBS2 signature in their tumor specimens tended to have prolonged overall survival, whereas patients with a high-risk signature tended to have shortened survival. The usefulness of THBS2 could be internally validated in the non-overlapping mostly GEO datasets, indicating good reproducibility of this gene signature in the overall survival of CRC. However, there still existed some datasets show a nonsense result, even one dataset showed the opposite result. The disease-free survival also showed the similar phenotype. Therefore, a systematic integration of gene expression data from multiple sources meta-analyses as a comprehensive evaluation has been used for THBS2. ROC is a normalized method in data category variable, especially in improving data significance and expanding difference. Another analysis used P50 as a filter.
Fortunately, after the comprehensive evaluation method application in discovery gene signature with profound prognostic value in colorectal cancer, THBS2 have been reserved and demonstrated by clinical samples. Our study presents the first line of evidences that THBS2 expression is up-regulated mRNA and increased THBS2 expression is associated with poor overall survival of colorectal cancer. In order to further identify the clinical role of THBS2 in CRC patients, we measured the protein expression of THBS2 in CRC tissues specimens through immunohistochemistry. We found that THBS2 expression was positively correlated with lymph node metastasis, distant metastasis and clinical stage. Our study suggested the high level of THBS2 expression serves as a significant role in CRC progression through promoting tumor growth and accelerating tumor cell metastasis. Differentiation, infiltrating depth, lymph node metastasis, distant metastasis and TNM stage were currently the most important known prognostic factor. Among the host factors, age was a powerful prognostic factor. Elderly patients had poorer survival in cancer, partly because of comorbidities that caused lower response to therapy 26 . The current study also showed an attributable to age. In addition, gender was also an important factor that associated with cancer specificity. Chih-Feng Chian et al. reported several genes had been indentify to be gender-dependent markers 27 . Among the disease factors, increasing data suggest that the tumor size have independent significance and could supplement prognostication 28,29 . Although there was no statistical difference between THBS2 level with gender, location, tumor size, grade and differentiation, these factors might have potentially diagnose value. We still found some slightly expression difference in part clinicopathology parameters. The expression of THBS2 in rectrum or mod/poor differentiation patients showed a higher level. These findings suggest that THBS2 could play a potentially critical role of in the pathogenesis and progression of colorectal cancer. With the gradual depth studies, THBS2 might become as a potential biomarker for predicting clinical outcome for colorectal cancer patients.

Methods
Discovery and Extraction of differentially expressed genes. We used Affymetrix data set publicly available in GEO database 30 with available clinical information as originally research. GSE17536 data set includes individual gene expression level with overall survival, disease-specific survival and disease-free survival information for 177 patients with CRC disease collected at the Moffitt Cancer (United States), and it was used to define the molecular classification. The recurrence-associated genes were determined by limma package in Bioconductor that based on: (1) fold change > 0.5; (2) false discovery rate based on the moderated t test followed by Benjamini and Hochberg's multiple-test adjustment < 0.05. Directional concordance between recurrence patients and non-recurrence patients from VMC with poor outcome was determined to refine 59-gene recurrence classifier.
Hierarchical cluster and Neighborhood Scoring. Cluster analysis was performed to examine the extracted genes with significantly different levels of expression between patients with diverse outcomes in recurrence group and non-recurrence group. The expression data of the probes corresponding to protein in the Retrieval of Interacting Genes (STRING) 31 terms from functional annotation analysis were extracted to establish the network. Neighborhood Scoring was based on the distribution and fold change of different expressed genes in network 32 .
Pathway analysis and Hypergeometric Distribution. DAVID (http://david.abcc.ncifcrf.gov/) Pathway Analysis was used for functional annotation of the transcripts and identification of upstream regulators. The hypergeometric distribution test was used to examine the correlation between gene with disease that measured by the P-value. Public datasets analysis. CRC expression profiling studies including relevant clinical information were identified by searching public datasets. The following key words and their combinations were used: "colon cancer or colorectal cancer and expression profiling by array" to search in GEO datasets (Supplement Figure S1). In addition, dataset from TCGA was also searched to ensure the relevant studies were not missed. Datasets with gene expression profile comparing CRC or colorectal adenoma to paired adjacent normal tissue were obtained from Dataset GSE32323 which contained 17 paired samples. And the comparison between 40 paired colorectal adenoma and adjacent normal tissue samples were performed by dataset GSE31737. The GSE33113 data set includes disease-free survival information for 90 patients with AJCC stage II disease collected at the Academic Medical Center in Amsterdam (the Netherlands). GSE39582 includes expression and clinical data for 566 patients with CRC collected for the Cartesd'Identité des Tumeurs (CIT) program, from the French LigueNationaleContrele Cancer. Other datasets that contained complete clinical information were also be used to generate survival curves of THBS2. Other datasets that contained complete clinical information were also been used to analysis the survival curves of THBS2 (Supplementary Table S7).
Clinical specimens. This study was approved by Zhejiang University's School of Medicine and was carried out in accordance with the approved guidelines and regulations. All experimental protocols were approved by Zhejiang University's School of Medicine. All participants provided written, informed consent for this study and the ethics committee of Zhejiang University's School of Medicine approved the study. 138 CRC patients were recruited from Sir Run Run Shaw Hospital of Zhejiang University. 110 CRC tissues including 10 paired normal tissues were recruited from affiliated hospital, Zhejiang University School of Medicines and Sir Run Run Shaw Hospital of Zhejiang University, respectively. The histologically normal tissues in the distant margin to the tumor were collected at the time of surgery from patient who undergoing resection of colorectal tumors. Pathologic diagnoses were evaluated by pathologists via biopsy reports and patients with familial adenomatous polyposis, hereditary non-polyposis CRC and inflammatory bowel disease were excluded. All tissue samples were obtained from colorectal adenocarcinoma patients without any adjuvant treatment including radiotherapy or chemotherapy prior to surgery and diagnosis.

RNA extraction and quantitative RT-PCR.
Total RNA from the tissues were extracted using TRIZOL Reagent (Invitrogen). The RNA concentration was determined using UV spectrophotometry. RT-PCR was performed using Thunderbird SYBR Master Mix (Takara, Japan). The PCR was performed on a Real-time PCR Detection System (StepOnePlus, ABI) with the following cycles: 95 °C for 1 min, followed by 40 cycles of 95 °C for 15 s, 60 °C for 15 s, and 72 °C for 45 s to detect the THBS2 and GAPDH gene levels. GAPDH expression was used as an internal control. GAPDH-Sense: ACCACAGTCCATGCCATCAC; GAPDH-Antisense: TCCACCACCCTGTTGCTGTA. THBS2-Sense: TCCTGCTGGCTCTGTGGGTGT; THBS2-Antisense: ATGGTCTTGCGGTTGATGTTGCT. The 2 −ΔCT was calculated for every sample and normalized to GAPDH. All RT-PCR results are representative of three independent experiments. Immunohistochemistry analysis. Immunohistochemistry (IHC) study was carried out as described previously 33  Unpaired Student's t tests were used for normally distributed data and non-parametric Mann-Whitney U-tests were used for non-normally distributed data to compare central tendencies. For results in CRC tissues, Relapse-free, metastasis-free, or overall survival was compared between high and low THBS2 expression groups using median gene expression value as a bifurcating point. Correlations were analyzed by the Spearman coefficient test. Significance was set at P < 0.05. Stata software was used to be as a comprehensive evaluation that associated public datasets with clinical samples.
Cell lines and cell culture. The human CRC cell lines HCT116 were purchased from the American Type Culture Collection (Manassas, VA, USA), which were maintained in RPMI 1640 medium; The cell was routinely maintained and supplemented with 10% fetal bovine serum (HyClone, Tauranga, New Zealand) and grown at 37 °C in an atmosphere of 95% air and 5% CO 2 .

RNA interference.
SiRNA targeting humanTHBS2 and one negative control siRNA were designed and synthesized by Genepharma (Shanghai, China). Transfection of each siRNA (50 nM) was carried out using Lipofectamine2000 (Invitrogen) according to the manufacturer's recommendations.