Comprehensive Analysis of the Expression Profiles of Long Non-Coding RNAs with Associated ceRNA Network Involved in the Colon Cancer Staging and Progression

Long non-coding RNAs (lncRNAs) act as competing endogenous RNAs (ceRNAs) to compete with microRNAs (miRNAs) in cancer occurrence and development. However, the differential expression of RNAs and their ceRNA network during the development of colon cancer (CC) remains unclear. This study was aimed at comprehensive analysis of the lncRNAs and their ceRNA networks associated with CC. Whole transcriptome sequencing was performed on colorectal and adjacent normal tissues at different pathological stages. Forty-nine lncRNAs were differently expressed between the CC tissues and their adjacent normal tissues at all stages. Aberrant expression of lncRNA CDKN2B-AS1 and lncRNA MIR4435-2HG was confirmed by TCGA database. Moreover, 14 lncRNAs were differentially expressed between early and advance stages of the tumor tissues, and 117 miRNAs were specifically expressed in stage III & IV. Weighted gene co-expression network analysis of 17105 differently expressed mRNAs revealed that the mRNAs shown in module pink, midnight blue, black, and light cyan were related to TNM and pathological stage, and that these mRNAs were enriched in cancer related functions and pathways. As DElncRNA showed a trend of change similar to that of the DEmRNA and opposite to that of DEmiRNA, ceRNA network was constructed with 3 DEmiRNAs, 5 DElncRNAs, and 130 DEmRNAs. Real time PCR revealed that expression of MEG3 was decreased in the tumor tissues belonging to stage III and IV as compared to that in stage I. Moreover, hsa-miR-324-5p was upregulated, while FGFR3, PLCB4, and IKBKB were downregulated in the tumor tissues as compared to that in the adjacent normal tissues. Thus, this study revealed differentially expressed lncRNA between different stages of CC as well as suggested that lncRNA CDKN2B-AS1, MIR4435-2HG, and MEG3 may act as diagnostic biomarkers for the development of CC.

Thousands of non-coding transcripts, including non-coding linear RNAs microRNAs(miRNAs), long non-coding RNAs (lncRNA), and circular RNAs (circRNA), involved in various diseases as well as cancers have been identified with the advances in the RNA-sequencing techniques 6 . The lncRNAs are functionally defined as transcripts, which are more than 200 nucleotides in length with no protein coding potential 7 ; most of these are characteristically expressed in specific tissues or cancer types and possess conserved secondary structure and function 8 . A number of lncRNAs could act as diagnosis and prognosis markers, and have potential of being therapeutic targets for tumors. A recent study found that lncRNA TTTY15 was frequently up-regulated in a large group of prostate cancer (PCa) tissues. The TTTY15 knockout showed a tumor suppressive effect, which might have therapeutic implications for PCa patients 9 . Most studies focus on the mechanism of the action of lncRNA as competing RNA for miRNA as well as its competitive binding to the miRNA targets, which is called as ceRNA mechanism 10 . For instance, Long noncoding RNA LINC01234 was identified to function as a competing endogenous RNA and sponging miR-204-5p; the CBFB expression was regulated by gain or loss of expression of LINC01234, thus, affecting the gastric cancer tumorigenesis 11 .
A variety of lncRNAs have also been reported to be involved in CC development. For instance, a recent study revealed that nuclear lncRNA HOXD-AS1 inhibited colorectal cancer proliferation and metastasis by regulating HOXD3-induced integrin β3 transcriptional activity, thus inhibiting MAPK/AKT signaling 12 . When considering the ceRNA mechanism, a study also identified that MALAT1 acted as a sponge of miR-106b-5p to competitively bind to SLAIN2, which enhanced the microtubules mobility to promote the invasion and metastasis of colorectal carcinoma 13 . However, there has been no comprehensive analysis of CC-related lncRNAs and miRNAs based on ceRNA networks combined with large comprehensive analysis of the pathological characteristics of patients with CC.

Identification of differentially expressed (DE) lncRNAs.
A total of 49 DElncRNAs were shared in all stages between the CC tissues and their adjacent normal tissues (n = 3); among these, 48 DElncRNAs shared the same trend of up-or down-regulation in all stages, including 10 down-regulated and 38 up-regulated lncRNAs in the CC tissue. The significant difference was defined as the log2 (fold change) >1 or log2 (fold change) <−1 and p-value < 0.05 (Fig. 1A,B). Among them, the expression of down-regulated lncRNA CDKN2B-AS1 and the up-regulated lncRNA MIR4435-2HG was confirmed by comparing with the TCGA database (Fig. 1C,D).
We determined the DElncRNAs between the stage III/IV tumor tissue and the stage I/II tumor tissue. A total of 3749 DElncRNAs were related to the advanced stages of cancer development (stages III and IV) ( Fig. 2A). Fourteen DElncRNAs were aberrantly expressed in these four groups, of which, 5 DElncRNAs were downregulated while 9 DElncRNAs were up-regulated (Fig. 2B). The expression levels of maternally expressed gene 3 (MEG3), a known lncRNA, were markedly downregulated in colon adenocarcinoma (COAD) as observed in the TCGA database (P < 0.05, Fig. 2C). More importantly, the MEG3 was differentially expressed in the stages I to IV (P < 0.05, Fig. 2D).

Identification of DEmRNAs and DEmiRNAs.
A total of 13,278 DEmRNAs were enriched in the advanced stages of cancer (stages III and IV). Sixty three DEmRNAs were aberrantly expressed in these four groups (Supplement 1A), among which, 23 DEmRNAs were downregulated while 40 DEmRNAs were up-regulated (Supplement 1B). GO functional enrichment analyses were performed to predict the functional role of the DEmRNAs among stages III and IV (Supplement 1C). We found that most DEmRNAs were enriched in the functions related to the wound healing, cell adhesion, angiogenesis, cell proliferation, and cell differentiation that play an important role in the progression of the cancer. Additionally, a total of 117 DEmiRNAs were identified between the stages III and IV and stages I and II. However, there was no crossover of DEmiRNAs between these four groups; the up-and down-regulated DEmiRNAs in each group are shown on the volcano map (Supplement 1D).

WGCNA analysis of DEmRNAs and functional annotation of the modules.
To illustrate the functional enrichment of DEmRNAs in CC development, gene co-expression network of 17105 mRNAs among 12 tumor samples was constructed through WGCNA analysis. All the DEmRNAs were divided into twenty color modules (Fig. 3A); among these the grey module was designated as the gene set that cannot be attributed to any module and has no reference significance. Then, Pearson correlation algorithm was used to calculate the correlation coefficient and P-value of module feature genes and clinical pathology as well as demographics traits such as age, sex, and TNM stage. Modules related to each trait were screened with the threshold that the absolute value of correlation coefficient must be ≥0.5 and p-value should be <0.05. The results showed that the genes in the module were highly correlated with both the traits and characteristics (Fig. 3B). We surprisingly found that the mRNA expression was correlated with TNM stage as well as pathological staging in module pink, midnight blue, black, and light cyan. Module core gene analysis of pink, midnight blue, black, and light cyan is showed in Supplement 2; this figure shows the analysis of the top 50 genes with the highest degree of connectivity in each module and the relationship between these genes. Subsequently the KEGG pathway enrichment analysis was carried out to illustrate the functional and pathway enrichment of mRNAs in 4 modules. Among these four modules, most mRNAs were found to have been enriched in VEGF signaling pathway, pathway in cancer, TGF-beta signaling pathway, MicroRNAs in cancer, MAPK signaling pathway, mTOR signaling pathway, and colorectal cancer pathway; all these pathways are closely related to the occurrence and development of tumor ( Fig. 4A-D).
Verification of key lncRNAs, miRNAs, and mRNAs in CC. MEG3 was selected as a candidate key lncRNA due to its differential expression between the stages as was evident from the TCGA database and our RNA-seq. Real time PCR showed that the expression of MEG3 was downregulated in the tumor tissues as compared to the adjacent normal tissues (Fig. 6A). Moreover, MEG3 levels were decreased in tumor tissues belonging to stages III and IV as compared to those of stage I (Fig. 6B). The expression of hsa-miR-324-5p, which predictively targeted MEG3, was upregulated in the tumor tissues as compared with the adjacent normal tissues (Fig. 6C). The expression of FGFR3, PLCB4, and IKBKB were downregulated in the tumor tissues as compared to the adjacent normal tissues (Fig. 6D). These results were consistent with the results obtained from the RNA-seq. The expression of hsa-miR-324-5p and mRNAs showed no significant difference between different stages.   www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
Long non-coding RNAs (lncRNAs) have been reported to act as competing endogenous RNAs (ceRNAs) and play significant roles in the development of various diseases and tumors. Several recent studies have mapped the ceRNA network of lncRNA for colon adenocarcinoma from different points of view 14,15 . Our study is first to reveal that the DElncRNAs are related to cancer staging and its ceRNA network. We also have provided several candidate DElncRNAs and related DEmiRNAs and DEmRNAs involved in different pathological stages and development of CC. These factors might act as diagnostic markers of CC staging and therapeutic targets for CC treatment.
Tumor staging is a process of assessing the number and locations of malignant tumors in the body. Tumor staging describes the severity and extent of malignancy based on the primary tumor in the individual and the degree of dissemination. Correct tumor staging helps to rationalize treatment plans, accurately determine prognosis, and evaluate efficacy; Stage I and stage II are curable while stage III and stage IV require long-term, comprehensive treatment. Our study combined with TCGA database has revealed that MEG3 is associated with tumor staging. MEG3 is downregulated and acts as a tumor suppressor in various cancer types including breast cancer, liver cancer, glioma cancer, colorectal cancer, cervical cancer, gastric cancer, lung cancer, ovarian cancer, and osteosarcoma 16,17 . For instance, MEG3 has been shown to inhibit cells proliferation and migration in CC by decreasing Clusterin expression via directly binding to Clusterin protein 18 . In gastric cancer, MEG3 has been shown to inhibit cell proliferation and metastasis via p53 signaling pathway 19 . In cholangiocarcinoma, MEG3 level was found to have been significantly decreased in cholangiocarcinoma tissues and cells as compared to that in nontumor controls and it suppressed cell proliferation and invasion via regulating Bmi1/RNF 20 . In cervical cancer, MEG3 was downregulated in the cancer tissues as compared to that in corresponding normal tissues and downregulated MEG3 was associated with recurrence and short overall survival 21 . In breast cancer, MEG3 is known to be downregulated in cancer tissues and cells and MEG3 overexpression attenuates the tumorigenesis via PI3K/Akt pathway by sponging miR-21 22 . In addition to cancer cells, MEG3 is also expressed in other cells such as immune cells and fibroblasts, and affects their gene expression patterns. MEG3 expression is upregulated in CD4+ T cells of patients with immune thrombocytopenic purpura and this enhances the immune imbalance of Treg/Th17 23 . MEG3 is downregulated in Tenon's capsule fibroblasts as compared with control fibroblasts and is known to inhibit cell proliferation ability by interacting with Nrf2 protein 24 . Our study, for the first time reveals that the MEG3 expression is downregulated in CC and higher tumor grade is related to higher MEG3 expression, indicating that MEG3 might act as a diagnostic marker for the pathological staging.
In addition to lncRNAs, hsa-miR-324-5p and several mRNAs including WNT11, FGFR3, PLCB4, and IKBKB were revealed to be related with the pathological stage and development of CC. Hsa-miR-324-5p is associated with enhanced oncogenicity and up-expression of hsa-miR-324-5p in the colorectal cancer cells suppresses CRC www.nature.com/scientificreports www.nature.com/scientificreports/ tumorigenicity in vitro as well as in vivo 25 . It might act as a therapeutic target for CC treatment. Additionally, the results showed that MEG3 was positively correlated with Wnt11, PLCB4, IKBKB and negatively correlated with FGFR3 (Supplement 3). These observations provide clues for our further research based on the hypothesis that MEG3 might be involved in CC staging by regulating the expression of these genes.
Wnt protein family is known to play fundamental roles in the cancer development 26 . It was reported that Wnt-11 promoted cancer cell migration and invasion independently of β-catenin 27 . Wnt-11 is related to the degree of malignancy and metastasis of cervical cancer 28 , which is consistent with our study revealing that the Wnt-11 expression is also related to the degree of malignancy in CC. However, our study has several limitations. First, the down-regulation of MEG3 needs to be verified in the CC cell lines and the function of MEG3 should be confirmed by cell function assays. Secondly, the relationship between candidate DEmRNAs genes and MEG3 in the colon carcinogenesis will need to be confirmed by interaction validation assay.

Materials and Methods
Patients and samples. This study was approved by the Human Ethics Committee of First affiliated hospital of kunming medical university. Written informed consent was taken from all the patients. All methods were performed in accordance with the relevant guidelines and regulations.
A total of 40 patients with CC (with 10 patients each in stages I, II, III and IV) without anti-cancer treatment at pre-operation period were taken into the account for our study. All the cases were diagnosed pathologically and treated in the Department of Surgical Oncology, First Affiliated Hospital of Kunming Medical University in 2018. The CC samples were divided into four groups viz. stage I, II, III or IV according to the size of the tumor, infiltration range, and metastasis. Stage I and stage II are early and mild stages of cancer that are curable whereas stage III and stage IV are advanced stages of cancer that require long-term, comprehensive treatment. Grouping of samples and clinical and pathological characteristics are given in Table 1. This study was approved by the Human Ethics Committee of First affiliated hospital of kunming medical university Ethics committee. Written informed consent was taken from all the patients.
Whole transcriptome sequencing. First, the total RNA of the 12 pairs of samples with 3 pairs in each stage was extracted, the ribosomal RNA was digested using the TruSeq Stranded Total RNA with ribo-zero Gold reagent, and disrupted into short fragments. Then cDNA strands were synthesized using fragmented RNA. An A tail was added to the purified cDNA and then it was connected to the sequencing connector, followed by fragment size selection, and finally PCR amplification. After the construction of RNA library, Agilent 2100 Bioanalyzer, Illumina (Illumina, CA, USA) sequencers were used for the sequencing. Total transcriptome sequencing was used not only to analyze the protein-coding genes (mRNA) but also long non-coding RNA (lncRNA).
Identification and differential expression level analysis of lncRNAs. Stringtie (version 1.3.3b. Linux_x86_64) software 29 was used to reconstruct the transcripts of each sample based on a probabilistic model. According to the characteristics of lncRNAs, a rigorous four-step screening method was adopted to obtain candidate lncRNAs. First, the splicing transcript was compared with the reference transcript, and the transcripts with www.nature.com/scientificreports www.nature.com/scientificreports/ known coding transcripts or loci were eliminated. Next, the transcripts obtained from the first step were screened according to the length greater than 200 bp and the number of exons greater than or equal to 2. Subsequently, gene coding ability was predicted for the screened transcripts in the second step and the transcripts with coding potential were removed using coding potential calculator (CPC, version 0.9-r2) 30 , coding-non-coding index (CNCI, version 1.0) 31 , the protein families database (Pfam, version 30.0) 32 , and improved k-mer scheme (PLEK, version 1.2) 33 . The lncRNA sequences obtained in step 3 were then compared with the known lncRNA by blastn software, and the repeated sequences were removed. The significant difference was defined as the log2 (fold change) >1 or log2 (fold change) <-1, and the p-value of <0.05. To identify the tumor related lncRNAs, the DElncRNAs between the cancer tissues and their adjacent normal tissues were identified for each stage, and then the common DElncRNAs in all the stages were obtained by taking the intersection elements. To identify the tumor development related lncRNAs, DElncRNAs between the advance stages (III/IV) and early stages (I/II) were analyzed in the tumor tissue including stage III versus stage I, stage III versus stage II, stage IV versus stage I, and stage IV versus stage II. The common DElncRNAs between stage III/IV and stage I/II were obtained by considering the intersection elements in the above comparison.
Weighted gene co-expression network (WGCNA) analysis. To find clusters (modules) of highly correlated 17105 mRNAs from 12 samples, WGCNA analysis was carried out using WGCNA R package 34 . First, those genes with less variable expression levels were filtered out. Secondly, with the selected power value, a weighted co-expression network model was established. A total of 17105 genes were divided into 20 modules, among which, the grey module was the gene set that could not be assigned to any module and had no reference significance. The calculation process was measured by the block wise Consensus Modules of WGCNA package. Module analysis includes cluster graph analysis, GO and KEGG analysis etc. Then, Pearson correlation algorithm was used to calculate the correlation coefficient and P-value of module feature genes and characters. Modules related to each trait were screened according to the threshold of absolute value of correlation coefficient of >0.5 and p-value of <0.05. Finally, in the network constructed by WGCNA, the connectivity of most nodes (genes) is very low because the WGCNA is a scale-free network; the core genes in a module are the group of genes with the highest connectivity in the module.
Small RNA sequencing and miRNA targets prediction. Total RNA was isolated from CC tissue and its adjacent normal tissue with Trizol reagent (Invitrogen, CA, USA). The RNA quantity and purity was determined by Nanodrop (Thermo, MA, USA) and 1% gel electrophoresis. Each purified RNA sample was used for the construction of RNA libraries using the miRNA Library Prep Kit (New England Biolabs, MA, USA) according to the Illumina's manufacturer's information. Subsequently, RNA sequencing was performed on Hiseq2500 (Illumina, CA, USA) with 50 million reads depth-datasets and the raw data were filtered with FastQC to get rid of the low-quality reads. Then high-quality reads were matched with miRBase (http://www.mirbase.org/) database to recognize the known miRNAs. The miRNAs targets were determined by Miranda and RNAhybird. The criteria for screening were score ≥150 and energy <-20(Miranda) or energy <-25(RNAhybird), and miRNA targets were obtained by taking the intersection of the Miranda and RNAhybird results.
Gene ontology functional enrichment and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment. Hierarchical clustering was performed using TBtools V0.655 (Toolbox for biologists). DEmRNAs were supplied to GO analysis and KEGG pathway analysis by DAVID (http://david.ncifcrf. gov). GO analysis was used to evaluated the functions of the screened DEmRNAs including biological process (BP), Molecular Function (MF), and cellular component 1 analysis. KEGG analysis was used to map DEmRNAs to KEGG pathways. The P value of <0.05 represents significant difference.
Construction of the DElncRNA-DEmiRNA-DEmRNA network. The lncRNAs, miRNAs, and mRNAs that were differentially expressed between the colorectal and their adjacent normal tissues were selected to establish the ceRNA network. The regulated pairs DElncRNA-DEmiRNAs and DEmiRNAs-DEmRNA were predicted by RNAhybrid and Miranda, respectively. Threshold energy of <-20 and score of >150 was used for Miranda