Long intergenic non-coding RNA 271 is predictive of a poorer prognosis of papillary thyroid cancer

A number of long non-coding RNAs (lncRNAs) have been found to play critical roles in oncogenesis and tumor progression. We aimed to investigate whether lncRNAs could act as prognostic biomarkers for papillary thyroid cancer (PTC) that may assist us in evaluating disease status and prognosis for patients. We found 220 lncRNAs with expression alteration from the annotated 2773 lncRNAs approved by the HUGO gene nomenclature committee in The Cancer Genome Atlas (TCGA) dataset, of which FAM41C, CTBP1-AS2, LINC00271, HAR1A, LINC00310 and HAS2-AS1 were associated with recurrence. After adjusting classical clinicopathogical factors and BRAFV600E mutation, LINC00271 was found to be an independent risk factor for extrathyroidal extension, lymph node metastasis, advanced tumor stage III/IV and recurrence in multivariate analyses. Additionally, LINC00271 expression was significantly downregulated in PTCs versus adjacent normal tissues (P < 0.001). The Gene Set Enrichment Analysis (GSEA) revealed that genes associated with cell adhesion molecules, cell cycle, P53 signaling pathway and JAK/STAT signaling pathway were remarkably enriched in lower-LINC00271 versus higher-LINC00271 tumors. In conclusion, LINC00271 was identified as a possible suppressor gene in PTC in our study, and it may serve as a potential predictor of poor prognoses in PTC.


LINC00271 was downregulated in PTC compared with adjacent normal tissues.
To determine the role of LINC00271 in PTC, we initially detected LINC00271 expression in the carcinoma specimens and the paired normal tissues in 50 PTC patients, and the results suggested LINC00271 was significantly downregulated in PTC compared with the level in the adjacent normal tissues (P < 0.001, Fig. 2B,C). While confirmed in the validation cohorts including the TCGA and GSE35570 cohorts, LINC00271 expression was also found to be suppressed in tumor tissues in comparison with normal tissues (Fig. 2D-G). Additionally, significantly repressed expression of LINC00271 was observed in breast invasive carcinoma (BRCA, Supplementary Figure S2A LINC00271 as an independent risk factor for poor clinical outcomes of PTC. Multivariate analyses were performed to confirm whether associations of LINC00271 with high-risk pathological outcomes (ETE, LNM and III/IV stage) and recurrence were independent of classical clinicopathogical factors and BRAF V600E mutation. Table 3 showed that LINC00271 < cutoff remained significantly correlated with ETE, LNM and III/ IV stage in the TCGA cohort in multivariate logistic regression analyses though LINC00271 < cutoff was only found to statistically increase the risk of LNM in the FUSCC cohort. In Table 4, the TCGA cohort indicated advanced T stage (T3-T4, HR = 2.294, 95%CI: 1.010-5.211, P = 0.047) and LINC00271 < cutoff (HR = 3.688, 95%CI: 1.384-9.829, P = 0.009) were risk factors for poor RFS in univariate cox proportional hazards analysis, and female gender was a protective factor for improved RFS. Multivariate analysis adjusted by age, gender, histological subtypes, T stage, TNM stage, showed female gender (HR = 0.386, 95%CI: 0.172-0.868, P = 0.021) and LINC00271 (HR = 3.182, 95%CI: 1.160-8.726, P = 0.025) were still statistically significant. Then, we performed a ROC analysis to evaluate the predictive values of LINC00271, gender and TNM stage for PTC recurrence. After combining LINC00271, the AUC values of gender and TNM stage for predicting recurrence were significantly elevated from 0.611 to 0.697 (P = 0.020) and 0.597 to 0.677 (P = 0.016), respectively (Supplementary Figure S3). Furthermore, to investigate independent factors that might affect LINC00271 expression in PTC, we conducted a multivariate logistic regression analysis in the TCGA and FUSCC cohorts (Supplementary Table S3). LNM was shown to be significantly correlated with low expression of LINC00271 in both TCGA and FUSCC cohorts. Identification of LINC00271-associated biological pathways by GSEA. To identify LINC00271-associated biological signaling pathways on an unbiased basis, we performed Gene Set Enrichment Analysis (GSEA) using high throughput RNA-sequencing data of the TCGA cohort. The expression level of LINC00271 was used as the phenotype label. Among all the predefined KEGG pathways gene sets, cell adhesion molecules (CAMs), cell cycle, P53 signaling pathway and JAK/STAT signaling pathway were found to be significantly associated with LINC00271 expression in the TCGA cohort (Fig. 3), suggesting that LINC00271 may be involved in PTC development and progression through the above cancer-associated signaling pathways.

Discussion
Large numbers of lncRNAs have been identified through genome-wide transcriptome analyses 21,22 . A series of studies have revealed that lncRNAs can act as regulators of diverse biological functions including X-chromosome silencing 23 , transcription regulation 24 , P53 function 25 , and cell growth control 26 . Recently, roles and functions of more annotated specific lncRNAs in cancer have been characterized. HOTAIR is shown to promote breast cancer  metastasis by reprograming chromatin state, and HOTAIR expression level has prognostic value for metastasis and survival 27 . Leukemia-induced non-coding activator RNA-1 (LUNAR1) is revealed as a novel regulator of IGF1 signaling and T-ALL cell growth and may be a potential biomarker and therapeutic target in acute leukemia 15 . Ewing sarcoma-associated transcript 1 (EWSAT1) is found to facilitate Ewing sarcoma oncogenesis by mediating EWS-FLI1 suppression pathways 16 . BRAF-regulated lncRNA (BANCR) induced by BRAF V600E gene can regulate carcinoma cell migration in melanoma 14 . Furthermore, the potential of lncRNAs as biomarkers for cancer have been confirmed in many cancers. Wang GH et al. 28 revealed that LINC01207 overexpression was associated with advanced TNM stage and shorter survival in lung adenocarcinoma patients. Zhang EB et al. 19 found that ANRIL, as a growth regulator, may serve as a candidate prognostic biomarker and target for new therapies in human gastric cancer. Jin Meng et al. 29 reported that a four-long non-coding RNA signature can be predictive of breast cancer survival, and Liu HR 18 also identified several lncRNAs with prognostic values for breast cancer. Though recent efforts on protein coding exons and transcripts provided deep insights into genomic characteristics of PTC 5 , the possible roles and prognostic values of lncRNA signature and specific lncRNAs have been poorly characterized in PTC.
As reported in the previous study 18 , the cancer dataset of TCGA at cBioPortal were available for investigation of expression and clinical significance of lncRNAs. In the present study, the TCGA dataset was used to screen out the potential annotated lncRNAs with prognostic values for PTC. Initially, we performed an associative analysis of the annotated lncRNAs with patients' clinicopathological parameters by using the PTC data of TCGA, and then validated the preliminary findings through our own data analysis at FUSCC. As shown in the study, the 220 lncRNAs were found to be altered at expression levels in PTC patients at cBioportal. Of the 220 lncRNAs, FAM41 C, CTBP1-AS2, LINC00271, HAR1A, LINC00310 and HAS2-AS1 were associated with RFS for PTC patients, and decreased LINC00271 expression also significantly increased risks of ETE, LNM, advanced T stage and TNM stage while the other five lncRNAs failed to correlate with these poor outcomes of PTC accordantly, which indicated LINC00271 as the optimal biomarker for aggressive behaviors of PTC. The associative analysis of LINC00271 expression with clinicopathological factors in the FUSCC cohort confirmed the role of enhancing aggressiveness of LINC00271 in PTC though there was no recurrent case in our cohort due to the limited follow-up time.
PTC is usually curable with thyroidectomy followed by radioiodine therapy, but many patients suffer disease recurrence, and some cases even advance to be incurable and fatal 4,30 . Therefore, specific biomarkers for risk stratification are helpful to identify patients at high recurrence so active treatment and careful monitoring can be provided. As we know, the conventional risk stratification is based on patient age, gender, tumor size, histological subtypes, the presence or absence of ETE and LNM and tumor stage finally defined by pathology. However, these  criteria are always dependent on postoperative pathology and histological outcomes are not defined before surgery, and there exist heterogeneity and uncertainty in the risk evaluation based on these criteria.
As a prognostic marker, BRAF V600E mutation has received great attention in the past decades for its potential utility in the risk stratification and management of PTC 31 . However, as mentioned in the 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer 2 , BRAF V600E has limited values in predicting recurrence of PTC and should not impact on the decision for prophylactic central neck dissection in primary tumor 2,31 . Likewise, in our study, BRAF V600E had no predictive values for recurrence and LNM in PTC. Our multivariate analyses showed that associations of LINC00271 with ETE, LNM, advanced tumor stage and recurrence were independent of classical clinicopathogical factors and BRAF V600E mutation in the TCGA cohort, which suggested that it could act as a potential prognostic predictor for PTC. We had to admit that the limited number of cases and follow-up time and the difference in sample distribution in the FUSCC cohort were responsible for the negative statistical effects of LINC00271 on ETE and advanced tumor stage although the ORs of LINC00271 for the two parameters revealed positive trends.
Furthermore, the comparison of LINC00271 expression between tumor and adjacent normal tissues in the FUSCC, TCGA and GSE35570 cohorts suggested that LINC00271 could act as a suppressor gene in PTC. Likewise, the suppressive role of LINC00271 is observed in many common cancer types including BRCA, LUAD, KIRP and HNSCC. Diverse signaling pathways including cell cycle, P53 signaling and cell adhesion signaling cooperate to initiate and sustain oncogenesis and progression through maintaining proliferative signaling, evading growth suppressors and activating invasion and metastasis 32 . The pathways that LINC00271 may mediate in PTC remain unclear, so we performed GSEA to identify LINC00271-associated biological signaling pathways. GSEA showed low LINC00271 expression was positively associated with CAMs, cell cycle, P53 signaling pathway and JAK/STAT signaling pathway, which may suggest that LINC00271 is involved in PTC development and progression through these cancer-associated pathways.  In spite of observing the role of decreased LINC00271 expression firstly in PTC in our study, the possible mechanism mediated by LINC00271 has not been illuminated. Basic experiments in vitro and in vivo and large samples of patients with long-term follow-up outcomes are needed to confirm the effects of LINC00271 in PTC in the future.
In conclusion, LINC00271 was identified as a possible suppressor gene in PTC in our study, and it was an independent risk factor for LNM and recurrence of PTC. LINC00271 may serve as a potential predictor for poor clinical outcomes in PTC.

Materials and Methods
Annotated lncRNA database search. We performed a primary search for annotated lncRNAs at the website of HUGO gene nomenclature committee (HGNC) (http://www.Genenames.org/) according to a previous study 18 , and a total of 2773 lncRNAs were obtained from the HGNC database (http://www.genenames.org/ cgi-bin/statistics). The 2773 lncRNAs were searched for expression data in PTC at The cBioPortal for Cancer Genomics 34,35 (http://www.cbioportal.org/), of which 220 known lncRNAs with upregulation or downregulation were found finally by using Onco Query Language "EXP < = − 2 EXP > = 2" from the TCGA data. Expression data of the 220 lncRNAs and the corresponding clinicopathological data from TCGA dataset were downloaded from the website of The cBioPortal for Cancer Genomics (http://www.cbioportal.org/) and Cancer Genomics Browser of University of California Santa Cruz (https://genome-cancer. ucsc. edu/). Paired LINC00271 expression data in tumor and normal tissues were retrieved from TCGA RNA-sequencing data of PTC, BRCA, LUAD,

RNA extraction and real-time qRT-PCR analysis.
We extracted total RNA from the specimens using the Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. The extracted RNA was reversely transcribed for cDNA, followed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) as previously described 20 . The primers for LINC00271 were as follows: GCTATTGGTGGGAGGCTTCAG (Sense), TGGGCTGGACTTAATGACTTGC (Antisense). GAPDH was used as a housekeeping gene. qRT-PCR assays were performed in triplicate for each sample, and the mean value was used for calculation of mRNA expression levels. The relative mRNA expression levels of LINC00271 were determined by the comparative Ct (2 −∆Ct ) method. The amount of target gene expression levels was given as ratios to GAPDH mRNA level.
Gene Set Enrichment Analysis. GSEA was performed using GSEA software, Version 2.0.1, which was obtained from the Broad Institute (http://www.broad.mit.edu/gsea), as previously described 40,41 . Enrichment map was used for visualization of the GSEA results. False discovery rate (FDR) value and normalized enrichment score (NES) were used to sort the pathways enriched in each phenotype after gene set permutations were performed 1000 times for each analysis.
Statistical Analysis. Categorical data were summarized with frequencies and percentages. The continuous results were expressed as the mean ± standard deviation. Paired-t and independent-t test was used to compare continuous variables in two groups. Associations between continuous variables and categorical variables were evaluated using Mann-Whitney U tests for two groups and Kruskal-Wallis tests for more than two groups. χ 2 and Fisher's exact test were used for categorical variables. The Kaplan-Meier method was used to construct RFS curves, and the univariate survival difference was determined by the log-rank test. To analyze the association between lncRNAs and clinicopathological parameters, patients were divided into two subgroups (Low expression and High expression) according to the median value of lncRNA expression levels. Nonparametric receiver operating characteristic (ROC) analyses were performed to calculate the best cutoff value for LINC00271 expression level that would be predictive of LNM and recurrence. Moreover, univariate and multivariate analysis were performed to determine risk factors for recurrence in PTC using cox proportional hazards models calculated by HR and 95% CI. OR with 95% CI was calculated by logistic regression analysis. A p-value < 0.05 was considered significant. Statistical analyses were performed using the GraphPad Prism 6.0 and SPSS for Windows (SPSS Inc., Chicago, IL).