Identification of Long Noncoding RNAs Deregulated in Papillary Thyroid Cancer and Correlated with BRAFV600E Mutation by Bioinformatics Integrative Analysis

Papillary Thyroid Cancer (PTC) is an endocrine malignancy in which BRAFV600E oncogenic mutation induces the most aggressive phenotype. In this way, considering that lncRNAs are arising as key players in oncogenesis, it is of high interest the identification of BRAFV600E-associated long noncoding RNAs, which can provide possible candidates for secondary mechanisms of BRAF-induced malignancy in PTC. In this study, we identified differentially expressed lncRNAs correlated with BRAFV600E in PTC and, also, extended the cohort of paired normal and PTC samples to more accurately identify differentially expressed lncRNAs between these conditions. Indirectly validated targets of the differentially expressed lncRNAs in PTC compared to matched normal samples demonstrated an involvement in surface receptors responsible for signal transduction and cell adhesion, as well as, regulation of cell death, proliferation and apoptosis. Targets of BRAFV600E-correlated lncRNAs are mainly involved in calcium signaling pathway, ECM-receptor interaction and MAPK pathway. In summary, our study provides candidate lncRNAs that can be either used for future studies related to diagnosis/prognosis or as targets for PTC management.

Experimental validation using qRT-PCR was performed to demonstrate the reliability of the bioinformatics analyses applied. From the top 25 positively DE lncRNAs and from the top 20 negatively DE lncRNAs, it were selected for validation those lncRNAs with low adjusted p-values to minimize expression variability, especially in the comparison BRAF WT × BRAF V600E tumor, among others characteristics (for detailed information see Methods). From a total of 5 DE lncRNAs selected for validation from the TCGA analysis ( Fig. 2 and Supplemental  Fig. S1, upper part), 4 DE lncRNAs were validated using thyroid cell lines ( Fig. 2 and Supplemental Fig. S1, lower part), which strengths the reliability of this bioinformatics analysis, although the experimentally tested set of lncRNAs constitutes a relatively small sampling. Considering all comparisons, we obtained a very expressive validation efficiency [from a total of 8 different comparisons (Normal × Tumor and BRAF WT × BRAF V600E ), 6 were validated]. Downregulation of ENSG00000235070.3 and ENSG00000255020.1 in PTC was confirmed in the tumor cell lines TPC1 (BRAF WT ) and BCPAP (BRAF V600E ) compared to the normal immortalized cell line NTHY (Fig. 2). Also, downregulation of their expression was in accordance to the bioinformatics analysis, since lower expression for both of them was observed in BCPAP (BRAF V600E ) compared to TPC1 (BRAF wild type) (Fig. 2). Noteworthy, is that due to the very low abundance of ENSG00000255020.1 in the BCPAP cell line, qRT-PCR resulted in two unspecific melting peaks, which did not influenced the results. Upregulation of ENSG00000273132.1 in PTC was confirmed, however its overexpression in BRAF V600E tumors was not observed in the cell line BCPAP compared to TPC1 (Fig. 2 Fig. S1).
Results demonstrated that this set of lncRNAs is capable of clustering, majorly, normal and cancer patients in two distinct groups (Supplemental Fig. S2A). Clustering lncRNAs by Spearman correlation among all  DE lncRNAs also identified two groups of lncRNAs highly positively correlated or negatively correlated (Supplemental Fig. S3A). Hierarchical clustering was also performed with a more stringent set of DE lncRNAs between WT and BRAF V600E , which allowed the clustering of two groups enriched with WT and BRAF V600E patients, respectively (Supplemental Fig. S2B). Clustering lncRNAs by Spearman correlation among all DE lncRNAs also identified two groups highly positively correlated or negatively correlated lncRNAs (Supplemental Fig. S3B).
Indirectly validated lncRNAs' targets are involved in several oncogenic processes. As almost the totality of the identified DE lncRNAs in both conditions (Normal × Tumor and WT × BRAF V600E ) is uncharacterized, we used prediction methods to identify a possible interaction between lncRNAs and mRNAs/microRNAs. Predicted mRNAs and microRNAs (targets of DE lncRNAs) were compared to differentially expressed mRNAs and microRNAs (log2 fold change >1 or <−1; adj. p-value < 0.05) calculated from the same TCGA patients. Predicted mRNAs/microRNAs that were also identified as DE were considered as indirectly validated targets.
A total of 1109 DE mRNAs (Table 2 and Supplemental Table S8) and 26 DE microRNAs (Supplemental Table S9) were found to be predicted targets of the DE lncRNAs between Normal and Tumor samples and were considered as indirectly validated targets. Gene ontology and KEGG pathways enrichment of these validated mRNAs demonstrated that most of the genes are involved in surface receptors responsible for signal transduction and cell adhesion, as well as, regulation of cell death, proliferation and apoptosis (Fig. 3A). Enriched pathways (Fig. 3B) were composed of cytokine-cytokine receptor interaction, pathways in cancer (Fig. 3C), focal adhesion, MAPK pathway and calcium signaling pathway. Validated microRNAs were also used to determine enriched pathways based on their predicted targets calculated elsewhere (Fig. 3D). Genes involved in cancer and MAPK pathways were the most enriched pathways. Interestingly, some genes predicted to be targets of the validated microRNAs were also DE expressed in our analysis (Supplemental Table S2), such as upregulation of the MAPK constituents, CACNG4, CACNA1E, DUSP4, TGFBR1, FGF1, FGF2 and MAP3K1. Enriched pathways were extended to genes involved in cancer, focal adhesion and calcium signaling (Fig. 3D). The nonparametric Mann-Whitney test was applied due to the non-Gaussian expression distribution and p-value was assigned. Lower part of panel displays the experimental validation of these lncRNAs measured by qRT-PCR and calculated with 2 −ΔΔCt method using RPL19 (Ribosomal Protein L19) as endogenous control. Experiments with three biological replicates were performed using two technical replicates for each sample. These results are representative of at least two independent experiments. Values are plotted as expression mean ± Standard Error of Mean (SEM). Unpaired two-tailed t-Test assigned the p-value.   Between WT and BRAF V600E , 471 DE mRNAs (Table 3 and Supplemental Table S10) and 11 DE microRNAs (Supplemental Table S11) were indirectly validated. Gene ontology of these mRNAs demonstrated that most of the genes are also related to surface receptors involved in signal transduction and cell adhesion, but, additionally, with response to hormone stimulus and transmembrane transport (Fig. 4A). Enriched pathways (Fig. 4B) were constituted of calcium signaling pathway (Fig. 4C), cardiomyopathies and ECM-receptor interaction. KEGG enrichment pathway analysis of the validated DE microRNAs demonstrated participation of the MAPK and WNT pathways, as well as regulation of actin cytoskeleton and focal adhesion (Fig. 4D). Several pro-oncogenic genes were found to be upregulated in our analysis and were described as predicted targets of the validated DE microRNAs, as the example of MET and TGFBR1 genes (Supplemental Table S5).

Discussion
Long noncoding RNAs are arising as key participants in cancer establishment and progression by several oncogenic mechanisms 30,32 . On the other hand, it is of urge interest the determination of how these lncRNAs are activated and how they can be associated with specific events or genotypes, such as point mutations. BRAF V600E is the driver oncogenic mechanism with the greatest incidence in PTC 7 and, therefore, any event correlated with this mutation will be necessary to understand BRAF V600E -induced aggressiveness. This is the first study to identify DE lncRNAs correlated with BRAF V600E in PTC and, besides that, we extended the cohort of paired normal and PTC samples to more accurately determine DE lncRNAs between these conditions. We have identified 455 DE lncRNAs between paired normal and PTC samples. A total of 76 (log2 fold change >1 or < −1; adj. p-value < 1 × 10 −7 ) lncRNA were previously reported as DE in thyroid cancer compared to adjacent normal thyroid 36 (Fig. 5A and Supplemental Table S12). This validation set, together with the lncRNAs confirmed by experimental approaches (Fig. 2 and Supplemental Fig. S1), confers consistency to our analysis. Additionally, a diversity of DE lncRNAs identified in our analysis were reported in individual studies as altered in PTC samples, such as ENSG00000259104.2 38 ENSG00000259104.2 (PTCSC3), which is downregulated in the tumor samples (log2 fold change −1.40; adj. p-value 1.11E-12) was previously reported as having thyroid-specific expression and decreased expression in PTC 38,39 . Interestingly, the risk allele [T] associated with SNP rs944289, located at PTCSC3's promoter, affects the binding site of C/EBPα and C/EBPβ (PTCSC3 activators), reducing its expression. Restoration of PTCSC3 expression in PTC cells inhibited cell growth and affected the expression of genes involved in DNA replication/repair, cellular movement and cell death 38 . Also PTCSC3 ectopic expression reduces cell proliferation and increases cell cycle arrest and apoptosis 39 , while reducing cell motility and invasiveness through S100A4 downregulation 42 (Fig. 5B).
ENSG00000236130 (PTCSC2), was also reported as having decreased expression in PTC 40 , which was confirmed in our analysis (N × T log2 fold change −1.03; adj. p-value 3.12E-09). The risk allele [A] of rs965513 was significantly associated with low expression of unspliced PTCSC2 in unaffected thyroid tissue, however this correlation was not extended to PTC samples 40 .    ENST00000426615 (ENSG00000226363) is another lncRNA that we identified as upregulated in PTC (NxT log2 fold change 3.87; adj. p-value 2.36E-05), which was experimentally demonstrated to be overexpressed in this cancer, inducing cell proliferation and motility 37 (Fig. 5B).
Our analysis also confirmed the differential expression (N × T: log2 fold change −2.42; adj. p-value 3.96E-11) of the previously reported lncRNA ENSG00000271086 (NAMA), which is downregulated in PTC compared to normal tissues 39,41 and in BRAF V600E tumors compared to wild type tumors 39 (Fig. 5B). NAMA is induced by inhibition of the MAPK pathway, growth arrest and DNA damage 41 and our analysis also demonstrated that NAMA is downregulated in BRAF V600E patients (WT × BRAF V600E log2 fold change −1.66; adj. p-value 2.02E-15). All these independently validated lncRNAs demonstrate the reliability of our study (Fig. 5).
Similarity matrix based on Spearman correlation identified clusters of DE lncRNAs between Normal × Tumor (Supplemental Fig. S3A) and WT × BRAF V600E (Supplemental Fig. S3B) with similar expression patterns, which can provide evidence for further studies to determine common upstream regulators.
Indirectly validated targets of the DE lncRNAs between Normal × Tumor are involved in a diversity of biological processes (Fig. 3A). For instance, it was noticed an overrepresentation of adhesion molecules, such as downregulation of CDH16, which was already reported as a potential marker for PTC 43 . Along with CDH16, many other cadherins were identified as validated targets of DE lncRNAs, such as CDH2, CDH3, CDH4, CDH6, CDH11 and CDH24 (Supplemental Table S8). Another highly enriched biological process was the regulation of programmed cell death (Fig. 3A), represented by the upregulation of the antiapoptotic SOX4 44 and TP63 45 in PTC samples. Enriched pathways (Fig. 3B) as cytokine-cytokine receptor interaction, focal adhesion and MAPK pathways were already reported in the first study of DE lncRNA with paired Normal × PTC samples 36 , providing further support for future research. It was observed an enrichment of MAPK-related genes, represented in our results by upregulation, in the tumor samples, of TGB1, TGFB2 and TGFBR1 that were shown to activate the MAPK pathway 46 . Interestingly, pathway enrichment analysis of indirectly validated microRNAs (Fig. 3D) demonstrated a convergent tendency to genes involved in cancer, MAPK pathway and focal adhesion, which were also observed with the validated mRNAs.
Indirectly validated targets of the DE lncRNAs between WT × BRAF V600E tumors were demonstrated to be involved with cell surface receptors responsible for signal transduction and with cell adhesion (Fig. 4A). Pathway enrichment analysis, identified genes involved in calcium signaling and ECM-receptor interaction, which were already reported as an early transcriptome change in BRAF V600E -associated mouse model 24 . Interestingly, we also observed several genes correlated with cardiomyopathies that are mostly related to calcium regulation in cardiac muscle cells (Fig. 4B). Calcium (Fig. 4C) and MAPK cascade (represented by BRAF V600E group) are tightly involved, where calcium modulates the protein interaction properties of ERKs, affecting the subcellular localization and influencing the distribution of their targets 47 . Calcium can also stimulates MEK through Ras activation 48 . Therefore, these results can support a future investigation to answer if BRAF V600E -stimulated MAPK activation can be reinforced by calcium modulation induced by the DE lncRNAs. Additionally, MAPK stimulation may be supported by the indirectly validated microRNAs, since pathway analysis demonstrated an enrichment of microRNAs' targets in cancer and MAPK pathways (Fig. 4D). Interestingly, predicted targets of the DE microR-NAs were also differentially expressed in our analysis, such as the upregulation of TGFBR1 and downregulation of PRKACB (Supplemental Table S5).
Concluding, our extended cohort of paired Normal and PTC patients identified new DE lncRNAs and confirmed many other lncRNAs already reported. Additionally, to our knowledge, this is the first study to identify BRAF V600E -correlated lncRNAs in PTC, which will provide support for future studies aiming to identify BRAF V600E -linked events in attempt to optimize therapeutic treatment and diagnosis/prognosis of this aggressive PTC genotype.

Methods
Data analysis. Thyroid Carcinoma (THCA) clinical information, mRNA and microRNA data expression data were downloaded from The Cancer Genome Atlas (TCGA) online platform (https://tcga-data.nci.nih.gov/ tcga/), as January 2016. Mutations data were retrieved through cBioPortal 49 . LncRNA RPKM expression levels corresponding to TCGA patients were downloaded through TANRIC 50 , which obtained the genomic coordinates of 13,870 human lncRNAs from the GENCODE Resource (version 19) 51 and further filtered out those lncRNA exons that overlapped with any known coding genes based on the gene annotations of GENCODE and RefGene, resulting in 12,727 lncRNAs 50 .
BRAF V600E patients were selected to form the BRAF V600E group (n = 226), which excluded any other type of BRAF mutations. Wild Type group (n = 242) was formed by patients without any somatic mutation in BRAF gene, but patients with mutations in HRAS, NRAS, KRAS, EIF1AX, PPM1D, RET and NTRK1 were considered. It were selected only the patients with papillary thyroid cancer diagnosis. Differential Expression Analysis. For differential expression analysis of mRNA and microRNA was used edgeR package 52 through TCGAbiolinks 53 . To identify differentially expressed lncRNAs between groups, it was used the paired/unpaired Student t test to assess the statistical difference of mean expression values between the two groups 50 . LncRNA with median value equal zero were excluded and fold change was calculated using median expression values. In all differential analysis, p-values were adjusted for False Discovery Rate (FDR) < 0.05 as multiple hypothesis test correction method.
To select the above DE lncRNAs for validation, it was taken in consideration one of the major characteristics of lncRNA, that is, high heterogeneous expression across the same tumor and even the same cell line. Long noncoding RNA's expression is tightly regulated by a wild range of cellular responses and, due to the markedly lower transcriptional levels of lncRNAs, the expression variability inside the same group of patients is expected. Taking this aspects in consideration, from the top 25 positively DE lncRNAs and from the top 20 negatively DE lncRNAs, it were selected for validation those lncRNAs with low adjusted p-values to avoid this variability, especially in the comparison BRAF WT × BRAF V600E tumor, which is the focus of this research. As another desirable characteristic, most lncRNAs selected, presented at least in one group (normal thyroid, PTC, BRAF WT or BRAF V600E ) a median expression greater than 1 RPKM (reads per kilo base per million mapped reads). Added to that, it was given preference for those lncRNAs without isoforms (seen that many lncRNAs have annotation errors) and that present at least 2 exons, which are more stable and would allow the PCR primers to be located in different exons.
Prediction of lncRNAs targets. To identify the possible target genes of the selected (Fig. 1D and E) differentially expressed lncRNAs via cis-or trans-regulatory effects, two previously described approaches were used 36,54 . The genes transcribed within a 10 kb window upstream or downstream of lncRNAs were considered as cis-target genes 36,54 . The second method was used to identify trans-targets and is based on mRNA and microRNA sequence complementarity with the query lncRNA. For mRNA interactions we used a pre-computed database that catalogs the predicted lncRNA-RNA interactions 55 , where the accessible regions of the query lncRNA and possible targets (mRNA/lncRNA) are extracted, the binding energies of pairs of sequences (target and query) around the seed matches are evaluated and the minimum interaction energy of the joint secondary structures is calculated 55 . The 500 predicted targets (mostly constituting repeated targets with different interaction sites) with the lowest minimum free energies under −20 kcal/mol were taken in consideration for downstream analysis. For lncRNA and microRNA interaction prediction it was used rna22, a method for identifying microRNA-binding sites and their corresponding heteroduplexes 56 . It were selected those microRNAs with a folding energy lower than −20 kcal/mol. Indirect Validation. As interaction prediction methods are susceptible to error and to minimize this, we compared the predicted targets of the differentially expressed lncRNAs with the differentially expressed mRNAs and microRNAs 36 calculated with the TCGA patients, because we consider that the targets of DE lncRNAs would possibly be DE in TCGA analysis. With this approach, we intended to enrich our analysis for targets with a greater propensity to be occurring biologically.
Gene ontology, pathway enrichment and protein-protein interaction network. Indirect validated targets of the DE lncRNAs were loaded into the Database for Annotation, Visualization and Integrated Discovery (DAVID) 57 , which returned the gene ontology of the query genes and identified enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways 58 . miRSystem 59 was used to calculate enriched pathways based on the predicted targets of the query microRNAs, which in this case, were the DE microRNAs in both conditions (Normal × Tumor and WT × BRAF V600E ). For protein-protein interaction network it was used Genemania 60 .