Long noncoding RNA landscapes specific to benign and malignant thyroid neoplasms of distinct histological subtypes

The main types of thyroid neoplasms, follicular adenoma (FA), follicular thyroid carcinoma (FTC), classical and follicular variants of papillary carcinoma (clPTC and fvPTC), and anaplastic thyroid carcinoma (ATC), differ in prognosis, progression rate and metastatic behaviour. Specific patterns of lncRNAs involved in the development of clinical and morphological features can be presumed. LncRNA landscapes within distinct benign and malignant histological variants of thyroid neoplasms were not investigated. The aim of the study was to discover long noncoding RNA landscapes common and specific to major benign and malignant histological subtypes of thyroid neoplasms. LncRNA expression in FA, FTC, fvPTC, clPTC and ATC was analysed with comprehensive microarray and RNA-Seq datasets. Putative biological functions were evaluated via enrichment analysis of coexpressed coding genes. In the results, lncRNAs common and specific to FTC, clPTC, fvPTC, and ATC were identified. The discovered lncRNAs are putatively involved in L1CAM interactions, namely, pre-mRNA processing (lncRNAs specific to FTC); PCP/CE and WNT pathways (lncRNAs specific to fvPTC); extracellular matrix organization (lncRNAs specific to clPTC); and the cell cycle (lncRNAs specific to ATC). Known oncogenic and suppressor lncRNAs (RMST, CRNDE, SLC26A4-AS1, NR2F1-AS1, and LINC00511) were aberrantly expressed in thyroid carcinomas. These findings enhance the understanding of lncRNAs in the development of subtype-specific features in thyroid cancer.

www.nature.com/scientificreports/ (originates from the promoter region of a protein-coding gene with transcription proceeding in the opposite direction on the other strand); and 3-prime overlapping (overlap the 3′UTR of a protein-coding locus on the same strand). Today, the number of annotated lncRNA genes has reached 14 720 according to Ensembl version 93 9 .
In thyroid cancer, several lncRNAs have been shown to have pathogenic and predictive roles, including BANCR, FALEC, CNALPTC1, PVT1, NAMA, PTCSC1, PTCSC2, PTCSC3, and TNRC6C-AS1 [10][11][12][13][14][15][16][17][18][19][20][21] . However, all of the studies to date have considered only PTC, and mostly none of the previous works takes into account the difference between clPTC and fvPTC. There are no published studies describing the landscapes of lncRNAs in ATC, FTC and FA. Nevertheless, lncRNAs differentially expressed in ATC may reflect anaplastic features and be strong prognostic factors. As the morphology and behaviour of FTC differ from those of PTC, it is proposed that the landscape of lncRNAs in FTC may be different from that of PTC. Investigation of lncRNAs common and specific to FA and FTC is important in understanding their relations and revealing differential diagnostic markers.
This study aimed to identify lncRNAs specific and common to the main types of thyroid neoplasms (FA, FTC, fvPTC, clPTC and ATC). The expression data from microarray technology (8 datasets) and RNA-Seq technology (the PRJEB11591 dataset and TCGA transcriptome data) were analysed.

Results
LncRNAs differentially expressed in thyroid neoplasms. LncRNA expression was evaluated in the main histological subtypes of thyroid neoplasms, FA, FTC, fvPTC, clPTC, and ATC, compared to those in NT. The expression of 3910 lncRNA genes in the microarray dataset, 2587 in the RNA-Seq PRJEB11591 dataset and 3009 in the RNA-Seq TCGA dataset was analysed. The number of genes analysed corresponded to the total number of lncRNAs covered by uniquely mapped probes in the Affymetrix Human Genome U133 Plus 2.0 Array and the number of lncRNAs yielded after filtration by low number of counts for RNA-Seq datasets.
The numbers of lncRNAs found to be differentially expressed in each subtype compared to NT are presented in Table 1. The complete lists of the identified differentially expressed lncRNAs are shown in Supplementary files 1-3. Volcano plots representing the distribution of fold change and adjusted p values in the studied histological subtypes are shown in Supplementary file 4.
Hierarchical clustering of differentially expressed lncRNAs in the microarray and PRJEB11591 datasets is presented in Fig. 1. There was strong clustering of ATC, clustering of clPTC and weak clustering of fvPTC lncR-NAs. No clustering of lncRNAs within the FTC or FA groups was observed (Fig. 1).
Via cross-dataset confirmation, 116 genes in clPTC were validated (45 genes found in all analysed datasets, 71 genes without probes in the microarray were found in both RNA-Seq datasets; Fig. 2A), and 62 genes in fvPTC were validated (Fig. 2B). These genes can be considered to have robustly differentially expressed lncRNAs. There are no datasets available for performing cross-dataset validation of FA, FTC or ATC genes.
LncRNAs common and specific to each histological subtype were detected via intersection of the genes expressed differentially in each subtype compared to NT, and subsequent selection of lncRNAs validated in clPTC and fvPTC, and significantly differentially expressed in comparison between subtypes of neoplasms (Figs. 3, 4).

LncRNAs common to FA and WDTC.
Of the 35 lncRNAs found to be differentially expressed in FA and WDTC (FTC, clPTC, and fvPTC) compared to NT, 13 genes were cross validated in clPTC and fvPTC (Figs. 3, 4, Table 2). The expression of LINC02555 and LINC02471 increased during the progression from adenoma to carcinomas and was significantly higher in fvPTC and clPTC than in FA. The expression of ENSG00000256542 and ENSG00000258117 decreased during the transition from FA to carcinomas and was significantly lower in fvPTC and clPTC than in FA or FTC.
LncRNAs common to WDTC. There were 32 lncRNAs differentially expressed in all the studied histological subtypes of WDTC (FTC, clPTC, and fvPTC) but not in FA (Fig. 3). Of these lncRNAs, 6 lncRNAs were validated to be significantly differentially expressed in clPTC and fvPTC compared to FA (Fig. 4, Table 3). None of the 32 lncRNAs were differentially expressed in FTC compared to FA.
LncRNAs common to papillary carcinomas. There were 22 genes differentially expressed in both clPTC and fvPTC, but not in FA or FTC (Fig. 3), validated and significantly differentially expressed compared to FA and FTC (Fig. 4, Table 4)-lnRNAs are putatively associated with papillary features in thyroid carcinomas.  Table 5). However, none of these lncRNAs was differentially expressed compared to those in FA. Of the 29 genes differentially expressed in fvPTC but not in other differentiated carcinomas or FA (Figs. 3, 4), only the ENSG00000257647 gene was specific to fvPTC-validated and significantly differentially expressed in fvPTC compared to FA, FTC and clPTC.
The 32 genes were found to be differentially expressed in clPTC but not in other differentiated carcinomas or FA, validated, and significantly differentially expressed compared to those in fvPTC, FTC and FA-lncRNAs specific to clPTC (Figs. 3, 4, Table 6).
LncRNA specific to ATC . ATC samples were available only in the microarray dataset, which also included two variants of PTC. Of the 376 lncRNAs differentially expressed in ATC compared to NT, 252 were not differentially expressed in the other investigated histological subtypes, and 185 were significantly differentially expressed compared to those in clPTC and fvPTC-lncRNAs specific to ATC. The 30 most differentially expressed genes are presented in Table 7, and the full list is shown in Supplementary file 5.
Potential biological functions of aberrantly expressed lncRNAs. The coexpressed genes for each lncRNA from the top 5 most differentially expressed list for discussed groups were identified. The number of coexpressed genes was 138.5 (46.25 − 256.5) −median (Q1 − Q3).
Analysis of the enrichment of GO biological processes and GO molecular functions, KEGG, and Reactome terms with coexpressed coding genes allowed us to identify putative pathways involving common and specific www.nature.com/scientificreports/ lncRNAs (Fig. 5). The main functions of the lncRNAs common to FA and WDTC-cancers (Colorectal, Nonsmall cell lung, Thyroid) and p53 signalling; functions of the lncRNAs common to WDTC-Pathways in cancer and L1CAM interactions; functions of the lncRNAs common to papillary carcinomas-aldehyde dehydrogenase (NAD) activity; functions of the lncRNAs specific to FTC-processing of capped intron-containing pre-mRNA; functions of the lncRNAs specific to fvPTC-PCP/CE pathway and Beta-catenin independent WNT signalling; functions of the lncRNAs specific to clPTC-extracellular matrix organization and endoderm formation; and functions of the lncRNAs specific to ATC-cell cycle and mitotic processes.

Discussion
Histological subtypes of follicular cell-derived thyroid carcinomas (FTC, PTC, and ATC) significantly differ in their mutational landscapes and clinical characteristics. Although FTC and clPTC are both WDTCs, FTC is characterized by a follicular growth pattern and tends more often to spread as metastases to distant organs, while clPTC typically has papillary architecture and spreads more often to lymph nodes in the neck. In FTC, K/H/ NRAS and PAX8/PPARG mutations are prevalent, whereas BRAF mutations and tyrosine kinase fusions prevail in clPTC 1 . The clinical characteristics of fvPTC are intermediate; fvPTC is composed of neoplastic follicles not papillae, but with follicular cells showing nuclear features typical of PTC 22 . The mutational profile of fvPTC is most similar to that of FTC: both exhibit a prevalence of K/H/NRAS and PAX8/PPARG mutations. In a previous TCGA study, fvPTC was characterized as a Ras-like tumour, and its classification as a papillary carcinoma was questioned 23 . Recently, reclassification of encapsulated fvPTC as a "noninvasive follicular thyroid neoplasm with papillary-like nuclear features" (NIFTP) was proposed 2 . ATC is an advanced stage thyroid neoplasm and is the Figure 2. In silico validation of differentially expressed lncRNAs in clPTC (A) and fvPTC (B). In clPTC, 116 genes were considered to be validated (differentially expressed in all datasets or differentially expressed in both RNA-Seq datasets but absent in microarray probes). In fvPTC, 62 genes were considered to be validated. www.nature.com/scientificreports/ most aggressive thyroid cancer. It is expected that there are specific molecular features, including lncRNA patterns, associated with the clinical and histological features of WDTC and the aggressive behaviour of ATC. FA is thought to be a benign counterpart of FTC, and understanding the common and different molecular features of these neoplasms is important for the development of diagnostic and therapeutic strategies. In this study, the expression of lncRNAs was evaluated in the main histological subtypes of thyroid neoplasms: FA, FTC, fvPTC, clPTC and ATC. Datasets analysed in the study (a microarray dataset of 8 independent experiments; RNA-Seq PRJEB11591; and RNA-Seq TCGA) allowed us to perform robust cross-dataset validation of the results for clPTC and fvPTC and to include representative sets of FA, FTC and ATC samples. LncRNA landscapes in FA, FTC and ATC were analysed for the first time. The highest number of genes aberrantly expressed compared to normal thyroid tissue were found in ATC (330 lncRNAs), followed by clPTC, FTC and fvPTC, which reflects   www.nature.com/scientificreports/ the more advanced stage of ATC. Since the data for ATC, FA and FTC wer limited with one dataset only, the results for these subtypes are preliminary. Intersection of the differentially expressed lncRNAs and subsequent comparison of the expression between subtypes of neoplasms led to the discovery of lncRNAs common to FA and WDTC (13 genes), common to WDTC (6 genes), common to classical and follicular variants of PTC (22 genes), and specific to FTC (19 genes), fvPTC (1 gene), clPTC (32 genes), and ATC (185 genes). The discovered lncRNAs were proposed to be involved in the development of clinical and morphological features of the studied subtypes. Putative biological processes involving common and specific lncRNAs were identified.
LncRNAs common to all studied thyroid neoplasms (including FA) and common to WDTC appear to be involved in carcinogenesis in different locations. L1CAM interactions found in this study to involve lncRNAs common to WDTC have been previously associated with well-described roles in tumour progression, metastases and the epithelial-to-mesenchymal transition 24,25 .
LncRNAs common to follicular and classical variants of papillary carcinoma (associated with papillary histology) are involved in aldehyde dehydrogenase (NAD) activity. Aldehyde dehydrogenase is known to maintain cancer stem cell properties in various cancers, including the thyroid 26 .
Biological processes involving lncRNAs specific to FTC include processes that are associated with splicing (Processing of Capped Intron-Containing Pre-mRNA, mRNA Splicing, and RNA processing). Accumulating evidence suggests that aberrant RNA splicing is a common and driving event in cancer development and progression. For instance, oncogenic Ras signalling via the ERK and PI3-K/Akt pathways regulates the phosphorylation of splicing factors such as SRSF1, SRSF7, and SPF45 and drives the switching of active and inactive states of tumour promoters and suppressors (MST1R, FAS, CD44, LBR, Casp-9, KLF6, and others) via alternative splicing 27,28 . None of the lncRNAs specific to FTC were differentially expressed compared to those in FA. The absence of lncRNAs differentially expressed in FTC and FA corresponds to the commonality of these subtypes and frequent difficulty in cytology-based differential diagnostics.
Only lncRNA ENSG00000257647 is specific to fvPTC, which might be explained by its intermediate morphology with features of both papillary and follicular carcinomas, leading to its debatable classification. LncRNA ENSG00000257647 appeared to be involved in WNT signalling, predominantly through the   www.nature.com/scientificreports/ Beta-catenin-independent WNT pathway (especially, planar cell polarity that modulates cytoskeleton rearrangements through the activation of the small GTPases RhoA and Rac and their downstream effectors Rock and JNK). WNT signalling is known to play a crucial role in thyroid carcinogenesis, and several mechanisms of its deregulation have been described, including inhibition of the β-catenin degradation complex via its phosphorylation by RET/PTC, inhibition of E-cadherin expression through the MAPK/ERK pathway activated by BRAF mutations, and activation of both canonical and non-canonical Wnt pathways by RAS mutations 29,30 . LncRNAs specific to clPTC are involved in extracellular matrix organization and endoderm and collagen formation. Extracellular matrix (ECM) disorganization is known to play a pivotal role in cancer initiation and progression. The major driving mutation in clPTC is BRAF p.V600E, and there is emerging evidence of ECM remodelling induced by BRAF p.V600E in PTCs 31 . Notably, it has been previously shown that the extracellular matrix of PTCs driven by BRAF p. V600E (but not mutant HRAS) is enriched with stromal-derived fibrillar collagen and facilitates cancer progression 32 .
lncRNAs specific to ATC are probably associated with its anaplastic features and aggressive behaviour. For these lncRNAs, there is a strong enrichment of cell cycle and mitotic pathways which possibly reflects the involvement of these lncRNAs in the loss of differentiation and high proliferation rate characteristic of ATC.
Of lncRNAs previously described in thyroid cancer, we found that PTCSC3 was downregulated in all investigated neoplasms, including FA; TNRC6C-AS1 was upregulated in papillary carcinomas; and PVT1 was specifically upregulated in ATC 11,17,18 . Other lncRNAs previously described in thyroid malignancy (BANCR, NAMA, CNALPTC1, FALEC, and PTCSC2) were not identified in our study 10,12,13,16,20,21 . A possible explanation is the strong association of these lncRNAs with specific mutations and the heterogeneity of driving mutations within the same subtype; for example, the aberrant expression of BANCR is driven by BRAF mutation.
The aberrant expression of lncRNAs with Ensembl annotation found by Liyanarachchi et al. (2016) in PTC was confirmed in our study 14 . Most of these lncRNAs were common to thyroid neoplasms (including FA) or common to classical and follicular variants of PTC. No lncRNAs found in our study to be subtype specific were discovered by Liyanarachchi. www.nature.com/scientificreports/ In the present study, we identified some lncRNAs with known roles in tumorigenesis but not previously described in thyroid cancer [33][34][35][36][37] . The identified upregulated promoters of cancer progression included NR2F1-AS1 and LINC00511 in clPTC and CRNDE in ATC; downregulated tumour suppressors SLC26A4-AS1 in clPTC and RMST in ATC.

Conclusion
LncRNAs common to FA and WDTC, common to WDTC, common to carcinomas with papillary features, and specific to clPTC, fvPTC, FTC and ATC were discovered in the analysis performed with the most comprehensive datasets (combination of a microarray dataset and two RNA-Seq datasets). The similarity of the lncRNA landscapes in FTC and FA was revealed. The results showed that LncRNAs common to FA and WDTC and common to WDTC are involved in pathways in cancer at various sites, p53 signalling and L1CAM interactions; lncRNAs common to papillary carcinomas are involved in aldehyde dehydrogenase (NAD) activity; lncRNAs specific to FTC are involved in mRNA processing; a lncRNA specific to fvPTC is involved in planar cell polarity and WNT signalling; lncRNAs specific to clPTC are involved in extracellular matrix organization and endoderm formation; and lncRNAs specific to ATC are involved in the cell cycle and mitotic processes; and LncRNAs found to be specific to ATC, including CRNDE and RMST, are likely associated with cancer aggressiveness and cancer progression.

Materials and methods
Microarray datasets. The microarray datasets obtained from Affymetrix Human Genome U133 Plus 2.0 Array (Platform GPL570) were originally selected from the GEO database. The following datasets were included: GSE3467, GSE60542, GSE35570, GSE76039, GSE53157, GSE33630, GSE65144, and GSE29265. A total of 107 samples of normal tissue (NT) and 32 fvPTC, 48 clPTC, and 49 ATC samples were analysed. CEL files were downloaded, and normalization was performed using the gcrma R package. Microarray probes were annotated with Ensembl version 93 using the biomaRt package 38 . www.nature.com/scientificreports/ sample group) were eliminated, and TMM normalization (edgeR package) and the voom method using the limma R package were applied. In the TCGA transcriptome data, 58 NT, 356 clPTC and 101 fvPTC were selected. Samples of metastases and other minor histological subtypes were excluded. Raw counts (HTSeq-Counts Workflow Type, briefly, STAR 2-pass alignment followed by gene expression count assessment with HTSeq) were downloaded from Genomic Data Commons Data Portal (GDC, https:// portal. gdc. cancer. gov/). Genes with low counts (less than 1 count in number of samples exceeding the size of smallest sample group) were eliminated, followed by TMM normalization (edgeR package) and voom analysis with limma 42 .
Selection of lncRNA genes. Protein-coding genes and genes attributed to Havana biotypes not related to lncRNAs were eliminated. Genes of the following Havana biotypes were included in the analysis: lincRNA, antisense, 3-prime overlapping ncRNA, bidirectional promoter lncRNA, misc RNA, processed transcript, sense intronic, and sense overlapping.

Statistical analysis.
To identify differentially expressed lncRNAs, linear modelling using the limma package was performed 43 . Genes with FDR adjusted p value ≤ 0.01 and fold change (FC) ≥ 2.0 were considered to be differentially expressed. A heat map analysis of differentially expressed genes was performed using coolmap limma.
Validation. For clPTC and fvPTC, the sets of genes found to be significantly differentially expressed in a previous step in the microarray, RNA-Seq PRJEB11591, and RNA-Seq TCGA datasets were processed with intersection. Genes found in all three datasets and genes found in both RNA-Seq datasets but not in microarray probes were considered validated.
Selection of lncRNAs common and specific to histological subtypes. LncRNAs common and specific to FA and WDTC were selected via the intersection analysis of genes found to be significantly differentially expressed in each subtype compared to NT in the RNA-Seq dataset PRJEB11591 and subsequent application following criteria: -Common to FA and WDTC-confirmed through validation for clPTC and fvPTC; -Common to WDTC-confirmed through validation for clPTC and fvPTC, and significantly differentially expressed compared to FA; -Common to papillary carcinomas-confirmed through validation for clPTC and fvPTC, and significantly differentially expressed compared to FA and FTC; -Specific to clPTC, fvPTC, FTC-confirmed through validation for clPTC and fvPTC (not applied to FTC), and significantly differentially expressed compared to each studied subtype; LncRNAs specific to ATC were selected from intersection with genes found in FA and WDTC with subsequent filtration of genes significantly differentially expressed compared to those in clPTC and fvPTC.
Evaluation of potential biological functions. To identify genes positively and negatively coexpressed with 5 most differentially expressed lncRNAs, pairwise Pearson correlation between the lncRNAs and all the genes was calculated using the RNA-Seq PRJEB11591 dataset (for FA, FTC, fvPTC and clPTC) and the microarray dataset (for ATC). Genes with an absolute r ≥ 0.7 and a significant correlation (p value < 0.05) were considered to be coexpressed. For coexpressed genes, enrichment of Gene Ontology (GO) Biological Process (2018), GO Molecular Function (2018), Kyoto Encyclopedia of Genes and Genomes (KEGG, 2019) and Reactome (2016) terms was estimated using Enrichr 44,45 . Terms with adjusted p values in Fisher's exact test ≤ 0.05 were considered significantly enriched [42][43][44][45][46] .