Introduction

The major types of thyroid cancer are papillary carcinoma (PTC) and follicular carcinoma (FTC), accounting for approximately 70–80% and 10–15% of all thyroid cancers, respectively. Both PTC and FTC are derived from follicular cells and are well-differentiated thyroid cancers (WDTCs), with distinct mutational landscapes, clinical behaviours, typical sites of metastasis and prognostic clinical markers1. Within PTC, several variants can be distinguished, and classical (clPTC) and follicular variants (fvPTC) are the most frequently identified. fvPTC has intermediate behaviour; it is composed of neoplastic follicles rather than papillae, but with follicular cells showing nuclear features characteristic of PTC. The benign counterpart of FTC is follicular adenoma (FA), and it is often challenging to differentiate them through cytology. FA, FTC and fvPTC compose follicular-pattern thyroid tumours, sharing common mutational prevalence and clinical features1,2. Anaplastic thyroid carcinoma (ATC) is the most advanced and aggressive thyroid cancer and the least likely to respond to treatment3,4. Based on the differences in mutational landscapes, morphology and clinical behaviour of histological subtypes, specific molecular patterns, including patterns of long noncoding RNAs (lncRNAs), are expected to be associated with these features.

Evidence of the important roles of lncRNAs in tumour suppression, cancer progression, invasion and metastatic potential and their prognostic and therapeutic value is increasing5. LncRNAs are RNA molecules of more than 200 nucleotides that typically do not have a functional open reading frame (however, bifunctional RNAs have been discovered that function as both protein-coding and noncoding RNAs). Many lncRNA genes have two or more exons and display 5′-capping, polyadenylation and alternative splicing. The functions of lncRNAs are realized in different ways: recruiting transcription factors, chromatin organizers, or chromatin modifiers, forming DNA–RNA triplex anchoring effector proteins to the gene promoter, acting as decoys for miRNAs and proteins, or interfering with protein posttranslational modification5,6,7,8. Relative to the coding genes, lncRNAs can be classified into intergenic (lincRNA); antisense (on the opposite strand of a protein-coding locus); sense intronic or overlapping (on the same strand, with transcript in introns of a coding gene, or containing a coding gene in its intron); retained intron (an alternatively spliced transcript containing an intronic sequence); bidirectional (originates from the promoter region of a protein-coding gene with transcription proceeding in the opposite direction on the other strand); and 3-prime overlapping (overlap the 3′UTR of a protein-coding locus on the same strand). Today, the number of annotated lncRNA genes has reached 14 720 according to Ensembl version 939.

In thyroid cancer, several lncRNAs have been shown to have pathogenic and predictive roles, including BANCR, FALEC, CNALPTC1, PVT1, NAMA, PTCSC1, PTCSC2, PTCSC3, and TNRC6C-AS110,11,12,13,14,15,16,17,18,19,20,21. However, all of the studies to date have considered only PTC, and mostly none of the previous works takes into account the difference between clPTC and fvPTC. There are no published studies describing the landscapes of lncRNAs in ATC, FTC and FA. Nevertheless, lncRNAs differentially expressed in ATC may reflect anaplastic features and be strong prognostic factors. As the morphology and behaviour of FTC differ from those of PTC, it is proposed that the landscape of lncRNAs in FTC may be different from that of PTC. Investigation of lncRNAs common and specific to FA and FTC is important in understanding their relations and revealing differential diagnostic markers.

This study aimed to identify lncRNAs specific and common to the main types of thyroid neoplasms (FA, FTC, fvPTC, clPTC and ATC). The expression data from microarray technology (8 datasets) and RNA-Seq technology (the PRJEB11591 dataset and TCGA transcriptome data) were analysed.

Results

LncRNAs differentially expressed in thyroid neoplasms

LncRNA expression was evaluated in the main histological subtypes of thyroid neoplasms, FA, FTC, fvPTC, clPTC, and ATC, compared to those in NT. The expression of 3910 lncRNA genes in the microarray dataset, 2587 in the RNA-Seq PRJEB11591 dataset and 3009 in the RNA-Seq TCGA dataset was analysed. The number of genes analysed corresponded to the total number of lncRNAs covered by uniquely mapped probes in the Affymetrix Human Genome U133 Plus 2.0 Array and the number of lncRNAs yielded after filtration by low number of counts for RNA-Seq datasets.

The numbers of lncRNAs found to be differentially expressed in each subtype compared to NT are presented in Table 1. The complete lists of the identified differentially expressed lncRNAs are shown in Supplementary files 13. Volcano plots representing the distribution of fold change and adjusted p values in the studied histological subtypes are shown in Supplementary file 4.

Table 1 Numbers of lncRNAs differentially expressed in thyroid nodules compared to normal thyroid tissue.

Hierarchical clustering of differentially expressed lncRNAs in the microarray and PRJEB11591 datasets is presented in Fig. 1. There was strong clustering of ATC, clustering of clPTC and weak clustering of fvPTC lncRNAs. No clustering of lncRNAs within the FTC or FA groups was observed (Fig. 1).

Figure 1
figure 1

Clustering of FA, FTC, fvPTC, clPTC and ATC by the expression of lncRNA. (A) Microarray dataset; (B) RNA-Seq PRJEB11591 dataset. Genes differentially expressed in each histological subtype are included.

Via cross-dataset confirmation, 116 genes in clPTC were validated (45 genes found in all analysed datasets, 71 genes without probes in the microarray were found in both RNA-Seq datasets; Fig. 2A), and 62 genes in fvPTC were validated (Fig. 2B). These genes can be considered to have robustly differentially expressed lncRNAs. There are no datasets available for performing cross-dataset validation of FA, FTC or ATC genes.

Figure 2
figure 2

In silico validation of differentially expressed lncRNAs in clPTC (A) and fvPTC (B). In clPTC, 116 genes were considered to be validated (differentially expressed in all datasets or differentially expressed in both RNA-Seq datasets but absent in microarray probes). In fvPTC, 62 genes were considered to be validated.

LncRNAs common and specific to each histological subtype were detected via intersection of the genes expressed differentially in each subtype compared to NT, and subsequent selection of lncRNAs validated in clPTC and fvPTC, and significantly differentially expressed in comparison between subtypes of neoplasms (Figs. 3, 4).

Figure 3
figure 3

Intersection of lncRNA landscapes in thyroid neoplasms. (A) lncRNA landscapes in FA, FTC, fvPTC, and clPTC (based on the RNA-Seq PRJEB11591 dataset). (B) LncRNA landscapes in differentiated neoplasms (total list of lncRNAs differentially expressed in FA, FTC, fvPTC, and clPTC) and ATC.

Figure 4
figure 4

Numbers of common and specific lncRNAs found after intersection analysis and filtration.

LncRNAs common to FA and WDTC

Of the 35 lncRNAs found to be differentially expressed in FA and WDTC (FTC, clPTC, and fvPTC) compared to NT, 13 genes were cross validated in clPTC and fvPTC (Figs. 3, 4, Table 2). The expression of LINC02555 and LINC02471 increased during the progression from adenoma to carcinomas and was significantly higher in fvPTC and clPTC than in FA. The expression of ENSG00000256542 and ENSG00000258117 decreased during the transition from FA to carcinomas and was significantly lower in fvPTC and clPTC than in FA or FTC.

Table 2 LncRNAs common for FA and WDTC.

LncRNAs common to WDTC

There were 32 lncRNAs differentially expressed in all the studied histological subtypes of WDTC (FTC, clPTC, and fvPTC) but not in FA (Fig. 3). Of these lncRNAs, 6 lncRNAs were validated to be significantly differentially expressed in clPTC and fvPTC compared to FA (Fig. 4, Table 3). None of the 32 lncRNAs were differentially expressed in FTC compared to FA.

Table 3 LncRNAs common for WDTC.

LncRNAs common to papillary carcinomas

There were 22 genes differentially expressed in both clPTC and fvPTC, but not in FA or FTC (Fig. 3), validated and significantly differentially expressed compared to FA and FTC (Fig. 4, Table 4)—lnRNAs are putatively associated with papillary features in thyroid carcinomas.

Table 4 LncRNAs common for papillary carcinomas.

LncRNA specific to histological subtypes of differentiated carcinomas

Nineteen lncRNAs were aberrantly expressed in FTC but not in the other studied neoplasms and were significantly differentially expressed compared to those in clPTC and fvPTC (Figs. 3, 4, Table 5). However, none of these lncRNAs was differentially expressed compared to those in FA.

Table 5 LncRNAs specific for FTC.

Of the 29 genes differentially expressed in fvPTC but not in other differentiated carcinomas or FA (Figs. 3, 4), only the ENSG00000257647 gene was specific to fvPTC-validated and significantly differentially expressed in fvPTC compared to FA, FTC and clPTC.

The 32 genes were found to be differentially expressed in clPTC but not in other differentiated carcinomas or FA, validated, and significantly differentially expressed compared to those in fvPTC, FTC and FA-lncRNAs specific to clPTC (Figs. 3, 4, Table 6).

Table 6 LncRNAs specific for clPTC.

LncRNA specific to ATC

ATC samples were available only in the microarray dataset, which also included two variants of PTC. Of the 376 lncRNAs differentially expressed in ATC compared to NT, 252 were not differentially expressed in the other investigated histological subtypes, and 185 were significantly differentially expressed compared to those in clPTC and fvPTC-lncRNAs specific to ATC. The 30 most differentially expressed genes are presented in Table 7, and the full list is shown in Supplementary file 5.

Table 7 Top 30 lncRNAs specific for ATC.

Potential biological functions of aberrantly expressed lncRNAs

The coexpressed genes for each lncRNA from the top 5 most differentially expressed list for discussed groups were identified. The number of coexpressed genes was 138.5 (46.25 − 256.5) −median (Q1 − Q3).

Analysis of the enrichment of GO biological processes and GO molecular functions, KEGG, and Reactome terms with coexpressed coding genes allowed us to identify putative pathways involving common and specific lncRNAs (Fig. 5). The main functions of the lncRNAs common to FA and WDTC-cancers (Colorectal, Non-small cell lung, Thyroid) and p53 signalling; functions of the lncRNAs common to WDTC-Pathways in cancer and L1CAM interactions; functions of the lncRNAs common to papillary carcinomas-aldehyde dehydrogenase (NAD) activity; functions of the lncRNAs specific to FTC-processing of capped intron-containing pre-mRNA; functions of the lncRNAs specific to fvPTC-PCP/CE pathway and Beta-catenin independent WNT signalling; functions of the lncRNAs specific to clPTC-extracellular matrix organization and endoderm formation; and functions of the lncRNAs specific to ATC-cell cycle and mitotic processes.

Figure 5
figure 5

Putative biological process involving aberrantly expressed lncRNAs in thyroid neoplasms. The enrichment analysis of GO Biological Process, GO Molecular Function, KEGG, and Reactome terms was performed for the 5 most differentially expressed lncRNAs, and terms with adjusted p values ≤ 0.05 were considered significantly enriched. For fvPTC and ATC, the 10 most significantly enriched terms are presented.

Discussion

Histological subtypes of follicular cell-derived thyroid carcinomas (FTC, PTC, and ATC) significantly differ in their mutational landscapes and clinical characteristics. Although FTC and clPTC are both WDTCs, FTC is characterized by a follicular growth pattern and tends more often to spread as metastases to distant organs, while clPTC typically has papillary architecture and spreads more often to lymph nodes in the neck. In FTC, K/H/NRAS and PAX8/PPARG mutations are prevalent, whereas BRAF mutations and tyrosine kinase fusions prevail in clPTC1. The clinical characteristics of fvPTC are intermediate; fvPTC is composed of neoplastic follicles not papillae, but with follicular cells showing nuclear features typical of PTC22. The mutational profile of fvPTC is most similar to that of FTC: both exhibit a prevalence of K/H/NRAS and PAX8/PPARG mutations. In a previous TCGA study, fvPTC was characterized as a Ras-like tumour, and its classification as a papillary carcinoma was questioned23. Recently, reclassification of encapsulated fvPTC as a “noninvasive follicular thyroid neoplasm with papillary-like nuclear features” (NIFTP) was proposed2. ATC is an advanced stage thyroid neoplasm and is the most aggressive thyroid cancer. It is expected that there are specific molecular features, including lncRNA patterns, associated with the clinical and histological features of WDTC and the aggressive behaviour of ATC. FA is thought to be a benign counterpart of FTC, and understanding the common and different molecular features of these neoplasms is important for the development of diagnostic and therapeutic strategies.

In this study, the expression of lncRNAs was evaluated in the main histological subtypes of thyroid neoplasms: FA, FTC, fvPTC, clPTC and ATC. Datasets analysed in the study (a microarray dataset of 8 independent experiments; RNA-Seq PRJEB11591; and RNA-Seq TCGA) allowed us to perform robust cross-dataset validation of the results for clPTC and fvPTC and to include representative sets of FA, FTC and ATC samples. LncRNA landscapes in FA, FTC and ATC were analysed for the first time. The highest number of genes aberrantly expressed compared to normal thyroid tissue were found in ATC (330 lncRNAs), followed by clPTC, FTC and fvPTC, which reflects the more advanced stage of ATC. Since the data for ATC, FA and FTC wer limited with one dataset only, the results for these subtypes are preliminary.

Intersection of the differentially expressed lncRNAs and subsequent comparison of the expression between subtypes of neoplasms led to the discovery of lncRNAs common to FA and WDTC (13 genes), common to WDTC (6 genes), common to classical and follicular variants of PTC (22 genes), and specific to FTC (19 genes), fvPTC (1 gene), clPTC (32 genes), and ATC (185 genes). The discovered lncRNAs were proposed to be involved in the development of clinical and morphological features of the studied subtypes. Putative biological processes involving common and specific lncRNAs were identified.

LncRNAs common to all studied thyroid neoplasms (including FA) and common to WDTC appear to be involved in carcinogenesis in different locations. L1CAM interactions found in this study to involve lncRNAs common to WDTC have been previously associated with well-described roles in tumour progression, metastases and the epithelial-to-mesenchymal transition24,25.

LncRNAs common to follicular and classical variants of papillary carcinoma (associated with papillary histology) are involved in aldehyde dehydrogenase (NAD) activity. Aldehyde dehydrogenase is known to maintain cancer stem cell properties in various cancers, including the thyroid26.

Biological processes involving lncRNAs specific to FTC include processes that are associated with splicing (Processing of Capped Intron-Containing Pre-mRNA, mRNA Splicing, and RNA processing). Accumulating evidence suggests that aberrant RNA splicing is a common and driving event in cancer development and progression. For instance, oncogenic Ras signalling via the ERK and PI3-K/Akt pathways regulates the phosphorylation of splicing factors such as SRSF1, SRSF7, and SPF45 and drives the switching of active and inactive states of tumour promoters and suppressors (MST1R, FAS, CD44, LBR, Casp-9, KLF6, and others) via alternative splicing27,28. None of the lncRNAs specific to FTC were differentially expressed compared to those in FA. The absence of lncRNAs differentially expressed in FTC and FA corresponds to the commonality of these subtypes and frequent difficulty in cytology-based differential diagnostics.

Only lncRNA ENSG00000257647 is specific to fvPTC, which might be explained by its intermediate morphology with features of both papillary and follicular carcinomas, leading to its debatable classification. LncRNA ENSG00000257647 appeared to be involved in WNT signalling, predominantly through the Beta-catenin-independent WNT pathway (especially, planar cell polarity that modulates cytoskeleton rearrangements through the activation of the small GTPases RhoA and Rac and their downstream effectors Rock and JNK). WNT signalling is known to play a crucial role in thyroid carcinogenesis, and several mechanisms of its deregulation have been described, including inhibition of the β-catenin degradation complex via its phosphorylation by RET/PTC, inhibition of E-cadherin expression through the MAPK/ERK pathway activated by BRAF mutations, and activation of both canonical and non-canonical Wnt pathways by RAS mutations29,30.

LncRNAs specific to clPTC are involved in extracellular matrix organization and endoderm and collagen formation. Extracellular matrix (ECM) disorganization is known to play a pivotal role in cancer initiation and progression. The major driving mutation in clPTC is BRAF p.V600E, and there is emerging evidence of ECM remodelling induced by BRAF p.V600E in PTCs31. Notably, it has been previously shown that the extracellular matrix of PTCs driven by BRAF p. V600E (but not mutant HRAS) is enriched with stromal-derived fibrillar collagen and facilitates cancer progression32.

lncRNAs specific to ATC are probably associated with its anaplastic features and aggressive behaviour. For these lncRNAs, there is a strong enrichment of cell cycle and mitotic pathways which possibly reflects the involvement of these lncRNAs in the loss of differentiation and high proliferation rate characteristic of ATC.

Of lncRNAs previously described in thyroid cancer, we found that PTCSC3 was downregulated in all investigated neoplasms, including FA; TNRC6C-AS1 was upregulated in papillary carcinomas; and PVT1 was specifically upregulated in ATC11,17,18. Other lncRNAs previously described in thyroid malignancy (BANCR, NAMA, CNALPTC1, FALEC, and PTCSC2) were not identified in our study10,12,13,16,20,21. A possible explanation is the strong association of these lncRNAs with specific mutations and the heterogeneity of driving mutations within the same subtype; for example, the aberrant expression of BANCR is driven by BRAF mutation.

The aberrant expression of lncRNAs with Ensembl annotation found by Liyanarachchi et al. (2016) in PTC was confirmed in our study14. Most of these lncRNAs were common to thyroid neoplasms (including FA) or common to classical and follicular variants of PTC. No lncRNAs found in our study to be subtype specific were discovered by Liyanarachchi.

In the present study, we identified some lncRNAs with known roles in tumorigenesis but not previously described in thyroid cancer33,34,35,36,37. The identified upregulated promoters of cancer progression included NR2F1-AS1 and LINC00511 in clPTC and CRNDE in ATC; downregulated tumour suppressors SLC26A4-AS1 in clPTC and RMST in ATC.

Conclusion

LncRNAs common to FA and WDTC, common to WDTC, common to carcinomas with papillary features, and specific to clPTC, fvPTC, FTC and ATC were discovered in the analysis performed with the most comprehensive datasets (combination of a microarray dataset and two RNA-Seq datasets). The similarity of the lncRNA landscapes in FTC and FA was revealed. The results showed that LncRNAs common to FA and WDTC and common to WDTC are involved in pathways in cancer at various sites, p53 signalling and L1CAM interactions; lncRNAs common to papillary carcinomas are involved in aldehyde dehydrogenase (NAD) activity; lncRNAs specific to FTC are involved in mRNA processing; a lncRNA specific to fvPTC is involved in planar cell polarity and WNT signalling; lncRNAs specific to clPTC are involved in extracellular matrix organization and endoderm formation; and lncRNAs specific to ATC are involved in the cell cycle and mitotic processes; and LncRNAs found to be specific to ATC, including CRNDE and RMST, are likely associated with cancer aggressiveness and cancer progression.

Materials and methods

Microarray datasets

The microarray datasets obtained from Affymetrix Human Genome U133 Plus 2.0 Array (Platform GPL570) were originally selected from the GEO database. The following datasets were included: GSE3467, GSE60542, GSE35570, GSE76039, GSE53157, GSE33630, GSE65144, and GSE29265. A total of 107 samples of normal tissue (NT) and 32 fvPTC, 48 clPTC, and 49 ATC samples were analysed. CEL files were downloaded, and normalization was performed using the gcrma R package. Microarray probes were annotated with Ensembl version 93 using the biomaRt package38.

RNA-Seq datasets

The RNA-Seq dataset PRJEB11591 of Yoo et al.39 was selected from the EBI European Nucleotide Archive database (https://www.ebi.ac.uk/ena/data/view/PRJEB11591). PRJEB11591 is the most comprehensive available RNA-Seq dataset containing benign and malignant thyroid neoplasms (FA, FTC, fvPTC, and clPTC). The PRJEB11591 samples included 81 NT, 26 FA, 30 FTC, 48 fvPTC and 77 clPTC samples. FASTQ files were downloaded, and alignment was performed by HISAT240. Counts were calculated using featureCounts (Rsubread package) with annotation by Ensembl version 93 and Ensemble gene ID for grouping attributes41. Genes with low counts (less than 2 counts in number of samples exceeding the size of the smallest sample group) were eliminated, and TMM normalization (edgeR package) and the voom method using the limma R package were applied.

In the TCGA transcriptome data, 58 NT, 356 clPTC and 101 fvPTC were selected. Samples of metastases and other minor histological subtypes were excluded. Raw counts (HTSeq-Counts Workflow Type, briefly, STAR 2-pass alignment followed by gene expression count assessment with HTSeq) were downloaded from Genomic Data Commons Data Portal (GDC, https://portal.gdc.cancer.gov/). Genes with low counts (less than 1 count in number of samples exceeding the size of smallest sample group) were eliminated, followed by TMM normalization (edgeR package) and voom analysis with limma42.

Selection of lncRNA genes

Protein-coding genes and genes attributed to Havana biotypes not related to lncRNAs were eliminated. Genes of the following Havana biotypes were included in the analysis: lincRNA, antisense, 3-prime overlapping ncRNA, bidirectional promoter lncRNA, misc RNA, processed transcript, sense intronic, and sense overlapping.

Statistical analysis

To identify differentially expressed lncRNAs, linear modelling using the limma package was performed43. Genes with FDR adjusted p value ≤ 0.01 and fold change (FC) ≥ 2.0 were considered to be differentially expressed. A heat map analysis of differentially expressed genes was performed using coolmap limma.

Validation

For clPTC and fvPTC, the sets of genes found to be significantly differentially expressed in a previous step in the microarray, RNA-Seq PRJEB11591, and RNA-Seq TCGA datasets were processed with intersection. Genes found in all three datasets and genes found in both RNA-Seq datasets but not in microarray probes were considered validated.

Selection of lncRNAs common and specific to histological subtypes

LncRNAs common and specific to FA and WDTC were selected via the intersection analysis of genes found to be significantly differentially expressed in each subtype compared to NT in the RNA-Seq dataset PRJEB11591 and subsequent application following criteria:

  • Common to FA and WDTC-confirmed through validation for clPTC and fvPTC;

  • Common to WDTC-confirmed through validation for clPTC and fvPTC, and significantly differentially expressed compared to FA;

  • Common to papillary carcinomas-confirmed through validation for clPTC and fvPTC, and significantly differentially expressed compared to FA and FTC;

  • Specific to clPTC, fvPTC, FTC-confirmed through validation for clPTC and fvPTC (not applied to FTC), and significantly differentially expressed compared to each studied subtype;

LncRNAs specific to ATC were selected from intersection with genes found in FA and WDTC with subsequent filtration of genes significantly differentially expressed compared to those in clPTC and fvPTC.

Evaluation of potential biological functions

To identify genes positively and negatively coexpressed with 5 most differentially expressed lncRNAs, pairwise Pearson correlation between the lncRNAs and all the genes was calculated using the RNA-Seq PRJEB11591 dataset (for FA, FTC, fvPTC and clPTC) and the microarray dataset (for ATC). Genes with an absolute r ≥ 0.7 and a significant correlation (p value < 0.05) were considered to be coexpressed. For coexpressed genes, enrichment of Gene Ontology (GO) Biological Process (2018), GO Molecular Function (2018), Kyoto Encyclopedia of Genes and Genomes (KEGG, 2019) and Reactome (2016) terms was estimated using Enrichr44,45. Terms with adjusted p values in Fisher's exact test ≤ 0.05 were considered significantly enriched42,43,44,45,46.