Non-coding RNAs in Various Stages of Liver Disease Leading to Hepatocellular Carcinoma: Differential Expression of miRNAs, piRNAs, lncRNAs, circRNAs, and sno/mt-RNAs

Hepatocellular carcinoma (HCC) was the fifth leading cause of cancer death in men and eighth leading cause of death in women in the United States in 2017. In our study, we sought to identify sncRNAs in various stages of development of HCC. We obtained publicly available small RNA-seq data derived from patients with cirrhosis (n = 14), low-grade dysplastic nodules (LGDN, n = 9), high grade dysplastic nodules (HGDN, n = 6), early hepatocellular carcinoma (eHCC, n = 6), and advanced hepatocellular carcinoma (HCC, n = 20), along with healthy liver tissue samples (n = 9). All samples were analyzed for various types of non-coding RNAs using PartekFlow software. We remapped small RNA-seq to miRBase to obtain differential expressions of miRNAs and found 87 in cirrhosis, 106 in LGDN, 59 in HGDN, 80 in eHCC, and 133 in HCC. Pathway analysis of miRNAs obtained from diseased samples compared to normal samples showed signaling pathways in the microRNA dependent EMT, CD44, and others. Additionally, we analyzed the data sets for piRNAs, lncRNAs, circRNAs, and sno/mt-RNAs. We validated the in silico data using human HCC samples with NanoString miRNA global expression. Our results suggest that publically available data is a valuable resource for sncRNA identification in HCC progression (FDR set to <0.05 for all samples) and that a data mining approach is useful for biomarker development.

. Differential expression of miRNAs in liver tissue samples: Differentially expressed miRNAs were quantified (FDR < 0.05) and a heatmap was prepared for each disease stage (Cirrhosis, Low-grade dysplastic nodule, Highgrade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma) with healthy control samples (A-E). Enriched miRNAs for all stages of liver disease were summarized by a Venn diagram, which identified 37 miRNAs commonly expressed in all stages and 29 additional differentially expressed miRNAs were enriched in advanced hepatocellular carcinoma alone (F). (G) A Circos plot was prepared incorporating all differential expressions of miRNAs in cirrhosis, low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma tissue samples compared with healthy samples (FDR < 0.05). Chromosomes and bands were listed in the chromosomal positions of miRNAs affected expression in liver disease vs healthy samples. The innermost ring is for cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma. Darker and lighter background colors represent upregulated and downregulated genes respectively. We validated the miRNA sequence data using NanoString global miRNA expression assay. Data was exported using nSolver software.  HCC often originates in a focus of dysplasia within a cirrhotic regenerative nodule. With progression to more severe dysplasia, the lesion assumes increasingly malignant characteristics until it becomes frankly cancerous. Tumors may be well differentiated or poorly differentiated. As expected, poorly differentiated HCC has a more aggressive course and poorer outcomes than well differentiated HCC.
Scientific progress in next generation sequencing (NGS) has enhanced our understanding of biological systems by profiling whole transcriptomic expression at a molecular level 2 . Small RNAs are small non-coding RNAs (sncRNA) consisting of 17-250 nucleotides in length that perhaps play a crucial role in disease development 3 . A nearly comprehensive repertoire of various types of sncRNAs has been collected and analyzed including: microRNA (miRNAs, 17-22 nucleotides) 4 , piwi-interacting RNAs (piRNAs, 26-33 nucleotides) 5 , small nuclear/ nucleolar RNAs (sn/snoRNAs, 70-120 nucleotides), long non-coding RNAs (lncRNAs, more than 200 nucleotides) 6 , and circular RNAs (circRNA) 7 . This has led to great interest in revealing their role in transcriptional regulation. piRNAs are the largest class of the small non-coding RNA family and are implicated in epigenetic and post-transcriptional regulation but still lack functional characterization 8 . lncRNAs are a diverse class of RNAs, believed to have an important role in cellular mechanisms; however, little biological relevance has been established thus far 9 . circRNAs are a recently rediscovered class of non-coding RNAs, which were initially described as scrambled exons 10,11 . They are resistant to endonuclease treatment and are highly stable 12 . snoRNAs are perhaps the most ancient and highly conserved class of sncRNAs carrying out a fundamental role in modification and processing of ribosomal RNAs (rRNA), transfer RNAs (tRNA), and small nuclear RNAs (snRNA) 13 . Two well-known classes of snoRNAs are C/D snoRNAs and box H/ACA snoRNAs which primarily differ in sequence and structure 13 .
The present study focused on in-depth analysis of small RNA sequencing data obtained from cirrhosis, low-grade dysplastic nodules (LGDN), high-grade dysplastic nodules (HGDN), early stage hepatocellular carcinoma (eHCC), and advanced stage hepatocellular carcinoma (HCC). Tissue samples were compared to healthy liver tissue. We aimed to identify the differential signature and dysregulated expression of small non-coding RNAs. We further identified the most predominantly expressed non-coding RNAs and molecular pathways that could serve as biomarkers during malignant transformation.

Materials and Methods
Read Alignment and Annotations. Healthy (BioProject: PRJNA266511) and diseased liver (BioProject: PRJEB11462) small RNA sequencing raw sample datasets 14 2,16 . miRBase version 20 (http://www.mirbase.org/), which contains more than 1900 high confidence miRNAs 17 was used for annotation. piRNA data was generated and annotated from piRBase (http:// regulatoryrna.org/database/piRNA), which is manually curated with a focus on functional analysis 18 . lncRNAs were quantified using reference annotation LNCipedia (http://www.lncipedia.org) version 3.1, downloaded from all coordinates relative to the hg19 reference genome 19 . circRNA database (http://www.circbase.org) contains thousands of circRNAs and annotation was download and quantified 20 . Total small RNA (including miRNA, piRNA, snRNA, snoRNA, mt-RNA, tRF3, tRF5, tRNA, and rRNA) was annotated using Gencode version 26 (www.gencodegenes.org) 21 , which provides comprehensive information on human sncRNAs. Transcript abundances were determined and expression levels were represented using normalized reads per million (RPM) values. All small RNAs with expression RPM values > 1 in 100% of the samples were considered robustly expressed and used for further analysis. Expression matrices were compared among clinicopathological features including cirrhosis, low-grade dysplastic nodules, high-grade dysplastic nodules, early stage HCC, and advanced stage HCC samples 2 . Statistical analyses were carried out using the non-parametric Mann-Whitney U test followed by false discovery rate (FDR) correction through the Benjamini-Hochberg method. A default FDR < 0.05 was considered statistically significant 22 with a log 2 -fold change more than 1. Circos plots 23 were generated for differential expression of all small RNAs in various stages of liver disease.    Ethics. Data presented in the study was downloaded from the NIH data sets (BioProjects: PRJNA266511 and PRJEB11462). We haven't recruited any human subjects in this study (Not applicable).

Validation of Differentially Expressed miRNAs in HCC Clinical Samples.
In order to validate in silico miRNA sequencing data from a publically available source, we used human HCC specimens compared to healthy liver tissues by using an absolute quantification miRNA assay from NanoString technologies, which screens for more than 800 human miRNAs, validated to an independent cohort of tissue samples. In our analysis, we found more than 274 miRNAs were differentially expressed by more than a two fold change compared to healthy liver samples. We then exported the data using fold change (up and down regulated miRNAs) and compared miRNA-seq differentially expressed miRNAs in the HCC group using FunRich software. We created  Table 7. Differentially expressed piRNAs in cirrhosis vs healthy patient's tissue samples (FDR < 0.05).

Figure 2.
Enrichment analysis of microRNA for Pathway Maps, Gene Ontology, Disease by Biomarker and Network processes in liver diseases. Pathway analysis was carried out via MetaCore software, differentially expressed miRNA data for Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma were uploaded to MetaCore server and the most significantly affected pathways were created using comparative enrichment analysis. The gene content were aligned between all listed experiments above. The intersection set of experiments is defined as "common" and marked as a blue/white striped bar. The unique genes for the experiments are marked as colored bars. The genes from the "similar" set are present in all but one (any) file. The parameters for comparison are set as above. Enrichment analysis consists of matching gene IDs of possible targets for the "common", "similar" and "unique" sets with gene IDs in functional ontologies in MetaCore. The probability of a random intersection between set IDs the size of the target list with ontology entities estimated in p-value of hypergeometric intersection. The lower p-value means higher relevance of the entity to the dataset, which shows in a higher rating for the entity (A) there is a unique signature in advanced hepatocellular carcinoma. (B) Pathway Maps: Comparative and enrichment analysis of disease by biomarkers shown in various disease stages, significantly affected miRNAs in liver disease included various carcinoma pathways. Disease folders were organized into a hierarchical tree. Gene content may vary greatly between such complex diseases as cancers and some Mendelian diseases. In addition, coverage of different diseases in the literature were skewed. These two factors may affect p-value prioritization for diseases. (C) Biological network analysis: Differentially expressed miRNA data were analyzed for the biological networks involved in liver disease, we presented a list of top score networks in common, similar, and unique groups.
Comparative and enrichment pathway analysis showed most of the miRNAs enriched in various disease stages were involved the oncogenic pathways (The results were obtained using MetaCore pathways analysis tool; GeneGo/Thomson Reuters). Top five common pathways were listed (B1-B5). Experimental data was visualized on the maps as blue (for downregulation) and red (upregulation) histograms. The height of the histogram corresponds to the relative expression value for a particular gene/protein (Pathway maps were obtained from MetaCore pathways analysis tool; GeneGo/Thomson Reuters).
a grouped heatmap for the top 20 upregulated and downregulated miRNAs (Fig. 1I) and compared differentially expressed genes between the miRNA-seq and NanoString miRNA assays (Fig. 1H). We identified 21 miRNAs using a Venn diagram with 11 of them showing similar trends as the sequenced data (Table 6). Five miRNAs were    upregulated (miR-130b, miR-182, miR10b, miR320a, and miR769) ranging from a 2.9 to 30-fold induction in the NanoString miRNA assay, whereas six miRNAs were downregulated (miR122, miR451a, miR200a, miR139, miR148a, and miR375) ranging from −2.2 to −6 fold (Table 6). Following miRNA expression analysis, we used MetaCore pathway software to analyze the possible signaling pathways affected and enriched microRNAs role in cirrhosis and HCC pathogenesis. We uploaded all the differentially   Processed data for comparative enrichment analysis showed a three way representation (common-high significance in all groups, similar-similarly enriched in all groups, and unique-various pathways specifically expressed in the individual groups). We found 112 pathways to be common, 177 similar, and a number unique to each group. 49 pathways were unique to advanced HCC data sets ( Fig. 2A). Pathway Maps were utilized for comparative and enrichment analysis of differentially expressed miRNAs for biological pathways analysis (The results were obtained using MetaCore pathways analysis tool; GeneGo/Thomson Reuters). Diseased samples clearly showed involvement of signaling molecules in microRNA dependent EMT, CD44 signaling, and others (Fig. 2B). Gene Ontology (GO) Processes analysis revealed that most affected miRNAs were involved in the cellular response to amino acid and chemical stimulus, with HCC being the most affected group (Fig. 3A). Many cancer-related categories were over-represented via Disease Stages by Biomarkers analysis (Fig. 3B). Disease Stages by Biomarkers analysis found the most affected miRNAs were involved in cancer signaling pathways including renal, non-small cell lung, bronchogenic, and hepatocellular cancers (Fig. 3A). Biological network analysis of miRNAs involved in the highest affected network processes are listed in Fig. 4A with the top three commonly regulated networks in all five groups (Fig. 4A), top three similarly affected networks (Fig. 4B), and top uniquely regulated biological network pathways in liver disease shown in Fig. 4C Table 11). The top five differentially upregulated piRNAs in cirrhosis ( Large data from all five groups was placed in a Venn diagram (Fig. 5F), which showed, 30 piRs were commonly expressed in all groups; whereas six in cirrhosis, seven in HGDN, and 52 in HCC were uniquely expressed (Fig. 5F). Interestingly, we could not determine piRs linked specifically to LGDN and eHCC. There were several  piRs commonly expressed between groups; specifically, 14 in LGDN and HCC, 10 in cirrhosis and HCC, 11 in cirrhosis and eHCC, and eight in cirrhosis, HGDN, eHCC, and HCC (Fig. 5F). We also prepared a Circos Plot for comprehensive visualization of differentially expressed piRs, with chromosome number and location (Fig. 5G). After, plugging the data into Circos Plot, we found chromosomes 1, 2, 5, 6, and 22 were the most enriched chromosomes in all five groups (Fig. 5G).
Differentially expressed lncRNAs in all five groups were listed in a Venn diagram (Fig. 6F), which showed, 109 lncRNAs were commonly expressed in all groups; whereas, 16 in cirrhosis, 3 in LGDN, 1 in HGDN, 39 in eHCC, and 3 in HCC were uniquely expressed (Fig. 6F). Several lncRNAs were commonly expressed between groups.  Specifically, 19 in LGDN and eHCC; 10 in cirrhosis, LGDN, and eHCC; and 6 in LGDN, HGDN, eHCC, and HCC (Fig. 6F). We also organized a Circos Plot for comprehensive visualization of differentially expressed lncR-NAs in the five groups with chromosome number and location (Fig. 6G). After plugging the data into Circos Plot, we found chromosomes 1, 2, 9, 13, 17, and X were the most enriched chromosomes in all five groups (Fig. 6G).

Differential Expression of sno/mt-RNA in Cirrhosis, LGDN, HGDN, eHCC, and HCC Tissue
Samples. Small nuclear RNAs form a class of RNA molecules that localize within the nucleus of eukaryotic cells 28 . Their primary function is pre-mRNA processing, for which they are always associated with a set of specific proteins. These complexes are referred to as small nuclear ribonucleoproteins (snRNP). The small nucleolar RNAs (snoRNAs) are another subclass of snRNA that localize in the nucleolus and are associated with the maturation of RNA molecules through chemical modifications targeting mainly rRNAs, tRNAs, and snRNAs 28 . We remapped aligned reads to GenCode v26 database, which contains most of the curated small RNAs to identify differential expression in cirrhosis, LGDN, HGDN, eHCC, and HCC ( Fig. 8A-E, Tables 20-24). We found 10 sno/mt-RNAs in cirrhosis (Fig. 8A, Table 20), 15 sno/mt-RNAs in LGDN ( Fig. 8B and Table 21), nine sno/mt-RNAs in HGDN (Fig 8C and Table 22), six sno/mt-RNAs in eHCC ( Fig. 8D and Table 23), and 16 sno/mt-RNAs in HCC ( Fig. 8E and Table 24). Most of the mitochondrial RNAs were upregulated in HCC and all four snoRNAs were downregulated. All five groups had downregulated snoRD121B, whereas snoRD121A was downregulated only in HCC samples.
Differentially expressed data for all five groups is shown in a Venn diagram (Fig. 8F), that demonstrated five sno/mt-RNAs commonly expressed in all groups whereas, four in cirrhosis; none in LGDN, HGDN, or eHCC; and one in HCC uniquely expressed (Fig. 8F). There were several sno/mt-RNAs commonly expressed between groups: six in LGDN and HCC; and one in LGDN, HGDN, and HCC (Fig. 8F). We also prepared a Circos Plot for comprehensive visualization of differentially expressed sno/mt-RNAs in the five groups (Fig. 8G). Chromosomes 9, 17, and 19 were the most enriched (Fig. 8F).

Discussion
We have identified various non-coding RNAs in liver disease samples, which can be used for validation and development of novel therapeutics for HCC. miR-101 regulates proliferation, migration, and invasion in various cancers [29][30][31][32] ; suggesting importance in the ordered transformation from normal to malignant phenotype. This was also suggested in our study as miR-101 was continually overexpressed in all disease stages when compared to normal liver tissue. miR-22 is considered to have tumor suppressor activity; however, in our study, it showed remarkable overexpression in HCC 33 . miR-23a has been reported to downregulate expression of interferon regulatory factor-1 in HCC 34 . Accordingly, it was overexpressed in the eHCC samples we analyzed. miR-7704 was found to be highly overexpressed in HCC when compared to cirrhosis (~400-fold change).
Current therapies and targeted strategies are well documented in the literature; however, more research is needed to identify crucial players in early disease development and treatment targets. Deregulation of various pathways such as p53, RAS/MAPK, PI3K/AKT/mTOR, WNT, MET, MYC, and TGF-beta are involved in oncogenesis 35 . Interestingly, miR-122 affects all of these pathways while also targeting CUTL1 transcriptional repression 35,36 , leading to apoptosis and cell cycle arrest. Accordingly, miR-122 is downregulated in more than 70% of cancers, suggesting a crucial role in oncologic transformation. miR-122 appears to act as a tumor suppressor and its downregulation in our study may be pertinent to an ordered progression from normal liver to HCC phenotype. In our analysis, miR-122 was downregulated across all disease stages in both sequenced data and clinical samples (Fig. 1H).
The Lethal-7 (Let-7) family of miRNAs is present in multiple copies in the genome and highly conserved 37 . These miRNAs are known to act as tumor suppressors and prevent angiogenesis. The Let-7 family has 10 members in the human genome (Let-7a, let-7b, let-7c, let-7d, let-7e, let-7f, let-7g, let-7i, miR-98, and miR-202 38 ), which are involved in gene regulation and cell adhesion. We found, Let-7f microRNAs were strikingly downregulated in    . Differential expression of circular RNAs (circRNAs) in liver tissue samples: Differentially expressed circRNAs were quantified and a heatmap view was prepared (FDR < 0.05) for each disease stage (Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma and Advanced stage Hepatocellular carcinoma) with healthy control samples (A-E). All stages of liver disease enriched circRNAs were summarized by a Venn diagram, which identified 41 circRNAs commonly expressed in all stages and 11 circRNAs were specifically enriched in advanced hepatocellular carcinoma (F). (G) A Circos plot was prepared incorporating all differential expressions of circRNAs in cirrhosis, low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma,and hepatocellular carcinoma tissue samples compared with healthy samples (FDR < 0.05). Chromosome and bands were listed in chromosomal positions of circRNAs affected expression in liver disease vs healthy samples. The Innermost ring is cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma with darker and lighter background colors representing upregulated and downregulated genes respectively. in HCC. miR200b was commonly downregulated in HGDN, eHCC, and HCC (<3.5-fold). Six piRNAs were affected only in cirrhosis, seven in HGDN, and 52 in HCC. 16 lncRNAs were affected in cirrhosis, three in LGDN (lnc-C17orf51-5:1, lnc-KDM4C-18:1, and SNHG1:57), one in HGDN (lnc-REG3G-6:1), 39 in eHCC, and three in HCC (LINC01021:16, lnc-MBNL2-3:1, and SNHG1:60). Six circRNAs in cirrhosis and 11 in HCC were specifically dysregulated. Four sno/mt-RNAs were dysregulated in cirrhosis only and one in HCC (snoRD121A). Our results summarize that multiple differentially expressed sncRNAs were identifided in concordance with disease progression, which may provide a basis for future study and clues to understanding HCC pathogenesis. Although we have a substantial sequencing dataset, we were limited in miRNA validation by a small tissue sample size. Additionally, limited data on disease etiology prevented further analysis. Nevertheless, we believe that our study uncovers the potential of data mining for sncRNA identification with regards to HCC. It is also appreciated that further research is needed to address the functional validation of sncRNA signatures and their relevance.