Introduction

Hepatocellular carcinoma (HCC) was the fifth leading cause of cancer death in men and the eighth leading cause of cancer death in women in the United States in 2017. The incidence of HCC has been increasing every year, with an estimated 40,710 new cases in 2017¹. In addition, death rates are increasing by 3% every year, with an estimated 28,920 deaths in 2017¹. These statistics suggest that more research is needed on the biology, diagnosis, and prevention of HCC. Primary liver cancer, or HCC, can be triggered by ongoing inflammation from pathologies such as cirrhosis, dysplastic nodule formation, viral infections (i.e., viral hepatitis), and nonalcoholic steatohepatitis (NASH). Cirrhosis is a condition in which the liver becomes scarred secondary to repeated insults.

HCC often originates in a focus of dysplasia within a cirrhotic regenerative nodule. With progression to more severe dysplasia, the lesion assumes increasingly malignant characteristics until it becomes frankly cancerous. Tumors may be well differentiated or poorly differentiated. As expected, poorly differentiated HCC has a more aggressive course and poorer outcomes than well differentiated HCC.

Scientific progress in next generation sequencing (NGS) has enhanced our understanding of biological systems by profiling whole transcriptomic expression at a molecular level2. Small RNAs are small non-coding RNAs (sncRNA) consisting of 17–250 nucleotides in length that perhaps play a crucial role in disease development3. A nearly comprehensive repertoire of various types of sncRNAs has been collected and analyzed including: microRNA (miRNAs, 17–22 nucleotides)4, piwi-interacting RNAs (piRNAs, 26–33 nucleotides)5, small nuclear/nucleolar RNAs (sn/snoRNAs, 70–120 nucleotides), long non-coding RNAs (lncRNAs, more than 200 nucleotides)6, and circular RNAs (circRNA)7. This has led to great interest in revealing their role in transcriptional regulation. piRNAs are the largest class of the small non-coding RNA family and are implicated in epigenetic and post-transcriptional regulation but still lack functional characterization8. lncRNAs are a diverse class of RNAs, believed to have an important role in cellular mechanisms; however, little biological relevance has been established thus far9. circRNAs are a recently rediscovered class of non-coding RNAs, which were initially described as scrambled exons10,11. They are resistant to endonuclease treatment and are highly stable12. snoRNAs are perhaps the most ancient and highly conserved class of sncRNAs carrying out a fundamental role in modification and processing of ribosomal RNAs (rRNA), transfer RNAs (tRNA), and small nuclear RNAs (snRNA)13. Two well-known classes of snoRNAs are C/D snoRNAs and box H/ACA snoRNAs which primarily differ in sequence and structure13.

The present study focused on in-depth analysis of small RNA sequencing data obtained from cirrhosis, low-grade dysplastic nodules (LGDN), high-grade dysplastic nodules (HGDN), early stage hepatocellular carcinoma (eHCC), and advanced stage hepatocellular carcinoma (HCC). Tissue samples were compared to healthy liver tissue. We aimed to identify the differential signature and dysregulated expression of small non-coding RNAs. We further identified the most predominantly expressed non-coding RNAs and molecular pathways that could serve as biomarkers during malignant transformation.

Materials and Methods

Read Alignment and Annotations

Healthy (BioProject: PRJNA266511) and diseased liver (BioProject: PRJEB11462) small RNA sequencing raw sample datasets14 were downloaded from the NCBI Sequence Read Archive (SRA). The datasets contained 14 cirrhosis, 9 low-grade dysplastic nodule, 6 high-grade dysplastic nodule, 6 early HCC, and 20 advanced HCC samples along with 9 healthy liver tissue samples (normal clinical assessment and normal liver enzymes). Downloaded SRA files were converted to FASTQ files using the SRA toolkit version 2.5.7¹⁵. The FASTQ files were uploaded to PartekFlow® software, version 6.0 (Partek, Inc., St. Louis, MO) on a Linux-based high-performance computing system at Pennsylvania State University College of Medicine, adapter-trimmed, and remapped to the human genome (hg19) using the BWA-0.7.12 aligner (BWA-MEM) with a few modifications (mismatch penalty 2, gap open penalty 6, clipping penalty 4, and alignment score cutoff 15) for short read mapping2,16. miRBase version 20 (http://www.mirbase.org/), which contains more than 1900 high confidence miRNAs17, was used for annotation. piRNA data were generated and annotated from piRBase (http://regulatoryrna.org/database/piRNA), which is manually curated with a focus on functional analysis18. lncRNAs were quantified using the reference annotation LNCipedia (http://www.lncipedia.org) version 3.1, downloaded with all coordinates relative to the hg19 reference genome19. circRNA annotation was downloaded from circBase (http://www.circbase.org), a database containing thousands of circRNAs, and used for quantification20. Total small RNA (including miRNA, piRNA, snRNA, snoRNA, mt-RNA, tRF3, tRF5, tRNA, and rRNA) was annotated using Gencode version 26 (www.gencodegenes.org)21, which provides comprehensive information on human sncRNAs. Transcript abundances were determined and expression levels were represented using normalized reads per million (RPM) values. All small RNAs with expression RPM values > 1 in 100% of the samples were considered robustly expressed and used for further analysis.
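The RPM scaling and the "RPM > 1 in 100% of the samples" filter described above can be illustrated with a minimal Python sketch. It assumes the RPM denominator is the total number of counted reads in a sample (the exact denominator used by the software is not stated here), and the function names and toy counts are hypothetical.

```python
def reads_per_million(counts):
    """Scale one sample's raw read counts to reads per million (RPM).

    Assumption: the denominator is the sum of all counted reads in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def robustly_expressed(rpm_by_sample, threshold=1.0):
    """Return the features whose RPM exceeds `threshold` in every sample."""
    samples = list(rpm_by_sample.values())
    if not samples:
        return set()
    features = set(samples[0])
    for sample in samples[1:]:
        features &= set(sample)  # a feature must be present in all samples
    return {f for f in features if all(s[f] > threshold for s in samples)}


# Toy counts (hypothetical values, not taken from the study).
raw = {
    "sample_1": {"miR-122": 900_000, "miR-21": 99_999, "miR-x": 1},
    "sample_2": {"miR-122": 500_000, "miR-21": 499_990, "miR-x": 10},
}
rpm = {sample: reads_per_million(counts) for sample, counts in raw.items()}
kept = robustly_expressed(rpm)
```

In this toy input, "miR-x" has an RPM of exactly 1 in the first sample, so it fails the strict "> 1" criterion and is dropped even though it passes in the second sample.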
Expression matrices were compared among clinicopathological features including cirrhosis, low-grade dysplastic nodules, high-grade dysplastic nodules, early stage HCC, and advanced stage HCC samples2. Statistical analyses were carried out using the non-parametric Mann-Whitney U test followed by false discovery rate (FDR) correction through the Benjamini-Hochberg method. A default FDR < 0.05 was considered statistically significant22 with a log2-fold change more than 1. Circos plots23 were generated for differential expression of all small RNAs in various stages of liver disease.
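As an illustration of the significance criteria above, the following Python sketch applies the Benjamini-Hochberg adjustment to a list of p-values and then keeps features with FDR < 0.05 and an absolute log2-fold change greater than 1. The p-values are taken as given (the rank test itself is not reimplemented), and the pseudocount in the fold-change helper is an assumption introduced here to avoid division by zero, not a detail from the study.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(mean_disease, mean_healthy, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount is an assumption."""
    return math.log2((mean_disease + pseudocount) / (mean_healthy + pseudocount))


def significant(features, pvalues, log2fc, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep features passing both the FDR and the fold-change cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [
        f for f, q, lfc in zip(features, fdr, log2fc)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]


# Toy input (hypothetical values).
names = ["a", "b", "c", "d"]
pvals = [0.01, 0.04, 0.03, 0.005]
lfcs = [2.0, 0.5, -1.5, -3.0]
hits = significant(names, pvals, lfcs)
```

Here feature "b" passes the FDR cutoff but is removed by the fold-change filter, which shows why the two criteria are applied together.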

HCC and Normal Liver Tissue Samples

Hepatocellular carcinoma (n = 3; moderately to poorly differentiated) and normal liver tissue (n = 3; adenoma) frozen samples were obtained from the Institute for Personalized Medicine (IPM) at Penn State College of Medicine, Hershey, PA, after written approval. Total RNA was extracted using a Direct-zol™ RNA Kit (Zymo Research, cat#: R2051) according to the manufacturer’s instructions. The extracted RNAs were quantified and quality checked using a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA) and a BioAnalyzer RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA).

MicroRNA Profiling Using NanoString nCounter miRNA Assays

Total RNA samples were analyzed according to manufacturer’s instructions for the nCounter miRNA Expression Assay kit (NanoString Technologies®, Seattle, WA). Briefly, 100 ng of each sample total RNA was used for nCounter Human miRNA sample preparation. Hybridization was conducted for 16 h at 65°C. Subsequently, probes were purified and counted on the nCounter Prep Station. Each sample was scanned for 600 FOV (fields of view) on the nCounter Digital Analyzer. Data was extracted using the nCounter RCC Collector.

NanoString nCounter miRNA Data Analysis

For platform validation using synthetic oligonucleotides, NanoString nCounter miRNA raw data was normalized for lane-to-lane variation with a dilution series of six spike-in positive controls using nSolver v4.0 software (www.nanostring.com/products/nSolver). The sum of the six positive controls for a given lane was divided by the average sum across lanes to yield a normalization factor, which was then multiplied by the raw counts in each lane to obtain normalized values. For each sample, the mean plus two times the standard deviation of the eight negative controls was subtracted from each miRNA count in that sample. Only miRNAs with non-negative counts across all samples were retained for downstream analysis. The relative miRNA levels were expressed as median fold changes, and a two-fold change cutoff (up or down) was used24. A Venn diagram was prepared using FunRich 3.1.3 open source software25,26.
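The background-correction step described above (subtracting the mean plus two standard deviations of the negative controls, then keeping miRNAs that remain non-negative in every sample) can be sketched in Python as follows. This is only an illustration: it assumes the counts have already been lane-normalized, it uses the sample standard deviation (the text does not say whether the sample or population form was used), and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def background_threshold(negative_controls):
    """Mean plus two sample standard deviations of the negative controls."""
    return mean(negative_controls) + 2 * stdev(negative_controls)


def subtract_background(counts, negative_controls):
    """Subtract the per-sample background threshold from every miRNA count."""
    bg = background_threshold(negative_controls)
    return {mirna: value - bg for mirna, value in counts.items()}


def retained_mirnas(corrected_by_sample):
    """Keep miRNAs whose corrected count is non-negative in every sample."""
    samples = list(corrected_by_sample.values())
    names = set(samples[0]) if samples else set()
    return {n for n in names if all(s.get(n, -1) >= 0 for s in samples)}


# Toy data: eight negative controls and two miRNAs per sample (hypothetical).
negatives = {
    "HCC_1": [8, 12, 8, 12, 8, 12, 8, 12],
    "Normal_1": [8, 12, 8, 12, 8, 12, 8, 12],
}
normalized = {
    "HCC_1": {"miR-122": 500.0, "miR-x": 12.0},
    "Normal_1": {"miR-122": 2000.0, "miR-x": 30.0},
}
corrected = {
    s: subtract_background(normalized[s], negatives[s]) for s in normalized
}
kept = retained_mirnas(corrected)
```

In the toy data, "miR-x" falls below the background threshold in one sample and is therefore excluded from both, mirroring the "non-negative across all samples" rule.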

Biological Processes and Gene Network Visualization by MetaCore

Biological pathway interactions of microRNA expressions were analyzed using MetaCore pathway analysis of differentially expressed genes (Thomson Reuters, New York, NY)2,16,27 with p < 0.05 and greater than two-fold change. We performed multiple comparative analysis and enrichment analysis on all five stages of liver disease. Functional gene networks were built based on differentially regulated miRNA gene lists as input to generate disease biomarkers and Gene Ontology terms (Data analyzed by Gene Arrays, Entity of Vedic Research, Inc., New York)2,16,27.

Statistical Analysis

A paired Student’s t-test was used to compare disease stage vs healthy samples, with an FDR < 0.05 and a log2-fold change greater than one considered statistically significant. Furthermore, the Benjamini and Hochberg multiple testing adjustment method was applied for all small RNA sequencing studies, and a p-value < 0.05 with a fold change greater than two was used for pathway analysis.

Ethics

Data presented in the study were downloaded from the NIH data sets (BioProjects: PRJNA266511 and PRJEB11462). No human subjects were recruited for this study (not applicable).

Results

Differential Expression of miRNA in Cirrhosis, LGDN, HGDN, eHCC, and HCC Tissue Samples

We first annotated the datasets for microRNA analysis. Using stringent statistics (FDR < 0.05) with filters set at a log2-fold change greater than one and total minimum reads > 500, expression data was visualized (Fig. 1A–E, Tables 1–5). Our analysis found 87 miRNAs differentially expressed in cirrhosis compared to normal liver tissue (15 upregulated and 72 downregulated; Fig. 1A, Table 1), 106 miRNAs in LGDN (46 upregulated and 60 downregulated; Fig. 1B, Table 2), 59 miRNAs in HGDN (18 upregulated and 41 downregulated; Fig. 1C, Table 3), 80 miRNAs in eHCC (12 upregulated and 68 downregulated; Fig. 1D, Table 4) and 133 miRNAs in HCC (64 upregulated and 69 downregulated; Fig. 1E, Table 5). The top five differentially upregulated miRNAs in cirrhosis (Table 1) were: miR-7704 (403-fold), miR-22 (143-fold), miR-101 (113-fold), miR-486 (75-fold), and miR-192 (32-fold). The top five downregulated were: miR-122 (312-fold), Let-7g (204-fold), miR-103a (83-fold), miR-532 (79-fold), and miR-451a (62-fold). The top five differentially upregulated miRNAs in LGDN (Table 2) were: miR-141 (625-fold), miR-101 (208-fold), miR-22 (111-fold), miR-16 (61-fold), and miR-486 (35-fold); whereas the top five downregulated were: miR-451a (513-fold), miR-378c (104-fold), miR-361 (95-fold), miR-122 (81-fold), and miR-30c (78-fold). The top five differentially upregulated miRNAs in HGDN (Table 3) were: miR-101 (266-fold), miR-22 (170-fold), miR-16 (54-fold), miR-192 (45-fold), and miR-19b (34-fold). The top five downregulated were: miR-26b (1 million-fold), miR-20a (1 million-fold), Let-7f (1 million-fold), miR-22-3p (1 million-fold), and Let-7c (364-fold). The top five differentially upregulated miRNAs in eHCC (Table 4) were: miR-101 (215-fold), miR-22 (94-fold), miR-10b (34-fold), miR-19b (34-fold), and miR-192 (29-fold). The top five downregulated were: miR-20a (1 million-fold), miR-22-3p (1 million-fold), miR-26b (1 million-fold), Let-7f (1 million-fold), and miR-30c (3545-fold).
The top five differentially upregulated miRNAs in HCC (Table 5) were: miR-142 (1 million-fold), miR-7704 (257-fold), miR-101 (147-fold), miR-23a (124-fold), and miR-22 (85-fold); whereas, the top five downregulated were: miR-122 (513-fold), Let-7g (358-fold), miR-378c (187-fold), miR-185 (68-fold), and miR-451a (58-fold).

Figure 1

Differential expression of miRNAs in liver tissue samples: Differentially expressed miRNAs were quantified (FDR < 0.05) and a heatmap was prepared for each disease stage (Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma) with healthy control samples (A–E). Enriched miRNAs for all stages of liver disease were summarized by a Venn diagram, which identified 37 miRNAs commonly expressed in all stages and 29 additional differentially expressed miRNAs enriched in advanced hepatocellular carcinoma alone (F). (G) A Circos plot was prepared incorporating all differentially expressed miRNAs in cirrhosis, low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma tissue samples compared with healthy samples (FDR < 0.05). Chromosomes and bands are listed at the chromosomal positions of miRNAs with altered expression in liver disease vs healthy samples. The innermost ring is for cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma. Darker and lighter background colors represent upregulated and downregulated genes, respectively. We validated the miRNA sequence data using the NanoString global miRNA expression assay. Data was exported using nSolver software. (H) A Venn diagram showing miRNA overlap between HCC miRNA-seq data and HCC-NanoString validation data. (I) A heat map illustrating the top 20 up- and downregulated miRNAs in the HCC-NanoString validation data.

Table 1 Differentially expressed miRNAs in cirrhosis vs healthy patient’s tissue samples (FDR < 0.05).
Table 2 Differentially expressed miRNAs in low-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 3 Differentially expressed miRNAs in high-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 4 Differentially expressed miRNAs in early hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).
Table 5 Differentially expressed miRNAs in hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).

Data for all five groups were visualized in a Venn diagram (Fig. 1F), which showed that 37 miRs were commonly expressed in all groups, whereas 16 miRs in cirrhosis, four miRs in LGDN, none in HGDN, one miR in eHCC, and 29 miRs in HCC were uniquely expressed. Several miR expression patterns were common between groups, specifically 24 miRs in LGDN and HCC and 18 miRs in cirrhosis, LGDN, eHCC, and HCC (Fig. 1F). Circos plots were prepared for comprehensive visualization of differentially expressed miRs in all five groups, including chromosome number and location (Fig. 1G). Black lines inside the respective rings show the affected gene location and chromosome number. After loading the data into the Circos plot, we found that chromosomes 1, 2, 9, 13, and 17 were the most enriched chromosomes in all five groups (Fig. 1G).
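The exclusive regions of a multi-group Venn diagram, such as those reported above (shared by all groups, unique to one group, or shared by a specific subset), can be computed with simple set operations. The Python sketch below is a generic illustration, not the FunRich implementation; the group memberships are toy values chosen for the example.

```python
def venn_regions(sets_by_group):
    """Map each exclusive Venn region to its members.

    A region is identified by the frozenset of group names that contain
    the item; every item is assigned to exactly one region.
    """
    regions = {}
    everything = set().union(*sets_by_group.values())
    for item in everything:
        key = frozenset(
            group for group, members in sets_by_group.items() if item in members
        )
        regions.setdefault(key, set()).add(item)
    return regions


# Toy differential-expression lists (hypothetical memberships).
de_lists = {
    "cirrhosis": {"miR-122", "miR-22", "miR-a"},
    "LGDN": {"miR-122", "miR-22"},
    "HCC": {"miR-122", "miR-b"},
}
regions = venn_regions(de_lists)
common_to_all = regions.get(frozenset(de_lists), set())
unique_to_hcc = regions.get(frozenset({"HCC"}), set())
```

Counting the members of each region gives the numbers that label the Venn diagram; an empty region (such as a group with no unique members) simply has no key in the result.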

Validation of Differentially Expressed miRNAs in HCC Clinical Samples

In order to validate the in silico miRNA sequencing data from a publicly available source, we compared human HCC specimens with healthy liver tissues in an independent cohort of tissue samples, using an absolute quantification miRNA assay from NanoString Technologies that screens for more than 800 human miRNAs. In our analysis, we found more than 274 miRNAs were differentially expressed by more than a two-fold change compared to healthy liver samples. We then exported the data using fold change (up- and downregulated miRNAs) and compared them with the miRNAs differentially expressed in the HCC group of the miRNA-seq data using FunRich software. We created a grouped heatmap for the top 20 upregulated and downregulated miRNAs (Fig. 1I) and compared differentially expressed genes between the miRNA-seq and NanoString miRNA assays (Fig. 1H). We identified 21 overlapping miRNAs using a Venn diagram, 11 of which showed the same trend as in the sequencing data (Table 6). Five miRNAs were upregulated (miR-130b, miR-182, miR-10b, miR-320a, and miR-769), ranging from a 2.9- to 30-fold induction in the NanoString miRNA assay, whereas six miRNAs were downregulated (miR-122, miR-451a, miR-200a, miR-139, miR-148a, and miR-375), ranging from −2.2- to −6-fold (Table 6).
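The comparison described above has two steps: finding miRNAs reported by both platforms, and checking whether the direction of change agrees. A minimal Python sketch of that logic is given below; the function name is hypothetical and the log2-fold changes are toy values, not the measured data.

```python
def concordant_overlap(seq_log2fc, nano_log2fc):
    """Return (shared miRNAs, shared miRNAs with the same direction of change).

    Both arguments map miRNA names to signed log2-fold changes
    (positive = upregulated, negative = downregulated).
    """
    shared = set(seq_log2fc) & set(nano_log2fc)
    same_direction = {
        name for name in shared if seq_log2fc[name] * nano_log2fc[name] > 0
    }
    return shared, same_direction


# Toy values for illustration only (hypothetical fold changes).
seq = {"miR-122": -9.0, "miR-182": 3.1, "miR-x": 2.0, "miR-y": -1.5}
nano = {"miR-122": -2.4, "miR-182": 1.8, "miR-x": -1.2, "miR-z": 4.0}
shared, concordant = concordant_overlap(seq, nano)
```

In the toy input three miRNAs are shared, but "miR-x" changes in opposite directions on the two platforms, so only two count as concordant.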

Table 6 Validation of Differentially Expressed miRNAs in HCC miRNA sequence data and HCC liver tissue samples.

Following miRNA expression analysis, we used MetaCore pathway software to analyze the signaling pathways possibly affected and the role of enriched microRNAs in cirrhosis and HCC pathogenesis. We uploaded all the differentially expressed miRNAs to the MetaCore server for comparative and gene enrichment analysis (Figs 2, 3 and 4). Processed data for comparative enrichment analysis showed a three-way representation (common: high significance in all groups; similar: similarly enriched in all groups; and unique: pathways specifically expressed in the individual groups). We found 112 pathways to be common, 177 similar, and a number unique to each group; 49 pathways were unique to the advanced HCC data sets (Fig. 2A). Pathway Maps were utilized for comparative and enrichment analysis of differentially expressed miRNAs for biological pathways analysis (The results were obtained using MetaCore pathways analysis tool; GeneGo/Thomson Reuters). Diseased samples clearly showed involvement of signaling molecules in microRNA-dependent EMT, CD44 signaling, and others (Fig. 2B). Gene Ontology (GO) Processes analysis revealed that most affected miRNAs were involved in the cellular response to amino acid and chemical stimulus, with HCC being the most affected group (Fig. 3A). Many cancer-related categories were over-represented in the Disease Stages by Biomarkers analysis, which found that the most affected miRNAs were involved in cancer signaling pathways including renal, non-small cell lung, bronchogenic, and hepatocellular cancers (Fig. 3B). The biological networks with the most affected network processes are shown in Fig. 4, with the top three commonly regulated networks in all five groups (Fig. 4A), the top three similarly affected networks (Fig. 4B), and the top uniquely regulated biological network pathways in liver disease (Fig. 4C–F).

Figure 2

Enrichment analysis of microRNA for Pathway Maps, Gene Ontology, Disease by Biomarker and Network processes in liver diseases. Pathway analysis was carried out via MetaCore software: differentially expressed miRNA data for Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma were uploaded to the MetaCore server and the most significantly affected pathways were identified using comparative enrichment analysis. The gene content was aligned between all experiments listed above. The intersection set of experiments is defined as “common” and marked as a blue/white striped bar. The unique genes for the experiments are marked as colored bars. The genes from the “similar” set are present in all but one (any) file. The parameters for comparison are set as above. Enrichment analysis consists of matching gene IDs of possible targets for the “common”, “similar” and “unique” sets with gene IDs in functional ontologies in MetaCore. The probability of a random intersection between a set of IDs the size of the target list and the ontology entities is estimated as the p-value of the hypergeometric intersection. A lower p-value means higher relevance of the entity to the dataset, which is reflected in a higher rating for the entity. (A) There is a unique signature in advanced hepatocellular carcinoma. (B) Pathway Maps: Comparative and enrichment pathway analysis showed most of the miRNAs enriched in various disease stages were involved in oncogenic pathways (The results were obtained using MetaCore pathways analysis tool; GeneGo/Thomson Reuters). The top five common pathways are listed (B1–B5). Experimental data was visualized on the maps as blue (for downregulation) and red (upregulation) histograms. The height of the histogram corresponds to the relative expression value for a particular gene/protein (Pathway maps were obtained from MetaCore pathways analysis tool; GeneGo/Thomson Reuters).
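The hypergeometric p-value mentioned in the caption above can be illustrated with a short Python sketch. It computes the standard upper-tail hypergeometric probability of seeing at least the observed overlap by chance; the exact formula and background used by MetaCore are not given in the text, so the parameter names and the toy numbers below are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(universe, ontology_size, list_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe      -- number of gene IDs in the background
    ontology_size -- number of background IDs annotated to the ontology term
    list_size     -- number of IDs in the uploaded target list
    overlap       -- number of list IDs annotated to the term
    """
    upper = min(ontology_size, list_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(ontology_size, k) * comb(universe - ontology_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 3 in the term, 3 in the list, 2 shared.
p = hypergeom_enrichment_p(universe=10, ontology_size=3, list_size=3, overlap=2)
```

A smaller p-value indicates that the observed overlap is less likely to arise from a random list of the same size, which is the sense in which a lower p-value means higher relevance.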

Figure 3

Enrichment analysis of microRNA for Gene Ontology in liver diseases. (A) GO Biological Processes: Comparative and enrichment analysis of GO processes shown in various disease stages. (B) Disease status: Comparative and enrichment analysis of disease by biomarkers shown in various disease stages; significantly affected miRNAs in liver disease included various carcinoma pathways. Disease folders were organized into a hierarchical tree. Gene content may vary greatly between complex diseases such as cancers and some Mendelian diseases. In addition, coverage of different diseases in the literature is skewed. These two factors may affect p-value prioritization for diseases. (C) Biological network analysis: Differentially expressed miRNA data were analyzed for the biological networks involved in liver disease; the top-scoring networks in the common, similar, and unique groups are listed.

Figure 4

Enrichment analysis of microRNA for Network processes in liver diseases. (A–F) Top biological networks: The top three networks that are involved in Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma and Advanced stage Hepatocellular carcinoma in the common, similar, and unique groups. The networks were built with a variant of the shortest paths algorithm with main parameters of enrichment. Enriched miRNAs were prioritized based on the number of fragments of canonical pathways on the networks. Upregulated genes were marked with red circles and downregulated genes with blue circles. The ‘checkerboard’ color indicates mixed expression for the gene between files or multiple tags for the same gene.

Differential Expression of piRNA in Cirrhosis, LGDN, HGDN, eHCC, and HCC Tissue Samples

piRNAs are the largest class of endogenous non-coding RNAs. They have recently been shown to play important biological roles as RNA silencers, forming RNA-protein complexes that are required for both epigenetic and post-transcriptional gene silencing in germ line cells2,28. We used piRNA annotation to identify piRNA differential expression in cirrhosis, LGDN, HGDN, eHCC, and HCC groups (Fig. 5A–E, Tables 7–11). We found 75 piRNAs associated with cirrhosis (16 upregulated and 59 downregulated; Fig. 5A, Table 7), 60 piRNAs in LGDN (28 upregulated and 32 downregulated; Fig. 5B, Table 8), 49 piRNAs in HGDN (12 upregulated and 37 downregulated; Fig. 5C, Table 9), 56 piRNAs in eHCC (9 upregulated and 46 downregulated; Fig. 5D, Table 10), and 128 piRNAs in HCC (83 upregulated and 45 downregulated; Fig. 5E, Table 11). The top five differentially upregulated piRNAs in cirrhosis (Table 7) were: piR-32299 (7658-fold), piR-28488 (2105-fold), piR-7239 (1497-fold), piR-5939 (536-fold), and piR-5067 (106-fold); whereas the top five downregulated were: piR-952 (423-fold), piR-28525 (140-fold), piR-5938 (87-fold), piR-5937 (65-fold), and piR-25780 (49-fold). The top five differentially upregulated piRNAs in LGDN (Table 8) were: piR-32299 (5557-fold), piR-28488 (1804-fold), piR-7239 (1139-fold), piR-5939 (627-fold), and piR-23655 (90-fold). The top five downregulated were: piR-952 (450-fold), piR-12759 (270-fold), piR-820 (101-fold), piR-28525 (82-fold), and piR-5937 (37-fold). The top five differentially upregulated piRNAs in HGDN (Table 9) were: piR-28488 (1013-fold), piR-7239 (798-fold), piR-5939 (281-fold), piR-23655 (96-fold), and piR-6147 (26-fold); with downregulation of: piR-12759 (534-fold), piR-952 (469-fold), piR-25782 (309-fold; six of the top 10 downregulated piRNAs were alternative transcripts of this gene, with fold changes ranging from 94 to 309), piR-820 (130-fold), and piR-28525 (116-fold).
The top five differentially upregulated piRNAs in eHCC (Table 10) were: piR-28488 (1966-fold), piR-7239 (1629-fold), piR-5939 (718-fold), piR-1338 (56-fold), and piR-23786 (18-fold); whereas, the top five downregulated were: piR-952 (383-fold), piR-5937 (121-fold), piR-5938 (108-fold), piR-820 (99-fold), and piR-28525 (81-fold). The top five differentially upregulated piRNAs in HCC (Table 11) were: piR-32299 (4045-fold), piR-23670 (2335-fold), piR-24684 (2220-fold), piR-28488 (1099-fold), and piR-7239 (949-fold). On the other hand, piR-952 (518-fold), piR-820 (78-fold), piR-28525 (58-fold), piR-5938 (57-fold), and piR-5937 (52-fold) were downregulated. Further investigation is needed to evaluate their functions in diseased states, as there is very limited literature available on their functionality.

Figure 5

Differential expression of piRNAs in liver tissue samples: Differentially expressed piRNAs were quantified and a heatmap was prepared (FDR < 0.05) for each disease stage (Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma and Advanced stage Hepatocellular carcinoma) with healthy control samples (A–E). All enriched piRNAs were summarized by a Venn diagram, which identified 30 piRNAs commonly expressed in all stages and 52 piRs specifically dysregulated in advanced hepatocellular carcinoma (F). (G) A Circos plot was prepared incorporating all differentially expressed piRNAs in cirrhosis, low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma tissue samples compared with healthy samples (FDR < 0.05). Chromosomes and bands are listed at the chromosomal positions of piRNAs with altered expression in liver disease vs healthy samples. The innermost ring represents cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma, with darker and lighter background colors representing upregulated and downregulated genes, respectively.

Table 7 Differentially expressed piRNAs in cirrhosis vs healthy patient’s tissue samples (FDR < 0.05).
Table 8 Differentially expressed piRNAs in low-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 9 Differentially expressed piRNAs in high-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 10 Differentially expressed piRNAs in early hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).
Table 11 Differentially expressed piRNAs in hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).

Data from all five groups were placed in a Venn diagram (Fig. 5F), which showed that 30 piRs were commonly expressed in all groups, whereas six in cirrhosis, seven in HGDN, and 52 in HCC were uniquely expressed. Interestingly, we could not identify piRs linked specifically to LGDN or eHCC. Several piRs were commonly expressed between groups; specifically, 14 in LGDN and HCC, 10 in cirrhosis and HCC, 11 in cirrhosis and eHCC, and eight in cirrhosis, HGDN, eHCC, and HCC (Fig. 5F). We also prepared a Circos plot for comprehensive visualization of differentially expressed piRs, with chromosome number and location (Fig. 5G). After loading the data into the Circos plot, we found that chromosomes 1, 2, 5, 6, and 22 were the most enriched chromosomes in all five groups (Fig. 5G).

Differential Expression of lncRNA in Cirrhosis, LGDN, HGDN, eHCC, and HCC Tissue Samples

We further analyzed data for long non-coding RNAs using lncRNA annotation on small RNA sequencing data and calculated differential expression in all five groups (Fig. 6A–E, Tables 12–14, Supplementary Tables 1 and 2). Our annotation identified 192 lncRNAs in cirrhosis (42 upregulated and 150 downregulated; Fig. 6A, Table 12), 180 lncRNAs in LGDN (53 upregulated and 127 downregulated; Fig. 6B, Supplementary Table 1), 150 lncRNAs in HGDN (24 upregulated and 126 downregulated; Fig. 6C, Table 13), 160 lncRNAs in eHCC (24 upregulated and 136 downregulated; Fig. 6D, Table 14), and 225 lncRNAs in HCC (96 upregulated and 129 downregulated; Fig. 6E, Supplementary Table 2). The top five upregulated lncRNAs in cirrhosis (Table 12) were: lnc-C21orf67-10 (23661-fold), lnc-CRK-3 (19138-fold), lnc-FBXO11-7 (5354-fold), lnc-GCNT1-4 (383-fold; five transcripts with fold changes ranging from 279 to 383), and HAGLR:1 (260-fold); whereas lnc-AC022098.1-1:10 (343-fold), lnc-HSD17B10-3 (231-fold), lnc-NEDD4L-1 (42-fold), GAS5:43 (41-fold), and lnc-CTD-2144E22.5.1-20 (36-fold) were downregulated. The top five upregulated lncRNAs in LGDN (Supplementary Table 1) were: lnc-CRK-3 (1830-fold), GCNT1-4 (355-fold; five transcripts with fold changes ranging from 339 to 383), lnc-ADCY10-1 (271-fold), lnc-UBC-3 (237-fold), and lnc-TRIM27-18 (172-fold). The top five downregulated were: lnc-AC022098.1-1:10 (756-fold), SNHG6:15 (144-fold), GAS5 (124-fold, three transcripts), lnc-HSD17B10-3 (104-fold), and lnc-CCNB1IP1-1 (103-fold). The top five upregulated lncRNAs in HGDN (Table 13) were: GCNT1-4 (307-fold; five transcripts with fold changes ranging from 295 to 307), lnc-UBC-3 (293-fold), lnc-TRIM27-18 (238-fold), LNC00273:8 (34-fold), and lnc-FNBP1L-2 (18-fold), and the top five downregulated were: lnc-AC022098.1-1:10 (1603-fold), GAS5 (685-fold; four transcripts with fold changes ranging from 51 to 685), SNHG6:15 (409-fold), lnc-HSD17B10-3 (105-fold), and lnc-CTD-2144E22.5.1-20 (89-fold).
The top five upregulated lncRNAs in eHCC (Table 14) were: lnc-CRK-3 (6385-fold), GCNT1-4 (290-fold; five transcripts with fold changes ranging from 237 to 290), HAGLR (251-fold), lnc-UBC-3 (202-fold), and lnc-TRIM27-18 (171-fold); whereas the top five downregulated were: lnc-MBNL2-3 (1457-fold), LINC01021:16 (1317-fold), GAS5 (815-fold, multiple transcript types), SNHG1:60 (694-fold), and lnc-AC022098.1-1:10 (577-fold). The top five upregulated lncRNAs in HCC (Supplementary Table 2) were: lnc-CCDC167-2 (1 million-fold), lnc-TPTE-3 (18378-fold), lnc-C21orf67-10 (9271-fold), lnc-TMEM8A-1 (6308-fold), and lnc-CRK-3 (3027-fold), with downregulation seen in: lnc-SNHG6:15 (99-fold), HSD17B10-3 (65-fold), lnc-CCNB1IP1-1 (50-fold), GAS5 (45-fold, multiple transcripts), and lnc-ARHGEF6-4 (28-fold). Further investigation is needed to determine their importance in HCC development.

Figure 6

Differential expression of long non-coding RNAs (lncRNAs) in liver tissue samples: Differentially expressed lncRNAs were quantified and a heatmap view was prepared (FDR < 0.05) for each disease stage (Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma) with healthy control samples (A–E). The lncRNAs enriched in all stages of liver disease were summarized by a Venn diagram, which identified 109 lncRNAs commonly expressed in all stages, with 39 lncRNAs specifically enriched in early hepatocellular carcinoma and only three lncRNAs enriched in advanced hepatocellular carcinoma (F). (G) A Circos plot was prepared incorporating all differentially expressed lncRNAs in cirrhosis, low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma tissue samples compared with healthy samples (FDR < 0.05). Chromosomes and bands are listed at the chromosomal positions of lncRNAs with altered expression in liver disease vs healthy samples. The innermost ring represents cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma and hepatocellular carcinoma, with darker and lighter background colors representing upregulated and downregulated genes, respectively.

Table 12 Differentially expressed lncRNAs in cirrhosis vs healthy patient’s tissue samples (FDR < 0.05).
Table 13 Differentially expressed lncRNAs in high-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 14 Differentially expressed lncRNAs in early hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).

Differentially expressed lncRNAs in all five groups were summarized in a Venn diagram (Fig. 6F), which showed that 109 lncRNAs were commonly expressed in all groups, whereas 16 in cirrhosis, 3 in LGDN, 1 in HGDN, 39 in eHCC, and 3 in HCC were uniquely expressed (Fig. 6F). Several lncRNAs were shared between subsets of groups: 19 in LGDN and eHCC; 10 in cirrhosis, LGDN, and eHCC; and 6 in LGDN, HGDN, eHCC, and HCC (Fig. 6F). We also prepared a Circos plot for comprehensive visualization of differentially expressed lncRNAs in the five groups with chromosome number and location (Fig. 6G). In this plot, chromosomes 1, 2, 9, 13, 17, and X were the most enriched chromosomes in all five groups (Fig. 6G).
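The Venn diagram counts above correspond to partitioning the differentially expressed RNAs by the exact combination of groups in which each one appears. A minimal Python sketch of that bookkeeping follows; the function name `venn_regions` and the identifiers in `toy_groups` are hypothetical placeholders for illustration, not the authors' pipeline or data.

```python
from collections import defaultdict


def venn_regions(groups):
    """Map each exact combination of group names to the items found in
    those groups and in no others (one entry per Venn diagram region)."""
    membership = defaultdict(set)
    for name, items in groups.items():
        for item in items:
            membership[item].add(name)

    regions = defaultdict(set)
    for item, names in membership.items():
        regions[frozenset(names)].add(item)
    return dict(regions)


# Hypothetical placeholder identifiers, NOT values from the study.
toy_groups = {
    "cirrhosis": {"rna_a", "rna_b", "rna_c"},
    "LGDN": {"rna_a", "rna_d"},
    "HGDN": {"rna_a"},
    "eHCC": {"rna_a", "rna_d", "rna_e"},
    "HCC": {"rna_a", "rna_f"},
}

regions = venn_regions(toy_groups)
shared_by_all = regions[frozenset(toy_groups)]   # present in all five groups
unique_to_ehcc = regions[frozenset({"eHCC"})]    # present in eHCC only
```

Under this reading, "uniquely expressed" means an RNA falls in a single-group region, and "commonly expressed in all groups" means it falls in the region keyed by all five group names.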

Differential Expression of circRNA in Cirrhosis, LGDN, HGDN, eHCC, and HCC Tissue Samples

We used circRNA annotation to identify differential expression in cirrhosis, LGDN, HGDN, eHCC, and HCC (Fig. 7A–E, Tables 15–19). We found 70 circRNAs in cirrhosis (24 upregulated and 46 downregulated; Fig. 7A, Table 15), 56 circRNAs in LGDN (16 upregulated and 40 downregulated; Fig. 7B, Table 16), 51 circRNAs in HGDN (14 upregulated and 37 downregulated; Fig. 7C, Table 17), 60 circRNAs in eHCC (14 upregulated and 46 downregulated; Fig. 7D, Table 18), and 70 circRNAs in HCC (26 upregulated and 44 downregulated; Fig. 7E, Table 19). The top five differentially upregulated circRNAs in cirrhosis (Table 15) were: circR-0040679 (2328-fold), circR-0015774 (1622-fold), circR-0016456 (1039-fold), circR-0015637 (810-fold), and circR-0014848 (591-fold); whereas the top five downregulated were: circR-0035407 (258-fold), circR-0035409 (245-fold), circR-0092360 (39-fold), circR-0043980 (26-fold), and circR-0007956 (21-fold). The top five differentially upregulated circRNAs in LGDN (Table 16) were: circR-0087119 (455-fold), circR-0014848 (454-fold), circR-0015637 (369-fold), circR-0015774 (126-fold), and circR-0008347 (116-fold); whereas the top five downregulated were: circR-0006257 (48723-fold), circR-0035409 (2745-fold), circR-0034507 (791-fold), circR-0089763 (86-fold), and circR-0076872 (71-fold). The top five differentially upregulated circRNAs in HGDN (Table 17) were: circR-0054435 (2907-fold), circR-0015637 (454-fold), circR-0015774 (417-fold), circR-000007119 (265-fold), and circR-0040679 (211-fold); whereas the top five downregulated were: circR-0006257 (27596-fold), circR-0035409 (1234-fold), circR-0035407 (995-fold), circR-0076872 (44-fold), and circR-0092360 (28-fold).
The top five differentially upregulated circRNAs in eHCC (Table 18) were: circR-0015637 (556-fold), circR-0015774 (516-fold), circR-0087119 (315-fold), circR-0040679 (258-fold), and circR-0087948 (254-fold); whereas the top five downregulated were: circR-0006257 (19579-fold), circR-0035409 (810-fold), circR-0035407 (623-fold), circR-0076872 (47-fold), and circR-0092360 (29-fold). The top five differentially upregulated circRNAs in HCC (Table 19) were: circR-0021905 (800542-fold), circR-0016456 (1204-fold), circR-0015637 (608-fold), circR-0087119 (602-fold), and circR-0014848 (438-fold); whereas the top five downregulated were: circR-0035409 (1219-fold), circR-0035407 (784-fold), circR-0076872 (98-fold), circR-0092360 (15-fold), and circR-0069970 (13-fold).
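The "top five" lists reported here amount to filtering a differential expression table at FDR < 0.05 and ranking by fold change in each direction. The Python sketch below illustrates that selection. It assumes, without confirmation from the text, that each record carries a signed log2 fold change (`log2_fc`) and an adjusted p-value (`fdr`); the record layout, function names, and toy values are all hypothetical.

```python
def top_changes(records, n=5, fdr_cutoff=0.05):
    """Return (up, down): the n most upregulated and n most downregulated
    records among those passing the FDR cutoff."""
    significant = [r for r in records if r["fdr"] < fdr_cutoff]
    up = sorted(
        (r for r in significant if r["log2_fc"] > 0),
        key=lambda r: r["log2_fc"],
        reverse=True,
    )[:n]
    down = sorted(
        (r for r in significant if r["log2_fc"] < 0),
        key=lambda r: r["log2_fc"],
    )[:n]
    return up, down


def fold_magnitude(log2_fc):
    """Convert a signed log2 fold change to an unsigned 'x-fold' value."""
    return 2 ** abs(log2_fc)


# Hypothetical toy table, NOT data from the study.
toy_table = [
    {"id": "rna_a", "log2_fc": 3.0, "fdr": 0.01},
    {"id": "rna_b", "log2_fc": 5.0, "fdr": 0.20},   # fails the FDR cutoff
    {"id": "rna_c", "log2_fc": -2.0, "fdr": 0.03},
    {"id": "rna_d", "log2_fc": 1.0, "fdr": 0.04},
    {"id": "rna_e", "log2_fc": -4.0, "fdr": 0.001},
]

up, down = top_changes(toy_table, n=2)
up_ids = [r["id"] for r in up]
down_ids = [r["id"] for r in down]
```

Reporting downregulation as an unsigned fold value, as in the lists above, corresponds to `fold_magnitude` applied to a negative log2 fold change.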

Figure 7

Differential expression of circular RNAs (circRNAs) in liver tissue samples: Differentially expressed circRNAs were quantified and a heatmap view was prepared (FDR < 0.05) for each disease stage (Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma) compared with healthy control samples (A–E). circRNAs enriched across all stages of liver disease were summarized in a Venn diagram, which identified 41 circRNAs commonly expressed in all stages and 11 circRNAs specifically enriched in advanced hepatocellular carcinoma (F). (G) A Circos plot was prepared incorporating all differentially expressed circRNAs in cirrhosis, low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma tissue samples compared with healthy samples (FDR < 0.05). Chromosomes and bands are shown at the chromosomal positions of circRNAs with altered expression in liver disease vs healthy samples. The innermost ring is cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma, with darker and lighter background colors representing upregulated and downregulated genes, respectively.

Table 15 Differentially expressed circRNAs in cirrhosis vs healthy patient’s tissue samples (FDR < 0.05).
Table 16 Differentially expressed circRNAs in low-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 17 Differentially expressed circRNAs in high-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 18 Differentially expressed circRNAs in early hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).
Table 19 Differentially expressed circRNAs in hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).

A Venn diagram (Fig. 7F) showed that 41 circRNAs were commonly expressed in all groups, whereas six in cirrhosis; none in LGDN, HGDN, or eHCC; and 11 in HCC were uniquely expressed (Fig. 7F). Several circRNAs were shared between subsets of groups: five in cirrhosis and HCC; five in cirrhosis, LGDN, eHCC, and HCC; and three in LGDN, HGDN, eHCC, and HCC (Fig. 7F). We also prepared a Circos plot for comprehensive visualization of differentially expressed circRNAs (Fig. 7G), which showed chromosomes 1, 2, 5, 6, and 22 to be the most enriched across groups (Fig. 7G).
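One simple way to arrive at a "most enriched chromosomes" statement is to count differentially expressed RNAs per chromosome and rank the totals. The Python sketch below does this, counting each RNA once even if it appears in several groups. That counting rule is an assumption, since the text does not state how enrichment was tallied, and the names `chromosome_counts` and `toy_hits` are hypothetical.

```python
from collections import Counter


def chromosome_counts(hits):
    """Count distinct differentially expressed RNAs per chromosome,
    pooling all groups and counting each (RNA, chromosome) pair once."""
    seen = set()
    counts = Counter()
    for group_hits in hits.values():
        for rna_id, chrom in group_hits:
            if (rna_id, chrom) not in seen:
                seen.add((rna_id, chrom))
                counts[chrom] += 1
    return counts


# Hypothetical toy annotations, NOT data from the study.
toy_hits = {
    "cirrhosis": [("rna_a", "chr1"), ("rna_b", "chr2")],
    "HCC": [("rna_a", "chr1"), ("rna_c", "chr1"), ("rna_d", "chr22")],
}

counts = chromosome_counts(toy_hits)
# Rank by descending count, breaking ties by chromosome name.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Raw counts of this kind favor large, gene-dense chromosomes, so a length- or gene-normalized count may be a fairer comparison when interpreting the ranking.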

Differential Expression of sno/mt-RNA in Cirrhosis, LGDN, HGDN, eHCC, and HCC Tissue Samples

Small nuclear RNAs form a class of RNA molecules that localize within the nucleus of eukaryotic cells28. Their primary function is pre-mRNA processing, for which they are always associated with a set of specific proteins. These complexes are referred to as small nuclear ribonucleoproteins (snRNP). The small nucleolar RNAs (snoRNAs) are another subclass of snRNA that localize in the nucleolus and are associated with the maturation of RNA molecules through chemical modifications targeting mainly rRNAs, tRNAs, and snRNAs28. We remapped aligned reads to the GENCODE v26 database, which contains most of the curated small RNAs, to identify differential expression in cirrhosis, LGDN, HGDN, eHCC, and HCC (Fig. 8A–E, Tables 20–24). We found 10 sno/mt-RNAs in cirrhosis (Fig. 8A, Table 20), 15 sno/mt-RNAs in LGDN (Fig. 8B, Table 21), nine sno/mt-RNAs in HGDN (Fig. 8C, Table 22), six sno/mt-RNAs in eHCC (Fig. 8D, Table 23), and 16 sno/mt-RNAs in HCC (Fig. 8E, Table 24). Most of the mitochondrial RNAs were upregulated in HCC and all four snoRNAs were downregulated. All five groups had downregulated snoRD121B, whereas snoRD121A was downregulated only in HCC samples.

Figure 8

Differential expression of snoRNAs and mitochondrial RNAs (mt-RNAs) in liver tissue samples: Differential expression of snoRNAs and mt-RNAs was quantified and a heatmap view was prepared (FDR < 0.05) for each disease stage (Cirrhosis, Low-grade dysplastic nodule, High-grade dysplastic nodule, Early stage Hepatocellular carcinoma, and Advanced stage Hepatocellular carcinoma) compared with healthy control samples (A–E). sno/mt-RNAs enriched across all stages of liver disease were summarized in a Venn diagram, which identified 5 small RNAs commonly expressed in all stages and 1 snoRNA specifically enriched in advanced hepatocellular carcinoma (F). (G) A Circos plot was prepared incorporating all differentially expressed sno/mt-RNAs in cirrhosis, LGDN, HGDN, eHCC, and HCC tissue samples compared with healthy samples (FDR < 0.05). Chromosomes and bands are shown at the chromosomal positions of sno/mt-RNAs with altered expression in liver disease vs healthy samples. The innermost ring is cirrhosis, followed by low-grade dysplastic nodule, high-grade dysplastic nodule, early hepatocellular carcinoma, and hepatocellular carcinoma, with darker and lighter background colors representing upregulated and downregulated genes, respectively.

Table 20 Differentially expressed sno/mt RNAs in cirrhosis vs healthy patient’s tissue samples (FDR < 0.05).
Table 21 Differentially expressed sno/mt RNAs in low-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 22 Differentially expressed sno/mt RNAs in high-grade dysplastic nodule vs healthy patient’s tissue samples (FDR < 0.05).
Table 23 Differentially expressed sno/mt RNAs in early hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).
Table 24 Differentially expressed sno/mt RNAs in hepatocellular carcinoma vs healthy patient’s tissue samples (FDR < 0.05).

Differentially expressed sno/mt-RNAs for all five groups are shown in a Venn diagram (Fig. 8F), which demonstrated that five sno/mt-RNAs were commonly expressed in all groups, whereas four in cirrhosis; none in LGDN, HGDN, or eHCC; and one in HCC were uniquely expressed (Fig. 8F). Several sno/mt-RNAs were shared between subsets of groups: six in LGDN and HCC; and one in LGDN, HGDN, and HCC (Fig. 8F). We also prepared a Circos plot for comprehensive visualization of differentially expressed sno/mt-RNAs in the five groups (Fig. 8G). Chromosomes 9, 17, and 19 were the most enriched (Fig. 8G).

Discussion

We have identified various non-coding RNAs in liver disease samples, which can be used for validation and development of novel therapeutics for HCC. miR-101 regulates proliferation, migration, and invasion in various cancers29,30,31,32, suggesting importance in the ordered transformation from normal to malignant phenotype. This was also suggested in our study, as miR-101 was consistently overexpressed in all disease stages when compared to normal liver tissue. miR-22 is considered to have tumor suppressor activity; however, in our study, it showed remarkable overexpression in HCC33. miR-23a has been reported to downregulate expression of interferon regulatory factor-1 in HCC34. Accordingly, it was overexpressed in the eHCC samples we analyzed. miR-7704 was found to be highly overexpressed in HCC when compared to cirrhosis (~400-fold change).

Current therapies and targeted strategies are well documented in the literature; however, more research is needed to identify crucial players in early disease development and treatment targets. Deregulation of various pathways such as p53, RAS/MAPK, PI3K/AKT/mTOR, WNT, MET, MYC, and TGF-beta is involved in oncogenesis35. Interestingly, miR-122 affects all of these pathways while also targeting CUTL1 transcriptional repression35,36, leading to apoptosis and cell cycle arrest. Accordingly, miR-122 is downregulated in more than 70% of cancers, suggesting a crucial role in oncologic transformation. miR-122 appears to act as a tumor suppressor and its downregulation in our study may be pertinent to an ordered progression from normal liver to HCC phenotype. In our analysis, miR-122 was downregulated across all disease stages in both sequenced data and clinical samples (Fig. 1H).

The Lethal-7 (Let-7) family of miRNAs is present in multiple copies in the genome and is highly conserved37. These miRNAs are known to act as tumor suppressors and prevent angiogenesis. The Let-7 family has 10 members in the human genome (Let-7a, let-7b, let-7c, let-7d, let-7e, let-7f, let-7g, let-7i, miR-98, and miR-20238), which are involved in gene regulation and cell adhesion. We found that Let-7f microRNAs were strikingly downregulated in HGDN and eHCC (1 million-fold change) and Let-7g in HCC (358-fold change), with similar reports published in gastric39,40, prostate41, colon42,43, small cell lung44, thyroid45, breast46,47, and hepatocellular cancer48,49. miR-221 was found to be upregulated in HCC via in silico analysis and has multiple known pathway targets, such as p27Kip1, p53, BMF, PI3K, PTEN, and mTOR50. Vascular endothelial growth factor (VEGF) plays a major role in tumor development and is partly regulated by miR-16. Our data correlate with recent studies that have shown miR-16 upregulation in HCC. GEMOX (gemcitabine and oxaliplatin) is one of the chemotherapeutic options in HCC treatment, which specifically targets VEGF and miR-1651. Other available chemotherapeutic agents such as sunitinib, linifanib, brivanib, tivantinib, and everolimus target other kinases. Specifically, everolimus targets mTOR signaling and mTOR-regulated miRNAs such as miR-99a-3p (upregulated), miR-99a-5p (downregulated), miR-221 (upregulated), and miR-100 (downregulated).

Most circular RNA functions are not clear. Until now, the only clear evidence is that they can serve as miRNA “sponges”52. In our study, we found altered expression of many circular RNAs; however, additional validation is needed prior to assigning functionality.

Overall, 37 dysregulated sncRNAs were shared between all five phenotypic groups. miR-101, miR-22, and circR-0015774 were the top upregulated sncRNAs, whereas miR-122, piR-952, and circR-0035409 were the most frequently downregulated. 30 piRNAs, 109 lncRNAs, 41 circRNAs, and five sno/mt-RNAs were dysregulated in all groups. circR-0015774, circR-0035409, MT-TS1, and MT-TP were in the top five upregulated RNAs in all groups; whereas sno115-31 and snoRD37 were in the top five downregulated. Specific RNAs were also dysregulated in a single group. In cirrhosis, 16 miRNAs were dysregulated, whereas four were dysregulated in LGDN (let-7d, miR-141, miR-181b, and miR-3120), none in HGDN, miR-150 in eHCC, and 29 in HCC. miR-200b was commonly downregulated in HGDN, eHCC, and HCC (<3.5-fold). Six piRNAs were affected only in cirrhosis, seven in HGDN, and 52 in HCC. 16 lncRNAs were affected in cirrhosis, three in LGDN (lnc-C17orf51-5:1, lnc-KDM4C-18:1, and SNHG1:57), one in HGDN (lnc-REG3G-6:1), 39 in eHCC, and three in HCC (LINC01021:16, lnc-MBNL2-3:1, and SNHG1:60). Six circRNAs in cirrhosis and 11 in HCC were specifically dysregulated. Four sno/mt-RNAs were dysregulated in cirrhosis only and one in HCC (snoRD121A).

Our results show that multiple differentially expressed sncRNAs were identified in concordance with disease progression, which may provide a basis for future study and clues to understanding HCC pathogenesis. Although we have a substantial sequencing dataset, we were limited in miRNA validation by a small tissue sample size. Additionally, limited data on disease etiology prevented further analysis. Nevertheless, we believe that our study uncovers the potential of data mining for sncRNA identification with regard to HCC. Further research is also needed to address the functional validation of sncRNA signatures and their relevance.

In summary, miR-101, miR-22, miR-122, circR-0015774, circR-0035409, MT-TS1, MT-TP, sno115-31, and snoRD37 may serve as biomarkers for liver pathogenesis, since they were differentially expressed. Each liver phenotype demonstrated a unique molecular signature. In cirrhosis, there were 16 dysregulated miRNAs, six piRNAs, 16 lncRNAs, and six circRNAs. Four miRNAs and three lncRNAs were differentially expressed in LGDN, whereas seven piRNAs and one lncRNA were dysregulated in HGDN. The molecular signature of eHCC was altered by one miRNA and 39 lncRNAs, while HCC demonstrated changes in 29 miRNAs, 52 piRNAs, three lncRNAs, 11 circRNAs, and one sno/mt-RNA. The sncRNAs most strongly associated with cirrhosis were miR-192 (32-fold), miR-320b (14-fold), and circ-0079763 (27-fold). LGDN was most defined by changes in miR-141 (626-fold) and piR-25782.5 (309-fold), while piR-25782.1 (181-fold) was expressed in HGDN. miR-150 (4-fold) was most strongly associated with eHCC, whereas miR-142 (1 million-fold), miR-23a (124-fold), miR-130b (65-fold), piR-23670 (2335-fold), piR-24684 (2072-fold), circR-0021905, and snoRD121A were specifically altered in HCC. Further studies are needed to validate sncRNA functionality in liver disease prior to biomarker development.