Integrated analysis of hepatic mRNA and miRNA profiles identified molecular networks and potential biomarkers of NAFLD

To enhance our understanding of molecular mechanisms and mine novel biomarkers of non-alcoholic fatty liver disease (NAFLD), RNA sequencing was performed to gain hepatic expression profiles of mRNAs and miRNAs in NAFLD and normal rats. Using DESeq with thresholds of a two-fold change and a false discovery rate (FDR) less than 0.05, 336 mRNAs and 21 miRNAs were identified as differentially expressed. Among those, 17 miRNAs (e.g., miR-144-3p, miR-99a-3p, miR-200b-3p, miR-200b-5p, miR-200c-3p, etc.) might serve as novel biomarkers of NAFLD. MiRNA target genes (13565) were predicted by the miRWalk database. Using DAVID 6.8, the intersection (195 genes) of differentially expressed mRNAs and miRNA-predicted target genes were enriched in 47 gene ontology (GO) terms and 28 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Using Cytoscape, pathway interaction and protein-protein interaction (PPI) networks were constructed, and hub genes (e.g., Abcg8, Cyp1a1, Cyp51, Hmgcr, etc.) associated with NAFLD were obtained. Moreover, 673 miRNA-mRNA negative regulatory pairs were obtained, and networks were constructed. Finally, several representative miRNAs and mRNAs were validated by real-time qPCR. In conclusion, potential molecular mechanisms of NAFLD could be inferred from integrated analysis of mRNA and miRNA profiles, which may indicate novel biomarkers of NAFLD.


Results
Quality control of RNA sequence experiment. RNA samples with optimal RIN (RNA Integrity Number) values (≥8) were considered to construct libraries for sequencing. Library quality was also assessed (supplementary Table S1). The libraries with sufficient concentration and a peak length within the ranges 350-500 bp and 138-150 bp were sequenced on an Illumina HiSeq platform to generate mRNA and miRNA sequences, respectively. In the present study, the mRNA sequencing generated an average of 48348613 raw reads for each sample. After quality filtering (removing adapter and low-quality reads), an average of 48309324 (99.9%) high-quality, clean reads were obtained for each sample. A large portion of clean reads (92.5%) were mapped to the rat genome. Among those reads, we observed 21174764 spliced reads and 20772406 non-spliced reads. Moreover, the miRNA sequencing yielded an average of 10088010 raw reads for each sample, and 8629406 (85.5%) clean reads were filtered out. A large number of clean reads (97.9%) were mapped to the genome. The high percentage of mapped reads is consistent with what would be expected for high-quality datasets 18 . The quality characteristics of sequence reads for each sample are listed in supplementary Table S2. Additionally, the RNA expression levels were determined by reads per kilobase per million mapped reads and are provided in supplementary Table S3. Hierarchical clustering of representative mRNA and miRNA expressions revealed samples in the same group clustered closely, which indicated high reproducibility in biological replicates (Fig. 1).
To further establish the main biological function and key pathways of the 195 overlapped genes, functional enrichment was performed by DAVID. GO analysis revealed that the overlapped genes were enriched in 47 terms such as cholesterol biosynthetic process, immune response and oxidation-reduction process (Fig. 3B). KEGG pathway enrichment showed that 20 pathways were enriched such as steroid biosynthesis, NF-κB signaling pathway and adipocytokine signaling pathway (Fig. 3C).
Furthermore, pathway interaction networks were constructed and are presented in Fig. 4. Pathways might interact with each other through a specific gene or hub gene. For instance, steroid hormone biosynthesis and bile secretion might interact with each other via gene Cyp7a1, while bile secretion might interact with fat digestion and absorption through gene Abcg8. RT1-Da, RT1-Ba, RT1-Bb and RT1-Db1 were hub genes of several different pathways that might interact with each other. Detailed information pertaining to genes enriched in different GO terms and KEGG pathways is provided in supplementary Table S6.

PPI (Protein-protein interaction) network analysis.
To explore interactions among the 195 overlapped genes, PPI was applied, and the most important modules were then screened. A PPI network with 51 nodes was obtained, and the average number of neighbors was 6.9. The hub nodes with the greatest number of neighbors (≥8), such as Abcg8, Cyp1a1, and Cyp7a1, were identified (Fig. 5A). By Molecular Complex Detection  (MCODE), one significant module with the highest score (11.5) was obtained, including 12 nodes such as Cyp7b1, Cyp51, and Hmgcr (Fig. 5B). A sub-network with the first neighbors was constructed (Fig. 5C).

Validation by real-time PCR.
To validate our findings based on RNA sequence data, we selected 10 mRNAs and 4 miRNAs (half were up-regulated, and half were down-regulated) to perform real-time PCR experiments. Log 2 (fold change) values were transformed based on the ratios between NAFLD and normal group average expression values. As shown in Fig. 8, the RNA expression patterns in real-time PCR data were consistent with RNA sequence data, which might partially validate the reliability of our sequence data and our findings in the present study.

Discussion
In the present study, we combined the analysis of mRNA and miRNA profiles and constructed PPI, pathway interaction and miRNA-mRNA regulatory networks to enhance our understanding of molecular mechanisms of NAFLD and mine potential biomarkers.
By integrated analysis, we observed 14 decreased miRNAs (such as miR-33, miR-33-5p, miR-33-3p, miR-144-3p, and miR-99a-3p) and constructed 459 miRNA and mRNA regulatory pairs in NAFLD rats. MiR-33 is a key regulator of lipid metabolism and transport that targets ATP-binding cassette A1 (ABCA1) and thus suppresses HDL synthesis by attenuation of cholesterol efflux to apolipoprotein A1 and nascent HDL 19 . Studies have reported that genetic ablation and antagonism of miR-33 attenuates the progression of atherosclerosis in mice 20,21 . However, there are conflicting reports regarding NAFLD. A study reported that persistent inhibition of miR-33 in high-fat-diet-fed mice might cause deleterious effects such as moderate hepatic steatosis and hypertriglyceridemia 22 . Researchers observed that high-fat-diet feeding reduced miR-33 expression along with an increase in body weight, fasting blood glucose, triglyceride, cholesterol, AST and ALT in NAFLD mice 23 . Altered expression of miR-33-5p was also reported in Western-diet-induced NAFLD rats and high-fructose-diet-induced chronic metabolic disorders mice 24,25 . However, little is known about miR-33-3p in NAFLD. Intriguingly, we observed that the expression levels of miR-33, miR-33-5p and miR-33-3p were significantly decreased in high-fat-diet-induced NAFLD. All data suggested that miR-33, its guide strand miR-33-5p and passenger strand miR-33-3p might play pivotal roles in NAFLD. Moreover, the miRNA-mRNA regulatory networks revealed that 23 genes interacted with both miR-33-5p and miR-33-3p. Among those genes, Apln and Vcan have been reported to be associated with NAFLD.One study reported that the serum concentration of apelin-36 (Apln gene encoded protein) was significantly higher in NAFLD patients than in controls 26 . Another study revealed that Vcan encoded versican might serve as a novel modulator of hepatic fibrosis. Hepatic expression of versican increased in cirrhotic livers and a mouse model of fibrosis, while transient knockdown of versican inhibited expression of fibrogenic genes 27 . In the present study, we observed that Apln and Vcan were up-regulated in NAFLD, consistent with the results reported by other researchers. Collectively, there is strong evidence indicating the potential roles of miR-33-3p and its target genes in NAFLD.
Researchers reported that the serum level of miR-99a was significantly decreased in NAFLD patients compared with that in healthy controls 28 . Decreased miR-144 could enhance TNF-α and IFN-γ production, which might contribute to the progression of NAFLD in rats 29 . In addition, miR-144 might contribute to the pathogenesis of NASH in morbidly obese subjects 30 . In the present study, although we did not observe altered expression of miR-99a or miR-144, we found that miR-99a-3p and miR-144-3p were significantly down-regulated in NAFLD  rats and constructed a series of miRNAs-mRNAs regulatory pairs. We also observed that NAFLD-related genes Apln, Vcan, Cd36 and Cyp7a1 were targets of miR-99a-3p and miR-144-3p 26,27,31,32 . Further investigation of the identified DEMs and their target genes may help explain the molecular mechanisms and identify novel biomarkers of NAFLD.
In addition to the down-regulated miRNAs, we obtained 7 up-regulated miRNAs (miR-200b, miR-200b-3p, miR-200b-5p, miR-200c-3p, miR-378b, miR-184, and miR-349) and 114 correlated miRNA and mRNA regulatory pairs. MiR-200b belongs to the miR-200 family, which is organized into two groups based on a single nucleotide difference in their seed sequence (group A: miR-141 and -200a; group B: miR-200b, -200c and −429) 33 . Researchers observed that expression of miR-200b and miR-200c increased in high-fat-diet-induced NAFLD rat livers and free-fatty-acid-treated HepG2 cells and human hepatocytes 34 . Another study reported that up-regulation of miR-200b in the livers of mice fed a choline-and folate-deficient diet was strongly correlated with the severity of NAFLD 35 . However, there are no reports about the correlation of miR-200b-3p, miR-200b-5p and miR-200c-3p with NAFLD. In the present study, we observed up-regulation of miR-200b in NAFLD rats, consistent with previous studies. Interestingly, we also observed that the expression levels of miR-200b-3p, miR-200b-5p and miR-200c-3p were remarkably increased in NAFLD rat livers. GO and KEGG pathway analysis revealed that their target genes were enriched in cholesterol biosynthetic process, metabolic pathways, etc., implying the crucial roles of the three miRNAs in NAFLD. Additionally, miRNA-mRNA regulatory network analysis indicated that five down-regulated genes (Abcg8, Cyp1a1, Tmem255a, Gstm6 and Lat) in NAFLD rats were common targets of miR-200b-3p, miR-200b-5p and miR-200c-3p. Notably, the absence of ABCG5/ABCG8 (ATP binding cassetteG5/G8) reduces biliary cholesterol secretion and results in hepatic cholesterol accumulation, acceleration of obesity, and exacerbation of NAFLD in mice 36 . It was also reported that CYP1A1 (cytochrome P450 1 A 1) protein expression was decreased in methionine-choline-deficient-diet-induced NAFLD mice and that 3,3′-diindolylmethane significantly up-regulated the protein expression of CYP1A1, which could be used as a potential therapeutic candidate for NAFLD 37 . The data presented in other studies are consistent with our findings, suggesting that miR-200b-3p, miR-200b-5p, miR-200c-3p and their target genes might be crucial regulators in the pathogenesis of NAFLD. Further investigation of the three miRNAs and their target genes may yield promising therapeutic targets of NAFLD. Additionally, miR-349 was reported to be up-regulated in hepatitis C virus-induced liver fibrosis 38 . Interestingly, we observed that miR-349 was increased in NAFLD rats, which interacted with 29 target genes, e.g., Abcg8, Cyp1a1, Fads1, Lats2, etc. In addition to Abcg8 and Cyp1a, Fads1 has also been reported to be correlated with NAFLD 39 . Therefore, miR-349 may serve as a new biomarker of NAFLD, which is worthy of in-depth investigation.
Furthermore, the functional enrichment of the miRNA target genes revealed that numerous genes were enriched in NAFLD-related processes and pathways such as cholesterol biosynthetic process, immune response, steroid biosynthesis, NF-κB signaling pathway, fat digestion and absorption and bile secretion [40][41][42][43] . The constructed pathway interaction networks indicated that different pathways could interact with each other through specific genes or hub genes, such as Abcg8, Acat2, Cyp7a1, Cyp1a1, Cd36, Cd44, and RT1-Ba. PPI network and MCODE module analysis also identified a batch of hub genes, including Abcg8, Cyp1a1, Cyp7b1, Cyp51, and  Hmgcr. In addition to the previously mentioned Abcg8, Cd36 and Cyp1a1, genes such as Cyp7b1, Cyp51 and Hmgcr have also been reported to be correlated with NAFLD. It was reported that mRNA and protein expression levels of CYP7B1 were increased in human NAFLD 44 . Down-regulation of Cyp51 was observed by transcriptome analysis 45 . The hepatic expression of Hmgcr was reduced in morbidly obese individuals with liver steatosis and fibrosis 46 . In-depth investigation of these specific genes or hub genes may provide insight into the molecular mechanisms of NAFLD and identified novel biomarkers.
Several weak points in our present study should be noted. First, our results are based on rat models; few findings have been replicated across human studies 47 . Discrepancies in the findings are likely due to differences in study design, phenotypic endpoint, genetic alterations, etc. In-depth investigation should be performed to generalize our findings to humans. Second, we performed a bioinformatics analysis based on high-throughput RNA sequencing data and conducted several RNA expression validations by real-time qPCR; thorough validation and further functional experiments should be performed in the future. Third, we focused on exploring the negative miRNA-mRNA regulatory pair, considering the down-regulation role for miRNA revealed by available studies. However, recent evidence suggests that some miRNAs could activate gene expression in specific cell types and conditions with distinct transcripts and proteins. Given this complexity, further exploration should be performed to uncover the puzzling and interesting aspects of gene regulation by miRNAs.
In conclusion, our present integrated analysis may open a new avenue toward understanding the molecular mechanisms associated with the pathogenesis of NAFLD and provide potential novel biomarkers or candidate targets of NAFLD. Thorough and comprehensive investigation of the identified DEMs and their target genes should be performed in the future, which may ascertain novel biomarkers of NAFLD.

Methods
Animals sample collection. As previously described 17 , five-week-old male Wistar rats (130 g ± 10 g) were obtained from Shanghai SLAC Laboratory Animal CO. LTD, China, and maintained in a temperatureand humidity-controlled room. Fourteen rats were randomly divided into two groups: a normal group (n = 7), fed with a chow diet; and an NAFLD group (n = 7), fed with a high-fat diet (88% chow diet, 10% lard and 2% cholesterol) Animals were weighed and sacrificed after 4 weeks of feeding. Liver tissues were quickly removed and stored at −80 °C after being snap frozen in liquid nitrogen. All animal procedures were approved by the Animal Experiment Ethics Committee of Shanghai University of Traditional Chinese Medicine, and all methods were performed in accordance with the relevant guidelines and regulations.
RNA extraction and sequencing. To obtain mRNA and miRNA expression profiles, total RNA of the liver (100 mg) was extracted using TRIzol reagent (Invitrogen, USA). RNA quality was verified using an Agilent 2100 Bio-analyzer (Agilent Technologies, Santa Clara, CA). Libraries for mRNA and miRNA sequencing were respectively generated using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB,USA) and the NEBNext Multiplex Small RNA Library Prep Kit for Illumina (NEB,USA) according to the manufacturer's instructions. Then, the final purified libraries were evaluated using a BioAnalyzer 2100 and quantified. Using the Illumina HiSeq platform, libraries for mRNA were sequenced, and 150 bp paired-end reads were generated; moreover, libraries for miRNA were sequenced, and 50 bp single-end reads were generated according to the manufacturer's instructions. Seven biological replicate samples of each group were used, and a total of 28 libraries were sequenced.
RNA sequencing data processing. We performed quality control on raw sequence data using the FastQC tool. To obtain high-quality, clean data for analysis, adaptor sequence trimming and removal of low-quality reads were performed 48 . Clean reads were aligned with the rat genome using TopHat v2.1.1. Based on the ultra-high throughput short-read aligner Bowtie, TopHat aligned RNA-Seq reads to reference genomes and analyzed the mapping reads to determine possible splice junctions between exons. RNA expression levels were determined by reads per kilobase per million mapped reads. Hierarchical clustering of representative mRNAs and miRNAs expressions was performed to reveal reproducibility in biological replicates.
The DESeq package was used to detect DEGs and DEMs between the normal and NAFLD groups, with thresholds of a two-fold change (the ratio between seven NAFLD rats' and seven normal rats' averaged signal values) and an FDR less than 0.05. The DEGs and DEMs were visualized on a volcano plot, which was constructed by plotting − log 10 (FDR) on the y-axis and log 2 (fold change) on the x-axis.
Target genes prediction, functional enrichment. The target genes of DEMs were predicted by the online tool of miRWalk (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/index.html). The intersection of DEMs predicted target genes, DEGs were identified by a Venn diagram and the overlapped genes were used for further analysis.
Using DAVID 6.8 (https://david.ncifcrf.gov/), based on the default rat whole genome background, GO analysis was performed to help elucidate the concrete biological functions of specific genes, and KEGG pathways enrichment was employed to identify the critical signal pathways of the differential expressed genes. Any GO terms and KEGG pathways with p values less than 0.05 were considered significantly enriched. PPI, pathway interaction, miRNA-mRNA regulatory network construction. The overlapped genes were mapped to the STRING database (https://string-db.org/) to screen PPI, and networks were visualized by Cytoscape. Important hub proteins were screened by counting the degree of connectivity of each node in the network. In the network, nodes represented proteins and lines represented the interactions. The most significant modules were identified with the plug-in MCODE of Cytoscape with a cutoff MCODE score of >5. The enriched pathway interaction networks were constructed using the Cytoscape plug-ins ClueGO and CluePedia. Negatively correlated, differentially expressed miRNA-mRNA pairs were selected for screening, and the regulatory networks were constructed and visualized by Cytoscape software. Sub-networks were also constructed with mRNAs connected to a large number of different miRNAs (≥5).
Real-time quantitative PCR assay. Total RNA was extracted from liver tissues using TRIzol (Invitrogen, China) according to the manufacturer's protocol. For miRNA detection, total RNA was polyadenylated and then submitted to reverse transcription with primers and reverse transcriptase (Promega, USA) according to the manufacturer's instructions. For mRNA detection, RNA was reverse transcribed into cDNA and then subjected to a real-time quantitative PCR (qPCR) assay. The qPCR assay was performed in triplicate with the SYBE Green PCR Kit on an ABI StepOnePlus Real-time PCR system (Applied Biosystems, USA). Cycle threshold (CT) values were calculated using the automated settings of the system. U6 and GAPDH were used as the internal controls for miRNAs and mRNAs, respectively. The primer sequences are listed in Tables 3 and 4.
Statistical analysis. Data were expressed as means ± SD and were analyzed by one-way analysis of variance (ANOVA) using SPSS v18.0 software. P values less than 0.05 were considered significantly different.
Data availability statement. All data generated during this study are included in this published article (and its Supplementary Information files).   Table 4. Primer information for PCR verification in mRNAs.