Background & Summary

MicroRNAs (miRNAs) are small noncoding RNAs (21–24 nucleotides) that mediate post-transcriptional regulation of gene expression by binding to the 3′ untranslated regions of messenger RNAs (mRNA)1,2,3,4,5. miRNAs play pivotal roles in biological processes such as differentiation, development, tissue growth, and tumor initiation and progression2,4,6. miRNAs can serve as potential clinical biomarkers for disease diagnosis, prognosis, and treatment selection7,8,9, as well as therapeutic targets for human diseases10,11,12. However, the development of miRNA-based biomarkers and therapies is hampered due to incomplete understanding of the characteristics of miRNA profiles across different organs, various developmental stages, or sexes of rat, an extensively used animal model for evaluating drug safety and understanding drug mechanisms of action. Furthermore, the rat miRNA transcriptome is much less well annotated compared to that of mouse and human, with 764, 1915, and 2588 miRNAs annotated in miRBase v.22.1 (http://www.mirbase.org/) for rat, mouse, and human, respectively13,14. Therefore, characterizing rat miRNA expression profiles more comprehensively is warranted, especially with the advancement of high-throughput technologies such as microarrays and next-generation sequencing.

Several efforts for constructing a catalog of rat miRNA expression have been reported. For example, Minami et al.15 reported a dataset of microarray-based expression profiles of 424 miRNAs cross 55 different organs and tissues of 10-week-old male Sprague-Dawley rats. Similarly, Bushel et al.16 developed the RATEmiRs database consisting of miRNA-seq expression data across 14 organs among 12/13-week-old Sprague Dawley rats of both sexes17, through six sequencing batches. These studies mainly focused on miRNA expression differences among various organs. However, the diversity and number of expressed miRNAs in animals is not only organ-specific18,19, but also shows age dependence1,20,21 and sex biases22,23. Specifically, the role of miRNAs in the aging process has often been overlooked and is therefore largely unclear.

Ideally, studies on rat transcriptome should simultaneously cover multiple organs across different developmental stages for both sexes. Indeed, in a widely cited previous study24,25 as part of the Sequencing Quality Control (SEQC) consortium26, we constructed a comprehensive rat transcriptomic BodyMap with RNA-seq of 320 samples from 11 organs of both sexes of juvenile, adolescence, adult, and aged Fischer 344 rats and identified a large number of transcripts showing organ-specific, age-dependent, or sex-specific expression patterns. The unique rat mRNA dataset has already been widely used by scientific communities to advance our understanding of the dynamics of rat transcriptome, including studies on circRNAs27,28,29, the development of annotation tools30,31, and enrichment of database resources32,33.

Here, as a follow-up and complementary study, we constructed a comprehensive rat miRNA BodyMap by generating miRNA-seq data from the same 320 samples and described the quality and the dynamic characteristics of miRNA expression profiles across 11 organs, between two sexes, and over the life span of the rats (Fig. 1). We quantified 604 out of the 764 miRNAs annotated in miRbase v22.1 (http://www.mirbase.org/). In addition, to explore the potential of novel miRNA discovery with this dataset, we used the miRDeep2 algorithm34,35 and identified 12 organ-enriched novel miRNA candidates. The miRNA-seq expression data were then analyzed in conjunction with the corresponding mRNA expression profiles. The quality of the miRNA-seq data was found to be high as seen from various aspects such as raw sequences, mapped sequences, and biological replicates.

Fig. 1
figure 1

Overview of study design. 320 total RNA samples were collected from 16 female rats and 16 male rats of the Fisher 344 strain, including four rats for each sex under each of the four developmental stages, and ten organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, testis, or uterus) per rat. Aliquots of the same RNA samples have also been used to construct mRNA dataset previously24,25 (a), which makes it possible to integrate miRNA data with mRNA data. (b) Schematic overview of the miRNA-seq workflow: miRNA libraries were prepared and sequenced for each of the 320 samples. For miRNA quantification, adapter sequences were removed, followed by a read-quality filter. Reads of high quality were then mapped to miRBase, piRBase, GtRNAdb, and the NCBI rat transcriptome and genome. In addition, reads from all 320 samples were pooled for novel miRNA discovery.

The dataset we present here can help researchers study the characteristics of rat transcriptome and identify novel miRNAs. In addition, along with the mRNA dataset from the same set of samples, this miRNA-seq dataset provides a unique resource for data integration.

Methods

Organ collection and RNA isolation

Female and male Fischer 344 rats (pair-housed under standard conditions) from the National Center for Toxicological Research of the US Food and Drug Administration animal-breeding colony were euthanized by carbon dioxide asphyxiation at 2, 6, 21 and 104 week-of-age as previously described24,25. In summary, 320 tissue samples were derived from 16 female and 16 male rats of the Fischer 344 strain, including four rats for each sex under each of the four developmental stages and ten organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testis or uterus) per rat. That is, four biological replicates per sample group were used in this study. Ground organ tissue was stored at −80 °C. Total RNA was extracted from ground tissue by using the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol, including treatment with DNase. RNAs longer than 18 nucleotides were recovered with this method. The quality of each RNA sample was examined with an Agilent 2100 Bioanalyzer.

Construction and sequencing of miRNA-seq libraries

Small RNA libraries were constructed using Illumina TruSeq Small RNA Library Prep kit (the sequences of adapters and primers are provided in Table 1) by following the manufacturer’s Reference Guide with some modifications. In brief, 1 µg of total RNA (including miRNA) as input was sequentially ligated with 3′ and 5′ adapters, and the ligated RNA fragments were then reverse transcribed to single-stranded cDNAs using Superscript II reverse transcriptase (Invitrogen) and RT primer. The cDNAs were amplified by PCR with 13 cycles, and the resulting products were further purified with 2× Agencourt AMPure XP beads (Beckman), followed by size selection using the Pippin Prep Size Selection System (Sage Science). The final libraries were validated using Agilent High Sensitivity DNA assay, and subjected to 50 bp single-end sequencing (SE50) on an Illumina HiSeq. 2500. No spike-in controls or negative controls were used in the construction of this dataset.

Table 1 Sequence information of the Truseq-small RNA kit.

Pre-processing and processing of the reads

The adapter sequences were removed with fastp 0.23.2 software (http://opengene.org/fastp/fastp.0.23.2), followed by a quality filter and a length filter. Reads with more than 2 “N” bases were discarded. The clipped reads with length between 16 and 35 nt were retained for alignment to various transcript types, including miRNA (miRbase v22.1, http://www.mirbase.org/), piRNA (piRBase, http://www.regulatoryrna.org/database/piRNA/download.html), tRNA (GtRNAdb, http://gtrnadb.ucsc.edu/GtRNAdb2/genomes/eukaryota/Rnorv7/Rnorv7-seq.html), other RNA (NCBI transcriptome, https://www.ncbi.nlm.nih.gov/genome/73), and the rat genome (Rn7, https://hgdownload.soe.ucsc.edu/downloads.html#mammals). The alignment was performed with Bowtie 1.3.1 (http://bowtie-bio.sourceforge.net/manual.shtml) with less than 2 mismatched bases allowed.

Filtering of miRNAs and samples

The following strategies were used to filter miRNAs and samples of questionable quality: (1) the low-expressed miRNAs less than one count per library on average, were discarded. (2) a sample was removed if it failed to cluster with other samples from the same organ type in a hierarchical clustering analysis. Spl_F_104_4 and Brn_M_006_3 samples were removed, as they clustered to uterus samples instead of spleen and brain, respectively. Ultimately, a miRNA expression matrix with 604 miRNAs and 318 samples was obtained for further analysis.

Identification of novel miRNA candidates

First, reads from all 320 samples were pooled for identification of novel miRNA candidates. Using the mapper module provided within miRDeep2.0.1.3, raw reads of the merged samples were subjected to a series of stringent filters (discarding low-quality reads and reads with fewer than 18 nt after clipping the 3′ adapter), and the remaining sequences were then mapped to the rat genome reference (Rn7). Next, the mapped reads were submitted to the miRDeep2 module to detect novel miRNAs with default parameters. Novel miRNAs that have passed the stringent filters (miRDeep2 score ≥10, significant Ranfold P-value and no rfam alert of the possibility of being a rRNA) were selected as the potential novel miRNA candidates.

To validate the organ-specificity of the novel miRNA candidates, we quantified the novel miRNA expressions across 320 samples and performed differential expression analysis between any two organs at each developmental stage. A miRNA is considered ‘organ-specific’ if it is over-expressed by at least 1.5 fold (fold-change > 1.5 and adjusted P-value < 0.05) in one organ over all other organs and across all four developmental stages. Finally, the 12 organ-specific sequences were reported as novel miRNA candidates in Online-only Table 1.

Data Records

The miRNA-seq dataset generated in study is available in the NCBI Gene Expression Omnibus (GEO) with series accession number GSE17226936. This accession contains both the raw sequence data files (fq.gz format) and the processed data files (raw counts of mapped sequencing reads) used in this report. All data can be used without restrictions.

Technical Validation

Quality of RNA samples

The quality of RNA samples was assessed with an Agilent Bioanalyzer. Most samples showed good RNA Integrity Number (RIN 9.08 ± 1.08, mean ± sd), except for the eight week-2 spleen samples with RIN bellow 5.2. The RNA integrity numbers of all the 320 samples were listed in Online-only Table 2.

Quality of libraries

The quality of miRNA sequencing data for each sample was checked (Fig. 2). Briefly, a total of 1.6 billion raw reads was obtained, with 5.1 million reads per sample on average, which is sufficient for miRNA quantification. After quality filtering and length filtering, 4.5 ± 2.8 million reads of high quality were retained per sample on average (Fig. 2a). The proportions of various transcript types of these reads were demonstrated in Fig. 2b. The majority of libraries were contributed by miRNA and piRNA sequences, followed by tRNA, rRNA, mRNA and other sequences from the rat transcriptome and genome. Most libraries were dominated by miRNA reads (75.6 ± 8.5%), except for the testis samples from the adolescent and adult rats, which were dominated by piRNA (73.8 ± 0.8%). However, the proportion of miRNA for the Brn_F_104_4 sample was 14.4%, much lower than the average proportion of brain samples.

Fig. 2
figure 2

Quality of sequencing reads. (a) The overall height of each bar represents the number of total reads sequenced from each RNA sample, and the green part and orange part correspond to the reads which either passed or failed the quality filter, respectively. High-quality reads entered the following mapping workflow. (b) Percentages of the reads that mapped to miRNA, piRNA, tRNA, rRNA, miscRNA, mRNA, other RNA, as well as the reads that mapped to none of the aforementioned transcript types but mapped to the rat genome, are represented by different colours. Abbreviation for organs, Adr, adrenal; Brn, brain; Hrt, heart; Kdn, kidney; Lng, lung; Lvr, liver; Msc, skeletal muscle; Spl, spleen; Thm, thymus; Tst, testis; and Utr, uterus.

Reproducibility of biological replicates

To assess the reproducibility of biological replicates, we calculated the Pearson’s correlation coefficients between any two of the 320 samples (Fig. 3a–c). The Pearson’s correlation coefficients between miRNA expressions of biological replicates from the same group (0.98 ± 0.01, mean ± sd) were much higher than those from different sample groups (0.8 2 ± 0.09) (Fig. 3a). This result indicated good consistency among biological replicates in which biological variations between animals and technical variations between sample processing were included. Moreover, this conclusion was further confirmed in principal component analysis, from which the majority of biological replicates could be grouped together (Fig. 4a–d).

Fig. 3
figure 3

Reproducibility of biological replicates. (a) Distributions of the pairwise Pearson correlation coefficients between the biological replicates (navy blue) and between samples from different groups (light blue). (b) Correlations between four biological replicates from each of two particular groups (adrenal gland and brain samples of female rats at week 6) were demonstrated as an example. (c) Pairwise Pearson correlation coefficients between miRNA expressions of the 318 samples. Abbreviation for organs, Adr, adrenal; Brn, brain; Hrt, heart; Kdn, kidney; Lng, lung; Lvr, liver; Msc, skeletal muscle; Spl, spleen; Thm, thymus; Tst, testis; and Utr, uterus.

Fig. 4
figure 4

Expression profiles of miRNAs are discriminating among different organs. miRNA expression profiles are depicted by the top three principal components (ad). Each point corresponds to one of the 318 samples. Samples from different organs are represented by different colours, and those of different ages are represented by different shapes. (e) Hierarchical clustering analysis (HCA) of expression profiles from 318 rat samples with 604 miRNAs. Heatmap (left) for normalized log2 (CPM) values. Right, normalized loadings of each miRNA for the top three PCs. Abbreviation for organs, Adr, adrenal; Brn, brain; Hrt, heart; Kdn, kidney; Lng, lung; Lvr, liver; Msc, skeletal muscle; Spl, spleen; Thm, thymus; Tst, testis; and Utr, uterus.

Validation of organ specific signature

To obtain an overview of miRNA expression profiles, we performed principal component analysis and hierarchical clustering (Fig. 4a–e). The miRNA profiles were discriminating among different organs (Fig. 4e). To identify the miRNAs that contribute the most to organ-specific expression, we set out to identify ‘organ-enriched’ miRNAs. A miRNA is considered ‘organ-enriched’ if it is over-expressed by at least four folds in one organ over any other organs and across all four developmental stages, a quite stringent criterion. Consequently, 67 organ-enriched miRNAs were identified, including a novel miRNA candidate. The z-scaled expression levels of these miRNAs were shown in Fig. 5a.

Fig. 5
figure 5

Validation of organ-specific miRNA signatures in a microarray dataset. (a) Expression profiles of 67 organ-enriched miRNAs across 318 samples are arranged by organ type. Expression data were Z-score scaled per gene across all 318 samples. Each row represents a miRNA that is enriched in an organ. (b) Expression profiles of 31 organ-enriched miRNAs across 23 tissues in the microarray miRNA expression dataset of Minami et al.15 are arranged by biological system. Each row represents a miRNA that is enriched in an organ. Each column represents a sample profiled in the microarray dataset. Abbreviation for organs, Adr, adrenal; Brn, brain; Hrt, heart; Kdn, kidney; Lng, lung; Lvr, liver; Msc, skeletal muscle; Spl, spleen; Thm, thymus; Tst, testis; and Utr, uterus.

To independently validate the organ-enriched miRNAs discovered in this study, the literature-reported miRNA profiles of 55 different organs and tissues from normal male rats based on Agilent miRNA microarrays were used15. Thirty-one (31) organ-enriched miRNAs identified in our study, including brain-enriched, testis-enriched, liver-enriched, and spleen-enriched and heart-enriched miRNAs, were also profiled in the miRNA microarray dataset and displayed a pattern of high-level expression in biological systems with functions similar to those of the organs examined in this study (Fig. 5b). This result confirmed the reliability of the organ specific signature presented in our dataset.

Reliability of miRNA-mRNA integration

To examine the reliability of miRNA-mRNA integration using the miRNA dataset presented here and the previously published mRNA dataset, we compared the source of variability in these two datasets. Principal variance component analysis was performed on miRNA and mRNA expression profiles respectively (Fig. 6a). As expected, the ‘organ’ factor accounted for the most total variance in both datasets, with 61.6% for miRNA and 64.2% for mRNA profiles. The variance attributable to each factor was similar in the mRNA and miRNA datasets, except for age. The overall expression difference due to age was significantly higher for miRNA (11.7%) than that for mRNA (1.7%) expression. However, there was a stronger interaction between age and organ type for miRNA (15.7%) compared to that for mRNA (2.5%) expression. It appeared that there is a more dynamic miRNA expression pattern across the four developmental stages independent of organ types, whereas the temporal expression of mRNA appeared to be more organ dependent than that of miRNA expression. These results highlight the concordance of the two datasets.

Fig. 6
figure 6

Reliability of miRNA-mRNA integration (a) Principal variance component analysis (PVCA) shows the relative contribution of main effects (organ, age, sex, and replicate) and interactions (:) to total model variance in miRNA (blue) and gene (red) expression. Effects in the Y-axis are ordered by proportion of overall variations explained in the miRNA dataset. (b) Pairwise correlations between all 10,418,396 miRNA-gene pairs (604 miRNAs versus 17,249 genes), 20 intragenic miRNAs on opposite strand of host genes, 147 intragenic miRNAs on the same strand of host genes.

Meanwhile, the genomic locus of miRNA genes was used to test the reliability of the association between these miRNA and mRNA datasets. We calculated the correlations between expression levels of miRNAs and those of genes (miRNA-gene pairs). The distributions of correlations between several types of miRNA-gene pairs were demonstrated in Fig. 6b. The correlations between all miRNA-gene pairs were weak (0.01 ± 0.28, median ± sd) as expected. Importantly, the co-transcription events of miRNA and host genes were observed in these datasets, as the expressions of intragenic miRNAs were highly positively correlated with those of their host genes on the same strand (0.60 ± 0.29, N = 179). In addition, the slightly negative correlations (−0.17 ± 0.35) between these miRNAs-host genes (opp) pairs confirmed that co-expression events are rare when intragenic miRNAs and the host genes are located on the opposite strands. In summary, the miRNA dataset along with the mRNA datasets provided a reliable and unique resource for integration.