Analysis of miRNAs and their target genes in five Melilotus albus NILs with different coumarin content

MicroRNAs (miRNAs) exhibit diverse and important roles in regulation of various biological processes at the post-transcriptional level in plants. In this study, Melilotus albus miRNA and their target genes were elucidated from five M. albus near-isogenic lines which differ in coumarin content to construct small RNA libraries through high-throughput sequencing. A total of 417 known miRNAs and 76 novel miRNAs were identified in M. albus. In addition, 4155 different target genes for 114 known miRNA families and 14 target genes for 2 novel miRNAs were identified in M. albus. Moreover, mtr-miR5248 and mtr-miR7701-5p target c35498_g3 and gma-miR396a-3p target c37211_g1 involved in coumarin biosynthesis were identified by using the differential expression of the miRNAs and their target genes correlation analysis. The abundance of miRNAs and potential target genes were validated by qRT-PCR analysis. We also found that there were both positive and negative expression changing patterns between miRNAs and their related target genes. Our first and preliminary study of miRNAs will contribute to our understanding of the functions and molecular regulatory mechanisms of miRNAs and their target genes, and provide information on regulating the complex coumarin pathway in M. albus for future research.


Results
Analysis of small RNAs in M. albus. A total of 23.8, 24.7, 23.5, 24.4 and 24.2 million reads were generated from N46, N47, N48, N49 and RPh, the five genotypes of M. albus, respectively. The Q20 values were greater than 98.81% and the GC percentages were 48.92%, 48.66%, 48.94%, 48.86% and 48.62% for the five genotypes (Additional file 1: Table S1). The sequencing data are available at the NCBI number Bioproject with accession number PRJNA356361 with biosample accession number SRS1842864, SRS1842865, SRS1842878, SRS1842884 and SRS1842885 for N46, N47, N48, N49 and RPh, respectively. After filtering out the reads with N%> 10%, low quality, 5′ adapter contaminants, 3′ adapter null or insert null and poly A/T/G/C, a total of 23.3, 24.2, 23.1, 23.9 and 23.7 million high quality clean reads were obtained, respectively (Additional file 2: Table S2). The length of the small RNAs was between 18 nt and 30 nt, and the sequences of small RNAs with 21, 22, 23 and 24 nt had the highest abundance (Additional file 3: Fig. S1). The length distributions of small RNAs were similar for five genotypes of M. albus. The most abundant small RNAs were found in 24 nt, representing 44.26%, 44.36%, 40.78%, 42.95% and 43.19% of the small RNAs in the five genotypes.
Identification of known miRNAs. A total of 417 known miRNAs were identified from five M. albus genotypes based on all unique plant known miRNAs libraries ( Table 2). These known miRNAs belonged to 114 miRNA families and they were found that the distributions were similar between the five genotypes. The analysis of transcripts per million (TPM) value for the known miRNA indicated that the expression frequency varied significantly from 0 to 298447. For example, TPM values of various members in the miR156 family were tremendously different from each other, ranging from 0 to 19676. Several families such as miR156, miR159, miR166, miR171, miR172 and miR396, were relatively abundant, whereas some families were not. The largest family was miR156 with 33 members (Additional file 4: Table S3). The base bias on the first nucleotide of miRNA and the miRNA nucleotide bias at each position are shown in Additional file 5: Fig. S2. The result revealed that these miRNAs started with a 5′-U, which is consistent with typical miRNA sequence patterns based on miRBase 19.0 (http://www.mirbase.org/).

Identification of novel miRNAs.
In total, 76 novel miRNA were identified among the five genotypes in M. albus. Among these novel miRNAs, 72, 75, 75, 72 and 75 were expressed in N46, N47, N48, N49 and RPh, respectively (Table 2). Similarly to the known miRNAs, the expression of the novel miRNAs also varied largely between the five libraries. Most of the novel predicted miRNAs had relatively highly abundances, and they were mostly observed in five genotypes. However, novel_146 was only observed in RPh (Additional file 6: Table S4). The novel miRNA sequences ranged from 18 to 24 nt in length, and 21 nt was the most frequent length. Differential expression analysis of miRNAs. The differential miRNAs were significantly expressed with more than one log 2 fold change and qvalue lower than 0.01 which were in different levels and comparison groups. In different coumarin expressions with a low β-glucosidase activity level between N48 and N46, 47 miRNAs were differentially expressed in these two genotypes. Among these differentially expressed miRNAs, 17 miRNAs were up-regulated and 30 miRNAs were down-regulated. A total of 52 differentially expressed miR-NAs were identified between N49 and N47 in a high level of β-glucosidase activity. Of these miRNAs, 19 were up-regulated and 33 were down-regulated. Fourteen miRNAs (5 up-regulated and 9 down-regulated) and 25 miRNAs (11 up-regulated and 14 down-regulated) were significantly differentially expressed between N47 and N46 and between N49 and N48, respectively (Fig. 1A). We also found that 24 miRNAs overlapped between N48 vs N46 and N49 vs N47 (Fig. 1B). More detailed information on the different miRNAs is shown in Additional file 10: Table S7. We found that M. albus represented the same coumarin level with a similar miRNAs expression pattern, demonstrating that genotypes with the same coumarin level clustered together according to their miRNA expression (Additional file 11: Fig. S4).
To identify the results of the miRNA sequencing and bioinformatics analysis, totally 28 known miRNAs with different expression profiles were selected randomly, and their expression profiles were used to verify the sequencing results by quantitative real-time polymerase chain reaction (qRT-PCR) analysis. The qRT-PCR expression pattern and TPM value of these miRNAs are shown in Fig. 2 and Table S8. The results showed that the majority of relative expression obtained by qRT-PCR was consistent with the sequencing results because the expression trends are similar. This indicated that the miRNA sequencing data here were reliable. For example, ppe-miR398b, mtr-miR5290, tae-miR395b, osa-miR398b, ath-miR396b-3p, mtr-miR5559-5p, ath-miR395a and mtr-miR395a showed a similar expression profile between sequencing and qRT-PCR. However, some miRNAs showed that the  qRT-PCR results were inconsistent with the sequencing results, i.e., stu-miR156f-5p, pta-miR319, ath-miR159c, ppt-miR319a, ahy-miR156a, and lus-miR159b.
Prediction and annotation of miRNA target genes. In order to gain insight into the functions of the identified M. albus miRNAs, the target genes of these miRNAs were predicted using the psRobot_tar program. The 114 known miRNA families had 4155 affiliated target genes and 2 of 76 novel miRNAs had 14 target genes. Moreover, some miRNAs targeted a single gene, whereas the other miRNAs targeted multiple genes. To better investigate the functions of miRNAs, the target genes were analysed with functional analysis, GO annotations and the KEGG pathway.
The results of GO analysis demonstrated that the target genes of the miRNAs could be enriched into 3128 GO terms (Additional file 12: Table S9). Among them, 20 biological process categories, 2 cell components categories and 1 molecular function category were significantly enriched (Fig. 3). Under the biological process, biological regulation, regulation of biological process, regulation of cellular process and others were significantly enriched. For the cellular component category, the nucleus was the most highly represented group. In the molecular function, ATP binding and DNA binding were significantly enriched. From the KEGG analysis, one hundred and seven pathways were found (Additional file 13: Table S10), and the top 20 enrichment pathways are shown in Fig. 4. Circadian rhythm, plant hormone signal transduction, starch and sucrose metabolism, isoflavonoid biosynthesis and nicotinate and nicotinamide metabolism were the most frequently represented pathways.
Furthermore, qRT-PCR was performed on 32 differentially expressed target genes randomly selected from the expression profile data to validate the assembly and annotation of the RNA-Seq data. The qRT-PCR expression patterns and FRKM values of these target genes are shown in Additional file 14: Fig. S5. The results showed that the majority of the expression levels of these selected genes obtained by qRT-PCR were consistent with the sequencing results.

Correlation analysis between miRNAs and target genes.
To better understand the possible roles of the miRNAs and their target genes, we analyzed the relationship between differentially expressed miRNAs and the expression of their target genes differentially combined with the results of the transcriptome sequencing. It was observed that 53 miRNA-target pairs showed correlations which contained 35 miRNAs and 30 target  (Table 3). And these miRNA-target pairs had a number of functions, such as multidrug resistance protein, regulation of transcription, photosystem II stability, assembly factor, DNA binding, high-affinity nitrate transporter and other proteins were involved in various biological processes from the functional annotations. In a different comparison group, various numbers of correlated miRNA-target pairs were found. In the N48 vs N46 group, 8 miRNA-target pairs showed a negative regulation pattern of 24 miRNA-target pairs. Meanwhile, 22 miRNA-target pairs had the correlation expression while 5 miRNA-target pairs were negative expressed between N49 and N47 ( Fig. 5 and Table S11). We further validated the expression of the identified miRNAs and investigated the expression of their target genes using qRT-PCR. As shown in Fig. 6, 36 miRNAs and their targets were randomly selected to analyse the expression patterns in M. albus. Comparing N48 and N46, for example, mtr-miR395a and tae-miR395b with their target gene c36499_g2 presented negative correlation, whereas Identification of target genes involved in the coumarin pathway. From the functional annotations of 4169 miRNA target genes, two shikimate O-hydroxycinnamoyltransferase (HCT) genes were found to be associated with coumarin synthesis pathways, including gene c35498_g3 for mtr-miR5248 and mtr-miR7701-5p and gene c37211_g1 for gma-miR396a-3p. Largely varied expression was observed for these miRNAs and their target genes in the five genotypes (Fig. 7). As an example, mtr-miR7701-5p showed higher expression in N48 and N49    than in N46 and N47. Additionally, miR7701-5p with its target gene c35498_g3 presented a positive expression in N48, whereas in N49 it showed the negative expression.

Discussion and Conclusion
miRNAs play important roles in the regulation of plant growth and development, biotic and abiotic stress responses and other biological processes. Various miRNAs in different plants have been discovered and characterized, and revealed the molecular mechanisms in regulating miRNAs 20 . Although miRNAs have been reported in many plants, such as Arabidopsis thaliana 21 , maize 22 , wheat 23 , and other plants 24,25 , no study has been performed on M. albus, and the miRNAs and their target genes of M. albus remain unknown. In this study, we studied the regulation of M. albus miRNAs and function of their target genes using a high-throughput sequencing analysis. The accuracy of the sequencing and the expression of the miRNAs and the miRNA target genes were analysed through qRT-PCR. The small RNA length distribution of M. albus indicated that the 24 nt small RNAs were the highest abundance, representing 44.26%, 44.36%, 40.78%, 42.95% and 43.19% in the five genotypes (Additional file 3: Fig. S1). Similar results have been found in the previous studies with other plant species, such as Medicago sativa L. 26 , Vigna mungo 27 and foxtail millet 28 . Therefore, as in other plants, the 24 nt small RNAs may also be involved in critical functions in M. albus. Small RNAs can be annotated into different categories, including rRNA, tRNA, snRNA, snoRNA, repeat associated sRNA, TAS, and sRNAs that could not be annotated. In previous studies, the majority of them were mapped to other unannotated and not annotated RNAs in the sRNA libraries 29,30 , and the results of our present study were similar with very few known and novel miRNAs. This suggests that many miRNAs currently have not been recognized. The expression level of various miRNAs was significantly different which had been reported previously 31,32 . Similarly to the previous study, the expression level of identified 417 known miRNAs and 76 novel miRNAs also showed a large variation in our current study. In this study, the number of miRNA family members varied greatly, ranging from 1 to 35 (miR156), similar to the previous studies 14, 33 . In addition, most of the predicted target genes annotated in consistent with the biological functions of the genes in foxtail millet 33 , Arabidopsis thaliana 34 , and wheat 35 . In short, our results suggests that the high-throughput sequencing analysis for M. albus miRNAs and their target genes was reliable mainly due to the study for the length distribution of small RNA, the function, expression and regulation of miRNAs and their target genes in M. albus were consistent with previous researches.
Furthermore, to gain insight into the regulatory function of miRNAs, a large number (4169) of target genes were identified and annotated by mapping to the GO and KEGG databases. In this study, the prediction of the target gene showed that miRNAs can target genes related to plant growth and development, including circadian rhythm, starch and sucrose metabolism, and plant hormone signal transduction. Previous studies had shown that miRNA families in different plants perform analogous regulatory functions 36 . More study of the miRNAs and their target genes will help further understanding the regulatory mechanism and function of M. albus miRNAs. The incomplete mRNA database may limit the comprehensive identification of target genes as well. Consequently, further construction of small RNA and libraries from different tissues and developmental stages should provide more insight into the network between miRNAs and their target genes in M. albus.
Investigations of coumarin biosynthesis were conducted during the 1960s and 1970s with the help of tracer-feeding experiments 37 . Several enzymes related to this biosynthesis have been identified, such as trans-cinnamate, trans-2-coumarate, trans-2-coumarate-β-D-glucoside and cis-2-coumarate 38 . However, to date, there is a general lack of gene information for the enzymes involved in the coumarin biosynthesis pathway. Recently, several branch pathways and enzymes catalysing coumarin-formation reactions in other plant species have been identified with the help of modern synthesis and molecular techniques [39][40][41] . Phenylalanine ammonia-lyase (PAL), which converts phenylalanine to cinnamic acid, is the first enzyme in the coumarin biosynthesis pathway. Trans-cinnamate 4-monooxygenase (C4H) then adds a hydroxyl group to produce 4-coumarate acid, and CoA is linked by 4-coumarate-CoA ligase (4CL) 42 . HCT belongs to the large family of BAHD-like acyltransferases 43 , a key enzyme in the phenylpropanoid and lignin biosynthesis pathway. A study on Arabidopsis demonstrated that HCT gene silencing led to significant changes in lignin content 44 . The role of HCT in coumarin biosynthesis also has been reported 3 . In our study, two HCT genes were identified, and they showed differential expression in five different genotypes. The regulation of coumarin biosynthesis is complex. Major details remain unresolved, and many of the P450-dependent enzymatic steps are largely unknown. Considering the importance of coumarin, the functions of the miRNA and their target genes which will be involved in the coumarin biosynthesis pathway need further investigation. It is important to identify Melilotus species and genotypes with low coumarin levels.
More positive and mixed correlations were found in our analysis of the expression pattern of the selected miRNA-target pairs than we had expected, which was similar to previous studies on Chinese cabbage 45 and Salix matsudana 46 . Such a comparison of the different coumarin contents in M. albus to identify the differentially expressed miRNAs may lead to further understanding of the post-transcriptional regulation of coumarin biosynthesis and its regulatory network. Currently, the differentially expressed miRNAs regulated from different comparisons among five M. albus genotypes have not been implicated in the regulation of coumarin biosynthesis. For example, we compared the expression levels of miRNAs between N48 and N46 (different coumarin expression at a low β-glucosidase activity level) and classified a total of 15 differentially expressed miRNAs with down-and up-regulation in response to target genes. MiR156, miR159, miR319 and miR5261were found to be significantly abundant in N46. Thus, these miRNAs may play important roles in coumarin biosynthesis.
In conclusion, this is the first time high-throughput sequencing has been employed to identify miRNAs and their target genes from five M. albus NILs. We identified 417 known and 76 novel miRNAs in five genotypes of M. albus. The predicted 4196 target genes were performed the functions annotation using GO and KEGG. Additionally, two target HCT genes were computationally predicted to involve in the coumarin biosynthesis pathway with three miRNAs. This study will provide useful information for future research on the functions and molecular regulatory mechanisms of miRNA and their target genes in M. albus. However, the predicted miRNA and their targets need to be further evaluated with its reference genome in case of any false-positive predictions. Furthermore, these selected miRNAs should be functional identified to understand the regulatory roles in M. albus coumarin biosynthesis pathway.

Methods
Plant materials and coumarin content determination. The five genotypes of M. albus, near-isogenic lines, N46, N47, N48, N49 and the recurrent male parent of RPh, were obtained from an initial cross of cucubb biennial plants × CuCuBB plants of RPh. Here, Cu/cu and B/b are two pairs of alleles affecting coumarin content and β-glucosidase activity, respectively. Then, cucubb segregates, which differ in coumarin content and β-glucosidase activity, were underwent six successive backcrosses to the RPh 47 . The seeds of five genotypes were planted in 20 cm plots containing agricultural soil with a photoperiod of 16 h light at 26 °C and 8 h dark at 18 °C in a greenhouse at Lanzhou University, Gansu Province, China. The leaves (three individuals from each genotype plant) were collected at the flowering stage. These young leaves were immediately frozen in liquid nitrogen and then stored at −80 °C for further use.
Fresh leaves from three individuals of each genotype were collected for the determination of coumarin content. For the assay of coumarin content, each replicate of five genotypes was combined and ground in a mill to pass a 1 mm screen for coumarin determination. 0.1 g of ground material was extracted twice with 1 mL of 60% ethanol at 30 °C for 30 min. Coumarin was quantified by high-performance liquid chromatography (HPLC) using the mobile phase of methanol-water (65:35) through an Agilent-XDB C18 column 48 . For the measurement of β-glucosidase activity, an enzyme-linked immunoassay assay (ELISA) was performed with a β-glucosidase activity assay kit (MeilianBio Co., Ltd, Shanghai, China) following the manufacturer's instructions. As shown in Table 4, the coumarin content in N48 and N49 was significantly higher (p < 0.05) than that in N46 and N47. For the same level of coumarin, N47 and N49 had higher β-glucosidase activity than N46 and N48, respectively. RPh had a similar coumarin content and β-glucosidase activity to N49.
RNA isolation, small RNA and mRNA library construction and sequencing. Total RNA of six fresh leaves as a sample in each M. albus genotype was isolated using the RNAprep pure Plant RNA Purification Kit (Tiangen Biotech, Beijing, China). Each sample was constructed three libraries. RNA degradation and contamination was monitored on 1% agarose gels. And the total RNA quantity and purity were determined with the OD260/280 ratio and checked using the NanoPhotometer ® spectrophotometer (IMPLEN, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). Small RNA and mRNA libraries were prepared using NEBNext ® Multiplex Small RNA Library Prep Set for Illumina ® (NEB, USA.) and NEBNext ® Ultra ™ RNA Library Prep Kit for Illumina ® (NEB, USA) following the manufacturer's recommendations, respectively, and index codes were added to attribute sequences to each sample. Then we performed single-end reads and paired-end reads on the Illumina Hiseq. 2500 platform from small RNA and mRNA libraries, respectively.
Identification of known and novel miRNA. After removing reads containing ploy-N, with 5′ adapter contaminants, without 3′ adapter or the insert tag, containing ploy A or T or G or C and low quality reads from raw data, the small RNA tags ranged from 18-30 nt were mapped to the reference sequence by Bowtie 49 with no mismatches using miRBase 20.0 as a reference. The known miRNA and the secondary structures were obtained through the software mirdeep2 50 and srna-tools-cli. The novel miRNA prediction was used the software miREvo 51 and mirdeep2 50 with the characteristics of hairpin structure of miRNA precursor. The parameter used to screen for "novel" miRNAs predicted using miRDeep2 were as follows: (a) Delete miRDeep2 score: <100; (b) The ratio of mature miRNA vs. miRNA*; and (c) we screened the predicted miRNAs strictly according to the hairpin structure, with only a 2-nt overhang, which is the hallmark of a bona fide miRNA. And the secondary structure, the Dicer cleavage site and the minimum free energy of the small RNAs were explored the prediction of novel miRNA. The sRNAs can be annotated into different categories, including rRNA, tRNA, snRNA, snoRNA, repeat associated sRNA, TAS, and sRNAs that could not be annotated.
Differential expression of miRNAs. The differential expression miRNAs analysis among the five libraries was performed using the DEGseq 52,53 R package. The P-value was adjusted to get qvalue 54 . miRNA with qvalue <0.01 and |log 2 (foldchange)|> 1 were considered as the significantly differentially expressed miRNAs. Prediction of potential miRNA target genes. The psRobot_tar in psRobot 55 was used for predicting the target gene of miRNA, and the same rules as previously reported 56,57 . The parameters to adjust include the following: (1) penalty score for the alignment between smRNAs and targets, which is defined by the formulas below; (2) the boundaries of essential sequence region, within which mismatches or gaps will receive double penalty scores than other regions; (3) the threshold for the total number of gaps within the smRNA and target alignment region; and (4) the region within which gaps are permitted. Degradome sequences mapped within the target sites will be analyzed and presented. To annotate the functions and pathways of the predicted target genes, the target genes of miRNAs were assigned to various GO based Wallenius non-central hyper-geometric distributions 58 and to the KEGG which used KOBAS 59 software to test the statistical enrichment.
Quantitative real-time PCR analysis. To analyse the miRNA expression pattern and the correlation between miRNAs and their target genes, miRNAs and their target genes with different expression patterns were selected for qRT-PCR. cDNA was synthesized from total RNA using the miRcute miRNA First-Strand cDNA Synthesis kit (Sangon, Shanghai, China) and Mir-X ™ miRNA qRT-PCR SYBR ® Kit (Takara, Dalian, China), according to the manufacturer's instructions. All qRT-PCR reactions were performed with three biological and three technical replicates for each sample in 96-well plates on an ABI 7500 Real-Time PCR System. The reactions were performed in a volume of 10 μL containing 1 μL of cDNA, 5 μL of SYBR Green PCR Master Mix (Applied Biosystems), 0.5 μL of each primer and 3 μL of double-distilled water. The primers used to amplify the miRNAs and target genes are listed in Additional file 15: Table S12. The following reaction conditions were used: denaturation for 10 min at 95 °C, 40 cycles of 95 °C for 15 s, and finally 60 °C for 1 min. A melting curve analysis with 95 °C for 15 s, 60 °C for 1 min, and 95 °C for 15 s was performed to produce a dissociation curve for verification of the amplification specificity. The relative expression of the selected miRNA and target genes were normalized using U6 and β-tubulin and analysed with the 2 −∆∆Ct method 60 .  Table 4. The information of coumarin content and β-glucosidase activity in five genotypes of M. albus.