Identification and characterization of miRNAs in two closely related C4 and C3 species of Cleome by high-throughput sequencing

Cleome gynandra and Cleome hassleriana, which are C4 and C3 plants, respectively, are two species of Cleome. The close genetic relationship between C. gynandra and C. hassleriana provides advantages for discovering the differences in leaf development and physiological processes between C3 and C4 plants. MicroRNAs (miRNAs) are a class of important regulators of various biological processes. In this study, we investigate the differences in the characteristics of miRNAs between C. gynandra and C. hassleriana using high-throughput sequencing technology. In total, 94 and 102 known miRNAs were identified in C. gynandra and C. hassleriana, respectively, of which 3 were specific for C. gynandra and 10 were specific for C. hassleriana. Ninety-one common miRNAs were identified in both species. In addition, 4 novel miRNAs were detected, including three in C. gynandra and three in C. hassleriana. Of these miRNAs, 67 were significantly differentially expressed between these two species and were involved in extensive biological processes, such as glycol-metabolism and photosynthesis. Our study not only provided resources for C. gynandra and C. hassleriana research but also provided useful clues for the understanding of the roles of miRNAs in the alterations of biological processes in leaf tissues during the evolution of the C4 pathway.

statures and short life cycles, are self-fertile and produce a large amount of seed 6 . These advantages make C. gynandra and C. hassleriana potential C 4 and C 3 models for identifying the differences in leaf development and physiological processes between C 3 and C 4 plants.
MicroRNAs are a class of 18-24 nucleotide (nt) small non-coding endogenous RNAs that are capable of regulating gene expression at the post-transcriptional level [9][10][11][12] . Research regarding A. thaliana, Oryza sativa and maize showed that plant miRNAs played important roles in regulating plant development 13,14 . For example, miR165 directs the cleavage of REV to regulate leaf morphogenesis 15 ; miR319 regulates leaf senescence by controlling the TCP transcription factors 10,16 ; and miR166 targets Leafbladeless1 (LBL1) to influence the specification of the adaxial/upper leaf surface 17 . The identification and characterization of miRNAs would provide helpful information for understanding the biological processes in plants.
Due to the merits of Cleome gynandra and Cleome hassleriana in studying the genetic basis of the differences in the cell biology and development between C 4 and C 3 plants, we used high-throughput sequencing technology to identify and characterize the miRNAs in the leaf tissues of these two species to gain some insights into the molecular changes that occur during the evolution of C 3 plants to C 4 plants. Thus, a number of known and novel miRNAs were detected in the two species, and the expression patterns of these miRNAs were profiled. By comparing our data with published data from Arabidopsis, the potential functions of these miRNAs were also investigated. Our study not only provided resources for studying C. gynandra and C. hassleriana but also provided useful clues for the understanding of the roles of the miRNAs in the differences in the biological processes in leaf tissues during the evolution of the C 4 pathway.

Results
Overview of small RNA sequencing data. To identify the miRNAs in C. gynandra and C. hassleriana, two small RNA libraries were constructed for the fully expanded leaves (approximately 5 cm in length) of these two species and then sequenced by an Illumina HiSeq 2500 analyzer. In total, 6,698,595 and 7,177,444 raw reads were obtained ( Table 1). The raw reads were further cleaned and trimmed by removing the low-quality reads and sequences that were either smaller than 18 nt or longer than 30 nt. Then, the clean reads were aligned with the Silva, GtRNAdb, Rfam and Repbase databases to filter the ribosomal RNA (rRNA), transfer RNA (tRNA), small nucleolar RNA (snoRNA) and the repeats, which in total, accounted for approximately 45.32% and 46.68% of the clean reads from the two libraries. Finally, a total of 1,761,554 and 1,687,208 clean unannotated reads were obtained for the two libraries (Table 1).
We investigated the length distribution of the small RNA reads ( Fig. 1a and b). The most abundant small RNA in C. gynandra and C. hassleriana was 21 nt in length, accounting for 55.78% and 60.58% of total sRNA reads, respectively. The second most abundant small RNA in C. gynandra was 20 nt, representing 10.91% of the total sRNA reads. In comparison, the second most abundant small RNA in C. hassleriana was 29 nt, representing 9.81% of total sRNA reads (Supplementary Table 1).
Identification of known miRNAs in C. gynandra and C. hassleriana. C. gynandra and C. hassleriana are members of Cleome, which is the most closely related genus to Arabidopsis thaliana 5,7 . Previous studies showed that C. gynandra, C. hassleriana and Arabidopsis thaliana had high sequence similarities 8 . We could use the well-annotated genome of Arabidopsis thaliana to identify the miRNAs and further investigate their expression in C. gynandra and C. hassleriana 18 .
The high-quality unannotated sRNA reads, which ranged from 18 to 30 nt, were aligned against the Arabidopsis thaliana genome. The mapped reads were further aligned with miRBase21.0 to identify the known miRNAs. In total, 94 and 102 known miRNAs were identified in C. gynandra and C. hassleriana, respectively, which were members of 32 plant conserved families (Table 2). Ninety-one known miRNAs were commonly identified in both C. gynandra and C. hassleriana (Table 2). Among these miRNA families, MIR156 contained the most abundant members, including nine members in C. hassleriana and eight members in C. gynandra. The second most abundant was the MIR166 family, which contained eight members in C. hassleriana and seven members in C. gynandra. Certain miRNA families were only detected in one species. For example, MIR397 and MIR828 were specific to C. hassleriana, while MIR161 and MIR8175 were specific to C. gynandra ( Supplementary Fig. 1).
Based on the number of sequencing reads, the miRNA families displayed a significantly varied abundance ( an average expression level of more than 1,000 TPM in both species. In contrast, certain miRNA families, such as MIR161, MIR170, MIR171, MIR395, MIR397 and MIR399, were observed to have a low expression level ( Table 2).

Identification of novel miRNAs in C. gynandra and C. hassleriana.
To investigate the novel miR-NAs in C. gynandra and C. hassleriana, the unannotated sRNA sequences were used to predict novel miRNA by exploring the hairpin structures based on comparisons with the genome sequence of Arabidopsis thaliana, the Dicer cleavage site and the minimum free energy using miREvo and mirdeep2 software. Finally, only four novel miRNA candidates were identified and temporarily named in the format of Cleome-novel-miR-number (Table 3). Cleome_novel_miR1 and Cleome_novel_miR2 were detected in both C. gynandra and C. hassleriana. Cleome_novel_miR3 was specific to C. gynandra, and Cleome_novel_miR4 was specific to C. hassleriana. These novel miRNAs showed a relatively low expression level in both species (Table 3).
Differentially expressed miRNAs between C. gynandra and C. hassleriana. In total, sixty-seven differentially expressed miRNAs were identified between C. gynandra and C. hassleriana, 21 of which were down-regulated and 46 were up-regulated in C. gynandra compared with those in C. hassleriana ( Fig. 2a and b). Among these miRNAs, some showed drastically changed expression patterns. For example, the expression levels of two members of the miR165 family, Cleome-miR165a-3p and Cleome-miR165b, were both reduced nearly ten-fold in C. gynandra compared with those in C. hassleriana (Supplementary Table 2); Cleome-miR159c had a greater than ten-fold higher expression level in C. gynandra than that in C. hassleriana (Supplementary Table 2). Certain miRNAs were expressed in a species-specific manner. For example, Cleome_novel_miR3, Cleome-miR8175 and Cleome-miR169c were only detected in C. gynandra, while Cleome-miR169a-3p, Cleome-miR397a, Cleome_novel_miR4, Cleome-miR398b-3p and Cleome-miR398c-3p were only expressed in C. hassleriana ( Fig. 2b and Supplementary Table 2). Some miRNAs that belong to the same miRNA family exhibited different expression patterns. For example, three members of the MIR395 family, Cleome-miR395a, Cleome-miR395d, and Cleome-miR395e, were up-regulated in C. gynandra, while three other members of this family, Cleome-miR395b, Cleome-miR395c and Cleome-miR395f, were down-regulated (Supplementary Table 2).
We identified 194 putative target genes for the differentially expressed miRNAs that were identified in our study (Supplementary Table 3). Külahoglu et al. (2014) had investigated the dynamic changes in the gene expression levels during leaf development between C. gynandra and C. hassleriana by high-throughput sequencing 19 . We analyzed these publicly available RNA-seq datasets to explore the expression levels of our putative target genes. In total, 51 of the 194 genes were found to be differentially expressed between C. gynandra and C. hassleriana, 38 of which had opposite miRNAs expression patterns between these two species (Supplementary Table 4). These 38 target genes were found to be the targets of 8 conserved and 1 putative novel miRNA families, including Cleome-miR156a-5p, Cleome-miR160a-5p, Cleome-miR164a, Cleome-miR171b-3p, Cleome-miR319c, Cleome-miR394a, Cleome-miR395a, Cleome-miR396a-5p and Cleome_novel_miR1 (Supplementary Table 4).

Function analysis of the identified miRNAs.
To investigate the roles of the miRNAs in C. gynandra and C. hassleriana, the putative target homolog genes of these miRNAs in Arabidopsis thaliana were predicted and analyzed using miRNA family assignment. Two hundred twenty-eight putative target genes were predicted for 68 miRNAs in C. gynandra. In addition, two hundred twenty-three target genes were predicted for 72 miRNAs in C. hassleriana (Table 4). Of the known miRNAs, the MIR156 family could target the most abundant genes, which has 15 putative target genes. The second most abundant family was the MIR161 family, which could match 13 putative target genes. Of the novel miRNAs, only Cleome_novel_miR1 and Cleome_novel_miR3 were predicted to have target genes. Notably, the novel miRNA, Cleome_novel_miR1, could match 129 putative target genes (Supplementary Table 3). No target genes were found for the remaining 29 miRNAs in C. gynandra and 33 miRNAs in C. hassleriana.  GO categories and KEGG pathway analyses were performed for the differentially expressed miRNAs between C. gynandra and C. hassleriana based on the annotations of their putative target homolog genes in Arabidopsis thaliana. For the biological processes, the enriched GO terms included cellular process, single-organism process, metabolic process, biological regulation, response to stimulus, etc. For the molecular function, these target genes were mainly enriched in binding, catalytic activity, and nucleic acid binding transcription factor activity. For the cell component category, the target genes were mainly involved in cell, cell part and organelle part (Fig. 3a). The KEGG pathway analysis showed that these target genes were enriched in twenty pathways, including pathways for     plant hormone signal transduction, ribosome, starch and sucrose metabolism, nitrogen metabolism, porphyrin and chlorophyll metabolism, and sulfur metabolism (Fig. 3b).

Quantitative real-time PCR of miRNAs.
To confirm the expression patterns of the differentially expressed miRNAs that were obtained from the sequencing analysis, we confirmed the expression levels of fourteen miRNAs using quantitative real-time PCR (qRT-PCR) and compared the differences in their expression levels between C. gynandra and C. hassleriana (Fig. 4). The results showed that 13 of the14 miRNAs had expression patterns that were similar to those of the sequencing data, which indicated that our sequencing data were reliable.

Discussion
In this study, we used two closely related C 4 and C 3 species, C. gynandra and C. hassleriana, to explore the genetic difference between C 4 and C 3 plants using high-throughput sequencing technology. Two small RNA libraries were constructed and sequenced. The obtained small RNA sequences were between 18-29 nt in length. Generally, the 21 nt small RNAs represent typical, mature miRNAs in plants. The most abundant small RNAs in C. gynandra and C. hassleriana were 21 nt in length, accounting for 55.78% and 60.58% of the total small RNAs, respectively. This result was consistent with that in the maize leaf 20 . The second most abundant sRNA in C. gynandra was 20 nt in length, while the second most abundant sRNA in C. hassleriana was 29 nt in length (Table S2). In some plants,  30 . Here, in our study, 94 known and 3 novel miRNAs were identified in C. gynandra, and 102 known and 3 novel miRNAs were identified in C. hassleriana. Ninety-one known and 2 novel miRNAs were commonly detected in both species. Our study is the first to report the identification of miRNAs in C. gynandra and C. hassleriana, which are valuable additions to the plant miRNA kingdom.
Some miRNAs in our study showed species-specific expression patterns. For example, Cleome-miR161.2 was present only in C. gynandra, and Cleome-miR397a was present only in C. hassleriana (Table 2). MiR161 has two forms, miR161.1 and miR161.2, which target PPR-P and regulate cytoplasmic male sterility [31][32][33] . OsmiR397 targets the laccase-like gene OsLAC and affects grain size and panicle branching 34 . The roles of these species-specific miRNAs in the development and function of leaf tissues in C. gynandra and C. hassleriana need to be investigated in the future.
In this study, most of the miRNAs, approximately 61.5%, were differentially expressed between C. gynandra and C. hassleriana. Of these miRNAs, some were reported to participate in the cell biology and development process of the leaf. For example, the MIR165 family directly cleaves the adaxial identity genes REV (AT5G60690) and PHB (AT2G34710) and was involved in the process of leaf morphogenesis 15,35 . In our study, the expression levels of two members of the MIR165 family, Cleome-miR165a-3p and Cleome-miR165b, were both reduced nearly ten-fold in C. gynandra (Supplementary Table 2). This expression pattern might be associated with the specified morphogenesis of the C 4 leaf in C. gynandra. miR164 mediates the cleavage of NAC1 (AT1G56010) and NAC2 (AT5G39610), which were involved in the processes of lateral root emergence and aging-induced cell death and leaf senescence, respectively 36,37 . The over-expression of ath-miR164a repressed the EIN3-induced early senescence phenotypes in Arabidopsis thaliana 38 . Whether miR164 still functions as an aging regulator or has a new function in the C 4 pathway needs to be further investigated. We also found that some miRNAs that belong to the same miRNA family have different expression patterns between C. gynandra and C. hassleriana. For example, the Cleome-miR395a/d/e was significantly up-regulated, while Cleome-miR395 b/c/f was significantly down-regulated in C. gynandra compared with those in C. hassleriana (Supplementary Table 2). The target genes of miR395 include GUN5 (AT5G13630), APS1 (AT3G22890) and AST68 (AT5G10180), which were reported to be involved in chlorophyll synthesis, the abscisic acid (ABA) pathway, and the sulfate metabolism pathway [38][39][40] . The diverse expression patterns of miRNAs suggested an orchestrated temporal and spatial regulation of the gene expression levels in various biological processes between these two species at the post-transcriptional level. One hundred ninety-four putative target genes were predicted for the identified miRNAs in our study (Supplementary Table 3). Among them, 38 were confirmed to have expression patterns that were opposite to those of the miRNAs in the publicly available data 19 , which indicated that these 38 genes might be regulated by miRNAs (Supplementary Table 4). We investigated the expression patterns of the putative target genes mentioned above and found that the target of miR164, NAC1 (AT1G56010), and the two targets of miR395, APS1 (AT3G22890) and AST68 (AT5G10180), were among the 38 genes that had expression patterns that were opposite to those of the miRNAs in the published data (Supplementary Table 4). However, the two targets of miR165, REV (AT5G60690) and PHB (AT2G34710), one target of miR164, NAC2 (AT5G39610), and one target of miR395, GUN5 (AT5G13630), had no such expression patterns. The miRNA-mediated gene regulation is complicated, which means that it might not cause the down-regulation of the target genes, e.g., translational inhibition 41 . In addition, different leaf samples and growth conditions were applied in the previous and our experiments. These differences might be the reasons why large numbers of target genes were not found to be differentially expressed or have opposite miRNAs expression patterns. To obtain more reliable results, small RNAome, transcriptome and degradome sequencings must be simultaneously conducted using the same leaf samples in the future.
The functions of differentially expressed miRNAs were investigated according to the annotation information for Arabidopsis (Table 4). However, there were still 29 miRNAs in C. gynandra that were not predicted to have target genes. These miRNAs might target species-specific genes, which could be validated in the future when the genome sequence and gene annotation of C. gynandra are available.
In this study, we explored the differences in the characteristics of miRNAs between C. gynandra and C. hassleriana using high-throughput sequencing technology. In addition to the identified known and novel miRNAs, the differentially expressed miRNAs and their functions were investigated in these two species. Because two different species were used, some of the observed differences in the miRNAs may be due to the differences in the genetic backgrounds rather than the differences between C 3 and C 4 photosynthesis. Future studies about the functions of these differentially expressed miRNAs will be helpful to clarify the question above and confirm the roles of these miRNAs in the process of photosynthesis. Our study not only provides a foundation for the further elucidation of miRNA function in C. gynandra and C. hassleriana but is also helpful for scientists to obtain some insights into the alteration in molecular pathways during the evolution of the C 3 pathways to C 4 pathways.

Methods
Plant material and RNA isolation. C. gynandra (also known as Gynandropsis gynandra) and C. hassleriana (also known as Tarenaya hassleriana) were grown in a standard potting mix in a 25 °C greenhouse under a 16:8 (light: dark) photoperiod. The fully expanded leaves (approximately 5 cm in length) were collected, immediately frozen in liquid nitrogen and stored at − 80 °C. The total RNA was isolated from the leaves using TRIzol reagent (Life Technology, CA, USA) according to the manufacturer's protocol. The total RNA concentration and purity were assayed with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). The RNA integrity was assessed on an Agilent 2100 Bioanalyzer Lab-on-Chip system (Agilent Technologies, Palo Alto, CA, USA). Small RNA fractions of 10-40 nucleotides were isolated from the total RNA pool with a Novex 15% TBE-urea gel (Invitrogen, USA).
Library construction and high-throughput sequencing of small RNAs. The small RNA (sRNA) sequencing libraries were generated using the NEBNex Multiplex Small RNA Library Prep Set for Illumina (NEB, USA) following the manufacturer's instructions. The libraries were quantified on the Agilent Bioanalyzer 2100 (Agilent Technologies, Waldbroon, Germany) system using DNA High Sensitivity Chips and were pooled so that each index-tagged sample was present in equimolar amounts. The clustering of the pooled index-coded samples was performed on a cBot Cluster Generation System using TruSeq SR Cluster Kit v3-cBot-HS (Illumia, USA) according to the manufacturer's instructions. After the cluster generation, the libraries were sequenced on an Illumina HiSeq 2500 platform, and 50 bp single-end reads were generated. The dataset has been deposited in the National Center for Biotechnology Information (NCBI; accession number SRR5298259, SRR5298260). Data processing. The raw data (raw reads) were first processed through custom scripts. In this step, clean data (clean reads) were obtained by removing the reads that contained ploy-N, 3′ adapter and insert tags, and low-quality reads from the raw data. The remaining unique reads were mapped to the Arabidopsis thaliana genome sequence by Bowtie2 42 . Then, the mapped sRNA reads were aligned with the Silva, GtRNAdb, Rfam and Repbase databases by Bowtie2 to remove the reads that originated from repeat sequences, rRNA, tRNA, snRNA, and snoRNA. The remaining clean unannotated reads could be used to detect the known and novel miRNAs.
The mapped clean unannotated sRNA reads were further used to identify the known miRNA with alignment against miRBase21.0 (http://www.mirbase.org/), mirdeep2 43 and srna-tools-cli software. Matched sequences with no more than two mismatches were considered known miRNAs. The available software miREvo 44 and mirdeep2 were integrated to predict novel miRNA through exploring the hairpin structure, the Dicer cleavage site and the minimum free energy of the small RNA unannotated reads.
Expression and function of miRNAs. The MiRNA expression levels were estimated for each sample through the following steps: First, the sRNA sequences were mapped back onto the precursor sequence, the counts of the reads for each miRNA were obtained from the mapping results, and the TPM (transcripts per million) values of each miRNA were calculated 45 . The differential expression analysis of the miRNAs between the C. gynandra and C. hassleriana samples was performed using the DEseq 46 R package. The P-value was adjusted using the q-value 47 . FDR ≤ 0.05 and |log2(foldchange)| ≥ 1 were set as the threshold for significant differential expression by default.
The prediction of the putative target genes of the miRNAs was performed by psRobot_tar in psRobot 48 for plants based on a comparison with the annotations of Arabidopsis genome. A Gene Ontology (GO) enrichment analysis was used on the putative target genes of the miRNAs with the GOseq R packages based on the Wallenius non-central hyper-geometric distribution 49 . The KOBAS 50 software was used to test the statistical enrichment of the putative target genes of the miRNAs in the KEGG pathways (http://www.genome.jp/kegg/) 51 .
Differential gene expression profiling. The dataset of the publicly available RNA-seq data for the leaves of C. gynandra and C. hassleriana was downloaded from NCBI (accession number: SRP036637 and SRP036837) 19 . The differential expression analysis of genes between C. gynandra and C. hassleriana was performed using the DEGseq R package 52 . The P-value was adjusted using the q-value 47 . FDR ≤ 0.01 and |log2(foldchange)| ≥ 1 were set as the threshold for significant differential expression by default.
SCientifiC RepoRts | 7:46552 | DOI: 10.1038/srep46552 Quantitative real-time PCR. To confirm the predicted results, qRT-PCR for miRNAs detection was performed to examine their expression. Fourteen miRNAs were randomly selected for the qRT-PCR assays in the samples of C. gynandra and C. hassleriana by Platinum SYBR Green-based qPCR (Invitrogen, USA) with the High-Specificity miRNA QuantiMir RT Kit (RA610A-1, System Biosciences) on the ViiA ™ 7 Dx platform (ABI, USA). Amplified primers for all miRNAs were designed according to Varkonyi-Gasic et al. 53 . The miRNA-specific forward primers of the 14 selected miRNAs and the internal control U6 are listed in the supplementary table (Supplementary Table 5). The qRT-PCR procedure was as follows: 95 °C for 10 min, 40 cycles at 95 °C for 15 s and 60 °C for 30 s, and a final step at 95 °C for 15 s, 60 °C 1 min and 95 °C for 15 s. After the qRT-PCR amplification, the melting curve and amplification curve were examined to evaluate the specific amplification. The relative expression levels of the miRNAs were analyzed by the 2 −ΔΔct method, and U6 was used as the internal control. All qRT-PCR reactions were assayed in triplicates.