New insights into tomato microRNAs

Cultivated tomato, Solanum lycopersicum, is one of the most common fruits in the global food industry. Together with the wild tomato Solanum pennellii, it is widely used for developing better cultivars. MicroRNAs affect mRNA regulation, inhibiting its translation and/or promoting its degradation. Important proteins involved in these processes are ARGONAUTE and DICER. This study aimed to identify and characterize the genes involved in the miRNA processing pathway, miRNA molecules and target genes in both species. We validated the presence of pathway genes and miRNA in different NGS libraries and 6 miRNA families using quantitative RT-PCR. We identified 71 putative proteins in S. lycopersicum and 108 in S. pennellii likely involved in small RNAs processing. Of these, 29 and 32 participate in miRNA processing pathways, respectively. We identified 343 mature miRNAs, 226 pre-miRNAs in 87 families, including 192 miRNAs, which were not previously identified, belonging to 38 new families in S. lycopersicum. In S. pennellii, we found 388 mature miRNAs and 234 pre-miRNAs contained in 85 families. All miRNAs found in S. pennellii were unpublished, being identified for the first time in our study. Furthermore, we identified 2471 and 3462 different miRNA target in S. lycopersicum and S. pennellii, respectively.

In S. lycopersicum and S. pennellii, we identified 8 and 11 putative proteins, respectively, belonging to the family of DCL proteins involved in the miRNAs processing. We found DCL1 in S. lycopersicum and S. pennellii. Both proteins displayed the conserved ResIII (also known as DEAD-like), helicase C, Dicer dimer (also known as DUF283), PAZ, ribonuclease IIIa (RNase IIIa), ribonuclease IIIb (RNase IIIb), dsrm and DND1 DSRM domains (Supplementary Fig. S1 and Table S4).
To understand the diversification of AGO proteins in S. lycopersicum and S. pennellii, a phylogenetic analysis was performed using their amino acid sequences (Fig. 2). The phylogeny of the Sly/SpeAGO1 and Sly/SpeAGO10 proteins showed that they were distributed in the phylogenetic tree closest to their respective orthologous proteins from the plant species S. tuberosum and Nicotiniana sylvestris. The phylogenetic tree was divided into seven distinct clades of paralogous AGOs (AGO1, AGO2, AGO4, AGO5, AGO6, AGO7 and AGO10). In addition, to determine the evolutionary relationship among the putative DCL proteins of S. lycopersicum, S. pennellii and their orthologous species, a phylogenetic analysis was also conducted (Fig. 3). The proteins were grouped through global alignment in four distinct clades (DCL1, DCL2, DCL3 and DCL4).

Mature miRNAs and their precursors in S. lycopersicum and S. pennellii.
To show that the miRNA processing machinery is well-conserved in S. lycopersicum and S. pennellii, we sought to identify both mature and precursor sequences of miRNAs, in addition to predicting and defining the nature of their putative target genes using available public databases (Phytozome v12.1 and PlaBi). By applying an optimized and specific algorithm (see Materials  Table S5). In S. pennellii, 388 sequences (3p and 5p) of mature miRNAs were identified, out of which 307 were unique miRNA sequences, and 234 precursor miRNAs were contained in 85 different miRNA families (Supplementary Table S6). Regarding the miRNA precursor gene localization and distribution in S. lycopersicum and S. pennellii genomes, 196 (86.73%) and 195 (81.59%) were found in intergenic regions and 30 (13.27%) and 44 (18.41%) within genes, respectively, whereas in S. pennellii, 3 pre-miRNA genes are located in more than one position in the genome (Supplementary Table S7). The spe-miR169-1 is located on chromosomes 0 and 7, the spe-miR1919 is in two different positions on chromosome 8 and the spe-miR5368 is on chromosomes 6, 7 and in two different positions on chromosome 9. In addition, it was possible to identify that 72 miRNAs in S. lycopersicum and 73 in S. pennellii were organized in clusters (10 kb as the maximum distance between two miRNA genes to consider them clustered, see Materials and Methods). Of these, we identified 50 miRNAs in S. lycopersicum and 47 miRNAs in S. pennellii that were organized in clusters, but were located in antiparallel strands (Supplementary Table S7 Table S9).
In addition, we performed statistical analyses using thermodynamic and structural characteristics of pre-miRNAs found in both tomato species compared to pre-miRNAs deposited in the miRBase from Solanaceae, Fabaceae and Brassicaceae families. We compared the miRNA precursor characteristics from two tomato species (Supplementary Table S10) and Solanaceae family miRNA precursors (Supplementary Table S11), and no differences were observed among the characteristics analysed (p < 0.05). In the analysis performed among S. lycopersicum and S. pennellii against Fabaceae and Brassicaceae families, the pre-miRNA characteristics were significantly different (p < 0.05). Characterization of tomato miRNAs. Among the 172 miRNA families identified in the two species under study, 71 were found in both tomato species. Of all 101 different miRNA families identified in S. lycopersicum and S. pennellii, we highlighted 10 families for more in-depth characterization, since the MIR165/166, MIR167, MIR393,  MIR530, MIR827, MIR828 and MIR7983 families are miRNAs conserved in plants, and the MIR7990, MIR8011 and MIR8025 families were identified for the first time in S. lycopersicum and S. pennellii. For each of these families, we analysed the conservation of sequences and phylogenetic distributions, as well as their putative target mRNAs. We identified 8 precursor miRNAs in MIR165/166 family, 6 in MIR167 and MIR7983, 3 in MIR7990, 2 in MIR8011 and 1 in MIR393, MIR530, MIR827, MIR828 and MIR8025 families of S. lycopersicum (Supplementary  Table S5). In turn, in S. pennellii we identified 7 pre-miRNAs in MIR7983 family, 6 in MIR165/166, 5 in MIR167, 3 in MIR8011, 2 in MIR7990 and 1 precursor sequence in MIR8025, MIR393, MIR530, MIR827 and MIR828 (Supplementary Table S6). We also identified mature miRNAs in these families, including 12 in MIR7983, 10 in MIR167 and MIR165/166, 3 mature sequences in MIR7990 and MIR8011, 2 in MIR393 and MIR827, in addition to 1 in each family of MIR530, MIR828 and MIR8025 of S. lycopersicum. In S. pennellii, 14 mature miRNAs were found in MIR7983, 9 in MIR167, 10 in MIR165/166, 5 in MIR8011, 2 in MIR393, MIR828, MIR827, MIR7990, MIR8025 and 1 sequence in MIR530.
The analysis of the MIR167 family showed a phylogenetic tree distributed in five distinct clades. The miRNAs from S. lycopersicum and S. pennellii were grouped together with miRNAs from Solanaceae species (Supplementary Figure S11). The phylogenetic tree of MIR165/166 family displayed two distinct clades of Eudicot and Monocotyledon. The miRNA sequences from both tomato species were distributed within Solanaceae in Eudicot clade (Fig. 6). In the phylogenetic trees of Sly/SpeMIR393 ( Supplementary Fig. S12), Sly/SpeMIR530 ( Supplementary Fig. S13) and Sly/SpeMIR827 ( Supplementary Fig. S14), two clades of Eudicots and Monocotyledons were observed, as were miRNAs from tomato species within Solanaceae clade. The phylogenetic tree of Sly/SpeMIR828 family showed ten clades of different families of Eudicots ( Supplementary Fig. S15).
miRNA target genes of S. lycopersicum and S. pennellii. To understand the putative roles of miRNAs in tomato, we identified their target genes. We identified 2471 different miRNA target genes from a total of 4197 in S. lycopersicum for 343 mature miRNAs (Supplementary Table S12) and 3462 different targets from a total of 5768 in S. pennellii for 388 mature miRNAs (Supplementary Table S13).
Notably, the Sly/SpeMIR167 family miRNAs showed 247 different target transcripts, 37 of which belonged to SlyMIR167 and 210 to SpeMIR167 families. This major difference in target numbers for the same miRNA family in wild tomato compared with S. lycopersicum can be explained by the domestication and improvement of the cultivars, providing increasing similarities of miRNA-target pairs in cultivated plants compared to those in wild plants, similar to the trend observed in soybeans 92 . Among these targets are: putative zinc-finger in N-recognin (UBR box), retrotransposon gag proteins, magnesium transporter NIPA, annexin p34, BZIP transcription factor, cytochrome P450, RING/U-box superfamily protein, kinetochore protein and a proteins class related to the disease resistance (CC-NBS-LRR). In addition, we identified 74 distinct targets in the Sly/SpeMIR165/166 family, with 35 of the SlyMIR165/166 and 39 of the SpeMIR165/166 families. Their targets constitute the homeobox-leucine zipper family, especially those with the MEKHLA domain, acyl-CoA N-acyltransferases (NAT) superfamily protein and retrotransposon gag protein. Interestingly, the Sly/SpeMIR393 family miRNA targets included 23 distinct transcripts, with 12 of the SlyMIR393 and 11 of SpeMIR393 family. These target genes encode carbohydrate-binding protein of the ER, lipoxygenases and proteins containing the U-box and F-box domains, such as transport inhibitor response 1 (TIR1) and an auxin receptor involved in a hormone-depleting mechanism. The Sly/SpeMIR530 family displayed 31 distinct targets, with 14 of the SlyMIR530 and 17 of SpeMIR530 family. Among the targets were the DEAD/DEAH box helicase, ICP0-binding domain of ubiquitin-specific protease 7, ethylene-responsive transcription factor, adenylate isopentenyltransferase, proteins containing the AP2 domain, NAC domain protein, MULE transposase domain and tyrosine kinase type proteins. The Sly/SpeMIR827 family miRNAs had 50 different mRNAs targets, with 29 of the SlyMIR827 and 21 of SpeMIR827 family. These miRNAs comprise reverse transcriptase (RNA-dependent DNA polymerase), cytochrome P450, OTU-like cysteine protease, no apical meristem (NAM) protein, cell-wall invertase, glyoxylate reductase and SPX domain-containing family protein.
The search for the Sly/SpeMIR828 targets, revealed 31 different genes, with 16 target genes of the SlyMIR828 and 15 of SpeMIR828 family. Among them were MYB-type DNA-binding domain, syntaxin of plants 51, F9H3-4 protein, TCP transcription factor 1 and reverse transcriptase (RNA-dependent DNA polymerase). The tomato Sly/SpeMIR7983 family showed 89 target genes, with 75 of the SlyMIR7983 and 14 of SpeMIR7983 family. The targets contain transcripts encoding 2C phosphatases, serine carboxypeptidase, ribonuclease T2 family, glutathione S-transferase-like protein, the ENTH/ANTH/VHS superfamily protein, alpha/beta-hydrolase superfamily protein, UDP-glucuronate 4-epimerase 4 and Myb family transcription factor family protein. In addition, Sly/SpeMIR7990 miRNAs showed 15 different target transcripts, with 13 of the SlyMIR7990 and 2 of SpeMIR7990 family, such as GDSL esterase/lipase proteins, MADS-box transcription factor family protein, disease resistance protein (TIR-NBS-LRR class), WRKY DNA-binding protein 31, Nop53 protein, bHLH transcription factor GBOF-1 and the permease family. Interestingly, we identified 55 distinct target genes from miRNAs belonging to the Sly/SpeMIR8011 family, with 27 of the SlyMIR8011 and 28 of SpeMIR8011 family. Among them were alpha/beta-hydrolase family proteins, hAT family C-terminal dimerization region, reverse transcriptase (RNA-dependent DNA polymerase), PHD-finger transcription factor, potassium transporter, ripening regulated protein and ARM repeat superfamily protein. Finally, the Sly/SpeMIR8025 family had 11 miRNAs targets, with 5 of the SlyMIR8025 and 6 of SpeMIR8025 family, which encode auxin response factor 9A (ARF9A), kinase protein domain, FKBP-type peptidyl-prolyl cis-trans isomerase, NAD(P)-binding Rossmann-fold superfamily protein and telomerase activating protein Est1 (Supplementary Tables S12 and S13).
Expression of tomato miRNAs in S. lycopersicum and S. pennellii. We experimentally verified the expression of four miRNA genes (miR165/166, miR167, miR530 and miR7983) in two different tissues: leaves and flowers of each tomato species. In leaves, the miR165/166 expression level was 7.3 times higher in S. lycopersicum than in S. pennellii. The miR167 and miR530 were expressed at levels, respectively, 26.7 and 2.2 times higher in leaves of S. pennellii than in S. lycopersicum (Fig. 7a). In tomato flowers, miR167 also showed a 10.0 times higher expression in S. pennellii than in S. lycopersicum, and miR165/166, miR530 and miR7983 had no significant difference in the expression levels between the two tomato species (Fig. 7b). In addition, the miR165/166 expression level was 7.4 times higher in leaves than in flowers, whereas miR167, miR530 and miR7983 showed no difference in the two S. lycopersicum tissues (Fig. 7c). Finally, the miR167 and miR530 had, respectively, 16.0 and 2.1 times higher expression levels in leaves than in S. pennellii flowers, but miR165/166 and miR7983 showed no difference in the two S. pennellii tissues (Fig. 7d).
Transcriptomic changes in tomato leaves and flowers RNA-seq-based. To confirm the expression of the RNA-silencing pathway components, we searched the RNA-seq data of S. lycopersicum and S. pennellii publicly   analysed libraries, respectively. Only one mature miRNA of the MIR7983 and MIR8025 families was not found in any studied library.

Discussion
To better understand the biology, evolution and domestication and accelerate the agricultural applications of the miRNAs, the genes of their biogenesis pathway and their targets in tomato, we undertook the genomic study of these molecules in the cultivated tomato S. Lycopersicum and the wild tomato S. pennellii. Our results suggest that, in general, the miRNAs demonstrated similar evolutionary patterns between the two species studied. However, a larger number of molecules (miRNA processing pathways genes, their mature molecules and target genes) and higher levels of miRNA expression were found in the wild tomato than in the cultivated tomato. This finding showed a possible loss of genes along the evolution and domestication, which may be associated with higher biotic and abiotic resistance found in wild tomato, compared to cultivated tomato. We identified 29 and 32 putative proteins involved in the miRNA pathways in S. lycopersicum and S. pennellii, respectively (Supplementary Tables S1 and S2), which is in agreement with the description of the miRNA biogenesis pathway in other plant species, such as in A. thaliana, P. vulgaris and O. sativa 43,91,94,95 . The presence of these proteins in different studies has shown a high conservation rate of the components of the miRNA processing pathways in plants, performing essential functions in the post-transcriptional regulation of mRNAs in different cells 43,91 .
In fact, our analysis revealed that the components of this pathway were highly conserved in the two species at the amino acid level, as well as the distribution of domains, active site location and phylogenetic distribution compared to orthologous proteins in other plant species. This pattern was observed in both protein families, ARGONAUTE and DICER, both considered to be critical in the processing machinery of miRNAs in plants 47,91 .
Among all proteins found, 15 and 16 proteins belong to the AGO family of S. lycopersicum and S. pennellii, respectively, compared to 19 identified in O. sativa 48 , 17 in P. vulgaris 95 . In A. thaliana 10 AGO proteins have been found 91,94 . Mirzaei et al. 70 and Bai et al. 69 found also 15 AGO proteins in S. lycopersicum, all of which were identified in this study 69,70 . However, in the study by Bai et al. 69 , two classifications were different. Solyc02g069280 and Solyc03g111760 were considered by the authors to be SlyAGO3 and SlyAGO15, but in our analyses they were   Fig. S1 and Table S3) were the same as in proteins from orthologous plant species, and the domain distribution was similar as well [96][97][98][99] . The Piwi conserved domains of the AGO proteins had an active site containing a DDH catalytic triad (Fig. 1a), present in the AGO protein family and proteins with RNase H function 48,94 . These results conformed to AGO1 and AGO10 proteins found in A. thaliana and P. vulgaris 48,94,95 . The conserved domains and active sites found in Sly/SpeAGO1 and Sly/SpeAGO10 showed high conservation compared to AGO1 and AGO10 proteins in A. thaliana, O. sativa, S. tuberosum, G. Max and P. vulgaris 48,95,98,100 . Such high domain conservation may be directly related to the role played by AGO1 and AGO10 proteins in the miRNA processing pathway in different species.
Other important catalytic components of the miRNA processing in plants are the DCL proteins 95,[101][102][103][104][105][106][107] . In this study, we identified 8 and 11 putative DCL proteins of S. lycopersicum and S. pennellii, respectively. The number of DCL proteins found in tomato genome is in agreement with other plant species: 4 DCLs have been found in the A. thaliana genome 91,105 , 6 in P. vulgaris 95 and in O. sativa at least 5 DCL proteins were identified 48,105,108 . Bai et al. 69 and Wang et al. 71 found only 7 DCL proteins in S. lycopersicum 69,71 .
The DCL1 protein is the protein of the DCL family involved in the processing of miRNAs 58,109,110 . The conserved domains found in Sly/SpeDCL1 (Supplementary Fig. S1 and Table S4) were very similar to their orthologous proteins from plant species 95,105,108 , suggesting a conserved function of DCL1 proteins in tomato species. According to recent studies, especially in DCL proteins, the distance between the RNAse III and PAZ conserved domains is suggested to be the largest determinant of the resulting length of processed miRNAs 51,105,111 . This demonstrates the importance of the domain distribution and conservation in the performance of these proteins at the cellular level. The two RNAse III domains (RNase IIIa and RNase IIIb) found in Sly/SpeDCL1 proteins displayed an active site with the EDDE catalytic residues (Fig. 1b), also identified in DCL1 proteins from other plant species 112 . These RNAse III domains are required for the cleavage and processing function of dsRNAs (double-stranded RNA) in the miRNA pathway in plants 105,108 .
A phylogenetic analysis of the Sly/SpeAGO (Fig. 2) and Sly/SpeDCL (Fig. 3) proteins was performed from their amino acid sequences, aiming at understanding the evolutionary diversification of AGO and DCL proteins in the two tomato species. The distribution of the seven clades showed a relationship between some paralogous proteins specific to the AGO family, with a great similarity among the AGO1, AGO5 and AGO10 proteins; AGO2 and AGO7; AGO6 and AGO4 conforming to the phylogenetic distribution in A. thaliana and P. vulgaris 91,95,113 . In general, the identified phylogeny distribution of AGO proteins is consistent with current species protein phylogenies 114 , which suggests that the conservation level for the members of the AGO family in Angiosperms is high. The phylogenetic tree of the AGO proteins showed a strong correlation between AGO1 and AGO10 proteins, with both AGO proteins participating in the miRNA pathway in plants 91,94 . AGO proteins have a key role in transcriptional and post-transcriptional silencing mediated by small RNAs in plants. The role of these proteins in miRNA regulation has been associated with the RISC complex (RNA-induced silencing complex), allowing degradation or inhibition of mRNA translation derived from the base complementarity between the miRNA specific sequences and the target mRNA 91,94,115 .
The analysis of Sly/SpeDCL proteins showed a great similarity to the evolutionary tree of life of these species within each clade 114 . The distribution among the DCL family paralogs showed a strong correlation between the DCL2 and DCL4 proteins, which were also close to DCL1 but further away from DCL3, consistent with the distribution in DCL families in other plant species 48,105 . Thus, it was observed that DCL proteins of S. lycopersicum and S. pennellii are highly conserved in the same way as observed for the AGO proteins, highlighting their great importance in post-transcriptional silencing processes in plants.
The validation of the miRNA pathway genes in the libraries available in the SRA 93 showed the DE of 10 and 11 of the 29 and 32 genes identified in S. lycopersicum and S. pennellii, respectively (Fig. 8, Supplementary Figs S16, S17 and Table S14). Among the DE genes in the two species are the genes of ABH1, AGO1, AGO10, DDL, EXO, HESO1, SE and SQN families. In S. lycopersicum, of the 10 DE genes, 8 were up-regulated in flowers and 2 up-regulated in leaves, and in S. pennellii all 11 DE genes were up-regulated in flowers. This result shows a greater requirement of these genes in flowers, suggesting an essential role of the miRNA in this tissue, and considering that several miRNAs targets are related to different roles in flowers [116][117][118] . Yamaguchi and Abe (2012) and Wang 117 showed an important function of miR172, miR156 and miR159 in the flowering time control in A. thaliana 117,118 . In addition, Aukerman and Sakai (2003) demonstrated that miR172 causes early flowering and disrupts the specification of floral organism when overexpressed in A. thaliana 116 .
Only the SlyEXO.1 and SlyEXO.3 genes were up-regulated in leaves compared with flowers, whereas the SpeEXO.2 gene was not expressed (Fig. 8, Supplementary Figs S16, S17 and Table S14) because these genes were also involved in other pathways or have other functions, such as delivery of mRNAs, miRNAs and cell-specific proteins, considering that this family had the greatest number of paralogous sequences 119,120 . In both species, AGO10 and SQN.2 were the most up-regulated genes in flowers, suggesting that the mechanism of using these genes in flowers is conserved in these two species. Ji et al. 121 showed overlapping roles of AGO10 and AGO1 in the temporal regulation of floral stem cells 121 . Furthermore,  reported that the role of miR168a in sweet orange was the accumulation of AGO1 in leaves and fruits 122 , which may explain up-regulation of AGO10 and SNQ.2 in the leaves, considering that AGO10 has important roles in this tissue, and SQN has a direct relation with AGO1, whereas AGO1 has also important roles in flowers and is down-regulated by miR168 in leaves 121,122 . As for the miRNA sequences, we identified 343 and 388 mature miRNAs, 226 and 234 precursor miR-NAs, distributed in 87 and 85 distinct miRNA families, for 4197 and 5768 target genes in the S. lycopersicum (Supplementary Table S5) and S. pennellii (Supplementary Table S6) genomes, respectively. In addition to predicting previously reported miRNAs, our study identified many additional novel miRNAs in the genomic sequences of S. lycopersicum; notably, of the all miRNAs found, only 110 mature and 77 precursors were deposited in the miRBase (version 21) 124 (Supplementary Table S5 Table S6).
The miRNA precursor gene localization and distribution in S. lycopersicum and S. pennellii genomes showed that 86.73% and 81.59% of the genes were found in intergenic regions, respectively, and that, in S. pennellii, 3 of them (spe-miR169-1, spe-miR1919 and spe-miR5368) are located in more than one position in the genome (Supplementary Table S7 The structural and thermodynamic characteristics of the S. lycopersicum and S. pennellii were similar to those reported in other plant studies 127,[138][139][140] . In plants, the values for MFE in precursor miRNAs are typically smaller than −18 kcal/mol 140 . Furthermore, studies have shown that the sequences of RNAs with mean AMFE −45.93 ± 9.43 kcal/mol are considered real miRNAs, based on the knowledge that AMFEs found in miRNA precursors had higher negative values than in other classes of non-coding RNAs, for example, tRNAs and rRNAs. MFEIs with a mean value of 0.97, above 0.85 were suggested for potential precursors of plant miRNAs, which distinguished them from other RNAs, consistent with our results 138 . The statistical analyses using thermodynamic and structural characteristics of the tomato pre-miRNAs, when comparing the two tomato species (Supplementary Table S10) and Solanaceae family (Supplementary Table S11) showed no differences between the groups (p < 0.05). Both displayed high conservation and similarity among pre-miRNAs from S. lycopersicum, S. pennellii and also species belonging to the Solanaceae family. However, the results for S. lycopersicum and S. pennellii were significantly different from results for Fabaceae and Brassicaceae families (p < 0.05), suggesting a great evolutionary conservation of the pre-miRNA characteristics within each plant family and a significant difference among different families (Supplementary Table S11). These differences with respect to pre-miRNA characteristics were also observed for Monocotyledons and Eudicotyledons 141 .
Several tomato miRNAs showed functions modulating important biological processes through the gene expression control (Supplementary Tables S12 and S13). The targets of Sly/SpeMIR165/166 family are involved mainly in plant vascular development 162 , including the formation of axillary meristems, root lateral meristems, laminar outgrowth and differentiation in leaves, stems, and roots 143,[163][164][165][166][167] . Several studies have also identified targets of miR165/166 in A. thaliana and O. sativa, in agreement with our results, also showing a great conservation of function in this family of plant miRNAs, such as in retrotransposon, homeobox-leucine zipper family and kinase family proteins 99,144,[163][164][165]168,169 .
The miR165/166 gene expression by real-time PCR in leaves and flowers of the two studied species showed interesting differences (Fig. 7). The miR165/166 was expressed at a higher level in S. lycopersicum leaves than in S. pennellii leaves. Specifically, in S. lycopersicum, miR165/166 has been expressed at a higher level in leaves than in flowers. The predicted targets for the tomato miR165/166 families have been related to the HD-Zip protein families, which participate in secondary development and vascular differentiation. In this sense, miR165/166 in S. lycopersicum leaves displayed overexpression compared to S. pennellii leaves suggesting a higher level of regulation of vascular development in the cultivated tomato species leaves than in the wild species. miR165/166 also showed high expression in rice after a short-term heat stress 170 , potato 155 and in leaves, stems and roots of Poncirus trifoliate 171 . miRNAs from this family were also associated with a phenotype diversity involved in drought tolerance [172][173][174] . Furthermore, Guo et al. 175 showed that during the establishment of Chinese kale seedlings, the miR165/166 expression was also regulated, showing its relation to the normal leaf development 175 , consistent with the higher expression found in leaves of S. lycopersicum than in flowers.
Many reports in plants have demonstrated conserved targets for MIR167 family, such as the genes involved in the disease resistance (CC-NBS-LRR) in O. sativa species and genes of the peptidase family proteins in A. thaliana 169 consistent with the Spe/SlyMIR167 targets found in our study. Other important targets of MIR167 family include genes involved in the regulation of auxin response factors (ARF4, ARF6 and ARF8) [176][177][178][179][180][181] . As for expression, the miR167 was expressed at a higher level in S. pennellii than in S. lycopersicum. When comparing the S. pennellii tissues, leaves showed higher expression than flowers. In rice seeds 170 and in S. tuberosum leaves and flowers 182 , miR167 was more abundant than in other tissues. Unlike Citrus grandis, which showed low expression of the miR167 family members in the floral developmental stages, indicating an important role in the reproductive development 183 in agreement with our findings of low expression in flowers. In addition, miR167 was shown to be responsive to heat shock in Arabidopsis 184 and played an important role in nitrogen stress responses, being up-regulated in maize shoots and roots 185 . Furthermore, most of the observed miR167 targets in plants were related to the hormonal pathways regulation, as in A. thaliana, rice and coffee 154,186,187 . Considering that miR167 plays these important roles, such as responses to heat shock, nitrogen stress and hormonal pathway regulation, its higher expression in S. pennellii than in S. lycopersicum suggests a significant influence on the metabolic pathway involved in resistance of wild plants to these abiotic stresses.
The Sly/SpeMIR530 family targets the genes encoding plus-3 domain-containing proteins, which was also found in N. tabacum 196 and O. sativa 169 . Additionally, this miRNA family was involved in the nitrogen regulation and high salinity 197,198 , being pivotal in the plant physiological response to stress 199 . The higher miR530 expression in S. pennellii suggest an important role in the plant physiological control, mainly related to plant response to biotic and abiotic stresses, in which case it may need a higher expression in the wild species, according to their targets 200 .
Other studies also revealed similar targets for Sly/SpeMIR827 family miRNAs, such as SPX domain-containing family protein in A. thaliana 169 and O. sativa 201 . The targets found indicated that miR827 plays a key role in plant stress adaptation, especially in nitrogen and phosphorus deficiency [202][203][204][205] . miR827 showed high expression in rice seeds 154 , unlike maize, which showed a suppressed expression in nitrogen deficiency, consistent with other studies that showed an involvement of miR827 in adaptive responses to low nitrogen and phosphorus conditions, showing its important role in stress adaptation 185,197,198,202,203 .
Sly/SpeMIR828 target genes showed its important roles in cell proliferation, control of apical dominance and organ symmetry, as well as senescence, such as found in studies carried out in other plants 159,202,206,207 . In A. thaliana, the MYB-type DNA-binding domain was also identified as miR828 target 169 . miR828 was expressed at higher levels in active bud in Camellia sinensis. The miR828 targets showed a role in regulating transcription and nucleotide metabolism 208 and were holistically expressed in grapevine 209 . The miR828 presence also showed its likely involvement in phenylpropanoid metabolism, biotic and abiotic stress, cell differentiation and hormone responses 210 . Therefore, the targets found in this study showed the importance of the regulation performed by miRNAs (Supplementary Tables S12 and S13).
The validation of the mature miRNAs identified in this study, used 95 small RNAs libraries of S. lycopersicum available in the SRA (Supplementary Table S15). By performing a robust analysis, 262 mature miRNAs of S. lycopersicum were found in at least one library. This result showed that the methods used in this study for the miRNAs prediction are quite effective. Only 23 mature miRNAs were not found in the libraries. Among these miRNAs were sly-miR172d-4-5p, sly-miR172d-9-3p, sly-miR2111a-3p, sly-miR403a-3p, sly-miR7983-6-3p, sly-miR7992-5p, sly-miR8008b-2-3p and sly-miR8025-5p. Yet, other mature miRNA derived from their pre-miRNAs were present in the libraries, suggesting that in these specific libraries only one mature miRNA was necessarily expressed. Moreover, other mature miRNAs might be absent from the analysed libraries because some miRNAs are expressed only under specific conditions and/or tissues, which were not present in these libraries 211,212 . However, such validation analysis of the mature miRNAs could not be performed with S. pennellii due to the low number of small RNAs libraries available for this species, highlighting the fact that our study was the first to identify miRNAs in S. pennellii.

Conclusion
Considering the global importance of the tomato culture and the problems generated by the susceptibility to the pests of the cultivated tomato (S. lycopersicum) and the differences with the wild tomato (S. pennellii), the obtained results elucidated several aspects of miRNAs in these two species. Our results expand the study of miRNAs in plants by providing a better understanding of their essential roles in the miRNA-based regulation processes in tomato, their processing pathways and gene expression, as well as providing targets for future investigations. were identified and retrieved from the reference protein database (refseq-protein) available at NCBI (National Center for Biotechnology Information). These sequences were used as queries to search for putative proteins involved in the small RNA biogenesis (including miRNAs) in S. lycopersicum and S. pennellii using BLASTp (Basic Local Alignment Search Tool) (http://blast.ncbi.nlm.nih.gov/Blast.cgi) 213,214 . Using the reference sequences, the putative sRNA pathway proteins were searched for in the two tomato species using the S. lycopersicum (ITAG3.0) and S. pennellii (Spenn_v2.0) data of the Sol Genomics Network (https://solgenomics.net/) 28 .
Prediction of mature and precursor miRNAs. Initially, the S. lycopersicum (ITAG3.0) 29 and S. pennellii (Spenn_v2.0) 20 assembly genome were obtained from Phytozome v12.1 (https://phytozome.jgi.doe.gov/) 215 and PlaBi dataBase (http://www.plabipd.de/project_spenn/start.ep), respectively. Genome annotation files were also obtained from the databases for the respective species. The putative miRNAs and their precursors were searched applying the method described by De Souza Gomes et al. 216 , adapted for both species 216 . First, we obtained the genome sequences and found the potential hairpin sequences or similarity to miRNA precursor structures using Blastn (NCBI) and Einverted (EMBOSS tool) programs 217 and the parameters of 336-nucleotides maximum repeat and threshold value of 25. Several filters were applied to these sequences to discard undesirable sequences, such as other non-coding RNAs sequences, and to keep those corresponding to putative miRNAs. The applied filters were based on conserved characteristics of precursor miRNA, as well as characteristics of other known regions, which had no potential to originate precursor miRNAs. These filters were: the GC (guanine and cytosine) content between 20% and 65%, minimum free energy (MFE), homology with conserved mature miRNAs, homology with repetitive regions and non-coding RNAs, except miRNAs 216 .
The miRBase database (version 21 -http://www.mirbase.org/) 124,218 was used for comparisons with the putative hairpin sequences and find novel S. lycopersicum and S. pennelli miRNAs by accepting a maximum of 5 mismatches in final mature sequence. Other RNA groups, such as ribosomal RNA (rRNA), transporter RNA (tRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), SL RNA, SRP, and RNase P were discarded based on the RFAM database version 12.0 219 and repetitive sequences were discarded using the RepeatMasker 4.0.5 (http://www.repeatmasker.org/). The putative precursor and mature miRNAs identified were used for further analysis.
The pre-miRNA sequences identified in S. lycopersicum and S. pennellii were characterized according to their thermodynamic structures and parameters: minimum free energy (MFE), adjusted minimum free energy (AMFE), free minimum energy index (MFEI), size, content of A, content of U, content of C, content of G, of GC and AU, GC ratio, AU ratio, minimum free energy ensemble (MFEE), ensemble diversity (Diversity) and MFE structure frequency in the ensemble (Frequency). The AMFE was determined with MFE of 100 nucleotide-long sequence, and the MFEI was determined from the equation: MFEI = [(AMFE) × 100]/(G% + C%)] 129 . The pre-miRNA secondary structure prediction, in addition to the diversity calculation, MFE, ensemble frequency and MFE of the secondary structures were performed using the RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/ RNAWebSuite/RNAfold.cgi). The GC content and other structural properties were defined using Perl scripts.

Analysis of conserved domains, active sites, phylogeny, primary and secondary alignments.
For the conservation analyses of domains, active sites and structure evaluation, putative protein sequences and precursor miRNAs, we used the multiple alignment by ClustalX 2.1 and ClustalW 220 . In addition, we used RNAalifold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAalifold.cgi) to analyse the primary and secondary structure alignments of precursor miRNAs with their respective orthologues 220 . Adjusted parameters (gap opening: 22.50; gap extension: 0.83) were used for the multiple alignments of precursor miRNA sequences 221 . The pre-miRNAs and proteins sequences logos were originated from WebLogo 2.8.2 (http://weblogo.berkeley.edu/ logo.cgi) 222 .
The conserved protein domains were recovered separately to verify, through multiple alignment, the presence of important amino acid residues. In addition, the PFAM family protein database (http://pfam.xfam.org/) 223 was used to find conserved domains and their putative functions. The Conserved Domain Database (CDD) (http:// www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) 224 was used to identify amino acids from the active sites of putative proteins.
The phylogenetic analysis was performed using MEGA5.2 program and the neighbour-joining method for miRNA precursor sequences and also putative proteins 225,226 . The miRNA evolutionary distance was calculated using the Kimura-2-parameter in site-replaced base units. For the putative proteins, the Jones-Taylor-Thornton (JTT) model was used 227 . A consensus tree was obtained using a bootstrap of 5000 replicates for the precursor miRNAs and 1000 replicates for the proteins, representing the evolutionary history of the analysed sequence group. miRNA target prediction. The miRNA targets identified in S. lycopersicum and S. pennellii were predicted using the psRNATarget tool (2017 Update) (http://plantgrn.noble.org/psRNATarget/) 228,229 . In order to obtain the lowest false-positive prediction rate for miRNA target genes, the threshold was strict, being 3.0 for "Maximum expectation (Exp)". Other parameters were standard: "Length for complementarity scoring (HSP size)" −19 bp; "Top target genes for each small RNA" −200; "Target accessibility -allowed maximum energy to unpair target site (UPE)" −25; "Flanking length around target site for target accessibility analysis" −17 bp before/13 bp after; "Range of central mismatch leading to translational inhibition" −10 -11 nucleotides. To analyse the miRNA target genes predicted in tomato, candidate targets were retrieved by Phytozome v12. 1  of tissue was ground in liquid N 2 and transferred to a tube, to which 1 ml of QIAzol was added and homogenized by vortexing. The tubes were left at room temperature for five minutes. Afterwards, 300 µl of chloroform was added and homogenized by vortexing. The mixture was left at room temperature for five minutes. Afterwards, the tubes were centrifuged at 12000 g at 4 °C for 15 min. Three phases were formed, and 400 µl of the upper phase was collected and transferred to a new tube, to which 500 µl of isopropanol was added and homogenized by vortexing, and the tube was left at room temperature for 10 min. The tubes were then centrifuged at 12000 g at 4 °C for 10 min for precipitation of nucleic acids. The liquid was discarded, and the pellet was washed with 1 ml of 75% ethanol and centrifuged at 7500 g at 4 °C for 10 min, and the liquid was discarded carefully to preserve the pellet. Then, 500 µl of isopropanol was added again and the steps were repeated. The pellet was dried at room temperature and then resuspended in 20 µl of autoclaved mili-Q water.
The RNA integrity was visualized in 0.8% agarose gel, and the quantity and quality (ratio 260/280 and 260/230 between 1.8 and 2.2) were measured on Nanovue spectrophotometer.
DNAse treatment. Total extracted RNA was treated with the Turbo DNA-free TM kit (Life Technologies TM ). Five µg of RNA was used, and the corresponding volume was adjusted to 22 µl with autoclaved milli-Q water. Then, 2.5 µl of the 10x Turbo DNAse Buffer and 0.5 µl of the Turbo DNAse were added to each sample. The reaction was incubated for 30 min at 37 °C. Afterwards, to stop the DNAse enzyme activity, 2.5 µl of the DNAse Inactivation reagent was added. The samples were left at room temperature for 5 min, with occasional mixing, and then centrifuged at 10000 RPM for 2 min. After centrifugation, 15 µl were collected and transferred to a new tube. The samples were stored at −20 °C until cDNA synthesis.
cDNA synthesis and relative expression. The stem-loop method 63,230 was used for cDNA synthesis and expression analysis, following the previously described steps 63,187 . In this method, three primers are needed -the stem-loop RT primer, the forward and the reverse primers. The stem-loop RT primers were designed according to Chen (2004) 230 . The sequence data are presented in Supplementary Table S15. The stem-loop primers contain 50 nucleotides, of which 44 nucleotides correspond to a universal sequence that forms a stable stem-loop structure at low temperatures. The last 6 nucleotides at the 3′ end of the stem-loop RT primer is a reverse complement of the last 6 nucleotides at the 3′ end of the specific miRNA. The forward primers were designed with the exact sequence of the miRNA, excluding the last 6 nucleotides at the 3′ end. To increase the melting temperature, nucleotides were randomly added (5-7 nucleotides) to the 5′ end of the primer. Primer design software was used to calculate the melting temperature and verify the quality of the forward primers. The third, reverse primer is a reverse complement of the universal sequence of the stem-loop RT primer, and, therefore, is used for every miRNA.
The cDNA synthesis from total RNA (DNA-free) was performed using the ImProm-II TM Reverse Transcriptase (Promega). For each miRNA, 500 ng of RNA was used. The volume was adjusted to 7 µl with autoclaved mili-Q water. To the RNA, 1 µl of oligo-dT primer; 2 µl of specific stem-loop RT primer (1 µM) and 1 µl of the dNTP mix were added. The samples were incubated at 70 °C for 10 min for denaturation of the secondary structures and later incubated at 4 °C for 10 min. Then, 5 μl of Improm-II 5 × reaction buffer, 2.4 μl of 25 mM MgCl 2 , 0.6 μl of RNaseOut (Invitrogen), and 1 μl of the Improm-II Reverse Transcriptase were added. The reactions were incubated in a thermocycler at 16 °C for 30 min, followed by reverse transcription of 60 cycles at 30 °C for 30 s, 42 °C for 30 s, and 50 °C for 1 s. For inactivation of the Improm-II Reverse Transcriptase, the reaction was incubated at 70 °C for 15 min. Subsequently, the reactions were stored at −20 °C.
Quantitative real-time PCR (RT-qPCR) was performed using a standard SYBR Green PCR kit protocol on a Rotor-Gene Q (QIAGEN). The reactions were performed in a final volume of 15 μl, using 1.5 μl cDNA, 1.5 μl of each primer at a final concentration of 1 µM, 7.5 μl of SYBR Green PCR Master Mix (QIAGEN), and 3.0 μl of water, for each reaction. The reaction was incubated at 95 °C for 5 min, followed by 40 cycles at 95 °C for 5 s and 60 °C for 10 s. Then, the samples were heated from 55 to 95 °C with an increase of 1 °C to acquire the melting curve of the amplified products. All reactions were run in triplicate.
For the relative expression experiment of miRNAs from selected tissues, the 1:25 dilution was chosen. For the calculation of relative expression, the normalized comparative Cq (quantitative Cycle) method was used 64 , which takes into account the primer efficiency in the calculation. The normalization factor was the geometric mean of the U6 and 5.8S gene expression.
Expression of miRNA pathway genes in tomato RNA-seq libraries. Paired-end reads from leaves and flowers of both wild (Solanum pennellii) and cultivated (Solanum lycopersicum) tomato were retrieved from the Sequence Read Archive (SRA) under accessions numbers SRR786556, SRR786557, SRR786570, SRR786571, SRR786552, SRR786553, SRR786566, SRR786567, SRR786524, SRR786525, SRR786540, SRR786541, SRR786542, SRR786520, SRR786521, SRR786535, SRR786536, SRR786537 93 . In this way, 10 libraries for S. lycopersicum and 8 libraries for S. pennellii, totalling 77354968 paired-end reads, were used for differential expression analyses. The quality of the libraries was evaluated using FastQC software 231 . Adapters were identified using minion 232 and removed with Trimmomatic 233 as well as were the reads with the quality score and length below 20 and 35 bp, respectively. After the quality control, 71703462 paired-end reads were used as inputs to STAR mapper tool (version 2.5.3a) 234 with the default parameters for alignment against their respective genomes (S. lycopersicum version 3.2 and S. pennellii version 2.0) retrieved from the Sol Genomics Network 28 . Approximately 88% and 87% of the S. lycopersicum and S. pennellii reads were uniquely mapped to their genomes, respectively.
The libraries were then sorted by query name using picard tools (version 2.18.0), and for each of them, raw read counts were obtained using the python script htseq-count (version 0.7.2) 235 . Differentially expressed (DE) genes were identified using the Bioconductor R package edgeR by comparing the normalized number of reads aligned to each gene in different tissues 236,237 . The Benjamini and Hochberg's false-discovery rate (FDR) below 0.05 and minimum fold-change of two were the parameters used to consider a gene DE 238 . miRNAs in S. lycopersicum small RNA-seq libraries. Sequencing analysis. Ninety-five raw small RNA data files were retrieved from the NCBI Sequence Read Archive (SRA) with different accession numbers (Supplementary Table S15). The library qualities were evaluated using FastQC software 231 , adapters were removed with Trimmomatic 233 discarding reads with quality score below 20 and length less than 17 nucleotides and longer than 30 nucleotides. The filtered sequences were mapped and quantified using miRDeep2 239 . miRDeep2 and perl scripts were used on each sequence separately to generate the numbers of the reads for each miRNAs identified.
Statistical analysis. For the statistical comparisons among the structural and thermodynamic variables of each category (species and/or families), a basic descriptive analysis was performed followed by non-parametric tests (Wilcoxon-Mann-Whitney). Median values were used to perform the statistical comparisons 240 . Statistical significance was set at p < 0.1.