Introduction

The Solanaceae family is one of the largest families in the plant kingdom, including several plants of agronomic and medical importance. It is composed of approximately 100 genera and more than 3000 species that grow in all habitats, ranging from rainforests to deserts. Tomato is one of the most important fruit crops in the global food industry1,2,3,4,5. The genus Solanum is one of largest genera among the Angiosperms, containing approximately 1500 species6. Solanum lycopersicum, the cultivated tomato is one of the most consumed fruits worldwide7,8. In addition to its importance as food, the tomato has important agronomic characteristics, including fleshy fruits and multicellular and glandular trichomes, which model plants, like Arabidopsis thaliana, do not have. Thus, the common tomato has been used as a model for species that also have these characteristics9,10,11,12,13.

Due to the specific characteristics and growing conditions, tomato varieties are constantly improved14. However, this crop is susceptible to insect attacks and more than 200 diseases caused by various types of pathogens, including viruses, bacteria, fungi and nematodes15. To aid in the control of these diseases, most producers opt for the use of chemical agents, which are often costly, at times ineffective, and have deleterious effects on the environment16. An alternative to these problems is to cross the cultivated tomato with wild species12,17,18,19,20. There are several known species of wild tomato, consisting of underexploited genetic resources, and of great importance for the improvement, research and development of culture12,18,20,21,22,23,24,25,26. Solanum pennellii is a wild tomato species that has been widely used for the construction and mapping of populations due to its tolerance to environmental stress18,20,27,28. In this sense, a better understanding of the tomato molecular basis is required to achieve the efficiency and success in the selection of markers associated with characteristics of interest4.

Current progress in sequencing the tomato genome has generated useful information to assist in the study of their genetic diversity12,23. The sequencing of the S. lycopersicum genome was completed in 201229. Furthermore, the availability of a reference genome provided a framework for the genomic analysis of Solanaceae family, generating a source of important information for their molecular breeding5,13,30. In addition to S. lycopersicum, the S. pennellii genome was also sequenced, which provided a valuable resource for the understanding of several prominent features, such as changes in response to water deficit, biotic and abiotic resistance, and metabolism20,28. However, various parts of these genomes need to be thoroughly studied and annotated, since the sequencing of the genome and the deposit of the sequences in public databases are only the initial steps for a thorough understanding.

Although much is known about tomato biology, relatively little is known about the regulation of gene expression involved in plant development. Recent studies have shown the importance of gene regulation involving several classes of small RNAs, their processing system and cellular performance in different organisms31,32,33,34,35,36. The main representatives of this class of small RNAs are the microRNAs (miRNAs), whose ways of regulation may involve inhibition of the translation process, degradation of mRNA or gene silencing, either by target complementarity or by signalling DNA modifications in specific regions of genome37,38,39,40,41,42. miRNAs are endogenous small RNAs with approximately 21 nucleotides in length in mature form37,43,44.

Different proteins act to generate specific mature miRNAs for post-transcriptional regulation32,45,46. In plants, miRNA processing first occurs in the nucleus from the transcription resulting in a long primary miRNA (pri-miRNA) which after cleavage generates the miRNA precursor (pre-miRNA)43,47,48,49,50,51,52,53,54. The pre-miRNA has a characteristic secondary structure, forming a hairpin via an imperfect complementarity, containing 200–300 nucleotides in length55,56. The pre-miRNA is also cleaved at the end of the hairpin, forming a duplex of mature miRNAs (miRNA/miRNA* double strand). The mature miRNA is exported to the cytoplasm and then loaded in a RISC complex, for specific recognition of the target mRNA to regulate gene expression either by translation inhibition or mRNA degradation32,48,55,57,58,59.

Important biological functions are mediated by miRNAs, which include the adaptation of plants to stress, regulation of leaf and floral development, cell defence, cell proliferation, apoptosis and signal transduction32,43,47,60,61,62. Due to important and complex regulation mediated by the miRNAs, the expression detection and quantification of these molecules in specific tissues is essential for a better understanding of their action. Although miRNAs are transcribed at high expression levels, the latter fluctuate widely between cells and among tissues. A sensitive and rapid method for the detection and quantification of miRNA expression is the quantitative real-time PCR (qRT-PCR)63,64,65. In addition, several computational strategies have been used to identify miRNAs and proteins involved in their pathways in various organisms66,67,68. Computational methods are widely used in gene identification, since some transcripts are expressed only under certain conditions or in specific cells. Thus, the computational techniques aid in the discovery process of new miRNAs, using all the information contained in the genome and/or the transcriptome independently of the sampling. Furthermore, such methods are useful for the prediction of miRNA precursors and their targets.

Previous studies have identified miRNAs and proteins involved in miRNA biogenesis in S. lycopersicum7,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86. Many of these studies have shown important roles of tomato miRNAs, such as those seen in fruit ripening77, response to long-term RPM-treatment87, response to curl New Delhi virus (ToLCNDV)75, responses associated to Botrytis cinerea infection82, male-sterility84, salt-induced stress response78, as well as biotic79, drought83, and high-temperature stress responses86. However, many of these genes, miRNAs and their respective targets have not yet been identified and characterized systematically in tomato species, especially in S. pennellii. Thus, the study of miRNAs in the plant genome and transcriptome has become a very powerful tool to assist in elucidation of biological processes as well as their performances at the cellular level67,88,89,90.

The aims of this study were to identify and characterize by in silico analysis the genes involved in miRNA processing, the miRNA precursors, the miRNA targets, as well as to validate such miRNAs in different tissues of S. lycopersicum and S. pennellii, using genomic and transcriptomic sequences obtained from public databases and RT-PCR. This study allows a better understanding of miRNAs, their processing pathways and their role in regulating the gene expression of these important species, as well as providing targets for future investigations.

Results

The small RNAs processing pathway in S. lycopersicum and S. pennellii: analysis of conserved protein domains, active sites and phylogenetic trees

To understand potential roles of small RNAs in the regulation of gene expression in tomato, we sought to initially identify all possible components of small RNA processing machinery in S. lycopersicum and S. pennellii genomes. Using a local alignment, as well as the presence and arrangement of conserved domains, 71 putative proteins were identified in S. lycopersicum and 108 in S. pennellii likely involved in the biogenesis of small RNAs (siRNAs, tasiRNAs and miRNAs) (data not shown). Among these proteins, the AGO and DCL families are considered the key proteins in the processing machinery of small RNAs in plants47,91. We identified 15 and 16 proteins, which belong to the ARGONAUTE family in S. lycopersicum and S. pennellii, respectively (SlyAGO1.1, SlyAGO1.2, SlyAGO2.1, SlyAGO2.2, SlyAGO2.3, SlyAGO4.1, SlyAGO4.2, SlyAGO4.3, SlyAGO4.4, SlyAGO4.5, SlyAGO5, SlyAGO6, SlyAGO7, SlyAGO10.1, SlyAGO10.2, SpeAGO1.1, SpeAGO1.2, SpeAGO2.1, SpeAGO2.2, SpeAGO2.3, SpeAGO4.1, SpeAGO4.2, SpeAGO4.3, SpeAGO4.4, SpeAGO4.5, SpeAGO4.6, SpeAGO4.7, SpeAGO5, SpeAGO6, SpeAGO7, SpeAGO10). Eight and 11 DICER family proteins were found in S. lycopersicum and S. pennellii, respectively (SlyDCL1, SlyDCL2.1, SlyDCL2.2, SlyDCL2.3, SlyDCL2.4, SlyDCL3.1, SlyDCL3.2, SlyDCL4, SpeDCL1, SpeDCL2.1, SpeDCL2.2, SpeDCL2.3, SpeDCL2.4, SpeDCL2.5, SpeDCL2.6, SpeDCL3, SpeDCL4.1, SpeDCL4.2, SpeDCL4.3).

Among the proteins involved in small RNA processing, 29 putative proteins for S. lycopersicum and 32 for S. pennellii were identified that participate in the miRNA pathways (Supplementary Tables S1 and S2).

ARGONAUTE and DICER proteins in S. lycopersicum and S. pennellii

Analysis of conserved domains, active sites, phylogeny and pathway proteins and pre-miRNA alignment in S. lycopersicum and S. pennellii

We found 15 AGO proteins in the genome of S. lycopersicum and 16 in S. pennellii. The SlyAGO1.1/1.2 and SpeAGO1.1/1.2 proteins displayed the Gly-richAgo1, ArgoN, ArgoL1, PAZ, ArgoL2, ArgoMid and Piwi domains. Similar results were observed for SlyAGO10.1/10.2 and SpeAGO10, which displayed the ArgoN, ArgoL1, PAZ, ArgoL2, ArgoMid and Piwi domains (Supplementary Fig. S1 and Table S3).

The Piwi conserved domain, one of the main domains present in AGO proteins, was identified in SlyAGO1.1, SlyAGO1.2, SpeAGO1.1, SpeAGO1.2, SlyAGO10.1, SlyAGO10.2 and SpeAGO10 proteins. These domains showed active site regions containing an important catalytic triad DDH (aspartate/aspartate/histidine). The DDH catalytic triad of these proteins was found in Asp764-Asp850-His990, Asp690-Asp776-His916, Asp763-Asp849-His989, Asp862-Asp948-His1088, Asp704-Asp790-His930, Asp658-Asp744-His884, Asp704-Asp790-His930 positions, respectively (Fig. 1a).

Figure 1
figure 1

Weblogo analysis and sequence multiple alignment of active site amino acids and their flanking amino acid residues in the conserved domains of S. lycopersicum, S. pennellii and their orthologs. (a) Piwi domain (PF02171) from AGO proteins and (b) Ribonuclease III domain (PF00636) from DCL proteins. The red arrows show the amino acid residues of the active sites (catalytic triad AGO: Asp-Asp-His and DCL: Glu-Asp-Asp-Glu). The height of each amino acid symbol in Weblogo indicates amino acid residue conservation at a given position. The asterisk indicates 100% conserved amino acids at a specific position. Sly - Solanum lycopersicum, Spe - Solanum pennellii, Csi - Citrus sinensis, Gma - Glycine max, Pvu - Phaseolus vulgaris, Stu - Solanum tuberosum, Nsy - Nicotiana sylvestris, Cme - Cucumis melo, Tca - Theobroma cacao and Vvi - Vitis vinifera.

In S. lycopersicum and S. pennellii, we identified 8 and 11 putative proteins, respectively, belonging to the family of DCL proteins involved in the miRNAs processing. We found DCL1 in S. lycopersicum and S. pennellii. Both proteins displayed the conserved ResIII (also known as DEAD-like), helicase C, Dicer dimer (also known as DUF283), PAZ, ribonuclease IIIa (RNase IIIa), ribonuclease IIIb (RNase IIIb), dsrm and DND1 DSRM domains (Supplementary Fig. S1 and Table S4).

We found two RNAse III domains in SlyDCL1 and SpeDCL1 proteins. The two domains (RNase IIIa and RNase IIIb) had the same catalytic residues in both tomato species studied: glutamate (E), aspartate (D), aspartate (D), glutamate (E), representing the EDDE active site. These residues were located in RNase IIIa domains at Glu1591-Asp1595-Asp1687-Glu1690 and Glu1617-Asp1621-Asp1713-Glu1716 positions and the RNase IIIb domains at Glu1390-Asp1394-Asp1465-Glu1468 and Glu1390-Asp1394-Asp1521-Glu1524 positions of the SlyDCL1 and SpeDCL1 proteins, respectively (Fig. 1b).

To understand the diversification of AGO proteins in S. lycopersicum and S. pennellii, a phylogenetic analysis was performed using their amino acid sequences (Fig. 2). The phylogeny of the Sly/SpeAGO1 and Sly/SpeAGO10 proteins showed that they were distributed in the phylogenetic tree closest to their respective orthologous proteins from the plant species S. tuberosum and Nicotiniana sylvestris. The phylogenetic tree was divided into seven distinct clades of paralogous AGOs (AGO1, AGO2, AGO4, AGO5, AGO6, AGO7 and AGO10). In addition, to determine the evolutionary relationship among the putative DCL proteins of S. lycopersicum, S. pennellii and their orthologous species, a phylogenetic analysis was also conducted (Fig. 3). The proteins were grouped through global alignment in four distinct clades (DCL1, DCL2, DCL3 and DCL4).

Figure 2
figure 2

Phylogenetic tree of AGO proteins identified in S. lycopersicum, S. pennellii and their orthologs.

Figure 3
figure 3

Phylogenetic tree of DCL proteins identified in S. lycopersicum, S. pennellii and their orthologs.

Mature miRNAs and their precursors in S. lycopersicum and S. pennellii

To show that the miRNA processing machinery is well-conserved in S. lycopersicum and S. pennellii, we sought to identify both mature and precursor sequences of miRNAs, in addition to predicting and defining the nature of their putative target genes using available public databases (Phytozome v12.1 and PlaBi). By applying an optimized and specific algorithm (see Materials and Methods), we identified 343 sequences (3p and 5p) of mature miRNAs, out of which 285 were unique miRNA sequences, and 226 precursor miRNAs, distributed in 87 distinct miRNA families in S. lycopersicum genome (Supplementary Table S5). In S. pennellii, 388 sequences (3p and 5p) of mature miRNAs were identified, out of which 307 were unique miRNA sequences, and 234 precursor miRNAs were contained in 85 different miRNA families (Supplementary Table S6).

Regarding the miRNA precursor gene localization and distribution in S. lycopersicum and S. pennellii genomes, 196 (86.73%) and 195 (81.59%) were found in intergenic regions and 30 (13.27%) and 44 (18.41%) within genes, respectively, whereas in S. pennellii, 3 pre-miRNA genes are located in more than one position in the genome (Supplementary Table S7). The spe-miR169-1 is located on chromosomes 0 and 7, the spe-miR1919 is in two different positions on chromosome 8 and the spe-miR5368 is on chromosomes 6, 7 and in two different positions on chromosome 9. In addition, it was possible to identify that 72 miRNAs in S. lycopersicum and 73 in S. pennellii were organized in clusters (10 kb as the maximum distance between two miRNA genes to consider them clustered, see Materials and Methods). Of these, we identified 50 miRNAs in S. lycopersicum and 47 miRNAs in S. pennellii that were organized in clusters, but were located in antiparallel strands (Supplementary Table S7).

All identified miRNAs were analysed for their structural and thermodynamic characteristics. S. lycopersicum miRNA precursors displayed an MFE with the mean of −56.50 kcal/mol, spanning the values between −98.2 and −21.8 kcal/mol. Such parameters as AMFE and MFEI, had the averages of −39.38 kcal/mol and of 1.03, respectively (Supplementary Table S8). In S. pennellii, the pre-miRNAs showed an MFE with the mean of −55.60 kcal/mol, with the values between −93.7 and −23.3 kcal/mol, the mean AMFE was −39.13 Kcal/mol and that of MFEI was 1.02 (Supplementary Table S9).

In addition, we performed statistical analyses using thermodynamic and structural characteristics of pre-miRNAs found in both tomato species compared to pre-miRNAs deposited in the miRBase from Solanaceae, Fabaceae and Brassicaceae families. We compared the miRNA precursor characteristics from two tomato species (Supplementary Table S10) and Solanaceae family miRNA precursors (Supplementary Table S11), and no differences were observed among the characteristics analysed (p < 0.05). In the analysis performed among S. lycopersicum and S. pennellii against Fabaceae and Brassicaceae families, the pre-miRNA characteristics were significantly different (p < 0.05).

Characterization of tomato miRNAs

Among the 172 miRNA families identified in the two species under study, 71 were found in both tomato species. Of all 101 different miRNA families identified in S. lycopersicum and S. pennellii, we highlighted 10 families for more in-depth characterization, since the MIR165/166, MIR167, MIR393, MIR530, MIR827, MIR828 and MIR7983 families are miRNAs conserved in plants, and the MIR7990, MIR8011 and MIR8025 families were identified for the first time in S. lycopersicum and S. pennellii. For each of these families, we analysed the conservation of sequences and phylogenetic distributions, as well as their putative target mRNAs.

We identified 8 precursor miRNAs in MIR165/166 family, 6 in MIR167 and MIR7983, 3 in MIR7990, 2 in MIR8011 and 1 in MIR393, MIR530, MIR827, MIR828 and MIR8025 families of S. lycopersicum (Supplementary Table S5). In turn, in S. pennellii we identified 7 pre-miRNAs in MIR7983 family, 6 in MIR165/166, 5 in MIR167, 3 in MIR8011, 2 in MIR7990 and 1 precursor sequence in MIR8025, MIR393, MIR530, MIR827 and MIR828 (Supplementary Table S6). We also identified mature miRNAs in these families, including 12 in MIR7983, 10 in MIR167 and MIR165/166, 3 mature sequences in MIR7990 and MIR8011, 2 in MIR393 and MIR827, in addition to 1 in each family of MIR530, MIR828 and MIR8025 of S. lycopersicum. In S. pennellii, 14 mature miRNAs were found in MIR7983, 9 in MIR167, 10 in MIR165/166, 5 in MIR8011, 2 in MIR393, MIR828, MIR827, MIR7990, MIR8025 and 1 sequence in MIR530.

The precursor and mature miRNAs identified in our work presented great conservation in their sequences and secondary structures in relation to their orthologues, especially in the miRNA mature region. This was the case for the Sly/SpeMIR167 (Fig. 4), Sly/SpeMIR165/166 (Fig. 5 and Supplementary Fig. S2), Sly/SpeMIR393 (Supplementary Fig. S3), Sly/SpeMIR530 (Supplementary Fig. S4), Sly/SpeMIR827 (Supplementary Fig. S5), Sly/SpeMIR828 (Supplementary Fig. S6), Sly/SpeMIR7983 (Supplementary Fig. S7), Sly/SpeMIR7990 (Supplementary Fig. S8), Sly/SpeMIR8011 (Supplementary Fig. S9) and Sly/SpeMIR8025 families (Supplementary Fig. S10).

Figure 4
figure 4

(a) The high-confidence RNA alignments between MIR167 families of S. lycopersicum and S. pennellii to their orthologues were performed using RNAalifold. Mature miRNAs are shown as boxes. Brackets and colours identify matching residues in 5′ and 3′ stems of hairpin structures. Levels of nucleotide identity are indicated below the alignment. (b) Secondary structures of MIR167 family from S. lycopersicum, S. pennellii and their orthologues were drawn by RNAfold. Sly - Solanum lycopersicum, Spe - Solanum pennellii, Nta - Nicotiana tabacum, Ptc - Populus trichocarpa, Mdm - Malus domestica, Mes - Manihot esculenta, Gma - Glycine max, Stu - Solanum tuberosum.

Figure 5
figure 5

The high-confidence RNA alignments between MIR165/166 family of S. lycopersicum and S. pennellii to their orthologues were performed using RNAalifold. Mature miRNAs are shown as boxes. Brackets and colours identify matching residues in 5′ and 3′ stems of hairpin structures. Levels of nucleotide identity are indicated below the alignment. Sly - Solanum lycopersicum, Spe - Solanum pennellii, Mdm - Malus domestica, Nta - Nicotiana tabacum, Tcc - Theobroma cacao, Csi - Citrus sinensis, Vvi - Vitis vinifera, Osa - Oryza sativa, Ath - Arabidopsis thaliana.

The analysis of the MIR167 family showed a phylogenetic tree distributed in five distinct clades. The miRNAs from S. lycopersicum and S. pennellii were grouped together with miRNAs from Solanaceae species (Supplementary Figure S11). The phylogenetic tree of MIR165/166 family displayed two distinct clades of Eudicot and Monocotyledon. The miRNA sequences from both tomato species were distributed within Solanaceae in Eudicot clade (Fig. 6). In the phylogenetic trees of Sly/SpeMIR393 (Supplementary Fig. S12), Sly/SpeMIR530 (Supplementary Fig. S13) and Sly/SpeMIR827 (Supplementary Fig. S14), two clades of Eudicots and Monocotyledons were observed, as were miRNAs from tomato species within Solanaceae clade. The phylogenetic tree of Sly/SpeMIR828 family showed ten clades of different families of Eudicots (Supplementary Fig. S15).

Figure 6
figure 6

Phylogenetic tree of MIR165/166 family identified in S. lycopersicum, S. pennellii and their orthologs. Sly - Solanum lycopersicum, Spe - Solanum pennellii, Mdm - Malus domestica, Stu - Solanum tuberosum, Nta - Nicotiana tabacum, Ptc - Populus trichocarpa, Aly - Arabidopsis lyrata, Ath - Arabidopsis thaliana, Bna - Brassica napus, Csi - Citrus sinensis, Ctr - Citrus trifoliate, Vvi - Vitis vinifera, Zma - Zea mays, Ata - Aegilops tauschii.

miRNA target genes of S. lycopersicum and S. pennellii

To understand the putative roles of miRNAs in tomato, we identified their target genes. We identified 2471 different miRNA target genes from a total of 4197 in S. lycopersicum for 343 mature miRNAs (Supplementary Table S12) and 3462 different targets from a total of 5768 in S. pennellii for 388 mature miRNAs (Supplementary Table S13).

Notably, the Sly/SpeMIR167 family miRNAs showed 247 different target transcripts, 37 of which belonged to SlyMIR167 and 210 to SpeMIR167 families. This major difference in target numbers for the same miRNA family in wild tomato compared with S. lycopersicum can be explained by the domestication and improvement of the cultivars, providing increasing similarities of miRNA–target pairs in cultivated plants compared to those in wild plants, similar to the trend observed in soybeans92. Among these targets are: putative zinc-finger in N-recognin (UBR box), retrotransposon gag proteins, magnesium transporter NIPA, annexin p34, BZIP transcription factor, cytochrome P450, RING/U-box superfamily protein, kinetochore protein and a proteins class related to the disease resistance (CC-NBS-LRR). In addition, we identified 74 distinct targets in the Sly/SpeMIR165/166 family, with 35 of the SlyMIR165/166 and 39 of the SpeMIR165/166 families. Their targets constitute the homeobox-leucine zipper family, especially those with the MEKHLA domain, acyl-CoA N-acyltransferases (NAT) superfamily protein and retrotransposon gag protein. Interestingly, the Sly/SpeMIR393 family miRNA targets included 23 distinct transcripts, with 12 of the SlyMIR393 and 11 of SpeMIR393 family. These target genes encode carbohydrate-binding protein of the ER, lipoxygenases and proteins containing the U-box and F-box domains, such as transport inhibitor response 1 (TIR1) and an auxin receptor involved in a hormone-depleting mechanism. The Sly/SpeMIR530 family displayed 31 distinct targets, with 14 of the SlyMIR530 and 17 of SpeMIR530 family. Among the targets were the DEAD/DEAH box helicase, ICP0-binding domain of ubiquitin-specific protease 7, ethylene-responsive transcription factor, adenylate isopentenyltransferase, proteins containing the AP2 domain, NAC domain protein, MULE transposase domain and tyrosine kinase type proteins. The Sly/SpeMIR827 family miRNAs had 50 different mRNAs targets, with 29 of the SlyMIR827 and 21 of SpeMIR827 family. These miRNAs comprise reverse transcriptase (RNA-dependent DNA polymerase), cytochrome P450, OTU-like cysteine protease, no apical meristem (NAM) protein, cell-wall invertase, glyoxylate reductase and SPX domain-containing family protein.

The search for the Sly/SpeMIR828 targets, revealed 31 different genes, with 16 target genes of the SlyMIR828 and 15 of SpeMIR828 family. Among them were MYB-type DNA-binding domain, syntaxin of plants 51, F9H3-4 protein, TCP transcription factor 1 and reverse transcriptase (RNA-dependent DNA polymerase). The tomato Sly/SpeMIR7983 family showed 89 target genes, with 75 of the SlyMIR7983 and 14 of SpeMIR7983 family. The targets contain transcripts encoding 2C phosphatases, serine carboxypeptidase, ribonuclease T2 family, glutathione S-transferase-like protein, the ENTH/ANTH/VHS superfamily protein, alpha/beta-hydrolase superfamily protein, UDP-glucuronate 4-epimerase 4 and Myb family transcription factor family protein. In addition, Sly/SpeMIR7990 miRNAs showed 15 different target transcripts, with 13 of the SlyMIR7990 and 2 of SpeMIR7990 family, such as GDSL esterase/lipase proteins, MADS-box transcription factor family protein, disease resistance protein (TIR-NBS-LRR class), WRKY DNA-binding protein 31, Nop53 protein, bHLH transcription factor GBOF-1 and the permease family. Interestingly, we identified 55 distinct target genes from miRNAs belonging to the Sly/SpeMIR8011 family, with 27 of the SlyMIR8011 and 28 of SpeMIR8011 family. Among them were alpha/beta-hydrolase family proteins, hAT family C-terminal dimerization region, reverse transcriptase (RNA-dependent DNA polymerase), PHD-finger transcription factor, potassium transporter, ripening regulated protein and ARM repeat superfamily protein. Finally, the Sly/SpeMIR8025 family had 11 miRNAs targets, with 5 of the SlyMIR8025 and 6 of SpeMIR8025 family, which encode auxin response factor 9A (ARF9A), kinase protein domain, FKBP-type peptidyl-prolyl cis-trans isomerase, NAD(P)-binding Rossmann-fold superfamily protein and telomerase activating protein Est1 (Supplementary Tables S12 and S13).

Expression of tomato miRNAs in S. lycopersicum and S. pennellii

We experimentally verified the expression of four miRNA genes (miR165/166, miR167, miR530 and miR7983) in two different tissues: leaves and flowers of each tomato species. In leaves, the miR165/166 expression level was 7.3 times higher in S. lycopersicum than in S. pennellii. The miR167 and miR530 were expressed at levels, respectively, 26.7 and 2.2 times higher in leaves of S. pennellii than in S. lycopersicum (Fig. 7a). In tomato flowers, miR167 also showed a 10.0 times higher expression in S. pennellii than in S. lycopersicum, and miR165/166, miR530 and miR7983 had no significant difference in the expression levels between the two tomato species (Fig. 7b). In addition, the miR165/166 expression level was 7.4 times higher in leaves than in flowers, whereas miR167, miR530 and miR7983 showed no difference in the two S. lycopersicum tissues (Fig. 7c). Finally, the miR167 and miR530 had, respectively, 16.0 and 2.1 times higher expression levels in leaves than in S. pennellii flowers, but miR165/166 and miR7983 showed no difference in the two S. pennellii tissues (Fig. 7d).

Figure 7
figure 7

Relative expression profiles of miR165/166, miR167, miR530 and miR7983 in (a) leaves and (b) flowers of (c) S. lycopersicum (SLY) and (d) S. pennellii (SPE).

Transcriptomic changes in tomato leaves and flowers RNA-seq-based

To confirm the expression of the RNA-silencing pathway components, we searched the RNA-seq data of S. lycopersicum and S. pennellii publicly available in the Sequence Read Archive (SRA)93. The heatmap confirmed expression of all genes identified in the two species in the two analysed tissues, except the SpeEXO.2 gene that did not express in leaves (Fig. 8, Supplementary Figs S16, S17 and Table S14). In S. lycopersicum 10 of the 29 genes (ABH1, AGO1.1, AGO1.2, AGO10.1, AGO10.2, EXO.1, EXO.2, EXO.3, SQN.1 and SQN.2) from the miRNA processing machinery were found to be differentially expressed (DE) between flowers and leaves. Among 10 DE genes, 8 were up-regulated in flowers and 2 were up-regulated in leaves, the latter being SlyEXO.1 and SlyEXO.3. Considering the relative expression between S. lycopersicum flowers and leaves, SlyAGO10.2 and SlySQN.2 showed higher DE in flowers. When compared by absolute expression values (count per million - CPM), SlyAGO1.2 was the most-expressed gene both in the flowers and leaves (Fig. 8a, Supplementary Fig. S16 and Table S14). Similar results were found in S. pennellii, wherein 11 of the 30 genes (AGO1.1, AGO1.2, AGO10, DDL, EXO.1, EXO.5, EXO.6, EXO.8, HESO1.2, SE and SQN.2) identified from the same pathway were found to be DE between flowers and leaves. In this case, all 11 genes were up-regulated in flowers, with SpeSQN.2 and SpeAGO10 genes more DE in flowers than in leaves. Regarding the absolute expression values, SpeAGO1.2 and SpeAGO1.1 were the genes most-expressed in flowers and leaves, respectively (Fig. 8b, Supplementary Fig. S17 and Table S14).

Figure 8
figure 8

Heatmap analysis of the genes involved in the tomato miRNA pathway from leaves and flowers RNA-seq libraries. Heatmap shows the expression profile of the (a) S. lycopersicum and (b) S. pennellii genes in two tissues, being LL (S. lycopersicum leaves), LF (S. lycopersicum flowers), PL (S. pennellii leaves) and PF (S. pennellii flowers).

Validation of the tomato miRNAs in S. lycopersicum small RNAs libraries

To validate the miRNAs identified in this study, we retrieved 95 small RNA libraries of S. lycopersicum available in the SRA and verified the presence of mature miRNAs in each library (Supplementary Table S15). A robust analysis of the presence of 285 different sequences of mature miRNAs found in S. lycopersicum was performed in the 95 different small RNA libraries, in order to validate the miRNAs identified in this study. Among the 285 different mature miRNA sequences, 262 were found in at least one analysed library and 23 mature miRNAs were not found in the searched libraries. Members of the MIR165/166, MIR167, MIR393, MIR530, MIR827, MIR828, MIR7983, MIR7990, MIR8011 and MIR8025 families were found in 69.5%, 64.2%, 43.2%, 95.8%, 27.4%, 14.7%, 62.1%, 31.6%, 54.7% and 1% of the analysed libraries, respectively. Only one mature miRNA of the MIR7983 and MIR8025 families was not found in any studied library.

Discussion

To better understand the biology, evolution and domestication and accelerate the agricultural applications of the miRNAs, the genes of their biogenesis pathway and their targets in tomato, we undertook the genomic study of these molecules in the cultivated tomato S. Lycopersicum and the wild tomato S. pennellii. Our results suggest that, in general, the miRNAs demonstrated similar evolutionary patterns between the two species studied. However, a larger number of molecules (miRNA processing pathways genes, their mature molecules and target genes) and higher levels of miRNA expression were found in the wild tomato than in the cultivated tomato. This finding showed a possible loss of genes along the evolution and domestication, which may be associated with higher biotic and abiotic resistance found in wild tomato, compared to cultivated tomato.

We identified 29 and 32 putative proteins involved in the miRNA pathways in S. lycopersicum and S. pennellii, respectively (Supplementary Tables S1 and S2), which is in agreement with the description of the miRNA biogenesis pathway in other plant species, such as in A. thaliana, P. vulgaris and O. sativa43,91,94,95. The presence of these proteins in different studies has shown a high conservation rate of the components of the miRNA processing pathways in plants, performing essential functions in the post-transcriptional regulation of mRNAs in different cells43,91.

In fact, our analysis revealed that the components of this pathway were highly conserved in the two species at the amino acid level, as well as the distribution of domains, active site location and phylogenetic distribution compared to orthologous proteins in other plant species. This pattern was observed in both protein families, ARGONAUTE and DICER, both considered to be critical in the processing machinery of miRNAs in plants47,91.

Among all proteins found, 15 and 16 proteins belong to the AGO family of S. lycopersicum and S. pennellii, respectively, compared to 19 identified in O. sativa48, 17 in P. vulgaris95. In A. thaliana 10 AGO proteins have been found91,94. Mirzaei et al.70 and Bai et al.69 found also 15 AGO proteins in S. lycopersicum, all of which were identified in this study69,70. However, in the study by Bai et al.69, two classifications were different. Solyc02g069280 and Solyc03g111760 were considered by the authors to be SlyAGO3 and SlyAGO15, but in our analyses they were classified as SLyAGO2 and SlyAGO4, respectively. Of all the AGO proteins identified in plants, only AGO1 and AGO10 showed activity in the biogenesis of miRNAs91,94.

The conserved domains found in SlyAGO1.1/1.2, SpeAGO1.1/1.2, SlyAGO10.1/10.2 and SpeAGO10 (Supplementary Fig. S1 and Table S3) were the same as in proteins from orthologous plant species, and the domain distribution was similar as well96,97,98,99. The Piwi conserved domains of the AGO proteins had an active site containing a DDH catalytic triad (Fig. 1a), present in the AGO protein family and proteins with RNase H function48,94. These results conformed to AGO1 and AGO10 proteins found in A. thaliana and P. vulgaris48,94,95. The conserved domains and active sites found in Sly/SpeAGO1 and Sly/SpeAGO10 showed high conservation compared to AGO1 and AGO10 proteins in A. thaliana, O. sativa, S. tuberosum, G. Max and P. vulgaris48,95,98,100. Such high domain conservation may be directly related to the role played by AGO1 and AGO10 proteins in the miRNA processing pathway in different species.

Other important catalytic components of the miRNA processing in plants are the DCL proteins95,101,102,103,104,105,106,107. In this study, we identified 8 and 11 putative DCL proteins of S. lycopersicum and S. pennellii, respectively. The number of DCL proteins found in tomato genome is in agreement with other plant species: 4 DCLs have been found in the A. thaliana genome91,105, 6 in P. vulgaris95 and in O. sativa at least 5 DCL proteins were identified48,105,108. Bai et al.69 and Wang et al.71 found only 7 DCL proteins in S. lycopersicum69,71.

The DCL1 protein is the protein of the DCL family involved in the processing of miRNAs58,109,110. The conserved domains found in Sly/SpeDCL1 (Supplementary Fig. S1 and Table S4) were very similar to their orthologous proteins from plant species95,105,108, suggesting a conserved function of DCL1 proteins in tomato species. According to recent studies, especially in DCL proteins, the distance between the RNAse III and PAZ conserved domains is suggested to be the largest determinant of the resulting length of processed miRNAs51,105,111. This demonstrates the importance of the domain distribution and conservation in the performance of these proteins at the cellular level. The two RNAse III domains (RNase IIIa and RNase IIIb) found in Sly/SpeDCL1 proteins displayed an active site with the EDDE catalytic residues (Fig. 1b), also identified in DCL1 proteins from other plant species112. These RNAse III domains are required for the cleavage and processing function of dsRNAs (double-stranded RNA) in the miRNA pathway in plants105,108.

A phylogenetic analysis of the Sly/SpeAGO (Fig. 2) and Sly/SpeDCL (Fig. 3) proteins was performed from their amino acid sequences, aiming at understanding the evolutionary diversification of AGO and DCL proteins in the two tomato species. The distribution of the seven clades showed a relationship between some paralogous proteins specific to the AGO family, with a great similarity among the AGO1, AGO5 and AGO10 proteins; AGO2 and AGO7; AGO6 and AGO4 conforming to the phylogenetic distribution in A. thaliana and P. vulgaris91,95,113. In general, the identified phylogeny distribution of AGO proteins is consistent with current species protein phylogenies114, which suggests that the conservation level for the members of the AGO family in Angiosperms is high. The phylogenetic tree of the AGO proteins showed a strong correlation between AGO1 and AGO10 proteins, with both AGO proteins participating in the miRNA pathway in plants91,94. AGO proteins have a key role in transcriptional and post-transcriptional silencing mediated by small RNAs in plants. The role of these proteins in miRNA regulation has been associated with the RISC complex (RNA-induced silencing complex), allowing degradation or inhibition of mRNA translation derived from the base complementarity between the miRNA specific sequences and the target mRNA91,94,115.

The analysis of Sly/SpeDCL proteins showed a great similarity to the evolutionary tree of life of these species within each clade114. The distribution among the DCL family paralogs showed a strong correlation between the DCL2 and DCL4 proteins, which were also close to DCL1 but further away from DCL3, consistent with the distribution in DCL families in other plant species48,105. Thus, it was observed that DCL proteins of S. lycopersicum and S. pennellii are highly conserved in the same way as observed for the AGO proteins, highlighting their great importance in post-transcriptional silencing processes in plants.

The validation of the miRNA pathway genes in the libraries available in the SRA93 showed the DE of 10 and 11 of the 29 and 32 genes identified in S. lycopersicum and S. pennellii, respectively (Fig. 8, Supplementary Figs S16, S17 and Table S14). Among the DE genes in the two species are the genes of ABH1, AGO1, AGO10, DDL, EXO, HESO1, SE and SQN families. In S. lycopersicum, of the 10 DE genes, 8 were up-regulated in flowers and 2 up-regulated in leaves, and in S. pennellii all 11 DE genes were up-regulated in flowers. This result shows a greater requirement of these genes in flowers, suggesting an essential role of the miRNA in this tissue, and considering that several miRNAs targets are related to different roles in flowers116,117,118. Yamaguchi and Abe (2012) and Wang117 showed an important function of miR172, miR156 and miR159 in the flowering time control in A. thaliana117,118. In addition, Aukerman and Sakai (2003) demonstrated that miR172 causes early flowering and disrupts the specification of floral organism when overexpressed in A. thaliana116.

Only the SlyEXO.1 and SlyEXO.3 genes were up-regulated in leaves compared with flowers, whereas the SpeEXO.2 gene was not expressed (Fig. 8, Supplementary Figs S16, S17 and Table S14) because these genes were also involved in other pathways or have other functions, such as delivery of mRNAs, miRNAs and cell-specific proteins, considering that this family had the greatest number of paralogous sequences119,120. In both species, AGO10 and SQN.2 were the most up-regulated genes in flowers, suggesting that the mechanism of using these genes in flowers is conserved in these two species. Ji et al.121 showed overlapping roles of AGO10 and AGO1 in the temporal regulation of floral stem cells121. Furthermore, Liu et al. (2014) reported that the role of miR168a in sweet orange was the accumulation of AGO1 in leaves and fruits122, which may explain up-regulation of AGO10 and SNQ.2 in the leaves, considering that AGO10 has important roles in this tissue, and SQN has a direct relation with AGO1, whereas AGO1 has also important roles in flowers and is down-regulated by miR168 in leaves121,122. Considering the absolute expression values (CPM), Sly/SpeAGO1.2 and SpeAGO1.1 were the most-expressed genes in both tissues, showing the great importance of this protein family in plant development and, consequently, in the post-transcriptional silencing machinery using miRNAs123.

As for the miRNA sequences, we identified 343 and 388 mature miRNAs, 226 and 234 precursor miRNAs, distributed in 87 and 85 distinct miRNA families, for 4197 and 5768 target genes in the S. lycopersicum (Supplementary Table S5) and S. pennellii (Supplementary Table S6) genomes, respectively. In addition to predicting previously reported miRNAs, our study identified many additional novel miRNAs in the genomic sequences of S. lycopersicum; notably, of the all miRNAs found, only 110 mature and 77 precursors were deposited in the miRBase (version 21)124 (Supplementary Table S5). Thus, we identified 233 new mature miRNAs and 149 new precursors contained in 41 miRNAs families in comparison with miRNAs deposited in miRBase.

Some other studies searched for miRNAs in S. lycopersicum, such as Moxon et al.77 who found only 30 miRNAs, whereas Yin et al.74, Kim et al.7, Din and Barozai80, Luan, Wang and Liu125 and Fan et al.73 detected 21, 12, 109, 14 and 218 miRNAs, respectively7,73,74,80,125. In addition, other works found only six miRNAs responsive to long-term RPM-treatment87, 69 miRNAs families linked with phosphate and mycorrhizal signalling81, 112 mature miRNAs responsive to curl New Delhi virus (ToLCNDV)75, 143 miRNAs associated with Botrytis cinerea infection82, 95 miRNAs responsive to salt-induced stress78, and only 69 putative miRNAs associated with biotic stress79. Further, our study identified 114 new mature miRNAs, 88 new precursors contained in 28 miRNAs families of the S. lycopersicum in comparison with studies published to date7,73,74,75,77,80,87,125,126,127,128,129,130,131,132,133. Moreover, we identified for the first time mature miRNAs and their precursors in S. pennellii (Supplementary Table S6).

The miRNA precursor gene localization and distribution in S. lycopersicum and S. pennellii genomes showed that 86.73% and 81.59% of the genes were found in intergenic regions, respectively, and that, in S. pennellii, 3 of them (spe-miR169-1, spe-miR1919 and spe-miR5368) are located in more than one position in the genome (Supplementary Table S7). Such high percentage of intergenic miRNAs was also found in other plants, such as A. thaliana, P. trichocarpa, O. sativa, P. vulgaris, Sorghum bicolour and Catharanthus roseus37,95,96,134,135,136,137. Some of the miRNA genes identified were organized in clusters (72 in S. lycopersicum and 73 in S. pennellii). The presence of miRNA genes distributed in clusters in other plants has been reported before, e.g., 54 miRNAs in clusters were found in A. thaliana, 40 in P. trichocarpa, 103 in O. sativa, 32 in S. bicolour137 and 25 in P. vulgaris95. Furthermore, 50 miRNA genes in S. lycopersicum and 47 in S. pennellii were also organized in clusters but in antiparallel strands, in agreement with other plant genomes deposited in the miRBase (version 21), such as miR399a and miR399e from O. sativa, miR169e and miR169d from A. thaliana and miR167e and miR167i from G. max124.

The structural and thermodynamic characteristics of the S. lycopersicum and S. pennellii were similar to those reported in other plant studies127,138,139,140. In plants, the values for MFE in precursor miRNAs are typically smaller than −18 kcal/mol140. Furthermore, studies have shown that the sequences of RNAs with mean AMFE −45.93 ± 9.43 kcal/mol are considered real miRNAs, based on the knowledge that AMFEs found in miRNA precursors had higher negative values than in other classes of non-coding RNAs, for example, tRNAs and rRNAs. MFEIs with a mean value of 0.97, above 0.85 were suggested for potential precursors of plant miRNAs, which distinguished them from other RNAs, consistent with our results138.

The statistical analyses using thermodynamic and structural characteristics of the tomato pre-miRNAs, when comparing the two tomato species (Supplementary Table S10) and Solanaceae family (Supplementary Table S11) showed no differences between the groups (p < 0.05). Both displayed high conservation and similarity among pre-miRNAs from S. lycopersicum, S. pennellii and also species belonging to the Solanaceae family. However, the results for S. lycopersicum and S. pennellii were significantly different from results for Fabaceae and Brassicaceae families (p < 0.05), suggesting a great evolutionary conservation of the pre-miRNA characteristics within each plant family and a significant difference among different families (Supplementary Table S11). These differences with respect to pre-miRNA characteristics were also observed for Monocotyledons and Eudicotyledons141.

Several studies have shown that the MIR165/166 and MIR167 families are highly conserved in different plant species, such as A. thaliana, O. sativa, G. max, P. vulgaris and S. tuberosum95,127,142,143,144,145,146. Members of the MIR393 family were also found in Ferns, Gymnosperms, Angiosperms, Monocotyledons and Eudicots, such as A. thaliana and P. vulgaris47,56,95,109,127,147,148,149,150. The MIR530 family was also identified in orthologous plants, such as S. tuberosum, O. sativa, C. sinensis, P. vulgaris and Z. mays95,106,127,148,151,152. Furthermore, P. trichocarpa, O. sativa, Z. mays, S. tuberosum, A. thaliana and Brassica spp. also possess miRNAs of the MIR827 family in their genomes127,149,153,154,155, and studies have demonstrated the presence of miRNAs of the MIR828 family in other species, such as A. thaliana, O. sativa, P. vulgaris, V. vinifera and Z. mays95,148,150,156,157,158,159. On the other hand, the MIR7990 and MIR8025 families were identified in S. tuberosum124,155, MIR7983 in Solanum pimpinellifolium160, and to date, the MIR8011 family has been found in Lycium chinense152. In addition MIR7990, MIR8011 and MIR8025 families were identified for the first time in S. lycopersicum and S. pennellii in this study suggesting that these family can be restricted to Solanaceae (solanum-specific)155.

The phylogenetic analysis performed of the MIR167 (Supplementary Fig. S11), MIR165/166 (Fig. 6), MIR393 (Supplementary Fig. S12), MIR530 (Supplementary Fig. S13), MIR827 (Supplementary Fig. S14) and MIR828 miRNAs families (Supplementary Fig. S15) of S. lycopersicum and S. pennellii highlighted an evolutionary distribution that highly conforms to the accepted tree of life for plant species161.

Several tomato miRNAs showed functions modulating important biological processes through the gene expression control (Supplementary Tables S12 and S13). The targets of Sly/SpeMIR165/166 family are involved mainly in plant vascular development162, including the formation of axillary meristems, root lateral meristems, laminar outgrowth and differentiation in leaves, stems, and roots143,163,164,165,166,167. Several studies have also identified targets of miR165/166 in A. thaliana and O. sativa, in agreement with our results, also showing a great conservation of function in this family of plant miRNAs, such as in retrotransposon, homeobox-leucine zipper family and kinase family proteins99,144,163,164,165,168,169.

The miR165/166 gene expression by real-time PCR in leaves and flowers of the two studied species showed interesting differences (Fig. 7). The miR165/166 was expressed at a higher level in S. lycopersicum leaves than in S. pennellii leaves. Specifically, in S. lycopersicum, miR165/166 has been expressed at a higher level in leaves than in flowers. The predicted targets for the tomato miR165/166 families have been related to the HD-Zip protein families, which participate in secondary development and vascular differentiation. In this sense, miR165/166 in S. lycopersicum leaves displayed overexpression compared to S. pennellii leaves suggesting a higher level of regulation of vascular development in the cultivated tomato species leaves than in the wild species. miR165/166 also showed high expression in rice after a short-term heat stress170, potato155 and in leaves, stems and roots of Poncirus trifoliate171. miRNAs from this family were also associated with a phenotype diversity involved in drought tolerance172,173,174. Furthermore, Guo et al.175 showed that during the establishment of Chinese kale seedlings, the miR165/166 expression was also regulated, showing its relation to the normal leaf development175, consistent with the higher expression found in leaves of S. lycopersicum than in flowers.

Many reports in plants have demonstrated conserved targets for MIR167 family, such as the genes involved in the disease resistance (CC-NBS-LRR) in O. sativa species and genes of the peptidase family proteins in A. thaliana169 consistent with the Spe/SlyMIR167 targets found in our study. Other important targets of MIR167 family include genes involved in the regulation of auxin response factors (ARF4, ARF6 and ARF8)176,177,178,179,180,181. As for expression, the miR167 was expressed at a higher level in S. pennellii than in S. lycopersicum. When comparing the S. pennellii tissues, leaves showed higher expression than flowers. In rice seeds170 and in S. tuberosum leaves and flowers182, miR167 was more abundant than in other tissues. Unlike Citrus grandis, which showed low expression of the miR167 family members in the floral developmental stages, indicating an important role in the reproductive development183 in agreement with our findings of low expression in flowers. In addition, miR167 was shown to be responsive to heat shock in Arabidopsis184 and played an important role in nitrogen stress responses, being up-regulated in maize shoots and roots185. Furthermore, most of the observed miR167 targets in plants were related to the hormonal pathways regulation, as in A. thaliana, rice and coffee154,186,187. Considering that miR167 plays these important roles, such as responses to heat shock, nitrogen stress and hormonal pathway regulation, its higher expression in S. pennellii than in S. lycopersicum suggests a significant influence on the metabolic pathway involved in resistance of wild plants to these abiotic stresses.

Interestingly, the Sly/SpeMIR393 family affects important proteins that regulate hormone-sensitive signalling in plant development188,189,190,191,192. The targets found for Sly/SpeMIR393, mainly the TIR1 genes, are in line with the data reported for A. thaliana, O. sativa, Z. mays, P. vulgaris, Hordeum vulgare, Gossypium hirsutum and M. truncatula43,56,95,127,149,169,188,189,190,193,194,195. In barley, miR393 acts as an integrator of environmental cues in auxin signalling, being able to improve plant resistance to aluminium toxicity195. In addition, miR393 exhibited important roles in nitrate response, defence against pathogenic bacteria and plant development191.

The Sly/SpeMIR530 family targets the genes encoding plus-3 domain-containing proteins, which was also found in N. tabacum196 and O. sativa169. Additionally, this miRNA family was involved in the nitrogen regulation and high salinity197,198, being pivotal in the plant physiological response to stress199. The higher miR530 expression in S. pennellii suggest an important role in the plant physiological control, mainly related to plant response to biotic and abiotic stresses, in which case it may need a higher expression in the wild species, according to their targets200.

Other studies also revealed similar targets for Sly/SpeMIR827 family miRNAs, such as SPX domain-containing family protein in A. thaliana169 and O. sativa201. The targets found indicated that miR827 plays a key role in plant stress adaptation, especially in nitrogen and phosphorus deficiency202,203,204,205. miR827 showed high expression in rice seeds154, unlike maize, which showed a suppressed expression in nitrogen deficiency, consistent with other studies that showed an involvement of miR827 in adaptive responses to low nitrogen and phosphorus conditions, showing its important role in stress adaptation185,197,198,202,203.

Sly/SpeMIR828 target genes showed its important roles in cell proliferation, control of apical dominance and organ symmetry, as well as senescence, such as found in studies carried out in other plants159,202,206,207. In A. thaliana, the MYB-type DNA-binding domain was also identified as miR828 target169. miR828 was expressed at higher levels in active bud in Camellia sinensis. The miR828 targets showed a role in regulating transcription and nucleotide metabolism208 and were holistically expressed in grapevine209. The miR828 presence also showed its likely involvement in phenylpropanoid metabolism, biotic and abiotic stress, cell differentiation and hormone responses210. Therefore, the targets found in this study showed the importance of the regulation performed by miRNAs (Supplementary Tables S12 and S13).

The validation of the mature miRNAs identified in this study, used 95 small RNAs libraries of S. lycopersicum available in the SRA (Supplementary Table S15). By performing a robust analysis, 262 mature miRNAs of S. lycopersicum were found in at least one library. This result showed that the methods used in this study for the miRNAs prediction are quite effective. Only 23 mature miRNAs were not found in the libraries. Among these miRNAs were sly-miR172d-4-5p, sly-miR172d-9-3p, sly-miR2111a-3p, sly-miR403a-3p, sly-miR7983-6-3p, sly-miR7992-5p, sly-miR8008b-2-3p and sly-miR8025-5p. Yet, other mature miRNA derived from their pre-miRNAs were present in the libraries, suggesting that in these specific libraries only one mature miRNA was necessarily expressed. Moreover, other mature miRNAs might be absent from the analysed libraries because some miRNAs are expressed only under specific conditions and/or tissues, which were not present in these libraries211,212. However, such validation analysis of the mature miRNAs could not be performed with S. pennellii due to the low number of small RNAs libraries available for this species, highlighting the fact that our study was the first to identify miRNAs in S. pennellii.

Conclusion

Considering the global importance of the tomato culture and the problems generated by the susceptibility to the pests of the cultivated tomato (S. lycopersicum) and the differences with the wild tomato (S. pennellii), the obtained results elucidated several aspects of miRNAs in these two species. Our results expand the study of miRNAs in plants by providing a better understanding of their essential roles in the miRNA-based regulation processes in tomato, their processing pathways and gene expression, as well as providing targets for future investigations.

Methods

Identification of putative proteins involved in the miRNA pathway in S. lycopersicum and S. pennellii

The sequences of proteins involved in small RNA pathways in A. thaliana and Solanum tuberosum were identified and retrieved from the reference protein database (refseq-protein) available at NCBI (National Center for Biotechnology Information). These sequences were used as queries to search for putative proteins involved in the small RNA biogenesis (including miRNAs) in S. lycopersicum and S. pennellii using BLASTp (Basic Local Alignment Search Tool) (http://blast.ncbi.nlm.nih.gov/Blast.cgi)213,214. Using the reference sequences, the putative sRNA pathway proteins were searched for in the two tomato species using the S. lycopersicum (ITAG3.0) and S. pennellii (Spenn_v2.0) data of the Sol Genomics Network (https://solgenomics.net/)28.

Prediction of mature and precursor miRNAs

Initially, the S. lycopersicum (ITAG3.0)29 and S. pennellii (Spenn_v2.0)20 assembly genome were obtained from Phytozome v12.1 (https://phytozome.jgi.doe.gov/)215 and PlaBi dataBase (http://www.plabipd.de/project_spenn/start.ep), respectively. Genome annotation files were also obtained from the databases for the respective species. The putative miRNAs and their precursors were searched applying the method described by De Souza Gomes et al.216, adapted for both species216. First, we obtained the genome sequences and found the potential hairpin sequences or similarity to miRNA precursor structures using Blastn (NCBI) and Einverted (EMBOSS tool) programs217 and the parameters of 336-nucleotides maximum repeat and threshold value of 25. Several filters were applied to these sequences to discard undesirable sequences, such as other non-coding RNAs sequences, and to keep those corresponding to putative miRNAs. The applied filters were based on conserved characteristics of precursor miRNA, as well as characteristics of other known regions, which had no potential to originate precursor miRNAs. These filters were: the GC (guanine and cytosine) content between 20% and 65%, minimum free energy (MFE), homology with conserved mature miRNAs, homology with repetitive regions and non-coding RNAs, except miRNAs216.

The miRBase database (version 21 - http://www.mirbase.org/)124,218 was used for comparisons with the putative hairpin sequences and find novel S. lycopersicum and S. pennelli miRNAs by accepting a maximum of 5 mismatches in final mature sequence. Other RNA groups, such as ribosomal RNA (rRNA), transporter RNA (tRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), SL RNA, SRP, and RNase P were discarded based on the RFAM database version 12.0219 and repetitive sequences were discarded using the RepeatMasker 4.0.5 (http://www.repeatmasker.org/). The putative precursor and mature miRNAs identified were used for further analysis.

The pre-miRNA sequences identified in S. lycopersicum and S. pennellii were characterized according to their thermodynamic structures and parameters: minimum free energy (MFE), adjusted minimum free energy (AMFE), free minimum energy index (MFEI), size, content of A, content of U, content of C, content of G, of GC and AU, GC ratio, AU ratio, minimum free energy ensemble (MFEE), ensemble diversity (Diversity) and MFE structure frequency in the ensemble (Frequency). The AMFE was determined with MFE of 100 nucleotide-long sequence, and the MFEI was determined from the equation: MFEI = [(AMFE) × 100]/(G% + C%)]129. The pre-miRNA secondary structure prediction, in addition to the diversity calculation, MFE, ensemble frequency and MFE of the secondary structures were performed using the RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi). The GC content and other structural properties were defined using Perl scripts.

Analysis of conserved domains, active sites, phylogeny, primary and secondary alignments

For the conservation analyses of domains, active sites and structure evaluation, putative protein sequences and precursor miRNAs, we used the multiple alignment by ClustalX 2.1 and ClustalW220. In addition, we used RNAalifold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAalifold.cgi) to analyse the primary and secondary structure alignments of precursor miRNAs with their respective orthologues220. Adjusted parameters (gap opening: 22.50; gap extension: 0.83) were used for the multiple alignments of precursor miRNA sequences221. The pre-miRNAs and proteins sequences logos were originated from WebLogo 2.8.2 (http://weblogo.berkeley.edu/logo.cgi)222.

The conserved protein domains were recovered separately to verify, through multiple alignment, the presence of important amino acid residues. In addition, the PFAM family protein database (http://pfam.xfam.org/)223 was used to find conserved domains and their putative functions. The Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)224 was used to identify amino acids from the active sites of putative proteins.

The phylogenetic analysis was performed using MEGA5.2 program and the neighbour-joining method for miRNA precursor sequences and also putative proteins225,226. The miRNA evolutionary distance was calculated using the Kimura-2-parameter in site-replaced base units. For the putative proteins, the Jones-Taylor-Thornton (JTT) model was used227. A consensus tree was obtained using a bootstrap of 5000 replicates for the precursor miRNAs and 1000 replicates for the proteins, representing the evolutionary history of the analysed sequence group.

miRNA target prediction

The miRNA targets identified in S. lycopersicum and S. pennellii were predicted using the psRNATarget tool (2017 Update) (http://plantgrn.noble.org/psRNATarget/)228,229. In order to obtain the lowest false-positive prediction rate for miRNA target genes, the threshold was strict, being 3.0 for “Maximum expectation (Exp)”. Other parameters were standard: “Length for complementarity scoring (HSP size)” −19 bp; “Top target genes for each small RNA” −200; “Target accessibility - allowed maximum energy to unpair target site (UPE)” −25; “Flanking length around target site for target accessibility analysis” −17 bp before/13 bp after; “Range of central mismatch leading to translational inhibition” −10 – 11 nucleotides. To analyse the miRNA target genes predicted in tomato, candidate targets were retrieved by Phytozome v12.1 and Sol Genomics Network databases.

miRNA expression by real-time PCR

Plant material

The plant material was harvested at the “HortiAgro Sementes S/A” Experimental Station of the Center of Development and Transfer of Technology of the Federal University of Lavras (UFLA), at the municipality of Ijaci, MG. Leaves, flowers and fruits of Solanum lycopersicum and Solanum pennellii were harvested from three plants as biological replicates. The material was immediately frozen in liquid N2 and stored in 50-ml tubes in a −80 °C freezer until RNA extraction.

RNA extraction

Total RNA was extracted using the QIAzol® Lysis Reagent (QIAGEN). Approximately 100 mg of tissue was ground in liquid N2 and transferred to a tube, to which 1 ml of QIAzol was added and homogenized by vortexing. The tubes were left at room temperature for five minutes. Afterwards, 300 µl of chloroform was added and homogenized by vortexing. The mixture was left at room temperature for five minutes. Afterwards, the tubes were centrifuged at 12000 g at 4 °C for 15 min. Three phases were formed, and 400 µl of the upper phase was collected and transferred to a new tube, to which 500 µl of isopropanol was added and homogenized by vortexing, and the tube was left at room temperature for 10 min. The tubes were then centrifuged at 12000 g at 4 °C for 10 min for precipitation of nucleic acids. The liquid was discarded, and the pellet was washed with 1 ml of 75% ethanol and centrifuged at 7500 g at 4 °C for 10 min, and the liquid was discarded carefully to preserve the pellet. Then, 500 µl of isopropanol was added again and the steps were repeated. The pellet was dried at room temperature and then resuspended in 20 µl of autoclaved mili-Q water.

The RNA integrity was visualized in 0.8% agarose gel, and the quantity and quality (ratio 260/280 and 260/230 between 1.8 and 2.2) were measured on Nanovue spectrophotometer.

DNAse treatment

Total extracted RNA was treated with the Turbo DNA-freeTM kit (Life TechnologiesTM). Five µg of RNA was used, and the corresponding volume was adjusted to 22 µl with autoclaved milli-Q water. Then, 2.5 µl of the 10x Turbo DNAse Buffer and 0.5 µl of the Turbo DNAse were added to each sample. The reaction was incubated for 30 min at 37 °C. Afterwards, to stop the DNAse enzyme activity, 2.5 µl of the DNAse Inactivation reagent was added. The samples were left at room temperature for 5 min, with occasional mixing, and then centrifuged at 10000 RPM for 2 min. After centrifugation, 15 µl were collected and transferred to a new tube. The samples were stored at −20 °C until cDNA synthesis.

cDNA synthesis and relative expression

The stem-loop method63,230 was used for cDNA synthesis and expression analysis, following the previously described steps63,187. In this method, three primers are needed – the stem-loop RT primer, the forward and the reverse primers. The stem-loop RT primers were designed according to Chen (2004)230. The sequence data are presented in Supplementary Table S15. The stem-loop primers contain 50 nucleotides, of which 44 nucleotides correspond to a universal sequence that forms a stable stem-loop structure at low temperatures. The last 6 nucleotides at the 3′ end of the stem-loop RT primer is a reverse complement of the last 6 nucleotides at the 3′ end of the specific miRNA. The forward primers were designed with the exact sequence of the miRNA, excluding the last 6 nucleotides at the 3′ end. To increase the melting temperature, nucleotides were randomly added (5–7 nucleotides) to the 5′ end of the primer. Primer design software was used to calculate the melting temperature and verify the quality of the forward primers. The third, reverse primer is a reverse complement of the universal sequence of the stem-loop RT primer, and, therefore, is used for every miRNA.

The cDNA synthesis from total RNA (DNA-free) was performed using the ImProm-IITM Reverse Transcriptase (Promega). For each miRNA, 500 ng of RNA was used. The volume was adjusted to 7 µl with autoclaved mili-Q water. To the RNA, 1 µl of oligo-dT primer; 2 µl of specific stem-loop RT primer (1 µM) and 1 µl of the dNTP mix were added. The samples were incubated at 70 °C for 10 min for denaturation of the secondary structures and later incubated at 4 °C for 10 min. Then, 5 μl of Improm-II 5 × reaction buffer, 2.4 μl of 25 mM MgCl2, 0.6 μl of RNaseOut (Invitrogen), and 1 μl of the Improm-II Reverse Transcriptase were added. The reactions were incubated in a thermocycler at 16 °C for 30 min, followed by reverse transcription of 60 cycles at 30 °C for 30 s, 42 °C for 30 s, and 50 °C for 1 s. For inactivation of the Improm-II Reverse Transcriptase, the reaction was incubated at 70 °C for 15 min. Subsequently, the reactions were stored at −20 °C.

Quantitative real-time PCR (RT-qPCR) was performed using a standard SYBR Green PCR kit protocol on a Rotor-Gene Q (QIAGEN). The reactions were performed in a final volume of 15 μl, using 1.5 μl cDNA, 1.5 μl of each primer at a final concentration of 1 µM, 7.5 μl of SYBR Green PCR Master Mix (QIAGEN), and 3.0 μl of water, for each reaction. The reaction was incubated at 95 °C for 5 min, followed by 40 cycles at 95 °C for 5 s and 60 °C for 10 s. Then, the samples were heated from 55 to 95 °C with an increase of 1 °C to acquire the melting curve of the amplified products. All reactions were run in triplicate.

The primer efficiency was calculated for each miRNA and the reference genes by a standard curve of a 1:5 serial dilution of a pool of cDNA of the respective miRNA. The primer efficiencies were: miR165/166 (87%), sly-miR167 (97%), spe-miR167 (116%), miR7983 (97%) and miR530 (98%).

For the relative expression experiment of miRNAs from selected tissues, the 1:25 dilution was chosen. For the calculation of relative expression, the normalized comparative Cq (quantitative Cycle) method was used64, which takes into account the primer efficiency in the calculation. The normalization factor was the geometric mean of the U6 and 5.8S gene expression.

Expression of miRNA pathway genes in tomato RNA-seq libraries

Paired-end reads from leaves and flowers of both wild (Solanum pennellii) and cultivated (Solanum lycopersicum) tomato were retrieved from the Sequence Read Archive (SRA) under accessions numbers SRR786556, SRR786557, SRR786570, SRR786571, SRR786552, SRR786553, SRR786566, SRR786567, SRR786524, SRR786525, SRR786540, SRR786541, SRR786542, SRR786520, SRR786521, SRR786535, SRR786536, SRR78653793. In this way, 10 libraries for S. lycopersicum and 8 libraries for S. pennellii, totalling 77354968 paired-end reads, were used for differential expression analyses.

The quality of the libraries was evaluated using FastQC software231. Adapters were identified using minion232 and removed with Trimmomatic233 as well as were the reads with the quality score and length below 20 and 35 bp, respectively. After the quality control, 71703462 paired-end reads were used as inputs to STAR mapper tool (version 2.5.3a)234 with the default parameters for alignment against their respective genomes (S. lycopersicum version 3.2 and S. pennellii version 2.0) retrieved from the Sol Genomics Network28. Approximately 88% and 87% of the S. lycopersicum and S. pennellii reads were uniquely mapped to their genomes, respectively.

The libraries were then sorted by query name using picard tools (version 2.18.0), and for each of them, raw read counts were obtained using the python script htseq-count (version 0.7.2)235. Differentially expressed (DE) genes were identified using the Bioconductor R package edgeR by comparing the normalized number of reads aligned to each gene in different tissues236,237. The Benjamini and Hochberg’s false-discovery rate (FDR) below 0.05 and minimum fold-change of two were the parameters used to consider a gene DE238.

miRNAs in S. lycopersicum small RNA-seq libraries

Sequencing analysis

Ninety-five raw small RNA data files were retrieved from the NCBI Sequence Read Archive (SRA) with different accession numbers (Supplementary Table S15). The library qualities were evaluated using FastQC software231, adapters were removed with Trimmomatic233 discarding reads with quality score below 20 and length less than 17 nucleotides and longer than 30 nucleotides. The filtered sequences were mapped and quantified using miRDeep2239. miRDeep2 and perl scripts were used on each sequence separately to generate the numbers of the reads for each miRNAs identified.

Statistical analysis

For the statistical comparisons among the structural and thermodynamic variables of each category (species and/or families), a basic descriptive analysis was performed followed by non-parametric tests (Wilcoxon-Mann-Whitney). Median values were used to perform the statistical comparisons240. Statistical significance was set at p < 0.1.