Genome-wide identification of miRNAs and their targets during early somatic embryogenesis in Dimocarpus longan Lour.

miRNAs are endogenous regulatory factors that play pivotal roles in post-transcriptional regulation. However, their specific roles in early somatic embryogenesis (SE) remain unclear. Study of the SE system is fundamental for clarifying the molecular mechanisms in Dimocarpus longan. We identified 289 known miRNAs from 106 different miRNA families and 1087 novel miRNAs during early longan SE, including embryogenic callus (EC), incomplete pro-embryogenic culture (ICpEC), globular embryo (GE), and non-embryogenic callus (NEC). The abundances of known miRNAs were concentrated in GE. The differentially expression (DE) miRNAs showed five expression patterns during early SE. Largely miRNAs were expressed highly and specially in EC, ICpEC, and GE, respectively. Some miRNAs and putative target genes were enriched in lignin metabolism. Most potential targets were related to the pathways of plant hormone signal transduction, alternative splicing, tyrosine metabolism and sulfur metabolism in early longan SE. The regulatory relationships between dlo-miR166a-3p and DlHD-zip8, dlo-miR397a and DlLAC7, dlo-miR408-3p and DlLAC12 were confirmed by RNA ligase-mediated rapid amplification of cDNA ends. The expression patterns of eight DE miRNAs detected by qRT-PCR were consistent with RNA-seq. Finally, the miRNA regulatory network in early SE was constructed, which provided new insight into molecular mechanism of early SE in longan.

embryonic stages or in mixed samples, without referencing the genome of that species. This was likely to decrease the accuracy of miRNA transcriptional sequencing in species-specific stages of embryonic development.
Longan (Dimocarpus longan Lour.), a member of the Sapindaceae family, is an economically important subtropical evergreen fruit tree in China. Recent studies indicated that seed abortion normally occured during early embryonic stage in this species [18][19][20] , but it was a challenge to collect raw materials from early-stage zygotic embryos in order to study this. Consequently, it was important to clarify the molecular mechanisms involved in SE development in longan. Lin 21 revealed some miRNAs were stably expressed during the development of SE. The functions of the miRNAs (miR398, 393, 160 and 390) were further verified and analysed during the maturation of SE in longan [22][23][24] . The findings revealed that the miRNA regulatory network played an important role in the development of SE in longan, and early somatic embryo development was closely associated with the totipotency of differentiated embryogenic calli and the seed size.
To separate miRNA from early SE in longan, in this study, miRNAs and their target genes were identified from the early SE (embryogenic callus, EC; incomplete embryogenic compact structure, ICpEC; and globular embryo, GE) and non-embryogenic callus (NEC). The differential expression patterns and functional enrichment of metabolic pathways of miRNAs were analysed, and the essential roles of miRNAs in early SE were discussed. The results of this study provided insight into the specific miRNA regulatory network in early SE of longan.

Methods
Plant materials and RNA isolation. The somatic embryogenesis system of 'Honghezi' longan was constructed by Lai Zhongxiong [25][26][27] . The materials from four different stages of somatic embryogenesis were obtained in longan: embryogenic callus (EC), incomplete pro-embryogenic culture (ICpEC), globular embryo (GE) and non-embryogenic callus (NEC). EC, ICpEC and GE were obtained by culturing on MS + 1.0 mg/L 2,4-D, MS + 0.5 mg/L 2,4-D and MS + 0.1 mg/L 2,4-D, respectively, for 20d. The cellular morphology of EC was loosely packed and pale yellow, ICpEC was more tight, and that of GE was more tightly packed and featured protoderm cells. NEC was obtained by culturing on MS + 4.0 mg/L 2,4-D for 45 d 27 , and showed an irregular cell shape. After freezing in liquid nitrogen, the samples were stored at −80 °C for the extraction of total RNA for RNA-seq and quantitative reverse-transcriptase polymerase chain reaction (qRT-PCR).
Total RNA of the materials in the four stages was extracted with TRIZOL Reagent (Invitrogen, USA), in accordance with the manufacturer's instructions. Materials from all stages were analysed as three biological replicates. One percent agarose gel electrophoresis and an ultramicro-ultraviolet spectrophotometer were used to determine the quality of RNA. The RNA samples with A 260 /A 280 ratios between 1.9 and 2.1 and A 260 /A 230 > 2.0 were used for further RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE) PCR and qRT-PCR analyses.
Small RNA library construction and HiSeq sequencing. The four small RNA libraries (EC, ICpEC, GE, NEC) of longan were constructed and sequenced using the Illumina HiSeq2000.platform (Beijing Genome Institute, BGI). First, the electrophoretic bands of RNA between 18 and 30nt were separated on a PAGE gel and small RNA was recovered. A 5′ adapter was connected to the small RNA by T4 RNA ligase, followed by mixing, centrifugation and reaction at a suitable temperature for the set time. PAGE gel was used to purify and recover the 5′ ligation product. The method to purify and recover the 3′ ligation product was the same as for preparation of the 5′ ligation product. Second, Superscript II reverse transcriptase (Invitrogen) was used to synthesise the first strand of cDNA; the reverse transcription primer was 5′-CAAGCAGAAGACGGCAGCACGA-3′. Third, the small RNA libraries were constructed by bridging PCR and PAGE gel. Finally, Agilent 2100 Bioanalyzer 28 and the ABI Step One Plus Real-Time PCR System were used to detect the high throughput and accuracy of the small-RNA libraries. miRNA identification and bioinformatic analysis. After removing the low-quality reads, such as those containing poly-N, 5′ adapter contaminants, no 3′ adapter or insert tag, or those containing poly(A), (T), (G) or (C), from the raw data, the clean data were obtained. The clean tags were mapped to the longan genome (NCBI accession number: BioProject PRJNA305337) 29 and longan embryogenic callus transcriptional database (SRA050205) 30 by SOAP or Bowtie 31 . The expression and distribution of sRNA in the genome were analysed. Second, the quality, length, and public sRNA sequences were analysed statistically. Noncoding RNA, such as miRNA, siRNA, piRNA, rRNA, tRNA, snRNA and snoRNA, was identified via comparisons with miRBase21.0 (http://www.mirbase.org/ftp. shtml), Rfam (11.0) 32 and GenBank (http://www.ncbi.nlm.nih.gov/genbank/); the precursor sequences of potential miRNA were extracted and predicted with miREAP software (https://sourceforge.net/projects/mireap/) by comparison with known miRNAs in miRBase 21.0. Then, the known miRNAs were determined by comparing with the known database alignment and conserved miRNA sequences. Mireap and Mfold were used to annotate the sRNAs and make a single annotation for each unique sRNA. sRNA was traversed according to the priority order of rRNA> known miRNA> piRNA> repeat> exon> intron. Then compared exon antisense chain, intron and intergenic region 33,34 , if they did not annotate and map to the longan genome then they would be used to predict the novel miRNA by Mireap (a software independently developed by HuaDa for predicting miRNAs in plants and animals), and the novel secondary structure map was plotted. Finally, the first base distribution of candidate specific miRNAs was analysed statistically and the accuracy of the predicted results was judged. Gene Ontology (GO) functional annotation 35 and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses (www.kegg.jp/kegg/kegg1. html) 36 were used to investigate the functions of the target genes.
Differential expression analysis of miRNAs. The differential expression of known and novel miRNAs (DE miRNAs) was normalised in the four libraries using the ExpDiff method (Shenzhen, China). The miRNAs in the four libraries were normalised to one million to reduce potential errors before calculating the fold change, p-value and ratio as follows 37 In the formula, N1 and N2 represent the total counts of clean reads in two sRNA libraries, and x and y stand for the counts of miRNAs in the two sRNA libraries. The thresholds of log2 (fold change)|>1 and p-value < 0.01 were applied to define differential expression of the miRNAs between the two libraries, under the same circumstances, the p-value < 0.05 suggested that the difference was significant. In this study, the ratio defined by fold change revealed a miRNA expression of one stage of SE library compared with that in another stage library. A ratio> 2 meant the miRNA was upregulated, and a ratio < 1/2 meant the miRNA was downregulated. Standardised results were used to count fold_change, p-value and made drawings 37 . The DE miRNAs of different stages were identified using a Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/). Prediction and annotation of the target genes of miRNAs. psRNATarget 38 was used to identify the relationship between potential miRNAs and their target genes. Then, in the NR database, Blastx with an e-value less than e-10 was used to search the candidate targets to predict their possible functions. The functions of the differentially expressed miRNAs were analysed using Blast2GO software 39 . Modified 5′ RLM-RACE and real-time quantitative PCR. psRNATarget was used to predict the relationships of dlo-miR397a and DlLAC7, dlo-miR166a-3p and DlHD-zip, dlo-miR408-3p and DlLAC12. Gene Racer Kit (Invitrogen) reverse-transcription kit was used to synthesise cDNA template GeneRacer 5′ primers and gene-specific primers were designed according to the target gene sequence and miRNA matching position (Supplemental Data S1). The PCR products were purified, cloned and sequenced. The annealing temperatures of the first-and second-round PCR of cleavage sites validated PCR procedure were 57 °C and 56 °C, and the extension time was 30 s. The GeneRacer ™ 5′ primer was 5′-CGACUGGAGCACGAGGACACUGA-3′ and GeneRacer 5′ nested primer was 5′-GGACACUGACAUGGACUGAAGGAGUAGAAA-3′, which were used as universal primers.
The first-chain cDNA of miRNA qRT-PCR was synthesised by poly(A) miRNA-based qRT-PCR in accordance with the TransScript miRNA First-Strand cDNA Synthesis SuperMix Instruction Manual; then, the cDNA was diluted tenfold and an miRNA-qRT-PCR reaction system was established consisting of 0.4 μL of forward primer, 0.4 μL of universal primer, 10 μL of 2 × TransStart Tip Green qPCR SuperMix, 2 μL of cDNA template and 7.2 μL of ddH 2 O. The procedure was as follows: predenaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 10 s, annealing at 60 °C for 30 s and extension at 72 °C for 15 s. The first-chain cDNA of genes was synthesised by qRT-PCR using the PrimeScript RT Reagent Kit (Takara), in line with the instruction manual. cDNA from the three stages of somatic embryogenesis was diluted tenfold. The qRT-PCR reaction system consisted of 10 μL of SYBR, 2 μL of cDNA template, 0.8 μL each of forward and reverse primers, and 6.4 μL of ddH 2 O. The procedure was as follows: initial denaturation for 30 s at 95 °C, followed by 40 cycles of denaturation at 95 °C for 10 s, annealing at 60 °C for 30 s and extension at 72 °C for 15 s, performed on a LightCycler 480 instrument (Roche Applied Science, Switzerland). Each reaction was performed in triplicate. cDNA was diluted into different concentrations of 5×, 25×, 125× and 625× to make a solution curve. DlU6 21 and DlUBQ 40 were used as internal controls for the miRNAs and their potential target genes, respectively. Three biological and three technical replicates were used for analysing the expression of each miRNA and their potential target genes. The primers of miRNAs and their potential targets were used in qRT-PCR analyses (Supplemental Data S1). The expression level of miR-NAs and their potential targets were calculated by the 2 −ΔΔCt method. Finally, one-way ANOVA was performed in SPSS (Statistical Product and Service Solutions) to assess the statistical significance of differences in the data.

Results
The categories and sizes of sRNA of longan early SE varied greatly. Illumina HiSeq.used to construct a small RNA database of the four stages (EC, ICpEC, GE, and NEC). A total of 47,791,087 raw reads were generated. Clean reads comprised more than 90% of raw reads (Supplemental Data S2). In total, 80.78%-86.19% of the reads were perfectly matched to the longan genome (Table 1). Overall, 7099 (0.7%) to 10658 (0.26%) unique miRNA sequences were obtained in the four stages. The number of miRNAs in early SE was more than in NEC, and markedly larger in GE (Table 1). In addition, a total of 289 known mature miRNAs were identified, belonging to 106 miRNA gene families. In total, 140, 168, 163 and 116 mature miRNAs were identified in EC, ICpEC, GE and NEC, respectively (Supplemental Data S3). Most families had only one member; some families had two members, such as miR157, 159, 162, 168, 169, 319, 390, 393, 394, 395, 397, 408, 2592 and 396. Furthermore, miR159, 171 and 166 included five members; miR156 and 482 included six members; and miR167, 2592, 2118, 398 and 827 were consisted of three members respectively (Supplemental Data S4).
In general, the size of sRNAs ranged from 18 to 30nt in the four stages (EC, ICpEC, GE, and NEC). The most common sizes of miRNAs were 24 and 21nt. The development of early SE was mainly regulated by 24nt miRNAs, NEC was mainly regulated by 21nt miRNAs. The distributions of small RNA categories and their sizes were shown in Fig. 1. In total, 142 24nt miRNAs of four stages, 15 24nt miRNA were specifically expressed in NEC; 25 24nt miRNA were co-expressed in the three stages of EC, ICpEC and GE ( Fig. 2A). A total of 63 21nt miRNAs were identified in four stages, 13 of them were expressed in NEC, while 8 of them were in EC, ICpEC and GE. The

Identification of known and novel miRNAs during early SE in longan.
To identify the known conserved miRNAs, sRNAs were compared with other annotated miRNAs of plants in the miRBase database by Blast or Bowtie. Overall, 110, 118, 118, and 101 conserved miRNAs were found in EC, ICpEC, GE, and NEC, respectively. In addition, in these groups, 213, 398, 381, and 612 novel hairpin sequences (Supplemental Data S6) and 147, 245, 251, and 394 novel mature miRNAs were predicted and obtained (Supplemental Data S7). Moreover, 1250, 2438, 2482, and 3543 target genes were regulated by the novel mature miRNAs, with 1308, 2754, 3055 and 4042 targeted cleavage sites. The novel mature miRNAs ranged from 20 to 23nt in length, the most common sizes were 21 and 22nt in NEC, while the main size was 23nt miRNA in ICpEC (Supplemental Data S7).
In addition, the mature miRNA was cleaved by dicer, and the first specificity base site was preferentially U (uracil). In our study, when the predicted novel miRNA cut its target gene, the 10th shear site tended to A. The percentages of adenine (A), uracil (U), guanine (G) and cytosine (C) of the miRNA site were predicted in the changed dynamically development of longan early SE. Based on the analysis of the first base bias of 20-23nt novel miRNA showed that (Supplemental Data S8, Fig. S1 NEC-GE), the first base of 23nt miRNAs were all biased to U in the four stages, the 20nt miRNAs were not showed a regular rule. In addition, the results of novel miRNA nucleotide bias at each position analysis showed that (Supplemental Data S8, Fig. S1 NEC-GE): in early SE (EC, ICpEC, GE), the first base was preferentially U and the 10th was prefer to A or G, this consistent with the general  Table 1. Distribution of total and unique small RNA (sRNA) sequences from the early SE (EC, ICpEC, GE) and NEC. "Unann" refers to sRNA sequences for which no annotation information by database comparisons. www.nature.com/scientificreports www.nature.com/scientificreports/ rule of miRNA base distribution. At the NEC stage, the first base of miRNA was preferentially C rather than U, and the 10th base was preferentially G rather than A. The above results were inconsistent with the general rule of base distribution of miRNA.
The above result showed that it was necessary to further analyse the DE miRNAs important functions of specifically expressed in the comparisons of NEC vs. EC, EC vs. ICpEC, ICpEC vs. GE and EC vs. GE.
A indicated that there was at least one miRNA reads more than 10000 (13) in the four stages; B mean that it was only expressed in all three stages of EC, ICpEC, GE, at least one miRNA reads more than 500. C indicated that 11 miRNA only in NEC stage, D indicated that only 4 miRNAs were expressed in the EC, E indicated that 7 miRNAs were expressed specifically in the ICpEC, F indicated that 10 miRNAs were expressed specifically in the GE, which reads all more than 100.

Target prediction of DE miRNAs during early SE in longan.
To analyse the biological functions of miRNAs in early SE and NEC, the target genes of miRNAs were predicted. The results showed that the known miRNAs of the four stages (EC, ICpEC, GE, NEC) regulated 1690, 1446, 1566 and 1888 target genes. Novel miRNAs targeted 1250, 2438, 2482 and 3543 candidate genes, respectively. Many potential target genes weren not TFs, such as LRR receptor-like serine/threonine protein kinases, DNA-directed RNA polymerases I II and III subunit RPABC1 (POLR1C), DELLA protein, ATP-dependent DNA helicase HFM1/MER3, (cytosine 38-c5)-methyltransferase, MOB1, acylaminoacyl-peptidase (AAP) and L-ascorbic acid oxidase (Supplemental Data S12). The TF-encoding genes, including MYB, zinc-binding motif transcription factors and auxin response factor (ARF).

GO and KEGG functional analyses of DE miRNAs.
To further elucidate the potential pathways of the candidate target genes during early SE and NEC, GO and KEGG analyses were used to investigate the functions of putative target genes in early SE. GO enrichment analysis showed that the target genes of DE miRNAs were mainly enriched in three categories (Supplemental Data S14): biological process (BP), molecular function (MF) and cellular component (CC). In the four stages, the target genes were concentrated in 'cellular process' , The colors ranging from red to green indicate high to low correlation. (C) is the relative expressions of six miRNAs corresponding to the heat map and their potential target genes. U6 snRNA was used as a reference gene to normalize miRNA and DlUBQ was used as a reference gene to normalize mRNA expression data. " *" indicate significant difference at p-value < 0.05,"**" indicate significant difference at p-value < 0.01.

Scientific RepoRtS |
(2020) 10:4626 | https://doi.org/10.1038/s41598-020-60946-y www.nature.com/scientificreports www.nature.com/scientificreports/ 'metabolic process' , 'catalytic activity' , 'binding' , 'cell' , 'cell part' and 'single-organism process' . BP analysis showed that cytokinin-activated signalling pathway, lignin metabolic process, cell cycle process and DNA replication were significantly enriched in the comparison of NEC vs. EC (Supplemental Data S14). Lignin metabolism and DNA replication were significantly enriched in EC vs. ICpEC; however, lignin metabolism was only found in EC vs. GE. This suggested that the lignin metabolism pathway and cytokinin activated signalling pathway were particularly important in embryogenic induction (NEC vs. EC) of longan, and the lignin metabolism pathway plays a major role in early SE.
KEGG analysis was used to investigated target genes of the DE miRNAs in signal pathways; the threshold of FDR ≤ 0.05 was calculated using the corrected Q-value. Overall, 856, 743, 812 and 1121 target genes were annotated in EC, ICpEC, GE and NEC, respectively. The enrichment analysis showed that: in NEC vs. EC, 1129 target genes were enriched in 230 metabolic pathways. In addition, 597 target genes were enriched in 195 pathways in EC vs. ICpEC; 684 target genes were enriched in 208 pathways in EC vs. GE and 449 target genes were enriched in 170 pathways in ICpEC vs. GE.
The first 20 enrichment pathways of KEGG were summarised ( Table 2). Some metabolic pathways were only enriched in one comparison. For example, sulfur metabolism, glucuronide biosynthesis, glucoside biosynthesis-ganglion cell series, trialkane, piperidine and pyridine alkaloid synthesis were only enriched in NEC vs. EC, these metabolic pathways might be closely related to embryonic totipotency. The metabolic pathway of glyoxylate and dicarboxylate metabolism was only identified in EC vs. GE. Some metabolic pathways were only found in two comparisons: in NEC vs. EC and NEC vs. ICpEC brassinosteroid biosynthesis, plant hormone signal transduction and collecting duct acid were mainly founded These pathways might be representative in the differences of physiological, biochemical and morphological between non-embryogenic and embryogenic stages. Tyrosine metabolism, benzoate degradation, aminobenzoate degradation and ethylbenzene degradation might involve in the comparisons of EC vs. GE (Table 2), surprisingly, these two pathways in plants have been realy reported. However, plant pathogen interaction and pyrimidine metabolism appeared in six groups, suggesting that these two enriched pathways played important roles in early SE and NEC.
Lignin metabolism pathway during early SE in longan. GO enrichment analysis of the target genes showed that lignin metabolism (GO: 0009808) appeared in four stages (EC, ICpEC, GE, and NEC). Lignin metabolism and cytokinin metabolism were mainly identified in the comparison of NEC vs. EC. Moreover, dlo-miR397a targeted 12 DlLAC18-2; Dlo_000026.1, DlLAC11-1; Dlo_007670.1, DlLAC17-2; Dlo_023721.1, DlLAC17-7; and Dlo_007673.1, DlLAC17-3) to regulate lignin metabolism. Laccase (LAC) was not only involved in the metabolism of phenol-related substances but also participates in the lignin metabolism pathway in early SE. The expression of dlo-miR397a was up-regulated in early SE compared with NEC, indicating that the accumulation of dlo-miR397a might promote the development of embryogenesis and early SE in longan. It was inferred that the dynamic changes of lignin metabolism might affect the SE development of longan.

Sulfur metabolism pathway of embryogenesis in longan.
According to the analysis of enrichment factors in the KEGG metabolic pathway. Sulfur metabolism and glucosinyl metabolism were only identified in the comparison of NEC vs. EC, which were regulated by four miRNAs (dlo-miR2592ae, dlo-miR7539, dlo-miR7545, and dlo-miR952b). dlo-miR2592ae regulated two desulfoglucosinolate sulfotransferases a/b/c (dsGSs), while dlo-miR7539 regulated two sulfite oxidases. dlo-miR7545 targeted a thioglucose-thiotransferase a/b/c. dlo-miR952b targeted 3H-adenosine phosphate 5-phosphate sulfate synthase (PAPS). However, sulfur metabolism and glucosinyl metabolism were not to be activated in other stages, this pathway might play a leading role in NEC and EC (Supplemental Data S15, Fig. 4b).
Alternative splicing and ubiquitin-mediated proteolytic pathways in EC and ICpEC. According to the first 20 KEGG metabolic pathways in longan, the alternative splicing pathway only occurred in the comparison of EC vs. ICpEC. dlo-miR162a-5p regulated splicing factor 3B subunit 4, while dlo-miR2118e regulated two members of the ABC transporter G family. dlo-miR3512 regulated pre-mRNA-splicing helicase BRR2. dlo-miR5083 regulated U4/U6 small nuclear ribonucleoprotein PRP4. dlo-miR5540 regulated heat shock 70 kDa protein. dlo-miR7545 targeted splicing factor 3B subunit 3, as well as ATP-binding cassette, subfamily B (MDR/TAP), and PRP22. dlo-miR952b also targeted polyglutamine-binding protein 1 and THO complex subunit 2. Among others, dlo-miR5183, dlo-miR5083, dlo-miR7545 and dlo-miR952b were down-regulated 7-15 times, which suggested that these miRNAs might be the main participants in the development of ICpEC. These miRNAs targeted splicing dexd/h cassette ATP enzymes such as prp3, prp4, prp8, prp16 and prp22, which constituted the protein shear complex. Therefore, it was inferred that the development of ICpEC not only involved biological processes of cell division, proliferation and DNA replication, as well as alternative splicing events.
www.nature.com/scientificreports www.nature.com/scientificreports/ could regulate DlHD-zip8, three cleavage sites were located at the 6th CUG/GUC, the 10th site was UCC/GAA, and the 16th was GUA/AGG of the dlo-miR166a-3p. Moreover, the highest probability of cleavage sites occurred at the 10th site. A cleavage site might be located outside of the targeted complementary region, but the probability of this was small. dlo-miR397a targeted DlLAC7 with three cleavage sites, one of the most typical cleavage sites was located at the 10th site GUG/CAG. The other two cleavage sites were located outside of the complementary region. dlo-miR408-3p targeted DlLAC12 with two cleavage sites; the most typical cleavage site was the 10th site AAG/AGG. The other cleavage site was located outside of the complementary area. In conclusion, dlo-miR166a-3p, dlo-miR397a and dlo-miR408-3p tended to cleave their target genes at the 10th site, but they might also target sites outside of the target complementary region and be targeted by other miRNAs.
The three miRNAs (dlo-miR390a-5p, dlo-miR408-3p, and dlo-miR397a) up-regulated in the early SE, the target gene DlBR1 and DlSERK1 of dlo-miR390a-5p down-regulated in GE. DlSERK4 was down-regulated and targeted by dlo-miR408-3p in GE. DlLAC17-2 and DlLAC17-5 were down-regulated in GE and targeted by dlo-miR397a. The results showed that DlBR1, DlSERK1, DlSERK4, DlLAC17-2 and DlLAC17-5 were negatively regulated by dlo-miR390a-5p, dlo-miR408-3p and dlo-miR397a at ICpEC and GE. DlLAC12 was another target gene of dlo-miR408-3p, and down-regulated in the early SE. The expression levels of dlo-miR166a-3p and dlo-miR403-3p decreased slowly and then increased in the early SE, while the expression levels of their target genes, DlHD-zip 8 and Dlo-014588.1, first increased and then decreased. However, the expression level of dlo-miR171f decreased gradually in early SE of longan, and its target gene, DlDELLA, was up-regulated. Therefore, in longan early SE, most of the miRNA expression patterns were consistent with the sequencing results, and the negative regulatory relationships between target genes and miRNAs were verified, this further indicated that the RNA-seq results were credible (Fig. 7).

Many miRNAs are involved in early SE in longan.
A previous study demonstrated that miRNAs participate throughout the development of somatic embryogenesis in longan 11 , but did not includ a large-scale genomewide analysis of the expression levels of miRNAs in every stage of somatic embryogenesis in longan. In our study, more than 100 conserved miRNAs were measured in every stage. Among them, dlo-miR159a, dlo-miR398b, www.nature.com/scientificreports www.nature.com/scientificreports/ dlo-miR3932, dlo-miR319a-3p, dlo-miR408-3p and dlo-miR319 were expressed throughout the development of SE and NEC. Our findings generally support those reported by Lin 11 , it was beneficial for the development of embryogenic callus to down-regulate dlo-miR159a and dlo-miR319. Moreover, in our study, the accumulation of dlo-miR159 (dlo-miR159a and dlo-miR159b-3p) was beneficial for the development of GE. In Arabidopsis, miR159 was shown to regulate MYB33 and was induced by ABA during seed germination 41 , while the accumulation of ABA was increased from NEC to EC 27 . This is consistent with the participation of miR159 in ABA signal transduction and its promotion of embryonic induction, similar to the findings of our study. dlo-miR398b was found to be highly expressed in heart embryo, torpedo embryo and cotyledonary embryos during sequencing of sample mixtures from nine stages 11 . In our study, dlo-miR398b was higher expressed in NEC than in early SE, and particularly highly expressed in GE. Besides, we found that a high expression level of dlo-miR408-3p was beneficial to NEC morphogenesis, while it was also up-regulated from EC to GE (Fig. 4a). In maize somatic embryogenesis, miR408, and miR398, as well as their target genes (LAC2 and SOD9) exhibited substantial changes during the photoperiod exposure 42 , our study made up for previous research.
dlo-miR166a-3p, dlo-miR156a-5p, dlo-miR2118 and dlo-miR7696a-3p were stably highly expressed in early SE. dlo-miR166a-3p targeted DlHD-zip III (Fig. 7), which regulated GE morphogenesis; this further confirmed the findings of the research by Lin 11 . In Arabidopsis, miR166a might be induced by GA 3 and regulate the auxin response 43 . We found that, although miR2118 and miR482 might belong to the same family present in many plants 44,45 , dlo-miR2118 was highly expressed in early SE, while dlo-miR482 was only highly expressed in NEC (4281 reads). miR482 exhibited sequence diversity and has been widely found in dicotyledonous plants, its target gene, NBS-LRR, could have evolved over time to respond to biological stress 44 . The expression pattern of dlo-miR482 was similar to that previously reported in Citrus 46 . In mulberry phloem sap, it was shown that miR482 could participate in two-way transportation during long-distance conduction 47 . In particular, in our study, dlo-miR8019-3p, dlo-miR171c-5p, dlo-miR827, dlo-miR169q, and dlo-miR8153 were specifically expressed in GE, which contrasted with the findings in a study by Lin 11 . The miR169 family is a large and conserved miRNA family in many plant species. In Arabidopsis, miR169 targeted the NF-YA3 transcription factor and was shown to be highly expressed in late SE induction (10 days) 48 ; in addition, miR169 participats in nitrogen-starvation responses and stress 48,49 . dlo-miR169q was only highly expressed in GE of longan, which might be related to nitrogen or stress. In Lilium SE, miR169 was differentially expressed in NEC and EC1 17 . dlo-miR171c-5p was only expressed in GE, whereas dlo-miR171f was highly expressed in EC and ICpEC (Fig. 7), notablly, miR171c was largely accumulated in EC of Citrus 46 . miR827 could target NLA (Nitrogen Limitation and Adaptation) and PHT5 (Phosphate Transporter 5) 50 . Although it was conserved, dlo-miR827 targeted two glutathione S-transferases (Dlo_032407.1 and Dlo_032413.1) in longan. dlo-miR166e-3p, dlo-miR157d, dlo-miR156b, dlo-miR482, dlo-miR2118-5p, dlo-miR398b-3p and dlo-miR397-3p were not expressed in early SE, but only in NEC; inhibition of these miRNAs might promote the initiation of EC. dlo-miR397a, dlo-miR408-3p and dlo-miR166a-3p might participate in the development of early SE via the lignin metabolism pathway. miR397, miR408 and miR166 are conserved miRNAs in plants, which have been reported to participate in lignin and xylem synthesis [51][52][53] . Laccase and HD-Zip have also been widely studied [54][55][56] . However, there has been limited research regarding the associations between miRNAs and lignin regulation in plant embryo development, because target genes vary among species; moreover, miRNA biological functions are diverse and exhibit functional redundancy.
Lignin is one of the main components in the plant secondary cell wall, and is related to plant growth and differentiation 57 . In rice, OsmiR397 was strongly specifically expressed in undifferentiated embryogenic calli 58 , down-regulated laccase to vital to maintaining embryonic cells in a thin-wall and meristematic state. In Citrus, miR397 was highly expressed in globular embryos 10 , consistent with our results. In maize SE, the expression of miR408 and miR397 occurred in response to different concentrations of hormone in embryogenic tissues 42 . Our study suggested that dlo-miR408-3p might target DlLAC12 and DlSERK4, but this hypothesis is not supported by the results of degradome sequencing 11 . miR166/165 is an ancient and conserved miRNA family in plants. In Arabidopsis thaliana, miR166/165 can maintain the differentiation into stem apical meristem by influencing with AGO10, such that it cannot integrate into the AGO1 complex; this inhibits the expression of HD-Zip III and affects xylem formation and differentiation 59 . Our study found that the expression of dlo-miR166a-3p was significantly up-regulated in early SE, and most highly expressed in GE. Its target gene was DlHD-Zip 8 (ATHB8), it was significantly more highly expressed in GE than that in NEC and EC. Zhiqian Li investigated the HD-zip family from the whole-genome analysis of ovules and somatic embryos in grape, and found that HD-zip genes family might regulate seed abortion 60 . However, ATHB8 of the HD-Zip family was induced by auxin in Arabidopsis thaliana, and it was strongly expressed in vascular tissue when embryos developed to the heart embryonic stage 43 . In contrast, the accumulation of DlHD-Zip 8 (ATHB8) occurred earlier in the early SE of longan than in Arabidopsis thaliana SE, which suggested that more complete vascular tissue might form in the longan GE. In Arabidopsis root, miR166 can cleave the mRNA of HD-Zip III to varying degrees, by moving from cortex to middle column in different gradient concentrations, it was shown to regulate the formation of epidermal xylem and primary xylem 61 . Similarly, the interaction of dlo-miR166 and HD-Zip III might involve in the regulation of xylem in longan early SE. In recent years, it has been shown that HD-Zip III might be induced by BR and be involved in the division and differentiation of procambium cells to produce xylem 62 . In this experiment, the auxin signalling pathway and brassinolide pathway were significantly enriched in the comparison of NEC vs. EC; dlo-miR166a-3p was not expressed in NEC, suggesting that it might play a major role in totipotency. The dlo-miR166a-3p might be www.nature.com/scientificreports www.nature.com/scientificreports/ induced by auxin and brassinolide. Hence, the changes in xylem cells and lignin metabolism might be regulated by dlo-miR166a-3p, dlo-miR408-3p, and dlo-miR397a, which might be affected by auxin and BR during the development of longan SE (Fig. 4a).

The miRNAs targeted RNA polymerase II might play an important role in early SE of longan.
RNA polymerase II not only involved in transcribing mRNA, but also played an important role in the process of miRNA biosynthesis; specifically, it transcribed miRNA genes into primary miRNAs (pri-miRNAs) 63 . Intriguingly, AAP was annotated more frequently in NEC than in early SE, it was targeted by dlo-miR8011a-5p and dlo-miR5764a, and it encoded proline oligopeptidase (POP) in plants. It was found that flavonoids and lignans could inhibit the activity of POP in Rhodiola tangut 64 . POP is the key enzyme in the synthesis of amanitatoxin, an α-ointment peptide. At low concentration, amanitatoxin could inhibit the activity of RNA polymerase II, there by hindering protein synthesis 65 . α-Ointment peptide was reported to inhibit the hypocotyl growth of mung bean seeds 66 . However, the annotated frequency of POLR1C and MOB1 were only lowest abundance in NEC, highest abundance in EC and lower in ICpEC and GE. Both POLR1C and MOB1 were regulated by the same miRNA (dlo-miR2118a-3p, dlo-miR9484), which was contrary to the abundance of AAP in the early SE. MOB1 is a cell-ploidy maintenance protein and plays an important role in cell mitosis 67 , involved in the initiation of cell division, the coordination of cell polarity and the cell cycle during mitosis 68 . In Arabidopsis thaliana, down-regulation of the mob1 gene may cause cell cycle disorder and affect sporophyte and gametophyte development 69 , and may be involved in the regulation of growth and development via auxin signal transduction 70 . It may also involved in Hippo metabolic pathways, regulating cell division, withdrawal and polarity establishment to control tissue growth and programmed cell death, and determine the size of organs 71 . The most important of these processes must be regulated by binding Pol II. Therefore, we speculated that POP activity was higher in NEC, which might inhibit the activities of POLR1C, and hindered the synthesis of some proteins in NEC. This further supported the findings of Chen Chunling's research, in which protein staining was very weak and protein metabolism was very low at NEC 27 . At the same time, POLR1C might also regulate MOB1 to participate in cell mitosis controlling the size of organs [72][73][74] . Therefore, AAP (POP) might influence the biosynthesis of miRNA and cell mitosis by affecting the activity of Pol II and MOB1 during the development of early SE in longan (Fig. 4a).

Data availability
All the data generated or analyzed during this study are included in this published article and its Supplementary Information Files.