Introduction

Eucommia ulmoides is a tree widely cultivated in the temperate zone. It produces Eucommia rubber (Eu-rubber), a trans-polyisoprene (trans-1,4-polyisoprene, TPI) that is a special natural material. Its specific properties, including high rigidity, a low coefficient of thermal expansion/contraction, exceptional insulation, and resistance to acids and alkalis, could be exploited in raw materials for pharmaceuticals and industrial instruments1,2,3,4. However, the relatively low rubber content of E. ulmoides organs greatly increases production costs. Previous studies reported that Eu-rubber accumulation is related to organ development5. Hence, the systematic identification of genes regulating organ development in E. ulmoides might help to elucidate the molecular mechanisms underlying Eu-rubber accumulation. A concrete step in this direction was the genome sequencing of E. ulmoides, which provides a comprehensive overview of various gene families.

To regulate organ development, plants have acquired complex mechanisms over their long evolution. Mitogen-activated protein kinase (MAPK) cascades are universal signal transduction modules in eukaryotes that play crucial roles in plant development6. MAPK cascades consist of a core module of three kinases, namely MAPK, MAPK kinase (MAPKK), and MAPKK kinase (MAPKKK), which connect upstream sensors/receptors to downstream targets7. These three protein kinases act consecutively in a linear cascade: MAPKKKs are activated by interlinking MAPKKK kinases, by receptor phosphorylation, or by physical interaction; the MAPKKKs then activate downstream MAPKKs by phosphorylating the serine/threonine residues in the conserved S/TXXXXXS/T motif; and the MAPKKs in turn activate MAPKs by phosphorylating the threonine and tyrosine residues in the conserved TEY or TDY motif8. The activated MAPKs phosphorylate various signaling components, transcription factors, or enzymes that modulate downstream gene expression, thereby amplifying the signal9,10.

Plant MAPK cascade genes were first reported in Arabidopsis thaliana 6. Based on phylogenetic analyses, MAPKs and MAPKKs were divided into four groups (A–D)6, whereas MAPKKKs were classified into three subfamilies, namely MEKK, RAF, and ZIK, based on differences in their conserved domains or signature motifs11. Previous studies have reported that MAPK cascade genes play various roles in plant innate immunity12, defense against biotic13 and abiotic stresses14,15,16,17, stress and hormone responses18,19, organ and tissue development20,21, cell division22, differentiation23, death24, and mRNA regulation25,26.

The genome sequencing of various plant species has allowed the identification of MAPK cascades: 20 MAPKs, 10 MAPKKs, and 80 MAPKKKs were reported in A. thaliana 6,8; 16 MAPKs, eight MAPKKs, and 75 MAPKKKs in rice27,28; 38 MAPKs, 11 MAPKKs, and 150 MAPKKKs in soybean29; 16 MAPKs, five MAPKKs, and 89 MAPKKKs in tomato30; 10 MAPKs, five MAPKKs, and 32 MAPKKKs in mulberry31; 14 MAPKs, six MAPKKs, and 59 MAPKKKs in cucumber32; 16 MAPKs, 12 MAPKKs, and 73 MAPKKKs in Brachypodium distachyon 33; and 25 MAPKs, 10 MAPKKs, and 77 MAPKKKs in banana34,35. However, little information on MAPK cascades has been reported in E. ulmoides.

In this study, we identified 13 MAPKs, five MAPKKs, and 57 MAPKKKs in E. ulmoides and named them based on their homology with A. thaliana MAPK cascade proteins. All protein sequences were used to construct phylogenetic trees and to study evolutionary relationships in dicots. The predicted conserved domains, motifs, and gene structures were subsequently analyzed. The transcript profiles of all predicted EuMAPK cascade genes in various organs at different developmental stages were analyzed, and several genes with distinctive expression patterns were screened and validated by qRT-PCR. Overall, our study provides a solid foundation for further studies on the precise roles of MAPK cascades in organ development and signaling pathways in E. ulmoides.

Results and Discussion

Identification of MAPK, MAPKK, and MAPKKK families in E. ulmoides

The availability of the E. ulmoides genome sequence allowed the genome-wide identification and analysis of the MAPK, MAPKK, and MAPKKK families. A BLASTP search was performed against the E. ulmoides protein database using A. thaliana MAPK cascade protein sequences as queries. After the conserved domains of all candidate sequences were screened and validated with the Batch Web CD-Search Tool, we identified 13 EuMAPKs, five EuMAPKKs, and 57 EuMAPKKKs (Supplementary Files S1, S2, and S3). The predicted MAPKs, MAPKKs, and MAPKKKs in E. ulmoides were named based on their homology with the corresponding MAPK, MAPKK, and MAPKKK proteins from A. thaliana 6,8, as was done in soybean29, cucumber32, and Brachypodium distachyon 33. When two or more E. ulmoides genes shared the same A. thaliana homolog, they were distinguished by a numerical suffix (-1, -2, -3, and so on). Furthermore, a BLASTN search showed that all predicted EuMAPKs (Table 1), EuMAPKKs (Table 2), and EuMAPKKKs (Table 3) were supported by ESTs or unigenes.
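The naming rule above can be illustrated with a short Python sketch. It assumes each E. ulmoides gene has already been assigned a single best A. thaliana homolog (for example, from the BLASTP results). The gene IDs, the assign_names function, and the numbering of duplicates in input order are hypothetical, because the order in which suffixes were assigned is not stated.

```python
from collections import Counter, defaultdict
from typing import Dict


def assign_names(best_hits: Dict[str, str], prefix: str = "Eu") -> Dict[str, str]:
    """Name each gene after its best A. thaliana homolog (hypothetical helper).

    best_hits maps a gene ID to its homolog name, e.g. "MPK3".
    Genes that share a homolog receive suffixes -1, -2, ... in input order
    (an assumption; the original ordering criterion is not specified).
    """
    counts = Counter(best_hits.values())   # how many genes map to each homolog
    seen = defaultdict(int)                # running suffix per shared homolog
    names = {}
    for gene_id, homolog in best_hits.items():
        if counts[homolog] == 1:
            names[gene_id] = f"{prefix}{homolog}"
        else:
            seen[homolog] += 1
            names[gene_id] = f"{prefix}{homolog}-{seen[homolog]}"
    return names


if __name__ == "__main__":
    # Hypothetical gene IDs and homolog assignments.
    hits = {"g001": "MPK11", "g002": "RAF22", "g003": "RAF22", "g004": "MKK2"}
    print(assign_names(hits))
```

Genes with a unique homolog keep the plain name, so suffixes appear only where they are needed, matching names such as EuMPK11 and EuRAF22-2 used in the text.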

Table 1 Characteristics of the MAPKs in E. ulmoides.
Table 2 Characteristics of the MAPKKs in E. ulmoides.
Table 3 Characteristics of the MAPKKKs in E. ulmoides.

The 13 predicted EuMAPK proteins contained 343 (EuMPK11) to 599 (EuMPK15) amino acid residues, with putative pI values ranging from 5.20 (EuMPK4-3) to 9.38 (EuMPK15) and putative Mw values ranging from 39.4 kDa (EuMPK11) to 67.7 kDa (EuMPK15). EuMAPKs were predicted to localize to the nucleus, cytoplasm, mitochondria, or plasma membrane (Table 1). The five predicted EuMAPKK proteins contained 352 (EuMKK2) to 488 (EuMKK3) amino acid residues, with putative pI values ranging from 5.67 (EuMKK3) to 9.22 (EuMKK5) and putative Mw values ranging from 39.0 kDa (EuMKK5) to 54.4 kDa (EuMKK3). EuMAPKKs were predicted to localize to the nucleus or cytoplasm (Table 2). The 57 predicted EuMAPKKK proteins contained 125 (EuRAF22-2) to 1,290 (EuRAF20-3) amino acid residues, with putative pI values ranging from 4.69 (EuMEKK16) to 9.92 (EuMEKK3-3) and putative Mw values ranging from 14.29 kDa (EuRAF22-2) to 140.71 kDa (EuRAF16-1). EuMAPKKKs were predicted to localize to the nucleus, mitochondria, cytoplasm, or chloroplasts (Table 3).
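The pI and Mw values above come from ExPASy Compute pI/Mw (see Methods). As a rough illustration of how such values are derived, the Python sketch below computes Mw as the sum of average residue masses plus one water molecule, and estimates pI by bisection for the pH at which the net charge is zero. The pKa set is an illustrative assumption and differs from the Bjellqvist values used by Compute pI/Mw, so its pI estimates will not reproduce Tables 1 to 3 exactly; non-standard residue codes (e.g., X) are not handled.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water molecule added for the free termini

# Illustrative pKa values (an assumption, not the Compute pI/Mw set).
POS_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEG_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight_kda(seq):
    """Average molecular weight in kDa (standard residues only)."""
    seq = seq.upper()
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of the peptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POS_PKA.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEG_PKA.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-3):
    """Bisection on pH 0-14: net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is used because the net charge falls steadily as pH rises, so the single zero crossing can be bracketed without derivatives.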

Phylogenetic relationship and evolution pattern analysis

Unrooted phylogenetic trees constructed from the aligned protein sequences of all 13 EuMAPKs, five EuMAPKKs, and 57 EuMAPKKKs showed similar topologies, with only minor differences at deep nodes. Based on the phylogenetic trees and homology with A. thaliana, the 13 EuMAPKs were classified into four groups (A–D; Fig. 1a), as were the five EuMAPKKs (A–D; Fig. 2a), whereas the 57 EuMAPKKKs were classified into three subfamilies (12 MEKKs, 34 RAFs, and 11 ZIKs; Fig. 3a). These results are consistent with those of previous studies on rice28, tomato30, and cucumber32.

Figure 1

Phylogenetic relationships, conserved domains, and motifs of MAPKs in E. ulmoides. (a) Unrooted phylogenetic tree constructed from amino acid sequences by the NJ method in MEGA 7.0. Bootstrap support values from 1,000 replicates are shown at each branch. Members of the same group or subfamily are shown in the same color. (b) Conserved domains identified by searching known domains with PlantsP. (c) Motifs identified with the online MEME program. Boxes of different colors represent different motifs at the corresponding positions.

Figure 2

Phylogenetic relationships (a), conserved domains (b), and motifs (c) of MAPKKs in E. ulmoides. Other details are as described for Fig. 1.

Figure 3

Phylogenetic relationships (a), conserved domains (b), and motifs (c) of MAPKKKs in E. ulmoides. Other details are as described for Fig. 1.

To study the evolutionary relationships of MAPKs, MAPKKs, and MAPKKKs in dicots, we compared the number of members of each family in E. ulmoides with those in other dicotyledons. According to the Angiosperm Phylogeny Group (APG IV) classification36, tomato and E. ulmoides both belong to the asterids; A. thaliana and Populus tremula were also included as a model plant and a model forest tree, respectively. The MAPK cascades of all these species were re-identified using the most recent genome versions and the same screening criteria. The numbers of MAPKs, MAPKKs, and MAPKKKs in each species are listed in Table 4. Unrooted phylogenetic trees were constructed from 71 MAPK, 31 MAPKK, and 339 MAPKKK sequences (Supplementary Table S1). MAPKs and MAPKKs were clearly divided into four distinct groups (Supplementary Figs S1 and S2), and MAPKKKs into three subfamilies, namely MEKK, RAF, and ZIK (Supplementary Fig. S3). Meanwhile, all groups and subfamilies included members from most of the four species, indicating that MAPK cascade genes might derive from a common ancestor. The MAPK cascade genes of E. ulmoides were more closely related to those of tomato than the A. thaliana genes were to those of P. tremula, consistent with the APG taxonomic system.

Table 4 The number of MAPK cascades in E. ulmoides, S. lycopersicum, A. thaliana, and P. tremula.

Analysis of conserved domains/motifs and gene structure

All members of the three MAPK cascade families harbored a protein kinase domain (Figs 1b, 2b, and 3b), supporting the reliability of the predicted EuMAPK cascades. In the EuMAPK family, group D members had an extended C-terminal region but lacked a serine/threonine protein kinase active-site signature (Fig. 1b), as in A. thaliana 6 and cucumber33. EuMPK11 was predicted to harbor a transmembrane region (Fig. 1b), consistent with its predicted localization to the plasma membrane. All EuMAPKKs harbored a protein kinase domain, a tyrosine kinase, an ATP-binding region, and a serine/threonine protein kinase active site, and EuMKK3 was predicted to have a long C-terminal region (Fig. 2b), similar to MAPKKs in cucumber33. All EuMAPKKKs contained a protein tyrosine kinase. The kinase domain was located at the C-terminus in most ZIK subfamily proteins but at the N-terminus in most RAF subfamily proteins. A protein kinase ATP-binding region signature was found only in the MEKK subfamily. All these results are consistent with those previously reported in A. thaliana 8, rice28, and tomato30.

Motifs were identified using MEME. In the EuMAPK family, almost all members of the same group had a similar number of motifs (Fig. 1c). For instance, all group D members had ten motifs, whereas all members of groups A, B, and C except EuMPK3 had nine motifs. Moreover, all group D members had motif 9 in the N-terminal region and motif 10 in the C-terminal region, whereas the opposite arrangement was observed in groups A, B, and C. Similar results were obtained for the EuMAPKK and EuMAPKKK families (Figs 2c and 3c), indicating that the classification was supported by the motif analysis.

To evaluate the phylogenetic relationships based on gene structure, we analyzed the exon-intron organization of all EuMAPK cascade genes. The EuMAPKs contained 1–12 introns (Fig. 4) and the EuMAPKKs 0–8 introns (Fig. 5). The intron phases and exon/intron organization of EuMAPKs and EuMAPKKs were relatively conserved within each group, indicating that the classification of EuMAPKs and EuMAPKKs was supported by the gene structure analysis. However, the number of introns was more variable in the EuMAPKKKs, ranging from 0 to 17 (Fig. 6). In the MEKK subfamily, the number of introns ranged from 0 to 17: EuMEKK21 had no introns, EuMEKK16 and EuMEKK13 had only one intron each, and the remaining members had 7–17 introns, consistent with results reported in cucumber32. The RAF subfamily members had 1–16 introns and the ZIK subfamily members 0–9 introns, consistent with results reported in B. distachyon 33. Collectively, the classification of the EuMAPKKKs was supported by comparison with orthologous families. Intron size in the three EuMAPK cascade families was positively correlated with genome size across E. ulmoides, A. thaliana 6, B. distachyon 8, cucumber32, and banana35, whereas intron number was relatively conserved among these species.

Figure 4

Phylogenetic relationships and gene structures of MAPKs in E. ulmoides. The right panel shows the intron/exon organization of each EuMAPK. Yellow boxes denote exons, and lines denote introns.

Figure 5

Phylogenetic relationships and gene structures of MAPKKs in E. ulmoides. Other details are as described for Fig. 4.

Figure 6

Phylogenetic relationships and gene structures of MAPKKKs in E. ulmoides. Other details are as described for Fig. 4.

Expression analysis of EuMAPK, EuMAPKK, and EuMAPKKK genes in various organs at different developmental stages

To reveal the temporal and spatial expression patterns of EuMAPK cascade genes, we compared their transcript levels in various organs (fruits, leaves, bark, male flowers, female flowers, and seeds) at different developmental stages. Expression levels were clustered and presented as heatmaps (Figs 7, 8, and 9). The results revealed that all MAPK cascade members were expressed in almost all tested organs.

Figure 7

Expression profiles of EuMAPKs in various organs at different developmental stages based on RNA-seq data. Expression levels are shown in the heatmap as log2-transformed fold-change values generated with HemI 1.0. The color scale and log2 values are shown at the top of the heatmap. Genes were clustered according to their expression profiles.

Figure 8

Expression profiles of EuMAPKKs in various organs at different developmental stages based on RNA-seq data. Other details are as described for Fig. 7.

Figure 9

Expression profiles of EuMAPKKKs in various organs at different developmental stages based on RNA-seq data. Other details are as described for Fig. 7.

To identify key members of the EuMAPK cascades during E. ulmoides organ development, we calculated the coefficient of variation (CV) of expression levels across all tested organs at all developmental stages (CVall), as well as across all developmental stages of fruits and leaves (CVF and CVL, respectively) (Supplementary Tables S2, S3, and S4). No gene had a CVall below 10%, and only one gene had a CVall above 200% (EuRAF2-3; 262.63%). EuRAF3-1 had the lowest CVall (23.1%) and EuRAF22-2 the lowest CVF (9.58%), and EuRAF34-1 and EuRAF33-2 had the two lowest CVL values (1.64% and 8.79%, respectively), indicating that these genes were stably expressed and might play important roles in the corresponding organs throughout development.
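The CV is the standard deviation divided by the mean, expressed as a percentage. A minimal Python sketch follows; it assumes the sample (n - 1) standard deviation, which the text does not specify, and the gene names and FPKM values in the example are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean expression is zero")
    # statistics.stdev uses the n - 1 denominator (an assumption here).
    return statistics.stdev(values) / mean * 100.0


def stable_genes(fpkm_by_gene, max_cv):
    """Genes whose CV is below max_cv, sorted from most to least stable."""
    cvs = {gene: cv_percent(vals) for gene, vals in fpkm_by_gene.items()}
    return sorted((g for g, cv in cvs.items() if cv < max_cv), key=cvs.get)


if __name__ == "__main__":
    # Hypothetical FPKM values for three genes across four samples.
    data = {"geneA": [10, 10, 10, 10], "geneB": [1, 2, 3, 6], "geneC": [5, 6, 5, 6]}
    print({g: round(cv_percent(v), 2) for g, v in data.items()})
    print(stable_genes(data, max_cv=20))
```

Because the CV is scale-free, it lets stably expressed genes be compared regardless of their absolute FPKM level.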

Relative expression level is an important indicator of gene function. Based on fragments per kilobase of transcript per million mapped fragments (FPKM) values, the relative expression of EuZIK1 and EuMKK2 was significantly (p < 0.01) higher than that of the other 73 EuMAPK cascade genes, suggesting that these two genes might play important roles in the EuMAPK cascade. In addition, some genes were expressed at significantly higher levels in fruits and seeds at late developmental stages than in other organs; we therefore calculated log2 expression ratios between different organs and between different stages of the same organ. At late developmental stages, EuRAF2-3 expression increased, with log2 ratios greater than 5.5 in fruits and greater than 7.5 in seeds, suggesting that this gene might participate in fruit and seed ripening. At late developmental stages, EuMPK11 expression increased with a log2 ratio greater than 2.5 in fruits and leaves, and EuMEKK21 expression with a log2 ratio greater than 4.5 in fruits, suggesting that both genes might participate in fruit ripening and that EuMPK11 might also participate in leaf development.
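Because these changes are reported as log2 ratios, a value of 5.5 corresponds to roughly a 45-fold linear increase (2^5.5 ≈ 45.3). The sketch below computes such ratios from FPKM values. The pseudocount, which avoids division by zero for unexpressed genes, is an assumption, as the text does not state how zero values were handled.

```python
import math


def log2_ratio(fpkm_ref, fpkm_test, pseudocount=1.0):
    """log2((test + c) / (ref + c)); the pseudocount c is an assumed safeguard."""
    return math.log2((fpkm_test + pseudocount) / (fpkm_ref + pseudocount))


def linear_fold_change(log2_value):
    """Convert a log2 ratio back to a linear fold change."""
    return 2.0 ** log2_value


if __name__ == "__main__":
    # Hypothetical FPKM values for an early and a late developmental stage.
    early, late = 1.0, 63.0
    ratio = log2_ratio(early, late)
    print(ratio, linear_fold_change(ratio))
```

With pseudocount 1, an early-stage FPKM of 1 and a late-stage FPKM of 63 give a log2 ratio of exactly 5, i.e. a 32-fold increase.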

Validation of key MAPK cascades by qRT-PCR

Three genes with stable expression across all stages of fruit or leaf development (EuRAF22-2, EuRAF34-1, and EuRAF33-2), three genes with differential expression patterns (EuRAF2-3, EuMPK11, and EuMEKK21), and two highly expressed genes (EuZIK1 and EuMKK2) were selected for qRT-PCR analysis to validate the RNA-seq data. The overall expression trends of all selected genes were consistent with those obtained from the RNA-seq data, confirming the reliability of the data (Fig. 10).

Figure 10

qRT-PCR analysis of the relative expression of eight selected genes during E. ulmoides fruit and leaf development.

Methods

Search for MAPK cascades and sequence analysis

The predicted E. ulmoides peptide sequences were obtained from the E. ulmoides genome database to construct a local protein database. A BLASTP search was performed against this database with an e-value cutoff of 1e-10 and a minimum amino acid identity of 50%, using as queries 20 MAPK, 10 MAPKK, and 80 MAPKKK protein sequences from A. thaliana (Supplementary Table S5) retrieved from The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org/), the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/), and the Universal Protein Resource (UniProt; http://www.uniprot.org/) databases. A self-BLAST of all hits was then carried out to remove redundant sequences. All candidate sequences were scanned with the NCBI Batch Web CD-Search Tool (http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) to confirm the presence of the kinase domain. MAPKs should contain the T(E/D)YVxTRWYRAPE(L/V) signature motif and MAPKKs the VGTxxYM(S/A)PER motif, whereas MAPKKKs should contain one of three signature motifs: G(T/S)(P/A)x(W/F/Y)MAPE (MEKK-like), GTxx(W/Y)MAPE (Raf-like), or GTPE(Y/F)MAPExY (ZIK-like)8. A local BLASTN search against E. ulmoides expressed sequence tags (ESTs) and unigenes was performed to verify the existence of the predicted genes. The putative isoelectric point (pI) and molecular weight (Mw) of the protein sequences were predicted using Compute pI/Mw (http://web.expasy.org/compute_pi/), and the subcellular localization of each protein was predicted using CELLO 2.5 (http://cello.life.nctu.edu.tw/).
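The signature motifs above use a compact notation in which x stands for any residue and (A/B) for alternative residues. A minimal Python sketch of screening candidate sequences against them follows; the translation to regular expressions and the function names are illustrative assumptions, not the pipeline used in this study. The example also shows that the three MAPKKK motifs overlap (a ZIK-like sequence also matches the MEKK-like and Raf-like patterns), so motif matching alone cannot assign a MAPKKK to a subfamily; the classification here also relied on phylogeny and homology with A. thaliana.

```python
import re

# Signature motifs as written in the text.
SIGNATURES = {
    "MAPK": "T(E/D)YVxTRWYRAPE(L/V)",
    "MAPKK": "VGTxxYM(S/A)PER",
    "MEKK": "G(T/S)(P/A)x(W/F/Y)MAPE",
    "RAF": "GTxx(W/Y)MAPE",
    "ZIK": "GTPE(Y/F)MAPExY",
}


def motif_to_regex(motif):
    """Translate the motif notation into a regex.

    "(A/B)" becomes the character class "[AB]"; lowercase "x" (any residue)
    becomes ".".
    """
    pattern = re.sub(r"\(([A-Z/]+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     motif)
    return pattern.replace("x", ".")


COMPILED = {family: re.compile(motif_to_regex(m)) for family, m in SIGNATURES.items()}


def matching_families(protein_seq):
    """Return the set of families whose signature motif occurs in the sequence."""
    seq = protein_seq.upper()
    return {family for family, rx in COMPILED.items() if rx.search(seq)}


if __name__ == "__main__":
    # Hypothetical sequence fragments.
    print(matching_families("AAATEYVVTRWYRAPELAAA"))
    print(matching_families("GTPEYMAPEVY"))
```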

Multiple sequence alignment and phylogenetic tree construction

The predicted full-length EuMAPK cascade protein sequences were aligned using ClustalW. Phylogenetic trees were constructed in MEGA 7.0 37 using the neighbor-joining (NJ) method with 1,000 bootstrap replicates.
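MEGA 7.0 estimates pairwise distances from the alignment, builds the NJ tree, and assesses branch support by bootstrap resampling. To show only the core neighbor-joining step (the Saitou-Nei algorithm), the Python sketch below starts from a precomputed distance matrix and omits distance estimation and bootstrapping. The five-taxon matrix is a small textbook example, not E. ulmoides data.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: taxon labels; dist: symmetric distance matrix (list of lists).
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    labels = {i: name for i, name in enumerate(names)}  # active node -> Newick subtree
    d = {i: {j: float(dist[i][j]) for j in range(len(names))} for i in range(len(names))}
    next_id = len(names)
    while len(labels) > 3:
        active = list(labels)
        n = len(active)
        # Net divergence r_i: sum of distances from i to all other active nodes.
        r = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # Choose the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for a in range(n):
            for b in range(a + 1, n):
                i, j = active[a], active[b]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                # Distance from the new node to every remaining node.
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        labels[u] = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        del labels[i], labels[j]
    # Join the last three nodes at a single central node (unrooted tree).
    i, j, k = labels
    li = 0.5 * (d[i][j] + d[i][k] - d[j][k])
    lj = 0.5 * (d[i][j] + d[j][k] - d[i][k])
    lk = 0.5 * (d[i][k] + d[j][k] - d[i][j])
    return f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f},{labels[k]}:{lk:.4f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, matrix))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

The Q criterion corrects raw distances for each taxon's overall divergence, which is why NJ does not assume a constant evolutionary rate and returns an unrooted tree, as in Figs 1a, 2a, and 3a.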

Conserved motif/domain and gene structure analysis

Domains and motifs were identified using PlantsP (http://plantsp.genomics.purdue.edu/cgi-bin/fscan/feature_scan_rest.cgi?db=PlantsP) and MEME (http://meme-suite.org/tools/meme), respectively. The exon-intron organization and intron phases were analyzed with the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn/).
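Intron phase describes where an intron interrupts the reading frame: phase 0 introns lie between codons, phase 1 introns after the first nucleotide of a codon, and phase 2 introns after the second. Assuming the coding lengths of the exons are known in transcript order (UTRs excluded, counting from the start codon), the phase of each intron is the cumulative coding length upstream of it modulo 3. The Python sketch below is illustrative only and does not reflect how GSDS is implemented.

```python
def intron_phases(cds_exon_lengths):
    """Phase (0, 1, or 2) of each intron from coding exon lengths in transcript order.

    Assumes the lengths cover only coding sequence, starting at the start codon.
    """
    phases = []
    coding_so_far = 0
    for length in cds_exon_lengths[:-1]:  # no intron after the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


if __name__ == "__main__":
    # Hypothetical gene with four coding exons, hence three introns.
    print(intron_phases([120, 55, 98, 40]))
```

For this hypothetical gene the three introns have phases 0, 1, and 0; an intronless gene such as EuMEKK21 would return an empty list.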

Gene expression analysis and qRT-PCR

To study the transcriptional characteristics of each predicted EuMAPK cascade member, raw reads were downloaded from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) under the following accession numbers: female/male flower buds (SRR2170964, SRR2170970) and seeds (SRR3203241); data for fruit, leaf, and bark at different developmental stages are unpublished. First, raw reads were pre-processed to remove low-quality regions and adapter sequences. The reference genome index was built using Bowtie v2.2.3, and paired-end clean reads were aligned to the E. ulmoides genome (unpublished) using TopHat v2.0.12 38. HTSeq v0.6.1 was then used to count the number of reads mapped to each gene39. Finally, the FPKM of each gene was calculated based on the gene length and the number of reads mapped to the gene40.
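FPKM normalizes the fragment count C of a gene by its length L (in bp) and by the total number of mapped fragments N: FPKM = C × 10^9 / (L × N). The Python sketch below applies this formula to HTSeq-style per-gene counts. Using the sum of counted fragments as N by default, and the gene lengths supplied, are assumptions, since the text does not state exactly which totals and lengths were used; the gene IDs are hypothetical.

```python
def fpkm(fragment_counts, gene_lengths_bp, total_mapped=None):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments).

    If total_mapped is None, the sum of the per-gene counts is used as N
    (an assumption; a pipeline may instead use all mapped fragments).
    """
    if total_mapped is None:
        total_mapped = sum(fragment_counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_mapped)
        for gene, count in fragment_counts.items()
    }


if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 1500}      # hypothetical HTSeq counts
    lengths = {"geneA": 1000, "geneB": 3000}    # hypothetical gene lengths (bp)
    print(fpkm(counts, lengths))
```

In this example geneB has three times as many fragments as geneA but is also three times longer, so both genes receive the same FPKM.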

Based on FPKM values, heatmaps and hierarchical clusters were generated with HemI 1.0 (http://hemi.biocuckoo.org/down.php). Coefficients of variation (CV) and p values were calculated with Minitab 16 (http://www.minitab.com/zh-cn/). To obtain candidate genes potentially controlling E. ulmoides organ development, genes with distinctive CV and p values were selected for qRT-PCR. Total RNA was extracted and reverse-transcribed into cDNA using the AMV First Strand cDNA Synthesis Kit (Sangon, Shanghai, China). Primers were designed with Primer 5.0 (Supplementary Table S6), and 18S was used as the internal reference gene. qPCR was performed on an ABI StepOnePlus system (Applied Biosystems, Foster City, CA, USA). Relative expression levels were calculated by the 2−ΔΔCt method41. Each sample was analyzed in triplicate.
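In the 2−ΔΔCt method, the Ct of the target gene is first normalized to the reference gene (here 18S) to give ΔCt; the ΔCt of each sample is then compared with that of a calibrator sample to give ΔΔCt; and the relative expression is 2^(−ΔΔCt). The method assumes near-100% and similar amplification efficiencies for the target and reference genes. The Python sketch below averages replicate Ct values before applying the formula; the choice of calibrator (e.g., the earliest developmental stage) and the Ct values in the example are assumptions for illustration.

```python
from statistics import mean


def relative_expression(target_ct, ref_ct, target_ct_cal, ref_ct_cal):
    """2^-ΔΔCt relative expression of a sample versus a calibrator sample.

    Each argument is a list of replicate Ct values; replicates are averaged.
    The calibrator sample is chosen by the user (an assumption here).
    """
    delta_ct_sample = mean(target_ct) - mean(ref_ct)               # ΔCt of the sample
    delta_ct_calibrator = mean(target_ct_cal) - mean(ref_ct_cal)   # ΔCt of the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values: target gene and 18S in a late-stage
    # sample versus an early-stage calibrator.
    late = relative_expression([22.0, 22.1, 21.9], [12.0, 12.0, 12.0],
                               [25.0, 25.1, 24.9], [12.0, 12.0, 12.0])
    print(late)
```

A ΔΔCt of -3, as in this example, means the target is about eight times more abundant in the sample than in the calibrator.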