Introduction

Transcription factors (TFs) are proteins that carry at least one domain recognizing a specific DNA-binding site and that control the transcriptional regulatory schemes in plant cells. TFs regulate the spatio-temporal expression of target genes involved in plant growth and development, and in response systems to the terrestrial environment. TF-mediated responses are established upon intrinsic and external signals to control and coordinate the activation or repression of functional gene expression1,2,3,4. TFs bind to a specific DNA site, known as a cis-regulatory element (CRE), in the promoter region of a gene for independent regulation, induction and/or cross-regulatory activation such as epigenetic and signalling processes. TFs are categorized according to the conserved motifs in their DNA-binding domains (DBDs), such as NAC, SBP, MADS-box, WRKY and B3, among others. In plants, the distribution of TF families is assumed to be species-specific. Currently, 58 different TF families are deposited in the PlantTFDB database and they have been exclusively characterized in model plants3. Amongst these TF families, WRKY, MADS-box and MYB are the most important transcriptional regulators; they are widely distributed in the plant kingdom and actively involved in plant development and in biotic and abiotic stress responses4.

The WRKY family, the seventh-largest family of TFs, is involved in developmental processes and defense responses such as seed germination, pollen development, hormonal regulation and the biosynthesis of secondary metabolites5. The WRKY TF family is characterized by a WRKY signature domain that contains the conserved WRKY amino acid residues at its N-terminus and a zinc-finger motif at its C-terminus. The domain consists of approximately 60–70 amino acid residues, with a WRKYGQK/WRKYGKK motif that recognizes the W-box (TTGACC/T) promoter element for DNA binding6,7. In the MYB family, TFs are involved in plant development and defense responses, including the cell cycle, cell morphogenesis, the central circadian oscillator and the regulation of stress signalling8,9. The MYB domain contains three irregular repeats that form a helix-turn-helix (HTH) structure of about 53 amino acids10. In MYB proteins, the R1, R2, R3 (conventional) and R4 groups of MYB-domain repeats (numbered according to the number of adjacent repeats) stabilize the DNA-binding structure11. The TFs with a MCM1/AGAMOUS/DEFICIENS/SRF (MADS)-box regulate developmental processes such as seed germination, vegetative growth, the transition from vegetative to reproductive growth, floral development and senescence, and also regulate abiotic and biotic stress tolerance. They contain a conserved MADS domain of about 60 amino acids at the N-terminus that recognizes the CArG-box DNA motif (CC[A/T]6GG) in the target genes. Generally, they are classified into two lineages, namely type I and type II. Type I contains the MADS domain and an extended, highly variable carboxy-terminal domain, whilst type II contains four conserved domains, known as MIKC, consisting of the MADS (M) domain, Intervening (I) domain, Keratin-like (K) domain and carboxy-terminal (C) domain12.

Rice and Arabidopsis are important non-halophyte model plants for monocot and dicot crops, respectively. They are short-rotation plants with high sensitivity to stressors such as oxidative, osmotic and ion/salt stress13,14. The first rice genome was published in 2006, and rice has become an excellent model system for economically important related monocotyledonous crops such as maize, wheat, sorghum and barley. On the other hand, the dicotyledonous A. thaliana was the first model plant with a completed genome sequence, published in the year 2000 (http://www.arabidopsis.org)13. It has been actively used by the plant research community in revolutionizing genetics and breeding studies14. More than 5% of the Arabidopsis genes encode TFs, and only about 7% of them have been functionally and genetically characterized. The Arabidopsis genome is approximately 135 megabase pairs in size, about one-fourth of the size of the rice genome, and contains up to 30 000 genes. Currently, there are 2296 and 2408 genes encoding TFs in Arabidopsis and rice, respectively15.

The Arabidopsis and rice WRKY, MADS-box and MYB TF families are reported to show diverse functional roles. In rice, the OsMYB-R1 gene regulates multiple stress tolerance16, RADIALIS-LIKE3 (OsRL3) promotes dark-induced leaf senescence and reduces susceptibility to salt stress17, OsWRKY74 and OsWRKY28 regulate phosphate homeostasis18,19 and OsMADS27 regulates root development under a salt-tolerant condition20. In Arabidopsis, AGL21, a MADS-box TF, acts in environmental surveillance during seed germination. There are 109 and 74 WRKY family members in rice and Arabidopsis, respectively21. The MYB TF family, with up to 180 members, is the largest TF family in Arabidopsis and rice9. The MADS-box TF family contains more than 100 members, which are generally involved in almost every developmental process of a higher plant22.

TFs are an important component of the complex regulatory networks established by plants during their response to stressors19,20,21. They either enhance or suppress the expression of genes that are directly associated with target resistance genes. In this study, the WRKY, MADS-box and MYB TF families from rice and Arabidopsis were identified and collated for a comprehensive in silico genome-wide analysis in search of conserved functional roles between different TF families and species. The phylogenetic relationships, exon–intron arrangement, conserved motifs and stress-responsive cis-regulatory elements in the promoters of the orthologous gene pairs (Arabidopsis and rice) of the three TF families are investigated to provide useful insights into the conserved regulatory modules of TFs with potential for manipulation in plant biotechnology and breeding programmes.

Materials and methods

Data resources

Arabidopsis thaliana and Oryza sativa genes encoding WRKY, MADS-box and MYB transcription factors (TFs) were retrieved from the Plant Transcription Factor Database v5.0 (PlantTFDB 5.0; http://planttfdb.cbi.pku.edu.cn)15. The corresponding protein-coding sequences were obtained from Phytozome 12.1 (https://phytozome.jgi.doe.gov/pz/portal.html)23.

Multiple sequence alignment and phylogenetic analysis

Multiple sequence alignment (MSA) was conducted using ClustalW v2.1 with the following parameters24: a gap opening penalty of 10 and a gap extension penalty of 0.1 to 0.2. Phylogenetic trees were then constructed in MEGA v7.2 using the Neighbor-Joining (NJ) method with 1000 bootstrap replicates25,26. The phylogenetic trees were visualized and annotated using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/)27.
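As a minimal sketch of the resampling idea behind bootstrap support values (not the MEGA implementation), the Python snippet below computes pairwise p-distances from an alignment and draws bootstrap replicates by resampling alignment columns with replacement. The toy alignment, the sequence names and the function names are hypothetical, and the p-distance is an assumed stand-in for whichever distance model was selected in MEGA. The NJ clustering step itself is not reproduced here.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Pairwise p-distances for a dict {name: aligned_sequence}."""
    names = list(alignment)
    return {(p, q): p_distance(alignment[p], alignment[q])
            for p in names for q in names}

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

# Hypothetical toy alignment (not data from this study).
toy = {
    "seqA": "WRKYGQK-CT",
    "seqB": "WRKYGKKACT",
    "seqC": "WRKYGQKACS",
}
dm = distance_matrix(toy)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
```

Each replicate would be passed to the same distance and tree-building steps, and the support of a clade is the fraction of replicate trees in which it reappears.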

Chromosomal location analysis

The chromosomal location analysis of the WRKY, MADS-box and MYB TF gene families was performed using the TAIR Chromosome Map Tool (https://www.arabidopsis.org/jsp/ChromosomeMap/tool.jsp)28 for Arabidopsis and the Oryzabase Chromosome Map Tool (http://viewer.shigen.info/oryzavw/maptool/MapTool.do) for rice29. Genes separated by fewer than five gene loci within a 100 kb distance were considered tandem duplicates30.
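The tandem-duplicate criterion can be illustrated with a short Python sketch. It assumes that both conditions must hold together (fewer than five intervening gene loci and start positions no more than 100 kb apart), which is one reading of the rule. The gene identifiers, coordinates and the function name are hypothetical.

```python
def tandem_pairs(genes, family, max_intervening=5, max_distance=100_000):
    """Return pairs of family genes that satisfy the tandem-duplicate rule.

    genes  : list of (gene_id, chromosome, start) for ALL annotated loci
    family : set of gene ids belonging to the TF family of interest
    Assumption: a pair qualifies when fewer than `max_intervening` loci lie
    between the two genes AND their start positions are within `max_distance` bp.
    """
    by_chr = {}
    for gene_id, chrom, start in genes:
        by_chr.setdefault(chrom, []).append((start, gene_id))

    pairs = []
    for loci in by_chr.values():
        loci.sort()  # order loci along the chromosome
        members = [(i, s, g) for i, (s, g) in enumerate(loci) if g in family]
        # Comparing neighbouring family members is sufficient: any wider pair
        # that qualifies also implies that the neighbours between them qualify.
        for (i1, s1, g1), (i2, s2, g2) in zip(members, members[1:]):
            intervening = i2 - i1 - 1
            if intervening < max_intervening and s2 - s1 <= max_distance:
                pairs.append((g1, g2))
    return pairs

# Hypothetical loci (not data from this study).
genes = [
    ("g1", "Chr1", 10_000),
    ("g2", "Chr1", 20_000),
    ("g3", "Chr1", 30_000),
    ("g4", "Chr1", 500_000),
    ("g5", "Chr2", 15_000),
]
family = {"g1", "g3", "g4", "g5"}
result = tandem_pairs(genes, family)
```

In this toy input only g1 and g3 qualify: one locus separates them and they lie 20 kb apart, whereas g3 and g4 are adjacent but 470 kb apart.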

Exon–intron arrangement and motifs search distributions

The exon–intron structural features of WRKY, MADS-box and MYB TF genes were visualized using Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn/)31. The conserved motifs of the target sequences were identified with the Multiple Expectation Maximization for Motif Elicitation (MEME) Suite (http://meme-suite.org/) using the following parameters: the maximum number of motifs was set at 20 and the zero or one occurrence per sequence (zoops) mode was used32. The Pfam online tool (https://pfam.xfam.org) was employed for conserved motif annotation33.

Prediction of cis-regulatory element on promoter regions

The promoter regions and cis-regulatory elements (CREs) of the WRKY, MADS-box and MYB target sequences were examined using the web-based tool PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html)34, followed by visualization of the CREs using the Illustrator for Biological Sequences (IBS) software (http://ibs.biocuckoo.org)35.
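To illustrate what a CRE scan on both strands involves, the following Python sketch searches a promoter sequence for a few motifs with regular expressions. It is a simplified stand-in for the PlantCARE database search, not a reimplementation of it. The W-box pattern follows the TTGACC/T consensus given in the Introduction; the core sequences assumed for the CGTCA-motif and TGACG-motif are taken from the element names, and the function names and test sequences are hypothetical.

```python
import re

# Illustrative motif table (assumption: core sequences only, no flanking context).
MOTIFS = {
    "W-box": "TTGAC[CT]",
    "TGACG-motif": "TGACG",
    "CGTCA-motif": "CGTCA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def scan_promoter(seq, motifs=MOTIFS):
    """Return (motif, strand, start) hits, with start in forward-strand coordinates.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = []
    for name, pattern in motifs.items():
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(seq):
            hits.append((name, "+", m.start(1)))
        for m in regex.finditer(rc):
            width = len(m.group(1))
            # Convert the position on the reverse strand to a forward coordinate.
            hits.append((name, "-", n - m.start(1) - width))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))

# Hypothetical short sequences standing in for 1.5 kb promoter regions.
forward_hit = scan_promoter("AATTGACCGG")
reverse_hit = scan_promoter("AAGGTCAATT")
```

Because CGTCA is the reverse complement of TGACG, a site reported as one motif on the forward strand is reported as the other on the reverse strand, which is worth keeping in mind when counting these two elements separately.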

In silico co-expression analysis and functional similarity between orthologous gene pairs

The gene identifiers of the orthologous pairs of WRKY, MADS-box and MYB target sequences were searched against the PLANT co-expression database (PLANEX, http://planex.plantbioinformatics.org)36. The co-expression data were retrieved, and the networks were visualized using Cytoscape v3.7.0 software37. The functional similarity of the co-expression networks was measured using the kappa value from the PLANEX database36, which represents the distance of co-expression data between rice and Arabidopsis.

Results

Phylogenetic analysis of WRKY, MADS-box and MYB genes in Arabidopsis and rice

A total of 101 OsWRKY, 34 OsMADS-box and 122 OsMYB sequences were identified in rice, and 72 AtWRKY, 66 AtMADS-box and 144 AtMYB sequences were identified in Arabidopsis, after repetitive and redundant gene sequences were removed. A phylogenetic tree for the WRKY transcription factor (TF) family was built from the 173 collated Arabidopsis and rice WRKY genes. The 101 OsWRKY and 72 AtWRKY genes are distributed across all clades, except that clade 5 contains only one Arabidopsis gene (AtWRKY) among 22 rice genes (OsWRKY) and clade 10 contains WRKY genes from rice only. The highest gene number (GN) is observed in clade 8 (GN = 37), followed by clade 6 (GN = 26), clade 7 (GN = 24) and clade 5 (GN = 23). Clade 9 is the smallest, with GN = 3 (Fig. 1). A phylogenetic tree of the MADS-box TF family constructed from 66 AtMADS-box and 34 OsMADS-box genes shows a consistent distribution among seven clades. Clade 1 and clade 7 are the biggest clusters, with the same size (GN = 25), followed by clade 6 (GN = 20), clade 2 (GN = 15), clade (GN = 9) and clade 5 (GN = 5). Clade 4 is the smallest, with GN = 2. Clade 3 and clade 6 contain AtMADS-box members only, while clade 4 and clade 5 are unique to OsMADS-box members (Fig. 2). A phylogenetic tree of the MYB TF family shows 14 clades, with fairly even representation of Arabidopsis and rice genes. Clade 1 is the biggest cluster (GN = 54), followed by clade 10 (GN = 29), clade 7 (GN = 27), clade 4 (GN = 25) and clade 7 (GN = 6) (Fig. 3). In each TF family phylogenetic tree, the orthologous gene pairs marked by red circles were selected for subsequent analysis. A total of 22 orthologous gene pairs were obtained as follows: WRKY, 10; MADS-box, 1; and MYB, 11 (Figs. 1, 2, 3).

Figure 1

Phylogenetic tree of collated rice and Arabidopsis full-length WRKY protein sequences. Red dots represent the rice-Arabidopsis orthologous gene pairs. The tree is built using the neighbor-joining (NJ) method (MEGA7.0 software) and is divided into ten clades, numbered in bold.

Figure 2

Phylogenetic tree of collated rice and Arabidopsis full-length MADS-box protein sequences. Red dots represent the rice-Arabidopsis orthologous gene pairs. The tree is built using the neighbor-joining (NJ) method (MEGA7.0 software) and is divided into seven clades, numbered in bold.

Figure 3

Phylogenetic tree of collated rice and Arabidopsis full-length MYB protein sequences. Red dots represent the rice-Arabidopsis orthologous gene pairs. The tree is built using the neighbor-joining (NJ) method (MEGA7.0 software) and is divided into 14 clades, numbered in bold.

Distribution of the WRKY, MADS-box and MYB orthologous genes in Arabidopsis and Oryza sativa chromosomes

The in silico mapping of WRKY, MADS-box and MYB orthologous gene pairs showed an uneven distribution across the Oryza sativa (Os) and Arabidopsis thaliana (At) chromosomes (Chr). In Arabidopsis, the orthologous genes were distributed randomly on AtChr1, AtChr2, AtChr3, AtChr4 and AtChr5. A total of five genes, one from the MADS-box family and two each from the MYB and WRKY TF families, were located on AtChr1. On AtChr2 and AtChr4, three WRKY and one MYB genes were located at various distances. All four genes located on AtChr3 are from the MYB family. AtChr5 showed a random distribution of three MYB and two WRKY genes. In rice, the orthologous genes were present on almost every chromosome except OsChr6, OsChr9 and OsChr10. OsChr1 contains the highest gene number (GN) at 7, followed by OsChr4 (GN = 3) and OsChr7 (GN = 3), and OsChr8, OsChr11 and OsChr12 with GN = 2 each. The fewest genes were located on OsChr2, OsChr3 and OsChr5 (GN = 1 each) (Fig. 4). The detailed distribution of WRKY, MADS-box and MYB orthologous genes on the Arabidopsis and rice chromosomes is shown in Table 1. Because the genes were separated by more than five gene loci, no tandem duplications were observed among them. The longest protein was encoded by AtWRKY1 (1789 aa) in Arabidopsis and OsMYB50 (72 aa) in rice. Likewise, the shortest protein was encoded by AtWRKY43 (109 aa) and OsWRKY58 (181 aa). More than half of the proteins encoded by AtMYB and OsMYB genes were acidic, with a theoretical isoelectric point of less than 7, whilst the two MADS-box proteins (AtAGL65 and OsMADS68) were acidic. A total of 8 OsWRKY proteins were acidic, in comparison to 2 AtWRKY proteins. The average molecular weights (MW) of these proteins were 48.7 kDa and 45.4 kDa in Arabidopsis and rice, respectively. Detailed information on the sequence characteristics is given in Table 1.

Figure 4

The chromosomal distribution of rice-Arabidopsis WRKY, MADS-box and MYB orthologous genes. (A) Distribution of gene loci on Arabidopsis chromosomes. (B) Distribution of gene loci on rice chromosomes. The colour of each gene locus name represents its transcription factor family: WRKY, black; MADS-box, purple; and MYB, red.

Table 1 Orthologous WRKY, MADS-box and MYB gene-pairs in Arabidopsis and rice.

Gene structure and conserved motif analysis: WRKY, MADS-box and MYB orthologous genes in Arabidopsis and rice

A total of 173 WRKY, 100 MADS-box and 266 MYB genes were identified with distinctive exon numbers (EN) and intron numbers (IN). Among the WRKY genes, EN ranged from 1 to 15. A total of 95 genes showed EN = 3 and 88 genes showed IN = 3, 22 genes with EN = 2, and 22 genes with EN = 2 and IN = 4. The AT4G12020 gene showed the highest EN and IN, with 15 and 14, respectively. Meanwhile, 63 MADS-box genes showed EN = 1, 16 genes IN = 1, 13 genes EN = 2 and eight genes IN = 2. Among the MYB genes, 156 genes showed EN = 3, 154 genes IN = 2, 58 genes EN = 2 and 57 genes IN = 1 (Supplementary File: Figs. 1, 2, 3). Generally, MADS-box (EN = 1–11) and MYB (EN = 1–13) genes showed a similar range of ENs. Comparatively, the rate of EN and IN difference in the WRKY and MYB TF families was higher than in the MADS-box family. The exon–intron structures of the ortholog and paralog pairs were further examined. Dissimilarities in the number of exons among the following orthologous gene pairs suggest either a protein gain or loss event in both species: (i) LOC_Os01g54600-AT1G29280, (ii) LOC_Os02g53100-AT1G68150, (iii) LOC_Os11g43740-AT1G18750 and (iv) LOC_Os12g38400-AT2G37630. The rice LOC_Os01g54600, LOC_Os02g53100, LOC_Os11g43740 and LOC_Os12g38400 genes were identified to have gained one exon, whilst their counterparts, AT1G29280, AT1G68150, AT1G18750 and AT2G37630, had lost one exon (Fig. 5).

Figure 5

Exon–intron structure of Arabidopsis and rice WRKY (blue column), MADS-box (yellow column) and MYB (green column) orthologous gene pairs, displayed according to clade numbers in their TF family phylogenetic tree. The exon–intron structure is described as follows: the yellow rectangles and grey lines denote exons and introns, respectively, whilst the blue boxes represent the untranslated regions (UTRs).

A total of 20 distinct conserved motifs were identified in the Arabidopsis and rice orthologous genes, which comprised 20 WRKY, two MADS-box and 22 MYB proteins. In almost all orthologous genes, the same types of motifs were present in each gene sequence, with different distribution patterns. Evaluation by transcription factor family shows that genes in a common clade shared a closely similar pattern of motif distribution (Fig. 6). The WRKY TF family shows apparent motif similarity among the genes in clades 1, 4, 5, 7 and 8, but not clade 9. Each clade contains a varying number of motifs with unique distribution patterns. In the MYB TF family, clades 1–10 were similar, with at least 3 identical conserved motifs. Clade 12 showed the highest number of motifs and clade 13 the lowest. Motif 1 was present throughout the MYB TF family members, whereas motif 2 was found in all clades except clades 12 and 13. The MADS-box TF family, represented by a single pair of orthologous genes, contained 20 different motifs distributed in a similar pattern. Detailed information on the functional annotation of the motifs identified in the WRKY, MYB and MADS-box rice-Arabidopsis orthologous genes is presented in Supplementary File 2: Table 1.

Figure 6

Distribution pattern of conserved motifs in Arabidopsis and rice WRKY, MADS-box and MYB orthologous genes, identified by the MEME web server. Orthologous gene pairs are presented by transcription factor (TF) family: blue column, WRKY; yellow column, MADS-box; and green column, MYB. The p-values are significant at 0.05. Motifs are shown as coloured boxes, each representing a unique numbered motif as indicated in the legend. The width of each box represents the motif length.

Distribution of cis-regulatory elements (CREs) in putative promoter regions of Arabidopsis and rice orthologous WRKY, MADS-box and MYB genes

The orthologous Arabidopsis and rice genes (WRKY, MADS-box and MYB TF families) were screened for the distribution of cis-regulatory elements (CREs) within their promoter sequences. The CREs were randomly distributed on the positive and negative strands of the promoter region of each gene. Comprehensive details of the CREs identified in Arabidopsis and rice WRKY, MADS-box and MYB orthologous genes are presented in Supplementary File 4. In rice, the most abundant CREs were those for jasmonate-responsive signalling (CGTCA-motif and TGACG-motif), light responsiveness (Sp1 and G-box) and plant development (GC-motif), whereas in Arabidopsis, biotic and abiotic stress-responsive elements such as MYB, ABRE, STRE, as-1 and MYC were distributed within the TF family genes. The stress-responsive CRE ABRE is present in both species, whereas the TGA binding sites TGACG-motif and as-1 are unique to rice and Arabidopsis, respectively. The CGTCA-motif and TGACG-motif are present in all WRKY, MADS-box and MYB TF family genes except the OsMYB50 gene. The MYB binding sites are found in WRKY and MYB genes, with a high occurrence in the MYB genes. Other stress-related elements found in rice genes include the oxidative stress-responsive element (ARE) and light stress elements (I-box, Box II and LTR). The elicitor-responsive element (W-box), light stress elements (GT1-motif and GATA-motif) and defense response element (G-box) were consistently present in all Arabidopsis genes (Fig. 7). The orthologous rice and Arabidopsis gene pairs showed common CRE functions despite displaying diversity in CRE identities and numbers. The annotated CRE functions involved in developmental activities, hormone response and abiotic/biotic stress are compared among the orthologous gene pairs in Table 2.

Figure 7

Distribution of the cis-regulatory elements (CREs) in the 1.5 kb promoter region of Arabidopsis and rice WRKY, MADS-box and MYB orthologous genes as identified by PlantCARE and visualized using the IBS software (http://ibs.biocuckoo.org). The CREs are denoted by different shapes and colours. The strands are drawn as follows: (i) thick black line, reverse strand; and (ii) thin black line, forward strand.

Table 2 Comparison of plant development, hormone and stress-responsive cis-regulatory elements (CREs) in the promoter regions of Arabidopsis and rice WRKY, MADS-box, and MYB orthologous gene pairs.

In silico analysis of co-expression and functional similarity between Arabidopsis and rice orthologous gene pairs

Co-expression analysis was conducted on the 19 Arabidopsis and 18 rice orthologous genes identified in the previous analysis, with expression datasets retrieved from PLANEX (planex.plantbioinformatics.org). The correlation values (r) among the WRKY, MADS-box and MYB genes in Arabidopsis and rice were ranked as follows: (i) poor, r < 0.2; (ii) fairly moderate, r = 0.2–0.4; (iii) fairly strong, r > 0.4–0.6; and (iv) strong, r > 0.6–0.8. The average positive correlations within the Arabidopsis and rice networks were 0.212 and 0.160, respectively. The negative correlation of the Arabidopsis network (r = −0.248) was much stronger than that of the rice network (r = −0.084). In Arabidopsis, AtMYB4R1 showed the strongest correlation (r = 0.465, fairly strong) with MADS-box (AtAGL65), MYB (AtMYB103, AtMYB91, AtMYB5 and AtMYBCDC5) and WRKY (AtWRKY65, AtWRKY9, AtWRKY44, AtWRKY55 and AtWRKY43) transcription factor (TF) genes. For rice TFs, OsMYB46 showed the strongest correlation (r = 0.827) with OsMYB13, OsMYB19, OsWRKY13, OsWRKY17, OsWRKY22, OsWRKY23, OsWRKY32 and OsWRKY119 (Fig. 8).
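The ranking of correlation values can be expressed as a small Python helper, sketched below under two assumptions that the text does not state: the classes are applied to the magnitude of r, so that negative correlations are ranked on the same scale, and any value above 0.6 is treated as strong, since the reported r = 0.827 lies above the 0.8 upper bound. The function name is hypothetical.

```python
def correlation_class(r):
    """Rank a correlation coefficient using the classes given in the text.

    Assumptions: the magnitude |r| is ranked, upper bounds are inclusive,
    and everything above 0.6 is labelled "strong".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude < 0.2:
        return "poor"
    if magnitude <= 0.4:
        return "fairly moderate"
    if magnitude <= 0.6:
        return "fairly strong"
    return "strong"

# Values reported in this section.
reported = {
    "Arabidopsis mean positive": 0.212,
    "rice mean positive": 0.160,
    "Arabidopsis negative": -0.248,
    "rice negative": -0.084,
    "AtMYB4R1 strongest": 0.465,
    "OsMYB46 strongest": 0.827,
}
classes = {label: correlation_class(r) for label, r in reported.items()}
```

Under these assumptions the Arabidopsis averages fall in the fairly moderate class and the rice averages in the poor class.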

Figure 8

Gene co-expression network of Arabidopsis and rice WRKY, MADS-box and MYB orthologous genes. (A) Frequencies of co-expression interactions identified by PLANEX. Increasing r-values show stronger positive correlation and vice versa. (B) Co-expression network comprising nodes that represent genes, with node colours indicating the transcription factor family (red node = MYB, blue node = WRKY and purple node = MADS-box), and edges that indicate positive (red lines) and negative (blue lines) correlations.

The possible functional similarity between Arabidopsis and rice orthologous genes was compared on their co-expression networks using the kappa statistics retrieved from PLANEX (Table 3). A kappa (k) score of 1 denotes perfect functional similarity between networks35,38. A k-score > 0 is assumed to be significantly similar, whilst a k-score of 0 denotes no significant similarity35,38. Eleven Arabidopsis-rice orthologous genes, accounting for 69% of the total, showed fair functional similarity (k-score = 0.2–0.4), followed by three genes (19%) with poor (k-score > 0.0 to 0.2) and two genes (13%) with moderate (k-score = 0.4 to 0.6) functional similarity. The OsWRKY32-AtWRKY9 and OsMADS68-AtAGL65 orthologous pairs were highly significant, with k-scores of 0.44 and 0.50, respectively.
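The k-score categories and their shares can be reproduced with the short Python sketch below. The class boundaries follow the ranges given in the text, with the assumption that a boundary value belongs to the lower class. Only the scores 0.44 and 0.50 are taken from the text; the remaining scores are hypothetical placeholders chosen to match the reported counts, and the function names are also hypothetical.

```python
from collections import Counter

def kappa_class(k):
    """Assign a k-score to the similarity classes used in the text.

    Assumption: upper bounds are inclusive; scores above 0.6 are not
    ranked in the text and are labelled accordingly.
    """
    if k <= 0.0:
        return "none"
    if k <= 0.2:
        return "poor"
    if k <= 0.4:
        return "fair"
    if k <= 0.6:
        return "moderate"
    return "higher (not ranked in the text)"

def class_shares(scores):
    """Percentage of scores falling in each similarity class."""
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(kappa_class(k) for k in scores)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# 0.44 and 0.50 are the reported scores; 0.1 and 0.3 are placeholder values.
scores = [0.1] * 3 + [0.3] * 11 + [0.44, 0.50]
shares = class_shares(scores)
```

With 3, 11 and 2 scores in the poor, fair and moderate classes, the shares are 18.75%, 68.75% and 12.5%, which round to the 19%, 69% and 13% reported above.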

Table 3 Functional similarity between the Arabidopsis and rice WRKY, MADS-box and MYB orthologous gene-pairs.

Discussion

Over the years, natural and human activities have caused significant changes to the global environment. Climate change, a decrease in arable land, an increase in CO2 concentration, declining water availability, drought and high salinity have set major challenges for agricultural systems worldwide. The quest for yield and productivity is becoming increasingly challenging with a continuous decline in plant stress resistance. Plants are complex multicellular organisms that adapt flexibly to adverse conditions, such as exposure to abiotic and biotic factors that trigger various responses governed by complex regulatory mechanisms, i.e. transcriptional regulation39. Through gene expression, they respond to these changes by either activating or repressing the expression of downstream genes40,41.

Transcription factors (TFs) act as master regulators of plant growth and development, and of defense-related responses. The WRKY, MADS-box and MYB families are major TF families that regulate various aspects of plant development through specificity and/or crosstalk regulation between different TFs, including growth and developmental processes42 and biotic and abiotic stress responses35,43,44. Cis-acting regulatory elements (CREs) located at the binding site or near the structural genes interact with TFs to control the expression of the corresponding genes. The promoters located upstream of a gene-coding region contain numerous CREs, which are unique to the various proteins involved in transcription initiation and regulation40,45. CREs have been reported to display diverse functions associated with biotic and abiotic components, including pathogen and wound responsiveness and light and phytohormone responsiveness. Studies on CREs are important to further understand plant defense responses to abiotic and biotic stresses38.

In this study, the Arabidopsis and rice WRKY, MADS-box and MYB TF genes showed similar TF-family abundance levels. Although the rice genome is larger than that of Arabidopsis, the numbers of TF genes in the two species were similar. The phylogenetic trees built on the collated rice and Arabidopsis WRKY, MADS-box and MYB TF family members were divided into 10, 7 and 14 clades, respectively. The findings suggest that the MYB TF family is the most diverse, followed by WRKY, with MADS-box being the least diverse TF family. Generally, both WRKY and MYB TF members were much more closely related to one another than MADS-box members were. In the WRKY- and MYB-specific phylogenetic trees, both Arabidopsis and rice genes were present in virtually all clades. In contrast, in the MADS-box-specific phylogenetic tree, very few clades contained both rice and Arabidopsis genes; the clades were dominated by a single species, either Arabidopsis or rice (Fig. 2). Orthologous genes are similar genes with the same function that may have arisen from speciation events. The relatively higher number of orthologous gene pairs observed in the Arabidopsis-rice WRKY and MYB TF families may reflect the ancestral relationships between Arabidopsis and rice before their divergence during evolution (Figs. 1 and 3). The chromosomal distribution of orthologous WRKY and MYB genes in rice and Arabidopsis showed no apparent pattern. However, it is noteworthy that most of the orthologous genes were distributed within single arms of the chromosomes (Fig. 4).

Gene structure analysis provides insight into evolutionary processes such as duplication events46. In this study, the orthologous genes of the three TF families from Arabidopsis and rice displayed various exon and intron numbers, implying possible roles in the diversification of the two angiosperms. For instance, the rice OsWRKY13 gene consists of three exons, whilst its orthologous counterpart, Arabidopsis AtWRKY65, contains only two exons. These results suggest that some of the TF family genes may have undergone loss of introns during evolution, causing subsequent functional differences between rice and Arabidopsis. Most of the Arabidopsis-rice orthologous gene pairs in the WRKY and MYB TF families have similar exon numbers, which implies the acquisition of similar gene functions during stable evolution47. The number of proteins with motifs identified in the WRKY TF family was comparable to that in the MYB TF family (20–22 proteins). The MADS-box TF family contained only two protein sequences with motifs (Fig. 6). This disparity between the WRKY and MYB TF families and the MADS-box TF family could be related to the functional differences between these TF families. The MADS-box TFs are highly involved in plant growth and development, whereas the WRKY and MYB TF families are actively responsive to biotic and abiotic stresses. Similar types of motifs were identified in all three TF families; however, the motif and CRE distributions displayed a similar trend within each TF family, suggesting a functional niche unique to each TF family.

Motif distributions are conserved between the orthologous gene pairs that share a common clade. Each specific motif present in the orthologous genes corresponds to a specific protein function. For example, WRKY genes with a DNA-binding domain were mainly enriched in motifs 1–3 and 5. MYB genes enriched in motifs 1–4 correspond to the Myb-like DNA-binding domain, and MADS-box genes with abundant motif 1, motif 3 and motif 5 correspond to the DNA-binding and dimerisation domain, K-box region and connexin4, respectively. In general, WRKY and MYB orthologous genes show motif abundance and diversity to a major extent. The impact of motif loss in the orthologous gene pairs is also noteworthy. For instance, the rice OsWRKY58 gene lacks motifs 5, 6, 9, 10, 13, 14, 15 and 18 in comparison to its orthologous partner, the Arabidopsis AtWRKY19 gene. These differences may imply functional divergence of the OsWRKY58 gene from the AtWRKY19 gene.

The CRE analysis of the Arabidopsis and rice WRKY, MADS-box and MYB genes showed functional involvement in stress-related, phytohormone-related and plant development-related activities. All Arabidopsis and rice genes contain a combination of different CREs, except for the following orthologous pairs, which contain a phytohormone-related ABRE motif: OsMADS68-AtAGL65 (clade 2, MADS-box TF), OsMYB58-AtMYB125 (clade 10, MYB TF) and OsWRKY22-AtWRKY55 (clade 5, WRKY TF). The OsWRKY32-AtWRKY9 orthologous pair shares both ABRE and G-box element motifs. Previous studies showed that the G-box acts as a stress-responsive element against pathogens48 and as a regulator of phytohormone signalling, such as abscisic acid (ABA) and jasmonic acid (JA) signalling, and that it favours a reactive oxygen species (ROS) burst under environmental stress47,49. Additionally, the ABA-responsive element (ABRE) acts as a positive regulator of ABA signalling under saline and drought conditions47,50. The abundance of phytohormone-related elements (CGTCA-motif and TGACG-motif) in rice genes suggests a crucial function in JA responsiveness. The TGACG-motif and as-1 elements are both known as TGA elements. Interestingly, the TGACG-motif was predominantly found in rice genes and the as-1 element mainly in Arabidopsis genes. Our findings showed an apparent divergence of stress-related elements between rice and Arabidopsis. The CREs unique to rice genes are Sp1, ARE and GC-motif. On the other hand, the MYB, MYC, STRE and W-box motifs are unique to Arabidopsis genes. AREs (anaerobic responsive elements), consisting of GC and GT motifs, act as oxidative responsive elements. Previous studies showed that the rice genome contains more GC motifs than the Arabidopsis genome47,51.

An ongoing duplication event within plant species may have led to the divergence of the WRKY, MADS-box and MYB TF families. Apparent gains and losses in gene structure were evident within each TF family. Co-expression network analysis revealed a fairly moderate (r = 0.2–0.4) interaction in Arabidopsis and a poor interaction (r > 0–0.2) in rice. The OsMYB46 gene in rice is involved in the transcriptional regulation of secondary wall biosynthesis. Rice co-expression network analysis has shown a strong association of the OsMYB46 gene with lignin biosynthetic transcription factors (OsMYB13 and OsMYB19)52, and with the OsWRKY2253, OsWRKY1354 and OsWRKY2355 genes involved in rice resistance to blast and bacterial blight. These findings suggest that both MYB and WRKY TF family genes are switched on to orchestrate SA- and JA-mediated signalling pathways during pathogen attack.

The functional similarities between WRKY, MADS-box and MYB genes within Arabidopsis and rice were measured and compared against each other via co-expression network analysis. The two independent Arabidopsis and rice co-expression networks were of similar size, as indicated by the total number of nodes (number of genes): 19 in the Arabidopsis and 18 in the rice co-expression network. In each co-expression network, the WRKY, MADS-box and MYB genes showed positive and negative correlations to a considerable extent. Interestingly, the hub gene, defined as the gene with the highest number of interactions, belongs to the MYB TF family in both the Arabidopsis and rice co-expression networks.

The functional similarities of Arabidopsis and rice orthologous gene pairs were detected at significant k-scores38. Previous studies using co-expression network analysis have functionally characterized several genes, for example the Arabidopsis AtAGL65 gene, which regulates pollen tube growth and maturation56, and OsMADS68, which regulates the downstream OsCPK21 gene during anther development in rice57. The rice-Arabidopsis orthologous gene pair OsMYB80-AtMYB80 is functionally conserved as a positive regulator of pollen development58,59. Meanwhile, the Arabidopsis AtWRKY9 gene was shown to be induced in response to pathogen-associated molecular patterns (PAMPs)52, and the rice OsWRKY32 gene is activated during pathogenesis by the rice blast pathogen Magnaporthe oryzae60. Based on the expression profiles, the Arabidopsis AtWRKY43 gene showed a close association with the pathogen defense transcription factor encoded by the rice OsWRKY23 gene55,61. The discovery of stress-related genes and their association with the Arabidopsis and rice WRKY, MADS-box and MYB orthologous genes offers a basis for future biotechnology and breeding studies aimed at enhancing plant stress responses.

Feeding more than half of the world's population, rice is a premier staple food worldwide, especially for the majority of Asians. Rice yield improvement has been a key breeding objective, as farming and subsequent productivity are affected by numerous factors such as soil fertility, abiotic stressors (salinity, drought, heat and cold) and susceptibility to a wide range of diseases. Present-day rice breeding strategies have evolved tremendously. From conventional breeding to breeding by design, the identification of desirable candidate genes is a core component in kickstarting any breeding programme. Improvement of complex traits controlled by multiple genes, each displaying a relatively small effect, has led to trait-based selections that are unfavourably related62. As a result, the current pace of rice breeding does not meet the breeding objectives designed for the development of climate-resilient, fit and adaptive, and resource-use-efficient cultivars.

Gene similarities are a key aspect of gene function. Gene data sets, which include gene expression and gene co-expression networks, elucidate associated functions between genes across and within the plant kingdom. Assessing the overall functional similarity between two genes requires multi-aspect considerations. Although rice and Arabidopsis are both important model plants, they have been studied at different paces: Arabidopsis is much more thoroughly investigated and functionally described than rice. In addition, most gene function association studies are projected onto Arabidopsis to better understand any given plant organism of interest. In this study, the Arabidopsis and rice TF families are comparatively evaluated to gain multi-dimensional information on the distribution pattern, structure and function of WRKY, MADS-box and MYB genes.

In ‘breeding by design’ techniques such as target chromosome-segment substitution63, mapping of the loci governing agronomically desirable traits serves as a prerequisite step. Under this approach, information on the desirable gene loci, along with their interrelated functional roles, is crucial to a successful breeding programme. Ultimately, using transcription factor genes, the present findings offer a knowledge base to facilitate the efficient selection of desirable genes, as TF genes from the different families (WRKY, MADS-box and MYB) display inter-relations with each other. In parallel, the current findings enable the manipulation of biologically important multi-functional TF genes governing rice stress responses and developmental processes. Rice improvement safeguards global food security, and thus the production of resilient planting materials could be facilitated and accelerated in breeding programmes geared towards the rapid development of rice varieties.

Conclusions

Plant growth and development, and environmental responses are key targets for manipulation in biotechnology and breeding programmes. This study investigated 172 WRKY, 100 MADS-box and 266 MYB TF genes in Arabidopsis and rice. Twenty-two Arabidopsis-rice orthologous gene pairs were identified from the WRKY, MADS-box and MYB TF families, and their exon–intron distributions and motif compositions are mostly similar and conserved. The majority of the WRKY, MADS-box and MYB genes in Arabidopsis and rice showed specific interactions with abiotic/biotic stress and phytohormone responsiveness elements. Further, the co-expression interactions among the WRKY, MADS-box and MYB genes in Arabidopsis and rice showed a similar trend based on the average correlation measurement. The functional similarity of co-expression data comprising orthologous genes indicates their important roles in pollen development, hormone-mediated responses and defense responses to pathogens. The orthologous genes identified in this study inform the selection of genes governing the conserved regulatory module of defense and development in rice and Arabidopsis.