Novel Genomic and Evolutionary Insight of WRKY Transcription Factors in Plant Lineage

The evolutionarily conserved WRKY transcription factor (TF) regulates different aspects of gene expression in plants, and modulates growth, development, as well as biotic and abiotic stress responses. Therefore, understanding the details regarding WRKY TFs is very important. In this study, large-scale genomic analyses of the WRKY TF gene family from 43 plant species were conducted. The results of our study revealed that WRKY TFs could be grouped and specifically classified as those belonging to the monocot or dicot plant lineage. In this study, we identified several novel WRKY TFs. To our knowledge, this is the first report on a revised grouping system of the WRKY TF gene family in plants. The different forms of novel chimeric forms of WRKY TFs in the plant genome might play a crucial role in their evolution. Tissue-specific gene expression analyses in Glycine max and Phaseolus vulgaris showed that WRKY11-1, WRKY11-2 and WRKY11-3 were ubiquitously expressed in all tissue types, and WRKY15-2 was highly expressed in the stem, root, nodule and pod tissues in G. max and P. vulgaris.

Statistical analysis of WRKY TFs. Tajima's relative rate test was conducted to determine the statistical significance of the investigated WRKY TFs. In all three replicate analyses, the p-values were found to be significant. The X 2 -test results with one degree of freedom were 5.76 (for monocot, dicot and lower eukaryotic WRKY TFs), 13.76 (for monocot and lower eukaryotic WRKY TFs), 4.45 (for dicot and lower eukaryotic WRKY TFs), 5.00  The phylogenetic tree shows eight independent groups. We named them as groups I (red), II (lime), III (black), IV (blue), V (black), VI (pink), VII (green), and VIII (black). To get details about distribution of different WRKY TF in different group, please refer to Supplementary Figure 3. The phylogenetic tree revealed that, the WRKY family members of one group overlapped with another group. The phylogenetic tree was constructed using MEGA6. Table 2. Phylogenetic tree of WRKY TFs of monocot, dicot, and lower eukaryotic (amoeba, algae, bryophyte, pteridophyte, and gymnosperm) plants. The phylogenetic tree revealed eight different groups, but the WRKY TF gene families were not restricted to any specific group, and one or more member of WRKY TFs were distributed in two or more groups. The numbers indicate the number of WRKY TFs (for example 1, 2, and 3 and others indicate WRKY1, WRKY2, and WRKY3 and so on). The phylogenetic tree was constructed using the MEGA6 software and the Poisson substitution model by using 1000 bootstrap replicates.  The phylogenetic tree shows six independent phylogenetic groups. We named them as groups I (red), II (lime), III (green), IV (blue), V (pink) and VI (green). The WRKY TF group members are specific to their groups and no WRKY TF members in one group overlap with those in any other group. The phylogenetic tree was constructed using MEGA6.  Gene expression profile of WRKY TFs. The expression profile of the WRKY TFs was elucidated by investigating the gene expression data for G. max and P. vulgaris and analyzing their transcription levels. In G. max, the transcription profile was determined for different tissue samples, including roots, root hair, leaves, stems, flowers, pods, seeds, nodules and shoot apical meristem. In G. max, the expression level of GmWRKY65-1 was found to be the highest (105.342) among all other WRKY transcription factors (Supplementary Table 2). The expression levels of GmWRKY6-4 and GmWRKY6-5 in the root were found to be 74.668 and 43.341, respectively. Some other WRKY TFs, the expression levels of which were relatively higher than those of others were GmWRKY6-6, GmWRKY11-2, GmWRKY11-3, GmWRKY11-4, GmWRKY11-6, and GmWRKY15-2 (Supplementary Table 2 GmWRKY70-3, GmWRKY71-2, GmWRKY72-1, and GmWRKY72-2 were not expressed in the root tissues (Supplementary Table 2). Unlike the higher expression in roots, the expression of GmWRKY65-1 (35.199) was also found to be the highest in the root hair. Some other WRKY TFs that were expressed relatively at higher levels were GmWRKY6-4, GmWRKY11-1, GmWRKY11-2, GmWRKY11-3, GmWRKY11-4, GmWRKY11-6, GmWRKY11-7, GmWRKY11-8, GmWRKY15-1, and GmWRKY15-2 (Supplementary Table 2). The WRKY TFs, the expression of which was not detected in root tissues, were GmWRKY4-3, GmWRKY6-3, GmWRKY10, Figure 6. Unrooted phylogenetic tree of WRKY TFs of dicot and lower eukaryotic (amoeba, algae, bryophyte, pteridophyte, and gymnosperm) plants. The phylogenetic tree shows the presence of three phylogenetically distinct groups. We named them as groups I (pink), IIa (red), IIb (lime), IIc (blue), and III (green). The WRKY TF group members of group IIa, IIb and IIc overlap with each other and were hence retained under sub-group of group II. The classification of groups I, II, and III resembled that used in previous studies. The WRKY TF members of groups I and III did not overlap with one another and resembled the grouping system of used in previously published studies. The phylogenetic tree was constructed by using MEGA6.  Table 4. Phylogenetic tree of WRKY TFs of dicot and lower eukaryotic plants. The phylogenetic tree revealed three phylogenetically distinct groups. The WRKY TF members of groups I, II and III are distributed redundantly. The WRKY members of one group were present in the other groups. This grouping was similar to that reported in previously studies such as groups IIa, IIb, and IIc. The members of groups I and III are significantly specific to their own groups, no members of one group overlap with another. These findings confirm that the nomenclatures of all WRKY TFs are correct. This nomenclature and grouping system should be applied to dicot plants only. Groups I, II, III, IV, V and VI of monocot plants is not the same as that of the respective group of dicots. Hence it is highly recommended to follow lineage specific grouping system to avoid any confusion. The numbers indicate the WRKY TF members distributed in different groups (1, 2, etc. indicate WRKY1, WRKY2 etc.). The phylogenetic tree was constructed using MEG6 software and Poisson substitution model by using 1000 bootstrap replicates. expression of which was found to be higher in the leaf tissue, were GmWRKY6-4, GmWRKY15-1, GmWRKY15-2, GmWRKY26-3, GmWRKY41-1, GmWRKY41-2, GmWRKY41-3, and GmWRKY41-7 (Supplementary Table 2). The WRKY TFs, expression of which was not detected in the leaves were GmWRKY4-3, GmWRKY6-3,    Table 2). In flowers, a higher level of expression was detected in WRKY26-2 (67.456), WRKY26-3 (51.836), WRKY70-6 (61.053), and WRKY70-7 (63.153) whereas, that of GmWRKY10, GmWRKY13-4, GmWRKY29-1, GmWRKY50-2, GmWRKY67, GmWRKY70-4, GmWRKY72-2, and GmWRKY72-4 was not detected. The expression of GmWRKY44-2 (17.882), GmWRKY23-4 (10.417), GmWRKY11-5 (9.898), GmWRKY11-6 (9.725) and GmWRKY3-1 (9.665) was higher in pods. The GmWRKY72-6 was not detected in pods. In seeds, the expression of GmWRKY 21-2 (11.200), and GmWRKY21-3    Table 7. A P-value of less than 0.05 was used to reject the null hypothesis of equal rates between lineages. The analysis involved three amino acid sequences in each group. All positions containing gaps and missing data were eliminated. Evolutionary analyses were conducted using MEGA6.

Discussion
Identification and nomenclature of WRKY TFs. Advancements in genome sequencing technology and available of well annotated genome database led us to identify the WRKY TF gene family of 43 species. Predicting the potential function and activity of newly sequenced genes and their protein products in every organism is very difficult. The major cellular roles of newly identified genes/proteins can be inferred from previously characterized orthologous gene members of the same family. Large-scale comparative genomic studies can reveal important information regarding the function and evolutionary relationship of orthologous species 55 . The same principle can be applied at the gene family level as well (e.g. WRKY TF gene family). Therefore, we identified and analysed the WRKY TF gene family members from 43 different plant species. All identified WRKY TFs were assigned a specific name according to the orthology based nomenclature system [55][56][57][58] . Providing a unique name to every gene is necessary for its future identification. The role of a genome is insignificant unless a comparative genomics study is conducted.
Genomics of WRKY TFs. Availability of large-scale genomic data from various plant species allowed the detailed investigation of the WRKY TF gene family in plants. The WRKY TF gene family members vary across species likely because of gene duplication, whole genome duplication, ploidy, gene deletion or mutation. WRKY TFs are considered to be evolutionary conserved and supposed to be present only in plants 4,7,59 . However, the WRKY TF gene family was also found in amoeba, fungi and diplomonad species 54,60 . Dictyostelium purpureum, the amoeba that lives in soil belongs to the phylum mycetozoa. The genome of this species encodes nine WRKY TFs. The tetraploid monocot plant P. virgatum encodes the highest number (167) of WRKY TFs, whereas, the unicellular C. reinhardtii and C. subellipsoidea encode for the lowest number (only one) of WRKY TFs. In general it is a general assumption that, larger the genome size more will be the number of WRKY TFs in the genome; however, this concept is not true. Genome size is not directly related to the number of genes of a gene family in the genome . Therefore, the presence of a higher or lower number of genes in a gene family of a particular species can be attributed to its functional requirement and diverse cellular processes. Cai et al. 46 reported the presence of 120 WRKY TFs in Gossypium raimondii, which is similar to the number of WRKY TFs identified in our study 46 63 also reported the absence of any WRKY domain in OsWRKY94 63 .
In the present study, we identified several novel chimeric WRKY TFs from different plant species (Fig. 2) with varying numbers of WRKY domains and other novel domains fused with them ( Fig. 2A to P). These chimeric WRKY TFs might have evolved recently via fusion with other domains 64 . The kinase domain phosphorylates to its target protein. Thus, determining whether, these fused kinase domains play any crucial role in the auto-phosphorylation events in the WRKY TFs to which they are fused, and hence regulate gene expression. In some cases, the kinase domain is followed by a WRKY domain (Fig. 2E and F), whereas, in other cases the WRKY domain is followed by a kinase domain (Fig. 2G). The kinase domains of WRKY TFs most likely get phosphorylated by the cognate up-stream kinase, and regulate the expression of WRKY TFs 65 . The position of the kinase domain might be speculated to be very important in the regulation of WRKY TFs and the phosphorylation events in plants. In some other cases, the WRKY domain is fused with the toll-interleukin receptor (TIR) domain ( Fig. 2J and K), which mediates the interactions between the toll-like receptor and signal transduction components [66][67][68] . Plant proteins that harbor TIR motifs are associated with plant resistance to disease 67,69,70 . Therefore, the WRKY TFs that harbors the TIR motif might control disease resistance in plants. The leucine-rich repeat (LRR) motif also involved in plant resistance to diseases 69 , and the WRKY TFs that harbor both the TIR and LRR motifs might also remarkably contribute to plant disease resistance.
The diploid species, B. rapa encodes 145 WRKY TFs. Of them three encode novel chimeric WRKY TFs (Fig. 2). Among the three novel WRKY TFs, one is fused with the CBS domain (Fig. 2D), and the other two are fused with the kinase domain (Fig. 2G). The CBS domain is found in various other proteins, including adenosine monophosphate (AMP)-activated protein kinase. The CBS domain binds to AMP, adenosine triphosphate (ATP) or s-adenosylmethionine residues, and regulates the activity of associated enzymes 71 . Similarly, the tetraploid species G. max encodes 145 WRKY TFs. Among them, only one encodes a chimeric WRKY TF that is fused with the TIR domain ( Fig. 2-J). Plant proteins associated with a toll-like receptor mediate disease resistance in plant. The monocot species Panicum virgatum encodes two chimeric WRKY TFs; one chimeric WRKY TF is fused with a protease domain (Fig. 2I) and the other with the B3 domain. The B3 domain was previously reported to be a DNA-binding domain present in combination with auxin response factor (ARF); it has been found with the WRKY protein, abscisic acid insensitive 3 (ABI3), or related to ABI3/VP1 (RAV) like TFs. The results of this study showed that the B3 domain, which is present in combination with WRKY TFs might mediate auxin and abscisic acid signaling. The model monocot plant O. sativa encodes 101 WRKY TFs, of which one contains a chimeric WRKY TF, which is fused with the protease domain ( Fig. 2-I). Presence of an ULP protease domain in conjunction with the WRKY protein indicates that WRKY TFs plays a crucial role in the ubiquitination process of the SUMO protein. Linum usitatissimum and Brassica rapa encode chimeric WRKY TFs that contain squamosa promoter-binding proteins (ZF_SBP) domain. The SBPs are a major family of plant-specific TFs related to flower development 72 . The SBP zinc finger binds to the consensus sequence TNCGTACAA 73 . The presence of ZF_SBP domain along with WRKY TFs might increase the binding efficiency of WRKY TFs to other consensus sequences such as TNCGTACAA. In addition, the role of the SBP domain in flower development indicates that WRKY TFs with three WRKY domains and a ZF_SBP domain might regulate flower development in plants. The paired amphipathic helix (PAH) domain is found in the components of a co-repressor complex that silences the transcription process and plays a remarkable role in the transition between proliferation and differentiation 74 . The presence of a PAH domain along with a WRKY domain suggests its role in the translational co-repression of cellular proliferation and differentiation. The ATP_GRASP super-family genes regulate several metabolic pathways, including de novo purine biosynthesis, and the biosynthesis of fatty acids, peptidoglycan, glutathione, ribosome, arginine, pyrimidine, polyphosphate, lysine and dipeptide 75 . The fusion of WRKY TFs with the ATP_GRASP domain suggests that these WRKY TFs might be involved in diverse cellular process. All novel genomic rearrangements appear to have evolved recently. In addition, their abundance is very limited; they are present only in a fewer number of species. Once formed, these chimeric genes undergo positive selection when they combine with different components of signaling pathways. This might lead to the creation of a new and diverse signaling pathway, or accelerate the existing signaling process via short-circuiting signaling pathways.  Figure 2). Although the W-R-K-Y-G-Q-K heptapeptide sequence was highly conserved, sequence similarity beyond the domain was considerably low among most genes. Instead of harboring the W-R-K-Y domain, several WRKY TFs were found to contain W- (Fig. 3). These domains were exactly aligned with the W-R-K-Y domains and hence assumed to be newly evolved. Among these new domains, W-K-K-Y, W-T-K-Y, W-S-K-Y, W-H-K-C, W-Q-K-Y, W-R-K-C, W-K-K-C, W-H-Q-Y, R-S-Q-Y, G-R-K-Y, W-R-E-Y, W-L-K-Y, and W-R-K-R are present in the N-terminal region, whereas W-R-K-N, W-R-K-D, F-R-K-Y, W-I-K-Y, W-R-I-Y, W-W-K-N and W-W-K-S are present in the C-terminal region (Fig. 3). Therefore, the entire WRKY TF gene family which might result from long-time evolutionary history, represents divergent WRKY domains even in very closely related gene pairs. Characterization of these novel motifs might shed new insight into their functional significance.

Phylogeny and grouping of WRKY TFs. The WRKY TF gene family from various plant species, including
A. thaliana 4,76 , B. distachyon 14 , G. raimondii 46 , O. sativa 48 , S. lycopersicum 51 , T. aestivum 52 has been well elucidated. Surprisingly, when we combined the data from several published reports, none of them were found to be correlated with one another ( Table 8). The WRKY TF group members of different species vary and are not consistent (  48 , there is absence of a WRKY TF family member in group IIe (Table 8) 48 . Similar inconsistent grouping exists in other studies as well (Table 8). These inconsistencies might be attributed to the improper nomenclature of WRKY TFs, or improper citations of previously published manuscripts. Notable different sub-groups of a specific group are generally present within that group (e.g., if IIa, IIb, IIc, and IId, others are a sub-group of group II, they would be included itself). However, this concept of grouping was not followed correctly during the grouping of WRKY TFs. In the grouping system developed by Wen et al. 14 (Fig. 3 of Wen et al., 2012), sub-groups IIa and IIb are confined to a phylogenetically distinct group, sub-groups IId and IIe are confined to another phylogenetically distinct group, and sub-group IIc is confined to yet another phylogenetically distinct group. However, how sub-groups IIa and IIb, IId and IIe, and IIc can be sub-group members of group II if they are confined to phylogenetically distinct groups and are phylogenetically far away from other is not clear. Personal correspondence with Wen et al. 14 arrived at a certain conclusion regarding the discrepancies in nomenclature and grouping system for WRKY TFs. Hence, in this study we developed a unified grouping system for WRKY TFs in plants.
The inconsistencies in distribution of different WRKY TF family members within and between groups were overcome by developing an appropriate naming system for all WRKY TFs. In general, the sequences that are highly similar tend to fall into the same group as far as orthology-based similarity is concerned 55,56 . The orthology based nomenclature system of WRKY TFs has the potential to overcome this problem; therefore, we developed an unique nomenclature system to all WRKY TFs of 43 species 55,56,58 . In total, 3035 WRKY TF genes from the 43 species were identified and classified according to the unique naming system (Supplementary Table 1). The nomenclature is described in detail in the Materials and Methods section. Orthology also lends the legitimacy to common ancestry and evolutionary history of function. Therefore, the orthology-based nomenclature system can provide ideas regarding the possible function of specific genes in the plant species being investigated. This nomenclature system can also be extended to the newly identified gene family of other plant species. A proper grouping system of WRKY TFs was developed by first dividing the studied plant species into different groups. The groups were (I) WRKY TFs of monocot, dicot, and lower eukaryotic (algae, bryophytes, pteridophytes and gymnosperms) plants; (II) WRKY TFs of monocots with lower eukaryotic plants; and (III) WRKY TFs of dicots with lower eukaryotic plants. When phylogenetic trees were constructed by considering the WRKY TFs from monocot, dicot, and lower eukaryotic plants, eight groups were identified (Fig. 4, Table 2, Supplementary Figure 3). In the resultant phylogenetic tree, WRKY TF gene family members were not consistent with any specific group and overlapped in two or more groups. For example, WRKY TFs 3,5,7,8,10,11,13,16,17,19,22,23,24,25,26,28,29,33,34,36,43,45,48,49,50,51,56,57,58,59,67,68,71,72,75,77,84,102, 103, and 106 belonged to group A and 1, 2, 3,4,5,10,19,20,24,25,26,30,32,33,34,35,44,45,53,57,58,59,70,78,80,81,82,84,85,90,96, and 105 belonged to group II (Fig. 4, Table 2, Supplementary Figure 3). The WRKY TF members 3,5,19,24,25,26,33,34,45,57,58,59, and 84 were distributed in both the groups (group I and II). Similar trends were observed in other WRKY groups as well. Therefore, the grouping of WRKY TFs based on a combined study of monocots, dicots and lower eukaryotic plants did not prove to be suitable. When the phylogenetic tree was constructed by considering the WRKY TF gene family members of monocot and lower eukaryotic plants, six phylogenetically distinct groups were formed; they were named as groups I (red), II (lime), III (green), IV (blue), V (pink) and VI (green) (Fig. 5, Table 3). The WRKY TF gene family members of monocot and lower eukaryotic plants were very specific to their concerned group. In this case, no single WRKY TF member of one specific group overlapped with another group. When the phylogenetic tree constructed by considering WRKY TF gene family members of dicot and lower eukaryotic plants, three different groups were generated where group II contained three sub-groups (Fig. 6, Table 4). We named the groups as I (pink), IIa (red), IIb (lime), IIc (blue), and III (green). We found that the WRKY TF members of groups I and III were very specific to their respective group and did not overlap with one another (Table 4). These results clearly showed that the WRKY TF grouping system is very specific to the lineages (monocot/dicot). The WRKY TF grouping system of monocot and dicot plants differs remarkably; this might be one of the most important reasons why co-linearity was absent in the grouping system of WRKY TF gene family members (Table 8). Therefore, in this study, we proposed that WRKY TF grouping should be specific to monocot or dicot plant lineages. The monocot-specific WRKY TFs can be grouped into six groups (groups I, II, III, IV, V, and VI) whereas dicot-specific WRKY TFs can be grouped into three groups (groups I, IIa, IIb, IIc and III). The phylogenetic tree of monocot and dicot plants varied markedly. This might be due to fact that monocot plant lineage is comparatively more conserved than dicot lineage owing to early ploidy and whole genome duplication 77,78 . Therefore, monocot and dicot plants should be grouped according to the grouping system of monocot plants and dicot plants, respectively.
We conducted another analysis by dividing WRKY TFs into single WRKY domain-containing (C-terminal) and double WRKY domain-containing (N-and C-terminal) groups. The phylogenetic analysis in the single WRKY domain group resulted in six phylogenetically distinct groups, whereas the double WRKY domain group resulted in seven phylogenetically distinct groups (Tables 5 and 6). The WRKY TF members of domain specific studies were not confined to any specific group and the group members were overlapped with each other. Although single and double WRKY domain-containing TFs resulted into six and seven phylogenetically independent groups, respectively; only group II of previously studies could be sub-grouped into IIa, IIb, IIc, IId and IIe is not clear. However, the permutation and combination study showed that WRKY TFs could be grouped as monocot and dicot lineage-specific. The WRKY TFs of monocot plants can be grouped into six groups, and dicot plants can be grouped into three groups. Earlier reported grouping systems such as groups I, II (IIa, IIb and IIc) and III can be applied to dicot plants, but it is ensuring that WRKY TF group members are confined to their specific groups is important.
The substitution rate of monocot and lower eukaryotic WRKY was slightly higher than that of dicot and lower eukaryotic WRKY proteins. No considerable difference was observed in the substitution and evolutionary rate of WRKY proteins with a single or double domain. This explains why WRKY proteins are highly conserved across the plant lineage. The phylogenetic analysis of all plant species showed that all WRKY TFs were present in monocot, dicot and lower eukaryotes, indicating that the appearance of most WRKY TFs in plants predates the divergence of these species. No species-specific group, or sub-group or clades were observed in the phylogenetic tree. This implies that the WRKY TF gene family was more conserved during evolution. In addition, the WRKY domains from the same lineage tended to cluster together in the phylogenetic tree, which was not observed in this study. This suggests that they experienced duplication after divergence. The WRKY TFs that clustered together are orthologous ones that are evolutionarily closer than others. The phylogenetic similarity found in this study 1, 2, 3, 4, 10, 20, 25, 26, 32, 33, 34, 44, 45, 58 18, 40, 60 6, 9, 31, 36, 42, 47, 61 8, 12, 13, 23, 24, 28, 43, 48, 49, 50, 51, 56, 57, 59 7, 11, 15, 17, 21, 39 14, 16, 22, 27, 29, 35 30, 41, 46, 53, 54, 55 4 Ia NTWD Ia CTWD Ib 25, 66, 81, 82 2, 4, 6, 8, 30, 46, 59, 63  showed that WRKY TFs evolved conservatively. Only few WRKY TFs were found in lower eukaryotes, including C. reinhardtii, C. subellipsoidea, M. pusilla, and V. carteri whereas higher plants possessed a larger number of WRKY TF genes. This indicated that the earliest evolutionary origin of the gene containing the WRKY TF was from unicellular green algae. This suggested that WRKY proteins evolved before plants transitioned from an aquatic to a terrestrial habitat. With the continuous evolution of species, land plants have evolved a series of highly sophisticated signaling mechanisms that helped them to adapt to the ever changing environmental conditions, and hence, the number of WRKY TFs increased in different species. Presence of the WRKY TF gene in diplomonands, amoebozoa, and fungi sheds new light on the early evolution of WRKY genes. Understanding the evolution of the WRKY TFs in plant lineage is very challenging. If the concept of early evolution is considered, in green algae, a BED finger-like C2H2 zinc finger domain incorporated a WRKY domain N-terminal to the zinc finger. This single-domain WRKY TFs served as the progenitor for all other WRKY genes 54  Gene duplication and evolution. Evolution by gene duplication is one of the most important processes responsible for the supply of raw genetic material to an organism for its biological evolution 79 . Duplication can occur via recombination, aneuploidy, retro-transposition or whole genome duplication. A. thaliana encodes about 16,574 (65%) duplicated genes among its total of 25498 genes 79,80 . In the present study, we found several duplicated WRKY TFs (Supplementary Table 1). Most duplicated WRKY TF genes are present as paralogous genes 79 . More specifically, gene duplication analysis of some novel WRKY TFs (Fig. 2, Table 9), performed using Pinda (pipeline for intraspecies duplication analysis) server revealed that most of the WRKY TFs are duplicated. Some of the novel WRKY TFs, such as SbWRKY59, PvWRKY94-1 and SiWRKY59-2, were found to be nonduplicated. The Z-score values of these non-duplicated WRKY TFs ranged from 1.11 to 1.78. A z-score value of less than four indicates a non-duplicated gene 81 .
Statistical analysis. Tajima's relative rate test, the simplest test that can be applied to test the molecular evolutionary clock, can be applied to both nucleotide and amino acid sequences. This method yields results as the Chi-square test, and can even be applied when the pattern of substitution is unknown or the substitution rate varies across sites 82 . In Tajima's relative rate test of WRKY TFs, the p-value and Chi-square test were found to be significant (Table 7).
Gene expression profile of WRKY TFs. Understanding the tissue-specific expression of genes can lead to elucidation of the molecular mechanisms and the role of the genes in tissue development and function. Understanding the genes, how they expressed and were regulated in different tissues is a challenging and fundamental question. Therefore, we investigated the tissue-specific expression of WRKY TFs of G. max and P. vulgaris (Supplementary Table 2). In G. max, expression analysis was conducted in the roots, root hairs, leaves, stems, flowers, pods, seeds, nodules and shoot apical meristem tissue. Of the total of 145 G. max WRKY TFs, 143 were found to be expressed in either of the mentioned tissues. Expressions of GmWRKY65-1 (105.342), GmWRKY6-4 (74.668), and GmWRKY6-5 (43.341) were found to be significantly higher than those of others in the roots, suggesting their important role in root development. Expression of 24 GmWRKY was not detected in root tissue (Supplementary Table 2), indicating that these genes might not play any active role in root development. The expression level of GmWRKY65-1 (35.199) was found to be the highest in root hair, suggesting its active role in the development of root hair. Expression levels of at least eight genes were not detected in root hairs. The expression levels of GmWRKY6-4 (51.394), GmWRKY6-5 (81.847), GmWRKY26-2 (80.957), GmWRKY26-3 (72.911), and GmWRKY41-3 (72.788) were significantly higher in the leaf tissues than in any other tissues, suggests that these genes might play crucial roles in leaf development. Expression levels of at least 24 genes were not detected in leaf tissues. In stems, the expression levels of GmWRKY21-3 (47.276), GmWRKY11-6 (24.872), and GmWRKY15-2 (24.886) were found to be significantly higher than that of other genes, suggesting their role in stem development. Expression levels of at least 15 genes were not detected in the stem tissue. In flowers, the expression levels of GmWRKY26-2 (67.456), GmWRKY26-3 (51.836), GmWRKY70-6 (61.053) and GmWRKY70-7 (63.153) were found to be significantly higher than those of other genes, suggesting that these genes might plays an important role in flower development. Expression levels of at least eight genes were not detected in flower tissue (Supplementary Table 2). In pods, the expression level of GmWRKY44-2 (17.882) was found to be significantly higher than that of other genes, suggesting its important role in pod development. The expression levels of at least 19 genes were not detected in pod. In seeds, the expression level of GmWRKY21-3 (31.762) was found to be significantly higher than that of other genes, suggesting its important role in seed development. In nodules, the expression level of GmWRKY65-1 (39.186) was significantly higher than that of other genes, suggesting its important role in nodule development. The expression level of GmWRKY65-1 was higher in root and root hairs as well. Thus, GmWRKY65-1 might play a crucial role in root, root hair, and nodule development. In the shoot   Table 9. Gene duplication analysis of novel WRKY TFs identified during this study. The result showed that SbWRKY29, PvWRKY94-1 and SiWRKY59-2 are non duplicated WRKY TFs. A z-score value above four is considered duplicated, whereas a value below four was considered nonduplicated. The duplication analysis was performed as described in Pinda (pipeline for intraspecies duplication analysis) 81 .

Conclusion
Analysis of the WRKY TF gene family across the plant lineage revealed the presence of novel WRKY TFs. The monocot or dicot lineage specific grouping and orthologous-based nomenclature system of WRKY TFs might be crucial in future studies. Expression analysis showed that WRKY11-1, WRKY11-2, and WRKY11-3 were highly expressed in all tissue types in G. max and P. vulgaris. Similarly, WRKY15-2 was found to be highly expressed in the stems, roots, nodules and pods in G. max and P. vulgaris, suggesting its important role in the development of these tissues. Understanding the functional role of novel WRKY TFs will help to understand their functional and evolutionary roles.

Material and Methods
Identification of WRKY TFs. WRKY  and BLASTP program was used as well to search the WRKY TFs of the investigated plant species by using the default parameters of the phytozome database. The sequences generated by BLASTP searches were collected for further analysis to confirm whether they were WRKY TFs. All the collected sequences were then analysed using the scanprosite and MEME software to confirm the presence of WRKY domains 86,87 . Default parameters were used in the scanprosite software to identify the WRKY domains. The identified sequences that contained the WRKY domain were retained for further validation which was accomplished by subjecting the sequences to BLASTP analysis in the TAIR and rice genome annotation project database using the default parameters. Further, all the sequences were analysed using HMMER web server to identify the interactive sequence similarities 88 . Sequences that resulted in BLASTP hits with WRKY TFs in the TAIR or rice genome annotation database were confirmed as WRKY TFs.
Nomenclature of WRKY TFs. All identified WRKY TFs were assigned a specific name. Nomenclature of the WRKY TFs was assigned according to an orthology-based nomenclature system proposed by different researchers 55,56,89 . In the nomenclature system, names were assigned by considering the first letter of the genus in upper case and the first letter of the species in lower case followed by the WRKY and orthology-based number of A. thaliana or O. sativa. When redundancies were found in the nomenclature system, 2 to 4 letters of the species name were considered for the nomenclature. When more than one orthologous gene was found, they were considered as paralogous genes which were numbered by including a hyphen. For example, if there are two OsWRKY46 in O. sativa, they would be named OsWRKY46-1 and OsWRKY46-2.
Multiple sequence alignment. The multiple sequence alignment of WRKY TFs was conducted using the Multalin software (http://multalin.toulouse.inra.fr/multalin/) with default parameters which were as follows: protein weight matrix, Blossum62-12-12; gap penalties at opening, default; gap penalty at extension, default; gap penalty at extremities, none; one iteration only, any; high consensus value, 90% (default); and low consensus value, 50% (default). The multiple sequence alignment of proteins containing single and double WRKY domain was conducted separately by using the same parameters.
Construction of phylogenetic tree. Unrooted phylogenetic trees were constructed to understand the closeness and evolutionary relatedness of WRKY TFs in plants. We constructed different phylogenetic trees by grouping the WRKY TFs into different groups. Groupings included (1) monocot, dicot and lower eukaryotic plants, (2) monocot and lower eukaryotes, (3) dicot and lower eukaryotes (4) single WRKY domain (C-terminal WRKY domain)-containing WRKY TFs and (5) double WRKY domain (N-and C-terminal WRKY domain)-containing WRKY TFs. To construct the phylogenetic trees, we created clustal files for each group using the clustalW or clustal omega program 90,91 . The generated clustal files were converted to the MEGA file format, after which the MEGA files were run in MEGA6 software to construct the phylogenetic tree 92 . Different statistical parameters used to construct the phylogenetic trees included the following: analysis, phylogeny reconstruction; statistical method, maximum likelihood; test of phylogeny, bootstrap method; number of bootstrap replicates, 1000; substitution type, amino acids; model/method, Poisson model; rates among sites, uniform rates; gap/missing data treatment, partial deletion/use all sites; site coverage, 95%; ML heuristic method, nearest-neighbor-interchange (NNI); and branch swap filter, very strong.
Statistical analysis. Different statistical analyses were performed to understand the evolutionary aspects of WRKY TFs using the MEGA6 program 92 . The MEGA files of all five groups that were used in the construction of the phylogenetic tree were subjected to the MEGA6 program for statistical analysis. Tajima's relative rate test was conducted to evaluate the statistical significance of WRKY TFs to understand whether there were significant variations in molecular evolution. In this test, sequences 1, 2, and 3 were considered simultaneously where sequence 3 was considered as an out group. If n ijk was the observed number of sites in which sequences 1, 2 and 3 have protein/nucleotides I, j, and k. under the molecular clock hypothesis, E(n ijk ) = E(n jik ) irrespective of the substitution model used and whether the substitution rate varied across the sites. If the hypothesis is rejected, then the molecular clock hypothesis of evolution can be rejected for the given set of sequences 1, 2 and 3. The statistical parameters used to perform Tajima's relative rate test were as follows; analysis, Tajima's relative rate test; scope, for 3 chosen sequences; substitution type, amino acids; and gaps/missing data treatment, complete deletion.
All the data used in this study were obtained from publicly available database (https://phytozome.jgi.doe.gov/ pz/portal.html, http://congenie.org/start) available in the public domain.
Gene expression data. The expression data of G. max and P. vulgaris were downloaded from the phytomine database (https://phytozome.jgi.doe.gov/phytomine/template.do?name= One_Gene_Expression&scope= global) of phytozome. Locus ID of G. max and P. vulgaris were used for to searching the expression data in different tissue samples.