Introduction

The complexity of plant cell organization is directly related to the intricate interconnections between genes and the regulatory networks inside the cell. This observation is further substantiated by studies on the vast genomic and transcriptomic sequence information available in the public domain1. The inter-cellular biological circuit in higher plants is governed at several discrete levels, one of which is regulated by a specific group of DNA-binding proteins, the WRKYs2, which are among the ten largest families of transcription factors (TFs) in higher plants. Since the first report, in 1994, of SPF1 from Ipomoea batatas3, WRKYs have been described from dicots4, monocots5 and orchids6 through to a unicellular eukaryote (Giardia lamblia) and a slime mold (Dictyostelium discoideum), revealing their evolutionary significance and complex organization7,8. The WRKY domain, a conserved sequence of about 60 amino acids, is most commonly identified by its heptapeptide signature (WRKYGQK) and binds the W-box in the promoter of target gene(s), modulating their activity9. The large WRKY superfamily is phylogenetically classified into three groups (I, II and III) based on the number of WRKY domains and the type of zinc-finger sequence at the C-terminal. Group I WRKY proteins are characterized by two WRKY domains and C2H2 zinc-finger motifs, while group II and III proteins contain a single WRKY domain. The zinc-finger motifs of groups II and III follow the C2H2 and C2HC patterns, respectively10. Studies have shown that WRKY-binding motifs (W-boxes) occur in multiple copies in WRKY-responsive gene promoters11. In Arabidopsis, the promoters of 83% of the 72 WRKY genes contain at least two perfect W-boxes (TTGACC/T), and 58% have four or more copies of the core element (TTGAC)11. Some WRKY promoters carried 11 to 12 core elements (AtWRKY66, AtWRKY17) in the promoter fragment analysed by Dong et al.12.
Interestingly, W-boxes are also present in the promoter regions of WRKY genes themselves, suggesting potentially strong transcriptional networking between WRKY proteins11. Co-transfection assays have shown that WRKY proteins act on the promoters of their own genes and of other WRKY genes, thereby modulating reporter gene expression13. In vitro DNA-protein binding assays have also shown that a single WRKY can bind several target gene promoters, as illustrated by WRKY53 binding to three different WRKY genes, confirming a complex interactive regulatory network. Microarray experiments on the Arabidopsis genome showed that more than 70% (45 out of 61) of the WRKY genes are co-regulated with other WRKYs14 and transcription factors12. The biological roles of WRKYs are being studied in several plants15. They regulate several target genes in response to stress16, including metal stress17, and during development18 and secondary metabolite biosynthesis1. WRKYs have a regulatory role in the pathogen-induced response12, resulting in the concerted activation of a variety of genes. WRKY TFs rapidly and transiently regulate gene induction in response to signalling molecules19, wounding and stress, and in physiological processes such as flowering20, seed germination and development21, and senescence4. Expressed Sequence Tags (ESTs) and other plant databases have revealed several hundred WRKYs in various tissues under different physiological conditions, stress18, cold22, stomatal movement23 and defense24,25, implying a predominant role in varied biological functions. Under normal growth conditions too, WRKY proteins have a broad-spectrum regulatory role, as reported for morphogenesis and development of trichomes26, embryo development18, senescence13, dormancy27, plant growth28, immunity29, systemic acquired resistance and metabolic pathways30.

Two decades of studies on WRKY TFs have yielded more than 14500 WRKY genes from 165 plant species31, most of them from eudicots (100 species), followed by monocots (38 species) and chlorophytes (16 species)31. Twelve legume species contributed 1094 WRKY genes32. No report on WRKY transcription factors has been published from Glycyrrhiza species, although transcriptome, genome and EST databases for G. uralensis are available in the public domain.

Glycyrrhiza belongs to the sub-family Faboideae of the family Fabaceae (Leguminosae). The underground roots (licorice) of the genus (G. uralensis, G. glabra and G. echinata) are commercially valued for their pharmaceutical, flavour-enhancing, natural sweetening and cosmeceutical properties33. The roots are rich in bioactive flavonoids and triterpenoid saponins, including glycyrrhizin34. Glycyrrhizin is sought after pharmaceutically for its multitude of bioactivities33. The global demand for Glycyrrhiza roots is evident from a Transparency Market report (ALBANY, New York, April 4, 2017 /PRNewswire), which projected a compound annual growth rate of 5.7% during 2017–2025, reaching USD 2,393.9 million by 2025.

The present research reports the transcriptome-wide identification and characterization of 147 WRKY TFs from the genus Glycyrrhiza. We analysed 87 WRKY genes from G. glabra and 60 from G. uralensis, and categorized them into structural groups based on conserved motif composition. We also predicted the functions of G. glabra WRKY members using the STRING prediction algorithm. Their expression profiles were subsequently investigated under various stress conditions in the aerial tissues of in vitro cultured plants. We also characterized the promoters (between 0.5 kb and 4.1 kb) of 31 of the 87 GgWRKY genes (from the transcriptomic data) to gain insight into their functioning and their regulation of secondary metabolites.

Results and Discussion

Transcriptome-wide analysis and characterization of Glycyrrhiza WRKY TF

We sequenced the transcriptome of G. glabra and mined the data for WRKY transcription factors. Among the 125 sequences that matched WRKY genes in BLAST and PF03106 HMM profile searches, 87 GgWRKYs had a complete CDS and 38 gene sequences were partial (Table 1). All of these were revalidated using UniProt (https://www.uniprot.org/), which gave best hits for 78 sequences, while 47 sequences were unique. Of these, 55 (UniProt hits) and 32 (unique) sequences were full length, and 23 (UniProt hits) and 15 (unique) were partial (Table 1). Further, we used the publicly available G. uralensis transcriptome data as a reference source (http://ngs-data-archive.psc.riken.jp/Gur-genome/download.pl.) to retrieve WRKY transcription factors using BLAST and PF03106 HMM profile searches, and identified 60 WRKY genes from G. uralensis. Subsequently, all the full-length protein sequences (147) were re-examined for the presence of WRKY domains using the conserved domain database (https://www.ncbi.nlm.nih.gov/cdd/) and HMMScan (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan). The GgWRKY sequences were submitted to NCBI, and their accession numbers are given in Supplementary File 1. The identified GuWRKY protein sequences were included only in the sequence alignment and phylogenetic studies (Supplementary File 2). The detailed GgWRKY protein sequence features are listed in Table 2. The deduced GgWRKY proteins ranged from 112 (GgWRKY67) to 760 (GgWRKY12) amino acid residues. The coding sequences of the 87 full-length GgWRKYs ranged from 339 bp (GgWRKY67) to 2283 bp (GgWRKY12), and their molecular weights (MW) varied from 13291.91 Da (GgWRKY67) to 82181.16 Da (GgWRKY12) (Table 3). Of the 87 GgWRKYs, 44 were acidic, one (GgWRKY55) was neutral with a pI of 7.0, and the remaining 42 were basic. According to the instability index, a protein with an index value higher than 40.0 is considered unstable35.
In the present study most of the GgWRKYs were predicted to be unstable, with a maximum instability index of 68.68 (GgWRKY34). The ten exceptions were GgWRKY10 (30.20), GgWRKY16 (38.82), GgWRKY48 (39.86), GgWRKY50 (33.92), GgWRKY60 (39.40), GgWRKY73 (33.70), GgWRKY80 (39.40), GgWRKY83 (37.16), GgWRKY84 (32) and GgWRKY86 (35.94) (Table 3). Additionally, WoLFPSORT predicted that 81 GgWRKY proteins localise to the nucleus, suggesting that they play their regulatory role predominantly there, while four GgWRKYs (-23, -32, -75, -84) were predicted to localise to the chloroplast. GgWRKY73 was predicted to be mitochondrial and GgWRKY86 cytoplasmic (Table 3). Further, five GgWRKY members (GgWRKYs -10, -33, -67, -68, -87) had a WRKYGKK motif instead of the common WRKYGQK (Table 2). Earlier studies have also reported the replacement of Q by K as a common variant. Rice WRKYs show 19 variants, in which the characteristic WRKY is substituted by WRRY, WSKY, WKKY, WVKY or WKKY motifs5.
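The length and stability figures above can be cross-checked with a few lines of Python. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, the length rule assumes a complete CDS whose last codon is a stop codon, and the thresholds are simply the ones quoted in the text (instability index above 40.0, pI of 7.0 as neutral).

```python
def protein_length_from_cds(cds_bp: int) -> int:
    """Residues encoded by a complete CDS (assumes the final codon is a stop)."""
    if cds_bp % 3 != 0:
        raise ValueError("a complete CDS is expected to be a multiple of 3 bp")
    return cds_bp // 3 - 1  # drop the stop codon


def is_unstable(instability_index: float, threshold: float = 40.0) -> bool:
    """Apply the cut-off quoted in the text: index above 40.0 means unstable."""
    return instability_index > threshold


def classify_pi(pi: float) -> str:
    """Label a protein by theoretical pI, with 7.0 treated as neutral."""
    if pi < 7.0:
        return "acidic"
    if pi > 7.0:
        return "basic"
    return "neutral"


if __name__ == "__main__":
    # Values taken from the text: GgWRKY67 (339 bp) and GgWRKY12 (2283 bp).
    print(protein_length_from_cds(339), protein_length_from_cds(2283))
    print(is_unstable(68.68), is_unstable(30.20))
```

With the reported CDS lengths, the rule reproduces the reported protein lengths of 112 and 760 residues.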

Table 1 Sequence information of WRKY genes in G. glabra.
Table 2 Sequence features of WRKY genes in G. glabra.
Table 3 Physical parameters of GgWRKY genes.

Conserved domain in Glycyrrhiza WRKY members

Generally, similar domains in a protein impart similar functions. Transcription factor gene families share a conserved domain involved in DNA binding. All 147 WRKYs (G. glabra and G. uralensis) had the distinctive heptapeptide DNA-binding sequence (WRKYG[Q/K]K), the identifying character of the WRKY family. In the present study, 28 WRKYs showed an additional motif besides the WRKY domain (Figs. 1 and S1). GgWRKY55 possessed a Coat family motif (30 amino acid residues) at the N-terminal, while GgWRKY20 and GgWRKY21 had a DivIVA super-family motif (63 amino acid residues) and a SerS superfamily motif (61 amino acid residues), respectively, at the C-terminal. In G. uralensis, on the other hand, GuWRKY9 had an Exo70 exocyst complex subunit (338 amino acids) and GuWRKY26 a Flac-arch super motif at the N-terminal, while GuWRKY23 had PAT1 and GuWRKY60 had SGNH_hydrolase at the C-terminal. Two motifs were common to both species: the Plant Zinc Cluster (26–40 amino acids) in 16 WRKYs and the bZIP domain (42 amino acids) in 2 WRKYs. The Plant Zn cluster super-family domain was present in nine GgWRKYs (GgWRKYs 63, 64, 66, 69, 70, 71, 72, 73 and 86) and seven GuWRKYs (GuWRKYs 1, 13, 18, 29, 35, 49 and 56). All the domains reported together here in the two Glycyrrhiza species had been reported individually in different plants in earlier studies. We have not come across any report of DivIVA, SerS superfamily, PAT1, Exo70 exocyst complex subunit, SGNH_hydrolase, Flac-arch super or Coat family motifs in plant WRKY proteins. However, bZIP and plant zinc cluster domains have been reported earlier from A. thaliana36. Common and unique motifs within the GgWRKY sequences, identified using the MEME platform (http://meme.nbcr.net/meme/cgi-bin/meme.cgi)37, are shown in Fig. 2.
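The heptapeptide signature WRKYG[Q/K]K described above lends itself to a simple pattern search. The sketch below is illustrative only; the function name and the toy sequences are hypothetical and are not GgWRKY or GuWRKY sequences. It reports the 1-based position and the variant found, so the common WRKYGQK form can be told apart from the WRKYGKK variant.

```python
import re
from typing import List, Tuple

# Heptapeptide signature as written in the text: WRKYG[Q/K]K
WRKY_SIGNATURE = re.compile(r"WRKYG[QK]K")


def find_wrky_signatures(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched heptapeptide) for each hit."""
    return [(m.start() + 1, m.group())
            for m in WRKY_SIGNATURE.finditer(protein.upper())]


if __name__ == "__main__":
    toy_common = "MADGYNWRKYGQKVVKGS"   # hypothetical fragment with WRKYGQK
    toy_variant = "AAWRKYGKKAA"          # hypothetical fragment with WRKYGKK
    print(find_wrky_signatures(toy_common))
    print(find_wrky_signatures(toy_variant))
```

Counting the hits per protein also gives a first hint of the number of WRKY domains, which is one of the criteria used for group assignment.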

Figure 1

Classification of full-length GgWRKY amino acid sequences with different conserved domains (DivIVA, SerS, bZIP, Coat & Plant Zn cluster, WRKY). The conserved domains were investigated by CDD; * are exceptions in the classified groups and sub-groups in the phylogeny.

Figure 2

(a) Classification of 82 GgWRKY proteins. Conserved regions of the GgWRKYs were used to construct the NJ phylogenetic tree with 1000 bootstrap replicates. (b) Architecture of 15 conserved protein motifs in GgWRKYs. Each motif (Motif 1–15) is shown in a different color. The conserved motifs were predicted with the MEME program.

Phylogeny

The relatedness among the 136 Glycyrrhiza WRKY proteins and the 109 WRKYs identified from Arabidopsis thaliana, Physcomitrella patens, Human FLYWCH CRAa and GCMa was investigated (Fig. 3) and tabulated in Table 4. The phylogeny of the 136 WRKY proteins from the genus Glycyrrhiza placed 22 WRKYs (17 GgWRKYs and 5 GuWRKYs) in group I, 98 WRKYs (61 GgWRKYs and 37 GuWRKYs) in group II and 16 WRKYs (4 GgWRKYs and 12 GuWRKYs) in group III. Group II was further divided into five sub-groups, IIa (11), IIb (17), IIc (16 + 8), IId (17) and IIe (15), following the WRKY classification rules adopted in Arabidopsis9, and an additional novel sub-group IIf (14). A few exceptions were observed among the WRKY members identified in the genus Glycyrrhiza. GuWRKY27 possessed three WRKY domains (N1, N2 and C). A few recent publications have also reported more than two WRKY domains in Gossypium raimondii38, Linum usitatissimum, Lupinus angustifolius, Aquilegia coerulea and Setaria italica32. Phylogenetic analysis of those proteins, however, placed their domains in different subgroups. For example, in G. raimondii WRKY108 the three domains (WRKY108N1, WRKY108N2 and WRKY108C) clustered into the IIc, III and IId sub-groups, respectively. In the present study, by contrast, all three WRKY domains (N1, N2 and C) of GuWRKY27 clustered into group III and had a Zn finger pattern similar to that of group III. This implies that the GuWRKY27 domains are highly homologous to group III WRKY proteins, unlike the earlier published reports. Another exception was GuWRKY20, which was classified into group I based on the number of WRKY domains (2) but clustered into group III in the phylogenetic classification. MSA revealed that both of its WRKY domains had a Zn finger pattern similar to group III (C-X7-C-X23-HXC).
The third exception was GuWRKY3, whose Zn finger pattern was unlike that of any existing subgroup of group II. It could be the starting point for the evolution of a new subgroup in group II.
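Zinc finger patterns written in the C-X4-C-X22-HXH style can be translated mechanically into regular expressions, which makes the kind of comparison described above easy to reproduce. The following is a minimal sketch under stated assumptions: the four patterns are the ones quoted in this paper, X is taken to mean any residue, and the function names and toy sequences are hypothetical.

```python
import re
from typing import Dict, List

# Patterns quoted in the text; the list is not claimed to be exhaustive.
ZINC_FINGER_PATTERNS = [
    "C-X4-C-X22-HXH",
    "C-X4-C-X23-HXH",
    "C-X5-C-X23-HNH",
    "C-X7-C-X23-HXC",
]


def pattern_to_regex(pattern: str) -> str:
    """Convert e.g. 'C-X4-C-X22-HXH' into 'C.{4}C.{22}H.H'."""
    parts = []
    for token in pattern.split("-"):
        spacer = re.fullmatch(r"X(\d+)", token)
        if spacer:
            parts.append(".{%s}" % spacer.group(1))  # run of any residues
        else:
            parts.append(token.replace("X", "."))    # single any-residue slots
    return "".join(parts)


COMPILED: Dict[str, "re.Pattern"] = {
    p: re.compile(pattern_to_regex(p)) for p in ZINC_FINGER_PATTERNS
}


def matching_zinc_fingers(domain: str) -> List[str]:
    """Return every listed pattern found anywhere in the domain sequence."""
    seq = domain.upper()
    return [p for p, rx in COMPILED.items() if rx.search(seq)]


if __name__ == "__main__":
    toy = "C" + "A" * 4 + "C" + "A" * 22 + "HAH"  # hypothetical sequence
    print(matching_zinc_fingers(toy))
```

A domain that matches none of the listed patterns, as reported for GuWRKY3, would return an empty list and could then be inspected by hand.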

Figure 3

Neighbour-joining phylogenetic tree (JTT model) comprising WRKY domains from 82 Glycyrrhiza glabra (maroon), 54 Glycyrrhiza uralensis (cyan blue), 70 Arabidopsis thaliana (dark green) and 37 Physcomitrella patens (violet) proteins, together with GCMa (blue) and FLYWCH CRAa (red). The suffixes ‘N’ and ‘C’ indicate the N-terminal and C-terminal 60-amino-acid WRKY domains of Group I.

Table 4 Phylogenetic classification of WRKY domains identified from G. glabra, A. thaliana, P. patens and G. uralensis WRKY proteins.

Among the 82 GgWRKY proteins in the phylogenetic tree, Group IN (17 members) clustered with ten GgWRKYs (GgWRKYs -59, -14, -17, -28, -50, -42, -6, -57, -74, -78) belonging to group II that have a unique Zn finger pattern (C-X4-C-X22-HXH) not reported earlier in this group. Based on these findings we propose a new subgroup, IIf, which could mark the initial divergence of a new sub-group that retains the characteristic WRKY domain.

The phylogenetic analysis of the 60 amino acid region of the Glycyrrhiza WRKY proteins indicated their diverse origin. The N-terminal and C-terminal domains of group I WRKY proteins fell into different clades, indicating dissimilar backgrounds. Further, the majority of the subgroup IIc proteins (8 proteins) clustered with group IC, indicating a common origin. Contrary to our results, Zhu et al.39 found that the subgroup IIc WRKY domain in Triticum aestivum originated from the N-terminal WRKY domain of group I. However, a recent study on legumes revealed that IIc sub-groups have multiple origins32. The present study also showed sub-groups IIa and IIb clustering together, while sub-groups IId and IIe clustered with group III, signifying close relationships within the respective clusters. Previously, Zhang & Wang8 proposed evolutionary relationships, based on a phylogenetic tree, that classified the WRKY gene family into four clades: groups I + IIc, groups IIa + IIb, group IId, and group IIe. According to the hypotheses of Rinerson et al.40, however, WRKY protein evolution may have followed one of two alternative paths: the “Group I Hypothesis”, which proposes that all WRKY proteins evolved from the C-terminal WRKY domains of group I proteins, and the “IIa + b Separate Hypothesis”, which suggests that groups IIa and IIb evolved directly from a single-domain algal gene, separately from the group I-derived lineage. It is hard to explain the origin of the WRKY gene family on the basis of any one hypothesis, as a mounting number of studies have demonstrated multiple origins. In our phylogenetic analyses, phylogenetic clusters were a mix of WRKY genes from at least two different groups or sub-groups, indicating their dynamic nature.

Further, the present study identified eleven WRKY proteins (GgWRKY-83 to -87 and GuWRKY-15, -21, -30, -41, -45 and -52), not included in the phylogenetic analyses, that possessed a WRKY domain but had a truncated zinc finger motif. Earlier studies on Vitis vinifera41 and rice42 had also shown loss of Zn finger motifs in WRKY proteins. The phylogenetic clustering was further examined at the sequence level by multiple sequence alignment (MSA).

Multiple sequence alignment of the identified WRKY proteins

The multiple sequence alignment of the 60 amino acid conserved region of all 87 GgWRKY proteins grouped them into 9 different groups and sub-groups with very high homology (>70%), as shown in Fig. 4. Group IN displayed conserved motif 1 (DG[Y/F]NWRKYGQK[L/Q/H]VK) and a zinc finger pattern of C-X4-C-X22-HXH that was conserved across 27 GgWRKY proteins: 17 belonged to Group IN and 10 (GgWRKYs -59, -14, -17, -28, -50, -42, -6, -57, -74, -78) to the new sub-group IIf, whose Zn finger motif (C-X4-C-X22-HXH) resembled that of group IN rather than those of other group II members. Group IC comprised 17 GgWRKYs belonging to group IC and 8 GgWRKYs from group IIc (GgWRKYs -11, -13, -48, -60, -80, -49, -23, -58). All 25 GgWRKY members in Group IC displayed conserved motif 2 (DG[Y/F]RWRKYGQK), a zinc finger pattern of C-X4-C-X23-HXH and high identity (70.4%), as shown in Fig. 4. The third group, IIa, had nine GgWRKY proteins (GgWRKYs -40, -37, -62, -55, -56, -61, -52, -53, -54) displaying motif 3 (DGYQWRKYGQKVT[R/K]DN) and a zinc finger pattern of C-X5-C-X23-HNH with 86.2% identity, except GgWRKY62, which had a C-X5-C-X13-HN Zn finger pattern. Group IIb had five GgWRKYs (GgWRKYs -20, -21, -44, -38, -39) with three conserved motifs, motif 4 (WRKYG[Q/K]K), motif 5 (PRAYYRC) and motif 6 (CPVRKQVQRC), with 85.8% identity, while Group IIc comprised 11 sequences (GgWRKYs -35, -36, -76, -51, -79, -75, -4, -10, -67, -68, -33) with conserved sequence motif 4 (WRKYG[Q/K]K) and a zinc finger pattern of C-X4-C-X23-HXH (77.6% identity). The 60 amino acid signature sequences of the 10 GgWRKYs in Group IId showed only 32% identity when aligned together (Fig. 5). Six of the ten members had a zinc finger, while four (GgWRKYs -69, -70, -81, -82) had none; instead they had a 50 amino acid zinc cluster domain at the N-terminal.
When this group was divided on the basis of conservation into two sub-groups, IId1 (GgWRKYs -69, -70, -81, -82) and IId2 (GgWRKYs -72, -64, -71, -73, -63, -66), the identity increased significantly from 32% to 54.9% and 83.7%, respectively. The members of sub-group IId1 showed conserved motif 7 (WRKYGQKPIKGSP) and no zinc finger at the C-terminal end, while sub-group IId2 displayed three conserved motifs, 7 (WRKYGQKPIKGSP), 8 (PRGYYKC) and 9 (RGCPARKHVER), along with a common zinc finger pattern C-X5-C-X23-HNH (Fig. 5). Further, when the conserved domain sequence of all 10 GgWRKYs of sub-group IId was extended from 60 to 110 amino acids, the two sub-groups (IId1 and IId2) combined into a single group (IId) with 70.68% identity among all members. We also confirmed the conservation of each group and subgroup against WRKY members from A. thaliana and G. uralensis (Figs. 4 and 5). The MSA further supports the dynamic nature of GgWRKYs.
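The effect described for group IId, where identity rises once a heterogeneous group is split into sub-groups, can be illustrated with a simple column-wise identity score. This sketch is an assumption-laden stand-in: identity is defined here as the percentage of alignment columns in which every sequence carries the same residue (gaps never count as identical), which may differ from the measure reported by DNAMAN, and the toy alignments are hypothetical.

```python
from typing import Sequence


def percent_identity(aligned: Sequence[str]) -> float:
    """Percentage of alignment columns that are identical in all sequences."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same non-zero length")
    identical = sum(
        1 for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )
    return 100.0 * identical / length


if __name__ == "__main__":
    # Hypothetical toy alignment made of two internally similar sub-groups.
    sub_group_1 = ["AAAA", "AAAT"]
    sub_group_2 = ["CCCC", "CCCG"]
    print(percent_identity(sub_group_1 + sub_group_2))  # whole group
    print(percent_identity(sub_group_1), percent_identity(sub_group_2))
```

In the toy case the pooled group shares no fully identical column, while each sub-group on its own scores 75%, which mirrors the direction of the change reported for IId1 and IId2.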

Figure 4

Multiple sequence alignment (MSA) of conserved GgWRKY domain. The alignment was performed using Clustal W program and displayed using DNAMAN software. Conserved motifs (1–10) and type of zinc-finger pattern are indicated within groups or sub-groups. Blue color represents 100% sequence identity. Pink color is for more than 75% while cyan color is for less than 75% sequence identity.

Figure 5

Multiple sequence alignment (MSA) profile of group IId (10 sequences). The alignment was initially built from the conserved 60 amino acid region, which showed low sequence identity (32%). When the sequences were separated into two sub-groups (IId1 and IId2), identity increased significantly (54.9% and 83.7%). Sub-group IId1 (4 sequences) has a Plant Zinc cluster upstream of the WRKY domain, with motif 7 and no zinc finger; sub-group IId2 (6 sequences) has motifs 7, 8 and 9 and a Zn finger. When the four sequences of sub-group IId1 were extended by 50 amino acids towards the N-terminal (110 amino acids in total), the sequence identity of sub-group IId increased to 70.68%.

Promoter analysis

The upstream regions of 31 GgWRKY genes were examined for the presence of cis-regulatory elements. Several stress-responsive elements were identified, including elements responsive to UV, salinity, ABA, GA signalling, etiolation, water stress, auxin and sulphur (Fig. 6). Several copies of WRKY-binding motifs were also identified in the promoter regions of GgWRKY genes. The number of WRKY-binding motifs per promoter ranged from 1 (GgWRKYs 20 and 23) to 11 (GgWRKYs 18 and 62). Overall, twenty-seven GgWRKYs had three or more W-boxes in their promoter region. Multiple W-box elements were found mostly in stress-related genes, which is consistent with earlier studies6,12. Additionally, the promoters of several glycyrrhizin biosynthesis genes (CYP88D6, CYP72A154 and squalene epoxidase) contained W-boxes (unpublished data), suggesting a regulatory role of WRKYs in glycyrrhizin biosynthesis and providing a platform to understand its regulation.
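Counting W-boxes in a promoter, as done above, amounts to scanning both strands for the core element TTGAC and the perfect W-box TTGACC/T quoted in the Introduction. The sketch below is a plain-Python illustration rather than the RSAT workflow used in the study; the function names and the toy promoter are hypothetical.

```python
import re
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_w_boxes(promoter: str) -> Dict[str, int]:
    """Count core (TTGAC) and perfect (TTGACC/T) W-boxes on both strands."""
    sense = promoter.upper()
    antisense = reverse_complement(sense)
    counts = {"core": 0, "perfect": 0}
    for strand in (sense, antisense):
        counts["core"] += len(re.findall("TTGAC", strand))
        counts["perfect"] += len(re.findall("TTGAC[CT]", strand))
    return counts


if __name__ == "__main__":
    toy_promoter = "AATTGACCGGGTCAAGG"  # hypothetical sequence
    print(count_w_boxes(toy_promoter))
```

The toy promoter carries one W-box on the sense strand and one on the antisense strand (visible on the sense strand as GGTCAA), so both are counted.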

Figure 6

Analysis of cis-regulatory elements (CREs) in GgWRKY promoter regions. A total of ten stress-responsive elements were mapped on the sense and anti-sense strands using the RSAT tool.

Protein-protein interaction

The protein-protein interactions of GgWRKYs were predicted by STRING43 with A. thaliana (taxonomic ID 3702) as the model, using Markov clustering (MCL) with an inflation factor of 8.5. STRING is a prediction pipeline that deduces protein-protein associations from co-expression data and interaction conservation (Fig. S2; Supplementary File 3). It predicts interactions between orthologs in taxonomically different organisms. The GgWRKY orthologs selected had more than 60% protein sequence homology, with one WRKY domain (PF03106) as predicted by Pfam and three domains (IPR003657, IPR003657 and IPR017412) as analysed by INTERPRO. The analysis revealed 74, 8 and 3 GO terms significantly enriched in biological processes, molecular function and cellular components, respectively (Supplementary File 4). The MCL clustering gave 8 distinct groups, the largest comprising 8 strongly interacting WRKY proteins (red; AtWRKYs 15, 22, 11, 33, 40, 53, 30 and 48), corresponding to the predicted orthologs GgWRKYs 73, 29, 73, 15, 53, 32, 32 and 67, respectively (Fig. 7; Supplementary File 5). These specific associations indicate that the proteins jointly contribute to a shared function, cis or trans in nature, as inferred from curated databases or experimentally determined data available in the public domain43. The AtWRKYs and corresponding GgWRKYs were shown to be involved in various biological processes, including ROS-induced modulation, plant growth and osmotic stress (AtWRKY15/GgWRKY73), development (AtWRKY22/GgWRKY29), jasmonic acid-induced response (AtWRKY11/GgWRKY73), wound-induced response and positive regulation of stress (AtWRKY33/GgWRKY15), senescence (AtWRKY40/GgWRKY53), leaf development and senescence (AtWRKY53/GgWRKY32), abiotic stress and senescence (AtWRKY30/GgWRKY32), and hormonal signal response and defense (AtWRKY48/GgWRKY67).
The strong association between the 8 AtWRKYs and their GgWRKY orthologs indicates co-regulation of several biological processes related to senescence, jasmonate response, hormonal signalling and wound-induced response (Supplementary File 5).
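For readers unfamiliar with Markov clustering, the role of the inflation factor can be seen in a textbook-style sketch of the algorithm: a column-stochastic matrix is repeatedly squared (expansion) and raised element-wise to the inflation power and renormalised (inflation) until it stops changing. The code below is a simplified illustration written for this purpose and is not STRING's implementation; the function names, convergence settings and the toy network are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def _normalise_columns(m: Matrix) -> Matrix:
    """Scale each column so that it sums to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency: List[List[float]], inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Simplified Markov clustering; returns clusters as lists of node indices."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalise_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                        # expansion step
        inflated = _normalise_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation step
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each non-empty row lists the members attracted to that node.
    clusters = {tuple(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    clusters.discard(())
    return sorted(list(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical network: nodes 0-2 fully connected, nodes 3-4 connected.
    toy = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(mcl(toy, inflation=8.5))
```

In general, a higher inflation value sharpens the contrast between strong and weak links and tends to produce more, smaller clusters.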

Figure 7

Protein-protein interaction of GgWRKY transcription factors based on AtWRKY orthologs, as predicted by the STRING search tool.

The associated proteins need not physically interact at a specific step; instead, they may form functional linkages, especially in the transcriptional or post-transcriptional regulation of a process. It has also been observed that evolutionarily related proteins usually maintain their three-dimensional structure even after they have diverged43,44. Interactions between orthologs are therefore expected to display a high degree of conservation, more so for indirect or transient types of protein-protein association. Based on the conservation between GgWRKYs and AtWRKYs, we assessed the putative functions of GgWRKYs and experimentally verified the expression profiles for a few of the predicted functions under abiotic stress (Table 5).

Table 5 AtWRKYs, their induction factor and experimentally verified responses in GgWRKYs.

Real-time expression analysis

The functions predicted for the putative G. glabra orthologs from protein conservation with A. thaliana were tested experimentally. The expression profiles of twenty-five GgWRKY genes were investigated after hormonal treatments (NAA and GA3) and under eight abiotic stress treatments (carbon starvation, salinity, heat, cold, dark, UV, senescence and wounding) administered to the aerial tissues of in vitro cultured G. glabra plants. Of the 25 GgWRKYs examined, eight responded to the NAA treatment (Fig. 8). As the heat map shows, transcripts of GgWRKYs 8, 15 and 29 reached their maxima (4.1, 3.3 and 1.6 fold, respectively) between 0.5 and 1.00 hrs, while GgWRKY55 took longer (1.30 hrs) to reach its maximum (17.5 fold). GgWRKYs 54 and 56 were up-regulated at most time points, while GgWRKYs 4 and 38 were down-regulated over the period studied. GgWRKY56 and GgWRKY38 appear to have definite positive (3.3 fold) and negative (0.001 fold) regulatory effects, respectively, in the aerial tissues of auxin-treated plants. Under GA3 treatment (Fig. 8), GgWRKY58 was highly up-regulated (257.3 fold) in the aerial tissues, while GgWRKY15 was up-regulated (43.6 fold) in the underground tissues, compared with the control. Most of the GgWRKYs responded within 1.5 hrs of GA3 treatment, except GgWRKY20, which took longer (2.0 hrs) to reach its maximum in the root tissues. The results of the present study were compared with earlier published reports on the functions of AtWRKYs45 and are presented in Table 5. Of the 25 GgWRKYs assessed under abiotic stress, the largest number responded to cold treatment (17), followed by dark (13). Nine GgWRKYs responded to senescence and to salinity, while eight responded to carbon starvation. The largest number of GgWRKYs were up-regulated after dark treatment (10), followed by senescence (9).
Darkness induced up-regulation of GgWRKYs 5, 24, 36, 38, 40, 45, 51, 53, 54 and 57, while GgWRKYs 56, 69 and 70 were down-regulated. Nine GgWRKYs (2, 5, 8, 15, 29, 38, 45, 54 and 55) were up-regulated during senescence, and GgWRKYs 5, 14, 24 and 54 were down-regulated under saline conditions. The transcript levels of GgWRKYs 14, 24 and 36 were higher under heat stress, while GgWRKY24 was up-regulated on UV treatment. Wounded plants showed up-regulation of GgWRKY54 and down-regulation of GgWRKY53. Cold-treated samples showed higher transcript levels of GgWRKYs 15, 53 and 54, while GgWRKYs 14, 40, 55, 56, 57, 58, 59, 62, 69 and 70 were down-regulated. Carbon-starved plants showed up-regulation of only GgWRKY51, while GgWRKYs 2, 4, 8, 15, 36, 45 and 58 were down-regulated (Fig. 9). Significant up-regulation (P ≤ 0.001) was observed only in senescence (GgWRKYs 45 and 15), while GgWRKY36 was significantly down-regulated under salinity. Across the ten treatments performed to assess the roles predicted by STRING from AtWRKY protein conservation, the responses of 11 GgWRKYs corroborated well with those of 15 AtWRKYs whose functions are reported in the literature (Table 5). Among the 25 GgWRKYs examined, 23 responded to abiotic stress, 17 were induced by hormones and 13 were common to both, suggesting a role of hormones under stress conditions. Further study of these functionally assigned GgWRKYs will shed light on their roles in the underlying molecular mechanisms. On comparing the experimental data with the STRING predictions, our results corroborated well with earlier reports on the induction of AtWRKY4 on senescence, AtWRKY40 on wounding, AtWRKYs 2, 28 and 33 during salinity and AtWRKY33 under cold treatment. A few AtWRKYs whose functions had not been assigned, namely AtWRKYs 21, 24, 31, 38, 61, 69 and 71, were also given putative functions based on percentage identity.
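Fold changes of the kind reported above are commonly derived from real-time PCR threshold cycles with the 2^-ΔΔCt calculation, normalising the target gene to a reference gene (actin in this study) and then to the untreated control. This section does not state which formula was used, so the sketch below is an assumption, and the function name and Ct values are hypothetical.

```python
def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ddCt calculation (assumed, not confirmed)."""
    delta_treated = ct_target_treated - ct_ref_treated    # normalise to reference
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control           # compare with control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies two cycles earlier after treatment.
    print(fold_change(22.0, 18.0, 24.0, 18.0))
    # Hypothetical Ct values: the target amplifies one cycle later after treatment.
    print(fold_change(25.0, 18.0, 24.0, 18.0))
```

Under this calculation a value above 1 indicates up-regulation and a value below 1 indicates down-regulation relative to the control, which is how figures such as 17.5 fold and 0.001 fold would be read.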

Figure 8

GgWRKY genes are represented as rows and treatment time duration as columns in the matrix. Expression analysis of selected GgWRKY genes displaying differential expression pattern in shoot and roots under various hormonal stress. Heat map showing- Cluster analysis of GgWRKY genes according to their expression profiles in (a) shoots and (b) roots after GA3 treatment for 0.5, 1, 2, 4, 8, 12 and 24 h time interval; (c) Cluster analysis of GgWRKY genes according to their expression profiles in shoots after NAA treatment for 0.5, 1, 1.5, 2 and 3 h time interval.

Figure 9

Expression profiles of selected GgWRKY genes under eight different stresses. The Y-axis indicates relative expression level and X-axis indicates control shoot tissues (C) and treated shoot tissues (T). (a) expression patterns under etiolated conditions; (b–h) expression profiles under heat, UV, wounding, cold, dark, carbon starvation and salinity, respectively. Actin was used as internal reference. Three biological replicates were used to calculate error bars using standard deviation. Asterisks indicate that the corresponding gene was significantly up- or down regulated in a given treatment (*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001).

In any biological process, understanding the role of transcription factors provides insight into its regulatory mechanism. WRKY transcription factors have been extensively studied in plants for their roles in growth, development, and response to biotic and abiotic stresses. However, the WRKY genes of Glycyrrhiza species had not been elucidated. In conclusion, we identified and characterized 147 full-length putative WRKY genes in the genus Glycyrrhiza. These putative genes were grouped based on the number of WRKY domains and the zinc finger pattern, and further analysed for properties such as molecular weight, isoelectric point, instability index and sub-cellular location. The phylogenetic analysis placed more than one group or sub-group together, indicating multiple origins. The present paper highlights several findings not reported earlier, such as the novel Zn finger motif of the C-X4-C-X22-HXH type (sub-group IIf). These group II members shared homology with group IN WRKY members, unlike the other members of group II. This paper also reports several additional domains (DivIVA, SerS, Coat, Exo70 exocyst complex subunit, Flac-arch super, PAT1 and SGNH_hydrolase) alongside the conserved WRKY domain in WRKY proteins. The MSA of the 60 amino acid signature sequence of group IId showed very low sequence identity (32%); however, when its length was increased to 110 amino acids the identity increased to 70.7%. A closer look at sub-group IId showed a 50 amino acid plant Zn cluster domain upstream of the WRKY domain in four members, in which the characteristic Zn finger motif was absent.

Additionally, putative functions were assigned to the identified GgWRKYs based on the STRING database, which comprises both theoretically predicted and experimentally verified data. Verification in the laboratory confirmed 11 out of 15 functions as assigned. The study provides significant evidence for further investigating and validating the role of WRKYs in Glycyrrhiza species in growth, under stress conditions and in secondary metabolite biosynthesis.

Materials and Methods

Identification and sequence annotation of WRKY genes

Transcriptome-wide identification of WRKY genes in the G. glabra and G. uralensis transcriptome data was carried out by local similarity (tblastn) search and HMM profile methods. Initially, seventy AtWRKY proteins were downloaded from the Arabidopsis Information Resource (TAIR; http://www.Arabidopsis.org/), and the HMM profile of the WRKY family (accession number PF03106) was retrieved from the Pfam protein family database (https://pfam.xfam.org/). The A. thaliana WRKY proteins (AtWRKYs) and the PF03106 profile were used as queries to search the transcriptome data of G. glabra and G. uralensis. An e-value cut-off of 1e−50 was applied for homologue recognition. Parsing the BLAST data from G. glabra yielded a total of 125 contig hits. All these contigs were further analysed in ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/) to obtain the full-length CDS of 87 GgWRKY sequences. The publicly available transcriptome database of G. uralensis (http://ngs-data-archive.psc.riken.jp/Gur-genome/download.pl.) was used to obtain 60 GuWRKYs. The retrieved coding sequences (CDSs) were then translated with the ExPASy translate tool (https://web.expasy.org/translate/) and validated using the UniProt protein database (https://www.uniprot.org/), the conserved domain database (https://www.ncbi.nlm.nih.gov/cdd/) and HMMScan (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan). The molecular weight (MW), theoretical isoelectric point (pI), instability index, aliphatic index and grand average of hydropathicity (GRAVY) of the GgWRKY proteins were predicted with ProtParam (http://web.expasy.org/protparam/). Additionally, subcellular localisation was predicted with the protein subcellular localisation prediction tool WoLF PSORT (https://wolfpsort.hgc.jp/).
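
The e-value filtering step described above can be sketched as follows. This is only a minimal illustration, assuming the tblastn results were saved in BLAST's standard 12-column tabular format (the output format is not stated in this section); the function name `filter_hits` and the sample rows are hypothetical and are not taken from the study's data.

```python
def filter_hits(lines, max_evalue=1e-50):
    """Keep the best hit per contig whose e-value passes the cut-off.

    Assumes 12-column tabular BLAST output:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    For tblastn, qseqid is the protein query (an AtWRKY) and
    sseqid is the transcriptome contig.
    """
    contigs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, contig = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # hit is weaker than the cut-off
        best = contigs.get(contig)
        if best is None or evalue < best[1]:
            contigs[contig] = (query, evalue)
    return contigs


# Hypothetical rows, for illustration only.
sample = [
    "AtWRKY1\tcontig_001\t78.5\t200\t43\t0\t1\t200\t10\t609\t1e-80\t250",
    "AtWRKY2\tcontig_001\t70.0\t180\t54\t0\t1\t180\t10\t549\t1e-60\t200",
    "AtWRKY3\tcontig_002\t40.0\t100\t60\t0\t1\t100\t5\t304\t1e-10\t60",
]
hits = filter_hits(sample)
print(sorted(hits))  # contigs retained as candidate WRKY homologues
```

Collapsing the hits to one entry per contig mirrors the way the text counts contig hits rather than individual alignments.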

Multiple sequence alignment, phylogenetic analysis and classification

The multiple sequence alignment (MSA) of 245 proteins was performed using 82 WRKY proteins from G. glabra (GgWRKYs), 54 from G. uralensis (GuWRKYs), 70 from A. thaliana (AtWRKYs), 37 from P. patens (PpWRKYs) and one each of human FLYWCH CRAa and GCMa. The protein sequences of Arabidopsis were downloaded from TAIR (http://www.Arabidopsis.org/), the GuWRKYs from (http://ngs-data-archive.psc.riken.jp/Gur-genome/download.pl.) and the PpWRKYs from P. patens v3.3 (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Ppatens); human FLYWCH CRAa (EAW85440) and GCMa (BAA13651) were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/protein/). The conserved 60 amino acid regions of the WRKY proteins were identified using HMMScan and aligned using CLUSTALW for the construction of the phylogenetic tree. For the GgWRKY-based phylogenetic tree, complete protein sequences were used. The tree was constructed in MEGA 7.0 by the neighbor-joining method with the JTT substitution model, pairwise deletion and 1000 bootstrap replicates. The 60 amino acid conserved region of the G. glabra MSA was visualised using DNAMAN. The MSA included the conserved regions of WRKY members representing each group and subgroup from A. thaliana and G. uralensis as references.
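
To illustrate what pairwise deletion means when distances are computed from an alignment, the sketch below calculates a simple proportion of differing sites (p-distance) between two aligned sequences, skipping only the positions where either of the two has a gap. This is a simplified stand-in: the study used the JTT model in MEGA 7.0, not the p-distance, and the function name and example sequences are hypothetical.

```python
GAP_CHARS = {"-", "?"}  # symbols treated as missing data in this sketch


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is ignored only if one of the two
    sequences being compared has a gap there, so other pairs can
    still use that column.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the sequences")
    return differences / compared


# Hypothetical aligned fragments: 8 comparable sites, 1 difference.
distance = p_distance("WRKYGQK-DG", "WRKYGKKLD-")
identity_percent = (1 - distance) * 100
print(distance, identity_percent)  # 0.125 87.5
```

The same site-wise comparison gives a percent identity, which is how the effect of extending an alignment window on sequence identity can be checked.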

Protein-protein interaction analysis and motif detection

The conserved motifs of the GgWRKY proteins were analysed using Multiple Expectation Maximization for Motif Elicitation (MEME: http://meme-suite.org/tools/meme) with the following parameters: minimum and maximum motif widths of 6 and 50, respectively, and a maximum of 15 motifs. Protein-protein interactions were predicted by STRING43 with A. thaliana as the model organism, using Markov clustering with an inflation factor of 8.5.
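
The inflation factor controls how finely Markov clustering splits a network. The sketch below shows only the inflation step on a single column of transition probabilities, with hypothetical values; it is not STRING's implementation, which alternates this step with matrix expansion over the whole interaction network.

```python
def inflate(column, r):
    """Apply the MCL inflation step to one probability column.

    Each entry is raised to the power r and the column is rescaled
    to sum to 1, which strengthens strong links and weakens weak ones.
    """
    powered = [p ** r for p in column]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical transition probabilities from one protein to three neighbours.
column = [0.6, 0.3, 0.1]
mild = inflate(column, 2.0)    # moderate sharpening
strong = inflate(column, 8.5)  # the inflation value stated in the text
print(mild)
print(strong)
```

With a high inflation value nearly all of the probability mass moves to the strongest link, so the resulting clusters tend to be smaller and tighter.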

Analysis of cis-regulatory elements in GgWRKYs promoter regions

Promoter sequences of 31 GgWRKYs, extending up to 2.5 kb (kilobases) upstream of the transcription start site, were retrieved manually (Supplementary File 6). These promoter sequences were used as queries to scan for cis-regulatory elements (CREs) in the Plant Cis-acting Regulatory DNA Elements database (PLACE, http://www.dna.affrc.go.jp/PLACE/)46. The positions of the identified CREs (biotic and abiotic stress-responsive elements) were mapped on both the sense and anti-sense strands using the RSAT47 (http://rsat.sb-roscoff.fr/feature-map_form.cgi) drawing tool.
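
As a rough illustration of mapping an element on both strands, the sketch below locates the W-box core (TTGAC) in a promoter by searching for the motif and for its reverse complement. It is a simplified stand-in for the PLACE and RSAT tools used in the study; the function names and the example sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_motif(promoter, motif="TTGAC"):
    """Return (strand, start) pairs for every occurrence of the motif.

    Starts are 0-based positions on the sense strand. An anti-sense hit
    is found by searching the sense strand for the reverse complement.
    """
    promoter = promoter.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()),
                            ("-", reverse_complement(motif))):
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer("(?=" + re.escape(pattern) + ")", promoter):
            hits.append((strand, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical promoter fragment with one W-box core on each strand.
print(scan_motif("AATTGACCGGGTCAATT"))  # [('+', 2), ('-', 10)]
```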

Plant material and treatments

Five-month-old in-vitro cultured plants, grown in SPB medium48 under controlled conditions of 25 °C (±1.5) and a 16 h light/8 h dark cycle (light intensity of 200 µmol m–2 s–1), were exposed to various treatments including hormone, temperature, salinity, senescence and wounding. The plantlets, grown in liquid SPB medium, were individually subjected to 50 µM auxin (NAA) for 0.5, 1.0, 1.5, 2.0 and 3.0 h, and to 10 µM gibberellin (GA3) for 0.5, 1.0, 2.0, 4.0, 8.0, 12 and 24 h. Controls were sprayed with water. Different sets of plants were independently subjected to abiotic treatments: NaCl (500 mM) for 72 h, and dark, cold (4 °C) and heat (55 °C) treatments for 48, 24 and 8 h, respectively. For the ultraviolet treatment, plants were kept under UV-C for 30 minutes. Mechanically injured plants were examined 8 h after injury. Yellow aerial tissues of plants were used for the senescence study, and carbon starvation was imposed for 48 h in SPB medium containing no sugar. All the respective controls were kept under culture conditions. The control and treated plants were harvested at the indicated times, frozen in liquid nitrogen and stored at −80 °C for RNA extraction. Each treatment was performed in triplicate and repeated at least twice.

RNA extraction and quantitative real-time reverse transcription PCR (qRT-PCR)

Total RNA of control and treated shoot and root tissues was extracted using the PureLink RNA Mini Kit (Invitrogen, US). RNA integrity was analysed on a 1.5% agarose gel, and quantity was determined using a NanoDrop 2000C spectrophotometer (Thermo Scientific, USA). cDNA synthesis was carried out using the SuperScript™ VILO™ cDNA Synthesis Kit (Thermo Scientific, USA). qRT-PCRs were performed using the SYBR Green PCR Master Mix (Takara, Japan) and carried out in triplicate for each tissue sample. Gene-specific RT primers were designed manually (Supplementary File 7). Amplicon size ranged from 200 to 250 bp. The actin gene was selected as the internal reference gene. Amplification was done in a 10 μl reaction volume, which contained 5.0 μl of SYBR Green PCR Master Mix, 0.2 μl of each primer (10 pc), 0.2 μl of ROX, 1.0 μl of cDNA template (150 ng/μl) and 3.4 μl of ddH2O. PCRs with no-template controls were also performed for each primer pair. The real-time PCRs were performed on the 7500 Fast Real-Time PCR System with its software (Applied Biosystems, USA). All the PCRs were performed under the following conditions: 30 s at 95 °C, then 3 s at 95 °C and 1 min at the respective optimized Tm (40 cycles), followed by 95 °C (15 s), 60 °C (30 s) and 95 °C (15 s), in MicroAmp fast reaction tubes (Applied Biosystems, USA). The specificity of the amplicons was verified by melting curve analysis (55 to 95 °C) after 40 cycles.
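
The per-reaction volumes listed above can be checked and scaled with a short calculation. The sketch below confirms that the components add up to 10 μl and scales them into a master mix. It assumes that "each primer" refers to one forward and one reverse primer, and the 10% pipetting overage and the function names are hypothetical conveniences that are not stated in the protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text.
REACTION_UL = {
    "SYBR Green PCR Master Mix": 5.0,
    "forward primer": 0.2,  # assumption: "each primer" = forward + reverse
    "reverse primer": 0.2,
    "ROX": 0.2,
    "cDNA template": 1.0,
    "ddH2O": 3.4,
}


def total_volume(mix):
    """Sum the component volumes, rounded to avoid float noise."""
    return round(sum(mix.values()), 3)


def master_mix(mix, n_reactions, overage=0.1, exclude=("cDNA template",)):
    """Scale shared components for n reactions plus a pipetting overage.

    The template is excluded because it differs between samples.
    The overage value is a hypothetical default, not from the protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in mix.items() if name not in exclude}


print(total_volume(REACTION_UL))  # 10.0
print(master_mix(REACTION_UL, n_reactions=10))
```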