A Comprehensive Transcriptome-Wide Identification and Screening of WRKY Gene Family Engaged in Abiotic Stress in Glycyrrhiza glabra

The study reports 147 full-length WRKY genes based on the transcriptome analysis of Glycyrrhiza genus (G. glabra and G. uralensis). Additional motifs in G. glabra included DivIVA (GgWRKY20) and SerS Superfamily (GgWRKY21) at the C-terminal, and Coat family motifs (GgWRKY55) at the N-terminal of the proteins, while Exo70 exo cyst complex subunit of 338 amino acid (GuWRKY9) was present at the N-terminal of G. uralensis only. Plant Zn cluster super-family domain (17 WRKYs) and bZIP domain (2 WRKYs) were common between the two species. Based on the number of WRKY domains, sequence alignment and phylogenesis, the study identified GuWRKY27 comprising of 3 WRKY domains in G. uralensis and a new subgroup-IIf (10 members), having novel zinc finger pattern (C-X4-C-X22-HXH) in G. glabra. Multiple WRKY binding domains (1–11) were identified in the promoter regions of the GgWRKY genes indicating strong interacting network between the WRKY proteins. Tissue-specific expression of 25 GgWRKYs, under normal and treated conditions, revealed 11 of the 18 induction factor triggered response corroborating to response observed in AtWRKYs. The study identified auxin-responsive GgWRKY 55 & GgWRKY38; GA3 responsive GgWRKYs15&59 in roots and GgWRKYs8, 20, 38, 57 &58 in the shoots of the treated plant. GgWRKYs induced under various stresses included GgWRKY33 (cold), GgWRKY4 (senescence), GgWRKYs2, 28 & 33 (salinity) and GgWRKY40 (wounding). Overall, 23 GgWRKYs responded to abiotic stress, and 17 WRKYs were induced by hormonal signals. Of them 13 WRKYs responded to both suggesting inter-connection between hormone signalling and stress response. The present study will help in understanding the transcriptional reprogramming, protein-protein interaction and cross-regulation during stress and other physiological processes in the plant.

The complexity in plant cell organization can be directly related to intricate inter-connections between genes and regulatory network inside the cell. This observation is further substantiated by studies on vast genomic and transcriptomic sequence information available in the public domain 1 . The inter-cellular biological circuit in higher plants is governed at several discreet levels, one of them is regulated by a specific group of DNA binding proteins, the WRKYs 2 are among the ten largest families of transcription factors (TFs) in higher plants. The literature cites several papers after the first report on Ipomea batata (SPF1) in 1994 3 , from dicots 4 , monocots 5 , orchids 6 to unicellular eukaryote (Giardia lamblia) and the slime mold (Dictyostelium discoideum), revealing their evolutionary significance and complex organization 7,8 . The 60 amino-acid characteristic conserved sequence of WRKY transcription factor (TFs) are most commonly identified by specific hepta-nucleotide signature sequence (WRKYGQK), the W-Box, which binds to the promoter sequence of target gene(s) modulating its activity 9 . The large WRKY super-family is phylogenetically classified into three groups (I, II &III) based on the number of WRKY domains and type of zinc finger sequences at the C-terminal. WRKY proteins classified in group I is characterized by two WRKY domains and zinc-finger motifs (C 2 H 2 ), while group II and III WRKY proteins constitute single WRKY domain. Zinc-finger motif in group II & III comprise of C 2 H 2 and C 2 HC zinc-finger pattern, respectively 10 . Studies have shown WRKY binding motifs (W-boxes) are present in multiple numbers in WRKY responsive gene promoters 11 . The promoters of 83% genes of the 72 WRKYs in Arabidopsis, contain at least two perfect W-boxes (TTGACC/T), and 58% had four or more core element sequence (TTGAC) 11 . Some WRKYs had 11 to 12 (AtWRKY66, AtWRKY17) core elements in the promoter fragment as analysed by Dong et al. 12 . Interestingly, studies confirm the presence of W-boxes also in the promoter region of WRKY genes, suggesting a potentially strong transcriptional networking between WRKY proteins 11 . Studies using co-transfection assays have revealed role of WRKY proteins on the promoters of their own genes and on other WRKY genes thereby modulating reporter gene 13 . Also in-vitro DNA-protein binding assays have highlighted single WRKY binding to several target gene promoters as elucidated in WRKY53 binding to three different WRKY genes, confirming complex interactive regulatory network. Microarray experiments using Arabidopsis genome illustrated more than 70% (45 out of 61) of the WRKY genes are co-regulated with other WRKYs 14 and transcription factors 12 . Biological role of WRKYs are being studied in several plants 15 . They have been found to regulate several target genes in response to stress 16 including metal stress 17 , development 18 and secondary metabolite biosynthesis 1 . WRKYs have shown regulatory role in pathogen-induced response 12 resulting in concerted activation of variety of genes. WRKY TFs have been found to rapidly and transiently regulate gene induction in response to signalling molecule 19 , wounding, stress, physiological processes like flowering 20 , seed germination and development 21 and senescence 4 . Expressed Sequence Tags (ESTs) and other plant database have revealed presence of several hundred WRKYs in various tissues under different physiology, stress 18 , cold 22 , stomatal movement 23 and defense 24,25 implying their predominant role in varied biological functions. However, under normal growth conditions also, WRKY proteins have demonstrated broad-spectrum regulatory role as reported in morphogenesis and development of trichomes 26 embryo development 18 , senescence 13 , dormancy 27 , plant growth 28 , immunity 29 , systematic acquired resistant and metabolic pathways 30 .
Two decades of studies on WRKY TFs has resulted in more than 14500 WRKY genes from 165 plant species 31 with most of the species from eudicots (100 species) followed by monocots (38 species) and chlorophytae (16 species) 31 . Legumes with 12 species contributed to 1094 WRKY genes 32 . No report on WRKY transcription factors has been published from Glycyrrhiza species, though transcriptome, genome and EST databases are available in public domain from G. uralensis.
Glycyrrhiza belongs to Fabaceae sub-family of Leguminoseae family. The underground roots (Licorice) of the genus (G. uralensis, G. glabra and G. echinata) are commercially valued for its pharmaceutical, flavour enhancer natural sweetener, and cosmaceutical properties 33 . Roots of the plant are rich in bioactive flavonoids and tri-terpenoid saponins including glycyrrhizin 34 . Glycyrrhizin molecule is pharmaceutically sought molecule for its multitude of bioactivities 33 . The global demand of the roots of Glycyrrhiza is evident by a market report, as per Transparency Market report (ALBANY, New York, April 4, 2017 /PRNewswire). Where projected compound annual growth rate was estimated to be 5.7% during 2017-2025 equivalent to USD 2,393.9 million by 2025.
Present research underlines the transcriptome-wide identification and characterization of 147 WRKY TFs from Glycyrrhiza genus. Here, we analysed 87 WRKY genes from G. glabra and 60 from G. uralensis, categorized them into different structural groups based on conserved motif composition. We also predicted functions based on STRING prediction algorithm in G. glabra WRKY members. Subsequently their expression profiles were investigated under various stress conditions in the aerial tissues of the in vitro cultured plant. We also characterized 31 promoters (between 0.5 kb to 4.1 kb) of the 87 GgWRKY genes (from the transcriptomic data) to get an insight into its functioning and regulation of secondary metabolites.

Results and Discussion
transcriptome-wide analysis and characterization of Glycyrrhiza WRKY TF. We have done the transcriptomics of G. glabra plant and mined the data for the WRKY transcription factor. Among the 125 sequences that matched WRKY genes on BLAST and PF03106 HMM profile searches, 87 GgWRKYs had complete CDS, and 38 gene sequences were partial ( Table 1). All of these were revalidated using Uniprot (https:// www.uniprot.org/) resulting in 78 sequences with best hits, while 47 sequences were found unique. Out of these, 55 (UniProt hits) and 32 (unique) sequences were full length, and 23 (UniProt hits) and 15 (unique) were partial sequences (Table 1). Further, we used the publicly available G. uralensis transcriptome data as a reference source (http://ngs-data-archive.psc.riken.jp/Gur-genome/download.pl.) to retrieve the WRKY transcription factor using BLAST and PF03106 HMM profile searches, we could identify 60 WRKY genes from G. uralensis. Subsequently, all the full-length protein sequences (147) were re-examined for the presence of WRKY domains using conserved domain database (https://www.ncbi.nlm.nih.gov/cdd/) and through HMMScan (https://www.ebi.ac.uk/Tools/ hmmer/search/hmmscan). The GgWRKY sequences were submitted to NCBI, and their accession numbers are given in (Supplementary File 1). The identified GuWRKY protein sequences were included in the sequence alignment and phylogenetic studies only (Supplementary File 2). The detailed GgWRKY protein sequence features are listed in Table 2. The deduced GgWRKY proteins had amino acid residues between 112 (GgWRKY67) to 760 (GgWRKY12). The coding sequences of 87 full-length GgWRKYs ranged from 339 bp (GgWRKY67) to 2283 bp (GgWRKY12), and their molecular weight (MW) varied between 13291.91 Da (GgWRKY67) to 82181.16 Da  GgWRKY2  MK511240 1155  384  1  2(WRKYGQK)  C-X4-C-X22-HNH  C2H2   GgWRKY3  MK511241 804  267  2e  WRKYGQK  C-X5-C-X23-HNH  C2H2   GgWRKY4  MK511242 840  279  2c  WRKYGQK  C-X4-C-X23-HNH  C2H2   GgWRKY5  MK511243 762  253  1  2(WRKYGQK)  C-X4-C-X22-HNH  C2H2   GgWRKY6  MK511244 603  200  2f  WRKYGQK  C-X4-C-X22-HNH  C2H2   GgWRKY7  MK511245 1767  588  1  Continued www.nature.com/scientificreports www.nature.com/scientificreports/ (GgWRKY12) ( Table 3). The isoelectric point (pI) of 44 GgWRKYs were acidic, one (GgWRKY55) was neutral with pI value equal to 7.0, and the remaining 42 were basic proteins. According to the instability index proteins with index value higher than 40.0 is unstable 35 . In the present study most of the GgWRKYs were found to be unstable, having maximum instability index of 68.68 (GgWRKY34) with the exception of ten GgWRKYs namely, GgWRKY10 (30.  (Table 3). Additionally, the WoLFPSORT prediction showed that 81 GgWRKY proteins were localised in nucleus, suggesting that they play regulatory role predominantly in cell nucleus, while 4 GgWRKYs (-23, 32, 75, 84) had chloroplast orientation. GgWRKY73 had mitochondrial and GgWRKY 86 had cytoplasmic subcellular localization (Table 3). Further, five GgWRKY members (GgWRKYs 10,-33,-67,-68,-87) had WRKYGKK domain instead of the common WRKYGQK (     36 . The analysis of sequence motifs using MEME platform (http://meme. nbcr.net/meme/cgi-bin/meme.cgi) 37 displaying common and unique motifs within the GgWRKY sequences are shown in Fig. 2. Phylogeny. The relatedness among 136 Glycyrrhiza WRKY proteins with the 109WRKYs identified from Arabidopsis thaliana, Psychometrella patens, Human FLYWCH CRAa and GCMa were investigated (Fig. 3) and tabulated in Table 4. The phylogeny of 136 WRKY proteins from the genus Glycyrrhiza displayed 22WRKYs (17GgWRKYs & 5GuWRKYs) belonging to group-I, 98 WRKYs (61 GgWRKYS & 37 GuWRKYs) clustering in group-II and 16 WRKY members comprising of group-III (4GgWRKYS & 12GuWRKYs). Group-II was further sub-divided into five sub-groups, IIa (11), IIb (17), IIc (16 + 8), IId (17), IIe (15) and an additional novel sub-group IIf (14) based on WRKY transcription factor rules adopted in Arabidopsis 9 . The present paper reports few exceptions observed in the WRKY members identified in the genus Glycyrrhiza. The GuWRKY27 possessed three WRKY domains (N1, N2 &C). Few recent publications have also reported more than 2 WRKY domains in Gossypium raimondii 38 , Linum usitatissimum Lupinus angustifolius, Aquilegia coerulea and Setaria italic 32 . Phylogenetic analysis of the indicated proteins, however clubbed them into different subgroups. For example, in G. raimondii (WRKY108) the three domains (WRKY108N1, WRKY108N2 &WRKY108C) were clustered into IIc, III & IId sub-groups, respectively. In the present study, however, all the three WRKY domains (N1, N2 &C) of GuWRKY27 were found to be clustered into Group-III having Zn finger pattern similar to groupIII. This implies that the GuWRKY27 protein sequences are highly homologous to the group III WRKY member proteins, unlike the earlier published reports. Another exception was seen in GuWRKY20, where the protein was classified into group I based on the number of WRKY domains (2). However, it was clustered into group-III in the phylogenetic classification. MSA revealed that both the WRKY domains had Zn finger pattern similar to Group-III (C-X 7 -C-X 23 -HXC). The third exception was observed in GuWRKY3 whose Zn finger pattern was unlike any of the existing subgroups of group II. It could be the starting point for the evolution of a new subgroup in group II.
The phylogenetic analysis of the 60 amino acid region of the Glycyrrhiza WRKY proteins indicated their diverse origin. The N-terminal and the C-terminal of Group I of the WRKY proteins clustered them into different clades indicating their dissimilar background. Further, the majority of the subgroup IIc proteins (8 proteins) were found to assemble with group IC indicating their common origin with respective clusters. Contrary to our www.nature.com/scientificreports www.nature.com/scientificreports/ results, Zhu et al. 39 found that subgroup IIc WRKY domain in Triticum aestivum, originated from the N-terminal WRKY domain of group I. However, recent study on legumes have revealed that IIc sub-groups have multiple origins 32 . The present study also showed gathering of sub-groups IIa & IIb, while sub-group IId + IIe were clustered with group III, signifying close relationship with members with respective groups. Previously Zhang & Wang 8 proposed a phylogenetic tree based evolutionary relationships which classified the WRKY gene family into four clades including groups I + IIc, groups IIa + IIb, group IId, and group IIe. But according to Rinerson et al. 40 hypothesis the WRKY protein evolution may have followed two alternative paths, "Group I Hypothesis" which proposed that all WRKY proteins evolved from the C-terminal WRKY domains of group I proteins, and the " IIa + b Separate Hypothesis" which suggested that groups IIa and IIb have evolved directly from a single domain algal gene separated from a group I-derived lineage. It is hard to explain the origin of the WRKY gene family on the basis of any one hypothesis, as mounting number of studies have demonstrated their multiple origins. Based on our phylogenetic analyses, we found that a phylogenetic cluster was a mix of WRKY genes from at least two different groups or sub-group indicating their dynamic nature.   www.nature.com/scientificreports www.nature.com/scientificreports/ end, while subgroup IId2 displayed conservancy of three motifs-7(WRKYGQKPIKGSP), 8(PRGYYKC) & 9(RGCPARKHVER) along with a common zinc finger pattern C-X 5 -C-X 23 -HNH (Fig. 5). Further, when conserved domain sequence of 60amino acid of all the 10 GgWRKYs of sub-group IId, was increased to 110 amino acids the two sub-groups (IId1&IId2) combined into a single group (IId) displaying 70.68% identity among all the members. We also confirmed the conservancy of each group and subgroup with the WRKY members belonging to A. thaliana and G. uralensis (Figs. 4 and 5). The MSA further proves the dynamic nature of GgWRKYs. promoter analysis. The upstream region of 31 GgWRKY genes was examined for the presence of Cis-regulatory elements. Several stress-responsive elements like UV, salinity, ABA, GA signalling, etiolation, water stress, auxin and sulphur responsive elements were identified (Fig. 6). Also, several copies of WRKY binding motifs were identified in the promoter region of GgWRKY genes. The DNA binding WRKY motifs in the promoter region ranged from 1 (GgWRKY20, 23) to 11 (GgWRKYs 18 &62). Overall, twenty-seven GgWRKYs had three or more W-boxes in their promoter region. Observation revealed presence of multiple W-box elements mostly in the stress-related genes, which is following the earlier studies 6,12 . Additionally promoters of several glycyrrhizin biosynthesis genes (CYP88D6, CYP72A154 & squalene epoxidase) contained W boxes (unpublished data) suggesting regulatory role of WRKY in glycyrrhizin biosynthesis, thereby providing a platform to understand its regulation.
The associated proteins need not physically connect in a protein-protein interaction of a specific step instead, they may form functional protein linkages especially in transcriptional or post-transcriptional regulation of a process. Also, it has been observed that evolutionarily related proteins usually maintain their three-dimensional structure, even when they have diverged 43,44 . This interaction between orthologs is expected to display high degree of interaction conservation more so in indirect or transient types of protein-protein associations. Based on protein conservancy of GgWRKYs with AtWRKYs, we assessed the putative functions of GgWRKYs and verified the expression profile of few of the predicted functions of GgWRKYs experimentally in Lab (Table 5) Table 5. Of the 25 GgWRKYs assessed for abiotic stress treatment, maximum showed response to post-cold treatment (17), followed by dark (13). Nine GgWRKYs responded to senescence and salinity, while eight triggered a response on carbon starvation. Maximum number of GgWRKYs were up-regulated (10) after dark treatment followed by senescence (9). Darkness induced up-regulation of GgWRKYs 5,24,36,38,40,45,51,53,54 (Fig. 9). Significantly up-regulated (P ≤ 0.001) GgWRKYs were observed only in senescence (GgWRKYs 45&15), while in salinity, GgWRKY36 was significantly down-regulated. Out of the ten different treatments performed to assess the role of GgWRKYs in abiotic stress, predicted by STRING based on AtWRKY protein conservancy, the response of 11 GgWRKYs corroborated very well with 15 AtWRKYs whose functions were reported in literature (Table 5). Among the 25 GgWRKYs examined, 23 responded to abiotic stress, 17 were induced by hormone while 13 were common to both, suggesting role of hormone under stress conditions. Further study on these functionally assigned GgWRKYs will throw light on their role in underlying molecular mechanism. On comparing the experimental data with the STRING predicted data, it was found that our results corroborated well with the . Initially conserved 60 amino acids region is used to build alignment that showed low sequence identity (32%). When it was separated in two groups (IId1&IId2), identity increased significantly (54.9& 83.7%). Sub-group IId1 (4 sequences) with sequences upstream to WRKY domain having Plant Zinc cluster with motif 7and no zinc finger; subgroup IId2 (6 sequences) with 7, 8, 9 motifs and Zn finger. When four sequences of sub group IId1 were extended 50 amino acids towards N'terminal (total110 amino acid), sequence identity of sub groupIId increased to 70.68%. (2020) 10:373 | https://doi.org/10.1038/s41598-019-57232-x www.nature.com/scientificreports www.nature.com/scientificreports/ earlier reports on the induction of AtWRKY4 on senescence, AtWRKY40 on wounding, AtWRKYs 2, 28 &33 during salinity and AtWRKY33 under cold treatments. Few AtWRKYs whose functions were not assigned, like AtWRKYs 21, 24, 31, 38, 61, 69 and 71, were also designated putative function based on identity percentage.
In any biological process, understanding the role of transcription factor provides an insight into its regulatory mechanism. WRKY transcriptional factors have been extensively studied in a plant for plant growth, development, and response to biotic and abiotic stresses. However, WRKY genes present in Glycyrrhiza species have not been elucidated. In conclusion, we identified and characterized 147 full length putative WRKY genes in the genus Glycyrrhiza. These putative genes were grouped based on the number of WRKY domains & zinc finger pattern and further analysed for various properties like molecular weight, iso-electric point, instability index, sub-cellular location. The phylogenetic analysis categorised more than one group/sub-group together, indicating their multiple origins. The present paper highlights several findings not reported earlier, like the novel Zn finger motif of C-X 4 -CX 22 -HXH type (sub-group IIf). Also these group-II members shared homology with group IN WRKY members, unlike the other members of group II. This paper also reports several additional domains (DivIVA, SerS, Coat, Exo70 exo cyst complex subunit, Flac-arch super, PAT1and SGNH_hydrolase) apart from the conserved WRKY domain in the WRKY proteins. MSA based 60 amino acid signature sequence of group IId showed very low sequence identity (32%), however when its length was increased to 110 amino acid the identity increased to 70.7%. A closer look at the subgroup IId showed presence of 50 amino acid plant Zn cluster domain upstream to the WRKY domain in four members. However, characteristic Zn finger motif was absent in these members.
Additionally, putative functions were assigned to the identified GgWRKYs, based on STRING database which comprised of both theoretically reported and experimentally verified data. Verification of the data in the Lab Figure 6. Analysis of cis-regulatory elements (CREs) in GgWRKY promoter region. Total ten stress responsive elements were mapped on sense and anti-sense strand using RSAT tool.
Multiple sequence alignment, phylogenetic analysis and classification. The multiple sequence alignment (MSA) of 245 WRKY proteins was performed using 82 WRKY proteins of G. glabra (GgWRKY), 54 WRKY proteins from G. uralensis (GuWRKY), 70 from A. thaliana (AtWRKYs), 37 from P. patens (PpWRKYs) and one each from Human FLYWCH CRAa and GCMa. The protein sequences of Arabidopsis were downloaded from TAIR (http://www.Arabidopsis.org/), GuWRKYs from (http://ngs-data-archive.psc.riken.jp/Gur-genome/ download.pl.), PpWRKYs were obtained from P. patens v3.3 (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias = Org_Ppatens), Human FLYWCH CRAa (EAW85440) and GCMa (BAA13651) were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/protein/). The conserved regions of 60 amino acids for the WRKY proteins were searched using HMMScan and aligned using CLUSTALW for the construction of the phylogenetic tree. For the GgWRKY based phylogenetic tree, complete protein sequences were used. The tree was constructed using MEGA 7.0 with neighbor-joining method using JTT substitution model and pair-wise deletion method with 1000 bootstrap value. The 60 amino acid conserved region of MSA of G. glabra was visualised using DNAMAN. The   Table 5. AtWRKYs, their induction factor and experimentally verified responses in GgWRKYs.  carbon starvation and salinity, respectively. Actin was used as internal reference. Three biological replicates were used to calculate error bars using standard deviation. Asterisks indicate that the corresponding gene was significantly up-or down regulated in a given treatment (*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001).