Genome-wide identification of WRKY transcription factor family members in Miscanthus sinensis (Miscanthus sinensis Anderss)

Miscanthus is an emerging sustainable bioenergy crop whose growing environment is subject to many abiotic and biotic stresses. WRKY transcription factors play an important role in plant growth and in the response to biotic and abiotic stress. To clarify the distribution and expression of the WRKY genes in Miscanthus, it is necessary to classify and phylogenetically analyze them. The v7.1 genome assembly of Miscanthus was analyzed by constructing an evolutionary tree. In total, 179 WRKY genes were identified in Miscanthus. The 179 MsWRKYs were classified into three groups with conserved gene structure and motif composition. The tissue expression profile of the WRKY genes showed that MsWRKY genes play an essential role at all growth stages of the plant. At the early stage of plant development, MsWRKY genes are mainly expressed in the rhizome. In the middle stage, they are mainly expressed in the leaf, and at the end stage, mainly in the stem. The results show significant differences in MsWRKY expression among the developmental stages of Miscanthus sinensis. The results of the study contribute to a better understanding of the role of the MsWRKY genes in the growth and development of Miscanthus.

Constructing a phylogenetic tree to classify the MsWRKY genes required Sorghum bicolor WRKY amino acid sequences as a reference. The data were obtained from The Arabidopsis Information Resource (TAIR) (https://www.arabidopsis.org/) and were used together with our MsWRKY sequences. SbWRKY and MsWRKY protein sequences were aligned with ClustalW in MEGA v7.0 (https://www.megasoftware.net/), and the multiple sequence alignment was used to construct the phylogenetic tree. The Neighbor-Joining method with the p-distance model, pairwise deletion and 1000 bootstrap replicates was used 24. From the resulting phylogenetic tree of SbWRKY and MsWRKY sequences, the previously unclassified MsWRKY genes could be divided into groups and subgroups. By using the sequence alignment data and the phylogenetic tree, the putative Miscanthus WRKY orthologs in Arabidopsis can be identified 5.
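The p-distance with pairwise deletion can be illustrated with a minimal Python sketch. This is not the MEGA implementation: the function name `p_distance` and the choice of `-` and `?` as missing-data symbols are assumptions made for illustration. For each pair of aligned sequences, only columns where both sequences have a residue are compared, and the distance is the proportion of those columns that differ.

```python
def p_distance(seq_a, seq_b, missing="-?"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is skipped only for this pair when
    either sequence has a gap or missing-data symbol there.
    The symbols in `missing` are an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue  # column removed for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict {name: aligned_sequence}."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

A matrix of this kind is the input that a neighbor-joining procedure clusters into a tree; bootstrap support is then obtained by resampling alignment columns and repeating the procedure.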

Gene structure analysis and conserved motif distribution analysis of MsWRKY genes
The genomic sequence and coding sequence (CDS) of each MsWRKY gene were used to predict its gene structure. The exon-intron structure of MsWRKYs was analyzed using TBtools 25.

Gene ontology annotation and analysis of cis-acting elements of MsWRKY genes
For the gene ontology (GO) annotation analysis of the obtained MsWRKY proteins, eggNOG-mapper 2.1.12 (http://eggnog-mapper.embl.de/) 27 was used. TBtools was then used to map and annotate the obtained data. Ultimately, the biological processes, molecular functions and cellular components of these proteins were obtained.
To identify the cis-acting elements of MsWRKY genes, the 2000 bp region upstream of each MsWRKY gene was analyzed with the online tool PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/).
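Extracting the 2000 bp upstream region depends on gene orientation, which the following Python sketch illustrates. It is not the procedure used in this study: the function names and the 1-based inclusive coordinate convention (as in GFF annotation files) are assumptions. For a gene on the minus strand, the promoter lies after the gene end on the reference sequence and is reverse-complemented.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N; case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=2000):
    """Return up to `length` bp upstream of a gene, read 5'->3' on its strand.

    Coordinates are assumed to be 1-based and inclusive. The region is
    truncated at the chromosome ends rather than padded.
    """
    if strand == "+":
        end = gene_start - 1              # 0-based exclusive end
        begin = max(0, end - length)
        return chrom_seq[begin:end]
    if strand == "-":
        begin = gene_end                  # first base after the gene
        end = min(len(chrom_seq), begin + length)
        return reverse_complement(chrom_seq[begin:end])
    raise ValueError("strand must be '+' or '-'")
```

The resulting sequences, one per gene, are the kind of input that a promoter-element scanner such as PlantCARE takes.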

Synteny analysis of MsWRKY genes
The Multiple Collinearity Scan toolkit (MCScanX) was used with the default parameters to examine gene duplication events, with TBtools as the platform for the analysis. The E-value of the BLASTP search was 15. To explore the syntenic relationships of the WRKY genes obtained from Miscanthus and other selected species, syntenic analysis maps were constructed using MCScanX.

Chromosome mapping and classification and phylogenetic analysis of MsWRKY genes
The locations of the 179 MsWRKY genes were determined by MG2C v2.1 (Fig. 1). MsWRKY genes were distributed on all 19 Miscanthus chromosomes (Chr). In the figure, chromosomes are numbered Chr01-Chr19, and the name and position of each MsWRKY are indicated. Chr 5 had the highest number of MsWRKYs, with 23, representing 12.9% of the total gene family. It was followed by 18 genes on Chr 6, 14 on Chr 16, and 13 on Chr 17. Chromosomes 9, 12, and 13 had only four MsWRKYs each, the fewest.
To study the evolution of MsWRKY family members, an unrooted phylogenetic tree was constructed by multiple sequence alignment and the neighbor-joining method in MEGA7.0. The data were the full-length protein sequences of 40 SbWRKYs and 179 MsWRKYs. The SbWRKYs were used as the basis for grouping. They were from Sorghum bicolor (L.) Moench, a Poaceae plant closely related to Miscanthus sinensis.
The 179 MsWRKYs could be divided into three major groups (I, II, and III) according to the constructed phylogenetic tree (Fig. 2). All 24 MsWRKYs in group I had two WRKYGQK motifs, and 23 of them had two C2H2-type zinc finger motifs (C-X3-4-C-X22-23-H-X1-H), corresponding to two complete WRKY domains. Although the protein encoded by MsWRKY09 had only one zinc finger motif, it belonged to group I on the phylogenetic tree.
Group II had a total of 97 protein sequences and was the largest group, accounting for 54.2% of all putative MsWRKYs. Similar findings have been reported in sorghum, cucumber and chickpea. Most of these proteins had one WRKY domain and the C2H2-type zinc finger motif (C-X4-5-C-X23-H-X1-H). This group was further divided into five subgroups, IIa, IIb, IIc, IId, and IIe, with 8, 16, 32, 21, and 20 members, respectively. Fifty-eight proteins belonged to group III. The proteins in this group had one WRKY domain and the C2HC-type zinc finger motif (C-X7-C-X23-H-X1-C) 28. In summary, the classification of MsWRKYs indicated the diversity of these proteins, which could perform an extremely wide range of functions.
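The domain signatures described above can be expressed as regular expressions, as in the following Python sketch. It is a simplified rule-based illustration, not the classification used in this study, which relied on the phylogenetic tree. The function name `classify_wrky` is hypothetical, only the canonical WRKYGQK heptapeptide is counted (variant heptapeptides are ignored), and subgroups IIa-IIe cannot be distinguished this way.

```python
import re

# Patterns transcribed from the motif descriptions in the text;
# "." stands for any residue (X).
HEPTAPEPTIDE = re.compile(r"WRKYGQK")
C2H2 = re.compile(r"C.{3,5}C.{22,23}H.H")   # C-X3-5-C-X22-23-H-X1-H
C2HC = re.compile(r"C.{7}C.{23}H.C")        # C-X7-C-X23-H-X1-C


def classify_wrky(protein):
    """Assign a protein sequence to group I, II or III by simple rules.

    Two heptapeptides -> group I; one heptapeptide with a C2HC zinc
    finger -> group III; one heptapeptide with a C2H2 zinc finger ->
    group II; anything else is left unclassified.
    """
    protein = protein.upper()
    n_domains = len(HEPTAPEPTIDE.findall(protein))
    if n_domains >= 2:
        return "I"
    if n_domains == 1:
        if C2HC.search(protein):
            return "III"
        if C2H2.search(protein):
            return "II"
    return "unclassified"
```

Proteins such as MsWRKY09, which has a single zinc finger motif but clusters with group I, show why such rules can only complement a phylogeny-based grouping.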

Gene structure analysis and conserved motif distribution analysis of MsWRKY genes
The exon-intron structures of MsWRKY family members can illustrate the evolution of the family. The number of introns in MsWRKY genes ranged from zero to five, and intron size varied. Genes within the same group had similar exon-intron distribution patterns. These results suggest that the MsWRKY genes have considerable structural diversity, which may reflect functional diversity among closely related MsWRKYs (Fig. 3). MEME (version 5.5.3) was used to analyze the conserved motifs of all MsWRKY protein sequences and identified 20 distinct conserved motifs. Their distribution in the different groups of MsWRKYs is shown in Fig. 4. Motif 1 is the WRKY domain. Genes in the same group or subgroup had similar motif structures. Motifs 1, 2, 3, and 4 were found in almost all genes. Motifs 15 and 19 were unique to group I. Motifs 9 and 13 were unique to group IIb. Motif 12 was unique to group IIe. Motifs 10, 16, and 18 were unique to group III. Some motifs were shared by different groups: motif 5 was shared by groups I and IIc, and motifs 6 and 7 were shared by groups IIa and IIb.
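Group-specific motifs of this kind can be derived from a motif table with simple set operations. The Python sketch below assumes two hypothetical inputs, a mapping from gene to group and a mapping from gene to the set of motif numbers found in its protein; it does not parse MEME output, and the function name is illustrative.

```python
from collections import defaultdict


def group_specific_motifs(gene_groups, gene_motifs):
    """Return {group: [motifs found only in genes of that group]}.

    gene_groups: dict mapping gene name -> group label
    gene_motifs: dict mapping gene name -> set of motif identifiers
    """
    groups_by_motif = defaultdict(set)
    for gene, motifs in gene_motifs.items():
        for motif in motifs:
            groups_by_motif[motif].add(gene_groups[gene])

    specific = defaultdict(list)
    for motif in sorted(groups_by_motif):
        groups = groups_by_motif[motif]
        if len(groups) == 1:                  # motif seen in one group only
            specific[next(iter(groups))].append(motif)
    return dict(specific)
```

Motifs that map to exactly two groups can be listed in the same way to recover shared motifs.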

Gene ontology annotation and analysis of cis-acting elements of MsWRKY genes
Blast2GO was used to analyze the gene ontology (GO) annotations of the 179 MsWRKY proteins. GO annotations fall into three main categories: biological processes, molecular functions, and cellular components. The involvement of MsWRKYs in each category, as determined by enrichment analysis, is shown in Fig. 5. Most MsWRKYs are involved in regulating cellular processes, biosynthetic processes, and different metabolic processes. Further analysis showed that most MsWRKYs were involved in the plant's response to external stress. Many MsWRKYs have been linked to bacterial infections and environmental stress 29. Firstly, many transcription-related cis-acting elements were found, including TATA-box, CAAT-box, A-box, HD-Zip, and W-box. Stress-responsive elements formed an important part of the cis-acting elements, which suggests that MsWRKYs play an important role in plant resistance to external stress. These elements include MBS (drought stress), LTR (low-temperature stress), and WUN-motif (wound-responsive element). In addition, many of the cis-acting elements were phytohormone-responsive. The ABA-responsive element (ABRE) is regulated by ABA. The methyl jasmonate (MeJA) responsive elements (TGACG-motif and CGTCA-motif) are regulated by jasmonate phytohormones 30. There are also auxin-responsive elements (AuxRR-core and TGA-element), salicylic acid-responsive elements (TCA-element), and others. Finally, there are many light-responsive elements and other regulatory elements. Every MsWRKY gene carries at least one stress-responsive element, which reflects the potential functional variation of the MsWRKY genes.

Synteny analysis of MsWRKY genes
The segmental duplication events occurring in the Miscanthus WRKY family were investigated by conducting a synteny analysis of the MsWRKY genes using BLASTP and MCScanX in TBtools. As shown in Fig. 6, 19 segmental duplication events involving 53 WRKY genes were observed. Tandem duplication events, defined as a chromosomal region within 200 kb containing two or more genes, were widely identified for Miscanthus WRKY genes. A very large tandem duplication event was observed on chromosome 14. These results suggest that some MsWRKYs were possibly generated by segmental duplication events and that the evolution of MsWRKY genes may have been driven, at least in part, by segmental duplication events.
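The 200 kb criterion for tandem duplication can be sketched in Python as follows. This is one possible reading of the definition, not the MCScanX procedure: family members on the same chromosome are chained into a cluster while the gap between one gene's end and the next gene's start is at most 200 kb. The function name, the tuple layout, and the gene names in the usage example are hypothetical.

```python
from collections import defaultdict


def tandem_clusters(genes, max_gap=200_000):
    """Group family members lying within `max_gap` bp of each other.

    genes: iterable of (name, chromosome, start, end) tuples.
    Returns a list of (chromosome, [gene names]) with >= 2 genes each.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, prev_end = [], 0
        for start, end, name in sorted(by_chrom[chrom]):
            if current and start - prev_end <= max_gap:
                current.append(name)          # extend the open cluster
                prev_end = max(prev_end, end)
            else:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current, prev_end = [name], end
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Hypothetical coordinates, for illustration only.
example = [
    ("geneA", "Chr14", 1_000, 3_000),
    ("geneB", "Chr14", 150_000, 152_000),
    ("geneC", "Chr14", 900_000, 902_000),
    ("geneD", "Chr05", 10, 500),
]
print(tandem_clusters(example))
```

Chaining adjacent genes allows a cluster to span more than 200 kb in total, so a stricter reading of the definition would cap the overall cluster span instead.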
The phylogenetic mechanisms of the Miscanthus WRKY family were further explored by constructing comparative syntenic maps of Miscanthus associated with four representative species, including two dicots (Arabidopsis and cucumber) and two monocots (sorghum and maize) (Fig. 7). A total of 249 WRKY collinear gene pairs between Miscanthus and maize were identified, followed by Miscanthus and sorghum (231), Miscanthus and cucumber (28), and Miscanthus and Arabidopsis (27). Both Miscanthus and maize belong to the Poaceae family, and more than 75.4% of the MsWRKY genes showed a syntenic relationship with WRKYs in maize. Some MsWRKY genes were associated with more than one syntenic gene pair, indicating that WRKY genes in the Poaceae family have gone through multiple rounds of duplication events. This may be the reason why monocotyledonous plants have far more WRKY genes than dicotyledonous plants. Importantly, collinear gene pairs involving MsWRKY09/11/60/64/83/85/112/179 were observed between Miscanthus and all of the other four species, suggesting that these orthologous pairs may have formed before the divergence of dicotyledonous and monocotyledonous plants.

Digital expression analysis of MsWRKY genes at different seasons and in different tissues
The temporal and spatial expression profiles of MsWRKY genes were studied using the microarray data provided by the JGI database, and the results were presented as heatmaps using TBtools. The microarray datasets contained gene expression data for M. sinensis from a total of 22 samples, taken from leaf (7), rhizome (9) and stem (6). The samples were collected from plants at different times of the year and reflect the expression of the MsWRKY genes at different stages of plant growth. In total, 175 of the 179 genes showed differential expression. Most MsWRKY genes are highly expressed in rhizome. The expression patterns of MsWRKY in different growth and development stages were also analyzed. First, MsWRKY genes are expressed in large quantities in the rhizome during plant growth. Then, some MsWRKY genes are heavily expressed in the leaf. Eventually, some MsWRKY genes are highly expressed in the stem when plants wither. The results suggest that these genes may be involved in stress response at sensitive developmental stages to improve plant tolerance (Fig. 8). Analyzing the gene expression heatmap together with the cis-acting elements shows that genes activated in different periods have different characteristics. In the early stage of plant development, the expression of MsWRKY genes is mostly controlled by plant hormones and light regulatory elements. In the middle stage of plant development, the MsWRKY genes expressed in leaves are regulated more by infection and injury. At the end of plant development, the MsWRKY genes expressed were mostly regulated by ABA and jasmonic acid, and some responded to environmental stresses such as drought. This suggests that the MsWRKY genes play an important role in the growth of the perennial plant Miscanthus sinensis.

Discussion
WRKY transcription factors (TFs) are widely distributed in the plant kingdom and play a crucial role in stress tolerance. The WRKY gene family comprises 66 genes in Arabidopsis, 119 in maize, 94 in sorghum, 79 in potato, 70 in chickpea, and 61 in cucumber. By analyzing the genome assembly of Miscanthus sinensis, 179 WRKY genes were identified. As Miscanthus sinensis is a paleotetraploid 31, its number of WRKY genes is much higher than in most other plants, but this has not received much attention in previous studies. The MsWRKY genes are distributed across all 19 chromosomes of Miscanthus, and we have identified MsWRKY92, located on chromosome 7, as a gene that promotes flowering 32. Conserved WRKY domains bind to the W-box motif in the promoter of WRKY target genes, which is the most important feature of the WRKY family 33,34. We performed a phylogenetic analysis of all the obtained MsWRKY genes and classified them into groups I, II, and III based on the number of WRKY domains and the type of zinc finger motif. Group II is further subdivided into five subgroups: IIa, IIb, IIc, IId, and IIe. Group I had 24 MsWRKY genes, group II had 97, and group III had 58. Within group II, subgroup IIc had the most MsWRKY genes, with 32. The proportions of these genes are similar to those found in other plants 35-37.
Most MsWRKY genes have a highly conserved WRKYGQK motif. However, variant sequences were found in many genes (MsWRKY07, MsWRKY12, MsWRKY17, MsWRKY24, MsWRKY26, MsWRKY34, MsWRKY40, MsWRKY46, MsWRKY51, MsWRKY58, MsWRKY67, MsWRKY71, MsWRKY103, MsWRKY104, MsWRKY105, MsWRKY108, MsWRKY109, MsWRKY124, MsWRKY137, MsWRKY138, MsWRKY145, MsWRKY149, MsWRKY152, MsWRKY158, MsWRKY161, MsWRKY164, and MsWRKY173). Furthermore, some genes that are clearly WRKY genes lack this sequence (MsWRKY44, MsWRKY65, and MsWRKY106). These differences can seriously affect the ability of MsWRKY proteins to bind to W-box elements, which in turn affects the biological function of these proteins. Similar heptapeptide motif variations have been found in other plants, such as sorghum 4. In soybeans, for example, two WRKY genes with WRKYGKK motif do not bind to normal sequences from RNA into the genome), duplication of existing intron-free genes, and horizontal gene transfer 39. Differences in the intron size of MsWRKY genes may result from gene duplication, inversion, and/or fusion events 40. In conclusion, the diverse exon-intron structure of MsWRKY genes reflects the evolutionary diversity of the MsWRKY gene family.
Motif analysis of the MsWRKY gene family reveals both structural conservation and diversity. Motifs 1, 2, 3, and 4 correspond to WRKY domains and zinc finger domains, and they are found in most MsWRKY genes. Although the functions of most motifs are unclear, their distribution follows certain patterns. Motifs 15 and 19 were unique to group I. Motifs 9 and 13 were unique to group IIb. Motif 12 was unique to group IIe. Motifs 10, 16, and 18 were unique to group III. The motifs shared by different groups included motif 5, shared by groups I and IIc, and motifs 6 and 7, shared by groups IIa and IIb. Motif 8 is a nuclear localization signal (NLS), found mainly in groups IId, IIe, and III 41. In conclusion, motif analysis clearly demonstrates the structural differences between genes in the different MsWRKY groups. Motifs specific to a group may indicate that its genes participate in specific biological processes and have similar biological functions.
Studying the cis-acting elements of MsWRKY genes can provide more information about the expression of the MsWRKY gene family. Firstly, many transcription-related cis-acting elements 42, including TATA-box, CAAT-box, A-box and HD-Zip, are essential for gene expression, and most are involved in constructing transcription complexes. In addition, there is a set of cis-acting elements regulated by phytohormones. The ABA-responsive element (ABRE) is regulated by ABA. The methyl jasmonate (MeJA) responsive elements (TGACG-motif and CGTCA-motif) are regulated by jasmonate phytohormones. There are also auxin-responsive elements (AuxRR-core and TGA-element), salicylic acid-responsive elements (TCA-element), and so on. A variety of biotic and abiotic stresses also regulate these genes. The corresponding cis-acting elements include MBS (drought stress), LTR (low-temperature stress), and WUN-motif (wound-responsive element). At the same time, the W-box was also found in the promoter region of many MsWRKY genes. This suggests that there is also mutual regulation

Conclusion
In this study, 179 WRKY genes were identified from Miscanthus sinensis. Identification, chromosome mapping, classification, phylogenetic analysis, gene structure analysis, conserved motif distribution analysis, gene ontology annotation, analysis of cis-acting elements, and digital expression pattern analysis were performed. Digital expression pattern analysis revealed the specific expression of MsWRKY genes in different developmental stages and different parts of the plant. At the same time, some MsWRKY genes may play an important role in plant stress resistance. In conclusion, this study provides a basis for further research on the important functions of the WRKY genes in response to abiotic and biotic stresses.

Figure 2. Phylogenetic tree of WRKY members in Miscanthus and Sorghum. All MsWRKY genes were divided into groups I, II, and III, and group II was further divided into subgroups IIa, IIb, IIc, IId, and IIe.

Figure 3. Exon-intron structures of MsWRKY genes. The exon-intron structures were obtained by gene structure analysis with TBtools. Green bars indicate upstream and downstream UTRs, yellow bars indicate coding sequences (CDS), and black lines indicate introns in the gene diagrams.

Figure 6. Schematic representations for the interchromosomal relationships of MsWRKYs. Blue lines show duplicated WRKY gene pairs in the Miscanthus genome.

Figure 8. Heatmaps of MsWRKY gene expression.MsWRKY expression levels in different tissues and at different seasons.

Gene name Gene locus ID Chromosome location Gene start Gene end pI MW Conserved heptapeptide Zinc finger type Domain number Group Protein length (aa)
The main cellular component terms for this protein family are organelle and intracellular organelle. Most of the MsWRKY proteins are located in the cell nucleus. Cis-acting elements are sequences essential for the regulation of gene expression by transcription factors. The online tool PlantCARE was used to analyze the cis-acting elements in the 2000 bp promoter regions upstream of all MsWRKY genes. The results show that every MsWRKY gene has many cis-acting elements.

Table 1. Characteristics of the identified MsWRKY genes.