Genome-wide Identification, Classification, Expression and Duplication Analysis of GRAS Family Genes in Juglans regia L.

Fifty-two GRAS genes are identified in walnut genome. Based on the evolutionary relationship and motif analysis, the walnut GRAS gene family was divided into eight subfamilies, and the sequence features analysis of JrGRAS proteins showed that the JrGRAS protein sequences were both conserved and altered during the evolutionary process. Gene duplication analysis indicated that seven GRAS genes in walnut have orthologous genes in other species, and five of them occurred duplicated events in walnut genome. Expression pattern analysis of the GRAS family genes in walnut showed that two JrGRAS genes (JrCIGRa-b and JrSCL28a) were differentially expressed between flower bud and leaf bud (p < 0.01), and two JrGRAS genes (JrCIGRa-b and JrSCL13b-d) were differentially expressed between the different development stages of flower buds transition (p < 0.01), besides, three hub genes (JrGAIa, JrSCL3f and JrSHRc) were identified by co-expression analysis, which suggested these GRAS genes may play an important role in regulating the development of apical meristem in walnut. This study laid a foundation for further understanding of the function of GRAS family genes in walnut.

walnut genome sequences 48 and transcriptome data of the walnut female flower buds and leaf buds, it is possible for us to identify all the GRAS family genes in walnut.
In this study, GRAS family genes in walnut have been identified in genome-wide. The phylogenetic relationship, sequence alignment, conserved motif composition and gene duplication of the JrGRAS genes were systematically analyzed, and their expression patterns in different tissues (flower bud and leaf bud) and different development stages (before, during, after the flower transition period) were explored using transcriptome data and validated by qRT-PCR experiments. Finally, protein-protein interactions analysis was conducted to investigate how they participate in diverse functions by interacting with other proteins. This research lay a foundation for further function investigations of GRAS genes in walnut.

Results
Identification of GRAS family members in walnut. A total of seventy protein sequences (include protein isoforms) encoded by fifty-two genes, which including the GRAS domain were identified as the walnut GRAS proteins for further analysis. Fifty-two GRAS genes locate in 44 scaffolds, and their start position and end position are shown in Table 1. The candidate GRAS members were then uploaded to the CD-search website (https://www. ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and their domain information were listed in Table 1, too. Besides, the gene structures of JrGRAS was presented in Fig. S1, and subcellular location information of the JrGRAS proteins was presented in Table S1.
Definition the sequence features of JrGRAS proteins. The GRAS proteins in walnut share a highly conserved C-terminal, which is constituted by five distinct conserved motifs in the following order: LHR I (leucine heptad repeat I), VHIID, LHR II (leucine heptad repeat II), PFYRE and SAW, while the N-terminal region of the sequences seems to be variable (Fig. 2).
The presence of leucine heptad repeats in the GRAS proteins suggests that these proteins may function as multimers and a potentially complicated higher order of interaction 1 . The VHIID sequence consists of valine, histidine, isoleucine and aspartic acid, which is not absolutely conserved although it can be readily recognizable (position: 214-218, Fig. 2). Besides, we noticed the VHIID motif, the P residues (position: 191, Fig. 2) are absolutely conserved in the VHIID motif. The PFYRE motif consists of the P(position: 342)-F(position: 363)-Y(position: 374)-R(position: 366)-E(position: 369) ( Fig. 2) residues, the P residues are absolutely conserved in PFYRE motif as well as in motif VHIID. The SAW motif is characterized by the residues S-A-W (position: 481-483, Fig. 2), the W(position: 472,483) residues are absolutely conserved in the other JrGRAS protein sequences, except the JrSLR1 which lack the SAW motif. And the absolute conservation of the residues in the VHIID and SAW motifs indicates that these residues could be necessary for the functions of the GRAS proteins.

Conserved motifs analyses.
All JrGRAS proteins were subjected to MEME website (http://meme-suite. org/tools/meme) 49 to identify conserved motifs (Fig. 3). Among the twenty Motifs, Motifs 10 and 4 consisted the LHR I domain, Motifs 1 and 8 consisted the VHIID domain, Motifs 6,9 and 17 or 6 and 20 consisted the LHR II domain, Motifs 7,3 and 19 consisted the PFYRE domain, and Motifs 2, 16 and 5 or 14 and 5 consisted the SAM domain (Fig. 3). Interesting, almost all JrGRAS protein include the complete GRAS motif model, which consists of LHR I, VHIID, LHR II, PFYRE, and SAM domain, and the five domains distribute in the same order, except JrSLR1, JrSCLa and JrSCLf.
Gene duplication in walnut genome. Gene duplication events were surveyed to explore the evolutionary patterns of the GRAS gene family in walnut genome (Fig. 4B). Physical locations of 52 JrGRAS genes in walnut were investigated by analysis of genomic distribution on scaffolds. Fifty-two JrGRAS genes were distributed unevenly across the 44 scaffolds in the walnut genome (Fig. 4B). Analysis of walnut GRAS family genes revealed seven paralogous gene pairs (JrGAIa&JrGAIb/JrRGL1c& JrSCl21e/JrSCL14b&JrSCL34a/JrSCL22b&JrSCL27/JrSCL28a&JrSCL28b-c/ JrSCL3c&JrSCL3d-e/JrSCL4a&JrSCL4b) existed in walnut GRAS family genes. Among the 14 GRAS paralogous www.nature.com/scientificreports www.nature.com/scientificreports/ genes, 5 of them were orthologous genes identified between species, which indicated they were involved in the duplication event in walnut genome. (The gene name with an underline means this gene was identified as the orthologous gene between different species and the '&' means connector between duplicated gene pairs).

Expression profiles of GRAS members.
We used the FPKM values of 52 JrGRAS genes to investigate the expression profiles of the JrGRAS family genes. Ten of the JrGRAS genes were excluded to draw the heatmap for their FPKM value were zero in both flower bud and leaf bud.
First, expression levels of JrGRAS genes in female flower bud and in leaf bud were compared (Fig. 5A). Three JrGRAS genes (JrSCL22a/JrGAIb/JrGAIc) were highly expressed in both flower bud and leaf bud, and four JrGRAS genes (JrSCL18/JrSCL32/JrRGL1d/JrPAT1a-d) were lowly expressed in both flower bud and leaf bud. Besides, two JrGRAS genes (JrCIGRa-b/JrSCL28a) were differentially expressed between flower bud and leaf bud (p < 0.01).

GO enrichment.
The GO enrichment analysis based on the 70 JrGRAS proteins annotated in the GO database. In the biological process category, significantly enriched terms were associated with biological regulation, cellular process, metabolic process, and response to stimulus. In the cellular component category, cell, cell part, and organelle were significantly enriched. In the molecular function category, GO terms related to binding and nucleic acid binding transcription factor activity were highly represented. Besides, GO: 003674 (molecular function) was the most GO term enriched by the JrGRAS members (Fig. S2). ) were used to conduct a qRT-PCR experiment (Fig. 7). The results were similar to those of our RNA-seq analysis and the DEGs were evidently differentially expressed among different tissues and development stages (P < 0.01). In leaf bud, JrCIGRa-b and JrSCL28a were all significantly up-regulated than that in flower bud (P < 0.01). As for flower bud transition periods, JrCIGRa-b and JrSCL13b-d were up-regulated in F_3 than that in F_1 and F_2 (P < 0.01). Among the DEGs, JrCIGRa-b differentially expressed in different tissues and different development period of flower buds, suggesting that this gene should work as the candidate gene for flower bud transition in walnut.  www.nature.com/scientificreports www.nature.com/scientificreports/ GRAS proteins are listed below them (Fig. 8). The result showed that several JrGRASs (such as JrGAIb/JrGAIc) were predicted to be core nodes in the network, which suggested that they might participate in diverse functions by interacting with other proteins.

Discussion
In general, analysis of whole genome location and evolution rely on the available information of species genome assembled in Chromosomes-level. However, the walnut genome was assembled only in scaffold-level, and there is no access to the information of walnut Chromosomes until now. In this article, the 44 scaffolds which including the 52 JrGRAS genes were used to represent the walnut genome in the synteny and gene duplication analysis, and this may provide a new insight to the analysis of whole genome evolution for the species whose genome assembled in scaffold-level.

Evolution of divergence and conservation.
Divergence and conservation always come together with the process of species evolution. Phylogenetic analysis divided the JrGRAS family into eight subgroups based on the evolutionary relationship, and each subgroup always function differently (Fig. 1). However, sequence alignment www.nature.com/scientificreports www.nature.com/scientificreports/ indicated that it was high conserved for the distribution of five motifs (LHR I, VHIID, LHR II, PFYRE, SAW motif) in JrGRAS family members, and the order of these motifs within each protein is the same (Figs 2 and 3). Besides, in VHIID and SAW motifs, the absolute conserved residues suggested that these residues could be necessary for the activity of the GRAS proteins (Fig. 2).
The duplication of GRAS genes between species and in walnut genome. Gene duplication between species indicated that Arabidopsis, walnut, and grape share the same seven ancestral GRAS genes. The number of orthologous genes of GRAS family genes in the three species showed a ratio of 7:21:21 (Arabidopsis: walnut: grape), which suggest a triplication event could occur in the GRAS family gene of walnut and grape. These caused us to further investigate the expansion of GRAS family gene in the walnut genome.
However, duplication analysis in walnut genome indicated that the triplicated speculation was invalid. Besides, duplicate genes face fates as follow: non-functionalization, neo-functionalization (evolving novel functions), or sub-functionalization (partition of gene functions) 51 . The seven orthologous GRAS genes (JrGAIb/JrS CL22b/JrSCL28b-c/JrSCL15/JrSCL14b/JrSCL9/JrSCL28a) occurred gene duplication event with function divergent in walnut genome, five of them duplicated with their pair genes (JrGAIb&JrGAIa/JrSCL22b&JrSCL27/JrSCL 28b-c&JrSCL28a/JrSCL14b&JrSCL34a) still belong to the GRAS gene family. As for one of the five duplicated gene pairs mentioned above, (JrSCL28b-c&JrSCL28a), it seems that both of JrSCL28b-c and JrSCL28a come from the orthologous GRAS genes and this suggests that the duplication of them could be earlier than that in the other four Expression and function analysis of JrGRAS genes. JrCIGRa-b and JrSCL28a were identified to have a lower expression level in flower bud than that in leaf bud, which suggested these JrGRAS genes may negatively control the flower buds transition. And expression levels of JrCIGRa-b and JrSCL13b-d were detected up-regulated after flower buds transition (F_3) compared to that in (i) before the flower buds transition (F_1) and (ii) during the flower buds transition (F_2), which indicated that these JrGRAS genes may positively participate in the regulation of walnut flower organs development. Besides, three hub JrGRAS genes (JrSCL3f/JrGAIa/JrSHRc) were predicted by co-expression analysis, which suggested that they may involve in the regulation network of walnut flower buds transition, too.
Functional analysis of the JrGRAS proteins seems to accord with the result of expression analysis. The GRAS domains are interacting with other domains identified by forming the heterodimer or homodimer structure. Up to now, two models of the GRAS domain interacting with other domains have been reported: (i) SHR-SCR heterodimeric structure; (ii) the homodimeric structure of the SCL7 GRAS domain 4 . And in this study, the SCR proteins (JrSCLa/b/c/d/e/f) were predicted to interact with the SHR proteins (JrSHRa/b/c) (Fig. 8), which consist www.nature.com/scientificreports www.nature.com/scientificreports/ with the SHR-SCR heterodimeric structure model. Besides, protein-protein interaction analysis showed that three hub JrGRASs (JrSCL3f/JrGAIa/JrSHRc) identified by expression analysis also have many interaction partners in the JrGRAS protein-protein interaction network (Fig. 8), these results illustrate how JrGRAS family proteins might form functional complexes, mediating the expression of flower bud transition genes in walnut.
Importantly, the LAS subfamily is involved and necessary in the growth regulation of the meristem formation 41,43,44 . A differentially expressed JrGRAS gene (JrSCL28a) in the LAS subfamily was found expressed both in leaf bud and flower bud, however, its expression level in leaf bud was significantly higher than that in www.nature.com/scientificreports www.nature.com/scientificreports/ flower bud (P < 0.01), the mechanism is still unclear. PAT1 is involved in phytochrome A signal transduction in Arabidopsis 35 . In this study, two DEGs (JrCIGRa-b and JrSCL13b-d, P < 0.01), identified (i) before, (ii) during and (iii) after flower bud transition (F_1, F_2 and F_3), were classified into the PAT subfamily, which indicated light signaling via the phytochrome A photoreceptor controls basic plant developmental processes, including flower bud development. Recently, a single walnut GRAS gene, JrGRAS2 (LOC108996381, JrSCL23b, belongs to the SCL subfamily in this article), was reportedly involved in high-temperature stress tolerance 40 , which offer new insights to the functional diversity of walnut GRAS family members.
In summary, our work laid a foundation for future function investigation of the GRAS members in walnut and provides valuable information about the gene functions of GRAS family in the development of walnut flower bud transition.

Methods
Identification of GRAS family members in walnut. The latest protein sequences file (GCF_001411555.1_wgs.5d_protein.faa) of walnut genome was downloaded from the NCBI website (ftp://ftp. ncbi.nlm.nih.gov/genomes/all/GCF/001/411/555/GCF_001411555.1_wgs.5d/GCF_001411555.1_wgs.5d_protein.faa.gz). The hmm model of GRAS domain was constructed based on the PF03514 (PFAM website, http:// pfam.xfam.org/family/pf03514) by the hmmbuild program HMMER 3.2 52 . Then, we used the hmm model mentioned above to search against the protein databases of walnut genomes with the hmmsearch program in HMMER 3.2 52 , the E-value cutoff was 1e−10. The candidate GRAS members were then uploaded to the CD-search website (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) to further confirm if they include the proper GRAS domains (sequences included GRAS domain and length of domain sequences was more than 150aa). Gene structures of the JrGRAS genes were drawn by the Biosequence Structure Illustrator program of the TBtools software 53 . Subcellular location information of the JrGRAS proteins was predicted by online software WoLF PSORT II (https://www.genscript.com/wolf-psort.html?src=leftbar).

Multiple alignments and phylogenetic analyses. The domain sequences of the GRAS proteins in
Arabidopsis, walnut and grape were downloaded from the Plant Transcription Factor Database (http://planttfdb. cbi.pku.edu.cn/) and aligned using Clustal X 2.1 54 . Then these sequences were used to conduct phylogenetic analyses using MEGA 6 software 55 with 1000 bootstrap replicates. Motifs in the JrGRAS family members were identified by MEME program (http://meme-suite.org/tools/meme) 49 with a maximum of 20 motifs shown in the result. www.nature.com/scientificreports www.nature.com/scientificreports/ Synteny and gene duplication analysis. Analysis of gene duplication events using MCScanX toolkit 56 , paralogous genes in walnut genome were identified by the duplicate_gene_classifier program with the default parameters of the MCScanX toolkit, and orthologous genes between species were identified by the detect_col-linear_tandem_arrays program with the default parameters of the MCScanX toolkit 56 . The genome sequences files and annotation files of Arabidopsis (RefSeq assembly accession: GCF_000001735.2, ftp://ftp.ncbi.nlm. nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1), walnut (RefSeq assembly accession: GCF_001411555.1) and grape (RefSeq assembly accession: GCF_000003745.3, ftp://ftp.ncbi.nlm.nih.gov/ genomes/all/GCF/000/003/745/GCF_000003745.3_12X/) were downloaded from NCBI website (https://www. ncbi.nlm.nih.gov). The circle map of syntenic analysis maps in walnut genome was constructed by TBtools software 53 . Because of the walnut genome was assembled only in scaffold-level, the 44 scaffolds which including the 52 JrGRAS genes were used to represent the walnut genome in the synteny and gene duplication analysis.
Expression analysis of GRAS members. Transcriptome sequencing and library construction were reported in our previous study 57 . Expression analysis of walnut GRAS members was evaluated using the walnut RNA-sequence data among different tissues (leaf bud and female flower bud), development stages (F_1, F_2, F_3). The FPKM values were normalized with the treatment of log10(FPKM), and the results were then used to generate heatmap using the HemI software 58 .
RNA isolation and qRT-PCR analysis. The female flower buds were collected before, during, and after flower transition (F_1, F_2 and F_3), and leaf buds were collected during the floral transition period. The Leaf buds and female flower buds (F_1, F_2 and F_3) were collected and immediately frozen in liquid nitrogen. Total RNA was extracted with RNAout 1.0 (Tianenze, China) as described by the manufacturer and cDNA was reversed reverse-transcribed using the PrimeScript RT Reagent Kit (Takara, China). The real-time PCR analysis www.nature.com/scientificreports www.nature.com/scientificreports/ was performed using CFX Manager (Bio-Rad, USA) with SYBR Green mixture (Toyobo, Japan), and the walnut actin gene and walnut gadph gene were used for normalization, the amplification was applied using the cycling parameter as described previously 45 . The results were evaluated by the 2 −ΔCt method according to Livak and Schmittgen 59 . GO enrichment. The Blast2GO 60-63 software was employed to perform the GO annotation. First, protein sequences of the JrGRAS were used to perform the blastp search against the Swissport database with the E-Value of 1E-05, number of blast hits was 5. Then the result was conducted a GO mapping, and after that the GO annotation program was used to get the GO annotation of the JrGRAS members. Finally, the GO enrichment analysis was conducted by the online GO enrichment program on the omicshare website (https://www.omicshare.com/ tools/Home/Soft/gogsea).
Interaction network of JrGRAS proteins. The blastp program was used between the walnut GRAS proteins and the Arabidopsis GRAS proteins, each walnut GRAS protein matched a homologous Arabidopsis GRAS protein with the highest score (Table S7). Thirty-three Arabidopsis GRAS proteins which represent the 70 walnut GRAS proteins were uploaded to the String website (https://string-db.org/) 64 to predict protein interactions. Except the 33 input proteins, five predicted functional partners of the input proteins were used to construct the network. The walnut GRAS proteins corresponded with the Arabidopsis GRAS proteins are listed below them. The online program ran with default parameters.