Introduction

Transcription factors (TFs) are a class of proteins that control the functioning of various genes by binding to their promoters and are thus involved in the gene regulation process1. Previously, more than 320K TFs belonging to 58 transcription factor families have been reported from 165 different plant species. Of these, GRAS is one of the major families involved in plant growth, development, cell signaling, and stress tolerance2. The GRAS family was first reported in bacteria and is characterized by three TFs, i.e. (i) GAI (gibberellic-acid insensitive), (ii) RGA (repressor of GAI), and (iii) SCR (scarecrow), with sizes ranging from 400 to 770 amino acids3,4. GAI and RGA are DELLA proteins that take part in gibberellin (GA) and jasmonate (JA) responses as well as light signaling. Likewise, SCR and Short Root (SHR) play a key role in the radial organization of the root by forming the SCR/SHR complex5.

It has also been reported that the C-terminal region of GRAS members is highly conserved, while the N-terminal region is highly divergent, which might provide specificity to each protein6. Furthermore, evolutionary analysis suggested horizontal gene transfer (HGT) from bacteria to plants. The GRAS gene family is further categorized into subfamilies: eight in Arabidopsis and rice7, and 8 to 13 in tomato, poplar, castor bean, and other species8,9,10. Recently, Cenci et al. classified the GRAS family into 17 subfamilies, including DELLA, Lateral suppressor (LS), Hairy meristem (HAM), and SCR11. It is one of the most widely explored transcription factor families across plant species, including tomato, potato, buckwheat, and sweet orange12,13,14.

Bottle gourd (Lagenaria siceraria), a member of the Cucurbitaceae family, is commonly cultivated in tropical and subtropical regions and is believed to have originated in southern Africa15. It is a diploid species (2n = 2× = 22) with 22 chromosomes and belongs to the genus Lagenaria. In 2017, the first draft genome of Lagenaria siceraria cultivar USVL1VR-Ls was reported, with a total of 22,472 genes covering a 313.4 Mb region of the genome. Genome-wide studies, such as the identification of graft-responsive mRNAs and miRNAs, have previously been carried out in bottle gourd16,17. However, an in-depth genome-wide analysis of the GRAS transcription factor family is still lacking for bottle gourd. Therefore, considering the use of bottle gourd as an important rootstock material and the role of GRAS transcription factors in plant growth and development, we performed a comprehensive genome-wide search of the bottle gourd genome to identify the GRAS TFs, their phylogenetic relationships, and their expression patterns in different plant tissues.

Material and methods

The genome and proteome of bottle gourd (Lagenaria siceraria cultivar USVL1VR-Ls) were downloaded from the Cucurbit Genomics Database (CuGenDB, https://cucurbitgenomics.org/)15,18. The hidden Markov model (HMM) profile of the GRAS TF (PF03514.14) was obtained from the Pfam database for use with the HMMER software19,20. We used PfamScan, InterProScan, and hmmscan to identify the GRAS transcription factors [Suppl. Figure-S1]20,21. PfamScan was used to search the complete proteome against the HMM profile at an e-value and domE cut-off of 1e−03. Similarly, hmmscan was used with an incdomE cut-off of 1e−03. InterProScan was run against the complete Pfam database, and hits containing the GRAS domain (PF03514) were then retained22. The SMART tool (https://smart.embl-heidelberg.de/) was used to further confirm the presence of the GRAS domain in the respective hits23.
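The E-value filtering and the comparison of hits across tools can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the pipeline used in the study: it assumes hmmscan output in the `--domtblout` layout (profile accession in column 2, query protein in column 4, independent domain E-value in column 13), and the function names are hypothetical. Taking the intersection of hit sets is only one possible way to reconcile the tools; the study itself relied on confirmation with SMART.

```python
from typing import Iterable, Set

GRAS_ACCESSION = "PF03514"  # Pfam accession of the GRAS domain (from the text)
CUTOFF = 1e-3               # E-value cut-off reported in the text


def gras_hits_from_domtblout(lines: Iterable[str], cutoff: float = CUTOFF) -> Set[str]:
    """Return query protein IDs with a GRAS domain hit at or below the cut-off.

    Assumes the hmmscan --domtblout column layout described in the lead-in.
    """
    hits: Set[str] = set()
    for line in lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 22:  # not a complete domain-table row
            continue
        accession, query, i_evalue = fields[1], fields[3], float(fields[12])
        if accession.startswith(GRAS_ACCESSION) and i_evalue <= cutoff:
            hits.add(query)
    return hits


def consensus(*hit_sets: Set[str]) -> Set[str]:
    """Proteins reported by every tool (plain set intersection)."""
    return set.intersection(*hit_sets) if hit_sets else set()
```

In practice, the lines could come from an open file handle on the hmmscan table, and `consensus` could be called with the hit sets obtained from each of the three tools.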

Gene structure, chromosomal location, and gene duplication

To determine the chromosomal location of the genes encoding GRAS TFs, the nucleotide sequences of the respective hits were extracted and searched against the bottle gourd genome using BLAST24. To identify tandem gene duplications, the DupGen_finder software was used, which requires BLAST results and GFF files for the target and outgroup species25. In this study, BlastP was run with an e-value cut-off of 1e−10 and a maximum of five target hits, with A. thaliana as the outgroup species.
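The idea behind tandem duplicate detection can be sketched as follows. This is a deliberately simplified illustration, assuming that a tandem pair is two genes that are immediate neighbours on the same chromosome and are also BLAST homologs of each other; the actual criteria implemented in DupGen_finder, including its use of the outgroup, are more involved. The function name and data structures are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def tandem_pairs(
    gene_order: Dict[str, List[str]],
    homolog_pairs: Set[FrozenSet[str]],
) -> List[Tuple[str, str]]:
    """Return neighbouring gene pairs that are also homologs.

    gene_order maps each chromosome to its genes sorted by position
    (as could be derived from a GFF file); homolog_pairs holds unordered
    gene pairs supported by a BLAST hit.
    """
    pairs: List[Tuple[str, str]] = []
    for genes in gene_order.values():
        # zip(genes, genes[1:]) walks over immediately adjacent genes only.
        for left, right in zip(genes, genes[1:]):
            if frozenset((left, right)) in homolog_pairs:
                pairs.append((left, right))
    return pairs
```

Homologous genes on different chromosomes, or on the same chromosome but separated by other genes, are not reported by this sketch.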

Sequence analysis and subcellular location prediction

The ProtParam and BUSCA web servers were used to compute the protein properties and predict the subcellular location of the GRAS proteins26,27. To identify the motif signatures present in the GRAS proteins, the MEME suite (Multiple EM for Motif Elicitation) web server was used with the following settings: maximum number of motifs, 10; minimum motif length, 6; maximum motif length, 50; and minimum and maximum sites per motif, 2 and 37, respectively28.
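For readers who prefer a local run, the MEME settings listed above map onto the command-line version of MEME roughly as sketched below. The study used the web server, so the command-line equivalent is an assumption; the file and directory names are hypothetical, and the helper only builds the argument list without running anything.

```python
from typing import List


def meme_command(fasta: str, out_dir: str, n_sequences: int = 37) -> List[str]:
    """Build a MEME command line mirroring the settings reported in the text."""
    return [
        "meme", fasta,
        "-protein",                     # input sequences are proteins
        "-oc", out_dir,                 # output directory (overwritten if present)
        "-nmotifs", "10",               # maximum number of motifs
        "-minw", "6",                   # minimum motif length
        "-maxw", "50",                  # maximum motif length
        "-minsites", "2",               # minimum sites per motif
        "-maxsites", str(n_sequences),  # maximum sites per motif
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where MEME is installed.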

Phylogenetic tree

We considered 397 GRAS TFs from eight different species that had been classified into 17 subfamilies11. A local BLAST database was constructed, and each of the 37 identified GRAS genes was searched against this database. Based on the best hit, each gene was assigned to the respective subfamily11. Subsequently, the final dataset of 434 genes was used to perform multiple sequence alignment (MSA) with the MUSCLE tool29, and MEGA7 was used to construct the phylogenetic tree using the JTT model with gamma distribution and complete deletion of gaps30. Finally, the tree was visualized using the iTOL (Interactive Tree Of Life) software31.
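The best-hit assignment step can be sketched in Python as shown here. This is a minimal example, assuming BLAST results in the standard 12-column tabular format (`-outfmt 6`), where columns 1, 2, and 12 hold the query ID, subject ID, and bit score, and assuming that "best hit" means the highest bit score. The function name and the reference-to-subfamily mapping are hypothetical.

```python
from typing import Dict, Iterable


def assign_subfamilies(
    blast_rows: Iterable[str],
    subfamily_of: Dict[str, str],
) -> Dict[str, str]:
    """Assign each query gene the subfamily of its highest-scoring reference hit."""
    best_score: Dict[str, float] = {}
    assignment: Dict[str, str] = {}
    for row in blast_rows:
        if not row.strip() or row.startswith("#"):
            continue
        cols = row.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if subject not in subfamily_of:
            continue  # reference sequence without a known subfamily label
        # Keep only the hit with the highest bit score seen so far for this query.
        if query not in best_score or bitscore > best_score[query]:
            best_score[query] = bitscore
            assignment[query] = subfamily_of[subject]
    return assignment
```

The resulting labels can then be compared with the placement of each gene in the phylogenetic tree.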

Protein–protein interaction (PPI) prediction and differential gene expression (DGE)

Bottle gourd and Cucumis melo (C. melo) are phylogenetically closely related species belonging to the Cucurbitaceae family; therefore, we used C. melo as a reference to search for GRAS genes and their interacting partners in the STRING database32. All the identified interacting partners were collected and queried against the bottle gourd genome at an e-value of 1e−10 using BLAST. The single best hit for each gene was considered for the construction of a PPI network using Cytoscape33. Finally, the top five hub genes in the interaction network were predicted using the cytoHubba plugin of Cytoscape34. Further, to understand the contribution of these genes, we searched for transcriptomes of different bottle gourd tissues. We found one dataset in the Cucurbitaceae genome database with gene expression (FPKM) values for five different tissues. The FPKM values of all 37 GRAS genes were extracted and plotted as a heatmap to understand the relationship.
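Ranking hub genes by degree can be illustrated with a few lines of Python. This is a sketch of degree-based ranking only, assuming an undirected network in which duplicate edges and self-loops are ignored; cytoHubba offers several other ranking methods, and its exact handling of such edges is not reproduced here. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_hubs(edges: Iterable[Tuple[str, str]], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n nodes with the highest degree in an undirected network."""
    # Treat (a, b) and (b, a) as the same edge and drop self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree: Counter = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    # Sort by descending degree, then by name so that ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Given the edge list exported from the interaction network, `top_hubs(edges)` returns up to five (gene, degree) pairs.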

Results

Identification of GRAS transcription factors

We identified 38, 37, and 37 genes encoding GRAS TFs using hmmscan, PfamScan, and InterProScan, respectively. Domain-based analysis using the SMART tool confirmed the presence of the GRAS domain in 37 genes. Therefore, we selected these 37 genes for further in-depth analysis. In addition to the GRAS domain, we identified a DELLA domain in four genes, a WD40 domain in one gene, and a maximum of nine low-complexity regions (Table 1). Chromosomal mapping identified a minimum of one gene each on chromosomes 8 and 11, while a maximum of seven genes was found on chromosome 7 (Figure 1).

Table 1 GRAS genes, their chromosomal locations, and additional domains in bottle gourd.
Figure 1 Chromosomal distribution of GRAS genes in bottle gourd, drawn using the MapChart tool.

Analysis of gene structure, duplicated genes and sequence properties

As shown in Table 2, the GRAS family protein sequence length varies from 378 to 1466 amino acids, with an isoelectric point (pI) ranging from 4.7 to 8.22. Analysis of the gene structure revealed that, of the 37 genes, 25 were encoded by a single exon. However, one gene (Lsi06G016090.1), encoding a protein of 1466 amino acids, comprised 13 exons (Table 2). We searched for gene duplication events and observed one tandem duplication on Chr7 involving the genes Lsi07G002180.1 and Lsi07G002190.1. Subcellular location prediction showed that most of the proteins were localized to the nucleus, whereas 10 were predicted in the chloroplast and one each in the mitochondria, endomembrane system, and extracellular space. We also identified the 10 most prominent motifs in the GRAS genes, with motif lengths varying from 16 to 41 amino acids [Suppl-1.docx: Table-S1]. Except for the Lsi07G013900 gene, the predicted motifs were localized in the C-terminal region. Three motifs, of length 21, 21, and 25, respectively, were highly conserved and present in all the GRAS genes [Suppl.-1.docx].

Table 2 Details of GRAS gene properties and their predicted subcellular localization.

Phylogenetic analysis

A phylogenetic tree was constructed from 434 sequences, including 397 from previously published work and 37 from the bottle gourd genome. As shown in Fig. 2, the 37 identified GRAS genes could be divided into 16 subfamilies. Further analysis revealed that, except for SCLA, at least one gene from each subfamily was present in the genome (Fig. 2). The Phytochrome A signal transduction (PAT) subfamily had the most members, with six genes, followed by the HAM and DELLA subfamilies with four genes each. In contrast, LS and Required for arbuscule development (RAD) were the smallest subfamilies, with only one gene each.

Figure 2 Phylogenetic tree of GRAS sequences visualized using the iTOL software; background colors differentiate the subfamilies. Bottle gourd genes within each subfamily are highlighted in grey.

Protein–protein interaction (PPI) prediction and DGE analysis

PPI prediction analysis revealed that a total of 178 unique genes from C. melo were involved in 467 possible interactions. Next, we searched for their homologs in the bottle gourd genome. Two genes (XP_008452266.1 and XP_008459103.1) had no homologs and were therefore excluded from further analysis. Finally, we obtained a set of 169 unique genes from the bottle gourd genome that possibly interact with each other to control various biological functions. Network analysis revealed 169 nodes and 467 edges, with a clustering coefficient of 0.452 and an average of 5.4 neighbours per node. Based on maximum degree, we identified Lsi02G029020, Lsi02G024800, Lsi01G004880, Lsi04G000500, and Lsi06G005500 as the candidate hub genes, with 22, 21, 20, 19, and 19 interactions, respectively (Fig. 3). We also studied the expression levels of the GRAS transcription factors in different plant tissues and observed that the Lsi01G004880 gene was up-regulated in all five tissues, with higher expression (> 6-fold) in the root tissue (Fig. 4). Similarly, Lsi04G000500 and Lsi04G005500 were expressed in the root tissue, with little or no expression in the stem and leaves. On the other hand, the Lsi02G024800 gene was down-regulated in the stem and leaves, while no significant change in the expression pattern of the Lsi02G029020 gene was observed (Fig. 4). The GRAS genes Lsi05G003650 and Lsi07G014150 were expressed in roots but showed no significant expression in the stem tissues.

Figure 3 Predicted protein–protein interactions between GRAS transcription factors, visualized using Cytoscape.

Figure 4 Differential gene expression of GRAS transcription factors.

Functional annotation of Hub and interacting genes

We investigated the function of the five candidate hub genes based on their involvement in biological and cellular processes. The Lsi02G024800 gene encodes a DELLA protein that acts as a repressor of GA-induced growth and interacts with genes encoding an auxin efflux carrier, a gibberellin receptor, and proteins for lignin degradation and detoxification. Thus, it ultimately controls root growth, seed germination, and stem elongation. Similarly, GO-term analysis showed that the Lsi02G029020 gene has DNA-binding transcription factor activity, is involved in the regulation of transcription and gene expression, and is localized in the nucleus. This gene interacts with transcriptional activator genes that control genes involved in stamen development, cell expansion, and flowering time, and that also modulate root growth. Next, the hub gene Lsi01G004880 encodes a scarecrow-like protein that interacts with 10 other proteins, including a Zn-finger domain protein and a phytohormone protein. The phytohormone gene controls the phototropic response by modulating the light signal. Similarly, the Lsi04G000500 gene also belongs to the scarecrow-like class and interacts with genes such as serine/arginine-rich SC35-like splicing factor SCL28, jasmonate O-methyltransferase (JMT), and Nutcracker (NUC). The JMT gene converts jasmonate into methyl jasmonate and plays an important role in plant defense. The NUC gene acts as a transcriptional activator and is involved in the regulation of flowering and asymmetric cell division. The Lsi06G005500 gene also belongs to the scarecrow-like class and acts on auxin response factor (ARF), gibberellin 2-beta-dioxygenase-1 (GA2OX1), and phytochrome, which are involved in plant growth and development.

Discussion

GRAS proteins have been recognized as important plant cellular components that play a role in signal transduction and in root and shoot development35,36, as well as in managing various kinds of biotic and abiotic stress37. In the present study, we explored the GRAS family in the bottle gourd genome, including gene structure, chromosomal distribution, phylogenetic relationships, and gene expression in different tissues.

In this study, a total of 37 GRAS genes were computationally identified in the bottle gourd genome, which is fewer than in tomato and Cucurbita but more than in Arabidopsis. Previous studies reported that GRAS genes are mostly encoded by a single exon, with protein lengths of around 400–770 amino acids35. We observed a similar pattern in bottle gourd, with a small variation in gene length. Intronless genes are an important feature of prokaryotic genomes; this pattern therefore suggests horizontal gene transfer from prokaryotes as well as a close evolutionary relationship among the different members4. Further, our subcellular localization analysis agrees with previous studies in that most of these proteins are localized in the nucleus. We also found one tandem gene duplication event that could play an important role in the expansion of the GRAS gene family.

It is well known that some genes of the DELLA subfamily, such as GAI, RGA, and RGL, act as repressors of gibberellin signaling, while SCR and SHR are involved in radial root development5,38,39. Expression of the DELLA genes (negative regulators of GA signaling) in multiple tissues highlights their role in plant growth and development40,41. Similarly, SCL3 is involved in root elongation, and SCR interacts with the RGA gene, which ultimately controls the root meristem size in Arabidopsis42. Previous studies reported that mutation in the SCR gene disrupts asymmetrical cell division and thus affects root growth and development. Thus, a higher expression of the SCR gene in plant roots is helpful for the radial organization of roots43.

Bottle gourd has been widely used as a rootstock to control different types of biotic and abiotic plant stress44,45,46,47. Previous studies suggested that the SCL14 gene in Arabidopsis is essential for the activation of stress-inducible promoters48. Likewise, overexpression of VaPAT1 and OsGRAS23 confers abiotic stress tolerance in Arabidopsis and rice37,49. In 2019, Garcia-Lozano et al. compared the transcriptomes of bottle gourd grafted onto watermelon and vice versa44. They reported more than 400 mobile RNAs between the different heterografts and observed that the use of bottle gourd as a rootstock increased the size and rind thickness of the watermelon fruits. Similarly, Liu et al. reported the differential expression of 787 genes between watermelon homografts and heterografts (bottle gourd rootstock)16. To highlight the importance of bottle gourd as a rootstock, Wang et al. (2020) analyzed the transcriptomes of bottle gourd (rootstock) and watermelon (scion) under chilling treatment. They reported that the bottle gourd homograft, as well as the heterograft, was tolerant to chilling stress compared with the watermelon homograft50. Thus, an in-depth analysis of GRAS transcription factors in the bottle gourd genome will be helpful for enhancing the use of bottle gourd as a valuable rootstock material and could also be extended to other vegetable crops.