Genome-wide identification, in silico characterization and expression analysis of the RNA helicase gene family in chickpea (C. arietinum L.)

The RNA helicases are an important class of enzymes which are known to influence almost every aspect of RNA metabolism. The majority of RNA helicases belong to the SF2 (superfamily 2) superfamily, members of which are further categorized into three separate subfamilies i.e., the DEAD, DEAH and DExD/H-box subfamilies. In chickpea, these RNA helicases have not been characterized until now. A genome-wide analysis across the chickpea genome led to the identification of a total of 150 RNA helicase genes which included 50 DEAD, 33 DEAH and 67 DExD/H-box genes. These were distributed across all the eight chromosomes, with highest number on chromosome 4 (26) and least on chromosome 8 (8). Gene duplication analysis resulted in identification of 15 paralogous gene pairs with Ka/Ks values < 1, indicating towards the genes being under purifying selection during the course of evolution. The promoter regions of the RNA helicase genes were enriched in cis-acting elements like the light and ABA-responsive elements. The drought responsiveness of the genes was analysed by studying the expression profiles of few of these genes, in two different genotypes, the cultivated variety ICC 8261 (kabuli, C. arietinum) and the wild accession ILWC 292 (C. reticulatum), through qRT-PCR. These genotypes were selected based on their drought responsiveness in a field experiment, where it was observed that the percentage (%) reduction in relative water content (RWC) and membrane stability index (MSI) for the drought stressed plants after withholding water for 24 days, over the control or well-watered plants, was least for both the genotypes. The genes CaDEAD50 and CaDExD/H66 were identified as drought-responsive RNA helicase genes in chickpea. The protein encoded by the CaDExD/H66 gene shares a high degree of homology with one of the CLSY (CLASSY) proteins of A. thaliana. We hypothesize that this gene could possibly be involved in regulation of DNA methylation levels in chickpea by regulating siRNA production, in conjunction with other proteins like the Argonaute, RNA dependent RNA polymerases and Dicer-like proteins.

www.nature.com/scientificreports/ tion of the genes across the chromosomes was visualized using MapChart tool 18 . The gene structures of the genes were determined by comparing the coding sequences with the corresponding full-length gene sequences, using the Gene Structure Display Server 19 .

Synteny and gene duplication analysis.
To infer the syntenic relationship of the RNA helicase genes in different species (Arabidopsis thaliana, Cajanus cajan, Cicer arietinum, Glycine max and Medicago truncatula), an analysis using MCScanX algorithm was performed and plots were generated using the Dual Synteny Plotter function in TBtools v 1.098693 20 . The duplicated or paralogous gene pairs, were identified using MCScanX algorithm and visualized as a Circos plot. The Ka/Ks (non synonymous/synonymous substitution rates) ratios of the duplicated gene pairs were calculated to study the molecular evolutionary rates for each gene pair. The divergence time of these gene pairs was estimated using the formula "t = Ks/2λ", with λ value of 6.05 × 10 −9 substitutions/synonymous site/year representing neutral substitution 21 . This was expressed in million years ago (Mya).
Promoter sequence analysis. Promoter sequences (1000 bp upstream from the predicted transcription start site, TSS) of few of the helicase (8 CaDEAD,4 CaDEAH and 8 CaDExD/H) genes were subjected to search for cis-regulatory elements in the PlantCARE (http:// bioin forma tics. psb. ugent. be/ webto ols/ plant care/ html/) database. The elements were visualized along the lengths of promoter sequences through TBtools v 1.098693 20 .
In addition, the promoter sequences were manually scanned for various, well-characterized, drought-responsive cis-acting elements like the dehydration-responsive element (DRE), G-Box, MYBL, etc. based on a published report where these cis-acting elements have been identified in GmRD26 gene in soybean 22 .
In silico characterization of the RNA helicase proteins. The identified proteins were analysed for various physiochemical properties like protein length (number of amino acids), molecular weight (Da) and theoretical isoelectric point (pI), using the online ExPASy website (http:// web. expasy. org/) 23 . The Cello v2.5 webserver was used for the prediction of the subcellular location of these proteins, in silico (http:// cello. life. nctu. edu. tw/). MEME (Multiple Em for Motif Elicitation, version 4.11.4) was used to identify conserved motifs within the chickpea helicase proteins using the following input parameters: optimum motif width set to ≥ 6 and ≤ 50; maximum number of motifs: 10 (http:// meme-suite. org/ tools/ meme) 24 . Network analysis was performed to determine the physical interaction between the RNA helicase proteins using the STRING database (https:// string-db. org/). Since these proteins are known to modulate different aspects of RNA metabolism, their interaction with chickpea small RNA biogenesis and processing proteins like Argonaute (AGO) and RNA dependent RNA polymerase (RDR) was examined 25 . For this purpose, the protein sequences of these genes were retrieved from protein database of NCBI.
Sequence alignment and phylogenetic tree analysis. A total of 151 DEAD, 106 DEAH and 203 DExD/H-box proteins belonging to three different species (Arabidopsis thaliana, Oryza sativa and Cicer arietinum) were analysed for phylogeny. The protein sequences for Arabidopsis and rice RNA helicases were obtained from the previously published report 4 . The sequences were aligned through multiple sequence alignment (MSA) using Muscle tool (http:// www. drive5. com/ muscle). MEGA7 was used to construct the phylogenetic tree with bootstrapping for 1000 replicates and with the following main parameters: p distance and pairwise deletion (https:// www. megas oftwa re. net/).

Analysis of gene expression levels of the CaDEAD, CaDEAH and CaDExD/H-box genes.
Two sets of publicly available transcriptome data were used to analyse the gene expression profiles of the chickpea RNA helicase genes. The first dataset consisted of the FPKM (Fragments per kilobase of transcript per million mapped reads) values available at the NCBI GEO datasets, for the BioProject ID: PRJNA401922. These were generated from the RNA-Seq data for different tissues (leaves, roots, flowers and young pod) of the genotype, ICC 4958 (cultivated, C. arietinum) and leaves of wild chickpea accession, PI 489777 (C. reticulatum) 26 . The FPKM values were log 2 transformed for the CaDEAD, CaDEAH and CaDExD/H loci. The heatmap depicting the differential expression of the genes was drawn by the software TBtools v 1.098693 20 Fig. S1). ILWC 292 is a registered cyst nematode-resistant germplasm and is a wild collection in the public repository of ICARDA, Aleppo, Syria. The voucher specimen is maintained by ICARDA 27  www.nature.com/scientificreports/ seeds. Two holes each of 1 cm diameter were cut out at the bottom of the bags in order to allow excess water to drain from the bags. Each bag was filled with equal amounts of soil collected from the nearby chickpea cultivation field of IARI. The soil was prepared as per the standard agronomic practices used for chickpea cultivation. Two seeds/genotype were sown at a depth of 2 cm in the soil. A day prior to sowing, the seeds were treated with the fungicide Bavistin @ 5 g/kg of seed and Rhizobium culture @ 25 g/kg of seed. A total of nine bags were sown per genotype (1 genotype × 3 replicates × 3 treatments-Control, T12, T24) amounting to a total of 63 bags (7 genotypes × 9 bags) ( Supplementary Fig. S2). Initially, for the first 90 days after sowing (90 DAS) all the plants were watered with equal amounts of water regularly in order to roughly maintain same levels of soil moisture in all the bags. At 93 days after sowing (93 DAS, after the initiation of flowering in most of the genotypes), the bags were separately labelled ( Supplementary Fig. S3). One-third of the bags were designated as control (C) bags. Plants growing in these bags were regularly watered (with equal amounts of water) throughout the experimental period (117 days). One-third of the bags were labelled as T12, the plants growing in these bags were not supplied with water for 12 days, beginning with 93 DAS. The remaining third of the bags were labelled as T24 and the plants growing in these bags were not supplied with any water for 24 days post 93 DAS.
The leaf tissues of all the plants (regularly watered until now) were collected for physiological characterization on day 0 (93 + 0 = 93 DAS). On day 12 (93 + 12 = 105 DAS), leaf tissues were collected separately from C12 and T12 plants and on day 24 (93 + 24 = 117 DAS), sampling for leaf tissues was done separately from the C24 and T24 plants. For the purpose of gene expression analysis, the leaf samples were collected in liquid nitrogen and stored at − 80 °C for RNA isolation.
Physiological and root morphological characterization. The leaf relative water content (RWC) of chickpea genotypes was estimated at intervals of day 0 (day of drought stress initiation; 93 DAS), day 12 (105 DAS) and day 24 (117 DAS) after drought stress initiation. The RWC was estimated for five biological replicates/ genotype/treatment. For this purpose, third fully expanded, young leaves from the top were collected in polybags before noon from the plants and kept on ice packs. These were collected for all the plants on day 0 and both the control and drought stressed plants separately, on days 12 and 24. The fresh weights (FW) of these leaves were recorded. These were then kept in covered petriplates containing distilled water for 4 h and then weighed to obtain the turgid weights (TW). These samples were thereafter dried at 65 °C in an oven for 24-48 h and the dry weights (DW) of the samples were recorded. The RWC (%) values were calculated according to the formula given below 28 : The sampling of leaves for estimation of membrane stability index (MSI) was done along with the sampling for RWC. For estimation of MSI, 500 mg fresh leaf samples were collected for each genotype. The leaf samples were thoroughly washed 2-3 times with de-ionized water. These were then kept in test tubes containing 10 ml of distilled water. The test tubes were kept in a water bath at 40 °C for 30 min and conductivity of the medium (C1) was recorded by using a conductivity meter (EutechCyberScan CON 110). After recording the C1 values, the test tubes were again kept in water bath at a temperature of 100 °C for 15 min. At the end of 15 min, the tubes were taken out and allowed to cool to room temperature. The conductivity of the samples was recorded again (C2). The MSI (%) was calculated using the following formula 29 : At the end of the experimental period, 117 DAS, the roots of the C24 and T24 plants, for each genotype, were carefully extracted from the soil and washed thoroughly. These were then suspended in a transparent tray filled with water (2-3 mm ht). For the purpose of root scanning the image analysis system (WinRhizo, Regent Instruments INC., Quebec, Canada) was used. The roots were carefully disentangled and data were recorded for various root traits namely root length (RL, cm), surface area (SA, cm 2 ), root diameter (RD, mm) and root volume (RV, cm 3 ). Root scanning was performed for three biological replicates/genotype/treatment.

Identification of the drought-responsive CaDEAD, CaDEAH and CaDExD/H-box genes through qRT-PCR.
Total RNA was isolated from frozen leaf tissues of the samples (ICC 8261 and ILWC 292, C24 and T24 for each genotype, 117 DAS) using the Spectrum™ Plant Total RNA Kit (Sigma Life Science). These two genotypes were chosen based on their favourable response to drought stress and also considering that since there exists greater diversity (in terms of SNPs) between the kabuli and wild genomes of chickpea 30,31 , higher levels of differences in gene expression might be observed for the two genotypes as opposed to gene expression level differences existent between the desi and wild genotypes. The isolated RNA was eluted in the supplied elution buffer (EB). The quantity and quality of the isolated RNA was estimated using Nanodrop spectrophotometer (NanoDrop™ Lite Spectrophotometer, Thermo Fisher Scientifc). The RNA was converted into cDNA by using the cDNA synthesis kit (Invitrogen™SuperScript™ III First-Strand Synthesis System) following the manufacturer's instructions. Real-time quantitative PCR (qRT-PCR) was performed using SYBR Green mix (TaKaRa) on a Real Time PCR Thermo Cycler (EppendorfRealPlex 2 qPCR). The thermal cycling conditions used were: 94 °C for 10 min, 40 cycles with each cycle consisting of 94 °C for 30 s and 60 °C for 15 s, followed by melting curve analysis. The gene expression analysis was carried out for 7 CaDEAD, 1 CaDEAH and 7 CaDExD/H-box genes. Primers were designed using the Primer Quest tool (https:// www. idtdna. com). All the primers for qRT-PCR are listed in Supplementary Table S1. These genes were selected based on their phylogenetic relationships and an attempt was made to select genes with higher diversity. The experiment was performed for 36 samples (2 genotypes × 2 treatments (C24 and T24) × 3 biological replicates × 3 technical replicates). The relative fold differences were calculated based on the comparative Ct method using the 2−△△Ct method 32 , using the GAPDH , respectively, were considered as control and normalized to 1. This was done to identify genotype-specific and treatment induced differences in gene expression of the genes in response to drought stress. The specificity of the primers was checked by loading the amplified products on a 1.5% agarose gel ( Supplementary Fig. S4). BLASTN searches were also performed for the 15 gene sequences (C. arietinum) against the C. reticulatum PI489777 v2.0 genome to confirm the specificity of genes amplified (https:// www. pulse db. org/).

Statistical analysis.
The data for the physiological parameters (RWC and MSI) and root traits were subjected to statistical analysis (https:// www. socsc istat istics. com). For each trait, the mean values and coefficient of variation (CV) were calculated (Supplementary Table S2). One-way analysis of variance (ANOVA) with posthoc Tukey HSD (Honestly significant difference) was performed to test for statistical significance between means of different traits for the genotypes studied. Student's t-test was conducted in order to study the statistical significance of the differences in means of the control and treated plants for all the physiological, root morphological and qRT-PCR data.
Ethical approval. The authors declare that the experimental research work involving the growth of plants in this study, was conducted in compliance with relevant institutional, national, and international guidelines and legislation.

Results
The chromosomal distribution of the RNA helicase genes across the chickpea genome and their structural characterization. Based on combined results of the BLAST searches (BLASTP and TBLASTN), a total of 150 RNA helicase genes could be identified in the chickpea genome. These were classified into three different subfamilies i.e., the CaDEAD (50), CaDEAH (33) and CaDExD/H (67)-box genes, where the numbers in parenthesis indicate the number of genes identified in each subfamily. The genes belonging to each of the three subfamilies were chronologically named based on their chromosomal positions in congruence with the previously published reports in tomato, cotton, etc 13,14 . These genes were unevenly distributed across all the eight chromosomes and a few were mapped on the unplaced scaffolds of the assembly (Supplementary Table S3). The highest number of genes were present on chromosome 4 (26) and the least on chromosome 8 (8). Chromosome 4 also had the highest density of the RNA helicase genes (52%) whereas chromosome 2 had the least density of the genes (32%) (Fig. 1). A total of 11 genes were mapped on the scaffolds. The smallest gene amongst the CaDEAD-box genes was the CaDEAD24 gene (506 bp) which is a putative eukaryotic translation initiation factor (eIF) gene while the longest gene was CaDEAD35 (31,471 bp). For the CaDEAH subfamily the smallest member was CaDEAH28 gene (3622 bp) followed closely by CaDEAH9 gene (3824 bp), while the longest member was CaDEAH7 (44,560 bp). The smallest gene within the CaDExD/H gene subfamily was CaDExD/H62 (1511 bp) and the longest gene was CaDExD/H33 (74,802 bp) followed by CaDExD/ H65 (58,276 bp). The number of introns in all the three subfamilies of RNA helicases was highly variable. The number of introns for the CaDEAD gene subfamily ranged from as low as 0 (CaDEAD24) to 24 (CaDEAD31) (Fig. 2a). The gene CaDEAH28 possessed only 1 intron whereas the genes CaDEAH1 and CaDEAH3 possessed as high as 24 and 23 introns, respectively (Fig. 2b). The genes CaDExD/H5 and CaDExD/H10 possessed 1 intron each whereas the gene CaDExD/H54 possessed as many as 33 introns (Fig. 2c).

Syntenic relationship of the chickpea RNA helicase genes with other plant species and identification of paralogous gene pairs within the chickpea genome.
In order to determine the evolutionary relationship of the chickpea RNA helicase gene family with other plant species, we generated and analyzed the synteny maps between chickpea and the model plant species, A. thaliana and other leguminous species like M. truncatula, C. cajan and G. max ( Supplementary Fig. S5). A total of 39 RNA helicase loci showed syntenic relationship with 46 loci in A. thaliana. A higher degree of synteny was observed for the RNA helicase genes of chickpea and the legumes studied. 87 chickpea RNA helicase genes were syntenic with 114 genes from M. truncatula. 53 genes of chickpea RNA helicase gene family had 64 corresponding syntenic genes in C. cajan. For G. max, 85 genes in chickpea were syntenic with as many as 181 genes.
A total of 15 paralogous gene pairs were identified for the RNA helicase gene family in chickpea ( Table 1). The highest number of gene duplication events were observed for the CaDEAD genes (10/15) and least (1/15) for the CaDEAH genes. Eleven of these duplication events could be ascribed to segmental duplication as the duplicated genes for these gene pairs were present on different chromosomes. One gene pair, CaDEAD38/CaDEAD44 was categorized as a proximal duplication event as they are proximally placed on the same chromosome but possibly separated by a few intervening non-homologous genes 33 . For three gene pairs, where one of the genes were mapped on the scaffolds, the nature of duplication could be segmental, tandem or proximal. The Ka/Ks ratios for all the gene pairs were < 1, indicating the existence of negative selective pressure or purifying selection for these genes. In terms of gene expression, almost all of the 15 gene pairs showed similar digital expression levels in the two contrasting genotypes, ICC 8261, the DT genotype and ICC 283, the DS genotype, on exposure to drought stress treatment. Only for one gene pair namely CaDExD/H12/CaDExD/H39 the expression profiles of the two genes were different in both the genotypes on drought stress exposure (Fig. 3).   www.nature.com/scientificreports/ www.nature.com/scientificreports/ function of these genes. The cis-acting elements present in 1000 bp upstream regions from the transcription start site (TSS) of 20 RNA helicase genes were identified. The genes included 8 CaDEAD, 4 CaDEAH and 8 CaDExD/H-box genes. An attempt was made to select for significantly diverse genes (based on phylogenetic analysis) as these could possibly have varied functions as well. The various cis-acting elements identified, within the promoter regions of the genes examined, were the LRE (light-responsive element) and the stress-related elements like LTR (element involved in low-temperature responsiveness), MBS (MYB binding site involved in drought-inducibility), MRE (MYB binding site involved in light responsiveness) and the phytohormone related elements like ABRE (element involved in the abscisic acid responsiveness), ARE (part of an auxin-responsive element), SARE (element involved in salicylic acid responsiveness), GRE (gibberellin-responsive element) and JARE (element involved in the methyl jasmonate or MeJA responsiveness). The most abundant cis-acting element identified in all the promoter sequences was the LRE element followed by MBS and ABRE elements (Fig. 4).
A search for various, well-characterized, drought-responsive cis-acting elements across the promoters of RNA helicase genes resulted in identification of the elements with the core sequences ACGT (drought stress and aging response element), TAA CTG (MYB2AT element) and A/CCC GAC (dehydration-responsive element).

Physiochemical properties of the CaDEAD, CaDEAH and CaDExD/H-box proteins. The mean
length of the CaDEAD-box proteins was 669 amino acids with CaDEAD5 being the largest (2381 amino acids) and CaDEAD24 being the smallest (108 amino acids). For the CaDEAH-box proteins, the length ranged from 691 amino acids (CaDEAH23) to 1739 amino acids for CaDEAH8, with an average of 1062 amino acids. Amongst the CaDExD/H-box proteins, the CaDExD/H62 protein was the smallest (182 amino acids) and the CaDExD/ H54 was the largest, composed of 3496 amino acids. The mean length of these proteins was 1136 amino acids. The mean theoretical pI of all these proteins (CaDEAD, CaDEAH and CaDExD/H) was 7.44 (Supplementary Table S3). The vast majority of the chickpea RNA helicase proteins were predicted to be localized within the nucleus. A few were predicted to be localized in cellular organelles like chloroplast and mitochondrion. The latter might be responsible for regulation of organelle specific RNA metabolism.
The structural features of the CaDEAD, CaDEAH and CaDExD/H-box proteins. The N-terminal sequence was highly variable compared to the C-terminal sequence for all the three classes of RNA helicase www.nature.com/scientificreports/ proteins. The nine characteristic motifs belonging to the two domains, domains I and II, could be identified in the majority (46/50) of DEAD-box RNA helicase proteins (Fig. 5a). The domain I consisted of MEME motifs 2, 4, 7, 6 and 5 while the domain II consisted of MEME motifs 9, 1 and 3. The MEME motif 2 consisted of motifs Q and I, MEME motifs 4, 7, 6 and 5 contained the characteristic motifs Ia, Ib, II (DEAD) and III, respectively. MEME Motif 8 was additionally found between the motifs 4 and 7 in some proteins. The MEME motifs 9, 1 and 3 correspond to the motifs IV, V and VI, respectively of the domain II. The additional motif 10 was found within www.nature.com/scientificreports/ the domain II of some of the DEAD-box proteins. Amongst the nine characteristic motifs, the highest degree of sequence conservation was observed for the motifs Q, I and II (~ 90%) while least sequence conservation was observed at motif IV (~ 60%) ( Supplementary Fig. S6). The consensus sequence present at the motif I for the CaDEAD-box helicases was AKTGSGKT. This is a distinguishing feature between DNA and RNA helicases, as it has been reported that the DNA helicases (SF1) have GXXXXGKT (where X can be any amino acid, G: glycine; K: lysine; T: threonine), as a consensus sequence at the motif I whereas the RNA helicases have AXXXXGKT (A: alanine) as the consensus sequence at this motif 2 . It was observed that motif V (MEME motif 1) was present in all the CaDEAD proteins. The protein CaDEAD5 lacked most of the conserved sequence motifs but showed the presence of the conserved Helicase_C domain (pfam 00271, e-value = 1.49e−07) and was therefore considered to be a member of the CaDEAD-box proteins. The proteins CaDEAD21, CaDEAD31 and CaDEAD35 also lacked most of the characteristic sequence motifs of the DEAD-box proteins but shared a high degree of homology with the Arabidopsis DEAD-box proteins and were therefore considered as CaDEAD-box proteins. Unlike the CaDEAD-box proteins, the CaDEAH-box proteins were found to be more variable in terms of motif sequence and organization. The Q motif was conspicuously absent in all the CaDEAH proteins and the motif II (DEAH) had a sequence conservation of ~ 50%. Motif I (MEME motif 5) was present in all the members of the CaDEAH subfamily (Fig. 5b). In addition to the conserved motifs which constitute the core helicase centre of the RNA helicase proteins, the CaDEAH-box proteins had additional conserved domains at the C terminus unlike the CaDEAD-box proteins. A few notable of these are the oligosaccharide-binding (OB)-fold domain (OB_NTP_bind, PF07717) almost always found in association with the HA2 domain (Helicase associated domain, PF04408). These domains interact strongly with the helicase core and this interaction is believed to be responsible for creating a less flexible helicase enzymatic activity than the DEAD-box helicases 34,35 .
The CaDExD/H -box proteins also lacked the Q motif for most of the members and the motif II had a sequence conservation of approx. 54%. Like the CaDEAH proteins, the motif I (MEME motif 2) could be identified in most of the CaDExD/H-box proteins. The other motifs were highly variable in composition as compared to the CaDEAD and CaDEAH-box proteins (Fig. 5c).
The CaDEAD-box proteins were the most conserved in terms of structure with an average pairwise distance of 1.38 as compared to the CaDEAH and CaDExD/H -box proteins which had an average pairwise distance of 2.0 and 2.27, respectively. It was also observed that the proteins encoded by the duplicated gene pairs (enclosed in red boxes in the dendrograms) within each subfamily, shared a very high degree of sequence homology (in terms of protein lengths, motif structure and organization), with high bootstrap support values (Fig. 5a-c).
An interaction network consisting of 968 edges, with a PPI (protein-protein interaction) enrichment p value of < 1.0 × 10 -16 was obtained on analysis of protein-protein interaction among the chickpea RNA helicases and sRNA processing enzymes (Fig. 6).The network consisted of only interactions with a minimum confidence score of 0.7 (high). The interacting proteins were predicted to be involved in KEGG pathways 36 like homologous recombination, splicing, RNA degradation and transport (FDR < 10 -5 ). An interactive module involving the proteins CaDEAD35, CaDExD/H45, CaDExD/H57 and CaDExD/H66, as major interacting partners with sRNA associated proteins (CaAGO, CaRDR) was also identified, suggestive of close association of these proteins in regulation of sRNA biogenesis and processing in chickpea. Within the network of sRNA associated proteins, we could also identify putative CaDCL (Dicer-like) proteins belonging to the CaDExD/H subfamily. The proteins CaDExD/H2, 21, 42 and 55 which interact with various CaAGO and CaRDR proteins are putative DCL proteins in chickpea. Ca) proteins was performed by using the neighbour-joining method in MEGA 7.0. The DEAD-box proteins could be clustered into two major groups (Groups I and II). The group II could be further divided into nine subgroups (II a-II i). The average pairwise distance for these proteins was estimated to be 1.41, with the proteins LOC_Os01g45190.1 and LOC_Os03g36930.1 with least pairwise distance of 0 and the proteins AT1G16280.1 and CaDEAD31 with a pairwise distance of 2.79. CaDEAD33 could be identified as a homolog of the Arabidopsis LOS4 protein (AT3G53110) (Fig. 7a). The DEAH proteins belonging to the three species could be broadly categorized into two groups (Groups I and II), each of which could be further divided into two subgroups (Fig. 7b). Similarly, the DExD/H-box proteins were clustered into two broad groups Groups I and II. The Group II was further divided into two smaller subgroups (II a and II b). The subgroup II b was further clustered into a smaller subgroup II b1 and a larger subgroup II b2 (Fig. 7c). Higher sequence diversity was observed for the DEAH and DExD/H -box proteins with an average pairwise distance of 2.02 and 2.39, respectively. For the proteins CaDExD/H35 and AT3G11200, the highest pairwise distance of 3.28 was observed. CaDExD/H36 was identified as a potential homolog of the AtHELPs protein (AT3G46960).

Digital expression profiles of the RNA helicase gene family in chickpea. The expression patterns
for most of the RNA helicase genes was conserved across all the different tissues examined (Fig. 8). However, for a few genes, tissue specific expression was observed. Both the genes CaDEAH4 and CaDExD/H61 (Fig. 8b,c) were substantially down-regulated in the leaf tissues of the cultivated genotype as compared to the other tissues www.nature.com/scientificreports/ or the leaf tissues of the wild genotype. These genes might play different roles in the leaf tissues of the wild and cultivated species. The gene CaDExD/H66 was down-regulated in the leaf tissues of the wild genotype relative to all the tissues of the cultivated genotype. This gene (LOC101491973) is a putative CLSY gene (CLSY family), whose homolog in Arabidopsis encodes for a protein involved in regulation of RNA Polymerase. IV, in turn controlling the level of DNA methylation both at the global and at the locus-specific level 37 .
To examine the drought-responsive expression of the chickpea RNA helicase gene family the log2FC values obtained from the RNA-Seq data derived from the leaf tissues of two genotypes, exposed to drought stress and collected at the shoot apical meristem formation stage, ICC 8261 (DT) and ICC 283 (DS), were analysed. It was observed that for the CaDEAD and CaDExD/H-box subfamilies, a considerably higher number of genes were down-regulated in response to drought stress for the DT genotype as compared to the DS genotype (Fig. 9). The genes CaDEAD46 and CaDEAD50, which are homologs for AtSTRS1 (At1g31970) and AtSTRS2 (At5g08620) genes (Stress Response Suppressor genes, negative regulators of stress-responsive TFs) showed different expression profiles in the two genotypes (Fig. 9a). Both the genes were relatively up-regulated in the DS genotype compared to the DT genotype. www.nature.com/scientificreports/   Fig. S7a). On day 24 of drought stress (117 DAS), at a stage when water was withheld for 24 days from the drought stressed plants (T24), the highest significant (p < 0.05) reduction in RWC of the T24 over the C24 plants was observed for the genotype SBD 377 (21.7%) (Fig. 10) (Fig. 10, Supplementary  Fig. S7b).
Based on both the parameters, least reduction in T24 plants over the C24 plants for both RWC and MSI, 5.2 and 2.9%, respectively (p < 0.05) was observed for the genotype ILWC 292 followed by ICC 8261 (5.1 and 4.7%) and ICC 4958 (6.1 and 3.7%). Highest reduction for both the parameters was observed for the T24 plants of the genotype ICC 1882 (13.4 and 8.5%), followed by ICC 283. For, SBD 377 it was observed that the % age reduction in RWC and MSI was not collinear. While the SBD 377 T24 plants exhibited a very high reduction in RWC over the C24 plants (21.7%), the percentage reduction in MSI (2%) was very less for the same set of plants (Fig. 10).
Data for the root morphological traits were recorded for the C24 and T24 plants for all the seven genotypes. The mean RL was highest for the genotypes ICC 8261 (595 cm) and ILWC 292 (583 cm) and least for ICC 1882 (317 cm). The highest significant reduction (p < 0.05) between the C24 and T24 plants was also observed for the two genotypes, ICC 8261 and ILWC 292. For ICC 4958 there was least significant reduction in RL between the C24 and T24 plants (p < 0.05). Compared to all the other traits like RL, SA and RV, the reduction in RD of the T24 plants when compared to the C24 plants, was much lesser (Fig. 11). www.nature.com/scientificreports/ Drought-responsive RNA helicase genes in chickpea identified through qRT-PCR. Based on transcript abundance, we could identify the gene CaDEAD50 as a drought-responsive RNA helicase gene in chickpea. On exposure to drought stress treatment, it was found to be up-regulated in ICC 8261 as compared to ILWC 292 (p value < 0.01) (Fig. 12). It was observed that the gene was up-regulated in both the genotypes on exposure to drought stress but the magnitude of up-regulation was higher in ICC 8261 as compared to ILWC 292 ( Supplementary Fig. S8b,c). For the gene CaDExD/H66 an opposite pattern was observed, on exposure to drought stress. While its expression did not change in ICC 8261 under drought stress, it was over-expressed in ILWC 292 on drought stress treatment. Therefore, its relative expression in ICC 8261 was lesser compared to ILWC 292 due to drought stress (Fig. 12). It is noteworthy that the proteins encoded by homologs of both these genes in Arabidopsis have been identified to play key roles in epigenetic regulation of gene expression, primarily by controlling the de novo methylation pathway 37,38 . www.nature.com/scientificreports/

Discussion
The cellular repertoire of different RNA species in eukaryotic organisms, has expanded immensely in the last decade or so. With the identification of non-coding RNAs like long non-coding RNAs (lncRNAs), small noncoding RNAs like miRNAS (microRNAs) and siRNAs (small interfering RNA) and other forms of non-coding RNA species in plants 39 and with the continuing research to decipher their molecular functions, the enzymes which regulate their synthesis and metabolism, become important subjects of scientific research and investigation. The SF2 RNA helicases which consist of the DEAD, DEAH and DExD/H-box subfamilies, have been identified in several plant species 40 . Amongst the leguminous crops, this gene family has been identified in soybean (Glycine max) 4 . In the present study, we identified a total of 150 RNA helicase genes (50 DEAD, 33 DEAH and 67 DExD/H-box genes) across the chickpea genome. This number is comparable to the number identified in rice (149) but lesser than those in soybean (213) and Arabidopsis (161) 4 . The higher number of RNA helicase genes existing in soybean is expected owing to the larger genome size (1115 Mb) compared to that of chickpea (738 Mb). Also, since in soybean, genome duplication has occurred twice, which has led to the duplication of about 75% of soybean genes, the number of members identified for most of the gene families in soybean is always higher than in chickpea [41][42][43] . A plausible explanation for higher number of these genes identified in Arabidopsis as compared to chickpea, could be because of the differences in the genome assembly and its annotation. While for Arabidopsis assembly version TAIR10.1, 119 Mb of 125 Mb has been sequenced and annotated (TAIR10.1, GCF_000001735.4), for the chickpea assembly version GCF_000331145.1, 530 Mb of 738 Mb has been sequenced and annotated (NCBI Assembly database). The higher degree of genome characterization in Arabidopsis compared to chickpea could have led to the increased number of RNA helicase genes being identified in the former. It cannot be overruled that with higher number of gene annotations and protein coding genes being identified for chickpea in future, the number of genes reported for not only the RNA helicase gene family but for other gene families also, may increase. Inspite of the differences observed in the numbers of RNA helicase genes across the www.nature.com/scientificreports/ different species, the overall size in terms of % of genes is similar. In chickpea it is 0.45%, while it is 0.39% and 0.32% in rice and soybean, respectively 4 . The positions, lengths and number of introns within the chickpea RNA helicase genes was found to be highly variable for all the three subfamilies, as has been observed in Arabidopsis 15 . The gene duplication analysis revealed the presence of 15 pairs of paralogous genes in chickpea genome. There was a predominance of segmental duplication events which led to the expansion of the RNA helicase gene family in chickpea. It was observed that except for one pair CaDExD/H12/CaDExD/H39 (protein identity of 57.5%), the gene expression levels of the genes belonging to a duplicated gene pair were similar across two different genotypes, which contrast for drought tolerance. The different expression profiles of these two genes is suggestive of their functional non-redundancy. It could be possible that the two genes acquired different functions after duplication (neofunctionalization). The promoter sequences of the helicase genes were enriched in light and drought-responsive elements. The light-responsive nature of the promoters could indicate towards a DNA/ RNA-damage repair function of these helicases as it has been found that many of the helicases are induced in response to and to rectify the oxidative damage caused by UV-B exposure to nucleic acids 44 . The most abundant, drought-responsive element that was identified in as many as 13/20 genes examined was the drought stress and aging response element (ACG TAT ERD1). This element with ACGT as a core sequence has been previously reported in promoters of stress-responsive genes which are induced by ABA 45 .
Compared to the CaDEAD-box proteins, the CaDEAH and the CaDExD-box proteins, were larger in sizes and less conserved in terms of motif structures and organization. This observation is in agreement with previous reports in Arabidopsis, sweet potato, cotton, etc 4,12,13 . The diversity in terms of protein structures might confer functional diversity to the RNA helicases in chickpea.
The RWC and MSI are two of the most widely utilized traits to estimate the drought tolerance behaviour of different genotypes across several different crop species 46,47 , including chickpea 48 . Amongst the seven genotypes    51 . For the root traits, higher mean reduction for the T24 plants, for all the traits was observed for the DT genotypes compared with DS genotypes. Least mean reduction was observed for the trait RD. It has been observed that under low soil moisture conditions, the roots demonstrate agravitropic growth response, during which the root length decreases but the diameter increases. This behaviour of roots has been previously reported in chickpea in response to drought and salinity stress 52,53 .
The gene expression analysis through qRT-PCR, was performed to identify drought-responsive RNA helicase genes in chickpea. The CaDEAD50 gene, was up-regulated in the cultivated variety relative to the wild genotype on drought stress exposure. The gene CaDExD/H66, on the other hand was down-regulated in the cultivated variety when compared to the wild genotype, under drought stress. The proteins encoded by the CaDEAD50 and CaDExD/H66 genes, share a high degree of homology with the Arabidopsis STRS2 protein (NP_196479, 74.5% identity, query coverage of 83%, e-value = 0) and CLSY protein (NP_172040, 49.6% identity, query coverage Figure 11. Differences in root morphological traits (root length, surface area, root diameter and root volume) between the control, well-watered (C24) and drought stress treated (T24) plants, for the seven genotypes used in the study. The data for the traits was recorded on 117 DAS. ILWC 292 belongs to the wild species C. reticulatum, while the remaining six genotypes belong to the cultivated chickpea species, C. arietinum. ICC 4958, ICC 8261, ILWC 292, Pusa 362 are drought tolerant and ICC 1882, SBD 377 and ICC 283 are drought sensitive. Error bars indicate standard errors of means of three biological replicates/genotype/treatment (n = 3). Statistical significance of differences in means, was tested by Student's t-test. The *symbol above horizontal lines (C and T) represent significant difference at p < 0.05. www.nature.com/scientificreports/ of 57%, e-value = 0), respectively. Both these proteins are known to be involved in the RdDM pathway and bring about differential methylation at specific loci, the stress-responsive genes 54 by the STRS proteins and both at specific loci (siRNA loci) and across the genome by the CLSY proteins 37 . In fact we were able to identify five RNA helicase genes, i.e., the CaDEAD46, CaDEAD50, CaDExD/H12, CaDExD/H39 and CaDExD/H66, as potential homologs of the Arabidopsis STRS1, STRS2, CLSY1, CLSY2 and CLSY3 genes, respectively. The proteins encoded by these Arabidopsis genes have been shown to control DNA methylation (mC) through the RdDM pathway (involving the various RNA Polymerase enzymes). The transcriptome data (Fig. 9) shows that while some genes like CaDEAD46 and CaDExD/H66 show little change in expression in the DT or DS variety in response to drought stress, the genes CaDEAD50, CaDExD/H12 and CaDExD/H39 were all differentially expressed amongst the DT and DS varieties, on exposure to drought stress. However, based on the qRT-PCR results, since we could identify the genes CaDEAD50 and CaDExD/H66 as drought-responsive and also keeping in view that the CaDExD/H66 protein was found to interact (confidence score > 0.7) with the CaRDR2 protein, (protein-protein interaction network analysis), we contemplate the role of epigenetic regulation of gene expression in response to drought stress in chickpea, mediated by the CaDExD/H66 protein. In a study which described the organ-specific DNA methylation patterns in chickpea, it was observed that there was a substantially higher number of differentially methylated regions (DMRs) (23,265), identified for the leaf tissues collected from the wild chickpea (C. reticulatum) genotype PI 489777 when compared to that of cultivated chickpea 26 . Amongst these the hypomethylated DMRs constituted the bigger fraction than the hypermethylated DMRs. The CaDExD/ H66 gene was also found to be up-regulated in the leaf tissues of the cultivated variety of chickpea, relative to the leaf tissues from the wild species, under control conditions (the transcriptome data generated in the same study, from different tissues of wild and cultivated species, Fig. 8c and corroborated by the qRT-PCR analysis, Supplementary Fig. S8a, present study). In Arabidopsis, in the clsy mutants, a high degree of overlap was observed between hypomethylated DMRs and reduced 24nt-siRNA clusters, especially in the non-CG contexts 37 . Based on Figure 12. The relative expression levels of selected 15 RNA helicase genes (7 CaDEAD, 1 CaDEAH and 7 CaDExD/H-box genes) in leaf tissues of two genotypes, ILWC 292 (C. reticulatum) and ICC 8261 (C. arietinum), in response to drought stress (T24), as estimated by qRT-PCR. The T24 treatment indicates water being withheld for 24 days beginning with 93 DAS. The relative expression levels which were normalized to GAPDH were determined by the comparative CT method (2 − ΔΔCT). The expression level of genes for ILWC 292 T24 was considered as control (expression = 1). Three biological and technical replicates were used for each experiment.

Scientific
Error bars indicate standard errors of means. The *symbol above bars indicates significant difference at p < 0.05 and **indicates significant difference at p < 0.01. www.nature.com/scientificreports/ all these observations, it can be hypothesized that the gene CaDExD/H66, could possibly be involved in regulation of DNA methylation levels in chickpea by regulating siRNA production, in conjunction with other proteins like the RDRs, AGO and DCL proteins.

Conclusion
This study provides a detailed and comprehensive analysis of the RNA helicases of the SF2 superfamily in chickpea. A total of 150 RNA helicase genes belonging to the DEAD, DEAH and DExD/H-box subfamilies were identified across the chickpea genome. The CaDEAD-box proteins were found to be most conserved in terms of protein structure amongst all the three subfamilies. Also, highest number of duplicated gene pairs were identified for the CaDEAD-box genes as compared to the CaDEAH and CaDExD/H-box genes. Gene expression analysis led to the identification of drought-responsive RNA helicase genes in chickpea. The gene CaDExD/H66 was identified as a drought-responsive helicase gene, which could potentially be involved in regulation of DNA methylation patterns in chickpea through the RdDM pathway.

Data availability
Two publicly available data sets were utilized for digital gene expression analysis. The first comprised of the FPKM values from RNA-seq. data retrieved from NCBI GEO dataset, BioProject ID: PRJNA401922. The second dataset corresponded to the log 2 FC values of gene expression derived from the RNA-Seq dataset, BioProject ID: PRJNA413294.