A combination of two approaches, namely QTL analysis and gene enrichment analysis were used to identify candidate genes in the “QTL-hotspot” region for drought tolerance present on the Ca4 pseudomolecule in chickpea. In the first approach, a high-density bin map was developed using 53,223 single nucleotide polymorphisms (SNPs) identified in the recombinant inbred line (RIL) population of ICC 4958 (drought tolerant) and ICC 1882 (drought sensitive) cross. QTL analysis using recombination bins as markers along with the phenotyping data for 17 drought tolerance related traits obtained over 1–5 seasons and 1–5 locations split the “QTL-hotspot” region into two subregions namely “QTL-hotspot_a” (15 genes) and “QTL-hotspot_b” (11 genes). In the second approach, gene enrichment analysis using significant marker trait associations based on SNPs from the Ca4 pseudomolecule with the above mentioned phenotyping data, and the candidate genes from the refined “QTL-hotspot” region showed enrichment for 23 genes. Twelve genes were found common in both approaches. Functional validation using quantitative real-time PCR (qRT-PCR) indicated four promising candidate genes having functional implications on the effect of “QTL-hotspot” for drought tolerance in chickpea.
Chickpea (Cicer arietinum L.) crop is a rich source of protein (20–30%), carbohydrates (~40%) and minerals and thus serves as an important source of nutrients in vegetarian diets, especially in developing countries1. However, its productivity is severely affected by various abiotic stresses such as drought, heat and salinity. Among these, terminal drought is one of the most serious production constraints, often leading to 40–50% annual yield loss, as the crop is largely cultivated on residual soil moisture in the arid and semi-arid regions of South Asia and sub-Saharan Africa2. Additionally, global warming and predicted future climatic change primarily affecting precipitation, temperature and evapotranspiration will further increase the drought issue. Therefore, enhancing drought tolerance in chickpea is required to enhance crop productivity. To supplement conventional breeding, some efforts such as QTL mapping3,4,5, candidate gene based allele diversity analysis6 and genome wide association study (GWAS)7 have been made to identify QTL/genes for drought tolerance related traits in chickpea. In our earlier study, by using phenotyping data for 20 drought tolerance related traits collected in 1–7 seasons at 1–5 locations in India and genotyping data for 241 simple sequence repeat (SSR) loci on one intra-specific population (ICC 4958 × ICC 1882), we identified a “QTL-hotspot” region harbouring 12 QTLs for 12 drought tolerance related traits explaining up to 58.20% phenotypic variation5. Introgression of this “QTL-hotspot” region in JG 11, an elite variety, has improved root traits and drought tolerance8. However, this region was estimated to be 29 cM on the genetic map and 7.74 Mb on the physical map9. Subsequently, by using a genotyping-by-sequencing (GBS) approach, we refined this region to 14 cM, equivalent to ca. 3 Mb10. Therefore, for identification of candidate genes associated with drought tolerance as well as development of closely linked markers for breeding, there is a need for fine mapping the “QTL-hotspot” region.
Recent advances in sequencing technology have provided a cost effective way to develop several thousand SNPs in a limited period of time in large mapping populations, by sequence-based genotyping11,12. One such technique, GBS13, offers simultaneous SNP detection and scoring and therefore is now being used in several crops for diversity assessment, trait mapping, GWAS and genomic selection14,15. This approach has already been used by us and as mentioned above, we refined the “QTL-hotspot” region to ca. 3 Mb10. The GBS approach, however, suffers from the limitation of missing regions of the genome and bias based on methylation and restriction site abundance16. An alternate approach to overcome missing data issues is whole genome re-sequencing (WGRS) wherein samples can be sequenced at greater depth thereby reducing the missing data issue. To save the cost of sequencing, WGRS can be done at a lower depth and in that scenario the approach is referred to as skim sequencing17. This approach is very useful to identify sequence variants/SNPs in species where the reference genome sequence is available. However, the SNPs identified using NGS technologies cannot be directly used for QTL studies as: i) Next generation sequencing (NGS) technologies are prone to small but unrecoverable sequencing errors and therefore an individual SNP site cannot be used as a reliable marker for genotyping, ii) it is very difficult to score all SNP sites in an entire recombinant population, and iii) limitations of QTL analysis software to handle such a huge dataset. To address these issues, a parent dependant sliding window approach was used to identify true recombination breakpoints and to construct a recombination bin map using SNP data of an entire recombinant population in rice18. Using a bin mapping approach, a QTL for plant height was reported18 within the region that contained the semi-dwarf gene, sd1, responsible for the rice “green revolution”19. By using the WGRS bin mapping approach, candidate gene(s) were identified for root-knot nematode resistance20 and salt tolerance21 in soybean and for grain width in rice22.
With an objective of fine mapping the “QTL-hotspot” region and to identify candidate genes, we sequenced the intraspecific RIL mapping population derived from ICC 4958 (drought tolerant genotype) and ICC 1882 (drought sensitive genotype) cross and constructed a bin map. By using a combination of bin mapping based QTL analysis and a priori gene enrichment analysis, a set of candidate genes were prioritized. Functional validation of these genes has provided the most promising candidate genes that are involved in regulation of drought tolerance in chickpea.
High-density bin mapping
Skim sequencing and SNP calling
The RIL population (ICC 4958 × ICC 1882) was genotyped using a skim sequencing approach. In the first instance, a total of 5.90 Gb and 5.94 Gb data were generated for ICC 4958 and ICC 1882 respectively, which represents an estimated coverage of 7.90X and 8.03X. SGSautoSNP identified 84,963 SNPs, out of which 64,581 were distributed over the 8 pseudomolecules and 20,382 were located on unplaced contigs. Subsequently, a total of 497.60 Gb of Illumina paired read sequence data were generated for 232 RIL samples. The data obtained ranges from 0.21X to 4.9X per RIL, with an average of 0.72X, however, RIL219 has exceptionally low (0.002X) data (Supplementary Table 1). The SNPs identified (84,963) from resequencing of parental genotypes were compared with those identified from skim sequencing of RIL population; resulted in identifying 82,127 common SNPs. The remaining 2,836 SNPs could not be identified in RIL population. This might be because of low coverage sequencing of RIL population or because of errors in resequencing of parental genotypes. However, only 62,370 SNPs out of 82,127 which were located on pseudomolecules were considered for further analysis. These SNPs were further filtered, using the criteria of minimum allele frequency (MAF) of ≥0.20 and lines having ≥5% missing data being discarded, which resulted in 53,169 SNPs across 222 RILs being analysed. Additionally 54 SNPs within the “QTL-hotspot”10 were also included and a total of 53,223 SNPs were used for further study. The distribution of SNPs on the eight chickpea pseudomolecules is represented in Fig. 1 and Supplementary Table 2. The highest number of SNPs were identified on Ca4 (18,989 SNPs), while only 954 SNPs were identified on Ca5.
Recombination breakpoints and resolution of genes
By using the sliding window approach on 53,223 SNPs segregating in 222 RILs, a total of 1,610 bins were identified (Fig. 2 and Supplementary Table 2). The average number of bins identified in an individual RIL were 35.71, while the number of bins per pseudomolecule ranged from 2.75 to 6.12. The minimum number of bins (112) were identified on Ca2, whereas the highest number of bins (292) were identified on Ca6 (Figs 1 and 3a). The bin sizes ranged from 41 bp to 8.05 Mb, with an average of 210.60 Kb and median of 54.77 Kb. In total, 91% bins were of size ≤5 Mb while ~9% bins (144) had a size >5 Mb (Fig. 3b). By considering 28,269 genes in the chickpea genome23, the defined bin map covered 23,277 genes. In total, 8.68% (139) bins contained no gene, 14.29% (229) had only one gene, while 45.82% (734) had two to ten genes. Thus, 68.79% of bins had ≤10 genes (Fig. 3c).
Linkage map with bins as markers
All 1,610 bins, as mentioned above, were used as molecular markers for developing a genetic map (http://cegresources.icrisat.org/cmap/Chickpea_bin_map). Out of which, 1,557 (96.71%) bins could be integrated into a genetic map that spans 973.94 cM (Supplementary Table 2) and corresponds to 757.75 Kb/cM by considering the chickpea genome size of 738 Mb (http://cegresources.icrisat.org/cmap/Chickpea_bin_map). The average inter-bin interval per linkage group ranged from 0.45 cM (CaLG04) to 0.93 cM (CaLG02) with an overall average value of 0.66. The size of the linkage groups varied from 70.57 cM (CaLG08) to 196.27 cM (CaLG06) (Supplementary Table 2). The highest number of bins (286) were mapped on CaLG06, followed by CaLG04 (277), while the minimum number of bins were mapped on CaLG02 (103) (Supplementary Table 2). The order of bins on the genetic map were compared with their physical position on the chickpea genome sequence (CaGAv1.0). While excellent concordance was observed for linkage groups 4 and 8; minor differences were observed on other linkage groups (Supplementary Fig. 1).
Identification of candidate genes in “QTL-hotspot” region
Combinations of two approaches, namely linkage-mapping based QTL analysis as well as genome-wide association study (GWAS) based gene enrichment analysis were undertaken.
Bin mapping based QTL analysis
High-resolution mapping of QTLs
Genotyping data for 1,557 bins were analysed together with phenotyping data collected for 17 traits and two drought indices (drought tolerance index (DTI) and drought susceptibility index (DSI)) at 1–5 locations in 1–5 seasons, mentioned in our earlier study5. These data include: 4 root traits: root length density (RLD, cm cm−3); root dry weight (RDW, g); root surface area (RSA, cm2); root dry weight/total plant dry weight ratio (RTR, %), 6 yield and yield-related traits: pods/plant (POD); seeds/pod (SPD); 100-seed weight (100SDW, g); biomass (BM, g); harvest index (HI, %); yield (YLD, g), 4 morphological traits: shoot dry weight (SDW, g); plant height (PHT, cm); primary branches (PBS); secondary branches (SBS), 2 phenological traits: days to 50% flowering (DF); days to maturity (DM) and 1 physiological trait: delta carbon ratio (δ13C/DC). QTLs with >10% phenotypic variation explained (PVE) were considered as major QTLs while others were considered as minor QTLs. In summary, a total of 134 QTLs for 17 traits and two drought indices were identified, out of which 71 were major QTLs and 63 were minor QTLs (Supplementary Table 3, Supplementary Table 4). PVE for major QTLs ranged from 10.14 to 59.83% at 3.12 to 43.56 LOD value while for minor QTLs, PVE ranged from 4.75 to 9.85% at 3.00 to 4.91 LOD value.
Major QTLs were identified for 11 traits (RLD, RTR, SDW, PHT, PBS, DF, DM, POD, 100SDW, HI and DC) and minor QTLs were identified for all the analysed traits (Supplementary Table 3, Supplementary Table 4). The distribution of all the major QTLs in the chickpea genome has been shown in Fig. 4 and summarized in Supplementary Table 4. A total of 29 major QTLs for 9 traits (RLD, RTR, SDW, PHT, DM, POD, 100SDW, HI and DC) were observed on CaLG04 and that too within the “QTL-hotspot” region. On the other hand, 30 major QTLs for five traits (PHT, PBS, DF, DM and HI) were observed on CaLG08.
Splitting of “QTL-hotspot” region
Although 1–17 major QTLs per trait were identified (Supplementary Table 4), we targeted the topmost QTL (a QTL which explained the highest phenotypic variation) for a given trait. As a result, 11 topmost QTLs with 10.14–59.83% PVE were selected for 11 traits (Table 1). While analysing flanking markers for these topmost QTLs, we identified two genomic regions, namely bin_4_13239546- bin_4_13378761 of 139.22 Kb and bin_4_13393647- bin_4_13547009 of 153.36 Kb sizes on CaLG04 containing the topmost QTLs for RLD, PHT, POD, 100SDW and DC; and for RTR and SDW, respectively (Table 1). Similarly, one genomic region (bin_8_6034209- bin_8_5984553) of 49.66 Kb size was identified on CaLG08 containing topmost QTLs for PBS, DF and HI (Table 1). However the topmost QTL for DM was present solitary on CaLG07. As expected, the two genomic regions on CaLG04 mentioned above were present within the “QTL-hotspot” region. In brief, this study split the “QTL-hotspot” region of ~3 Mb size into two smaller regions and we refer them as “QTL-hotspot_a” (bin_4_13239546- bin_4_13378761, equivalent to 139.22 Kb) and “QTL-hotspot_b” (bin_4_13393647- bin_4_13547009, equivalent to 153.36 Kb) (Fig. 5).
Candidate genes in “QTL-hotspot” sub-regions
It is evident from the results presented above that QTL analysis using high density bin mapping refined the “QTL-hotspot” region from 3 Mb10 to two sub-regions of 139.22 Kb (“QTL-hotspot_a”) and 153.36 Kb (“QTL-hotspot_b”). Comparison of these sub-regions with the chickpea genome assembly23 and genome annotations identified a total of 26 genes in these two regions. The “QTL-hotspot_a” sub-region has 15 and the “QTL-hotspot_b” contains 11 genes. Further, a detailed analysis of recombination at “QTL-hotspot_a” and “QTL-hotspot_b” along with the phenotypic data for 100SDW for 10 RILs (4 high and 6 low) clearly showed that the possible role of “QTL-hotspot_a” and “QTL-hotspot_b” to govern 100SDW trait in chickpea (Fig. 6).
A priori gene enrichment analysis in the “QTL-hotspot” region
Genome-wide association study for Ca4 pseudomolecule
In order to identify candidate genes from the “QTL-hotspot” region present on pseudomolecule Ca4, 18,935 SNPs present on Ca4 and segregating in the RIL population were analysed with phenotyping data for all 17 traits and two drought indices for genome-wide association study. This analysis provided 3,574 marker-trait associations (MTAs) for all the 17 traits and two indices after applying bonferroni correction (Supplementary Table 5).
Candidate genes in “QTL-hotspot” region
While comparing the “QTL-hotspot” region with genomic positions of QTLs identified using bin map in the present study, a total of 34 QTLs including 29 major QTLs for 9 traits (RLD, RTR, SDW, PHT, DM, POD, 100SDW, HI and DC) and five minor QTLs for three traits (PHT, SBS and BM) were found in the region. Analysis of genomic positions for 34 QTLs showed 7 clusters in the “QTL-hotspot” region (Supplementary Table 3). Comparison of these 7 cluster regions with the chickpea genome and genome annotation data identified a total of 76 curated genes (Supplementary Table 6).
Promising candidate genes
To systematically summarize observed MTAs identified by GWAS and to identify candidate genes involved in the regulation of drought tolerance variation, probabilistic inference was made using the PICARA analytical pipeline. With the defined RMIP cutoff, significant associations were found for 6 traits viz. 100SDW, BM, DTI, PHT, POD and SDW (Supplementary Table 5) and 23 out of 76 genes showed significant enrichment for these traits. Detailed gene enrichment analysis has been summarized in Supplementary Table 6 while a representative image for 100SDW is shown in Supplementary Fig. 2.
Prioritization and functional validation of candidate genes
As evident from above, bin mapping based QTL analysis identified 26 candidate genes in two sub-regions, while a priori gene enrichment analysis shortlisted 23 genes. Comparison of these gene sets prioritized 12 genes that were common in both analyses. Of these genes, 1 was present in “QTL-hotspot_a” and 11 were present in “QTL-hotspot_b” sub-regions. The functions of these genes were predicted by comparison with the NCBI non-redundant database using BLASTP (Supplementary Table 7). Various proteins such as serine threonine-protein kinase, E3 ubiquitin-protein ligase, homocysteine s-methyltransferase, leucine-rich repeat extensin-like along with uncharacterised mitochondrial and ribosomal proteins were identified.
For investigating functional implications of the prioritized 12 genes, differential gene expression profiling was undertaken in root tissues collected under stress and control conditions and in mature seeds of ICC 4958 and ICC 1882 using qRT-PCR. In root tissues, 7 genes, namely Ca_04546, Ca_04561, Ca_04562, Ca_04564, Ca_04567, Ca_04568 and Ca_04569 showed significantly higher expression in the ICC 4958 than ICC 1882 under stress conditions (Fig. 7). The expression level of these genes in ICC 4958 genotype was 0.61 to 5.72 fold higher under stress compared to that of ICC 1882. The Ca_04568 gene showed the highest (5.72 fold) change in expression followed by Ca_04561 (5.12 fold). In mature seeds, all genes except Ca_04565, Ca_04568 and Ca_04570 were found to be upregulated in the ICC 4958 genotype than ICC 1882 (Fig. 8). Interestingly, most of these upregulated genes were found not to be expressed or showed negligible expression in the ICC 1882 genotype. A maximum 9.63 fold higher expression was observed for the Ca_04564 gene, while the Ca_04571 gene showed only 0.80 fold higher expression in ICC 4958 as compared to ICC 1882.
Comparative analysis of expression of each gene in respective root and seed samples of ICC 4958 and ICC 1882 genotypes was also carried out. Four genes, viz. Ca_04561, Ca_04562, Ca_04567 and Ca_04569, were found to be upregulated in both root (under stress) and seed samples of ICC 4958 with a similar fold change in expression as compared to ICC 1882. For instance, the Ca_04561 gene was found 5.12 and 5.49 fold upregulated in stressed root tissue and mature seeds of ICC 4958 than the respective samples of ICC 1882. In contrast, gene Ca_04564 was found highly upregulated in seed (9.54 fold) as compared to the stressed root (3.12 fold) sample of ICC 4958 while significantly higher expression of Ca_04568 gene was observed in root (5.62) as compared to seed (−0.18) samples in ICC 4958 genotype.
NGS technologies have revolutionized the identification of candidate genes for crop improvement. Terminal drought is a major constraint to chickpea production and productivity, especially in arid and semi-arid regions. The changes in global climate are further predicted to aggravate losses due to drought in particular. In this context, the identification of candidate genes will be helpful to define breeding strategies and provide molecular markers to improve selection efficiency and therefore mitigate yield losses in chickpea. In this study, a skim sequencing approach was used and more than 53,000 SNPs were identified in a RIL population of ICC 4958 and ICC 1882 cross, which represents the largest number of markers used for QTL mapping in chickpea to date.
A sliding window based bin mapping approach identified an average of ~35 bins per RIL. Theoretically, each RIL contains 3–5 recombination events on each chromosome24. The results obtained match these assumptions (4 × 8 = 32). The average bin size was 210.60 Kb and more than 90% bins were less than 1Mb, suggesting that the majority of the recombinations were captured in this study. Moreover, ~68% bins contained ≤10 genes, indicating a high resolution of the bin map. All 1,610 recombination bins were used as markers to construct a linkage map of 973.94 cM. The average inter-bin distance (0.66) observed in this study was lower than other studies conducted on intra-specific mapping populations (0.94–7 cM) in chickpea25 indicating the current map is highly saturated. Such a highly saturated map can be used for targeted QTL mapping, QTL cloning and for identification of candidate genes for important agronomic traits in chickpea. We observed a high correlation between bin orders and their positions on the genome sequence for linkage groups 4 and 8. The interchanging of bin positions observed on other linkage groups could be due to misassembled portions of chickpea genome assembly. Such misassembled regions in chickpea genome have recently been reported26,27.
The majority of QTLs identified in the present study are in concordance with earlier studies5,10. For instance, in all the three studies, QTLs for 100SDW, RLD, PHT, POD, RTR and SDW were observed on CaLG04, while a QTL for DF was reported to be present on CaLG08 indicating the quality and accuracy of the linkage map. The “QTL-hotspot” region of 7.74 Mb identified in earlier studies5 had 7 SSRs and was further refined to ca. 3 Mb by integration of 49 SNP markers10. The present study integrated 1,421 SNPs and identified 38 recombination breakpoints within this region, thereby splits the “QTL-hotspot” into two subregions viz. “QTL-hotspot_a” (139.22 Kb; 0.23 cM) and “QTL-hotspot_b” (153.36 Kb; 0.22 cM), and thus further narrowed down the earlier reported “QTL-hotspot” region to a size appropriate for candidate gene identification. This demonstrates that WGRS approach helps to significantly improve the accuracy and resolution of QTL mapping similar to what has been observed in several studies in other crops. For example, in soybean, QTLs of sizes 7.90 to 340 Kb were identified for root knot nematode resistance using ~100 thousand SNPs and 242 RILs20, while in another study21 QTLs of 176 Kb to 1.28 Mb sizes were identified using 1,798,504 SNPs and 96 ILs for salt tolerance. This resolution is equivalent to traditional fine mapping studies. For instance, a grain-weight QTL, gw3.1 in rice was narrowed down to 93.8 Kb using fine mapping procedures28. These studies suggest that the bin mapping approach can be used for candidate gene identification using a RIL population and thus avoiding the laborious time required for traditional fine mapping.
A sequential approach of QTL and gene enrichment analysis identified 12 candidate genes for drought tolerance related traits in chickpea. Annotation of these genes identified several genes, including E3 ubiquitin-protein ligase, leucine-rich repeat extensin-like protein, serine threonine-protein kinase, homocysteine s-methyltransferase, vicilin, which have been reported to play a role in biotic and abiotic stress tolerance29,30,31,32 and differential gene expression analysis proved their involvement in the drought tolerance mechanism in chickpea. Ca_04568 gene was annotated as ‘Homocysteine s-methyltransferase’ and found to be upregulated in ICC 4958 genotypes only in root tissue under stress conditions indicating tissue specific expression. ‘Homocysteine s-methyltransferase’ gene was also reported to be upregulated in drought tolerant chickpea cultivar ‘Xj-209’ under stress condition32 and could be one of the important genes involved in drought tolerance mechanism in chickpea. Proteomic analysis of rice seedlings also reported involvement of ‘Homocysteine s-methyltransferase’ in regulating cold stress33. The Ca_04567 gene was annotated as ‘LRR receptor like serine threonine-protein kinases’ which are the well characterized membrane proteins known for their roles in stress response and abscisic acid (ABA) regulation29. More than 2 fold higher expression of this gene was observed in the drought tolerant genotype, ICC 4958 than the drought sensitive genotype ICC 1882, suggesting a role in drought tolerance in chickpea. Additionally, the Ca_04561 gene, which is predicted to be an ‘E3 ubiquitin-protein ligase’, was up-regulated in ICC 4958 while down-regulated in ICC 1882 under stress conditions. Ubiquitination is an important post translational modification to regulate growth and development in all eukaryotes. Various studies in plants34,35,36 reported the role of the ubiquitin 26S proteasome system (UPS) in regulating fundamental processes such as embryogenesis, photomorphogenesis, and organ development. E3 ubiquitin ligase, which determines substrate specificity, is an important enzyme of the ubiquitination pathway37,38. Recent studies39,40,41 reported the involvement of E3 ubiquitin ligase in the regulation of abiotic stress tolerance. A study in Arabidopsis42 showed that suppression of one of RING E3 ubiquitin ligase (AtATL78) gene was responsible for increasing tolerance to cold stress while decreasing drought stress tolerance. The regulation of this gene could be responsible for determining drought sensitivity/resistance in chickpea. Interestingly, this gene showed higher expression in mature seeds of ICC 4958 (larger seed size) as compared to ICC 1882 (smaller seed size), although, a loss of function of RING-type E3 ubiquitin ligase was reported to be responsible for enhanced grain width, weight and yield in rice31. This indicates that ‘E3 ubiquitin-protein ligase’ could be responsible for only abiotic stress tolerance in chickpea while a different mechanism might be present for controlling seed size. Ca_04564 gene was identified as ‘leucine-rich repeat extensin-like protein’. This gene was found highly over-expressed in ICC 4958 seed while showing negligible expression in ICC 1882 seeds. The role of extensin like genes in seed germination has already been reported in Arabidopsis43. They found extensin like genes are expressed specifically in endosperm and might play a role in cell wall modification. The present study suggests a detailed study of such genes in the context of seed size/weight. Along with the above genes, one uncharacterised mitochondrial protein showing higher expression in both stressed root and seed samples of ICC 4958, was also identified. This suggests a combined action of different genes may be responsible for drought tolerance in ICC 4958. However, we did not observe any sequence level variation for these genes between tolerant and sensitive genotypes suggesting detailed characterization of these genes could help to provide deeper insights on their role in drought tolerance mechanism in chickpea.
The present study showed that bin mapping based QTL analysis approach is advantageous over traditional QTL study. However, the resolution required to identify gene(s) for complex traits could not be achieved using the bi-parental mapping population because of inherent limitation of low recombination with limited population size. In such cases, developing a large scale population will help to capture more recombinations, thereby, increasing the resolution to dissect the complex traits to the gene level.
In summary, a bin mapping approach was successfully applied to identify 1,610 recombination breakpoints in chickpea. A linkage map of 973.94 cM was constructed using recombination bins as markers and this was used for QTL analysis of drought related traits. The QTL study identified 71 major QTLs and delineates the “QTL-hotspot” region from ca. 3 Mb to two QTL regions viz. “QTL-hotspot_a” of 139.22 Kb and “QTL-hotspot_b” of 153.36 Kb sizes. Further, bin mapping and gene enrichment analysis identified a set of 12 candidate genes. Some of the identified genes such as E3 ubiquitin ligase, Serine threonine protein kinases, homocysteine s-methyltransferase were annotated as candidate drought tolerance genes, and expression profiling suggests their involvement in drought tolerance in chickpea. Detailed analysis of these genes will help to further elucidate drought tolerance mechanism in chickpea.
A set of 232 RILs was developed using the single seed decent method by crossing ICC 4958 (a drought tolerant genotype) and ICC 1882 (a drought sensitive genotype). Phenotypic data for 17 different traits along with two drought indices (DTI and DSI) described in our earlier study5 were used for identification of QTLs.
Genomic DNA was isolated from parental genotypes and RILs using high throughput mini-DNA extraction method as described earlier44. The quality and quantity of DNA was checked using a spectrophotometer (Shimadzu UV160A, Japan).
Library construction, skim sequencing and SNP calling
For the RILs and parental lines of the mapping population, a total of 50 ng DNA from each sample was used to prepare an Illumina Nextera library according to the manufacturer’s instructions. In brief, the genomic DNA was simultaneously fragmented and tagged with sequencing adaptors using an engineered transposome. The fragments were then amplified using limited PCR cycles which also adds index sequences on both ends of the DNA, thus enabling dual-indexed sequencing of pooled libraries. Libraries for 96 individuals were pooled and sequenced on the Illumina HiSeq 2500 platform. Parental genotypes were sequenced separately at high sequencing depth (~8X) whereas RILs were sequenced at low coverage of ~0.72X (average of RILs).
The reads obtained were filtered to remove low quality bases and used for SNP identification. Initially, the reads from the parental genotypes (ICC 4958 and ICC 1882) were aligned to the draft chickpea genome (CaGAv1.0) using SOAP45. Uniquely mapped reads were considered for SNP calling using SGSAutoSNP software46 with default parameters. Similarly, the low quality reads obtained from RILs were filtered out and SNPs were identified from the remaining reads. The identified SNPs were filtered with a minor allele frequency (MAF) cutoff of 0.2 and lines with ≥5% missing data were also excluded. SNPs were scored as “A” and “B” representing alleles from the two parents ICC 4958 and ICC 1882 respectively.
Identification of recombination break points
A 15 bp sliding window approach was used to predict recombination break points18. This was determined based on the ratio of the alleles in the sliding window using a perl script18. The script was used with necessary changes according to chickpea genome co-ordinates. For each individual, the ratio of A and B alleles within the window was calculated. Windows with nine or more alleles from either parent were considered as homozygous for an individual. The recombination break point was defined at the transition from one genotype to another. The recombination breakpoints obtained from all the individual RILs were aligned and compared over 100 Kb intervals. The successive intervals lacking recombination break points within an entire population were combined and considered as a single bin. These bins were then used as markers for linkage map construction.
Construction of bin map and QTL analysis
The bins were used as genetic markers for the construction of a linkage map. Map construction was carried out using QTL IciMapping Version 3.3 software47. The REcombination Counting and ORDering (RECORD) algorithm was used for marker ordering. Sum of adjacent criterion (SAD) ripple was performed to confirm the marker order. The marker order and their positions on genetic and physical map were visualized using Strudel V. 1.12.03.2048.
QTL IciMapping software was also used with the phenotype data for 17 drought related traits and two drought indices and bin map for QTL analysis. The ICIM-ADD mapping method, along with other default parameters, was used. The LOD threshold was set by using permutation 1000 and p value ≤ 0.05. The results were compared with earlier studies5,10 to determine the quality and accuracy of the bin map. Further, the QTL results were used to refine the earlier identified “QTL-hotspot” region and also to identify candidate genes controlling respective traits.
Gene enrichment analysis
PICARA enrichment algorithm delineates local linkage structure and provides a probabilistic inference of GWAS signal enrichment that defines the significance of co-localization of marker-trait association and curated candidate genes49. To generate the local linkage blocks, all SNPs on CaLG04 were included in the pairwise linkage disequilibrium analysis by Lewontin’s scaled D’ method50. Only significant pairwise LD estimates were preserved for calculating a local linkage block. For the candidate genes located in SNP poor regions, where no SNPs within and flanking candidate gene can be located, a chromosome-wide estimation was calculated by taking the median physical distance in base-pairs between all SNP pairs that are in complete dependency (D’ = 1).
For each a priori candidate gene, PICARA enrichment estimates the amount of co-localizing GWAS signal with that gene, compares the weighted probability of co-localization with a permutation test from a chromosome-wide resampling and a significance test score was generated following a hypergeometric distribution49. A prior probability of co-localization with significant association variants was generated by weighting the cumulative probability between the given candidate gene and all SNPs from the same chromosome with RMIP (re-sampling model inclusion probability51) cutoffs. In this study, a RMIP ≥ 5, which is the same SNP variant showing at least 5 times an association in 100 subsets of resampling, was used as the cutoff for association significance. The permutation was done through 10,000 randomized association sets each containing the same number of associations as the true data, sampled without replacement from all SNPs on the chromosome. Association significance (in this case, RMIP values) was assigned to each set of the 10,000 associations from the true data, and compared to their posterior probability of co-localization with the a priori candidate genes. The vector of 10,000 significance results, then served the null distribution of co-localization.
Quantitative gene expression analysis
Quantitative real time PCR (qRT-PCR) analysis was carried out to study the expression of candidate genes in RNAs isolated from seed (for seed weight trait) and root (for drought tolerance) tissues harvested under control as well as drought challenged conditions. For drought, a slow drought experiment for both ICC 4958 and ICC 1882 was conducted under greenhouse conditions, as described in Ray et al.52. Root samples of the selected genotypes were collected when the transpiration ratio reached 0.10 along with the respective controls. Each treatment was maintained in 3 biological replications. For expression profiling, root tissues of drought tolerant ICC 4958 and sensitive ICC 1882 genotypes under control and drought conditions were collected. Tissues were washed thoroughly with 0.10% DEPC water, frozen in liquid nitrogen and stored at −80 °C. Total RNA was extracted from the harvested tissues using TRIzol (Invitrogen, USA) according to the manufacturer’s protocol. RNA quality was assessed on 1.20% formaldehyde agarose gels, while the purity of RNA was assessed using a Nanovue spectrophotometer (A260/A280 ratio). First strand cDNA was synthesized from total RNA (2.50 μg) using a cDNA synthesis kit (Superscript® III; Invitrogen, USA) following manufacturer’s instructions. For seed weight trait, the mature seeds of ICC 4958 and ICC 1882 were soaked in 0.10% DEPC treated water for 30 minutes before RNA extraction. The same protocol mentioned above was used for total RNA extraction and first strand cDNA synthesis.
Gene-specific primers for qRT-PCR were designed using primer 3 software53 and used for PCR amplification using the conditions described in54. qRT-PCR was performed using an Applied Biosystems 7500 Real Time PCR System with SYBR green chemistry (Applied Biosystems, USA) according to the manufacturer’s instructions. The data from different PCR runs or cDNA samples were compared by using the mean of the CT values of the three biological replicates that were normalized to the mean CT values of the endogenous gene, glyceraldehyde 3-phosphate dehydrogenase (GAPDH). The relative expression ratios were determined using the 2_∆∆Ct method and student’s t-test was used to calculate significance55.
How to cite this article: Kale, S. M. et al. Prioritization of candidate genes in “QTL-hotspot” region for drought tolerance in chickpea (Cicer arietinum L.). Sci. Rep. 5, 15296; doi: 10.1038/srep15296 (2015).
The authors are thankful to Dr. Xinpeng Qi and Dr. Hon-Ming Lam for providing scripts to generate a bin map image. Technical assistance from Jaipal Gaud is greatly appreciated. The work has been undertaken as a part of Australia–India strategic research fund (AISRF) project funded by Department of Science and Technology (DST) and Tropical Legumes I (TLI) project funded by Bill and Melinda Gates Foundation (BMGF) through CGIAR-Generation Challenge Program (GCP; http://www.generationcp.org/). This work has been carried out as part of the CGIAR Research Program on Grain Legumes. ICRISAT is a member of the CGIAR Consortium.