Ultra-high density intra-specific genetic linkage maps accelerate identification of functionally relevant molecular tags governing important agronomic traits in chickpea

We discovered 26785 and 16573 high-quality SNPs differentiating two parental genotypes of a RIL mapping population using reference desi and kabuli genome-based GBS assay. Of these, 3625 and 2177 SNPs have been integrated into eight desi and kabuli chromosomes, respectively in order to construct ultra-high density (0.20–0.37 cM) intra-specific chickpea genetic linkage maps. One of these constructed high-resolution genetic map has potential to identify 33 major genomic regions harbouring 35 robust QTLs (PVE: 17.9–39.7%) associated with three agronomic traits, which were mapped within <1 cM mean marker intervals on desi chromosomes. The extended LD (linkage disequilibrium) decay (~15 cM) in chromosomes of genetic maps have encouraged us to use a rapid integrated approach (comparative QTL mapping, QTL-region specific haplotype/LD-based trait association analysis, expression profiling and gene haplotype-based association mapping) rather than a traditional QTL map-based cloning method to narrow-down one major seed weight (SW) robust QTL region. It delineated favourable natural allelic variants and superior haplotype-containing one seed-specific candidate embryo defective gene regulating SW in chickpea. The ultra-high-resolution genetic maps, QTLs/genes and alleles/haplotypes-related genomic information generated and integrated strategy for rapid QTL/gene identification developed have potential to expedite genomics-assisted breeding applications in crop plants, including chickpea for their genetic enhancement.

We discovered 26785 and 16573 high-quality SNPs differentiating two parental genotypes of a RIL mapping population using reference desi and kabuli genome-based GBS assay. Of these, 3625 and 2177 SNPs have been integrated into eight desi and kabuli chromosomes, respectively in order to construct ultra-high density (0.20-0.37 cM) intra-specific chickpea genetic linkage maps. One of these constructed high-resolution genetic map has potential to identify 33 major genomic regions harbouring 35 robust QTLs (PVE: 17.9-39.7%) associated with three agronomic traits, which were mapped within ,1 cM mean marker intervals on desi chromosomes. The extended LD (linkage disequilibrium) decay (,15 cM) in chromosomes of genetic maps have encouraged us to use a rapid integrated approach (comparative QTL mapping, QTL-region specific haplotype/LD-based trait association analysis, expression profiling and gene haplotype-based association mapping) rather than a traditional QTL map-based cloning method to narrow-down one major seed weight (SW) robust QTL region. It delineated favourable natural allelic variants and superior haplotype-containing one seed-specific candidate embryo defective gene regulating SW in chickpea. The ultra-high-resolution genetic maps, QTLs/genes and alleles/haplotypes-related genomic information generated and integrated strategy for rapid QTL/gene identification developed have potential to expedite genomics-assisted breeding applications in crop plants, including chickpea for their genetic enhancement. C hickpea (Cicer arietinum L.), the second most grown legume crop world-wide is an annual self-pollinated and diploid (2n 5 2x 5 16) pulse species with a genome size of ,740 Mb 1,2 . A member of Fabaceae family, this diploid food legume is a major source of human dietary protein packed with essential amino acids 1 . Chickpea is broadly categorized into two cultivars-Kabuli and Desi-types basing upon their plant characteristics and diverse gene pools-based geographical distribution. They are known to enrich the soil nutritional status and fertility by symbiotic nitrogen fixation. The seed and pod traits are considered the most prominent characteristics of chickpea defining its economic value as a diet for human race. Consequently, they draw major interest of researchers for yield and product quality improvement towards generation of high-yielding genetically tailored chickpea cultivars via genomics-assisted breeding 3 .
A large genome with a narrow genetic base, chickpea requires numerous informative and genome-wide welldistributed SNP (single nucleotide polymorphism) markers for construction of ultra-high density genetic linkage map (#1 cM average map-density 4,5 ) in order to identify and fine-map the useful trait-influencing candidate genes/ QTLs (quantitative trait loci) for marker-assisted genetic enhancement. A number of high-resolution inter-specific genetic linkage maps (with mean map-density: 0.59-1.7 cM) have been generated in chickpea by genotyping of large-scale SNP markers in its diverse mapping populations [6][7][8][9][10] . Few recent efforts have also been made to construct SNP marker-based high-density genetic linkage maps (with map density: 1.74-3.68 cM) by utilizing diverse advanced generation desi and kabuli intra-specific mapping populations [11][12][13][14] . Indeed, all these studies have utilized highthroughput genotyping assays like Illumina GoldenGate/Infinium and Competitive Allele Specific PCR (KASPar) for large-scale validation and genotyping of SNP markers (,1000 SNPs till now) in intra-and inter-specific chickpea mapping populations to construct high-density genetic linkage maps. Aside chickpea 8,9 , similar efforts have also been undertaken to construct comprehensive high-density genetic linkage map (average inter-marker distance #1 cM) by deploying aforesaid strategies in diverse legume species (soybean, Medicago, pigeonpea, common bean and mung bean) [15][16][17][18][19][20][21][22][23][24][25] . However, the need of prior information regarding SNP loci and their high-quality flanking sequence regions for genotyping of SNPs largely limits the use of these assays in both discovery and validation of SNPs at genome-wide scale for constructing efficient high-resolution genetic linkage maps in chickpea. It thus implicates the necessity of a suitable modern and fast advanced approach/assay for simultaneous large-scale genomewide discovery and high-throughput genotyping of SNPs in diverse mapping populations to generate required ultra-high resolution genetic linkage maps in chickpea. Numerous past reports have established the essentiality of high-density linkage maps as references for more effective assembly and orientation of the high-quality sequenced scaffolds into the pseudomolecules corresponding the chromosomes of crop plant genomes 15,22,23,[25][26][27][28][29][30][31][32][33][34][35][36][37][38] . Therefore, it is indispensable to generate ultra-high density genetic linkage maps enriched with a greater number of SNP markers for efficient anchoring of assembled scaffolds into the chickpea chromosomes. This is expected to overall greatly facilitate the chickpea genome assembly and sequencing efforts in near future.
Genotyping-By-Sequencing (GBS) is a simple, rapid and costeffective next-generation sequencing (NGS)-based assay having potential for simultaneous large-scale discovery and genotyping of SNPs in diverse crop plants at a genome-wide scale 39,40 . The highbarcoded multiplexing capacity (pooling up to 384-barcoded accessions in a single sequencing lane) at methylation-sensitive RE (restriction endonuclease)-sites (like APeKI) have enhanced the efficiency of GBS assay in fast discovery and genotyping of genome-wide SNPs in larger mapping populations. This in turn made the GBS assay much more advantageous over other available high-throughput traditional genotyping assays for constructing ultra-high density genetic linkage maps and genome-wide high-resolution (major and minor effects QTLs) gene/QTL mapping in small diploid as well as large genome crop species, including rice, wheat, barley, sorghum, Medicago and soybean 24,[41][42][43][44][45][46][47][48][49][50] . Considering its utility in many high-throughput marker-based genotyping applications in genomics-assisted breeding, the use of GBS assay for generating ultra-high resolution genome maps and molecular mapping of robust major as well as minor genes/QTLs with sub-optimal use of resources in a crop species like chickpea assumes significance. Nevertheless, very limited reports are available with regard to the use of GBS approach in discovery and genotyping of genome-wide SNPs in intra-and inter-specific mapping populations for constructing ultra-high resolution genetic linkage maps and fine mapping of QTLs regulating important agronomic traits in chickpea vis-a -vis other crop plants 22,45,50,51 .
Keeping all above in view, we discovered 26785 and 16573 genome-wide SNPs showing differentiation between parental genotypes [ICC 12299 (desi) and ICC 8261 (kabuli)] by their genotyping in a 275 F 7 RIL (recombinant inbred line) mapping population (ICC 12299 3 ICC 8261) using reference desi and kabuli genome-based GBS assay, respectively. A selected 3625 and 2177 high-quality SNPs of these, were integrated into eight desi and kabuli chromosomes, respectively to construct ultra-high density intra-specific genetic linkage maps in chickpea. These high resolution genetic maps were further used as a reference to identify and map the novel major genomic regions underlying robust QTLs associated with three important agronomic traits in chickpea. To delineate functionally relevant candidate gene(s) regulating seed weight, one of the strong trait-associated major genomic region harbouring robust QTL has been narrowed down by integrating comparative desi and kabuli genome-based QTL mapping with QTL region-specific haplotype (linkage disequilibrium)-based high-resolution trait association analysis, differential gene expression profiling and gene haplotype-based association mapping in chickpea.

Results
Discovery and large-scale validation of GBS-based genome-wide SNPs. We generated an average of 207.3 million high-quality sequence reads (ranged from 1.9 to 8.3 with a mean of 3.4 million reads per individual) (Fig. S1) by sequencing of each 3 3 96-plex (288-plex) ApeKI libraries made from 275 RIL mapping individuals and two parental genotypes (ICC 12299 3 ICC 8261) using GBS assay. However, 84.5 and 82.6% of sequence reads of these, were mapped to unique physical location of desi and kabuli reference genomes, respectively. It identified 43358, including 26785 and 16573 highquality SNPs [with read-depth $10, SNP base quality $20, ,10% missing data in each individual, 5% minor allele frequency (MAF) and 100% reproducibility] differentiating two parental genotypes using the reference desi and kabuli genome-based GBS assay, respectively (Fig. 1A). Of these, 22331, including 7968 and 14363 SNPs were physically mapped on eight chromosomes of desi and kabuli genomes with average map-densities of 15.6 and 24.2 kb, respectively (Table S1, Fig. 1A, B, C). A higher proportion of SNPs were mapped on desi and kabuli chromosomes 3 (19.4%, 1548 SNPs) and 4 (19.9%, 2854), respectively. The desi chromosomes 7 and 8 (13.9 kb each) and kabuli chromosome 4 (17.2 kb) had maximum mean map-density (Table S1). The remaining 21027 SNPs, including 18817 and 2210 SNPs were physically mapped on scaffolds of desi and kabuli genomes, respectively (Fig. 1A). The A/G and C/T transitions (62.7 and 55.1% of total SNPs identified in desi and kabuli, respectively) were more abundant compared with that of transversions (Fig. 1D). Detail characteristics of 43358 SNPs discovered in our study are provided in the Four hundred-eighty SNPs mined by reference desi and kabuli genome-based GBS assay were targeted for their validation through resequencing of PCR amplicons (384 SNPs) and MALDI-TOF mass array (96 SNPs). It validated 437 (91%) SNPs successfully in parental genotypes and selected mapping individuals of a RIL population (ICC 12299 3 ICC 8261) (Fig. S2A, B). Remarkably, all these validated SNPs revealed expected homozygous and heterozygous alleles discrimination in RILs as detected through our GBS assay.
Construction of ultra-high density intra-specific genetic linkage maps. The genotyping information of 4448 and 2689 high-quality SNPs (MAF $ 20%) discovered by reference desi and kabuli genome-based GBS assay, respectively (Table S3) in parental genotypes and 275 individuals of a RIL mapping population (ICC 12299 3 ICC 8261) were used to construct ultra-high density intra-specific genetic linkage maps in chickpea. The linkage analysis of SNPs showing significant Mendelian segregation ratio (151) mapped 3625 and 2177 SNPs on eight desi and kabuli linkage groups (LGs) (LG1 to LG8) of intra-specific genetic maps, respectively according to their physical positions (bp) on corresponding chromosomes (Table 1, Fig. S3, S4, S5). Considering significant variations in the genome assembly and differences in sequence coverage and length (bp) of chromosomal pseudomolecules between desi and kabuli genomes 78 , we preferred to construct two genetic maps separately by using desi and kabuli genome-based high-quality GBS-SNPs and their corresponding physical positions on eight chromosomes as reference. The constructed desi and kabuli SNPbased ultra-high density intra-specific genetic maps comprising of eight LGs covered a total map length of 714.089 and 798.467 cM with mean inter-marker distances of 0.20 and 0.37 cM, respectively (  (Table 1). In desi, most saturated genetic map was LG1 (average inter-SNP marker distance: 0.12 cM), whereas LG5 and LG8 were least saturated (0.32 cM). In kabuli, LG3 (mean inter-SNP marker distance: 0.18 cM) and LG5 (0.78 cM) had most and least saturated genetic maps, respectively ( Table 1).
Estimation of genome-wide LD patterns in chickpea. The genomewide LD estimates (r 2 ) and extent of LD decay using all possible paircombinations of 3625 and 2177 SNPs genetically mapped on eight desi and kabuli LGs of intra-specific genetic maps, respectively were determined. The average LD estimates in eight LGs of desi genome  Table S4). The LG4 of both desi (r 2 : 0.78) and kabuli (0.72) genetic maps had highest LD estimates. Maximum proportion of SNP-pairs showed significant LD (P , 0.0001) on desi LG4 (42.8%) and kabuli LG2 (34.6%) (Table S4). We determined the LD decay of 3625 and 2177 SNP-pairs by pooling the r 2 estimates across eight desi and kabuli LGs and plotting their average r 2 against the 5 cM equal intervals of genetic distance (maximum up to 120 cM). A decreasing trend of LD decay (r 2 , 0.3) was observed with increase in the genetic distance (cM) of SNP markers mapped on the desi and kabuli LGs (Fig. S6). Remarkably, a rapid LD decay was observed at the genetic distance of 5 cM in both desi and kabuli genomes. The desi and kabuli LGs overall sustained a significant level of LD up to a genetic distance of 10 cM. A significant LD decay (r 2 , 0.1) was observed near about 15 cM genetic distance ( Fig. S6) in LGs of these two genomes.
Identification and mapping of high-resolution trait-governing QTLs. A significant difference of three quantitative agronomic traits, namely pod number/plant (PN) (51.1-141.2 with 79-80% heritability), seed number/plant (SN) (60.6-231.6 with 77-81%) and 100-seed weight SW (7.1-43.7 g with 88-90%, Fig. S7) in 275 RIL mapping individuals and parental genotypes across three years based on ANOVA was observed (Table S5). We observed bi-directional transgressive segregation of these traits beyond that of parental genotypes in RILs (Fig. S8). The normal frequency distribution of all three quantitative traits in RILs and parental genotypes was evident. The coefficient of variation (CV) ranged from 0.14 for PN to 0.33 in SW (Table S5). The Pearson's correlation coefficient estimated among three pair-wise combinations of quantitative traits indicated significant (P , 10 22 ) negative correlation of PN and SN with SW (r 5 20.42 to 20.51) as well as positive correlation between PN and SN (0.95) (Fig. S9).
For genetic/QTL mapping, the genotyping data of 3625 parental polymorphic desi-genome based SNPs mapped on an ultra-high density intra-specific genetic linkage map was integrated with phenotyping data of 275 RIL mapping population. It identified and mapped 33 genomic regions harbouring 35 major and significant (LOD: 7.25-14.5) QTLs associated with PN, SN and SW on seven LGs (except LG3) of chickpea ( Table 2, Fig. S10, Fig. S11). The proportion of phenotypic variance explained (PVE) by individual QTL varied from 17.9-39.7%. Collectively, all these QTLs revealed 37.3% PVE. All these identified QTLs (considered as robust QTL) showing consistent phenotypic expression and major effects on traits individually with PVE of . 10% each, were validated across two geographical locations as well as years/seasons in field. A maximum number of QTLs (nine QTLs) were mapped on LG1 followed by LGs 2, 6 and 7 (5 QTLs each) and minimum (3 QTLs) on LG5 ( Table 2). The genomic regions (from 0.015 cM on LG4 to 1.978 cM on LG1) harbouring the QTLs covered with 106 SNP markers were mapped on seven LGs.  (Table S6A and B). The detailed annotation of SNPs within genes identified maximum frequency (35.4%, 507 SNPs in desi and 49.4%, 662 SNPs in kabuli) of SNPs in the exons and minimum in the 1000-bp downstream regulatory regions (DRR) (27.9%, 399 and 17.7%, 238) ( Fig. 2A and S12A). Remarkably, 278 and 229 coding SNPs-containing desi genes, while 352 and 328 coding SNPscarrying kabuli genes showed synonymous and non-synonymous (missense and nonsense) substitutions, respectively ( Fig. 2B and S12B). The non-synonymous SNPs identified in desi and kabuli comprised of 223 and 323 missense SNPs in the 149 and 234 genes showing amino acid substitutions as well as six and five nonsense SNPs in the six and five genes, respectively culminating into premature termination codons introduced by nucleotide replacements (Fig. 2B and S12B). Maximum (5.3 SNPs/gene) and minimum (1.6 SNPs/gene) average frequency of SNPs was observed in the non-coding intronic and coding sequence components of desi genes, respectively. In contrast, we observed comparable average SNP frequency between intronic (1.7 SNPs/gene) and coding (1.9) sequence components of kabuli genes.
The functional annotation of 635 desi and 760 kabuli genes with SNPs detected a higher proportion of SNPs belonging to growth, development and metabolism-related proteins (58 and 63% in desi and kabuli, respectively) followed by transcription factors (16 and 18%) (Fig. S13A). The KOG-based determination of putative functions (excluding unknown and general functions) for SNPs-carrying genes revealed their maximum correspondence to transcription (K,  25% in desi and 31% in kabuli) (Fig. S13B). The GO enrichment analysis of SNPs-carrying desi and kabuli genes showed a significant (P # 5.9 3 10 23 ) overrepresentation/enrichment of GO terms in the genes associated with transport (15.4%) belonging to biological process (Fig. S13C).
QTL region-specific haplotype (LD)-based trait association mapping. One major and robust (LOD: 14.5 and R 2 : 39.7%) QTL (qSW5.1) region [Ca_Desi_SNP2337 (23.805 cM)-Ca_Desi_SNP2338 (24.875 cM)] governing SW mapped on desi LG5 of intra-specific genetic map was selected for fine mapping through QTL targetspecific haplotype (LD)-based high-resolution trait association mapping ( Fig. S10 and Fig. 3A). The genomic region underlying such a major QTL was defined by integrating the genetic linkage map information with that of physical map of desi genome. The target QTL interval genetically mapped on desi LG5 was compared and correlated with that mapped on kabuli LGs of a genetic map (based on QTL mapping). In kabuli, the SW trait-influencing QTL region spanned with 1.6 cM genetic interval between the markers Ca_ Kabuli_SNP1459 (34.6 cM) and Ca_Kabuli_SNP1462 (36.2 cM) was mapped on LG5 (Fig. 3B). The integration of genetic linkage map information with physical map of kabuli genome, the 1.6 cM QTL interval corresponded to 713 kb genomic region (40020.4-40733.4 kb) on chromosome 5 (Fig. 3C).
The targeted resequencing of this 713 kb qSW5.1 QTL region in parental genotypes (ICC 12299 and ICC 8261) and 10 of each homozygous low and high seed weight RIL mapping individuals detected 804 high-quality SNPs (mean SNP density: 1/886.8 bp). It includes 435 intergenic SNPs and 369 SNPs derived from the diverse coding and non-coding sequence components of 64 genes. The decay of LD over longer genetic distance (,15 cM, Fig. S6) particularly at the target SW QTL interval mapped on kabuli LG5 gave us clue to narrow-down this QTL region of interest using SNP haplotype (LD)-based high-resolution trait association mapping. The genotyping of 804 SNPs identified and mapped on the 713 kb qSW5.1 QTL region in 244 accessions belonging to an SW-specific association panel (Table S7) enabled to constitute 12 haplotypes (Fig. 3D). The SNP haplotype-based genotyping information were further correlated with their SW-specific robust field phenotyping data (SW: 5.9-57.6 g) for QTL region-specific trait association analysis. The integration of GLM and MLM analysis with EMMA and false discovery rate (FDR) correction based on multiple-comparisons (min- imizing the confounding effect of population structure) identified one best haplotype (H1) between two SNPs (CaSNP1 and CaSNP2) in two genes (Ca07590 and Ca07594) showing strong association (P: 2-4 3 10 26 and R 2 : 37.6-40.3%) with SW in contrast to any other SNP combinations. This haplotype region covered a maximum of 22.5 kb (40710.6-40733.1 kb) physical distance between CaSNP1 and CaSNP2 at qSW5.1 QTL interval. The structural and functional annotation of this 22.5 kb sequenced short QTL region with kabuli genome annotation database identified five proteincoding candidate genes (Fig. 3D). The comprehensive analysis of strong SW-associated haplotypes-constituting two genic SNPs detected one regulatory SNP (CaSNP2: C/T) in the URR (upstream regulatory region) of an embryo defective protein-coding gene showing strong association (P: 2.0 3 10 25 and R 2 5 39.8%) with SW.
Expression profiling of SW-associated genes. To infer the differential regulatory gene expression patterns, the expression profiling of five protein-coding candidate genes annotated in the 22.5 kb delineated SW-specific robust QTL interval (qSW5.1) was performed. The genebased primers were amplified using the RNA isolated from three different vegetative and reproductive tissues and two seed developmental stages of low and high SW parental genotypes (ICC 12299 and ICC 8261) of a RIL mapping population through quantitative RT-PCR assay (Fig. S14). One regulatory SNP-carrying embryo defective gene of these, constituting strong SW-associated haplotypes at qSW5.1 QTL interval revealed seed-specific expression (as compared to vegetative and reproductive tissues) as well as pronounced down-regulation (.7-fold) in parental genotypes during seed development (Fig. S14). Henceforth, this gene localized in a major SW-influencing QTL interval (qSW5.1) was selected as target candidate for understanding its significance in seed weight regulation through high-resolution gene haplotype-specific association/LD mapping in chickpea.
Haplotype-based association mapping in a strong SW-associated gene. The sequencing of 9286 bp cloned amplicon covering the entire coding and non-coding regions (including 2 kb URR) of a strong SW-influencing embryo defective gene (Fig. 4A) (validated by QTL mapping, QTL region-specific association analysis and differential expression profiling) among 244 accessions belonging to a SW-specific association panel identified 12 SNP loci (Fig. 4B). It includes four regulatory SNPs in the URR of this gene. The genebased haplotype analysis combining the genotyping data of 12 SNPs constituted a maximum of four haplotypes among accessions (Fig. 4B).
The association analysis using the four SNP marker-based haplotypes in an embryo defective gene with SW (5.9 to 57.6 g)-specific multi-location replicated field phenotyping data of 244 accessions (association panel) revealed its strong association with SW (P: 2.5 3 10 27 and R 2 : 44.7%). The haplotype-pair-based LD estimation produced a significant high degree of LD (r 2 . 0.70 and P , 0.0001) across entire 9286 bp sequenced gene (Fig. 4C), which increased its overall potential for trait association. The 94 high SW (40-57.6 g) accessions represented by single haplotype group 1 (TAC) in the gene differentiated distinctly from another haplotype group 2 and thus strong association potential of this gene for SW in chickpea is expected. The differential expression profiling using these high (TAC) and low (CGT) SW-specific haplotypes constituted by three SNPs in URR of an embryo defective gene revealed significant down-regulated expression (,6 times) specifically in two seed developmental stages of low and high SW parental genotypes and homozygous RIL mapping individuals as compared to leaf. It inferred that favourable natural allelic variants and superior haplotype (TAC) constituted in the URR of an embryo defective gene involved in decreased transcript expression and thus have significance in regulation of seed development and consequently seed weight in chickpea.

Discussion
We utilized NGS-based 3 3 96-plex GBS assay for large-scale validation and high-throughput genotyping of genome-wide SNPs in a 275 RIL mapping population (ICC 12299 3 ICC 8261) to construct ultra-high density intra-specific genetic linkage maps in chickpea. The reference desi and kabuli genome-based GBS assay identified 26785 and 16573 high-quality SNPs (5% MAF) differentiating the two parental genotypes (ICC 12299 and ICC 8261) of a RIL mapping population with 100% reproducibility, respectively. Notably, 22331, including 7968 and 14363 SNPs of these, showed extensive genome coverage across eight desi and kabuli chromosomes with mean densities of 1/15.6 and 1/24.2 kb, respectively. A high experimental validation success rate (91%) of 480 randomly selected SNPs from a total of 43358 GBS-based SNPs by amplicon resequencing and MALDI-TOF mass array genotyping assay was evident. A number of studies in the past have accessed the potential of diverse NGS-based assay for efficient mining and genotyping of non-erroneous SNPs through experimental validation of randomly chosen smaller set (,1-2%) of SNPs from a whole genome SNP dataset in diverse crop plants, including chickpea 12,22,42,45,50,51,[79][80][81][82] . The correspondence of on an average of 39% (16910 SNPs) desi and kabuli genomederived GBS-SNPs with that of SNP allelic information available in various chickpea genotypes 8,9,12,22,37,51 in silico based on their congruent physical positions was clearly evident. These findings overall suggest the robustness and utility of GBS assay for rapid large-scale mining and high-throughput genotyping of valid and high-quality SNPs at a genome-wide scale in chickpea. The advantages of GBSassay in simultaneous genome-wide discovery and genotyping of SNPs and their ability to expedite various large-scale genotyping applications have been demonstrated in many crop plants 24,[40][41][42][43][44][45][46][47][48][49][50] . Collectively, a large number of 43358 high-quality reference desi and kabuli genome-based GBS-SNPs (61% novel SNPs) developed by us (submitted to NCBI SNPdb for unrestricted use) could serve as a valuable genomic resource for their immense use in genomicsassisted breeding applications of chickpea.
A significant LD estimates across eight desi (0.62) and kabuli (0. 59) LGs of constructed intra-specific genetic maps using mapped SNP markers reflected direct correlation of LD patterns with the density of SNPs required to cover the genomes. In spite of large differences in SNP marker map density between desi (0.20 cM) and kabuli (0. 37) LGs of genetic maps, we observed higher LD estimates and extended LD decay (,15 cM) in LGs of both desi and kabuli genomes. This is much higher than the LD decay estimated in other self-and cross-pollinated crop plants [83][84][85][86][87][88][89][90] . It could be due to sequential bottlenecks during chickpea domestication resulting in its narrow genetic base in contrast to other crop plants 36,37,74,[91][92][93] . Overall, the genome-wide LD patterns and constructed LD maps gave an approximate perception regarding the SNP density required to identify the functionally relevant trait-regulatory robust genes/ QTLs in a large chickpea genome with narrow genetic base using genetic (fine-mapping and map-based cloning) and association mapping.
Our study utilized 'a-lattice' designs in multi-location experimental field trials to efficiently grow and comprehensively phenotype the 275 RIL mapping individuals (ICC 12299 3 ICC 8261) along with their parental genotypes for three important yield component traits (PN, SN and SW) in chickpea. The 'a-lattice' designs are more preferred than the commonly adopted 'randomized complete block design (RCBD)', since they are less prone to experimental errors, cost-effective and provide more precise (,35% higher) phenotyping information in field experiments aside high reproducibility. The 'a-lattice' designs are also associated with smaller coefficient of variation and error mean square, which collectively gives them an edge over the traditionally used RCBD designs specifically when phenotyping of yield component trait variables was conducted for very large numbers of diverse crop genotypes in field experiments with a smaller plot size [94][95][96] .
Thirty-three major genomic regions harbouring 35 robust QTLs (LOD: 7.25-14.5 and PVE: 17.9-39.7%) associated with three important agronomic traits (SW, PN and SN) were identified and mapped on seven desi LGs of an ultra-high density genetic map, which were well validated in multiple geographical locations/environments across years in field. The reliability and validity of the identified PN, SN and SW QTLs were determined by correlating/ comparing their underlying genomic regions with that of earlier QTL mapping studies 13,52,59,64,69,74,77,[97][98][99][100][101] based on correspondence of physical positions of markers covering the target QTL intervals. This analysis identified one of each PN (qPN8.1) and SW (qSW4.1) QTLs in line with previous QTL mapping studies using diverse intraand inter-specific mapping populations of chickpea 13,59,64,69,97,98,101 . This analysis indicated novelty and population-specific characteristics of our identified 33 robust QTLs for SW, PN and SN traits in chickpea. Several QTL mapping studies in crop plants, including legumes have effectively utilized ultra-high density genetic linkage maps derived from different intra-and inter-specific mapping populations as foundation for rapid molecular mapping as well as fine mapping/map-based cloning of major genes harbouring robust QTLs governing multiple yield contributing traits (like pod and seed number and seed weight) to accelerate genomics-assisted crop improvement 20,21,48,[102][103][104][105] . The use of ultra-high density genetic linkage map as a reference in the present study, significantly narrowed down the marker intervals in the QTL regions to 0.015-1.978 cM (mean , 1 cM), which is lower than that reported (1-3 cM) so far in diverse QTL mapping studies of chickpea 3,[12][13][14]20,22,51,[72][73][74][75][76] . It further authenticates the use of SNP markers that are tightly linked to the QTLs for subsequent validation in diverse genetic backgrounds along with fine mapping/map-based cloning of QTLs controlling three agronomic traits in chickpea. Moreover, SNPs localized in the QTL region mapped on eight LGs of an ultra-high density intraspecific genetic linkage map derived from the diverse coding and non-coding sequence components of genes could certainly expedite the process of validation and/positional cloning of QTLs governing three agronomic traits in chickpea. The required inputs obtained from these analyses will eventually help us to identify potential candidate genes underlying the QTLs for marker-assisted genetic enhancement of chickpea. The structural and functional annotation of SNPs mapped on eight desi and kabuli LGs, including SW, PN and SN QTL intervals of ultra-high density intra-specific genetic maps will enable us to select functional gene-based SNPs (like nonsynonymous coding and regulatory SNPs) for establishing markertrait linkages and identifying genes/QTLs associated with important qualitative and quantitative traits through genetic and association mapping in chickpea.
One strong SW-associated major genomic region (LOD: 14.5, PVE: 38.7%) harbouring a robust QTL (qSW5.1) mapped on desi LG5 of an intra-specific genetic map was compared/correlated with that of integrated genetic and physical maps of kabuli to fine map the major QTL region. The extended LD decay (,15 cM) in desi and kabuli LGs of genetic maps suggested that high-resolution haplotype (LD)-based trait association mapping at target QTL interval could be an attractive alternative approach for narrowingdown the QTL regions to candidate gene(s) regulating three said agronomic traits in chickpea. Henceforth, we combined haplotype (LD)-based high-resolution trait association mapping at target SW-specific QTL region with differential gene expression profiling (during seed development in low and high seed weight parental genotypes), which delineated one regulatory SNP-containing embryo defective protein-coding gene governing seed weight in chickpea. Such haplotype-specific association analysis in presence of high-resolution significant LD at major QTL region have utility to scale-down the QTLs to specific candidate gene(s) regulating important agronomic traits in crop plants [106][107][108][109] .
The strong trait association potential of an embryo defective gene with SW has been ascertained by higher contribution of a significant superior haplotype (TAC) identified in the URR of this gene with 44.7% SW phenotypic variation in a constituted 244 association panel. This was further supported by the seed-specific expression and significant down-regulated expression of gene haplotype transcript during seed development in contrasting SW parental genotypes and RIL mapping individuals. The favourable natural allelic variants and optimal superior haplotype identified in an embryo defective gene will be helpful in dissection of gene regulatory networks underlying this complex quantitative SW trait in chickpea. This identified gene could eventually be used as a potential candidate in marker-assisted genetic enhancement of chickpea for increasing seed weight and higher yield. The embryo defective chickpea gene ortholog in Arabidopsis is known to control the normal embryo development, which implies the role of this vital gene in natural growth and seed development of crop plants 110 . Considering the availability of very limited information regarding embryo defective gene in plants, a further comprehensive molecular characterization and functional validation of this gene in over-expression/knock-down transgenics is required to ascertain its definite involvement in regulating seed development and seed weight of chickpea. Methods Generation of an intra-specific chickpea mapping population. An intra-specific F 7 RIL mapping population (ICC 12299 3 ICC 8261) comprising of 275 individuals derived from the crosses between two parental chickpea genotypes ICC 12299 and ICC 8261 was generated using a single seed descent method. The chickpea genotype ICC 12299 originated from Nepal is a low 100-seed weight (9 g) and high pod (103) and seed (158) number-containing desi (C. arietinum) landrace. In contrast, the genotype ICC 8261 (originated from Turkey) is a high 100-seed weight (30 g) and low pod (64) and seed (73) number-containing kabuli (C. arietinum) landrace.
Phenotyping of mapping population for agronomic traits. The RIL mapping individuals along with their parental genotypes were grown in the field according to 'alpha (a)-lattice' design for three consecutive years (2011-2013) with at least two replications during crop growing season at two diverse geographical locations of India. These RILs were phenotyped individually for three important quantitative agronomic traits, namely PN, SN and SW. The PN and SN was estimated by counting the average number of fully formed pods and seeds per plant (from 10-15 representative plants from each RILs at maturity), respectively. The SW (g) was measured by taking the average weight (g) of 100-matured seeds at 10% moisture content by selecting 10-15 representative plants from each RILs. The mean, standard deviation, coefficient of variation, broad-sense heritability, frequency distribution, correlation coefficient and analysis of variance (ANOVA) of three agronomic traits in RIL mapping population were analyzed using SPSS v17.0 and following the methods of Kujur et al. 74,99 and Saxena et al. 77 .
Genome-wide discovery and genotyping of SNPs using GBS assay. The genomic DNA isolated from 275 RILs, including two parental accessions (used as biological replicates) were digested with ApeKI, ligated to adapters containing one of 288 unique barcodes, constructed 288-plex GBS libraries and pooled together following the detailed procedures of Elshire et al. 40 and Spindel et al. 45 . All these libraries were sequenced by Illumina TrueSeq V3.0 single end sequencing chemistry with read lengths of 100-bp using HiSeq2000 (Illumina Inc., San Diego, CA, USA) NGS platform. The FASTQ raw sequence reads were processed for high-quality sequence filtering and mapping through TASSEL-GBS. The sequence reads with a minimum phred Q-score of 10 across the first 72-bp nucleotide sequences were considered as high-quality, which were further rechecked for their quality using FASTQC v0.10.1.
The retained good quality sequences were de-multiplexed using unique barcodes attached to each mapping individuals. The sequence reads of each individuals were aligned and mapped to reference draft desi (ICC 4958 37 ) and kabuli (CDC Frontier 36 ) chickpea genome sequences separately using Burrows-Wheeler alignment tool (BWA) with default parameters. The missing SNP allele data in individuals were imputed using the haplotype probability method of fastPHASE. A maximum likelihood statistical model and a catalog was used to identify and genotype valid and high-quality SNPs (minimum sequence read depth: 10 with SNP base quality $20) in 275 RIL mapping individuals along with two parental genotypes.
Large-scale validation of SNPs. To validate the genome-wide SNPs identified through GBS assay, the primers designed from the 200-bp sequences flanking eitherside of 384 selected SNPs were PCR amplified using the genomic DNA of two parental chickpea genotypes and selected RIL mapping individuals. The amplified PCR products was resequenced, aligned the high-quality sequences and the SNPs were detected among individuals (following the methods 74,111 ). A selected 96 SNPs identified by reference genome-based GBS assay were validated and genotyped in parental genotypes and representative mapping individuals using MALDI-TOF (matrix-assisted laser desorption ionization-time of flight) mass array SNP genotyping assay (http://www.sequenom.com) following the methods of Pandit et al. 112 and Saxena et al. 77,111 .
Construction of intra-specific genetic linkage maps. The genotyping data of reference desi and kabuli genome-based high-quality GBS-SNPs (MAF $ 0. 2) showing differentiation between parental genotypes (ICC 12299 and ICC 8261) were analysed in 275 RIL mapping individuals using the x 2 -test (p , 0.05). The SNP genotyping data showing goodness-of-fit to the expected Mendelian 151 segregation ratio were used further for linkage analysis using MAPMAKER/EXP3.0 and JoinMap 4.1 at higher LOD (logarithm of odds) threshold (.10.0) with Kosambi mapping function. The SNPs were integrated into eight desi and kabuli LGs of intra-specific genetic maps based on their centimorgan (cM) genetic distance using the methods of Kujur et al. 74,99 and Saxena et al. 77 . The marker ordering was performed using the RECORD algorithm implemented in RECORD_WIN. The best marker ordered genetic linkage map with shortest map distance (cM) was preferred. The LGs with genetically mapped SNPs were designated and numbered (LG1 to LG8) based on their corresponding physical positions (bp) on the chromosomes of chickpea genomes as determined in our study.
Determination of LD patterns. To determine genome-wide LD patterns in chickpea, the LD estimates (significant P-value # 10 23 ) as average squared-allele frequency correlations (r 2 ) among a pairs of SNP loci that are genetically mapped on eight desi and kabuli LGs of intra-specific genetic maps were analysed using the sliding window approach of TASSEL v5.0. The decay of LD with the genetic distance was measured by combining the r 2 values of SNP-pairs mapped in a uniform genetic intervals of 5 cM (maximum up to 120 cM) across eight desi and kabuli LGs of genetic maps. The graph was plotted between pooled r 2 and genetic distance (cM) based on nonlinear regression model considering the r 2 value 5 1 at marker genetic distance of 0 cM 74,99,101,113 to determine the trend of LD decay in desi and kabuli genomes.
Structural and functional annotation of genome-wide SNPs. The reference desi and kabuli genome-based SNPs mapped on the eight LGs of intra-specific genetic maps were annotated in diverse intragenic and intergenic sequence components of genomes (chromosomes/pseudomolecules) using the genome annotation information (GFF) of desi 37 and kabuli 36 . The structural and functional annotation of SNPs, including synonymous and non-synonymous coding and regulatory SNPs were performed using the customized perl scripts and single-nucleotide polymorphism effect predictor (SnpEff v3.1h). The SNPs were plotted individually based on their unique physical positions (bp) on eight chromosomes (pseudomolecules) of desi and kabuli genomes using Circos v0.67-pre3. The functional annotation of SNP-containing genes was performed according to desi and kabuli genome annotation and PFAM database v27.0. The genes with SNPs were BLAST searched against the KOGnitor NCBI database. Gene Ontology (GO) enrichment analysis of genes with SNPs was performed using the BiNGO plugin of Cytoscape v2.6 based on Benjamini and Hochberg false discovery rate correction at 5% significance level.
QTL mapping. For identification and mapping of QTLs, the reference desi genomebased high-quality GBS SNP genotyping data and genetic linkage map information of SNPs mapped on eight desi LGs of intra-specific genetic map were correlated with multi-location/years replicated field phenotyping data (PN, SN and SW) of RIL mapping individuals and parental genotypes using single marker analysis, interval mapping and composite interval mapping functions of QTL Cartographer v2.5 and MapQTL v6.0. To identify and map the novel genomic regions harbouring the major QTLs associated with three agronomic traits (PN, SN and SW) on LGs, the LOD threshold score of $5.0 at 1000 permutations was considered significant (p , 0.05). The significant major trait-influencing QTLs, which were validated across multiple environments (locations)/years were considered as ''robust QTLs'' controlling agronomic traits (as per Ref. 77). The positional genetic effects, phenotypic variation explained (PVE %) by QTLs and their additive effect (evaluated by parental origin of favourable alleles) on traits were evaluated at significant LOD (p # 0.05).
QTL-region targeted resequencing and association analysis. A selected strong SW major genomic region underlying robust QTL was sequenced in parental genotypes (ICC 12299 and ICC 8261) along with 10 of each homozygous low and high SW RIL mapping individuals using the multiplexed amplicons sequencing protocol (following manufacturer's instructions) of TruSeq Custom Amplicon v1.5 in Illumina MiSeq next-generation sequencer. The high-quality amplicon sequence reads (. 90% bases covered at 0.53 mean coverage) were mapped to reference desi and kabuli genomes and SNPs were detected among parental genotypes and mapping individuals adopting the methods of Agarwal et al. 80 , Jain et al. 82 and Saxena et al. 77 . The sequenced genomic region underlying SW-specific QTL was structurally and functionally annotated (following aforementioned methods). The SNPs showing differentiation between low and high SW parental genotypes and mapping individuals at QTL region of interest were genotyped in 244 chickpea accessions (211 mini-core) belonging to an SW-specific association panel (Table S7) using the MALDI-TOF SNP genotyping assay (following Refs. 77,111). The SNP genotyping data among accessions were used to constitute haplotypes at target QTL interval. The replicated multi-location/years SW field phenotyping data, population structure (K 5 2) statistics and kinship matrix of 244 accessions were obtained from Kujur et al. 99 . The association analysis was performed using general linear model (GLM) and mixed linear model (MLM at optimum level of compression with P3D method) of TASSEL v5.0 and mixed model approach of EMMA adopting the detailed procedures of Kujur et al. 74,99 , Saxena et al. 77 and Thudi et al. 101 . To eliminate the confounding effect of population structure and correct the false discovery rate (FDR) based on multiple comparisons, the Bonferroni correction of P-value was performed for each traitassociated SNPs at 5% significance level. By integrating the results of GLM and MLM with EMMA and FDR correction, the SNPs showing strong association (R 2 5 correlation potential of significant SNPs with traits) with SW at significant cut-off P # 10 24 were selected.
Differential gene expression profiling. The SNPs-carrying genes annotated at the short delineated major genomic region underlying the robust SW-specific QTL were selected to infer their differential regulation during seed development through expression profiling. The RNA isolated from three different vegetative and reproductive tissues (leaf, root and flower bud) and two seed developmental stages 74,99,100 of high (ICC 8261) and low (ICC 12299) SW parental genotypes was amplified using the gene-specific primers through quantitative RT-PCR assay (following Refs. 74, 100).
Gene haplotype-based LD mapping. To perform high-resolution haplotype-based association/LD mapping, the fragments covering the entire coding and non-coding sequences, including 2 kb upstream regulatory region (URR) of one strong SWassociated gene (validated by QTL mapping, association mapping and differential expression profiling) were amplified using the genomic DNA of 244 SW-specific association panel. The cloning and sequencing of amplicons, SNP mining, haplotype constitution and LD pattern determination of a gene including, its association potential with SW were estimated using the methods of Kujur et al. 74,99 , Saxena et al. 77 and Bajaj et al. 100 . To determine the potential of diverse haplotypes constituted in the www.nature.com/scientificreports gene for regulating SW, the differential expression profiling in two seed developmental stages of parental genotypes and homozygous RIL mapping individuals representing the low and high SW haplotype groups was performed using the gene haplotype-specific primers.