Multi-Locus Genome Wide Association Mapping for Yield and Its Contributing Traits in Hexaploid Wheat under Different Water Regimes

Multi-locus genome wide association study was undertaken using a set of 320 diverse spring wheat accessions, which were each genotyped for 9,626 SNPs. The association panel was grown in replicated trials in four environments [two each in irrigated (IR) and rainfed (RF) environments], and phenotypic data were recorded for five traits including days to heading, days to maturity, plant height, thousand grain weight and grain yield. Forty-six significant marker-trait associations (MTAs) were identified for five traits. These included 20 MTAs in IR and 19 MTAs in RF environments; seven additional MTAs were common to both the environments. Five of these MTAs were co-localized with previously known QTL/MTAs and the remaining MTAs were novel and add to the existing knowledge. Three desirable haplotypes for agronomic traits, one for improvement in RF environment and two for improvement in IR environment were identified. Eighteen (18) promising candidate genes (CGs) involved in seven different biological activities were also identified. The expression profiles of four (Trehalose-6-Phosphate, APETALA2/Ethylene-responsive factor, DNA-binding One Zinc Finger and Gibberellin-dioxygenases) of the 18 genes showed that they were induced by drought stress in the wheat seedlings. The MTAs, haplotypes and CG-based markers may be used in marker-assisted breeding for drought tolerance in wheat.


Results
Phenotypic variability in the wheat association mapping (WAM) panel. A wide range of variability for each of the five traits was available in the WAM panel in all the four environments including both locations (Meerut and Powerkheda) each with two environmental conditions (IR and RF); the data is summarised in Supplementary Table S1. Coefficient of variation (CV) ranged from 5.6% [for days to maturity (DTM) in E3] to 39.7% [for grain yield per plot (GYPP) in E2]. Skewness and kurtosis for each of the five traits under all the four environmental conditions (E1-E4) were within the acceptable range of normal distribution (±2). The violin plots showing distribution of all the five traits in different environments are presented in Fig. 1. Continuous variation was observed for all the five traits at both the locations. Taken together, the extent of available variability for the different traits suggested suitability of WAM panel for conducting GWAS.
Pearson's correlation coefficient analysis revealed that all the five traits were significantly correlated among themselves in IR as well as RF environments (Supplementary Table S2). Grain yield (GY) and thousand grain weight (TGW) were negatively correlated with days to heading (DTH), DTM and plant height (PH); however, positive correlation was observed between GY and TGW. Similarly, DTM, DTH and PH were positively correlated with each other.

Distribution of SNPs across genome.
A set of 9627 mapped SNPs were utilized during the present study; these were distributed on all the 21 wheat chromosomes, spanning 5943.1 cM (Supplementary  Table S3). Maximum number of 4501 SNPs (total length 1953.3 cM) were available in sub-genome B, followed by sub-genome A (3969 SNPs; 2084.2 cM), and sub-genome D (1157 SNPs; 1157 cM). For individual chromosome, number of SNPs ranged from 77 (4D) to 874 (2B). Heat map showing SNP density in all the 21 chromosomes is shown in Fig. 2. The maximum SNP density was observed on sub-genome B (25 SNPs/10 cM) followed by sub-genome A (20 SNPs/10 cM) and sub-genome D (7 SNPs/10 cM). The average SNP density per 10 cM on individual chromosomes ranged from 4 (on 4D) to 42 (on 2B). Whole genome average SNP density was 17 SNPs/10 cM (Supplementary Table S3).
Genetic diversity, population structure and LD. Genetic diversity based on SNP genotypic data suggested that WAM panel is diverse (whole genome average gene diversity = 0.283). Maximum genetic diversity was observed for sub-genome B (0.33) followed by sub-genome A (0.30), and sub-genome D (0.22) (Supplementary Table S4). SNPs showed moderate to high polymorphism with average polymorphic information content (PIC) of 0.24. For individual sub-genomes, PIC ranged from 0.22 (sub-genome D) to 0.33 (sub-genome B). The number of sub-populations was three with G1 containing 57 genotypes, G2 containing 85 genotypes and G3 containing 15 genotypes; the remaining 65% (163 genotypes) were distributed in more than one sub-population and were therefore treated as admixture ( Supplementary Fig. S1). LD analysis was conducted as a part of another study using the same set of data; LD decay distance ranged from 2 cM to 20 cM in different genomic regions with a genome-wide LD decay of 3 cM 27 .
Marker-trait associations (MTAs). Using Bonferroni correction (−log p value ≥ 6.0), 46 significant MTAs were identified for five traits (DTH, DTM, PH, TGW and GY) in four different environments; 39 (85%) MTAs were detected in individual environment only; the remaining 7 (15%) occurred in two or more environments. MTAs were distributed on 18 wheat chromosomes excluding chromosomes 1D, 3D, and 7D (Table 1; Supplementary Fig. S2). Q-Q plots showing appropriate model fitting for GWAS tests are shown in Supplementary Fig. S2. The details of the MTAs are given in Table 1 and a summary of the results is presented here. Number of MTAs for individual traits were 5 (PH), 6 (GYPP), 10 (TGW), 11 (DTM) and 14 (DTH). The Identification of superior haplotypes. In wheat, the genome-wide LD decay distance has been estimated to be 1.5-15 cM 26,27,[34][35][36][37] . Therefore, in the present study also, we used an average decay of 10 cM and SNPs occurring within 10 cM were treated as haplotype blocks. Five genomic regions (GR1 to GR5), each with varying   14 (GR5). Univariate general linear model (GLM) suggested that there were significant differences for associated phenotypes among haplotypes (Fig. 3). Out of five GRs, GR3 including four haplotypes (H1, H2, H3 and H4) was solely identified in RF environment and was associated with TGW and DTH. The GR2 (four haplotypes associated with PH and DTM) and GR5 (14 haplotypes associated with TGW, PH, DTM and GY) were uniquely identified in the IR environment. GR1 and GR4 were associated with DTH and DTM; and these were identified in both RF and IR environments. In each of the five GRs, most desirable haplotypes were also identified (highlighted in Fig. 3). For example, in each of GR1, GR2 and GR4, H3s were most suitable since each of these were associated with early flowering, maturity and reduced PH; H3 in GR4 was also associated with higher TGW. Similarly, in case of GR5, H8 was most desirable haplotype as it is associated with reduced PH, early maturity, higher TGW as well as GY.    www.nature.com/scientificreports www.nature.com/scientificreports/ Joint effect of significant SNPs on associated phenotypes. For each of the five traits, more than one SNPs were found to be associated under one or more environmental conditions (except PH_E1, PH_E2, TGW_E3, GY_E2 and GY_E4). In order to determine the effect of number of desirable alleles of associated SNPs on phenotype, joint effect was estimated using linear regression. In case of DTH, DTM and PH, SNP alleles that lower the trait value were considered as desirable; however, in case of TGW and GY, SNP alleles that increases the trait value were considered as desirable. Interestingly, in all the cases, significant joint effects were observed except in case of PH_E4. For example, three SNPs (SNP_404, SNP_647 and SNP_5304) were associated with DTH under E1; significant difference were observed for DTH among genotypes with one, two, three and without any desirable alleles (Fig. 4). Similarly, for other four traits (DTM, PH, TGW, GYPP) also, significant joint effects were observed (Fig. 4).
On the basis of breeding value (estimated using effect of significant SNPs), contrasting genotypes were selected for each of the five traits under different environmental conditions (Supplementary Table S5). For three traits, including DTH, DTM and PH, genotypes with negative breeding value were considered as desirable, however, in case of TGW and GYPP, genotypes with positive breeding value were considered as desirable.

Exploration of candidate gene (CG).
Significant MTAs identified during the present study were also used for identification of underlying CGs using Ensembl Plant database; for every MTA, a window of 1 Mb was used for identification of CGs. A search for CGs resulted in identification of 121 (ranges from 1 to 11 CGs/MTA) wheat CGs (Supplementary Table S6). This number of CGs was reduced to 68 genes using annotations based on gene ontology (GO), based on IWGSC RefSeq v1.0; the remaining 53 were placed in un-characterised category (Supplementary Table S6). Out of 68 characterized genes, 18 genes representing 16 MTAs were found to have putative role in drought stress on the basis of literature search ( Table 2). These 18 CGs were involved in different biological activities like-transcription factors (AP2/ERF, bHLH, Dof, GRF, SBP, Zinc finger C2H2, WRKY), oxidoreductase activity, ubiquitination, trehalose-6 phosphatase activity, protein serine/threonine kinase activity, histone-lysine N-methyltransferase activity, amino-acid transmembrane activity (Table 2).
In-silico gene expression analysis was also conducted for the above mentioned 18 CGs using RNA-Seq expression data from Wheat Expression Browser (http://www.wheat-expression.com/). This also provided further evidence of their potential involvement in the trait phenotype in different wheat developmental stages and tissues, and under drought stress condition (Supplementary Figs. S3 and S4). The results indicated variable expression of 10 of the 18 genes in different developmental stages and tissues. Four genes (TraesCS3B02G123600, TraesCS4A02G389900, TraesCS5B02G193100, TraesCS5B02G294400) had relatively higher expression (up to 3.6 Transcripts Per Million; TPM) in all tissues and at all developmental stages. Some CGs expressed uniquely   www.nature.com/scientificreports www.nature.com/scientificreports/ in a specific tissue or developmental stages ( Figure Supplementary Fig. S3). For example, TraesCS3A02G116200 expressed in stem tissues at all developmental stages; TraesCS3A02G323500 expressed in root tissues at all developmental stages and TraesCS5B02G414400 expressed in the spikes at reproductive stage ( Supplementary Fig. S3). Under drought stress condition, only five of the 18 CGs showed higher expression ( Supplementary Fig. S4). Interestingly, out these five CGs, four CGs (TraesCS3B02G123600, TraesCS4A02G389900, TraesCS5B02G193100, TraesCS5B02G193200, TraesCS5D2G294400) belong to MTAs that were identified only in RF environments ( Table 2; Supplementary Fig. S4). To further ascertain the biological functions under DS (1 h and 6 h) of the above CGs, we examined their expression profiles using quantitative real-time PCR (qRT-PCR) analysis. Out of five gene examined, following four CGs showed higher expression under DS: (i) TraesCS5B02G193100 (Trehalose-6-Phosphate;T-6-P), (ii) TraesCS5B02G193200 (APETALA2/Ethylene-responsive factor; AP2/ ERF TF), (iii) TraesCS5A02G401800 (DNA-binding One Zinc Finger; Dof TF) and (iv) TraesCS2A02G547600 (Gibberellin-dioxygenases; GAox)]; and remaining solitary CG [TraesCS1B02G055800 (SET)] did not show any significant change in its expression under DS as compared to control (Fig. 5). We also evaluated the correlation between RNA-Seq data (using log2 TPM values) and RT-qPCR data (using normalized Ct values) for the above mentioned five genes. As expected, significant negative correlation was observed between the two values (R 2 = 0.795, P < 0.001).
Identification of putatively important rare variants for GY. During GWAS rare variants (Minor allele frequency; MAF < 0.05) are mostly ignored due to statistical reasons. However, in order to examine the importance of rare variants in determining phenotypic variation in the present study, we selected 99 SNPs with MAF ranging from 0.03 to 0.04, and tested each individual SNP for association with GYPP in all the four environments using "t test". Total seven SNPs were found associated with GYPP in E1 and E2 environments. These included two SNPs each uniquely identified in each of the E1 and E2 and three SNPs that were found in both the E1 and E2 environments (Supplementary Table S7). In the remaining two environments (E3 and E4), no significant association was observed. Under E1, out of five significant SNPs, major alleles for four SNPs were desirable (i.e. having higher mean value for GYPP) and the minor allele of the remaining one SNP was associated with higher mean GYPP. However, under E2, the major alleles for each of the five significant SNPs were associated with higher GYPP.

Discussion
As mentioned earlier, studies on genetics of drought tolerance have been conducted in several crops including wheat. As a result, it is also widely known that drought tolerance is a complex quantitative trait that is influenced by genetic background and environmental conditions. In the present study, we evaluated a WAM panel [assembled at the International Maize and Wheat Improvement Center (CIMMYT) gene bank] under different water regimes in different locations of India in order to conduct GWAS. This diverse panel was never utilized earlier for a study of the genetics of drought tolerance using GWAS, although it was utilized for conducting GWAS for other traits like yield related trait 26 and Fe, Zn, β-carotene, GPC content 27 . The study allowed us to identify important genomic regions carrying some important genes associated with drought tolerance. Five yield and related traits were used for recording data on drought tolerance. High variability (as revealed by descriptive statistics) for each of the five traits under IR/RF environments suggested that the panel was suitable for a study of the genetics of quantitative traits like yield. Moreover, genetic diversity and PIC (based on marker data) also suggested that WAM panel is highly diverse. Significant and strong correlation of four traits (PH, DTH, DTM, and TGW) with GY under IR and RF environments (Supplementary Table S2) suggested that these traits may be used as surrogate for yield under different water regimes. The D sub-genome carried only one-third or one-fourth MTAs (1157) identified on A and B sub-genomes (3969 and 4501). This is attributed to relatively low level of diversity observed  www.nature.com/scientificreports www.nature.com/scientificreports/ in the D sub-genome that is generally attributed to late hybridization of Aegilops tauschii during evolution of common wheat 35,38,39 .
During present study, we utilized multi-locus association model (consider background loci during association testing) for GWAS to overcome the limitations arising due to single locus GWAS 40,41 . Being a multi-locus model, confounding arising due to population structure may also be corrected by including kinship (K-model) and principal components (Q-model) in association test model. During present study, appropriateness of multi-locus association model was confirmed by solid lines in quantile-quantile (Q-Q) plots ( Supplementary Fig. S2). Q-Q plots also suggested that power of test statistics were high. Moreover, to reduce false positives due to multiple testing Bonferroni correction criteria was used and 46 high confidence MTAs (ranging from 5MTAs for PH to 14 MTAs for DH) were identified for five traits under the two water regimes (Table 1). Interestingly, for almost all the traits, significant joint effect of the desirable alleles of all the SNPs involved in different MTAs was observed suggesting that identified MTAs are important and corresponding traits may be improved significantly through pyramiding of significant SNP alleles (Fig. 4). The identified MTAs also add up to the existing knowledge and may be useful for downstream research and also for wheat breeding. MTAs identified uniquely under IR (20) or RF (19) environments may be important to understand the genetics of water stress signalling, however, MTAs that were common in both the IR and RF environments may be considered as independent of water level and may be found useful for breeding under different water regimes.
The 46 MTAs identified in this study were also compared to MTAs/QTL identified in earlier important studies using chromosome position, where the mapping population/germplasm/association panel were phenotyped under different water regimes (Table 3). Five MTAs were co-localized with QTL/MTAs identified in earlier studies using linkage mapping and GWAS, and the remaining 41 MTAs are novel. Therefore, it seems that the WAM panel used during the present study is quite diverse from the genetic material used in earlier studies. Among the co-localized MTAs/QTL, an MTA associated with GYPP and located on chromosome 7A was co-localized with MTAs/QTL for GY and TGW identified in four earlier studies 28,[42][43][44] . Out of above co-localized QTL/ MTAs, a QTL Qyld.csdh.7AL was associated with GY under RF environments 42 ; the other three co-localized MTAs were earlier identified using GWAS. For instance, a genomic region between 148.43-161.31 cM (wsnp_ CAP7_c1321_664478-IACX7848) was associated with GY under DS 28 . The other two co-localized genomic region between 150.54-178.42 cM (wsnp_Ex_c11047_17915103-wsnp_Ku_c8437_14341371) and at 156.23 cM (Excalibur_c14451_1313) were associated with TGW 43,44 . Another MTA associated with TGW and located on chromosome 6A at 88.9 cM was co-localized with a QTL for TGW (QTkw.aww.6A) associated with wmc0256A mapped at 90 cM 45 . Similarly, the other important MTA associated with TGW identified in RF environment during the present study and located on chromosome 2A at 150.26 cM was co-localized with an MTA for GY under DS located at 149 cM 28 . The above three genomic regions (on 2A, 6A and 7A) associated with GY under drought stress identified during the present study could be subjected to fine mapping and CG identification, so that diagnostic molecular markers can be developed for deployment in breeding programs, particularly those targeting drought prone regions.
It is well known that wheat genome is allopolyploid with three sub-genomes (A, B and D); and thus, most of wheat genes (including CGs identified in present study) have three homoeologous copies one on each sub genomes (i.e. A, B, D sub-genome). Polyploidization may provide sub-functionalization or neo-functionalization against stress to plant 46 . Gene involved in drought tolerance also found to have homologous copies, and majority of them showed expression partitioning under stress condition to provide better adaptability and wide distribution of wheat 46 .  www.nature.com/scientificreports www.nature.com/scientificreports/ Further, in comparison to the single SNPs, haplotypes involving multiple SNPs may be more useful in plant breeding 47 . Haplotype analysis using multiple significant SNPs was recently reported in cotton 48,49 and foxtail millet 50 and desirable haplotypes impacting multiple traits were identified. Such haplotypes may prove useful in improvement of multiple associated traits simultaneously. During the present study, the haplotypes H3 (GR3) was found under rainfed environment only and this haplotype was associated with higher TGW and early maturity. Grain weight is a highly heritable trait and has high phenotypic stability 51 and early maturity under rainfed environment allows the crop to avoid the ever-depleting soil moisture during the crop growth. Therefore, haplotype H3 may be exploited in MAS for breeding for water stress tolerant wheat genotypes. Under IR environments, the haplotypes H3 (GR4) and H8 (GR5) showed association with early maturity, reduced plant height, higher TGW and GYPP. These two haplotypes may be exploited for breeding high yielding early maturing wheat varieties for IR environments.
An effort was also made to identify CGs underlying MTAs. For this purpose, expression analysis of identified CGs was examined. The results suggested that the following four genes were strong candidates for the MTAs identified [TraesCS5B02G193100 (T-6-P), TraesCS5B02G193200 (AP2/ERF TF), TraesCS5A02G401800 (Dof TF), TraesCS2A02G547600 (GAox)]: Following are the reasons for the high level of confidence: (i) each of these genes exhibited a significantly higher expression under DS; (ii) MTAs linked with these CGs were identified exclusively in RF environments ( Table 2, Fig. 5). Among these four genes, the gene TraesCS5B02G193100 encodes trehalose 6-phosphatase (T6P), which regulates carbon assimilation and sugar status in plants. In addition, T6P has also been demonstrated to play an essential role in plant development under drought stress 52,53 . A wheat TPP (T6P) was also found to be associated with TGW in bread wheat 54 . The second gene TraesCS5B02G193200 encodes APETALA2/Ethylene-responsive factor (AP2/ERF), which is a transcription factor. Genes belonging to this family of TFs are mainly plant-specific TFs and are known to be involved in regulation of tolerance to several abiotic stresses including DS 55,56 . Another important CG TraesCS5A02G401800 encodes DNA-binding One Zinc Finger (Dof) TF; this family of TFs contains a zinc finger domain, and plays an important role in imparting tolerance against DS in higher plants 57 . DoF TF was initially reported in maize (ZmDOF1), where it plays a major role in light-regulated gene expression 57 . In wheat, Dof1 has been shown to be involved in carbon metabolism by increasing the regulation of the C4 pathway 58 . Another wheat Dof gene WPBF has been reported to be involved in plant growth and development 59 . Recently in potato (Solanum tuberosum L.), a Dof gene (PGSC0003DMG400019528) was reported to be upregulated in response to 2 h DS 60 . The fourth gene, TraesCS2A02G547600 encodes Gibberellin-dioxygenases (GAox) that are involved in the biosynthesis of bioactive gibberellins (GAs). Abundance of GA regulates responses to environmental stress including DS 61 . GAox genes are also indirectly involved in DS regulation via GA biosynthesis 61 . Also, the role of these genes in drought tolerance in wheat is supported by their up-regulation in wheat seedling under DS (Fig. 5). Gene-based functional markers for these genes may be developed and used for molecular breeding to increase wheat yield under drought stress.
During present study, we also analysed seven important SNPs (out of 99 SNPs with MAF) that were associated with GY; these were eliminated during GWAS due to MAF. The minor allele of one (SNP_6794) of the seven SNPs was associated with significantly higher GYPP under IR environment. Similar to the present study, earlier also, some rare alleles were also found to be desirable for important agronomic traits in wheat 31 . In rice also, rare allele of important grain size gene GS2 was found to increase the grain size and yield 62 . Thus, we strongly feel that rare variants should not be ignored as a whole during GWAS since some of the rare alleles may be responsible for important traits.
In summary, MTAs identified during present study may be further validated using joint linkage and association mapping (JLAM) or post-GWAS 63 . Further, desirable alleles of associated SNPs, desirable haplotypes and CG-based markers together may prove useful in the breeding for improved wheat varieties; the CGs may also be validated using functional genomics approaches and CG-based association mapping. Significant joint effect of associated SNPs suggested that corresponding traits can be improved substantially through pyramiding of significant SNP alleles. Contrasting genotypes identified during the present study may serve as a promising material to develop mapping populations for further genetic dissection of the trait; and favourable genotypes may serve as donor in a breeding program.

Material and Methods
WAM panel and field experiment. The wheat association mapping (WAM) panel for GWAS consisting of 320 diverse spring wheat genotypes was assembled at the International Maize and Wheat Improvement Center (CIMMYT) (details of 320 genotypes are available in Supplementary Table S8). This WAM panel was evaluated for two consecutive years (2011-12 and 2012-13) at two locations in India (Meerut, Uttar Pradesh and Powerkheda, Madhya Pradesh, separated by >900 km) in 18 × 18 simple lattice design (two replications) independently in IR and rainfed environments, making a total of four environments. In a particular environment, each genotype in a replication was raised in plots of 3 rows of 1.5 m each with a row to row distance of 0.25 m. In IR environments, five irrigations were given, four before flowering and one during the grain filling duration. In the rainfed environment, single irrigation was given at 21 days after sowing to allow crop establishment and to avoid complete crop failure. The details of the experimental sites, planting dates of experiments, meteorological data and rainfall during the crop seasons are summarized in Table 4.
Phenotyping. Data on each of 320 genotypes were recorded for the following five traits: (i) days to heading (DTH), recorded as number of days from the date of planting until the emergence of spikes in ~75% of plants in a plot; (ii) days to maturity (DTM), recorded as the number of days from the date of planting until physiological maturity achieved in ~75% of the spikes in a plot, (iii) plant height (PH), recorded at maturity as the mean height (from ground to tip of the main spike) of randomly selected five plants, (iv) thousand grain weight (TGW), recorded as average weight (g) of five random samples of 1000 grains each; (v) grain yield per plot (GYPP), recorded as grain weight of plot.
Genotyping by sequencing and SNP markers. Genomic DNA was extracted from fresh leaves collected from five plants per line using a modified CTAB method 64 . Genotyping was undertaken using GBS developed by DArT Pty. Ltd., Yarralumla, Australia. The detailed methodology is described elsewhere 26 . Genetic diversity, population structure and LD. Using mapped SNPs (MAF > 0.05%; missing data <30%), diversity analysis was conducted, and two diversity parameters, namely gene diversity 65 and PIC 66 were calculated at chromosome, sub-genome and whole genome levels. Model-based cluster analysis was performed using 42 unlinked SNPs, one from each of the 42 chromosome arms, to infer population structure in the dataset. The software STRUCTURE version 2.2 67 was used for this purpose. The number of presumed sub-populations (K) was set from 2 to 20, and the process was repeated three times. For each run, burn-in and MCMC iterations were set to 50,000 and 100,000, respectively, and a model "without admixture and correlated allele frequencies" was used. The number of sub-populations was determined following delta K (ΔK) method 68 . Whole genome as well as chromosome wise LD was conducted as a part of another study using TASSEL 5, and decay distance was calculated using R software 27 . MTAs and haplotype analysis. During the present study we used a multi locus GWAS model named Fixed and random model Circulating Probability Unification (FarmCPU) 40 . This method is believed to be highly efficient in computation and also eliminates confounding issues arising due to population structure, kinship, multiple testing correction, etc. This method utilizes both Fixed Effect Model (FEM) and a Random Effect Model (REM), iteratively. REM tests estimate pseudo-quantitative trait nucleotides (QTNs), FEM tests marker using pseudo-QTNs as covariates. For GWAS, SNPs having <30% missing data and >5% minor allele frequency were utilized. Principle component analysis (PCA) was conducted using TASSEL 5.0, and first three components were incorporated as covariate in association test model. Bonferroni-corrected P-value threshold was set as 0.01 (−log p value = 6.0) and SNP with −log (p) > 6.0 declared as significant MTA. Q-Q plots generated through FarmCPU were used to examine model fitting (account for population structure). Haplotypes were determined on the basis of interval distance between associated SNPs. Significant SNPs present within LD range (10 cM) were considered as one haplotype. Univariate GLM was used to determine significance difference for trait among haplotypes. Joint effect of SNPs, breeding values and contrasting phenotype. Joint effects were estimated, when more than two SNPs were associated with the same trait. To determine joint effect, linear regression was performed using number of desirable SNP alleles for traits (independent variable) and corresponding trait values of the genotypes that contained more than one desirable SNP alleles (dependent variable). The breeding value of each accession was calculated from the allelic effects of all significant SNPs. The allelic effect was calculated by taking the difference in trait value between genotypes with contrasting alleles. The breeding value for each genotype was estimated, by using absolute value of the allelic effect of each significant SNP taken as a negative value if a genotype, had SNP allele that decreases the trait value and vice-versa. All positive and negative allelic values were summed to estimate the breeding value of each genotype. For each trait, contrasting phenotypes were selected with minimum and maximum breeding value.

Identification of putative CGs and their expression analysis.
CGs for individual MTAs were identified by aligning the associated GBS sequences to wheat genome assembly IWGSC1.0 69 available on the Ensemble database (http://www.ensembl.org/info/docs/tools/vep/index.html). High-confidence annotated genes were retrieved from a 1000 kb window for each identified MTA. The GO annotation information of these CGs were extracted from the IWGSC website (http://www.wheatgenome.org/). For annotated CGs, gene expression analysis was conducted utilizing wheat expression database 70 hosted at http://wheatexpression.com. Transcripts per kilobase millions (TPM) values for every CG were downloaded. Log transformed (Log2X) value was used to generate a heatmap using online tool ClustVis 71 .
Plant materials, treatments, and quantitative real-time PCR (qRT-PCR) analysis. Seedlings of wheat cv. 'C306' representing the well-known drought tolerant genotype were analysed for tolerance against DS. For applying water stress, 7 days old seedlings were transferred to modified Hoagland's solution (+20% PEG 8000). The leaf samples of control and treated seedlings were harvested at 0, 1 and 6 h after water stress treatment  Table 4. Details of the environments, dates of planting of experiments, coordinates, altitude, megaenvironment (ME), total rainfall, average maximum and average minimum (avg max/avg min) temperatures of the four environments used to evaluate WAM panel. IR, irrigated; RF, rainfed; *Source is CIMMYT wheat atlas, http://old.wheatatlas.org/; ME, mega-environments; Avg, Average; # Total rain-fall during the crop-season in mm.