Genome-wide association study in hexaploid wheat identifies novel genomic regions associated with resistance to root lesion nematode (Pratylenchus thornei)

Root lesion nematode (RLN; Pratylenchus thornei) causes extensive yield losses in wheat worldwide and thus pose serious threat to global food security. Reliance on fumigants (such as methyl bromide) and nematicides for crop protection has been discouraged due to environmental concerns. Hence, alternative environment friendly control measures like finding and deployment of resistance genes against Pratylenchus thornei are of significant importance. In the present study, genome-wide association study (GWAS) was performed using single-locus and multi-locus methods. In total, 143 wheat genotypes collected from pan-Indian wheat cultivation states were used for nematode screening. Genotypic data consisted of  > 7K SNPs with known genetic positions on the high-density consensus map was used for association analysis. Principal component analysis indicated the existence of sub-populations with no major structuring of populations due to the origin. Altogether, 25 significant marker trait associations were detected with − log10 (p value) > 4.0. Three large linkage disequilibrium blocks and the corresponding haplotypes were found to be associated with significant SNPs. In total, 37 candidate genes with nine genes having a putative role in disease resistance (F-box-like domain superfamily, Leucine-rich repeat, cysteine-containing subtype, Cytochrome P450 superfamily, Zinc finger C2H2-type, RING/FYVE/PHD-type, etc.) were identified. Genomic selection was conducted to investigate how well one could predict the phenotype of the nematode count without performing the screening experiments. Prediction value of r = 0.40 to 0.44 was observed when 56 to 70% of the population was used as a training set. This is the first report where GWAS has been conducted to find resistance against root lesion nematode (P. thornei) in Indian wheat germplasm.


Results
Phenotypic analysis. To identify the region of wheat genome contributing resistance to P. thornei, 143 wheat genotypes predominantly from India were screened for nematode resistance. Screening experiments were carried out over two years by growing wheat genotypes in five different batches with total 5-7 replications. A histogram depicting the distribution of nematodes among genotypes is shown in Supplementary Fig. S1. The restricted maximum likelihood (REML) analysis of the nematode count revealed significant effect of the genotypes which was ascertained by comparing the models wherein batch and replications were considered as random terms. Further, the batch effect was higher than the replication effects which are given in Supplementary Table S1. Thus, the best linear unbiased predictions (BLUPs) were calculated using batch and genotypes as a random term. We obtained a high variation for the nematode counts across the screened genotypes with minimum = 33, maximum = 13,503 and the coefficient of variation (CV) of 62.96%. The heritability value of the nematode counts was moderate (h 2 = 0.55) (Supplementary Table S2).

SNP markers statistics.
In the present study, a total of 16,406 SNP markers were extracted after initial filtering for the quality on the chip and the missing data. This number was further reduced to 7,491 SNP mark- www.nature.com/scientificreports/ ers, after filtering SNP markers with a missing rate higher than 10%, minor allele frequency (MAF) lower than 5% and heterozygosity higher than 10%. Marker coverage of the SNPs over the chromosomes is shown in Fig. 1. Marker density was found highest for the B genome followed by A and D genome ( Supplementary Fig. S2). The maximum number of markers were found on chromosome 5B while chromosome 4D spanned the minimum number of markers.
Principal component analysis. To estimate population structure, a PCA was conducted with 7,491 polymorphic SNP markers and 143 bread wheat genotypes (Fig. 2). PC1 explained 12.47% variation and PC2 explained 6.46% variation. Six genotypes clustered very close to each other although none of them shared any similarity in terms of the origin of the germplasm (these genotypes were originated from Maharashtra, Jharkhand, New-Delhi and Madhya Pradesh states of India). We also explored further whether the structure of the population is associated with the nematode resistance. Genotypes clustered towards the left side were associated with low nematode counts whereas the right side genotypes were susceptible with higher nematode count (r = 0.11, p value < 0.05). Further, a significant correlation was obtained for phenotypes with PC4axis (− 0.15, p value < 0.05) wherein genotypes clustered towards the lower axis were resistant compared to the upper axis (data not shown).  www.nature.com/scientificreports/ Linkage disequilibrium (LD). Faster LD-decay was observed for the A-genome (5-7 cM) compared to the B and D-genome (8-9 cM) of the panel. However, local spikes in the long-range LD regions were also observed that indicates the population structure of the panel as the LD-decay was computed without correcting for population structure. Despite a lower number of markers on the D-genome, LD patterns follow a similar trend of declining LD-values over the genetic distances. Genome-wide and chromosome-wide LD decay is given in Fig. 3 and Supplementary Fig. S3, respectively.
Genome-wide association analysis. Four GWAS methods, CMLM, BLINK, FarmCPU and MLMM as implemented in GAPIT were used to identify marker-trait association (MTA). These four methods identified 25 significant MTAs with − log10 (p value) > 4.0 as shown in circular Manhattan plots (Table 1; Fig. 4a). All the MTAs identified using BLINK and MLMM methods had FDR-corrected p value < 0.05. But in FarmCPU only six MTAs and four in CMLM method had FDR-corrected p value < 0.05. Q-Q plots reflected that the distribution of observed association (p values) were close to the distribution of the expected associations (Fig. 4b). This means the methods implemented for GWAS were sufficiently stringent to control spurious associations. CMLM analysis identified eight MTAs, with significant SNPs explaining approximately 23% to 26% of the total phenotypic variation. These MTAs were detected on chromosomes 1B, 1D, 3B and 6B. On chromosome 1B, five significant MTAs were identified and among these four were located between 24 and 25 cM (Table 1). Six MTAs were identified on chromosomes 1B, 1D, 2B and 3B using the BLINK method. Two MTAs were present on each of 1B and 1D whereas a single MTA was identified on 2B and 3B. Three of the MTAs located on chromosomes 1B (24 cM), 1D (221 cM) and 3B (8 cM) were common with those identified by CMLM (Table 1). FarmCPU analysis identified a total of seven MTAs on chromosomes 1B, 1D, 2B, 3B, 5A and 7A. Two MTAs identified on chromosome 3B and single MTA on each chromosome 1B, 1D, 2B, 5A and 7A. Two MTAs identified on chromosome 1D (221 cM) and 1B (24 cM) were also identified by both CMLM and BLINK methods (Table 1). MLMM analysis identified four MTAs on chromosomes 1B, 1D and 2B. Two MTAs detected on chromosome 1B and single MTA on each chromosome 1D and 2B. The MTA on chromosome 1D (221 cM) was identified in all the four methods as discussed above. MTA on 2B (27 cM) was detected in BLINK and FarmCPU methods whereas as MTA on 1B (119 cM) was detected only by BLINK method.
Association analysis performed using all these four methods CMLM, BLINK, FarmCPU and MLMM seems to pick similar regions across the genome as discussed above. One of the MTAs (AX-94470359) on chromosome 1D located at 221 cM was detected in all four methods. MTA (AX-94692978) located on chromosome 1B at 24 cM was detected through CMLM, BLINK and FarmCPU. Likewise, MTA (AX-94620141) on chromosome 2B at 27 cM was detected through BLINK, FarmCPU and MLMM, respectively. These three MTAs on chromosomes 1D, 1B and 2B identified consistently in either three or four methods were considered to be the most significant of the study and could be useful for further analysis. Allelic effects of highly significant SNPs obtained consistently in more than one method is shown in the Fig. 5. Alleles leading to the decrease in nematode count were considered to be favourable alleles whereas unfavourable alleles increased the nematode count.
Haplotype analysis. We also generated the haplotypic blocks across the genome by using the SNPs. Interestingly, three main LD blocks that spanned around the regions of significant markers were used to generate haplotypes. Three large LD blocks and the corresponding haplotypes (LD block one, two and three) which were found to be associated with significant SNPs are given in Fig. 6.
Genomic selection of the nematode counts. As nematode counting, used for resistance screening, is not straight forward and time consuming, we explored whether the genomic selection would predict the nematode count using genome-wide markers. For this analysis the population was divided into smaller subgroups starting from 20 to 80% of the population used as a training set and then genomic prediction was performed on the remaining set. We observed a prediction value of r = 0.40 to 0.44 when 56 to 70% of the population was used as a training set which is moderate for a complex trait like nematode count of the present study ( Supplementary  Fig. S4).
Putative candidate gene associated with significant MTAs. The SNP tags for the highly significant MTAs were subjected to a BLAST search against recently released wheat genome sequence IWGSC RefSeq v1.0 (http://www.wheat genom e.org) to find the candidate genes. We identified 37 putative candidate genes (Supplementary Table S3), out of which nine genes having a significant role in disease resistance (including F-boxlike domain superfamily, Leucine-rich repeat, cysteine-containing subtype, Cytochrome P450 superfamily, Zinc finger C2H2-type, RING/FYVE/PHD-type, etc.) ( Table 2). No hit was found for the highly significant MTA (AX-94470359) located on chromosome 1D. SNP tags for markers AX-95630606 and AX-94478352 associated with 3B and 6B identified six putative candidate genes on each of the chromosomes. Similarly, more than one candidate genes were also identified for other MTAs. The maximum number of candidate genes 11 and 10 were detected on chromosome 1B and 3B, respectively (Supplementary Table S3).

Discussion
The use of resistant cultivar is considered as the economically feasible and eco-friendly method to control P. thornei 31 . Up to 69% and 32% reduction in wheat yield is reported due to P. thornei and P. neglectus, respectively 29,43 . Dababat et al. 31 evaluated CIMMYT's spring wheat genotypes for P. thornei resistance and identified 56 resistant lines under controlled environmental conditions in a pre-screen. These lines were further evaluated under natural field conditions and only 14 remained resistant. In the present study, 143 wheat genotypes (predominantly from India) were screened against P. thornei that showed a varied response when tested under controlled environmental conditions. No reference source of resistance in wheat against P. thornei is available in India. Kranti and Kanwar 19 evaluated resistance and susceptibility in several wheat varieties (including Indian lines) for P. thornei resistance by the reproduction factor (Pf/Pi). The reproduction factor is commonly used in plant nematology as a quantitative value to assess resistance. In this study we found six resistant genotypes having reproduction factor (Pf/Pi) value equal or less than one, 24 moderately resistant genotypes (RF = 1-2), 25 moderately susceptible (RF = 2-3), 17 susceptible (RF = 3-4) and 71 highly susceptible (RF = more than 4) (Supplementary Table S4). In the current study, heritability (h 2 ) was found to be 0.55, which showed moderate heritable value. This value indicates that despite the complexity of the trait which mainly owes to environmental conditions significant genetic effects were still observed and measured in the present study. Although in previous studies comparatively higher heritability for this nematode was also observed ranging from 0.83 to 0.97 36,44,45 . Dababat et al. 42 detected very high heritability values, 0.95, 0.97 and 0.97 for Heterodera avenae, P. neglectus and P. thornei, respectively. Thus, in the future more experiments are needed to ascertain the phenotype and minimize the errors.
In the current study, our aim was to identify novel MTAs associated with P. thornei resistance by GWAS. With the rapid advancement in high throughput sequencing techniques, GWAS has been widely used to identify and dissect complex traits in wheat 46,47 . GWAS has been successfully used to study disease resistance in wheat [48][49][50][51] . Figure 6. Association of the three LD blocks displaying the haplotypes nematode count as a box plots (a) LD block one with three haplotypes due to 9 significant SNPs (b) LD block two with two haplotypes due to 2 significant SNPs and (c) LD block three with two haplotypes due to 3 significant SNPs. Smaller haplotypes are not shown here. www.nature.com/scientificreports/ GWAS was also performed in wheat for cereal cyst and root-lesion nematode resistance 31,42 . However, nothing from Indian wheat genotypes were screened and reported which links the phenotype to genotype. To avoid false association which is often present in the genotypes of local origin, effective corrections for population structure is needed [52][53][54][55][56] . Thus, we employed four methods of GWAS to locate the MTAs. In the present study, PCA detected two groups of the population indicating the existence of sub-populations. Interestingly we did not see major structuring of the populations due to the origin possibly suggesting despite a smaller group the genotypes were diverse. Contrary to traditional linkage analysis, association mapping offers higher mapping resolution. In GWAS, diverse germplasm is used that contains a large number of recombination events as well as LD across a lineage that could help to detect mutations associated with phenotypic traits of interest 57 . The magnitude of LD is influenced by many factors like recombination rate, allele frequency, mating system, genetic isolation, population structure, selection, marker type 52,58-61 , etc. Many studies suggest that LD is not constant across the whole genome or along a single chromosome. It can occur over large distances but also decreases very quickly for nearby loci 62 . In our study, LD decay so observed was similar to earlier reports 63,64 where LD decay in common wheat ranged between 0.22 to 10 cM using SNP markers 65 , 2 cM (in A and B genome) to 5 cM (in D genome) using SNP markers 61 . LD ranged from 8 cM (D genome) to 10 cM (A genome) and average LD for whole genome 5 cM was also reported 64 . The D-genome has a higher LD decay, which is believed to be mainly due to limited populations of Aegilops tauschii which contributed to the present wheat genome in the evolutionary history of wheat 66,67 .
For a complex trait like nematode resistance, using more than one GWAS analysis methods make results statistically more stringent and acceptable with higher confidence which could be further used for pre-breeding purposes. CMLM, FarmCPU, MLMM and BLINK were used to identify significant associations between genotypic and phenotypic data. In the present study, several common significant MTAs were detected through four association analysis methods. The mixed linear model has been widely used in controlling population structure and relatedness within GWAS and reduces the spurious associations [68][69][70][71][72] . However, mixed linear model-based methods are computationally challenging for large data sets. Hence, to overcome this problem, CMLM is implemented. CMLM method implemented in the GAPIT package decreases the effective sample size of such large and computationally challenging datasets by clustering individuals into groups 73,74 . CMLM improves statistical power by 5 to 15% and reduces computing time compared to regular MLM 73,75 . However, due to confounding between population structure, kinship and multiple testing corrections, CMLM based analysis can be insufficient, which could lead to a false association. Therefore, we also used FarmCPU analysis to mitigate this problem as the latter method has improved statistical power to detect causative genotype-phenotype associations and reduced false positives 76 . This method implements both the fixed-effect model (FEM) as covariates and a random effect model (REM) containing the kinship matrix. MLMM, which is a modified version of the mixed linear model, as it fits loci of large effect as covariates, detects more MTAs with smaller effects and has higher power and a lower FDR than single-locus approaches 77 . A recently developed BLINK method is also used in the present study to  www.nature.com/scientificreports/ decrease the confounding problems that may lead to the spurious association 78 . BLINK implements a multiple loci test method by combining a fixed-effect model (FEM) with Bayesian information criteria (BIC). This method uses linkage disequilibrium (LD) information to replace the bin method. Using four different GWAS methods, 25 MTAs were detected in association with P. thornei resistance. Among these, five MTAs on chromosome 1D (221 cM), 1B (24 cM), 3B (8 cM), 1B (119 cM) and 2B (27 cM) were obtained in more than one method. The SNP marker AX-94470359 on chromosome 1D detected in the present study was found common in all four methods. Among MTAs detected in the present study, seven were considered novel as detected for the first time whereas nine MTAs were found to be overlapped with or near to previously reported QTL/MTA regions (Table 1 and Supplementary Table S5).
Another highly significant marker AX-94620141 was detected on chromosome 2B (27 cM) by MLMM, Farm-CPU and BLINK methods. In a previous study by Zwart et al. 37 , a QTL for P. thornei resistance was detected on chromosome 2B at 27.9-35.1 cM interval. In a separate study by Linsell et al. 36 three QTLs for P. thornei resistance were also identified on chromosome 2B, and mapped at 8.4-17.6 cM, 62.2-69.8 cM and 302.7-306.8 cM, respectively. Recently, Rahman et al. 39 performed fine mapping of P. thornei resistance loci on chromosome 2B of wheat. Although, MTA detected in our study is not on the same position as the fine mapped QTL, but its presence on the same chromosome suggests possible importance of chromosome 2B in resistance and thus can be considered as an important chromosome for nematode resistance. A candidate gene for Cytochrome P450 superfamily was also detected on this chromosome. Dababat et al. 42 also detected the DArT marker for P. neglectus resistance on chromosome 2B.
We identified three markers [AX-94809409 (8 cM), AX-95232057 (102 cM), AX-95630606 (85 cM)] on chromosome 3B using CMLM, FarmCPU and BLINK methods. Schmidt et al. 80 also identified QTL for P. thornei resistance in this region on chromosome 3B. Dababat et al. 42 identified markers associated with P. thornei and P. neglectus resistance on chromosome 3B at 14.2 cM and 64.7 cM, respectively. Although, in previous studies, QTL/ markers associated with H. avenae and P. neglectus resistance were mapped on chromosome 5A, 6B and 7A 42,82,83 . In this study, two markers (AX-94920631 on chromosome 5A and AX-94478352 on chromosome 6B) were identified; no QTL was reported on these chromosomes for P. thornei resistance in previous studies.
In order to find out the allelic effect, the SNP alleles that decreased the nematode count were defined as favourable allele and those that resulted in an increase in nematode count were defined as unfavourable alleles. The allelic contribution may be quite effective in improving the resistance in wheat by marker-assisted breeding. The average phenotypic value (nematode count) of each significant SNP allele for nematode resistance was selected to calculate the allelic effect. Highly favorable allele detected in the present study can be used for the development of nematode resistant wheat genotypes.
Since the nematode count is a tedious trait that needs uprooting of the plants which is a cumbersome process specifically when dealing with large populations. That often is practically impossible when large populations are screened. Thus, we have conducted the genomic selection of the trait to investigate how well one could predict the phenotype of the nematode count without performing the phenotyping by using genomic-selection. We observed a prediction value of r = 0.40 to 0.44 when 56 to 70% of the population was used as a training set (Supplementary Fig. S4). These were not very high values but reasonable to use in the plant breeding practices when one targets multiple traits along with moderately heritable nematode resistance traits by employing relatively inexpensive genotyping instead of large costly phenotyping efforts. Small number of genotypes used in the current study further contributed to low-prediction values. However, despite using a small number of genotypes, we obtained a significant peak in the known region supported by old literature. Although, prediction values are low, information obtained from present study will provide a future roadmap to other studies that embark on genomic-selection using trait such as nematode count. Still, for future studies, we suggest inclusion of a higher number of genotypes to get better prediction values.
In the present study, 37 putative candidate genes were identified for significant MTAs (Supplementary  Table S3). On the basis of the literature search, nine genes were detected having a putative role in disease resistance (Table 2). Important candidate genes identified (i) F-box domain; F-box protein plays a central role in plant immune response through hormone pathways such as salicylic acid and jasmonic acid pathway 84,85 . The plant uses these hormones in the regulation of defense response against several pests and pathogens 86 . Wheat F-box gene TafBA1 was also found to be involved in drought and heat tolerance 87,88 . (ii) Cytochrome P450 superfamily; Cytochrome P450s are heme-containing membrane-bound enzymes that involve in plant defense, and play an important role in disease response including fusarium head blight disease of wheat 89,90 , (iii) Vacuolar protein sorting-associated protein Vps 28 and Zinc finger protein; these are found to be drought-responsive genes and also play a role in other multiple stresses response in Arabidopsis thaliana 91 . (iv) Myb domain; Myb family is the largest family of a transcription factor in wheat. These are known to play a role in fungal infection caused by Rhizoctonia cerealis and also characterized as genes responsive to heat stress 92

Conclusion
The results obtained in the present study should be highly useful for tapping resistance against P. thornei using natural genetic resources. Up to the best of our knowledge, only a single GWAS study is reported on P. thornei resistance in wheat 42 . The wheat genotypes, in the present study, showing resistance for P. thornei will be tested in the field under natural environmental conditions. For further confirmation, MTAs obtained shall be checked in appropriate segregating bi-parental population for potential breeding application in future. The SNP markers associated with significant MTAs can be used to develop KASP (Kompetitive Allele Specific PCR) markers, which will be further useful in marker-assisted breeding for P. thornei resistance. Further, we showed the effectiveness of the genomic-selection in successfully predicting the phenotype of the genotypes. Identified candidate genes need further validation through wet lab experiments.

Materials and methods
Plant material. The germplasm collection consisted of 143 wheat genotypes, predominantly collected from the pan-Indian wheat cultivation states, and was used for nematode screening and association analysis (Supplementary Table S6). Three resistant (Raj MR1, WH542, M6) and three susceptible (WH711, WH147, Opata) wheat genotypes were also included in the screening experiments as control.  95 . Seeds of wheat genotypes were surface sterilized with 0.2% (w/v) mercuric chloride (HgCl 2 ) and then soaked in water for germination overnight. Germinated seeds were transferred to polyvinyl chloride (PVC) tubes (16 cm in height and 3.5 cm in diameter) filled with steam-sterilized soil. This soil was collected from the field, sieved and then sterilized twice at 121 °C for 2 h. At the bottom of each tube, a 20 μm sieve was fixed to prevent both root outgrowth and the nematode's escape out of the tube. The tubes were placed in special holders on an irrigation system as described previously [96][97][98][99] . All the experiments were performed in the growth room under controlled environmental conditions (22 °C ± 2, 16 h light and 8 h darkness and ~ 65% relative humidity). Completely randomized design (CRD) with a minimum five replicates in five different batches (over two years) of each genotype was used. The plants were irrigated with Hoagland media at a fixed interval during the experiment. Ten days post transplantation, each seedling was inoculated with 1000 P. thornei mixed stage juveniles and adult nematodes in solution by pipetting at a depth of 1.0 cm near to root. Plants were not watered for the next 3-4 days after inoculation of nematodes. Nematode extraction was done after 90 days of transplantation from soil and root by Cobb's sieving and decanting method combined with modified Baermann's funnel technique 95,100 . Nematode suspensions were collected and stored in bottles at 4 °C for counting. Three aliquots of 1.0 ml were taken from each bottle and nematodes were counted under a stereo microscope (Nikon SMZ645) at 40X-fold magnification. Reproduction factor (Pf/Pi) values (Pf means final population and Pi means an initial population of nematode) were calculated as the ratio between the number of nematodes counted at the end of the test and the number of nematodes used as inoculum on the basis of reproduction factor (Pf/Pi) values (Supplementary  Table S4). Genotypes were categorized as resistant (Reproduction factor (RF) equal to or less than 1), moderately resistant (RF = 1-2), moderately susceptible (RF = 2-3), and susceptible (RF = 3-4) and highly susceptible (RF more than 4) 42 .
Statistical analysis. In total, 2,860 data points (variety, batches and replication combinations) were generated. To control the errors all data points were manually checked and visually inspected by conducting outlier z-tests. The analysis of unbalanced nematode counts data was carried out by the mixed model using REML (restricted maximum likelihood) 101,102 . A REML model was later fitted using the ASREML-R library in R 103 . All terms are fitted as random and the best model was selected and fitted to predict the BLUPs of the varieties that were used in the subsequent GWAS analysis.
Genotyping. From the seedling of the grown plants, leaf samples were collected for DNA extraction. All DNA samples were quantified on 0.8% agarose gel followed by quantification on a spectro-photometer-based Nanodropsystem. Affymetrix 35 K Axiom wheat Breeder chip was used for genotyping. We obtained 16,406 SNPs after quality control on this breeder chip. We further filtered the SNPs based on the minor allele frequency of 5%, heterozygosity (< 10%) and missing values that reduce the total SNP numbers to 7,491. Full set of 7,491 polymorphic SNPs that have genetic positions on the high-density consensus map was used for further analysis 104 .
Population structure and Linkage disequilibrium analysis. To understand the population structure, PCA analysis was performed from a set of polymorphic SNPs implemented in TASSEL version 5.0 and displayed using R 103 . To compute pair-wise LD between the markers TASSEL version 5.2.54 was used. The pairwise LD was determined by using the full set of mapped 7,491 polymorphic and genetically anchored SNPs. To display whole genome and sub-genomic LD pairwise r 2 versus distances in cM, sliding window of 50 SNPs was used and the graphics of the LD-decay were generated in R package GGPLOT2 105 .
Genome-wide and haplotype-based association mapping. For the genome-wide association analysis of SNPs to nematode counts, GAPIT R package (3.0) 106  www.nature.com/scientificreports/ thinned markers over the 21 wheat chromosomes were used to generate the IBS based kinship matrix. Four methods of GWAS analysis were used including compressed mixed linear model (CMLM) 73 , multi loci mixed linear model (MLMM) 77 , Fixed and random model Circulating Probability Unification (FarmCPU) 76 and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) 78 . In total, 7,491 SNPs obtained after filtering were used for genome-wide association analysis. To visualize the false positives of the implemented methods, QQ-plots were generated from each method in R (using the R-package R-CM plot). Manhattan plots were generated in R package qqman and CMplot. A robust threshold value of − log 10 (p value) = 4.0 was set to reduce false positives and to ascertain the comparison across the implemented GWAS methods of this study. To compute the haplotype-based association analysis haplotypes were determined over the whole genome. PLINK version 1.9 was used to generate the LD-blocks which in turn is based on the default algorithms of finding haplotype blocks from Haploview software 107 .
Genomic prediction analysis. A set of polymorphic, genetically mapped markers discussed above were used for the genomic selection. The analysis was conducted by a cross-validation approach wherein a set of training sets were drawn from the total population with a starting set of size 20 to 120 that corresponds to the 14 to 84% of the population. As there were only a few marker data sets missing which we imputed using mean values. A set of 500 cycles were run for this analysis as implemented in the software rrBLUP R-package 108 . The predicted regression coefficients obtained for these analyses were checked against the actual values and were reported in this study.
Candidate gene analysis. To identify the potential candidate genes and their gene ontology, the sequences of SNPs defined significant association for P. thornei resistance, were searched for IWGSC sequence information in Ensembl plants for T. aestivum (http://plant s.ensem bl.org/Triti cum_aesti vum) and the high confidence genes were selected to explore the function. The protein products of genes present in the flanking sequence available for the SNP markers with maximum bases (50,000 bases before and after the SNP) were taken as putative genes.