Introduction

Wheat is a major staple cereal, providing one-fifth of the calories and protein for four billion people globally1. Turkey is the eighth largest wheat producer worldwide and a leading exporter cultivating > 8 million ha2. Wheat production in Turkey is therefore important for global food security. Climate change scenarios have indicated that heat and drought stress will have profound effects on Turkey’s wheat production in coming decades3,4. Latest crop models project that the winter wheat zone of the country will suffer great losses in yield as compared to the spring wheat zone due to drought and heat (25–29%) yield reductions in winter wheat and 15–16% in spring wheat, respectively5. There is an urgent need to explore new approaches to develop climate resilient wheat varieties that can adapt to drought and heat stress environments.

In 2011–2014, International Winter Wheat Improvement Network (IWWIP) based in Turkey, collected wheat landraces (LR), which were evaluated, and superior landraces multiplied for seed deliveries to farming communities assuring their continuous cultivation and use in breeding6. The LR collection was sampled from four provinces in Turkey, two provinces in Iran and Afghanistan. Thereafter, 84 LR were delivered to the gene banks of the three countries to establish a common field trial along with 73 IWWIP modern germplasm and varieties (MV). This important germplasm set, named ‘International Landrace Exchange Set’ was evaluated for yield and yield components across Afghanistan, Iran and Turkey6. The trial results demonstrated that LR were highly adaptable to diverse agro-ecological conditions in all three countries.

Genome-wide association study (GWAS) has been increasingly utilized in wheat to untangle the genetic architecture of complex agronomic traits including yield and yield components under favorable and abiotic stress conditions7,8,9,10,11,12,13,14,15,16,17. However, most studies have used SNP-based GWAS. SNP markers are bi-allelic therefore less informative than multi-allelic markers. Additionally, in SNP-based GWAS multiple SNPs in high linkage disequilibrium (LD) are commonly associated with the same quantitative trait loci (QTL), resulting in overestimation of QTL effects. Haplotypes are constructed by combining multiple SNPs in high LD and haplotype-based GWAS overcomes most of the limitations associated with using single SNPs in GWAS, resulting in better statistical significance9,12,18,19,20,21.

The increased understanding of genetic loci that control complex traits and those that underwent selection during crop improvement (referred to as selective sweeps or signatures of selection) is important to design efficient breeding strategies. A combination of GWAS and selective sweep analysis is emerging as a leading approach to identify such new genomic targets14,22,23. Additionally, it allows to evaluate the role of selection in shaping the quantitative genetic variation at various levels24. In this study, we characterized the International Landrace Exchange Set to (i) identify the genetic diversity and population structure in this set, (ii) conduct GWAS using SNP and haplotype-based GWAS to detect genomic regions leading to improved drought resilience, and (iii) identify genome regions that were directionally selected between LR and MV and link these regions to known genes related to wheat adaptation and improvement.

Methods

Germplasm panel

The LR subset in the International Landrace Exchange Set comprised of 84 wheat landraces; 45 entries from Turkey, 20 from Afghanistan and 19 from Iran. The MV subset comprised of 73 entries developed by IWWIP in Turkey; 32 were bred for semiarid environments and 41 for irrigated environments. The detail list of entries is in supplementary Table S1 and described by6.

Phenotyping, genotyping and population structure

Phenotyping was conducted on 6 m2 plots at the Bahri Dagdas International Agricultural Research Institute in Konya, Turkey for two years (2018 and 2019). An alpha-lattice experimental design was used with two replicates. The weather conditions in Konya in 2018 were characterized by lack of moisture prior to heading resulting in drought conditions. In 2019, the precipitation was sufficient and grain yield exceeded 4 t/ha without applying any additional irrigation. The 2018 year was treated as a drought season and 2019 as a favorable season. Experimental data were recorded on grain yield (GY), spike length (SL), spike number (SN), number of spikelets per spike (NSS), harvest index (HI) and thousand grain weight (TGW) as described in6. The statistical analysis of the phenotypic data, including estimation of best linear unbiased predictors (BLUP) and broad-sense heritability, was done in Meta R25 (Vargas et al. 2013). Broad-sense-heritability in Meta R was estimated using the formula H2 = Vg/(Vg + Verr/r), where Vg is the genotypic variance, Verr is the error variance, and r is the number of replications. The ANOVA was done with R package lme4 using a linear mixed effect model in which replications were treated as fixed effect and entries as random effect. The correlations between traits were calculated using R packages ggplot2, GGally and rlang.

The germplasm set was genotyped using a high-density Illumina Infinium 25 K wheat SNP array (TraitGenetics GmbH, Gatersleben, Germany). After removing markers with missing data > 30% and minor allele frequency < 5%, 15,208 SNPs were used in the analyses. The polymorphic information content (PIC) and nucleotide diversity parameter (π) were calculated to estimate genetic diversity in the panel. PIC was calculated using PowerMarker version 3.2526 while π was calculated in TASSEL version 5.2.7927.

Principal component analysis (PCA) was conducted using the R package ‘stats’ and ‘rgl’. Coefficient of correlation r2 among markers was calculated to estimate LD among all pairwise comparison of markers in TASSEL version 5.2.79 and the values were plotted against genetic distance (bp) in R Studio using an in-house script. The pattern of LD decay was determined as the distance where LD values reduced to half of their maximum value.

SNP and haplotype-based GWAS

Genome-wide haplotypes were constructed based on the linkage disequilibrium (LD) parameter D’ using the modified R script28. The details of the parameters used were described by29,30. Haplotype-based GWAS was conducted using Plink version 1.07 with default parameters31. SNP-based GWAS was conducted in GAPIT Version 3.032. A mixed linear model was applied with the first three principal components as fixed variate and kinship as a random variate. SNPs and haplotype blocks were declared significant at p < 0.001. Box plots were generated to show the allelic effects of the associated markers and haplotype blocks using the PAST statistical program version 1.9333.

Analyses of selective sweeps

To identify genomic regions under selection, we used two independent methods; EigenGWAS34 and Wright’s Fst statistic35. EigenGWAS (Genome-wide association study with eigenvector decomposition) is a GWAS, however, phenotypic data are replaced by individual-level eigenvectors derived from the genotypic data. EigenGWAS was conducted using R-based GEAR software (https://github.com/gc5k/GEAR). To control the genetic drift component, the method generates a genomic control factor (λGC) and corrects the p-value. We used the corrected p‐value, called PGC (p value with a genomic control factor36, for detecting the loci under selection. One of the EigenGWAS outputs is called strength of selection, which is defined as the ratio between Fst of a locus and the average Fst of the population under study37. Wright’s F statistics of all individual 15,208 loci was calculated using ‘hierfstat’ package in R environment. A locus was declared under selection if it showed Fst > average Fst of the panel.

Candidate gene analysis

To search for putative candidate genes in the proximity of any significant GWAS output, Basic Local Alignment Search Tool (BLAST) in the EnsemblPlant database (https://plants.ensembl.org/index.html) was used. The co-expression patterns and gene network analysis were investigated in the Wheat Expression database (http://www.wheat-expression.com/). Potential links to phenotypes were determined using Knetminer integrated in the Wheat Expression database.

Results

Phenotypic variation in LR and MV

Year 2018 growing season was treated as the drought environment due to very low precipitation prior to heading, whereas year 2019 growing season was considered as the favorable environment. The metrological data is provided in Table S2, which clearly supports the drier conditions in 2018 growing seasons as compared to 2019. The total rainfall during 2018 (149.6 mm) growing season was 27.7% lower than that during 2019 (206.8 mm) growing season. Analysis of variance (ANOVA) revealed that all traits showed significant variation in the panel (Table S3). The effect of environment was significant on all traits except SL. In addition, significant genotype x environment interaction was observed for all traits. Broad sense heritability (H2) estimates were higher in the drought season for GY, SL and SN, whereas these were higher for HI and TGW in the favorable season (Table S3). For NSS, H2 estimates were similar in both seasons. Across seasons, TGW had the highest heritability of 0.73 whereas SL and NSS showed moderate heritability values of 0.62 and 0.65, respectively. The remaining traits GY, SN and HI showed low heritability values (H2 < 0.50) across seasons.

A detailed description of morphological diversity and descriptive statistics of the panel has been published recently6. To avoid repetition, we have elaborated here effects of drought on GY and yield components in LR and MV. The average GY was 3292 and 4786 kg/ha in the favorable season and 2378 and 2073 kg/ha in the drought season in LR and MV, respectively. This shows that reduction in GY was more severe in MV (56.6%) as compared to LR (27.7%) under drought conditions (Fig. 1a). NSS and TGW also showed significant reductions in the drought season in both LR and MV. The reduction in NSS was up to 33.3 and 30.9% whereas in TGW the reductions were up to 23.4 and 16.7% in LR and MV, respectively (Fig. 1b,c). For the remaining traits, reductions were moderate to very low (Fig. 1d–f). For example, SL was least affected by drought (Fig. 1d) in both LR and MV, while SN (Fig. 1e) and HI (Fig. 1f) displayed moderate reductions of 8.1 and 8.4% and 7.5 and 6.7% in LR and MV, respectively.

Figure 1
figure 1

Rate of reduction in grain yield in kg/ha, p < 0.01 (a) and yield parameters (b number of spikelets per spike, p < 0.001; (c) thousand grain weight in g, p < 0.01, (d) spike length in cm, p < 0.05, (e) spike number, p < 0.01and (f) Harvest index, p < 0.05 in drought (2018) and favorable (2019) seasons. The solid and dotted boxes represent reductions LR and MV, respectively.

The correlations were estimated between traits for the two groups (LR and MV) and two seasons i.e. drought (2018) and favorable (2019). In general, the correlation between traits were higher in the drought season of (Fig. S1a,b) than in the favorable season of 2019 (Fig. S1c,d). In the favorable season, except for HI (r = 0.236, p < 0.05) in LR, correlations of all traits with GY were insignificant in both LR and MV. In the drought season, the correlation of three traits (SL, NSS and HI) became stronger and significant with GY in LR (SL and GY, r = 0.265 at p < 0.05; NSS and GY, r = -0.33 at p < 0.01; HI and GY, r = 0.367 at p < 0.001). In MV too, the correlation of all three spike traits (SL, SN and NSS) became stronger with GY in the drought season (SL and GY, r = 0.463 at p < 0.001; SN and GY, r = 0.391 at p < 0.001; NSS and GY, r = 0.432 at p < 0.001).

Distribution of SNPs and haplotype blocks, population structure and LD decay

A total of 15,208 filtered SNPs was used, after filtering for 30% missing data and minor allele frequency ≥ 0.05, for all subsequent analysis. The distribution of SNPs showed maximum number of SNPs on chromosome 2B (1263) followed by chromosomes 3B (1196), 7A (1195) and 5B (1191). Chromosome 4D showed the least number of SNPs. Based on the linkage disequilibrium approach28, a total of 2568 haplotype blocks were constructed from 15,208 SNPs that covered a total genome length of 14,050 Mb (Table S4). The highest number of haplotype blocks were obtained on chromosome 5B (225) followed by chromosomes 3B (209) and 2B (205) (Table S4, Fig. S2).

The population structure was determined by 3-dimensional PCA that revealed a clear distinction between LR and MV (Fig. 2a). Three subgroups were evident in the PCA plot. Subgroup 1 was formed by all modern varieties regardless of the fact whether these were bred for irrigated or semiarid environments while subgroups 2 and 3 were formed by landraces from Turkey and Afghanistan, respectively (Fig. 2b). The landraces from Iran formed a diffused group and a few Iranian landraces overlapped with the other 3 groups. We also computed pairwise average FST (population differentiation coefficient35) between different subpopulations. Overall, the FST analysis indicated a moderate genetic differentiation between MV and LR (Fst = 0.101). The pairwise Fst values, as shown in Fig. 2b, between MV and LR from Turkey, Afghanistan and Iran were 0.101, 0.090 and 0.101, respectively. The pairwise Fst value between LR from Afghanistan and Iran was 0.141 and between LR from Turkey and Iran was 0.142, while it was slightly higher between LR from Afghanistan and Turkey (Fst = 0.151) (Fig. 2b). The genetic diversity parameter, as estimated by PIC, was 0.31, 0.30 and 0.27 for the complete panel, MV and LR, respectively. The nucleotide diversity parameter π was 0.39, 0.38 and 0.36 respectively, for the complete panel, MV and LR, respectively.

Figure 2
figure 2

Three dimensional PCA plots showing two broad groups of LR and MV (a) and three subgroups of LR based on geographic origins and one group of MV (b). Pairwise Fst among different groups are shown in part (b) of the figure.

LD decayed to half of its maximum at 1.58 Mb in the complete panel, while it decayed at ~ 1.96 and 1.99 Mb in LR and MV (Fig. S3a–c), respectively. The LD decay curve was also drawn at cut off r2 = 0.1 to allow an easy comparison with various previous studies in wheat (discussed in the Discussion section below). The LD decay at cut off r2 = 0.1 was observed at ~ 5.94 Mb in the complete panel, whereas it was observed at ~ 7.58 and 8.08 Mb in LR and MV (Fig. S3d–f), respectively.

Marker trait associations (MTA) identified using SNP- and haplotype-based GWAS

We used two approaches for GWAS, SNP- and haplotype-based GWAS, to identify MTA of GY and yield parameters in both environments separately. Below, we have first described MTA identified in the drought season using both GWAS approaches followed by MTA identified in the favorable season. All MTA identified by both GWAS are summarized in Tables 1 and 2 separately. The common genomic regions identified by both GWAS are also described, especially when these were co-located with the known metaQTL for GY or known genes in wheat governing yield-related traits (Table S5).

Table 1 Marker trait associations identified by SNP-GWAS in drought affected (2018) and favorable (2019) seasons for grain yield (GY), spike length (SL), spike number (SN), number of spikelets per spike (NSS), harvest index (HI) and thousand grain weight (TGW).
Table 2 Marker trait associations identified by haplotype-GWAS in drought affected (2018) and favorable (2019) seasons for grain yield (GY), spike length (SL), spike number (SN), number of spikelets per spike (NSS), harvest index (HI) and thousand grain weight (TGW).

A total of 47 MTA were identified for all traits using SNP-based GWAS in the drought season of 2018. For GY, 12 MTA were identified in SNP-GWAS with percentage variation (PV) varying from 10.2 to 11.8% (Table 1). Haplotypes-GWAS, on the other hand, identified 15 MTA associated with GY with PV ranging from 10.4 to 20.1% (Table 2). Interestingly, a hot spot region (724,940,015–730179871 Mb) was identified on chromosome 7B by haplotype-GWAS, where four haplotypes (H7B-123, H7B-124, H7B-127 and H7B-128) were associated with GY (Table 2). Figure 3 shows the estimated yield advantage at the five best haplotype blocks and the five best SNPs. The favorable alleles at these five haplotype blocks showed invariably yield increase of > 700 kg/ha (Fig. 3I–V), whereas the favorable alleles at the best five SNPs (Fig. 3VI–X) showed yield advantage varying from 352 kg/ha with SNP BS00067150_51_5A) to 710 kg/ha with SNP AX-158592462_7B. Figure 4 shows heat map of the panel showing the distribution of favorable alleles at the five best haplotype blocks in LR and MV. Only ten MV showed a favorable allele at two of the five best haplotypes. Clearly, MV are devoid of favorable alleles of the best five haplotypes identified here. Table S6 shows 16 important landraces that have been identified to carry favorable alleles of two or more than two haplotypes for GY. These 16 landraces showed an average GY of 3436 kg/ha in the drought season and are important for introgression of high allele effect haplotypes into modern varieties. Both GWAS approaches identified common genomic regions for GY on chromosomes 3B, 4A and 7B, of which the genomic regions identified on chromosomes 4A and 7B were located within the two metaQTL regions reported for GY (MQTLs 31 and 63) (Table S5). The genomic region identified on chromosomes 3B was located in the TaERF3-3B gene, which plays an important role in grain size and development (Table S5).

Figure 3
figure 3

Allelic effects of five best haplotypes (H1A-42, H2A-71, H4A-48, H7B-123 and H7B-124) and five best SNPs (AX-158598372_3B, wsnp_Ex_c7451_12757458_3D, BS00067150_51_5A, AX-89582418_6B and AX-158592462_7B) associated with GY in the drought season. The favorable haplotype alleles are underscored.

Figure 4
figure 4

Heat map of the panel showing distribution of favorable haplotype alleles as green vertical rectangle for GY in LR and MV at five best haplotype blocks (H1A-42, H2A-71, H4A-48, H7B-123 and H7B-124). Each vertical black line represents an individual accession and each green vertical rectangle represents the favorable allele of the corresponding haplotype block labelled on the left. The numbers on the x-axis show first and last serial number of the LR (1–84) and MV (85–157) corresponding to the serial number in Supplementary Table S1.

For the three spike measurements SL, SN and NSS, SNP-GWAS identified 7, 7 and 4 MTA while haplotype-GWAS identified 6, 7 and 14 MTA, respectively (Tables 1, 2). Most notably for NSS, a 5.6 Mb genomic region (775,824,676–781,493,870 Mb) was identified on chromosome 3B by haplotype-GWAS, where 6 haplotype blocks (H3B-191—H3B-196) were associated with NSS with very high PV varying from 27.9 to 33.2%. The common genomic regions identified by both GWAS for the three spike traits were on chromosomes 1B (SL), 3B and 7B (SN), and 2A (NSS) (Table S5) and these were located within three metaQTL regions (MQTLs 1B.3, 3B.5, 7B.5 and 2A.3) reported for GY under drought or heat stress environments (Table S5). For HI, nine MTA each were identified by both GWAS with PV varying from 9.1 to 24.0% in SNP-GWAS and 16.9 to 31.1% in haplotype-GWAS. A common genomic region was identified on chromosomes 7B (Table S5), which was located in metaQTL7B.2 reported for GY. Interestingly, on chromosome 6A, a constitutive MTA (identified in both seasons) was detected for HI by SNP- and haplotype-GWAS (Table S5), which could not be mapped to any known metaQTL or gene and hence an interesting candidate for future studies. Eight and seven MTA were identified for TGW by SNP- and haplotypes-GWAS with PV varying from 8.7 to 11.5% and 11.0 to 17.0%, respectively. A common genomic region on chromosome 5B, identified by both approaches for TGW, was located in metaQTL45 for GY. Most importantly, three known TGW genes, TaTGW6-A1 (chromosome 3A), TaTPP-6A (chromosome 6A) and TaCwi-A1-2 (chromosome 2A), were identified to be associated with TGW in the drought season by a combination of both GWAS (Tables 1, 2).

In the favorable season of 2019, 11 and 21 MTA were identified for GY with SNP- and haplotype-GWAS, respectively (Tables 1, 2). Two common regions were identified on chromosomes 3A and 3B by both GWAS (Table S5), of which the genomic region identified on chromosome 3B fell within metaQTL26 for GY. In addition, two important genes governing yield-related traits were identified for GY on chromosomes 6A (TaAGL6-A) and 6B (TaSPL21-6B) by SNP- and haplotype-GWAS, respectively (Tables 1, 2). For the three spike measurement traits, 9, 4 and 5 MTA and 9, 9 and 8 MTA were identified by SNP- and haplotype-GWAS for SL, SN and NSS, respectively. The common genomic regions identified for the three spike traits by both GWAS were on chromosomes 1B, 5B and 7B for SL, 6A for SN and 2B for NSS (Table S5). Of these, the genomic regions on chromosomes 5B and 7B for SL were in metaQTL5B.1 and metaQTL63, respectively, and the genomic region on chromosome 2B for NSS was in metaQTL18. For HI, SNP- and haplotype-GWAS identified 7 and 11 MTA, respectively, with a common genomic region on chromosomes 3B (557,088,909–557,097,177 Mb), which could not be mapped to any known metaQTL for GY or gene. Three and four MTA were identified for TGW by SNP- and haplotype-GWAS, respectively, and no common genomic region was detected by two GWAS for TGW. Two MTA by SNP-GWAS and one MTA by haplotype-GWAS could be mapped within known metaQTL (Tables 1, 2).

Comparison of SNP and haplotype-based GWAS

A comparison of both approaches showed that the haplotype-GWAS was more effective in identifying MTA with high PV values as compared to SNP-GWAS. For example, haplotype based GWAS identified 15 MTA for GY in the drought season, of which 7 haplotypes showed high PV values of more than 15.0% and 8 haplotypes showed moderate PV values between 10.0 and 14.5%. The SNP-based GWAS, on the other hand, identified 12 MTA in the drought season and all 12 MTA showed moderate PV values of 10.2 to 11.8%. The average PV explained for GY by all associated haplotypes was 15.0 and 7.3%, whereas it was 10.8 and 4.6% by SNPs in the drought and favorable seasons, respectively. A similar trend i.e., higher PV explained by haplotype blocks as compared to SNPs was observed for all the traits in both seasons except NSS in favorable season (Fig. 5).

Figure 5
figure 5

Percentage variation explained by SNPs and haplotypes associated with the traits. GY grain yield, SL spike length, SN spike number, NSS number of spikelet per spike, HI harvest index, TGW thousand grain weight.

Signatures of selection by EigenGWAS and Fst analyses in LR and MV

We conducted EigenGWAS and F-statistical test (FST) to identify genomic regions that have been differentially selected between LR and MV, hence must have played an important role in adaptation or other selected traits. Overall, EigenGWAS identified 90 SNPs with significant PGC (PGC < 0.01) values (Table S7, Fig. 6). At all loci, one of the SNP alleles was almost fixed in either LR or MV (Fig. 7). The pattern of contrasting allele frequencies in the LR and MV at each of the 90 loci supports the fact that these loci are under differential selection (Fig. 7, Table S7).

Figure 6
figure 6

Manhattan plot from EigenGWAS in a panel of Turkish landraces and modern varieties highlighting SNPs that showed signatures of selection. The X-axis represents chromosome numbers and Y-axis represents the corrected P value, also called PGC.

Figure 7
figure 7

Frequencies of alleles in LR and MV at 90 loci showing signature of selection.

In the F-statistical test, SNPs above the threshold of Fst > 0.101 were declared to be under differential selection based on the average Fst 0.101 ± 0.007 (SD) between LR and MV. A total of 85 SNPs showed Fst > 0.101 and 82 of these were common with EigenGWAS (Table S7). The A genome showed the largest number of SNPs under selection (54) followed by the B (25) and D (11) genomes. Based on the genome wide LD threshold of ~ 6 Mb, SNPs within 6 Mb region were merged to declare a selection region and thus 39 selection regions were identified. Notably, on chromosomes 4A and 6B, multiple selection hot spot regions were observed (Fig. 6, Table S7). On chromosomes 4A, five selection regions were identified. Of these, four selection regions were 0.04 Mb (100,634,209–100,673,242 bp), 6.0 Mb (113,854,870–119,931,428), 2.8 Mb (542,827,852–545,618,766) and 0.02 Mb (570,265,418–570,478,671) long, while the fifth region was at 640 Mb identified by one SNP (AX-89422359). Likewise, on chromosome 6B, seven selection regions were evident; three regions were identified by only one SNP at each region at 77.7, 112.9 and 567.4 Mb while the remaining four were 4.2 Mb (92,433,077–96,648,725), 5.4 Mb (103,667,878–109,143,478), 6.5 Mb (650,045,929–656,565,987) and 0.01 Mb (678,175,501–678,344,340) long.

We aligned known genes, published QTL/meta-QTL and MTA identified in GWAS studies in wheat for flowering-time, adaptation and yield and yield-related traits to understand the biological relevance of the selection regions identified here. Table 3 shows selection regions linked to known genes/meta-QTL known for plant adaptation and yield related traits. Out of 39 selection regions, 15 (i.e., 38.4%) were within the proximity of known functional genes controlling flowering [PRR-A1, PPR-D1, TaHd1-6B) and yield and yield components (TaSus2-2B, TaGS2-B1, AG1-1A/WAG1-1A, DUO-A1, DUO-B1, AG2-3A/WAG2-3A, TaLAX1, TaSnRK210-4A, FBP, TaLAX1, TaPIL1 and AP3-1-7A/WPA3-7A). Six selection regions were close (within 5 Mb) to QTL/metaQTL reported for yield and yield related traits. Eight selection regions showed homology with genes coding for ion/amino acid/sugar transporters, transcription factors and regulatory genes (Table S7). For the remaining selection regions, annotated candidate genes were identified from the EnsemblPlant database, however, their molecular functions are not known.

Table 3 Known genes/QTL, meta-QTL for plant adaptation and yield traits identified by EigenGWAS and Fst analysis in the present study.

Most importantly, the selection regions also included known genes for quality (Pina-D1, Glu-D1, TaSS4-1A and Tamyb10-A1) and disease resistance (Yr78, Yr5) (Table S7). We validated some of the selection regions by genotyping the panel with 32 gene-based KASP assays available for known genes related to flowering, yield related traits, quality and disease resistance in wheat and calculated the allele frequencies and Fst in both groups (Table S8). The results clearly showed that seven genes out of 32 showed signatures of selection (Fst > 0.101) and 5 (PRR73-A1-4A, TaSus2-2B, Glu-D1, Pinb-D1 and Yr5) of these were also identified in EigenGWAS (Table S8). In addition, Rht-B1 and Ppd-D1 genes showed signatures of selection but these were missed in EigenGWAS probably due to low density of SNPs around these two genes in this study. Interestingly, LD analysis of the 7 genes revealed that these were in high LD with each other (Fig. S4). Table S9 shows gene-based association mapping for all traits in the two seasons.

Discussion

GWAS has been extensively deployed in wheat to detect genomic regions associated with complex agronomic traits38, however, the approach has not been explored much in combination with selective sweep analysis14,39. While GWAS effectively identifies large-effect loci, its application is limited to the germplasm and phenotypes used in analyses and it often misses the opportunity to detect allelic changes associated with signatures of breeding selection40. Insights gained from a joint GWAS and selective sweep analysis expand the opportunities to exploit both associated loci and selection footprints for designing effective breeding strategies14. The explosion of SNPs in the post genomics era, the concomitant advancements in statistical tools and the availability of high-resolution reference genomes in wheat have provided unprecedented opportunities to perform such integrated analyses and to efficiently apply the data for genomics-assisted breeding.

In the present study, we have characterized a panel of Turkish winter wheat landraces (LR) and modern varieties (MV) by GWAS and signatures of selection analyses to identify the genes and MTA associated with improved yield under drought for deployment in breeding. The LR collection investigated here is a representative set of superior landraces, drawn from a large collection done by IWWIP almost a decade ago from 2009 to 201441. All traits showed significant variation in the panel, supporting the inclusion of all traits in GWAS and signature of selection analyses. The GY in the drought season of 2018 was lower than reported in previous studies on winter wheat germplasm in Konya, Turkey12,42. A comparison of yield-related traits such as TGW and HI also revealed significant reductions in these parameters in the present study (16–23% in TGW and 6.0–7.0% in HI) as compared to previous studies (10–12.0% in TGW and 2.0–3.0% in HI), suggesting a more devastating effect of drought in 2018 than previous years12,42. The total rainfall during 2018 growing season (149.6 mm) was 32.7 and 38.4% lower than that during 2016 (222.4 mm) and 2017 (243.0 mm) growing seasons in Konya12. A comparison of broad sense heritability (H2) of GY and TGW with previous studies in both spring and winter wheat revealed a similar trend as found here10,11,12,42,43, i.e., a low H2 for GY and high H2 for TGW across years. With regards to HI, the present and previous studies showed variable results and our results were in line with those which showed lower H2 for HI44. The correlations of traits with GY were higher and significant under drought conditions in both LR and MV than in favorable environment, as was also observed in a previous study45. The highest positive correlation between HI and GY in LR suggested that HI is a primary determinant of GY under stress in landraces, whereas in MV it was the spike length. The panel characterized in this study showed high genetic diversity, with an average PIC of 0.31, comparable to winter wheat sets from Australia, Kazakhstan or Croatia46,47,48 and higher than observed in panels of winter wheat germplasm from the US49,50,51. The PIC and π statistics in MV were comparable to the values estimated in the LR, indicating that the genetic diversity of the MV is maintained during crop improvement processes. The International Exchange germplasm Set contains 50% of breeding lines derived from the IWWIP breeding program through partnership with CIMMYT. The diversity in CIMMYT germplasm has been reported to be high through routine introductions of synthetics and other wheat wild relatives52,53. The LD decay observed was at ~ 5.94 Mb for the complete panel at cut off r2 = 0.1, which is in the range reported for highly diverse germplasm sets18,54,55,56,57. The panel revealed a clear distinction between LR and MV, suggesting LR as valuable genetic resource for introgression of novel alleles into MV. Further, Afghan and Turkish landraces formed two distinct groups whereas Iranian landraces overlapped these two groups. This indicates genetic differentiation of Afghan and Turkish landraces and interchange of Iranian landraces with neighboring countries through seed exchange.

SNP- and haplotype-based GWAS were used to identify genomic regions associated with improved GY and yield components under rainfed conditions. We used an LD-based approach to construct genome wide haplotype blocks. The LD-based approach reflects recombination history of the population and thus is the best among all the methods developed to construct haplotype blocks28. The average number of haplotype blocks per chromosome was similar to the findings in a synthetic panel14 and higher than obtained in a panel of central European winter wheat germplasm58. We found very few common MTA between favorable and drought stress seasons for all the studied traits with both GWAS methods. Similar results have been found in many studies in which QTL under optimal and stress conditions have been compared and it is attributed to different evolutionary trajectories induced by contrasting environmental conditions leading to activation of different sets of genes11,14. Together, both GWAS identified 18 known genes to confer yield advantage in wheat under different water regimes. Of these, the allelic variation in seven genes (TaERF3, TaSnRK23, TaARF12, TaDEP1, TaTGW6, TaSPL21, TaCwi-A1) has been shown to be associated with agronomic traits22,59,60,61,62,63. It was demonstrated recently that two genes, TaARF12 and TaDEP1, encoding an auxin response factor and the G-protein γ-subunit, respectively, control both plant height and grain weight pleiotropically and both genes are positively selected in Chinese cultivars over the course of breeding22. Further, they showed that TaARF12 and TaDEP1 interact epistatically with Rht-1 locus, suggesting that plant height and yield traits have been selected simultaneously during modern wheat breeding.

Haplotype-based GWAS identified a higher number of MTA (26) that overlapped with known meta-QTL and/or the signatures of selection (27) identified in the present study when compared to SNP-based GWAS (16 and 8, respectively). The MTA overlapping with signatures of selection (i.e. showing significant P values in EigenGWAS) in both GWAS can be potential future breeding targets after validation. Present results corroborate previous studies and reinforce that haplotypes-based GWAS identifies QTL with better statistical significance (i.e. better P-values and higher R2) than SNPs12,18,19,30. We obtained 4 to 18% higher PV for the traits in haplotype-GWAS as compared to SNP-GWAS. Out of the five best high effect haplotype blocks associated with GY in the drought season, four showed signatures of selection (H2A-71, H4A-48, H7B-123 and H7B-124) and hence are interesting breeding targets. The haplotype blocks H7B-123 and H7B-124 are in proximity of the TaSBEIb gene, which codes for a starch branching enzyme (SBE) 1,4-alpha-glucan branching enzyme involved in starch biosynthesis. Starch deposition occurs synchronously with grain development in wheat and its accumulation is greatly affected under drought and heat stress conditions because of significant reduction in the activities of the key enzymes involved in the conversion of sucrose to starch including SBE64,65. The SNPs in haplotype block H2A-71 fell in the region of TraesCS2A02G295400 (Table S10). The ortholog of TraesCS2A02G295400 in rice OsGIF1 encodes a cell-wall invertase required for carbon partitioning during early grain-filling66,67. Recently, the pleiotropic role of GIF1 gene has been suggested regulating the sizes of stems, leaves and grains in rice67. The SNPs in haplotype block H4A-48 showed homologies with various transcription factor genes including Zinc finger C2H2-type (TraesCS4A02G310700), which are known as master regulators of abiotic stress responses in plants such as drought68. The gene network analysis of this candidate gene showed that it is interacting with five other genes (Fig. S5) involved in diverse pathways and regulating leaf relative water content, stomatal resistance, harvest index, days to heading and chlorophyll content; a suite of drought-adaptive traits. Similarly, the SNPs in H1A-42 show homologies with regulatory/transcription factor genes belonging to AP2/ERF domain superfamily (TraesCS1A02G058400) and transporters such as sugar phosphate transporter (TraesCS1A02G058600, TraesCS1A02G058700) (Fig. S6). The gene network analysis indicated the involvement of these genes in multiple stress pathways including drought and cold tolerance and disease resistance. The heat map showed that the favorable alleles of these five high affect haplotypes are either missing in modern varieties or present in less than 10% frequency. Hence, their deployment is essential to enhance the existing gene pool for novel drought stress tolerance alleles and for further yield improvement of the modern germplasm.

We used EigenGWAS and Fst analysis between LR and MV, to identify the footprints of selection that are linked to adaptation and yield improvement. We identified selection footprints in 39 genomic regions. Of these, 15 selection regions were within proximity of known functional genes in wheat controlling flowering (PRR-A1, PPR-D1, TaHd1-6B) and yield and related component traits (TaSus2-2B, TaGS2-B1, AG1-1A/WAG1-1A, DUO-A1, DUO-B1, AG2-3A/WAG2-3A, TaLAX1, TaNHX1-4D, TaSnRK210-4A, FBP, TaPIL1 and AP3-1-7A/WPA3-7A) (Table 3). A comparison between the sets of genes identified by GWAS and signatures of selection analysis reveals unambiguously that the two approaches pulled out entirely different sets of genes, thus expanding the repertoire of genes that can be utilized in breeding. Notably, the frequencies of early flowering alleles were high in MV for all flowering genes showing signatures of selection (PRR-A1, PPR-D1, TaHd1-6B), whereas for the yield-linked genes selection of alleles was quite variable between the landraces and modern germplasm.

For deployment in breeding, two strategies can be followed. In the first strategy, the genes that showed higher favorable allele frequency in LR (for example, TaSus2-2B, DUO-B1, AG2-3A/WAG2-3A, TaSnRK210-4A, TaNHX1-4D and TaPIL1) can be deployed for increasing the frequency of favorable alleles in MV through targeted crosses and marker-assisted selection. The second strategy could be allele mining of genes with higher frequencies of favorable alleles in the modern germplasm to identify new allelic variations. For example, genes such as TaGS2-B1, TaLAX1 and DUO-A1 could be mined for additional SNP variation that could not be captured here and their association with drought adaptive traits could be re-explored as has been done in case of TaGS2-B1 gene69.

Many previous studies in wheat have shown selection signatures for Vrn-1 loci14,23,70,71. In our study and an earlier study72, signatures of selection were not observed for Vrn-1. All three major Vrn 1 loci (Vrn-A1, Vrn-B1 and Vrn-D1) showed a balancing selection for winter habit alleles in both LR and MV. The present results therefore suggest that, unlike in spring wheat, the contribution of Vrn-1 loci in shaping the evolution of the winter wheat is not a significant one. The flowering time genes PRR-A1, PRR-D1, TaHd1-6B and Ppd-D1 showed signatures of selection indicating the important roles these genes play in fine tuning the crop growth cycle of modern germplasm to increase their adaptability to wider cultivation zones of the country (Turkey) and elsewhere (Iran and Afghanistan). The high LD between PRR-A1 and Ppd-D1 genes and significant association of PRR-A1 and Ppd-D1 with GY and yield related traits (R2 of 32.9–58.8%) further confirm their importance in providing yield advantage to MV (Table S9).

Intriguingly, we found selection footprints for only two gene(s) on homoeologous chromosomes. One gene was identified on chromosomes 4A at 113–120 Mb (PRR-A1) and 4D at 341 Mb (PRR-D1), respectively. The other gene was DUO-1 on chromosomes 1A (DUO-1-1A) and 1B (DUO-1-1B). Such observations are common and have been reported in previous studies in wheat73,74. It has been suggested that directional selection rarely acts on multiple advantageous mutations across homoeologous regions. This happens to prevent fitness loss that might occur due to simultaneous mutations in the three copies of the genes on homoeologous chromosomes73. The study by74 showed simultaneous selection of one SNP in LEC2 (LEAFY COTYLEDON2) gene on chromosome 2A and 2B. Several selective sweep regions were identified on chromosome 4A and 6B, in the present study. Several selective sweeps on chromosome 4A were also identified in wheat germplasm from Iran and Pakistan23, suggesting high selection pressure on genes from this chromosome in wheat from multiple geographies. The longest selective sweep identified on chromosome 4A was a 6 Mb region in the vicinity of the PRR-A1 gene75. PRR-A1 gene is a paralog of Ppd1 gene for photoperiod insensitivity. This is not surprising considering the important role played by this gene in fine tuning the flowering times of wheat, especially during stress conditions. Another important selection region was in the vicinity of TaSnRK210-4A gene, coding for a sucrose non-fermenting 1-related protein kinase and regulating grain weight and spike length in wheat. Interestingly, a selection region on chromosome 4A was identified at 100.6 Mb where candidate gene search showed proteins/domains of unknown function (DUF) 3527. Since it was selected in the MV in very high frequencies, we assume that it must be playing an important role in adapting the plants to new environments. Although no definite role could be ascertained for DUF3527, evidences are being generated for other families of DUF proteins. For instance, expression profiling of DUF4228 genes was investigated in Arabidopsis exposed to multiple abiotic stresses (osmotic, salt and cold) and results suggested the involvement of DUF4228 genes in the pathways of plant resistance to abiotic stresses76.

Conclusion

The identification of many favorable haplotypes from landraces associated with improved GY under drought stress conditions indicates that the landraces have considerable potential towards enhancing the existing gene pool for drought stress tolerance. Sixteen landraces have been identified carrying multiple haplotypes alleles and showing GY from 3000 to 3781 kg/ha. These landraces should be deployed in breeding to expand the repertoire of drought tolerance alleles in the current germplasm for further yield improvement. Further, the genes identified in signatures of selection analyses should be subjected to allele mining in the modern germplasm to identify additional, yet unexplored, superior alleles.