Introduction

Global warming is altering rainfall patterns, disease incidence and the atmospheric temperatures to which crops are exposed1,2. Despite advances in agricultural technology, climatic changes jeopardize the genetic potential of crop productivity3,4,5. The increasing threat of high temperature restricts sustainable crop production6. Projections from global climate models anticipate a rise of 5.8 °C in average temperature by the end of the twenty-first century7. Wheat is a pivotal staple crop globally, contributing to 70% of caloric intake and 12–15% of protein consumption for humans8. However, heat stress causes irreversible damage to wheat crop development. Each degree Celsius increase in temperature above 32 °C during anthesis and grain filling causes a 1.0–4.2% reduction in wheat grain yield1,9. In Pakistan, wheat is normally planted between 20 October and 20 November, and each day of delay in sowing causes a 1.2% reduction in grain yield10. Nevertheless, approximately 20% of wheat cultivation adheres to the normal planting window, with deviations attributed to the delayed harvesting of rice and cotton crops11,12. Late planting shortens the wheat life cycle because growing degree days are completed earlier, thereby exacerbating heat stress conditions13,14,15. High temperatures affect the regulation of photoperiod- and vernalization-sensitive genes, reducing grain filling duration and impeding the plant's capacity to use available resources effectively16,17,18. Selection of desirable physiological traits associated with thermotolerance provides opportunities for crop improvement and genetic yield gain.

Heat stress affects photosynthesis by impeding the functionality of photosystem-II and the photosystem-I-mediated electron transport chain19,20. It also reduces chlorophyll content and chloroplast integrity through leaf senescence, thereby impeding the overall process of photosynthesis21,22,23. The cell membrane, composed of lipids and proteins, regulates enzymatic activity and ion transport. High temperature disrupts the hydrogen bonding between proteins and adjacent fatty acids, consequently perturbing membrane fluidity24,25. Additionally, leaf senescence hampers the synthesis of photosynthetic products and their translocation into developing grains26,27,28.

Leaf senescence during the reproductive phase reduces green leaf area due to decreased chlorophyll and carotenoid content, which are essential for photosynthesis. High temperatures disrupt chloroplast integrity and accelerate leaf senescence, impairing photosynthesis in wheat. Therefore, stay-green during anthesis to physiological maturity assures the retention of chlorophyll content and maintains photosynthesis under adverse environmental conditions29,30,31.

Proline accumulation in wheat is regulated by proline dehydrogenase activity and Δ1-pyrroline-5-carboxylate synthetase/reductase (P5CS)32. High temperatures increase P5CS activity and decrease proline dehydrogenase, leading to proline synthesis from glutamate under heat stress. At temperatures of 35–40 °C, proline content can increase by up to 200%, enhancing the defense mechanisms, photosynthetic efficiency, and yield of wheat seedlings33. Proline content accumulation also stabilizes the photosynthesis and antioxidant enzyme activity, and acts as osmo-protectant against heat stress conditions34.

Cooler canopy determines the stomatal conductance that maintains evapo-transpiration and photosynthesis in wheat35,36,37. These intricate cellular and physiological processes coupled with membrane fluidity dynamics determine wheat growth and development under heat stress conditions.

Selection for thermo-tolerance can be accomplished using physio-morphic traits and molecular markers, but knowledge of the genetic basis of thermo-tolerance is lacking. Earlier, genetic diversity among accessions was identified using physio-morphic traits, but integrating molecular markers into the selection process promises greater efficiency, reliability, speed and robustness while reducing susceptibility to environmental variability. Among the array of molecular markers, simple sequence repeat (SSR) markers stand out for their high polymorphism, co-dominance, and replicability throughout the entire genome38,39. Furthermore, SSR markers are widely favored in genetic mapping because they yield more informative data than biallelic SNP markers40,41. SSR markers also serve to increase marker density in specific genomic regions, thereby facilitating the construction of comprehensive genetic maps42,43,44.

Genome-wide association and linkage (bi-parental) mapping are methods to decipher the genetic architecture underlying quantitative traits45,46. Association mapping is an alternative to quantitative trait loci (QTL) mapping that serves to identify novel genes or loci by leveraging diverse cultivars, landraces or elite lines. It aids in understanding the genetic basis of quantitative traits related to thermo-tolerance in wheat. Its advantages over bi-parental mapping include broader genetic diversity within the population and higher resolution47,48,49,50. Association mapping investigates the relationship between phenotypic variation and genetic polymorphism based on linkage disequilibrium. Linkage disequilibrium (LD) denotes the non-random association of alleles at different loci, and its extent is influenced by factors such as mating system, recombination rate, genetic drift and population dynamics. Elucidating the LD pattern improves the accuracy and precision of association mapping51,52,53,54. However, a notable challenge in association mapping is false positive associations that arise from family relatedness. The Mixed Linear Model (MLM) was devised to control false positives in association mapping42,55,56.

Previous studies predominantly emphasized agronomic attributes, viz. plant height, tillers per plant, leaf area, grains per spike and thousand grain weight, under invariable environmental conditions11,45,57, whereas identification of marker trait associations (MTAs) for physiological attributes under heat stress remains limited. This scarcity of knowledge regarding genomic regions linked to thermo-tolerant traits was the impetus for this research. Therefore, the current study was designed to identify the genomic regions controlling physio-morphic traits via association mapping with a panel of 186 SSR markers. Such insights not only help breeders design marker-assisted selection strategies under heat stress conditions but also lay the groundwork for a deeper understanding of the genetic architecture governing thermo-tolerance in wheat.

Materials and methods

Plant material

A diverse panel of 158 wheat accessions from Pakistan and CIMMYT (Mexico), encompassing entries from the 23rd and 24th SAWYT (Semi-Arid Wheat Yield Trial), was used, as detailed in Supplementary Material Table 1.

Soil composition

Prior to sowing, basic soil analyses were performed for electrical conductivity, pH58, soil moisture59, soil texture classification60, potassium level61, extractable phosphorus content62 and total nitrogen content63. The soil texture was sandy clay, with EC 0.19 dS m−1, pH 7.65, moisture content 9.14% and soil organic carbon 0.52%, indicating a deficiency (i.e. < 1% organic matter). Nitrogen content was 0.41 mg g−1 of soil, whereas extractable potassium, Olsen phosphorus and ammonium nitrates were 82.8, 1.8 and 0.64 mg kg−1, respectively. Therefore, fertilizers were applied at 118 kg/ha N and 58 kg/ha P. Agronomic practices such as weeding, hoeing and irrigation were carried out according to the requirements of wheat cultivation.

Growing conditions and experimental layout

Wheat accessions were planted in a triplicated Augmented Complete Block Design along with the check (control) cultivars Pakistan-13 and Gold-16 in thirteen blocks (12 accessions with 2 checks in each block) at the shelter house of PMAS-AAUR (33.117283°N, 73.010958°E), Pakistan. Each accession was planted in a single 3-m row and checks were replicated in each block. Planting was carried out during the 1st week of November, coinciding with the wheat growing season (Rabi season), in 2018–2019, 2019–2020 and 2020–2021. Temperature in the shelter house was maintained at 21 ± 0.8 °C day/15 ± 0.7 °C night (controlled set) and 26 ± 1.2 °C day/18 ± 0.9 °C night (heat treated set), with relative humidity maintained at 70–75%, for 7 days at the tillering stage (Zadoks scale 39)64. Data on physiological attributes were collected after a recovery period of 4–5 days. At the anthesis stage (Zadoks scale 69), temperature was adjusted to 26 °C day/18 °C night for the controlled set, while the heat-treated set experienced 32 °C day/20 °C night, with relative humidity of 70–75%, for another 7 days. Data on physiological attributes were again recorded after a recovery period of 4–5 days.

Data recording

Photosynthetic rate and transpiration rate

Photosynthetic rate and transpiration rate were measured between 10:00 am and 12:00 pm using an infrared gas analyzer (IRGA; LCA-4, ADC, Hoddesdon, UK). Measurement conditions were held constant at a CO2 concentration of 360 µmol mol−1 and PAR of 1600 µmol m−2 s−1 65.

Proline content

Proline content was quantified by extracting 0.5 g of ground leaf tissue in 3% sulphosalicylic acid solution, followed by centrifugation for 10 min at 10,000 rpm. The supernatant (4 mL) was reacted with glacial acetic acid (4 mL) in a boiling water bath at 100 °C for 1 h. After cooling on ice for 30 min, 8 mL toluene was added and absorbance was read on a spectrophotometer at 520 nm66. Proline content was determined using the formula

$$ \mu moles\;proline/g\;of\;plant\;sample = \left[ {\frac{{\left\{ {\frac{{\upmu {\text{g}}\;proline/ml \times ml\;toluene}}{{115.5 \;\upmu {\text{g}}/\mu moles}}} \right\}}}{sample \left( g \right)/5}} \right] $$
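The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the function and argument names are hypothetical, and the example concentration of 10 µg/mL is an assumed reading, not a value from this study. The 8 mL toluene volume and 0.5 g sample mass follow the protocol described above.

```python
# Constant taken from the formula above: 115.5 µg of proline per µmol.
PROLINE_UG_PER_UMOL = 115.5


def proline_umol_per_g(conc_ug_per_ml: float, toluene_ml: float, sample_g: float) -> float:
    """Return µmol proline per g of plant sample.

    conc_ug_per_ml : proline concentration read from the standard curve (µg/mL)
    toluene_ml     : volume of toluene used for extraction (mL)
    sample_g       : fresh mass of the leaf sample (g)
    """
    if sample_g <= 0:
        raise ValueError("sample mass must be positive")
    # Numerator of the formula: µmol of proline recovered in the toluene phase.
    umol_in_extract = (conc_ug_per_ml * toluene_ml) / PROLINE_UG_PER_UMOL
    # Denominator of the formula: sample mass divided by 5.
    return umol_in_extract / (sample_g / 5.0)


# Hypothetical reading of 10 µg/mL with 8 mL toluene and 0.5 g leaf tissue.
print(round(proline_umol_per_g(10.0, 8.0, 0.5), 3))
```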

Cell membrane injury

Cell membrane injury was recorded from flag leaf samples. Two distinct sets of samples were washed with ddH2O and subsequently immersed in 4 mL ddH2O. One set was exposed to heat stress at 40 °C for 1 h, whereas the control set was kept at room temperature (25 °C). After heat treatment, an additional 16 mL ddH2O was added to the heat-treated set, which was then incubated for 24 h at 10 °C. Initial electrolyte leakage (T1 and C1) was measured with an electrical conductivity meter. The sets were then autoclaved for 10 min and the second electrolyte leakage (T2 and C2) was measured67. Cell membrane injury was calculated using the CMI (%) formula given below:

$$ CMI \left( \% \right) = \left[ {1 - \frac{{1 - \left( {T1/T2} \right)}}{{1 - \left( {C1/C2} \right)}}} \right] \times 100 $$
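As a worked illustration, the injury index can be computed as below. The sketch assumes the conventional reading of the formula, CMI (%) = [1 − (1 − T1/T2)/(1 − C1/C2)] × 100, with the bracket enclosing the whole ratio. The function name and the conductivity readings in the example are hypothetical.

```python
def cell_membrane_injury(t1: float, t2: float, c1: float, c2: float) -> float:
    """Cell membrane injury (%) from electrolyte leakage readings.

    t1, t2 : conductivity of the heat-treated set before and after autoclaving
    c1, c2 : conductivity of the control set before and after autoclaving
    """
    if t2 == 0 or c2 == 0:
        raise ValueError("post-autoclave readings (T2, C2) must be non-zero")
    control_term = 1.0 - c1 / c2
    if control_term == 0:
        raise ValueError("C1 equals C2, so the ratio is undefined")
    # Assumed bracket placement: the 1 sits inside the bracket with the ratio.
    return (1.0 - (1.0 - t1 / t2) / control_term) * 100.0


# Hypothetical readings: T1 = 40, T2 = 100, C1 = 10, C2 = 100.
print(round(cell_membrane_injury(40.0, 100.0, 10.0, 100.0), 1))
```

With this reading, a treated sample that leaks no more than the control (T1/T2 equal to C1/C2) returns 0% injury, which is the behavior expected of an injury index.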

Canopy temperature depression

Canopy temperature was recorded between 10:00 am and 12:00 pm with an infrared thermometer68. Canopy temperature depression (CTD) was calculated by subtracting the canopy temperature (Tc) from the air temperature (Ta), as given by the formula below.

$$ CTD = Ta - Tc $$

Stay green

In order to collect data for the "stay-green", green leaf area and spikes were scored visually (0–9 scale) at intervals of 3–4 days from anthesis to maturity using the LAUG approach69 and calculated using the formula below

$$ LAUG = \Sigma \left[ {\left\{ {\frac{{Yi + Y\left( {i + 1} \right)}}{2}} \right\} \times \left( {t\left( {i + 1} \right) - ti} \right)} \right] $$

where t(i + 1) − ti = time between two consecutive readings in days and Yi = difference of green spike and flag leaf at time ti.
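The LAUG summation can be sketched as follows. The sketch assumes the trapezoidal reading of the formula, in which consecutive scores are averaged, (Yi + Y(i + 1))/2, before being multiplied by the interval between readings. The function name and the example scores are hypothetical.

```python
from typing import Sequence


def laug(days: Sequence[float], scores: Sequence[float]) -> float:
    """Area under the greenness curve by the trapezoid rule.

    days   : scoring dates as days after anthesis, in increasing order
    scores : visual greenness scores (0-9 scale) recorded on those dates
    """
    if len(days) != len(scores) or len(days) < 2:
        raise ValueError("need at least two paired readings")
    total = 0.0
    for i in range(len(days) - 1):
        interval = days[i + 1] - days[i]
        if interval <= 0:
            raise ValueError("days must be strictly increasing")
        # Mean of two consecutive scores times the days between them.
        total += (scores[i] + scores[i + 1]) / 2.0 * interval
    return total


# Hypothetical scores taken 0, 3, 6 and 10 days after anthesis.
print(laug([0, 3, 6, 10], [9, 8, 5, 1]))
```

A genotype that stays green longer holds higher scores at later dates and therefore accumulates a larger area.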

Grain yield

Grain yield was recorded for each plant and averaged (g).

Genotyping and marker analysis

DNA extraction was performed in 96-well plates70. A set of 186 SSR primers covering all 21 wheat chromosomes was randomly selected for polymerase chain reaction (PCR) (Supplementary Material Table 2). An M13 tail (CACGACGTTGTAAAACGAC) was added to the 5′ end of each forward primer. Four fluorescent dyes (PET, NED, HEX and FAM) were used for PCR with different primers, and PCR products were analyzed on an ABI-3730 DNA analyzer71 for fragment analysis.

Statistical analysis

Physiological trait data were analyzed using PROC MIXED, with a fixed effect for entries and a random effect for blocks. Best linear unbiased predictions (BLUPs) were then computed in SAS72 to derive genotypic means while reducing environmental variability. Descriptive statistics and relative performance were computed for both physiological traits and grain yield73. Additionally, broad-sense heritability, a parameter indicating the degree of genetic control of a trait, was estimated using the standard formula below.

$$ H^{2}_{(B.S.)} = \frac{\sigma^{2}_{G}}{\sigma^{2}_{G} + \left( \sigma^{2}_{e}/n_{E} \right)} $$

where \(\sigma^{2}_{G}\) is the genotypic variance, \(\sigma^{2}_{e}\) is the residual variance and \(n_{E}\) is the number of environments.
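A minimal sketch of the heritability estimate follows, reading the denominator as the genotypic variance plus the residual variance divided by the number of environments. The function name and the variance components in the example are hypothetical; in practice the components would come from the mixed-model output.

```python
def broad_sense_heritability(var_g: float, var_e: float, n_env: int) -> float:
    """Broad-sense heritability on an entry-mean basis.

    var_g : genotypic variance component
    var_e : residual variance component
    n_env : number of environments (for example, growing seasons)
    """
    if n_env <= 0:
        raise ValueError("number of environments must be positive")
    denominator = var_g + var_e / n_env
    if denominator <= 0:
        raise ValueError("variance components must give a positive denominator")
    return var_g / denominator


# Hypothetical variance components across three environments.
print(broad_sense_heritability(4.0, 3.0, 3))
```

Because the residual variance is divided by the number of environments, adding environments raises the estimate for the same variance components.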

Principal component analysis of physiological traits was performed in R-4.0.3, extracting eigenvalues, variance proportions and loading factors with the factoextra, FactoMineR and ggplot2 packages (accessed on 15 February 2023). Correlation analysis was performed and the correlation diagram was constructed using the corrplot package. Cluster analysis was performed and a dendrogram was drawn using the ggplots package in R-4.0.3 to examine the interrelationships and clustering patterns among the physiological attributes.

For genotypic data, fragment analysis and allele calling were performed in GeneMapper software v3.774. Allelic frequency and polymorphism information content (PIC) were calculated to determine genetic diversity75. Genotypic data were analyzed with the admixture model in STRUCTURE76. To assess genetic dissimilarity among individuals, 1–12 clusters were presumed with 5 independent runs. For each run, a simulation length of 100,000 replications and a burn-in length of 50,000 cycles were adopted to estimate the lnPr(X|K) peak in the range of 1–12 subpopulations. The DeltaK value based on the Evanno criterion was computed using the web-based program STRUCTURE HARVESTER v0.6.9377. PCoA was performed in Statistica78, whereas cluster analysis using UPGMA (Unweighted Pair-Group Method with Arithmetic mean) with 1000 permutations and Jaccard's method was executed in DARwin 6.079, and the genetic dissimilarity matrix was used to construct a dendrogram in FigTree v1.3.180. Linkage disequilibrium was assessed at a 95% confidence interval for rare allele frequency, and significant (P-value < 0.001) correlations (r2 > 0.1) for each pair of loci were considered in TASSEL V4.3.181. The extent of linkage disequilibrium (P < 0.01 with r2 > 0.1) for each genome and chromosome was estimated82. MTAs were identified using the Mixed Linear Model, with convergence criteria set at the highly significant P ≤ 0.001 value in a default run with 1000 iterations in TASSEL V4.3.183. A Bonferroni correction was subsequently applied by dividing P ≤ 0.05 by the number of SSR markers to establish a more stringent threshold.
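The Bonferroni threshold described above amounts to a single division. The sketch below uses the nominal level of 0.05 and the 186 SSR markers stated in the text; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_tests


# Nominal alpha of 0.05 divided across the 186 SSR markers.
threshold = bonferroni_threshold(0.05, 186)
print(f"{threshold:.5f}")
```

Rounded to five decimal places, this gives the P < 0.00027 cut-off used for the marker trait associations.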

Ethics approval and consent to participate

"This study complied with relevant institutional, national, and international guidelines and legislation of Pakistan. The plant material was obtained from Pakistan and CIMMYT (Mexico). No special permissions were necessary to collect samples. Otherwise, the plant materials used and collected in the study comply with Pakistani guidelines and legislation".

Results

Phenotypic data analysis

Under normal conditions, wheat experienced optimum temperatures at the tillering (20 ± 1.2 °C) and anthesis (25 ± 1.5 °C) stages. Under heat stress, the crop was exposed to higher temperatures at tillering (24 ± 0.8 °C) and anthesis (32 ± 1.5 °C), as displayed in Fig. 1A,B. In the present study, significant phenotypic variation was observed among the 158 wheat accessions under both conditions (Table 1). Relative performance showed that photosynthetic rate at the vegetative stage was 42.5% lower under heat stress than under normal conditions. Photosynthetic rate at the reproductive stage was reduced by 47.9% under heat stress conditions. Transpiration rate was reduced by 37.8% and 58.6% at the vegetative and reproductive stages, respectively. Proline content increased by 14.4% and 21.8% at the vegetative and reproductive stages under heat stress, respectively. Under heat stress conditions, mean cell membrane injury was 29.1% (vegetative) and 19.7% (reproductive), canopy temperature depression was 11.1 °C (vegetative) and 9.0 °C (reproductive), leaf angle was 29.6° and 'stay green' was 47.1. Broad-sense heritability ranged from 0.86 to 0.99 under normal conditions and from 0.55 to 0.99 under heat stress, as indicated in Table 1.
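The relative performance figures quoted above are percentage changes of the heat stress mean relative to the normal mean. The exact formula is not stated in this section, so the sketch below assumes the usual definition, (stress − normal)/normal × 100. The function name and the two trait means are hypothetical and were chosen only to reproduce a 42.5% reduction.

```python
def relative_change_percent(normal_mean: float, stress_mean: float) -> float:
    """Percentage change of a trait mean under stress relative to normal.

    Negative values indicate a reduction, positive values an increase.
    """
    if normal_mean == 0:
        raise ValueError("normal mean must be non-zero")
    return (stress_mean - normal_mean) / normal_mean * 100.0


# Hypothetical means: 20.0 under normal and 11.5 under heat stress.
print(round(relative_change_percent(20.0, 11.5), 1))
```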

Fig. 1

(A) Temperature data during wheat life cycle 2018–19, 2019–20 and 2020–21 under normal conditions. Source: Department of Environmental Science, PMAS-AAUR. (B) Temperature data during wheat life cycle 2018–19, 2019–20 and 2020–21 under heat stress conditions. Source: Department of Environmental Science, PMAS-AAUR.

Table 1 Phenotypic performance under normal and heat stress conditions.

Genetic variability using multivariate analysis

Multivariate analysis techniques including principal component analysis, correlation analysis and cluster analysis were employed to assess the genetic variability among wheat genotypes based on their physiological traits and grain yield. Principal component analysis showed that the first six principal components had eigenvalues greater than 1, collectively contributing 73.7% of the total variability. The graphical representation of all variables was constructed from the first two principal components, which made the highest contribution and together accounted for 41.2% of the variability (Fig. 2). PC1 and PC2 contributed 26.9% and 14.4% of the variability, respectively. The highest contributing traits were proline content, transpiration rate at the vegetative and reproductive stages, canopy temperature depression at the reproductive stage and stay green under both conditions. Cluster analysis based on the dataset revealed seven distinct clusters of wheat genotypes under both normal and heat stress conditions. Genotype G23 exhibited the highest proline content under normal conditions, whereas genotype G18 had the highest proline content under heat stress conditions, at both the vegetative and reproductive stages. Additionally, genotype G-133 showed the highest transpiration rate under normal conditions, while genotypes G-117 and G-65 showed the highest transpiration rate under heat stress conditions.

Fig. 2

Graphical representation of scree plot showing the percentage of variability of principal components and contribution of traits towards variability of physiological traits under normal and heat stress conditions.

Correlation analysis was conducted to explore the relationships between physiological traits and grain yield under both normal and heat stress conditions, as depicted in Fig. 3. Transpiration rate was positively associated with photosynthetic rate at both stages, irrespective of the environmental conditions. Conversely, cell membrane injury was negatively associated with all studied traits; however, these associations were non-significant at both the vegetative and reproductive stages. Notably, 'stay green' exhibited a significant positive correlation with transpiration rate, photosynthetic rate and proline content under both normal and heat stress conditions. Furthermore, grain yield was positively associated with transpiration rate, photosynthetic rate and proline content under both normal and heat stress conditions.

Fig. 3

Graphical display showing the association of physiological traits under normal and heat stress conditions, where n.s.: non-significant, * at p = 0.05, ** at p = 0.01 and *** at p = 0.001.

Cluster analysis was implemented utilizing Ward's method to stratify the wheat genotypes into distinct groupings, as depicted in Fig. 4. This method effectively segregated the wheat genotypes into four primary clusters denoted as Cluster-A, Cluster-B, Cluster-C, and Cluster-D. Cluster-A exhibited further subdivision into three subgroups: cluster-1A, comprising 16 genotypes; cluster-2A, comprising 15 genotypes; and cluster-3A, comprising 18 genotypes. Similarly, Cluster-B showcased division into subgroups namely cluster-1B consisting of 16 genotypes, cluster-2B consisting of 9 genotypes, cluster-3B consisting of 13 genotypes and cluster-4B consisting of 10 genotypes. Moreover, both Cluster-C and Cluster-D manifested subdivision into two subgroups. Cluster-1C and cluster-2C comprised of 24 and 15 genotypes respectively within Cluster-C and cluster-1D and cluster-2D consisted of 10 and 12 genotypes respectively within Cluster-D. Cluster-A predominantly comprised wheat varieties sourced from various institutes across Pakistan, while clusters-B, cluster-C and cluster-D primarily encompassed wheat genotypes obtained from CIMMYT advanced lines and Pakistani varieties recommended for rainfed areas.

Fig. 4

Dendrogram representing the grouping of wheat genotypes based on their physiological traits under normal and heat stress conditions.

Allelic frequency and PIC

Allelic frequency and polymorphic information content indicate the genetic variability among chromosomes and genomes. In total, 341 alleles (2–13 alleles per marker) were detected on genome-A, 246 alleles (2–10 alleles per marker) on genome-B and 275 alleles (2–9 alleles per marker) on genome-D. The highest mean number of alleles per marker was observed on genome-A (4.94), followed by genome-B (4.56) and genome-D (4.37). Among chromosomes, chromosome 1 had the highest mean number of alleles (5.00) (Table 2). The highest number of alleles (13) was recorded for marker Xgwm44 (Supplementary Material Table 3). Moreover, the polymorphic information content (PIC) was highest on genome-A (0.688), followed by genome-B (0.663) and genome-D (0.658). Chromosome 1 had the highest PIC value (0.712), followed by chromosome 7 (0.690) and chromosome 3 (0.684).
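For readers unfamiliar with PIC, the sketch below computes it from the allele frequencies of a single marker. It assumes the widely used Botstein formula, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j²; the study cites its own reference for PIC, which may define it differently. The function name and the example frequencies are hypothetical.

```python
from typing import Sequence


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC of one marker from its allele frequencies (assumed Botstein formula)."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    # Sum of squared allele frequencies (expected homozygosity).
    sum_sq = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of alleles.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - sum_sq - cross


# Hypothetical marker with four equally frequent alleles.
print(polymorphism_information_content([0.25, 0.25, 0.25, 0.25]))
```

Under this formula a biallelic marker cannot exceed a PIC of 0.375, so the genome-wide values above 0.65 reported here are only attainable with multi-allelic markers such as SSRs.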

Table 2 Genetic diversity in wheat genomes revealed by 186 SSR markers.

Population structure of wheat collection

Population structure was derived to determine genetic similarity among the wheat accessions. Remarkably, DeltaK (K = 4) clearly partitioned the wheat accessions into four groups (Fig. 5). Subpopulation G1 exhibited the highest number of wheat accessions (55) comprising primarily accessions sourced from the 23rd Semi-Arid Wheat Yield Trial (SAWYT) originating from CIMMYT and certain varieties historically cultivated in Pakistan including Chakwal-97 and Chakwal-86 (Fig. 6). Subpopulation G2 encompassed 41 wheat accessions including lines collected from 24th SAWYT and certain varieties popular in Pakistan including Dirk, Rawal-87, Mexipak, Bahawalpur-97, Sariab-92 and TD-1. Meanwhile, subpopulation G3 comprised 33 wheat varieties primarily cultivated in the rain-fed regions of Pakistan, inclusive of popular varieties like Khirman, Sarsabz, Zardana, SKD-1, Sarhad-82, Pirsabak-91, and Pirsabak-05. Lastly, subpopulation G4 encapsulated 29 wheat varieties predominantly favored in irrigated areas, such as Faisalabad-08, Miraj-2001, Wafaq-2001, Lasani-08, Punjab-11, and Shahkar-13, suggesting a shared ancestral origin among these accessions.
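The DeltaK statistic used to choose K can be sketched as below. The sketch assumes one common reading of the Evanno criterion: the absolute second-order difference of the mean log-likelihood across runs, divided by the standard deviation of the log-likelihood at that K. The function name and the log-likelihood values are hypothetical and far smaller in magnitude than real STRUCTURE output; they are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """DeltaK for every K that has both neighbors K - 1 and K + 1.

    lnp_by_k maps K to the lnPr(X|K) values of its independent runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    sds = {k: stdev(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            # Absolute second-order rate of change of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Hypothetical log-likelihoods for two runs at each K from 1 to 5.
runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-60.0, -62.0],
    4: [-58.0, -60.0],
    5: [-57.0, -59.0],
}
scores = delta_k(runs)
print(max(scores, key=scores.get))
```

In this toy input the log-likelihood stops improving sharply after K = 3, so DeltaK peaks there. The same logic underlies the peak at K = 4 reported for the wheat panel.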

Fig. 5

Population structure estimation to identify DeltaK for K ranging from 1 to 10. DeltaK is a function of K; the peak at K = 4 represents the 4 subpopulations.

Fig. 6

Bayesian approach-based population structure of 158 wheat genotypes showing 4 clusters based on inferred ancestry, analyzed with 186 SSR markers. Each color segment represents the membership fraction of each genotype. Horizontal coordinates represent the codes of wheat accessions given in Supplementary Material Table 1.

Cluster analysis also categorized the wheat accessions into four distinct groups based on their genetic dissimilarity (Fig. 7). Cluster 1 (C-I) comprised 30 accessions, with 29 originating from the 23rd Semi-Arid Wheat Yield Trial (SAWYT). Cluster 2 (C-II) encompassed 43 accessions, among which 36 were derived from the 24th SAWYT. Cluster 3 (C-III) featured 40 lines, with 37 representing varieties prevalent in the rain-fed regions of Pakistan, alongside three belonging to the 24th SAWYT. Cluster 4 (C-IV) consisted of 45 accessions, with 38 representing well-known varieties in irrigated areas of Pakistan, three sourced from the 23rd SAWYT and four from the 24th SAWYT. Additionally, as an alternative to the Bayesian approach, principal coordinate analysis was employed to characterize the wheat accessions. The first three principal coordinates (PCs) collectively accounted for 21.01% of the variation (PC1: 10.69%, PC2: 6.10% and PC3: 4.22%) and distinctly segregated the 158 wheat accessions into four major groups (Fig. 8).

Fig. 7

Clustering of wheat genotypes into four clusters based on the dissimilarity matrix using Jaccard's method with 1000 permutations. Different colored lines represent each cluster, viz. C-I, C-II, C-III and C-IV. Codes of accessions are provided in Supplementary Material Table 1.

Fig. 8

3D plot of the principal coordinate analysis based on SSR marker analysis of the wheat accessions. Different circles represent the geographical ecotypes; codes are provided in Supplementary Material Table 1.

Linkage disequilibrium and linkage disequilibrium decay

SSR markers revealed LD patterns at both the chromosome and genome level. In the current investigation, a total of 20,107 linked locus pairs were detected across all three genomes of wheat. The highly significant (P < 0.001) linked locus pairs with r2 > 0.1, r2 > 0.2 and r2 > 0.5 numbered 1553 (7.7%), 665 (3.3%) and 125 (0.6%), respectively (Table 3). Genome-A had 10,753 linked locus pairs, of which the highly significant pairs (P < 0.001) with r2 > 0.1, r2 > 0.2 and r2 > 0.5 numbered 1159 (10.8%), 267 (2.5%) and 24 (0.2%), respectively. Genome-B had 4429 linked locus pairs, of which the significant pairs (P < 0.001) with r2 > 0.1, r2 > 0.2 and r2 > 0.5 numbered 513 (11.6%), 275 (6.2%) and 66 (1.5%), respectively. Genome-D had 4925 linked locus pairs, of which the significant pairs (P < 0.001) with r2 > 0.1, r2 > 0.2 and r2 > 0.5 numbered 214 (4.4%), 123 (2.5%) and 35 (0.7%), respectively.
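To make the r2 statistic concrete, the sketch below computes it for the simplest case of two biallelic loci from haplotype and allele frequencies, using r2 = D² / [pA(1 − pA) pB(1 − pB)] with D = pAB − pA pB. This is a simplification: SSR loci are multi-allelic, and the way TASSEL aggregates r2 over alleles is not reproduced here. The function name and the example frequencies are hypothetical.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allele-frequency correlation (r2) between two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator <= 0:
        raise ValueError("both loci must be polymorphic")
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / denominator


# Hypothetical frequencies: pA = pB = 0.5 and haplotype AB at 0.4.
print(round(ld_r_squared(0.4, 0.5, 0.5), 2))
```

When the haplotype frequency equals the product of the allele frequencies, D and therefore r2 are zero, which is the case of no linkage disequilibrium.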

Table 3 Linkage disequilibrium (LD) pattern on individual genome at significant (P-value < 0.001) linkage disequilibrium utilizing 186 SSR markers.

In Genome-A, chromosome 1A exhibited the highest number of paired loci (8001); significant associations (P < 0.001) at r2 > 0.1, r2 > 0.2 and r2 > 0.5 were observed for 621, 202 and 13 paired loci, respectively (Table 4). Within the B genome, chromosome 5B was distinguished by a notable number of highly significant (P < 0.001) paired loci, with counts at r2 > 0.1 (96), r2 > 0.2 (48) and r2 > 0.5 (16). Similarly, among the D genome chromosomes, chromosome 1D showed significant paired loci at P < 0.001 with r2 > 0.1 (57), r2 > 0.2 (32) and r2 > 0.5 (6). Linkage disequilibrium (LD) decay was estimated across the entire genome as well as for individual chromosomes of wheat. The largest LD block, of 15–20 cM, was found on chromosome 1A. On average, the LD blocks within the chromosomes of genome-A ranged over 5–10 cM at P < 0.01 and r2 > 0.1 (Supplementary Material Table 6). In contrast, LD decayed within less than 5 cM for both genome-B and genome-D.

Table 4 Linkage disequilibrium (LD) pattern on individual chromosome at significant (P-value < 0.001) linkage disequilibrium utilizing 186 SSR markers.

Marker trait association

MTA analysis was conducted with the set of 186 SSR markers to explore associations with traits observed under both normal and heat stress conditions (Table 5). Two marker trait associations, with markers Xcfa2147 and Xwmc418, were consistent across normal and heat stress conditions for the key physiological traits transpiration rate, photosynthetic rate and proline content. Similarly, two associations for grain yield, with markers Xcfd30 and Xbarc8, were stable across both conditions, underlining the robustness of these markers across environments. Moreover, the relative performance of physiological attributes such as cell membrane injury, canopy temperature depression, leaf angle and stay green was used to detect marker trait associations under heat stress conditions. As a result, a total of 16 marker trait associations were identified.

Table 5 Marker trait associations (MTAs) for physiological traits under normal and heat stress.

Individually, ten highly significant (P < 0.001) MTAs were identified under normal conditions, while forty-one MTAs were observed under heat stress conditions for grain yield and physiological attributes. Here we discuss only the most significant associations, using a Bonferroni correction for a stringent threshold of P < 0.00027. Proline content at the reproductive stage was significantly associated with marker Xbarc42 (3D at 17 cM) under normal conditions. Under heat stress, transpiration rate at the vegetative stage was associated with marker Xwmc418 (3B at 37 cM), while transpiration rate at the reproductive stage was linked to marker Xgwm233 (7A at 7 cM). Moreover, photosynthetic rate at both the vegetative and reproductive stages was associated with marker Xgwm494 (3A at 37 cM). Canopy temperature depression at the reproductive stage showed highly significant associations with markers Xcfa2129 (1A at 79 cM) and Xwmc201 (6A at 43 cM) under heat stress conditions. Furthermore, under heat stress conditions, cell membrane injury was associated with marker Xbarc163 (4B at 35 cM) at the vegetative stage and with marker Xwmc463 (7D at 73 cM) at the reproductive stage. The 'stay green' trait was significantly associated with marker Xbarc49 (5A at 83 cM) under heat stress conditions.

Discussion

Temperature influences wheat growth and development. Optimal temperature at the reproductive stages, viz. heading (15–22 °C), anthesis (23–26 °C) and grain filling (26–28 °C), is a prerequisite for crop growth because it maintains physiological processes and metabolic activities3,84,85,86,87,88,89. Heat stress exposed the wheat plants to temperatures 3–4 °C above the threshold level, prompting earlier completion of growing degree days (GDD)14. In the present study, high temperature during the tillering and anthesis stages restricted the efficiency of physiological and metabolic processes. High temperature induces leaf senescence through high canopy temperature, which destabilizes physiological processes, viz. cell membrane stability, photosynthesis and transpiration. Therefore, wheat plants capable of maintaining traits such as 'stay-green' characteristics, cooler canopy temperatures, transpiration rates, photosynthetic rates, proline content and cell membrane integrity exhibit enhanced potential for grain yield under heat stress conditions90,91,92,93,94. Furthermore, the moderate to high heritability values observed across these traits suggest a degree of uniformity in genotype performance across different growing seasons, consistent with previous findings28,95,96.

Multivariate analysis techniques for assessing genetic diversity prove instrumental in unraveling the spectrum of variability within wheat germplasm and in identifying desirable traits for breeding programs. Principal component analysis (PCA) and cluster analysis enabled the quantification of genetic variability based on phenotypic traits97. PCA reduces high-dimensional data to a smaller number of components and serves as a powerful tool for gauging the extent of variation across traits and delineating their respective contributions to overall variability98. In the current study, proline content, transpiration rate, canopy temperature depression and stay green emerged as key determinants, streamlining the selection process for enhancing wheat yield in subsequent breeding endeavors under heat stress conditions. These traits have previously been highlighted by researchers as pivotal targets for bolstering wheat productivity under heat stress, underscoring their significance in breeding programs aimed at mitigating adverse environmental conditions99,100.
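How PCA apportions variability among traits can be sketched as follows. This is a generic illustration using NumPy's singular value decomposition on standardized traits, not the software or settings used in this study; the trait values are hypothetical.

```python
import numpy as np


def pca_on_traits(trait_matrix):
    """PCA on standardized traits (rows = genotypes, columns = traits)."""
    x = np.asarray(trait_matrix, dtype=float)
    # Standardize each trait so that differing units do not dominate.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of vt are the component loadings.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = singular_values ** 2 / (z.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()
    scores = z @ vt.T  # genotype coordinates on each component
    return eigenvalues, explained, vt, scores


# Hypothetical values for 6 genotypes x 4 physiological traits.
traits = [
    [12.1, 3.4, 2.1, 0.61],
    [14.3, 4.0, 2.9, 0.70],
    [10.8, 2.9, 1.6, 0.48],
    [15.0, 4.4, 3.2, 0.77],
    [11.5, 3.1, 2.5, 0.58],
    [13.2, 3.8, 1.9, 0.66],
]
eigenvalues, explained, loadings, scores = pca_on_traits(traits)
for i, share in enumerate(explained, start=1):
    print(f"PC{i}: {share:.1%} of total variance")
```

The absolute size of each loading indicates how strongly a trait contributes to a component, which is the basis for singling out traits as key determinants.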

Moreover, the association of physiological traits with grain yield is essential for dissecting complex quantitative traits. By assessing the strength of linear relationships, one can discern and prioritize traits that exhibit significant correlations with grain yield, thereby facilitating the selection of desirable attributes crucial for enhancing wheat crop productivity11. Under heat stress conditions, a positive correlation was observed between grain yield and physiological parameters such as proline content, transpiration rate, photosynthetic rate, and 'stay-green' characteristics, while a negative correlation was noted with cell membrane injury.
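The strength of such linear relationships is typically summarized by the Pearson correlation coefficient. A minimal sketch follows; the yield and trait values are hypothetical and serve only to show a positive and a negative association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-genotype means under heat stress.
grain_yield = [3.1, 3.6, 2.8, 4.0, 3.3]
proline_content = [1.9, 2.4, 1.5, 2.8, 2.1]
membrane_injury = [42.0, 35.0, 50.0, 30.0, 40.0]

print(f"yield vs proline: r = {pearson_r(grain_yield, proline_content):+.2f}")
print(f"yield vs membrane injury: r = {pearson_r(grain_yield, membrane_injury):+.2f}")
```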

High temperature induces a rise in canopy temperature, exerting a profound impact on the integrity of cell membranes within wheat leaves. It increases the kinetic energy within the molecular bonds connecting proteins and the lipid bilayer of cell membranes, leading to bond rupture and subsequent electrolyte leakage3. This disruption in cell structure interferes with pivotal physiological processes, including photosynthesis and transpiration90. High temperature also enhances the proline content in tolerant plants through the conversion of glutamate into proline, which serves as a protective mechanism against changing environmental conditions34.

Cluster analysis serves as a valuable tool in the identification and selection of promising wheat lines with diverse genetic backgrounds, thereby informing targeted breeding programs aimed at enhancing crop traits and productivity101. In the present study, wheat genotypes were stratified into four distinct clusters, each representing specific genetic profiles and origins. Cluster 1 comprised genotypes sourced from various research institutes across Pakistan, while clusters 2 and 3 predominantly consisted of lines derived from the 23rd and 24th Semi-Arid Wheat Yield Trial (SAWYT) of the International Maize and Wheat Improvement Center (CIMMYT). Interestingly, cluster 4 contained a mixture of Pakistani cultivars and CIMMYT lines, suggesting potential shared parentage owing to the historical introduction of Pakistani cultivars from CIMMYT germplasm. The clustering approach has previously been employed to elucidate patterns of similarity and genetic diversity within wheat genotypes99,102, which assists breeders in understanding wheat germplasm dynamics and in selecting superior breeding lines.

Polymorphism information content (PIC) and allelic frequency across genomes and chromosomes provide valuable insights into the distribution of genetic diversity and aid in targeting gene-rich regions of genomes and chromosomes11,46,103. A higher number of alleles and a higher PIC value indicate genetically richer regions. In the present investigation, allelic frequency and PIC values were lowest for genome D, intermediate for genome B and highest for genome A, suggesting that genome A harbors regions of greater genetic richness than genomes B and D among the Pakistani wheat cultivars and CIMMYT lines. These results delineate potential targets for further exploration and exploitation in breeding programs aimed at enhancing wheat genetic diversity and productivity.
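For reference, PIC at a single marker can be computed from its allele frequencies. The sketch below assumes the widely used Botstein et al. (1980) definition, PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (i < j); this section does not state which formula was applied, and the example frequencies are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphism information content for one marker (Botstein-style formula)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in allele_freqs)
    cross = 0.0
    n = len(allele_freqs)
    for i in range(n - 1):
        for j in range(i + 1, n):
            cross += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
    return 1.0 - sum_sq - cross


# Hypothetical markers: two equally frequent alleles vs four equally frequent alleles.
print(pic([0.5, 0.5]))                # prints 0.375
print(pic([0.25, 0.25, 0.25, 0.25]))  # prints 0.703125
```

More alleles at balanced frequencies give a higher PIC, which is why the number of alleles and PIC tend to rise together.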

The Bayesian population structure analysis, which uses relatedness frequencies among accessions within each group to indicate geographic habitat104,105, resulted in the formation of four clusters as determined by the software STRUCTURE. The population structure results for the two CIMMYT trials (23rd and 24th SAWYT) and the historic varieties from various Pakistani provinces agreed with expectations based on pedigree records. A priori, two main groups were expected, viz., CIMMYT lines and Pakistani accessions, based on their geographical origin, but greater genetic variability among the wheat accessions led to four groups. The unexpected grouping of accessions may be due to geographical origin, selection of germplasm and varietal age, all of which influence the structure of wheat accessions.

The consistency and reliability of this grouping methodology are reaffirmed through cluster analysis and principal coordinate analysis (PCoA), as reported earlier11,106,107. PCoA, serving as an alternative visualization tool for genotype data based on genetic distances, delineates the wheat accessions into four distinct groups, corroborating the findings of the population structure analysis. Moreover, clustering with the unweighted pair group method with arithmetic mean (UPGMA) further confirms the division of wheat accessions into four groups, thereby validating the results derived from the population structure analysis and the phenotypic evaluation of wheat based on physiological attributes and grain yield. Genetic similarity within each sub-population suggested common parentage among the constituent accessions, particularly after the 1960s, when a majority of Pakistani wheat varieties either originated directly from CIMMYT or were developed through cross-breeding with CIMMYT germplasm108. This historical context likely accounts for the notable genetic resemblance among lines, with varieties such as SOKOLL, PASTOR, KAUZ, and WBLL from CIMMYT emerging as apparent progenitors of many Pakistani wheat accessions (Supplementary material 1).
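The UPGMA procedure can be sketched in a few lines: at each step the two clusters with the smallest average pairwise distance between their members are merged. This is a minimal pure-Python illustration that stops at a requested number of groups, not the software used in this study; the distance matrix is hypothetical.

```python
def upgma_clusters(dist, k):
    """Group items into k clusters by average-linkage (UPGMA) agglomeration.

    dist is a symmetric matrix (list of lists) of pairwise genetic distances.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Mean of all pairwise distances between members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical distances among five accessions (indices 0-4).
distances = [
    [0.0, 0.1, 0.8, 0.9, 0.7],
    [0.1, 0.0, 0.7, 0.8, 0.8],
    [0.8, 0.7, 0.0, 0.2, 0.3],
    [0.9, 0.8, 0.2, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.3, 0.0],
]
print(upgma_clusters(distances, 2))  # prints [[0, 1], [2, 3, 4]]
```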

Association mapping requires the presence of linkage disequilibrium (LD), and the strength of association among linked loci indicates the extent of LD decay109. Therefore, a comprehensive analysis employing 186 SSR markers was undertaken to elucidate the LD pattern across each genome and chromosome. Understanding the LD pattern improves the precision of identifying marker-trait associations (MTAs) at both the chromosome and genome levels, thereby facilitating a deeper understanding of the genetic architecture underlying quantitative traits53,54,110.

LD decay was determined using paired loci and their genetic distance measured in centimorgans (cM)111,112. LD decay within a shorter genetic distance indicates higher resolution of association mapping, and vice versa. Therefore, the detection of larger LD blocks within shorter genomic regions becomes imperative for enhancing the precision of association mapping113,114. In the current study, LD decay was the highest on the A genome (5–10 cM) and on chromosome 1A (15–20 cM), representing the highest number of loci in relation to other chromosomes. Notably, chromosome 1A emerges as a prime candidate for the identification of genomic loci linked to target traits, especially among Pakistani and CIMMYT wheat accessions, highlighting its significance in future breeding endeavors.
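The pairwise LD statistic underlying such decay plots is commonly the squared allele-frequency correlation, r². The sketch below computes it for the simplified case of two biallelic loci with known haplotypes; SSR markers are multi-allelic and the study's own software may handle them differently, so this is an illustrative simplification with hypothetical data.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from (a, b) haplotypes coded 0/1."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical haplotypes: complete LD vs complete equilibrium.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(linked))    # prints 1.0
print(ld_r_squared(unlinked))  # prints 0.0
```

Plotting r² for each locus pair against the genetic distance between the pair (in cM) shows the distance over which LD decays.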

Population structure greatly affects the efficiency of association mapping and may produce false positives due to spurious correlations115,116. Traditional approaches often encounter difficulties in accurately identifying genes associated with quantitative traits within a single population due to the inherent variability in genetic differentiation and phenotypic expression across diverse geographical regions117. Statistical methods were developed to account for population structure and minimize the detection of false positives, but they resulted in the detection of false negative associations56,118. The mixed linear model (MLM) removes these spurious associations using the Q matrix extracted from population structure and the relative kinship matrix among accessions115,119.
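In its commonly cited Q + K form, the mixed linear model can be written as below. This is the generic formulation, given here as an assumption about the model's structure rather than the exact parameterization used in this study: y is the vector of phenotypes, β the fixed marker effects, v the fixed effects of the population structure covariates in Q, u the random polygenic effects with covariance proportional to the kinship matrix K, and e the residual.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N\!\left(\mathbf{0},\, \sigma_u^{2}\mathbf{K}\right),
\qquad \mathbf{e} \sim N\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right)
```

Fitting Q as a fixed covariate absorbs differences between sub-populations, while the kinship-structured random effect absorbs finer relatedness within them, so a marker is declared significant only for the association that remains after both.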

Association mapping emerges as a pivotal tool for unraveling the intricate genetic architecture underlying quantitative traits associated with thermo-tolerance in wheat, offering insights into the molecular mechanisms governing heat stress responses. In the present study, several MTAs were identified for transpiration rate (3B and 7A), canopy temperature depression (1A and 6A), photosynthetic rate (3A), cell membrane injury (4B and 7D) and 'stay green' (5A) under heat stress conditions, whereas an MTA was identified for proline content (3D) under normal conditions. Marker-trait associations were previously reported for stay green on chromosomes 5B, 5A, 4A, 4B, 4D, 3B and 7B120,121, photosynthetic rate on 4A, 5A and 7A122,123, canopy temperature depression on 2D, 4D and 6A124,125 and membrane stability on 7D126. However, leaf angle failed to exhibit significant associations with any marker under heat stress conditions, potentially owing to limitations in the SSR markers employed for association mapping or inherent disparities in the genetic composition of the studied genotypes. Identification of these MTAs provides a foundation for devising marker-assisted selection strategies for thermo-tolerance in wheat.

Conclusion

Marker-trait association analysis has revealed genomic regions associated with key physiological traits involved in heat stress tolerance in wheat. Proline content, transpiration rate, canopy temperature depression, and stay green have emerged as robust selection criteria for identifying thermo-tolerant germplasm, exhibiting positive correlations with grain yield in wheat. PIC values and allelic frequencies increased in the order D < B < A, suggesting that genome A offers genetically richer regions to target than genomes B and D. Moreover, principal component analysis and cluster analysis, employing phenotypic data, successfully categorized wheat accessions into four distinct groups. These findings were consistent with results obtained from principal coordinate analysis (PCoA), cluster analysis, and population structure analysis using genotypic data, thereby indicating their geographical origins. LD decay on chromosome 1A suggested the potential for uncovering genes associated with heat-tolerance traits. Stable markers associated with physiological traits under normal and heat stress conditions provide valuable insights for future marker-assisted selection (MAS) strategies for improving thermo-tolerance in wheat.