Introduction

Bread wheat (Triticum aestivum L.) is a strategic cereal worldwide that feeds approximately 30% of the global population and provides 25% of the calories consumed by humans1. Owing to rapid population growth, climate change, and the incidence of abiotic stress, wheat productivity needs to increase by 2.5% per year. Therefore, to meet future demand, plant breeders face the challenge of increasing wheat production by up to 70% by the 2050s2. Drought stress adversely affects wheat productivity by disrupting a variety of bio-physiological and metabolic activities, thereby causing yield loss through reduced biomass3. Seed morphometric properties are explored as basic parameters in digital seed analysis4, enhancing the understanding of seed response to drought stress and providing data for research on wheat breeding in water-limited conditions5,6,7. Only a few previous reports have investigated seed physical traits8, and such data have focused on energy dissipation (thermodynamic) properties or on dimensional properties such as shape, volume, surface area, sphericity, aspect ratio, density, and moisture. Seed physical traits, such as shape and size, are relevant to grain storage and processing. These features may be helpful to, among others, food scientists, processors, and engineers. The composition of the seed is influenced by seed number, cultivar, water availability, temperature, light, and maturity9.

Genotyping-by-sequencing (GBS) is a method for evaluating genetic variation and discovering new markers that builds on next-generation sequencing technologies10. This approach has been used to dissect complex agronomic traits of wheat using molecular markers such as single-nucleotide polymorphisms (SNPs). SNPs have also been recognized as key elements in genome-wide association studies (GWAS)11. GWAS aims to detect genomic regions, whether QTLs, genes, or markers, that are related to important traits for gene introgression, gene discovery, or marker-assisted breeding12. Genetic markers detected by GWAS enable the dissection of genetic structure and diversity across many loci. This can enable wheat breeders to discover and use genomic loci controlling drought tolerance13.

Exploring the genetic basis of complex quantitative traits with innovative technologies is critical to wheat breeding programs14. Genome-wide association mapping (GWAS), an efficient approach to dissecting the genetic basis of complex traits, first genotypes a large collection of accessions with many single-nucleotide polymorphisms (SNPs) distributed throughout the genome and then tests their associations with agronomic traits15. Association mapping has been successfully used to evaluate several agronomic traits in a range of crops, including alfalfa16, sorghum17, soybean18, maize19, and rice20. Although GWAS has been widely adopted to examine agronomic characteristics in wheat, only a few studies have used this approach for seed-related properties in drought-stressed wheat genotypes. For example, Rahimi et al.21 demonstrated that bread wheat landraces from Iran possess favorable alleles that are adaptive to water deficit. They also observed marker-trait associations (MTAs) within protein-coding regions that can be used in the molecular breeding of novel wheat cultivars. Such studies also provide important data about MTAs, which can assist plant breeders in marker-assisted selection schemes22.

The objective of this study was to perform a genome-wide association analysis for seed morphometric traits in Iranian bread wheat. Seed morphometric traits were utilized in association studies to uncover putative QTLs responsible for key seed traits in water-limited conditions.

Results

Phenotypic data summary

The effects of genotype and genotype × environment on seed morphometric traits in the whole population were significant at the 0.001 probability level (Table 1). Box plots of the 34 morphometric traits of wheat seeds for cultivars and native landraces under favorable (well-watered) and stress (rain-fed) conditions are shown in Fig. 1. The means of all traits decreased under stress compared with normal conditions in both cultivars and native landraces. In both environments, the highest length (Feret) and width (Breadth) were found in landraces and cultivars, respectively. There was no significant difference between cultivars and landraces for the most important seed morphometric traits, i.e., Area, Area.1, Area.2, Volume, Thickness, and 1000-kernel weight (TKW). Overall, the diversity and distribution among native landraces were higher than those of cultivars in both well-watered and rain-fed environments.

Table 1 Mean, coefficient of variation (CV), broad sense heritability (H2), and combined analysis of variance based on studied traits in 298 Iranian wheat landraces and cultivars.
Figure 1

Box-plot representation of the distribution for a total of 34 morphometric seed traits for Iranian wheat landraces and cultivars in the well-watered and rain-fed environments.

The highest correlation was observed between TKW and Volume under stress (r = 0.76**) and normal (r = 0.85**) conditions. There was also a high correlation between Circ and TKW in rain-fed (r = 0.73**) and well-watered (r = 0.83**) environments, indicating that rounder seeds tend to be heavier (Fig. 2; Supplementary Table 1).

Figure 2

Correlation coefficients between morphometric seed traits for Iranian wheat landraces and cultivars in the well-watered (A) and rain-fed (B) environments.

Assessment of SNPs

The imputed SNPs comprised 15,951, 21,864, and 5710 markers on genomes A, B, and D, respectively, corresponding to 36.7, 50.2, and 13.1% of all SNPs (Fig. 3A). The highest number of markers used in all chromosomes except chromosome 4 is related to genome A. The highest number of markers (4034) was found on chromosome 3A and the lowest (270) on chromosome 4D (Fig. 3B).

Figure 3

Number of imputed SNPs used in wheat chromosomes (A) and genomes (B).

Linkage disequilibrium (LD)

LD assessment indicated that LD varies between chromosomes and along each chromosome, and that it usually decreases with increasing distance between SNP locations. A total of 1,858,425 marker pairs with r2 = 0.211 were identified in cultivars, of which 700,991 (37.72%) showed significant linkage at P < 0.001. The strongest LD was recorded between marker pairs on chromosome 4A (r2 = 0.367). Most of the significant marker pairs were found at a distance of < 10 cM. Genomes D and B possessed the lowest and highest numbers of significant marker pairs (63,924 and 370,359, respectively). A similar analysis on landraces identified a total of 1,867,575 marker pairs with r2 = 0.182, of which 847,725 (45.39%) showed significant linkage at P < 0.001. As in cultivars, marker pairs on chromosome 4A showed the strongest LD (r2 = 0.369). Moreover, most of the significant marker pairs were found at a distance of < 10 cM. Genomes D and B possessed the lowest and highest numbers of marker pairs (92,702 and 427,017, respectively) (Supplementary Table 2).
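The r2 statistic reported above can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosages (0, 1, or 2 copies of the alternative allele) and computes r2 as the squared Pearson correlation between two dosage vectors, a common approximation for unphased data. The function name and toy vectors are hypothetical, and the calculation performed by TASSEL may differ in detail.

```python
def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNP dosage vectors (0/1/2)."""
    n = len(dosage_a)
    if n != len(dosage_b) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # monomorphic marker: r2 is undefined
    return cov * cov / (var_a * var_b)


# Hypothetical toy genotypes for six inbred accessions.
snp1 = [0, 0, 2, 2, 0, 2]
snp2 = [0, 0, 2, 2, 0, 2]  # identical to snp1: complete LD
snp3 = [0, 2, 0, 2, 0, 2]  # weakly correlated with snp1

r2_complete = ld_r2(snp1, snp2)
r2_weak = ld_r2(snp1, snp3)
print(r2_complete, r2_weak)
```

Plotting such pairwise r2 values against the genetic distance between the two markers gives the LD decay pattern described in the text.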

Population structure and Kinship matrix

The genetic relationship of accessions in the wheat population was assayed via the Kinship matrix derived from imputed SNPs. Population structure analysis indicated the highest value of ΔK for K = 3 (Fig. 4A,B). The estimated principal components for the population revealed that PC1, PC2, and PC3 explain 16.94, 6.34, and 2.30% of genotypic variations, respectively (Fig. 4C). As expected, a population structure was identified in the Iranian wheat landraces, with the first five eigenvalues accounting for 30.50% of genetic diversity.

Figure 4

Determination of the number of subpopulations in wheat genotypes based on ΔK values (A). Structure plot of the 298 wheat genotypes and landraces determined by K = 3 (B). Principal component analysis (PCA) for a total of 298 Iranian bread wheat accessions (C). Cluster analysis using the Kinship matrix of imputed data for Iranian wheat accessions (D).

The clustering analysis determined three major groups with different levels of admixture, where Group I consists of 6 cultivars and 107 landraces, Group II includes 4 landraces and 70 cultivars, and Group III includes 14 cultivars and 97 landraces (Fig. 4D). From the imputed SNP data, a total of 19 cultivars appeared to be mixed with the two native landrace groups. The admixed cultivars include Sivand, Neishabour, Ghods, Azadi, Mahdavi, 4820, and Shahi. A neighbor-joining tree indicated that both cultivars and landraces were divided into two groups based on the imputed SNPs (Supplementary Fig. 1). In the analysis based on cultivars, two groups with 42 and 48 accessions were obtained. Native landraces were also divided into two groups with 98 and 110 accessions. The placement of each accession in a group may reflect the characteristics of its parents and its place of origin.

Genome-wide association studies for morphometric seed traits using mrMLM, 3VmrMLM, and MLM

A total of 257 and 74 MTAs were identified by the mrMLM and MLM models, respectively, under well-watered conditions, using the imputed SNPs at a significance threshold of LOD > 3 (mrMLM) and 0.05/m (MLM). Of the total MTAs in the mrMLM method, 95, 99, and 63 MTAs were related to genomes A, B, and D, respectively. Out of 74 MTAs in the MLM method, 27, 31, and 16 MTAs belonged to genomes A, B, and D, respectively. Genome B, with 38.5% (mrMLM) and 41.9% (MLM), had the highest number of significant MTAs. Overall, the mrMLM approach yielded the most MTAs. The numbers of significant MTAs for the Feret, Breadth, Thickness, Area, Perim, Circ, Volume, and TKW traits were 9, 8, 9, 7, 7, 7, 12, and 10, respectively, with the mrMLM method and 5, 1, 0, 2, 4, 2, 0, and 0, respectively, with the MLM method. Based on the mrMLM and MLM methods, the highest number of significant MTAs was related to ArBBox.1 and Concavity (18 and 10 MTAs, respectively) (Fig. 5A,B).
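The two significance thresholds named above can be put on a common scale with a short sketch. It rests on two assumptions that the text does not state explicitly: that m in 0.05/m is the total number of imputed SNPs on genomes A, B, and D, and that a LOD score corresponds to a likelihood-ratio test with one degree of freedom. The function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_markers):
    """Per-marker P-value threshold for a family-wise error rate alpha."""
    return alpha / n_markers


def lod_to_p(lod):
    """P-value of a 1-df likelihood-ratio test with the given LOD score."""
    lrt = 2.0 * math.log(10.0) * lod        # LRT statistic = 2 ln(10) x LOD
    return math.erfc(math.sqrt(lrt / 2.0))  # upper tail of chi-square, 1 df


# Assumption: m equals the number of imputed SNPs reported for genomes A, B, D.
n_markers = 15951 + 21864 + 5710
p_bonf = bonferroni_threshold(0.05, n_markers)
p_lod3 = lod_to_p(3.0)

print(f"Bonferroni: P < {p_bonf:.3g} (-log10 = {-math.log10(p_bonf):.2f})")
print(f"LOD > 3:    P < {p_lod3:.3g}")
```

Under these assumptions the LOD > 3 criterion is the more permissive of the two, which is in line with the larger number of MTAs reported for mrMLM than for MLM.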

Figure 5

GWAS results for seed traits in Iranian wheat landraces and cultivars. (A) mrMLM (Well-watered), (B) MLM (Well-watered), (C) mrMLM (Rain-fed), (D) MLM (Rain-fed), (E) 3VmrMLM.

Fewer significant MTAs were identified under rain-fed than under well-watered conditions, i.e., a total of 246 and 67 MTAs were recorded based on the mrMLM and MLM methods, respectively. Of these MTAs, 110, 105, and 31 mrMLM-based MTAs, as well as 30, 33, and 4 MLM-based MTAs, were related to genomes A, B, and D, respectively. Genomes A and B had the highest percentages of significant MTAs, with 44.7% and 49.6% based on mrMLM and MLM, respectively. The numbers of significant MTAs for the Feret, Breadth, Thickness, Area, Perim, Circ, Volume, and TKW traits were 8, 4, 10, 7, 4, 7, 8, and 8, respectively, with the mrMLM method and 3, 3, 2, 0, 0, 0, 1, and 1, respectively, with the MLM method. Based on the mrMLM and MLM methods, the highest number of significant MTAs was related to Concavity (11 and 7 MTAs, respectively) (Fig. 5C,D). Circular Manhattan plots were drawn for common regions associated with seed traits (Fig. 6; Supplementary Fig. 2).

Figure 6

Circular Manhattan plots (A) and QQ-plots (B) showing common regions associated with TKW in Iranian wheat landraces and cultivars. Inner to outer circles represent the average trait for the mrMLM and MLM methods in the well-watered and rain-fed environments, respectively. The chromosomes are plotted on the outermost circle, where thin dotted blue and red lines indicate significance levels at P-value < 0.00001 (0.05/m, Bonferroni). Black dots indicate genome-wide significantly associated SNPs at P-value < 0.00001 (0.05/m, Bonferroni). The scale between ChrUn and Chr1A indicates −log10(p) values. Colored boxes at the top right indicate SNP density across the genome, where green to red indicates less dense to dense.

In this study, we adopted a three-variance-component mixed model method, 3VmrMLM, for detecting QTNs and QTN-by-environment interactions (QEIs). A total of 140 MTAs were identified by the 3VmrMLM model, using the imputed SNPs at a significance threshold of LOD > 3. Of these, 64, 60, and 16 MTAs were related to genomes A, B, and D, respectively. Genome A, with 45.7%, had the highest number of significant MTAs. The numbers of significant MTAs for the Feret, Breadth, Thickness, Area, Circ, Volume, and TKW traits with the 3VmrMLM method were 2, 3, 0, 6, 6, 5, and 4, respectively (Fig. 5E). QTN-by-environment interactions detected with 3VmrMLM for Area, Perim, Feret, Breadth, and TKW are reported in Table 2, and those for other traits in Supplementary Table 3.

Table 2 A summary of QTN-by-environment interactions for some seed traits of Iranian wheat using 3VmrMLM.

Gene ontology

The markers with the highest significance and pleiotropy were studied in more detail. A total of 10 high-significance markers were identified in well-watered plants, most of which were located on chromosomes 1A, 1D, 2B, 2D, 3B, 4A, and 7A. The genes underlying these MTAs encode proteins involved in molecular/biological processes such as metal ion binding, ATP binding, calcium ion binding, DNA binding, positive regulation of protein catabolic process, protein ubiquitination, ionotropic glutamate receptor activity, ligand-gated ion channel activity, lipid binding and transport, protein phosphorylation, protein kinase activity, oxidation–reduction, and lipid biosynthesis (Table 3). In the rain-fed plants, 10 high-significance markers with the highest pleiotropy were identified, most of which were located on wheat chromosomes 1A, 1B, 2A, 2D, 3B, 4A, and 6A. The corresponding protein-coding genes were responsible for molecular/biological processes such as metal ion binding, Fe ion binding, lipid binding and transport, oxidation–reduction, lipid biosynthesis, oxidoreductase activity, and DNA-binding transcription factor activity (Table 3).

Table 3 Description of expected MTAs by using the imputed SNPs for seed morphometric traits of Iranian wheat accessions exposed to the well-watered and rain-fed environments.

Based on the gene IDs identified by BLAST against the wheat reference genome, the following pathways were discovered: metabolic pathways (Supplementary Fig. 3), ubiquitin-mediated proteolysis (Supplementary Fig. 4), oxidative phosphorylation (Supplementary Fig. 5), carbon metabolism (Supplementary Fig. 6), biosynthesis of amino acids (Fig. 7a), pentose phosphate (Supplementary Fig. 7), ascorbate and aldarate metabolism (Fig. 7b), sulfur metabolism (Supplementary Fig. 8), and fatty acid elongation (Supplementary Fig. 9)23,24,25 (www.kegg.jp/kegg/kegg1.html).

Figure 7

The KEGG pathway of biosynthesis of amino acids (A), the KEGG pathway of ascorbate and aldarate metabolism (B).

In addition, using RNA-seq data from Rahimi et al.26, the DEGs belonging to the different transcription factor (TF) families totaled 1,377. In this study, 443 genes encoding transcription factors were identified that showed differential expression between stress and normal treatments. Approximately the same number of TFs were identified among susceptible and tolerant genotypes (356 and 328 TFs, respectively). The difference between 9 and 18 days of water deficit was associated with 250 TFs. The majority of the genotype-specific TFs belong to the MYB, AP2/ERF-ERF related, MADS-M, B3, and bHLH classes. There were, however, other TFs that were specific to long- and short-term water deficits, including bZIP, C2H2, WRKY, NAC, and MYB. Furthermore, transcriptional regulators such as TAZ, TRAF, SNF2, and mTERF were identified. A summary of identified TFs among the different sets of DEGs in wheat is given in Supplementary Table 4.

Discussion

A total of 298 Iranian wheat accessions including 208 landraces and 90 cultivars were assembled as a natural population for mapping QTLs related to seed traits using GWAS. A high level of variation found in wheat seed traits suggests the potential of GWAS for uncovering QTLs, as reported by Rahimi et al.21.

Most plant populations are structured because of artificial selection, isolation, or nonrandom mating. As a result, genetic loci may be falsely related to traits when there is no true association15. The possibility of false positives can increase in GWAS if population structure is not suitably accounted for; therefore, evaluation of population structure is critical for any association mapping36,37. The panel of Iranian wheat accessions in this study was stratified into three groups. Cultivars made up one group, while landraces made up the other two groups, regardless of their geographic origins. Rahimi et al.21 observed the same groups in these Iranian wheat accessions. This mixture may derive from grain exchanges between farmers in different local markets throughout the country15. As reported previously21, most Iranian cultivars originated from the International Maize and Wheat Improvement Center and only a small number of the cultivars derived from Iran, suggesting relatively narrow exploitation of native landraces in developing the new/old cultivars. Therefore, Iranian cultivars have suffered from a remarkable genetic bottleneck.

In accordance with previous reports, genome D showed a low number of SNPs, while most SNPs were located on genomes B and A21,38. A similar situation was also uncovered for the number of marker pairs in LD, i.e., SNPs mapped to genome B were about four times more common than those located on genome D. Chromosomes 3B and 2B possessed the most significant marker pairs, as reported previously21. The higher variation uncovered in the B and A genomes can be due to two reasons: (i) gene flow from T. turgidum to T. aestivum, as opposed to its absence from Ae. tauschii; and (ii) the older evolutionary history of genomes B and A relative to genome D39,40. Furthermore, bottleneck effects have likely occurred owing to intense selection in native landraces during breeding programs, and this might lead to further impacts on genome D15. These effects lead to a decrease in effective population size, which in turn increases the loss of rare alleles in genomes B and A. A higher rate of low-frequency alleles in the D genome indicates a decrease in its allelic variation41. In this study, most of the significant marker pairs were found at a distance of less than 10 cM. Marker distances and LD throughout genomes B and A were much lower than in the D genome. The higher level of linkage across the three genomes in wheat cultivars reflects the impact of selection in the breeding history of those cultivars. Population relatedness, mating systems, genetic drift, mutation, recombination, and selection are major forces affecting LD42,43,44. The fact that cultivars showed higher LD than landraces, particularly in genome D, is presumably a consequence of selection for key traits throughout the history of breeding45.

A total of 10 MTAs in each of the well-watered and rain-fed environments, detected by the mrMLM, 3VmrMLM, and MLM methods, were found within coding regions at P-value < 0.001. To reduce false-positive associations, the most strongly associated markers were selected. Some MTAs discovered in this study are in line with previous reports. The GWAS identified 8 MTAs underlying seven putative QTL associated with grain perim on chromosomes 1A27, 1B27,28, 1D29, 2B27, 2D29,30,31,32, 3B27,33, and 7A28. Thus, MTAs on Ch. 4A have not been reported previously and are new for wheat seed perim. Six MTAs for area were found on Ch. 4A, 7A, 3B, 1D, 2D, and 3D. Earlier reports have detected MTAs/QTLs for area on Ch. 4A27,29, 7A28, 5B, 3B27, 1D29, and 2D30,31. Therefore, MTAs on Ch. 3D are novel for area. Four MTAs for grain Feret were recorded on Ch. 1A, 1B, 1D, and 4A in this study. Earlier research efforts have discovered MTAs/QTLs for Feret on wheat Ch. 1A28, 1B28, 1D29, and 4A27. For seed breadth, two MTAs were revealed on Ch. 4A and 7A. Previous research showed that this trait is linked with genomic regions on Ch. 2B, 4A29,33,34, 4B, 6A, and 7A28. The GWAS identified 4 MTAs underlying seven putative QTL associated with grain compactness and solidity on chromosomes 1A27, 2A27, 4A29,33,34, and 7A28. For TKW, we detected QTLs on chromosomes 1A, 2A, 2B, 2D, 3A, 4A, 5A, 5B, and 7A under well-irrigated conditions. These observations agree with previously determined QTLs for TKW46. For rain-fed conditions, we also detected QTLs on chromosomes 1A, 1B, 2B, 2D, 3B, 4A, 5B, 6A, and 6B for TKW. These results are in agreement with the report by Ain et al.47 for TKW. Moreover, Gao et al.48 mapped a TKW QTL, namely QTKW.caas-7AL, in various conditions using an F8 population of Chinese spring wheat. Yan et al.22 revealed by GWAS that the TaGW8 gene is associated with seed size in wheat.
Breseghello and Sorrells35 revealed a QTL on chromosome 5B that affects seed length, with a moderate impact on seed size, under normal and stress conditions. They also reported QTLs for seed sphericity on 2D and 5B, QTLs for surface on 1B, 2B, and 4A, and QTLs for volume on 1B, 4A, 5B, and 7B in wheat. Ma et al.49 located the TaCYP78A3 gene, encoding cytochrome P450 CYP78A3, on 7DS, 7BS, and 7AS, and related it to wheat seed shape and size. The authors demonstrated that silencing the TaCYP78A3 gene could reduce the seed shape and size. Earlier reports have detected MTAs/QTLs for seed traits on Ch. 7D50, 7B51, 5B52, 3B50, 3A52,53, 2D54, 2B50,51,54, 2A50, and 1A51,52,53,54. Therefore, MTAs on Ch. 5A, 1B, 6B, and 1D are novel for seed traits.

In the present study, the flanking sequences of imputed SNPs were identified and aligned against RefSeq v2.055. The results indicated that most of the detected genes are responsible for key biosynthetic pathways. More specifically, the proteins encoded by these genes are responsible for metal ion binding, peroxidase activity, ATP-binding, DNA-binding, protein kinase activity, enzyme inhibitor activity, etc. Such marker-trait associations have also been uncovered in previous reports56,57. These genes are found in genomic regions that exhibit strong associations with key seed characteristics, suggesting that they can be regarded as favorable target genes for future breeding programs.

Analysis of RNA sequencing revealed genetic variations among genotypes as well as drought-responsive genes26. Our goal was to identify wheat genes that respond consistently to drought under dry, long-term conditions. Interestingly, we found a significantly higher number of genotype-specific DEGs in the susceptible genotype than in the tolerant genotypes under normal and stress environments, which is consistent with previous findings by Mia et al.58 and Fracasso et al.59, who both found similar expression pattern changes in susceptible materials.

From the gene network, several pathways were discovered in this study. Synthesis and elongation of fatty acids are also useful in the response to drought in oats60. Protein phosphorylation plays a key role in the wheat response to drought conditions61. Peptidase activity, DNA repair, DNA-binding transcription factor activity, and transmembrane transport were possibly responsible for drought tolerance26. Wheat avoids oxidative stress and maintains cellular functions under drought through non-enzymatic antioxidants (ascorbate, etc.) and ROS-scavenging enzymes (SOD, CAT, etc.)62. The role of ubiquitination in the metabolic pathways of tea in response to drought has also been demonstrated by Xie et al.63. Such essential roles for the biosynthesis of secondary metabolites have also been reported64. A metabolic pathway that is associated with drought stress tolerance involves genes such as ABA-responsive element-binding factor, sucrose synthase, and sucrose-phosphate synthase in the metabolism of ascorbate and aldarate65. Drought stimulates energy-intensive processes such as osmolyte production and oxidative phosphorylation, and increases respiratory rates66. Proline is an amino acid produced by the amino acid biosynthesis pathway. Proline has been linked to a number of osmoprotective properties, such as the ability to regulate humidity and activate genes that produce antioxidant enzymes that scavenge reactive oxygen species (ROS)67,68. In drought-tolerant genotypes, proline levels increased faster and by a greater proportion than in their sensitive counterparts, emphasizing its importance for drought tolerance breeding. Proline-controlling genes have cumulative effects on proline content69,70. These findings are similar to the previous report21. Oxidative damage is induced by the production of reactive oxygen species (ROS), including the hydroxyl radical (OH•), superoxide (O2•−), and hydrogen peroxide (H2O2)67. At high concentrations, these ROS are detrimental and degrade photosynthetic pigments, proteins, etc.
In the context of osmotic tolerance, crops produce the osmolyte proline to adjust water status69,70. Crops also achieve tissue tolerance by using the scavenging system to alleviate the effects of ROS. The first enzyme committed to removing ROS is superoxide dismutase (SOD), which dismutates superoxide (O2•−) to H2O2. H2O2, in turn, is converted by peroxidase (POD) and catalase (CAT) to O2 and H2O 67,68. Expression of wheat-derived CAT and SOD in Arabidopsis can enhance tolerance to multiple abiotic stresses, such as severe drought62. APX, GPX, and PPO enzymes are other key components of enzymatic scavenging systems in crops68.

Conclusion

In the present study, new QTLs were uncovered in the panel of Iranian wheat landraces using multi-environment phenotypic data, i.e., rain-fed (drought) and well-watered (normal). Data from multi-environment, multi-year phenotypic experiments can reveal QTL that are stable across environments. Major QTLs controlling seed traits were uncovered on genome B (1B and 3B) and on chromosomes 1A and 4A. QTL for grain shape traits were identified in chromosome regions in which major QTL and/or genes were detected in previous studies. Digital image analysis is a non-invasive and inexpensive alternative for trait evaluation.

Methods

Plant materials and experimental conditions

A total of 208 wheat landraces and 90 cultivars (Supplementary Table 5) were evaluated in an alpha-lattice design with two replicates during two crop seasons (2018–2019 and 2019–2020) under rain-fed (drought) and well-watered (normal) conditions. In the field, the plots consisted of four rows (1 × 1 m2) at 0.5 m intervals. Irrigation of the well-watered crops was triggered at 40 mm of evaporation from an evaporation pan. The crop coefficient [KC] and reference crop evapotranspiration [ET0 = Epan × Kpan; where Epan is the evaporation depth from the pan surface (40 mm) and Kpan is a pan coefficient (0.8) for each month] were used to calculate crop evapotranspiration (ETC = KC × ET0). The irrigation time was determined as the ratio of the water assigned to 1400 m2 (the cultivation area of 298 genotypes in two replicates) to the water discharge (10.8 m3/h). The volume of water needed per hectare (m3/ha) was determined by multiplying the depth of ET0 (mm) by 10. The wheat cultivated under the rain-fed regime received only rainfall, the only available water source. The pattern of monthly rainfall for the cropping seasons is presented in Table 4. The authors declare that the study complies with relevant institutional, national, and international guidelines and legislation for plant ethics. Samples were provided by the Gene Bank of the Agronomy and Plant Breeding Group, and these samples are available at USDA under the USDA PI numbers (Supplementary Table 5). The authors declare that all permissions or licenses were obtained to collect the wheat plants.
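The irrigation arithmetic described above can be traced with a minimal sketch. The pan evaporation (40 mm), pan coefficient (0.8), cultivated area (1400 m2), and discharge (10.8 m3/h) are taken from the text; the crop coefficient value of 1.0 is a placeholder assumption because KC is not reported here, and the function name is hypothetical.

```python
def irrigation_schedule(e_pan_mm, k_pan, k_c, area_m2, discharge_m3_per_h):
    """Return ET0, ETC, water volume, and pumping time for one irrigation."""
    et0_mm = e_pan_mm * k_pan         # ET0 = Epan x Kpan
    etc_mm = k_c * et0_mm             # ETC = KC x ET0
    volume_m3_per_ha = et0_mm * 10.0  # 1 mm of water over 1 ha equals 10 m3
    volume_m3 = volume_m3_per_ha * area_m2 / 10_000.0  # scale to the plot area
    hours = volume_m3 / discharge_m3_per_h
    return {
        "et0_mm": et0_mm,
        "etc_mm": etc_mm,
        "volume_m3_per_ha": volume_m3_per_ha,
        "volume_m3": volume_m3,
        "hours": hours,
    }


# k_c = 1.0 is an assumed placeholder; the other inputs come from the text.
plan = irrigation_schedule(e_pan_mm=40.0, k_pan=0.8, k_c=1.0,
                           area_m2=1400.0, discharge_m3_per_h=10.8)
for name, value in plan.items():
    print(f"{name}: {value:.2f}")
```

With these inputs, ET0 is 32 mm, which corresponds to 320 m3/ha, or 44.8 m3 for the 1400 m2 trial area and a little over four hours of pumping at the stated discharge.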

Table 4 Climatic data in the studied environments and pattern of monthly precipitation and irrigation for the 2018–2019 and 2019–2020 cropping seasons.

Digital image analysis

The digital images of wheat seeds were captured with a camera (Canon SX540 HS) at 800 dpi resolution. After imaging, the pictures were analyzed and processed via the Python 3.7 software6,71,72 to evaluate a total of 34 morphometric variables in bread wheat seeds (Table 5; Fig. 8).
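As an illustration of how size traits can be derived from a segmented seed image, the sketch below takes a binary mask (1 = seed pixel) and returns area, length, breadth, and aspect ratio in metric units. It is not the authors' pipeline: the trait definitions in Table 5 may differ, the bounding-box length is only a rough stand-in for the Feret diameter, and the pixel size assumes that 800 dpi describes the image scale. All names are hypothetical.

```python
MM_PER_PIXEL = 25.4 / 800  # assumed scale: 800 pixels per inch


def seed_traits(mask):
    """Simple size traits from a binary mask given as a list of 0/1 rows."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask contains no seed pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height_px = max(rows) - min(rows) + 1
    width_px = max(cols) - min(cols) + 1
    length_mm = max(height_px, width_px) * MM_PER_PIXEL   # longer box side
    breadth_mm = min(height_px, width_px) * MM_PER_PIXEL  # shorter box side
    return {
        "area_mm2": len(pixels) * MM_PER_PIXEL ** 2,  # pixel count x pixel area
        "length_mm": length_mm,
        "breadth_mm": breadth_mm,
        "aspect_ratio": length_mm / breadth_mm,
    }


# Toy example: a 4 x 10 pixel rectangle standing in for a segmented seed.
toy_mask = [[1] * 10 for _ in range(4)]
traits = seed_traits(toy_mask)
print(traits)
```

A real workflow would first segment each seed from the background and would usually compute the perimeter and the true maximum caliper distance from the seed contour.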

Table 5 Morphometric traits measured on wheat seeds.
Figure 8

Graphical presentations of morphometric traits measured on wheat seeds (Refer to Table 5). (A) dorsal, (B) lateral, (C) vertical.

GBS and imputation

The construction and sequencing of the sequence library for the wheat accessions were carried out following the procedure described by Alipour et al.73. After trimming reads to 64 bp and grouping them into tags, single-nucleotide polymorphisms (SNPs) were discovered via internal alignments, which permitted mismatches of up to 3 bp. The UNEAK GBS pipeline was used for SNP calling, where SNPs with a minor allele frequency < 1% and reads with a low quality score (< 15) were discarded to avoid false-positive markers arising from sequencing errors. The imputation was performed in BEAGLE version 3.3.2 according to the available allele frequencies calculated after accounting for the haplotype phase74. The W7984 reference genome, which gave the highest imputation accuracy75 among four reference genomes, was used during imputation. The linkage disequilibrium decay of the chromosomes was estimated by LOESS regression using the ggplot2 package76 in RStudio.
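The minor allele frequency filter mentioned above can be sketched as follows. This is an illustration only, not the UNEAK implementation: genotypes are assumed to be coded as alternative-allele dosages (0, 1, 2) with None for missing calls, and the function and marker names are hypothetical.

```python
def minor_allele_frequency(dosages):
    """MAF of one SNP from 0/1/2 alt-allele dosages; None marks missing calls."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per accession
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(genotypes, min_maf=0.01):
    """Keep markers whose MAF is at least min_maf (1% in the text)."""
    return {marker: calls for marker, calls in genotypes.items()
            if minor_allele_frequency(calls) >= min_maf}


# Hypothetical toy panel of 298 accessions and two markers.
genotypes = {
    "snp_rare": [0] * 297 + [2],         # MAF = 2 / 596, below 1%
    "snp_common": [0] * 200 + [2] * 98,  # MAF = 196 / 596, about 33%
}
kept = filter_by_maf(genotypes)
print(sorted(kept))
```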

Population structure and Kinship matrix

Population structure was assessed in the Iranian wheat landraces and cultivars using STRUCTURE version 2.3.437. A simulation phase of 10,000 steps for K = 1 to 10 was run under an admixture model. ΔK was used to estimate the most likely number of subpopulations. To measure LD among markers, the expected and observed allele frequencies were used in TASSEL version 577. The Q-matrix was used as the structure matrix for the association study. A neighbor-joining tree was built from a pairwise distance matrix computed in TASSEL77 and visualized using Archaeopteryx to explore the relationships between the Iranian wheat landraces and cultivars.
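The ΔK criterion is usually computed with the method of Evanno et al. (2005) from the log-likelihoods of replicate STRUCTURE runs. The sketch below uses a simplified form based on the mean ln P(D) per K, divides the absolute second difference by the standard deviation across runs, and picks the K with the largest value. The number of replicate runs and the likelihood values are hypothetical; the text does not report them.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Delta K per K from a dict mapping K to replicate ln P(D) values."""
    avg = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in avg and k + 1 in avg:  # needs both neighbouring K values
            sd = stdev(lnp_runs[k])
            if sd > 0:
                second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
                result[k] = second_diff / sd
    return result


# Hypothetical ln P(D) values from two replicate runs per K.
lnp_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
    5: [-685.0, -687.0],
}
dk = delta_k(lnp_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy data the likelihood rises steeply up to K = 3 and then levels off, so ΔK peaks at K = 3.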

Genome-wide association study

MLM78, mrMLM79,80, and 3VmrMLM81 approaches were used to estimate marker effects. IIIVmrMLM82 was used to identify QTNs and QEIs in this study. The first approach led to the most accurate marker-trait association. The K, Q, and Q + K versions of the MLM approach were used via TASSEL to account for both population structure (Q) and more diffuse relationships (K) among accessions. The association mapping for the MLM, mrMLM, and 3VmrMLM models was performed using the GAPIT and IIIVmrMLM packages in RStudio. In the MLM approach, accessions are regarded as a random effect and the relatedness among them is conveyed by a kinship matrix. The elements of this matrix were used as similarities, and the resulting clusters were visualized using a UPGMA-based heatmap via the GAPIT package. Manhattan plots were drawn with the GAPIT package to explore the association between genotype and phenotype, with SNPs ordered according to their chromosomes and base-pair positions. In the Manhattan plot, the y-axis represents the negative logarithm of the P-value derived from the F-test and the x-axis represents the SNP genomic position.
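A common way to write the Q + K mixed model described above is shown below. This is the standard textbook form and not necessarily the exact parameterization used by TASSEL or GAPIT; all symbols are introduced here for illustration.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Q}\mathbf{v} + \mathbf{s}\,\alpha
           + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0},\ \sigma_g^{2}\mathbf{K}\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\ \sigma_e^{2}\mathbf{I}\right)
```

Here y is the vector of adjusted trait means, Xβ collects fixed effects other than the marker, Q is the population-structure matrix with effects v, s is the genotype vector of the tested SNP with effect α, u is the random polygenic effect of the accessions with covariance proportional to the kinship matrix K, Z maps accessions to observations, and e is the residual. Dropping Qv gives the K model, and dropping Zu gives the Q model.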

Annotation of genes

Sequences around all significantly associated SNPs were obtained from the 90 K SNP database of wheat. These sequences were used for gene annotation by aligning them to the IWGSC RefSeq V2.0 (URGI-INRA) using the Gramene database (http://www.gramene.org/). The functions of putative genes were inferred by evaluating the pathways that include the encoded enzymes. After aligning the SNP sequences to the reference, overlapping genes with the highest identity percentages and BLAST scores were selected for further analysis. The Ensembl-Gramene database was used to extract the gene ontology molecular functions and biological processes of the genes. Moreover, the sequences of significant SNPs were used in the gene ontology enrichment analysis via KOBAS version 2.0 to test for statistically enriched pathways in the KEGG database (https://www.genome.jp/kegg/; www.kegg.jp/kegg/kegg1.html).

Identification of candidate genes via BLASTn

Identification of gene IDs was based on sequences of genes associated with seed traits (https://plants.ensembl.org/index.html). The whole CDS sequences of candidate genes were analyzed using BLASTn (nucleotide Basic Local Alignment Search Tool) from the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/). In this alignment, default settings were used to align these sequences to Triticum aestivum. An ideal match or high similarity was declared when the whole query was covered, the Expect (E) value was zero, and the identity was greater than 99%. In addition, RNA-seq data from Rahimi et al.26, which were based on the tolerant and sensitive genotypes selected from this field experiment, were used to identify the genes involved in the drought stress pathway.
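The match criteria stated above (full query coverage, an E value of zero, and identity above 99%) can be expressed as a small filter. The sketch assumes that BLASTn hits have already been parsed into dictionaries; the field names and gene labels are hypothetical and do not correspond to a specific BLAST output format.

```python
def is_ideal_match(hit, min_identity=99.0):
    """True when a hit meets the coverage, E-value, and identity criteria."""
    return (hit["query_cover"] == 100.0
            and hit["evalue"] == 0.0
            and hit["identity"] > min_identity)


# Hypothetical parsed hits for three candidate genes.
hits = [
    {"gene": "candidate_1", "query_cover": 100.0, "evalue": 0.0, "identity": 99.8},
    {"gene": "candidate_2", "query_cover": 97.0, "evalue": 0.0, "identity": 99.9},
    {"gene": "candidate_3", "query_cover": 100.0, "evalue": 1e-50, "identity": 99.5},
]
ideal = [hit["gene"] for hit in hits if is_ideal_match(hit)]
print(ideal)
```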

Statistical analysis

The descriptive statistics, analysis of variance (ANOVA), and correlation of seed imaging data were computed in SAS version 9.4 and RStudio separately for the two conditions, rain-fed (drought) and well-watered (normal). For advanced linear analysis, the adjusted means were derived from the alpha-lattice experiment using GLM and MLM models. Correlation and box-plot analyses were carried out in RStudio using the corrplot and ggpubr packages to assess the relationships and distributions of wheat seed morphometric traits.
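The trait correlations were computed in R with corrplot; purely for illustration, the Pearson coefficient behind such a correlation matrix can be sketched in a few lines of Python. The function name and the toy values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy values for two traits measured on the same accessions.
trait_a = [1.0, 2.0, 3.0]
trait_b = [1.0, 3.0, 2.0]
r_ab = pearson_r(trait_a, trait_b)
print(r_ab)
```

Applying the function to every pair of adjusted trait means, separately for each environment, yields matrices of the kind shown in Fig. 2.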

Permission for land study

The authors declare that all land experiments and studies were carried out according to authorized rules.