Introduction

Bread wheat (Triticum aestivum L.) is among the widely consumed food crops worldwide and is regarded as one of the most traded commodities on global markets1,2. Lodging is one of the most important limiting environmental factors for achieving the maximum yield and quality of grains in cereals, including wheat3,4,5. Lodging is when the roots of a crop are dislocated and/or their stems are irreversibly bent downward4. There are several difficulties that result from this situation, including higher drying costs, slowed harvest, reduced grain quality, and drastic yield losses of up to 85%3,5,6. The main challenge is the lack of global, regional, or local statistics on lodged areas related to various crops7. There are three key elements in determining the level of lodging and yield loss—the lodging angle, the spatial extent of lodging, and the stage of crop development (time of lodging incidence)8. By definition, CAI refers to the angles formed by stems with respect to vertical planes9. During lodging, a crop may experience a sequence of steps (i.e., lodging stages) beginning with CAI0° (a low deviation from the vertical situation) and finishing with CAI90° (crop bending close to the horizontal situation)10. Hence, CAI levels (ranging from moderate to severe) can be used to evaluate the lodging stage and canopy structure of lodged crops11,12. Agronomists and crop physiologists study lodging widely, but their efforts are usually limited to two aspects: agronomic practices (which reduce lodging-related risks) and breeding programs (which develop lodging-tolerant cultivars)11. There are four main characteristics of wheat ideotypes that demonstrate lodging resistance: larger stem diameters, wide root plates, strong root systems, and moderately short heights13.

There are many challenges associated with evaluating lodging level since there are no data associated with it, no standard scale is available to present it, and lodging distribution on farms is random, involving complex genetic-environmental interactions10,14,15. The physiology of wheat lodging is influenced by a complex genetic architecture16,17. A complex trait that is difficult to quantify in the field is lodging18. For these reasons, assessing a genetic panel for lodging tolerance is a difficult task for wheat breeders. To make further progress in the development of lodging-tolerant wheat varieties it is crucial to get a better understanding of the molecular basis of lodging tolerance-related traits by using genetic tools, like QTL (quantitative trait loci) mapping19. In this context, a handful of QTLs accounting for 2–27% of stem strength and lodging variation20,21,22 have been reported. The advent of next-generation sequencing (NGS) approaches has enabled cost-efficient genotyping-by sequencing that has been shown to be a useful tool by facilitating genetic dissection of complex traits in non-model organisms. Association mapping overcomes many of the restrictions of classic QTL mapping and can help identify minor genetic factors underlying complex traits19. QTLs identified through association mapping can be directly utilized in marker-assisted selection for improving genetic gain. To date, a few genome-wide association studies (GWAS) have been adopted to explore marker-trait associations (MTAs) and candidate genes affecting growth and development for lodging in crop plants including wheat23, oat24, bean25, canola26,27, and rice28. For example, Singh et al.23 explored lodging tolerance via GWAS and identified a key genomic region on chromosome 2A, consistent across digital and visual scores of the lodging.

Anatomically lodging resistance is directly related to plant height, all 21 chromosomes carry genes that control plant height in wheat29,30,31. Up to now, 24 reduced height (Rht) genes (Rht1–Rht24) are catalogued in wheat32,33, where Rht8 on chromosome arm 2DS has been extensively explored34,35. We could locate only two QTLs to chromosome 2DL, whereas the ones reported by Borner et al36, on chromosome 2DS could not be detected.

Genomic prediction (GP) boosts speed and effectiveness of breeding by shortening breeding cycles and improving selection accuracy as auxiliary tools for GWAS37,38. An advantage of this approach is that it provides an opportunity to select a candidate gene by genotyping it before determining its phenotype39. Genomic prediction involves training a model that is comprised of all genetic markers within a model. A validation set is used to estimate the accuracy of the model40. Several factors affect genomic accuracy, including marker set, population structure, genomic selection method, and trait genetic architecture. Research has shown that GP is highly or moderately accurate for quantitative characteristics in barley41, maize42, rice43, oat44, and wheat20,39.

To the best of our knowledge, little is known about genomic regions associated with lodging resistance in wheat. Therefore, we uncovered putative candidate genes and evaluated the genomic prediction accuracy of lodging resistance using three methods for building a genomic selection model, namely genomic best linear unbiased prediction (GBLUP), ridge regression-best linear unbiased prediction (RRBLUP), and Bayesian ridge regression (BRR).

Result

Phenotypic variation and correlation analysis

According to the analysis of variance, genotypic, and genotype × environmental effects on lodging-related traits were significant (Supplementary Table S1). Grain yield and number of nodes showed the lowest and highest phenotypic coefficient of variation (PCV) and genotypic coefficient of variation (GCV). A low heritability was observed in grain yield (41.32%), and a high heritability was observed in number of nodes (88.25%). Descriptive data on the lodging-related traits of wheat genotypes can be seen in Supplementary Fig S1 and Supplementary Table S2. Cultivars and landraces showed an average crop angle of inclination (CAI), lodging area (LA), and lodging index (LS) of 69.3 and 79.2°, 64.4 and 100%, 0.49 and 0.84, respectively. Because of this, cultivars have lower lodging rates than native landraces. Compared to native populations, cultivars had lower plant height, IL1, IL2, PL, and PeL, and larger stem diameters at nodes. Landraces showed higher DTH, DTF, and DTM while cultivars showed lower DTH, DTF, and DTM. Furthermore, cultivars showed better spike weight, spike area, and grain yield than landraces.

Pearson correlation between lodging-related traits is illustrated in Fig. 1. The lodging index was most positively correlated with LA (r = 0.96**), followed by CAI (r = 0.95**), PH (r = 0.78**), NFN (r = 0.71**), IL1 (r = 0.70**), and IL2 (r = 0.63**). The lodging index also showed the greatest negative correlation with PeD (r = − 0.48**), followed by ID1 (r = − 0.41**) and ID2 (r = − 0.40**). These findings show that the higher the LI, the lower the grain yield (r = − 0.26**). Correlation coefficients between two environments (year 1 and year 2) for the lodging-related traits in Iranian wheat cultivars and landraces are shown in Supplementary Table S3.

Figure 1
figure 1

Correlation coefficients between the lodging-related traits in Iranian wheat cultivars and landraces. LA lodged area, CAI crop angle of inclination, LS lodging score index, PH plant height, NFN number of nodes, PL peduncle length, PeL penultimate length, IL1 internode length 1, IL2 internode length 2, PD peduncle diameter, PeD penultimate diameter, ID1 internode diameter 1, ID2 internode diameter 2, DTH days to heading, DTF days to flowering, DTM days to maturity, SW spike weight, SA spike area, GY grain yield.

Genotypes were divided into four bunches based on heatmap output. The foremost lodging-resistant genotypes were found within bunch A, which had a lodging index of zero or near zero. Genotypes with lodging index between 0 and 0.15% were found within bunch B. In the remaining two groups, genotypes were observed with a high lodging index. The lodging index within bunch D, which incorporates most local populaces, was the most elevated and extended from 0.6 to 1. Traits were separated into four groups: group 1 including SA, SW, and GY; Group 2 including PD, PeD, ID1, and ID2; Group 3 including DTH, DTF, and DTM; Group 4 including LS, CAI, LA, IL1, IL2, PH, NFN, PL, and PeL (Fig. 2).

Figure 2
figure 2

Hierarchical clustering and heatmap of Iranian wheat landraces and cultivars based on the wheat lodging-related traits. LA lodged area, CAI crop angle of inclination, LS lodging score index, PH plant height, NFN number of nodes, PL peduncle length, PeL penultimate length, IL1 internode length 1, IL2 internode length 2, PD peduncle diameter, PeD penultimate diameter, ID1 internode diameter 1, ID2 internode diameter 2, DTH days to heading, DTF days to flowering, DTM days to maturity, SW spike weight, SA spike area, GY grain yield.

Marker distribution and linkage disequilibrium (LD)

We had used four different reference genomes including barley, Chinese Spring, W7984, and IWGSC RefSeq v1.0 for imputation and the results indicate that W7984 provides more accurate results than other reference genomes. Therefore, in the current study we only used the result that obtained from W7984 reference genome. An analysis of 566,439,207 unique reads was obtained by genotyping 298 Iranian bread wheat accessions. A total of 133,039 SNPs were identified after alignment and de-duplication, of which 10,938 had a MAF > 5%, heterozygosity < 10%, and missing data < 10%. The 10,938 SNPs were retained and used for imputation. Afterward, 43,525 imputed SNPs were used to conduct the association study. A, B, and D genomes were mapped with 15,951, 21,864, and 5,710 SNPs, respectively, representing 36.7%, 50.2%, and 13.1% of all SNPs. The lowest and highest number of SNPs were on 4D (270 SNPs) and 3A (4034 SNPs), respectively (Supplementary Fig. S2).

According to the LD calculation of 46,525 SNPs, 1,858,425 marker pairs (MPs) were found in the panel of cultivars, of which 37.72% showed significant linkage. A range of 0.138 (Chr. 3D) to 0.368 (Chr. 4A) LD between marker pairs was observed across the 21 chromosomes. The D genome had the lowest number of MPs (215,600, 11.60%), followed by the A genome (683,825, 36.80%) and the B genome (959,000, 51.60%) (Table S1). A similar test on wheat landraces revealed 1,867,575 MPs, which were lower than those in wheat cultivars with a mean r2 of 0.182. As expected, landraces showed a higher percentage of significant markers (847,725, 45.39%). The strongest LD was observed between marker pairs in Chr.4A (0.369), followed by Chr. 2A (0.289) (Supplementary Table S4).

Population structure and Kinship matrix

According to the analyses of population structure, there are three subpopulations with varying degrees of mixing within them. The population structure matrix also revealed the maximum value of ΔK for K = 3, showing that the Iranian wheat genotypes can be divided into three subpopulations (Fig. 3A,B). A cluster analysis of the kinship matrix showed that the SBP-I subgroup contains 135 genotypes (13 cultivars and 122 landraces), the SBP-II contains 88 genotypes (5 cultivars and 63 landraces), and the SBP-III contains 75 genotypes (72 cultivars and 3 landraces) (Fig. 3C). In the principal component analysis, PC1 and PC2, explained 14.1 and 5.8 of the genotypic variation, respectively (Fig. 3D). Three distinct subpopulations were identified based on the first three PCs, with admixed accessions falling between the three major subpopulations. Additionally, the neighbor-joining tree of all accessions clearly showed that they were clustered into three subgroups (Fig. 3E).

Figure 3
figure 3

Determination of subpopulations number in wheat genotypes based on ΔK values (A), A structure plot of the 298 wheat genotypes and landraces determined by K = 3 (B). Principle component analysis (PCA) for a total of 298 Iranian bread wheat accessions (C). Cluster analysis using Kinship matrix of imputed data for Iranian wheat accessions (D). The dendrogram of Neighbor-Joining clustering was constructed using 43,525 SNPs and 298 Iranian wheat accessions (E).

Genome-wide association analysis for lodging-related traits

Compressed mixed linear model (CMLM) method led to the identification of 465, 497 and 478 significant marker pairs for lodging-related traits (– log10 P > 3), respectively, in the year 1, year 2, and pooled (Fig. 4). There were 150, 260 and 55 MTAs in the year 1, 162, 264 and 71 MTAs in the year 2, and 154, 261 and 63 MTAs in the pooled assigned to A, B, and D genomes, respectively. The largest number of significant marker pairs was found in the B genome followed by the A genome in year 1, year 2, and pooled. The number of significant markers for LA, CAL, LS, PH, NFN, PL, PeL, IL2, IL1, PD, PeD, LD2, LD1, DTH, DTF, DTM, SW, SA, and GY traits were 30, 19, 19, 21, 32, 9, 11, 31, 20, 48, 24, 11, 6, 40, 39, 40, 15, 30, and 20, respectively, in the year 1 (Supplementary 2 and Fig. 4A). The number of significant markers for LA, CAL, LS, PH, NFN, PL, PeL, IL2, IL1, PD, PeD, LD2, LD1, DTH, DTF, DTM, SW, SA, and GY traits were 20, 15, 17, 30, 32, 21, 15, 24, 41, 41, 24, 14, 5, 47, 56, 30, 18, 27, and 20, respectively, in the year 2 (Supplementary 2 and Fig. 4B). The number of significant markers for LA, CAL, LS, PH, NFN, PL, PeL, IL2, IL1, PD, PeD, LD2, LD1, DTH, DTF, DTM, SW, SA, and GY traits were 24, 15, 19, 24, 39, 13, 12, 30, 26, 52, 24, 12, 4, 44, 39, 33, 17, 28, and 23, respectively in the pooled (Supplementary 2 and Fig. 4C). The highest number of significant marker pairs was related to PD (52 significant markers) and the least number was associated with LD1 (4 significant markers) in the pooled. In contrast to other genomes, genome B had a greater effect on lodging, which had the most significant markers for lodging (Fig. 4). Manhattan and Q-Q plots for the lodging-related traits of interest are presented in Fig. 5 and Supplementary Fig. S3.

Figure 4
figure 4

GWAS results (CMLM method) for lodging-related traits in Iranian landraces and cultivars (A = year 1, B = year 2, C = pooled). GWAS genome wide association study, CMLM compressed mixed linear model, LA lodged area, CAI crop angle of inclination, LS lodging score index, PH plant height, NFN number of nodes, PL peduncle length, PeL penultimate length, IL1 internode length 1, IL2 internode length 2, PD peduncle diameter, PeD penultimate diameter, ID1 internode diameter 1, ID2 internode diameter 2, DTH days to heading, DTF days to flowering, DTM days to maturity, SW spike weight, SA spike area, GY grain yield.

Figure 5
figure 5figure 5

Manhattan and QQ-plots of highly associated haplotypes in Iranian wheat landraces and cultivars. X axis represents wheat chromosomes: (1) 1A, (2) 1B, (3) 1D, (4) 2A, (5) 2B, (6) 2D, (7) 3A, (8) 3B, (9) 3D, (10) 4A, (11) 4B, (12) 4D, (13) 5A, (14) 5B, (15) 5D, (16) 6A, (17) 6B, (18) 6D, (19) 7A, (20) 7B, (21) 7D. LA lodged area, CAI crop angle of inclination, LS lodging score index, PH plant height.

Putative candidate genes for lodging tolerance

In-depth analysis was conducted on the markers with the highest significance (P < 0.0001) and pleiotropy. Gene ontologies based on 45 reliable MTAs indicate that candidate gene harboring these SNPs encode proteins that play a variety of roles in various biological processes, such as defense response, calcium ion transmembrane transport, carbohydrate metabolic process, chloroplast organization, nitrogen compound metabolic process, biosynthetic process, protein neddylation, protein phosphorylation, lipid metabolic process, phosphorelay signal transduction system and response to oxidative stress under lodging stress (Table 1). A total of 45 highly significant, functional MTAs were considered "reliable" MTAs for lodging-related characteristics. In choosing reliable MTAs, a high significance threshold and the molecular function were taken into account. The "major" MTAs were selected from the reliable MTAs that that had R2 > 10%. A total of 20 major MTAs were detected for lodging-related in two environments (Supplementary Table S5). The following pathways have been discovered based on the rice reference genome; starch and sucrose metabolism (Supplementary Fig. S4), zeatin biosynthesis (Supplementary Fig. S5), amino sugar and nucleotide sugar metabolism (Supplementary Fig. S6), and carbon metabolism (Supplementary Fig. S7).

Table 1 Description of expected MTAs using imputed SNPs for lodging traits in Iranian wheat accessions.

Genomic prediction for lodging-related traits

The BRR, GBLUP, and RR-BLUP approaches using whole SNPs led to the identification of the highest prediction accuracies for 1, 13, and 5 phenotypes, respectively (Fig. 6). The highest prediction accuracy was determined via the GBLUP model for DTF, DTH, DTM, GY, ID1, IL1, IL2, LA, LS, NFN, PH, PL, SA, via the RR-BLUP for RR-BLUP for CAL, ID2, PeD, PeL, SW, and via the BRR for PD traits. The best prediction accuracy was related to the three traits CAL, LA, and LS, respectively with an average of 0.627, 0.610, and 0.519, while the lowest prediction accuracy was related to trait SW with an average of 0.233 (Fig. 6).

Figure 6
figure 6

The efficacy of genomic selection (GS) on genomic prediction (GP) accuracy for 19 lodging traits in Iranian landraces and cultivars. (AS) The accuracy of GP via ridge regression-best linear unbiased prediction (RR-BLUP), genomic best linear unbiased predictions (GBLUP), and Bayesian ridge regression (BRR) genomic selection (GS) methods are demonstrated with green, red, and blue colors, respectively. The boxplots show the first, second (median) and third quartiles. The middle points indicate a mean of GP accuracies for the trait of interest. Lodged area (A), Crop angle of inclination (B), Lodging score index (C), Plant height (D), Number of nodes (E), Peduncle length (F), Penultimate length (G), Internode length 1 (H), Internode length 2 (I), Peduncle diameter (J), Penultimate diameter (K), Internode diameter 1 (L), Internode diameter 2 (M), Days to heading (N), Days to flowering (O), Days to maturity (P), Spike weight (Q), Spike area (R) and Grain Yield (S).

GBLUP, RR-BLUP, and BRR approaches using significant SNPs displayed the highest prediction accuracies for the phenotypes 6, 6, and 7, respectively (Fig. 6). The highest prediction accuracy by the GBLUP was obtained for Convexity; by the RR-BLUP for DTH, IL1, ID2, PeD, PeL, SW; as well as by the BRR for PD, DTM, IL2, LA, PH, and PL traits. The best prediction was related to the three traits CAL, LA, and LS, respectively with a mean of 0.737, 0.637, and 0.650, while the lowest accuracy was related to trait SW with a mean of 0.361 (Fig. 6).

Discussion

It appears that Iranian wheat accessions are more diverse than the cultivars, and thus have a higher lodging score index due to the higher DTH, DTF, DTM, IL, and PH, and lower stem diameter and NFN. The stem diameter explains 55% of the variance in the lodging index45, so it is viewed as an important parameter for enhancing lodging resistance due to the increased amount of lignin, cellulose, and carbohydrates soluble in water. Increasing internode diameters can lower tiller numbers per unit area and therefore grain yield46, thus the association between grain yield and stem structure needs to be further explored. According to the correlation between phenological traits and lodging, an increase in DTH, DTF, and DTM leads to further growth, which leads to lodging47. Trait correlations revealed lodging to be directly related to plant high and other stem traits48. The properties of stems and their composition contribute significantly to stem bending resistance in crops13,49. The results show the lodging index correlates more strongly with ID1 than ID2, suggesting that the first internode contributes more to lodging resistance in wheat. Considering the first internode's nearly twice the material strength of the second internode, this association makes sense50. A weakness affecting stem strength, root characteristics, or soil structure can contribute to anchorage failure and lodging susceptibility51. The research by Berry et al.52 found a decrease in lodging risk as a result of an increase in stem strength and root anchorage. Similarly, Tripathi et al.45 found lodging resistance negatively related to spike area and weight. Thus, the increase in stem diameter of wheat genotypes can reduce lodging risk, making these genotypes ideal for breeding programs aimed at improving lodging resistance.

If population structure is not appropriately accounted for in mapping studies, there is the possibility of false associations53. There are two types of kinships that cause false positives in GWAS: ancestry differences and cryptic relatedness. Cryptic relatedness occurs when some accessions of a plant are closely related; however, their shared ancestry is not disclosed to the breeder54. PCA analysis and clustering were conducted on Iranian wheat genotypes to evaluate their population structure. The panel of accessions was stratified into three groups based on the results. Such a genetic separation can be explained by selection effects in breeding programs55. The same grouping was observed on these Iranian wheat genotypes by Rabieyan et al.38. Cultivars made up one group, while landraces made up the other two groups, regardless of their geographic origins. There was no correlation between the clustering of wheat accessions and the origin or geographical distribution of these accessions. In part, this can be explained by migrations of farmers to different regions and germplasm exchanges among researchers and institutes worldwide22.

According to the results, the detected SNPs covered the wheat genome well. A higher number of SNPs were found in genome B compared to genome D. Therefore, chromosome size is directly related to SNP density56. Due to evolutionary processes, genome B contains a greater number of SNPs57,58.

From our observations, Genomes B, A, and D have the lowest LD, respectively. Breeding efforts during the period of lodging-related traits have presumably resulted in cultivars exhibiting higher LD than landraces, particularly in genome D59. Due to selection events during crop breeding, cultivars exhibit higher LD compared to landraces38. In addition to evolutionary processes, breeding schedules are likely to have influenced the differences in LD between genomes and accessions60. Liu et al.61 also noticed that LD decay distances were significantly less in native populations than in varieties of wheat in China/Pakistan. A number of factors besides selection breeding affect linkage disequilibrium in wheat and other plants, including population relatedness, genetic drift, mutation, recombination, and mating systems38.

Numerous efforts have been made so far to locate QTLs and genes affecting wheat traits in order to facilitate marker-assisted breeding for wheat62,63. In this study, MTAs were also detected, adding to the previous list of candidate genes and markers. In spite of this, aligning our results with earlier studies is challenging due to different reference genome models than IWGSC Ref.Seq, inaccurate genomic locations, or combining different markers (GBS-derived SNP versus SSR and DART)62,63,64,65. Of course, detecting MTAs on the same chromosome as previous projects increase their assurance. A total of four MTAs were revealed on each of chr 1A, 2B, 6B, and 2D in regards to plant height. There are genes on all 21 chromosomes that are involved in controlling plant height in wheat29,30,31. The reduced height (Rht) genes (Rht1–Rht24) have been classified in wheat32,33, where Rht8 on chromosome arm 2DS has been extensively studied34,35. A total of two QTLs were detected on chromosome 2DL, while the ones reported by Börner et al.36 on chromosome 2DS were not detected. There is a similar pattern of QTLs on chromosomes 1A, 2B, 6B, and 2D described by Griffiths et al.66. Plant height is increased by 2–7 cm by the major height QTLs found in this study, and when all major height QTLs are taken into account, 26 cm is gained in height. This discovery may help to explain why Iran breeders select tall plants within semi-dwarf backgrounds. There may, however, be a way to balance QTLs bringing yield and height increases by using height QTLs without effect on yield. As a result of the project, QTLs were identified for height, lodging score index, internode length, internode diameter, grain yield, flowering date, and spike weight. Five MTAs for grain yield were recorded on the chr 4A, 5A, 3B, 5B, and 3D in this study. Earlier research efforts have discovered MTAs/QTLs for grain yield on wheat chr 7B67,68,69, 7A67,69,70,71, 5B67,69,72, 3D69, 3A67,69,73,74, 2B69,73,74,75,76, and 1B69,74,75. Individual QTL for lodging score index, internode length, and internode diameter were estimated by the model of lodging. A number of MTAs for the above traits were found on the 1A, 2A, 3A, 6A, 7A, 1B, 3B, 5B, 6B, 1D, 2D, and 2D. Berry and Berry20, identified QTLs on chromosomes 1A, 3A, 6A, 5B, 6B, 2D, and 7D for lodging-related traits such as stem diameter, internode length, and lodging score index. Singh et al.23 explored lodging tolerance via GWAS and identified a key genomic region on chromosome 2A.

The flanking sequences of imputed SNPs were aligned versus the RefSeq version 2.0 findings in the recently published study77. It was interesting to observe that most of the marker pairs were located in the protein-coding regions of the genome, which are responsible for controlling transcription, defense response, oxidation–reduction process, calcium ion transmembrane transport, carbohydrate metabolic process, and sulfate transport that are likely interconnected in lodging resistance. Researchers have found similar results in earlier studies20,21,22. The following pathways have been identified based on the rice reference genome: starch and sucrose metabolism, zeatin biosynthesis, amino sugar and nucleotide sugar metabolism, and carbon metabolism. Based on various studies, carbon metabolism, and starch and sucrose are effective on the lodging resistance in plants78,79,80. There are two types of carbon assimilated structural carbohydrates and non-structural carbohydrates (NSCs). The former (such as lignin and cellulose) contribute mostly to the cell composition and densities of the stem lodging resistance, while the latter (such as starch and sucrose) contribute to metabolism and yield78. Studies on carbon assimilate metabolism in gramineous crops, especially NSCs, have been conducted in recent years. In these studies, NSCs were found to vary significantly by sowing method and planting density and were closely associated with yield and lodging78,79. A stem sheath’s NSCs are the key to stem lodging resistance, and they can continue providing assimilates while photosynthetic capacity is diminished during grain-filling, thereby reducing yield loss caused by stress80,81. Mizuno et al.82 stated that the metabolism of sugar has an effect on the amount of lodging resistance so that lodging decreased the amount of sucrose, starch, and the ratio of sucrose to total sugars in sorghum stems. The hormone zeatin (Z) significantly increases the growth of buds83. According to the studies, the application of Z significantly influenced the growth regulation of tillers and tiller’s buds. Therefore, Z application will help plants anchor well to the soil, thereby creating a more lodging-tolerant environment83,84.

Our findings suggested that genomic predictions are useful in predicting wheat genotype's performance, allowing phenotyping to be limited to a fraction of germplasm instead of the entire collection85,86,87. Furthermore, Kehel et al.88 reported that genomic selection can accurately predict key traits within wheat accessions, especially for traits with moderate to high heritability. A prediction model usually accounts for stratified populations by using the first five principal components as covariates87,88,89,90. In the Iranian wheat genotypes, there is a significant population structure with 30.5% of diversity coming from the first five eigenvalues. It was also observed in other studies that the population structure negatively affected GWAS and GP models90. Based on our observations, the GBLUP model provided the highest prediction accuracy. A study by Shabannejad et al.37 investigated classic strategies to exploit GP accuracy in wheat cultivars and landraces under normal and drought conditions. Using the GBLUP and BRR methods, they identified the highest GP accuracy. Singh et al.23 examined genome prediction models for lodging‑related traits and found high predictive accuracy (0.42) across populations and environments. The authors observed that obtaining the highest GP accuracy depends on the genetic variation, the genetic architecture of the trait, the level of LD, and the genomic selection approach. As a result, the GBLUP model can detect genetic impacts in wheat populations better than other genomic prediction models.

Conclusion

Using validated lodging measurements along with association and genomic prediction analyses, we provide evidence in support of a polygenic genetic architecture of lodging in wheat. Our findings have diverse applications in plant breeding and genetics. The results of our research provide new insight into the molecular mechanisms underlying lodging resistance traits in wheat. To develop lodging resistance wheat cultivars, marker-assisted selection can target genes controlling these traits, including LS, PH, NFN, and IL1. Moreover, genomic selection by using our putative genetic markers along with GBLUP-based genomic prediction will help to achieve the above-mentioned goal.

Methods

Plant material and phenotyping

The research was conducted in the Alborz province in the Department of Agriculture & Natural Resources Campus (35°48′59ʺN, 51°58′48ʺE, 1321 m elevation) (Fig. 7a). We conducted an alpha-lattice experiment with two replications on 298 wheat accessions (208 native landraces and 90 cultivars) to analyze lodging and related traits in wheat under normal conditions (Supplementary Table S6). The replications each contained 30 incomplete blocks containing 10 genotypes each. Wheat accessions were grown in plots with a plant density of 250 plants/m2 in four rows (1 m × 2 m) at 0.20 m intervals. The wind rose plot, and the climatic characteristics are presented in Fig. 7B,C, and Supplementary Table S7. Based on evaporation from an evaporation pan, a 40 mm threshold was determined for irrigation to be implemented in irrigated crops. The lodging and related traits of wheat accessions were measured in the pre-physiological stage. The traits measured in this study were as follows: Lodged area (LA, %), Lodging score index (LS), Crop angle of inclination (CAI, degree), Plant height (PH, cm), Internode length 1 and 2 (IL1 and IL2, cm), Penultimate length (Pel, cm), Peduncle length (PL, cm), Internode diameter 1 and 2 (ID1 and ID2, mm), Penultimate diameter (PeD, mm), Peduncle diameter (PD, mm), Number of node (NFN, n), Grain Yield (GY, g per plant), Spike weight (SW, g), Spike area (SA, cm2), Days to maturity (DTM), Days to flowering (DTF), and Days to heading (DTH).

Figure 7
figure 7

The geographical location of the study area (A), and average two-year wind rose plot (B) and climatic parameters (C).

The authors declare that all study complies with relevant institutional, national, and international guidelines and legislation for plant ethics in the “Methods” section. Samples are provided from the Gene Bank of Agronomy and Plant Breeding Group and these samples are available at USDA with USDA PI number (Supplementary Table 2), respectively. The authors declare that all that permissions or licenses were obtained to collect the wheat plant.

LA, CAI, LS

In order to determine whether the wheat plots were healthy (H) or lodged (L) in the field, the CAI was calculated for each plot using the lodged area (LA [0–100%]) and vertical angle (CAI [0–90°]) (Fig. 8A,B)91. The CAI was calculated using trigonometric calculations and a plumb bob. In this case, the string of the plumb bob was suspended from the top of the crop, and it was just possible for the plumb bob's tip to touch the soil, ensuring precise vertical height (hv) measurements. The slant height (hsl) of lodged plants was measured using a plumb bob. The CAI was then calculated from the vertical using the Eq. (1) 91:

$$\theta degree={90}^{{\text{o}}}-{Sin}^{-1}\frac{{h}_{v}}{{h}_{sl}}$$
(1)

lodged area was also analyzed visually using a quadrant methodology. Using this approach, the LA % was computed in each of the four quadrants starting from the center of the plot, and then the final LA for each plot was calculated. The Fig. 8C and D depict lodged and healthy subplots for healthy plots, each trait was measured in three subplots (0.25 m2), while for lodged plots, four to eight subplots were measured in order to account for spatial heterogeneity within each lodged patch91.

Figure 8
figure 8

Measurement of crop angle of inclination (A) and presentation of various lodging stages (B). Presentation of the plot center and the healthy/lodged subplots in the field (C). Division of the plot into four quadrants Q1, Q2, Q3, and Q4 (D). LA1, LA2, LA3, and LA4 are corresponding to the lodged area in each quadrant. In this scenario, H1 and H2 present the healthy subplots while L1 to L6 are the lodged subplots. The CAI is estimated via averaging the CAI and LA calculated in the six lodged subplots and in each quadrant, respectively.

To define lodging and healthy severity classes, a normalized lodging score index (LS [0–1]) was calculated by merging LA and CAI [Eq. (2)]. The plot was labeled as H (LS = 0.0) if no lodging was observed. In the presence of lodging, wheat plots were classed as Upright (0.0 < LS ≤ 0.05), low lodged (LL) (0.06 < LS ≤ 0.15), moderately lodged (ML) (0.16 < LS ≤ 0.30), severely lodged (SL) (0.31 < LS ≤ 0.60), and very severely lodged (VSL) (0.61 < LS ≤ 1.091.

$$LS=\frac{LA}{100}*\frac{CAL}{{90}^{{\text{o}}}}$$
(2)

Phenotypic [Eq. (3)] and genotypic [Eq. (4)] variances, phenotypic coefficient of variation [PCV, Eq. (5)], genotypic coefficient of variation [GCV, Eq. (6)], and broad sense heritability [H2, Eq. (7)] were computed using standard formulas92.

$${\text{Phenotypic variance }}(\sigma^{{2}} ph) \, = \sigma^{{2}} g \, + \sigma^{{2}} e$$
(3)
$${\text{Genotypic variance }}(\sigma^{{2}} g) \, = \, \left( {{\text{MSP }} - {\text{ MSe}}} \right) \, /{\text{ r}}$$
(4)
$${\text{PCV}} = \, \surd \sigma^{{2}} ph/{\overline{\text{X}}} \times { 1}00$$
(5)
$${\text{GCV}} = \, \surd \sigma^{{2}} {\text{g}}/{\overline{\text{X}}} \times { 1}00$$
(6)
$${\text{H}}^{{2}} = \, (\sigma^{{2}} {\text{g }}/\sigma^{{2}} {\text{ph}}) \, \times { 1}00$$
(7)

where: MSe, error mean square; MSP, mean square for phenotypes (varieties); r, number of replications; , was the population mean for a specific trait; σ2ph, phenotypic variance; σ2g, genotypic variance; σ2e, error variance.

Statistical analysis

Analysis of variance and the best linear unbiased prediction (BLUP), were estimated using METAR v2.193. The diversity of Iranian wheat accessions was evaluated and compared using advanced statistical analysis. The box plot was drawn with R 4.1 software using ggplot2, dplyr, and ggpubr packages. Also, correlation diagrams were drawn using the R packages corrplot and RcolorBrewer. Cluster analysis and heatmaps were implemented with the R 4.1 packages gplots, dendextend, and d3heatmap in order to classify wheat accession types.

Genotyping and SNP imputation

By using CTAB, wheat seedling genomic DNA was extracted and RNA contamination was removed with RNase (ribonuclease)94. A Thermo Scientific NanoDrop was used to determine DNA concentration, and a 0.8% agarose gel was used to evaluate DNA integrity. Genotyping-by-sequencing (GBS) was employed to genotype all 298 wheat samples95. A library of GBS has been developed and sequenced as described by Alipour et al58. Each sequencing read was trimmed to 64 bp and grouped into sequence tags. The SNPs were explored using BLAST, which allows for up to 3 bp of mismatch. The UNEAK pipeline in TASSEL96 was used to call SNPs. To avoid false-positive SNPs originating from sequencing errors, SNPs with a missing rate < 10% across samples, a minor allele frequency (MAF) > 5% and heterozygosity < 10% were excluded. Missing data were imputed using the LD KNNi method implemented in TASSEL96. For the SNP calling, we used the W7984 bread wheat genome as the reference genome97.

Population structure and kinship matrix

Population structure inference was performed using an admixture model implemented in the Structure software. The assumptive number of subpopulations (K) was regarded from K = 1 to K = 10 and 10,000 burn-in steps were followed by 10,000 MCMC steps98. The most likely K value was determined using the ΔK method in Structure harvester99. The kinship matrix (K) was calculated with the EMMA algorithm using the GAPIT package in R software100,101. A principal component analysis (PCA) was also performed using the Tidyverse package in R. To determine the relationship between landraces and cultivars, Archaeopteryx constructed a neighbor-joining tree based on a pairwise distance matrix101.

GWAS analysis

For each trait, BLUPs values of the years were used for genome-wide association studies. The package CMLM in R was used to detect marker-trait associations (MTAs)38. We used a significance threshold (cut-off) of − log10 (P-value) > 3.0 (P < 0.001) for identifying significant associations in the model (as reported by many authors)58,96. All SNPs which met this cut-off value was categorized as significant MTAs38. GWAS results were summarised using Manhattan plots for visualizing associations between genotypes and phenotypes using the GAPIT package102. In this plot, the x-axis and y-axis represents the genomic position of SNPs and the − log10 (P-value) obtained from the F-test, respectively. A Q-Q plot was also done to assess the distribution of p-values results obtained from the GWAS analyses68.

Gene annotation

Genome sequences surrounding all significantly associated SNPs were collected and used for gene annotation with BLAST against the IWGSC RefSeq v2.0 genome references for wheat and rice, respectively97.

To detect candidate genes affecting lodging‑related traits, regions surrounding traits-associated SNPs were blasted against the wheat genome (IWGSC RefSeq v2.0) in the ensemble genome database using the BLASTn. After alignment, genes exhibiting the highest blast score and identity percentage were selected. The biological process and molecular function of putative genes were detected from ensembl plants (http://plants.ensembl.org). To detect pathways affecting lodging‑related traits, regions surrounding traits-associated SNPs were blasted against the rice genome (IRGSP 1.0) in the ensemble genome database using the BLASTn. Moreover, the sequences of significant SNPs were utilized in the enrichment analysis of gene ontology via KOBAS version 2.0 to test for statistically enriched pathways in the database KEGG103,104,105 [https://www.genome.jp/kegg/; www.kegg.jp/kegg/kegg1.html].

Genomic prediction strategies

The GP was calculated using three different approaches: RR-BLUP106, BRR107, and GBLUP108. An analysis of all the data was conducted using the Intelligent Prediction and Association Tool (iPat)109. A validation set was created by randomly appointing 20% of genotypes to a validation set, and a training set was created by using all the residuals. We repeated this process 100 times for each prediction method. A correlation (r) between the BLUPs and GEBVs was calculated over both the training and validation sets to measure GP accuracy110,111,112.

Permission for land study

The authors declare that all land experiments and studies were carried out according to authorized rules.