Introduction

Micronutrient deficiency, also known as ‘hidden hunger’ is mainly caused by the intake of diets often dominated by food staples low in minerals and vitamins1. It affects around two billion people worldwide and causes about 45% of deaths annually of children below five years of age2. Around 155 million children suffer from stunting and 52 million are wasted particularly in Asia and Africa3. The iron (Fe) and zinc (Zn) deficiencies among the minerals caused due to reduced dietary intake are a greater risk factor for human health4,5,6 and affect about one-third of the population in developing countries7,8. The Fe deficiency is indicated by reduced haemoglobin content resulting in anaemia and affects over 24.8% of the population worldwide and about 65% of the preschool-aged children in South-East Asia and Africa9. It can lead to several life-threatening diseases such as chronic kidney and heart failure, as well as inflammatory bowel diseases10. The Zn deficiency affects 17.3% of the global population mostly in developing countries of Asia and Africa11 and is responsible for the death of over half a million children below the age of five years12. It induces a wide range of physiological problems, such as growth retardation, impaired brain development, increased vulnerability to infectious diseases, diarrhoea and pneumonia, as well as an increased risk of infant mortality, pregnancy, and childbirth complications, and a range of other chronic diseases13,14,15. The GPC along with nutritional importance also determines the processing and end-product quality of wheat. One of the most common causes of infection in humans is a lack of secondary immunity caused by protein-energy malnutrition (PEM). Marasmus (chronic wasting) or kwashiorkor (edema and anemia) are the two clinical symptoms of acute PEM in infants16. In children with chronic PEM, cognitive growth is hampered17. In developing countries, micronutrients and protein-energy malnutrition are the leading causes of death, with pregnant women and young children being the most vulnerable18.

The cereals contribute to the largest daily dietary intake in the regions where micronutrient deficiencies are most prevalent5,19. The staple grains such as wheat (Triticum aestivum) and rice (Oryza sativa) contain sub-optimal levels of micronutrients especially Fe and Zn and milling further reduces their content. Genetic biofortification of staple food crops through conventional, molecular, or transgenic methods is considered a sustainable cost-effective long-term strategy for addressing nutritional deficiency20. The global wheat production for 2020–21 was 772.6 million metric tons (MMT), which is 117 MMT higher than the 2012–13 production of 655 MMT and hence is sufficient to meet global consumption demand21. Wheat production in India hit a new high of 109.5 million tonnes in the crop year 2020–2121. Wheat is consumed by 2.5 billion people worldwide22 and staple food for 30% of the population, particularly in developing countries23. It also accounts for one-fifth of the daily caloric intake and provides more than 20% of global dietary energy24. Wheat has greater accessibility, adaptability, and increased production to fulfil consumer demand. Therefore, biofortification of wheat in developing countries is expected to effectively reduce micronutrient malnutrition.

Identification of closely linked molecular markers to the genomic regions governing complex quantitative traits like Fe, Zn, and GPC will help in the development of biofortified wheat varieties through marker-aided breeding. Currently, GWAS is the most popular approach for dissecting the genetic basis of complex traits25,26. The increased QTL resolution, allele coverage, and ability to use large sets of natural germplasm resources such as landraces, elite cultivars, and advanced breeding lines are advantages of GWAS over conventional QTL mapping based on bi-parental populations. However, GWAS has only been used in a few studies to investigate the genetics of GFeC and GZnC in wheat27,28,29,30,31,32,33.

The yield penalty and concentration effects are additional bottlenecks for wheat biofortification34. Knowledge of the genetic relation of grain micronutrients along with TKW and TW may provide insights for the improvement of micronutrient concentration without compromising grain quality and yield potential. Therefore, more studies are needed on GWAS and also on the identification of candidate genes as well as genomic regions that regulate the accumulation of grain micronutrients and protein concentration. The identified novel genomic regions may be introgressed to develop high-yielding biofortified cultivars. The present study was aimed to identify the genomic region(s) and candidate genes associated with grain Zn and Fe concentration, GPC, TW, and TKW through GWAS in a set of 184 diverse bread wheat genotypes.

Materials and methods

Planting material and conduct of experiment

A set of 184 diverse bread wheat genotypes consisting of old and new Indian elite varieties, exotic lines, landraces, synthetic hexaploid, and derived lines were used for GWAS (Supplementary Table S1). The GWAS panel was evaluated for GZnC, GFeC, GPC, TKW, and TW during 2019–20 crop season at three diverse locations viz., IARI-New Delhi (E1) (Indian Agricultural Research Institute, Research Farm, New Delhi located 29.7008° N, 76.9839° E, 228.6 m AMSL), IARI-Indore (E2) (Indian Agricultural Research Institute, Regional Station, Indore located 22.7196° N, 75.8577° E, 228.6 m AMSL) and GBPUAT-Pantnagar (E3) (Govind Ballabh Pant University of Agriculture and Technology, Research Farm, Uttarakhand, 29.0229° N, 79.4879° E, 243.8 m AMSL). Each genotype was grown in a 5 rows plot of 2.5 m each, with a row-to-row distance of 0.25 m following augmented design with repeated checks namely, HD 3086, C 306, HI 1544, and GW 322. The pests and diseases were controlled chemically, whereas weeds were controlled both manually and chemically. Plant materials were harvested after the grains reached physiological maturity and were completely dry in the field.

Phenotyping

Twenty spikes of each entry were manually harvested, threshed, and carefully cleaned by discarding broken grains and foreign material without touching to metal parts of the farm equipment and used for micronutrient analysis. GZnC and GFeC were measured with a “bench-top” non-destructive, energy-dispersive X-ray fluorescence spectrometry (ED-XRF) instrument (model X-Supreme 8000; Oxford Instruments plc, Abingdon, United Kingdom) standardized for high-throughput screening of mineral concentration of whole-grain wheat35. The GZnC and GFeC were expressed in milligrams per kilogram (mg/kg). The GPC was measured by Infra-red transmittance-based instrument Infra-tec 1125 and expressed in percentage (%). The TKW was measured by weighing a set of randomly selected 1000 grains representing the whole grain sample in the Numigral grain counter. To record the TW, a thoroughly cleaned grain sample was poured into the metallic funnel of the hectoliter weight instrument developed by the ICAR-Indian Institute of Wheat and Barley Research, Karnal. After the grain was levelled well, the outlet was opened to allow the free flow of the grain in the metallic tubular container below till it is filled. Then the shutter was slid to remove the excess grain and to level it. The grain contained in the measuring can was then weighed using an electronic balance. The TKW was expressed in grams (g) and TW as kilogram per hectoliter (kg/hl).

Phenotypic data analyses

The phenotypic data was analysed with ACBD-R (Augmented Complete Block Design with R) version 4.0 software36. The mean, coefficient of variation (CV), least significant difference (LSD), genotypic variance, and heritability were estimated. In ACBD-R v4.0, the best linear unbiased predictors (BLUPs) of each genotype were calculated for each environment and across environments along with four checks varieties (HD 3086, C 306, HI 1544, and GW 322). The calculated BLUPs were then used in the GWAS analysis. The frequency distribution graphs and correlation coefficients of the recorded traits were obtained through Past 3.01 software37.

Genotyping

The CTAB method by Murray and Thompson38 was used to extract genomic DNA from the leaves of 21-day old seedlings. The genotyping was done using the 35 K Axiom® Wheat Breeder’s Array39 by outsourcing to Imperial life sciences, India. A total of 35,153 single nucleotide polymorphism (SNP) markers were processed to obtain high-quality informative markers. The filtering was done in MS Excel and markers with minor allele frequency (MAF) less than 0.05 and greater than 0.95, missing data greater than 30%, and heterozygosity greater than 20% were removed. The remaining set of 9503 high-quality SNP markers was used in GWAS analysis.

Linkage disequilibrium (LD), population structure, and GWAS

The pairwise LD values (r2) were estimated in TASSEL version 5.2.79 and the values were plotted against genetic distance (bp) in R Studio following40. The pattern of LD decay was determined as the distance where LD values reduced to half of their maximum value.

Population structure was inferred by two independent methods: Principal Component Analysis (PCA) using GAPIT version 3.041, and by neighbour–joining (N-J) clustering method in TASSEL version 5.2.79. For cluster analysis, the distance matrix was generated to construct tree using TASSEL version 5.2.79 software by following N-J clustering method, then the tree file was exported in Newick format to construct N-J tree in iTOL version 6.5.2 (https://itol.embl.de/).

The BLUPs from the 184 genotypes were used as phenotypic data in GWAS along with corresponding genotyping data. Significant marker-trait associations (MTAs) were identified using the Fixed and random model Circulating Probability Unification (FarmCPU) model approach in GAPIT version 3.0341,42. This algorithm selects the associated markers as a cofactor to control false positives using likelihood in MLM to avoid overfitting tests markers iteratively. The suitability of the model to account for population structure was assessed using quantile–quantile (Q–Q) plots. SNPs with p ≤ 0.0001 were considered significantly associated with individual traits. The past 3.01 software was used to draw box plots to show the allelic effects of the significant MTAs of GZnC and GFeC.

In silico analysis

A 100 bp sequence was extracted from Ensemble Plants database (http://plants.ensembl.org/index.html) of the bread wheat genome (IWGSC (RefSeq v1.0)) and added on both sides of the SNP for in silico analysis. Insilico search for the putative candidate genes was then done using Basic Local Alignment Search Tool (BLAST) in the Ensemble plant database (https://plants.ensembl.org/index.html). The genes found in the overlapping region and within 1 Mb upstream and downstream of the matched regions were selected as candidate genes and their molecular functions were determined. In addition, their expression patterns were investigated using the Wheat Expression database (http://www.wheat-expression.com/) and potential links to phenotypes were determined using Knetminer tool integrated with Wheat Expression database. The role of the identified putative candidate genes in the regulation of GZnC and GFeC was also determined with the previous reports.

Results

Phenotypic evaluations

A set of 184 diverse genotypes in the GWAS panel were evaluated for nutritional and other grain quality traits in three diverse locations viz., IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and the combined data across three environments (E4). The summary statistics including mean, range, coefficient of variance, heritability, and variance estimates are presented in Table 1. The genotypic variance is significant for all the studied traits. An extensive range of variation was observed for all the traits in all the studied environments (Table 1; Fig. 1). The variation for GZnC, GFeC, GPC, TKW, and TW ranged from 17.56 mg/kg to 56.93 mg/kg, 25.47 mg/kg to 52.09 mg/kg, 8.6% to 15.81%, 24.33 g to 57.18 g and 64.48 kg/hl to 83.77 kg/hl, respectively. The heritability estimates for GZnC, GFeC, and GPC were variable and ranged between 0.5–0.88, 0.4–0.8, and 0.56–0.82, respectively, heritability for TKW and TW were greater and ranged between 0.73 and 0.92. Based on the combined BLUP values over the environments, the ten best-performing lines for the traits were selected and presented in Table 2. The landrace Navrattan and Syntheic Hexaploid Wheat (SHW) 2.38 were among the best performing genotypes for GZnC, GFeC, and GPC. The Indian variety Lokbold was found among the best performing lines for GZnC, GFeC, and TKW.

Table 1 Genetic parameters from the GWAS panel evaluated in IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).
Figure 1
figure 1

Frequency distribution of GZnC, GFeC, GPC, TKW and TW in the GWAS panel evaluated in IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).

Table 2 The highest-performing 10 lines for GFeC, GZnC, GPC, TW and TKW in the GWAS panel based on combined BLUPs across environments.

The Pearson correlation coefficients (r) estimated between the traits in each environment are presented in Table 3. The association was positive and highly significant (p < 0.01 to 0.001) among GZnC, GFeC (Fig. 2), and GPC in each environment and across environments. The GFeC was found to have a consistently significant positive correlation with TW and TKW (p < 0.05–0.01) in E1, E2, and E4, however, GZnC did not show any correlation with either TW or TKW in any of the environments. Further, the association of GPC with TW and TKW was negative and significant in E2, E3, and E4 (p < 0.05–0.01). The TKW showed a strong positive correlation with TW in all the environments (p < 0.01–0.001).

Table 3 Pair-wise correlation coefficients among the traits in GWAS panel evaluated in IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).
Figure 2
figure 2

Scatter plots showing the correlation of grain zinc (GZnC) and iron (GFeC) concentration in GWAS panel evaluated in IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).

Marker distribution, LD decay and population structure

A set of 9,503 high quality SNP markers were distributed across the genome with the highest number of markers on the B sub-genome (3646), followed by A (2979) and D (2878) sub-genomes, respectively. Chromosome-wise distribution suggests that the highest number of markers were mapped on chromosome 1B (675) followed by chromosome 2B (653) and 1D (610). Chromosomes 4D (170) and 4B (266) had the least number of markers (Table 4).

Table 4 The sub-genome-wise distribution of SNP markers in the GWAS panel.

The LD was estimated by calculating the squared correlation coefficient (r2) for all the 9503 markers. Genome-wide LD decayed with genetic distance, the LD decayed to its half at 4.71 Mb for whole genome, and 3.63 Mb for A, 5.63 Mb for B and 4.90 Mb for D sub-genomes (Fig. 3).

Figure 3
figure 3

Scatterplot showing linkage disequilibrium (LD) decay estimated by plotting (r2) against genetic distance (bp) in 184 diverse bread wheat accessions. The green line indicates the threshold point where LD dropped to 50% of its maximum value. LD decay value is at cut off point is indicated on the x-axis with the green font.

Population structure inferred by Principal Component Analysis (PCA) revealed three groups in the GWAS panel (Fig. 4a). The three groups consisted of 45 (G1), 20 (G2) and 119 (G3) genotypes respectively. The PC1, PC2 and PC3 accounted for 10.63%, 8.72% and 5.45% of the total variation respectively. The first three principal components were used as covariates in GWAS analysis to reduce the false positives. The Clustering methods (N-J tree) also revealed the three subpopulations, thus confirming the results of PCA (Fig. 4b). The G1 has most of the exotic lines, G2 constituted of some of the new Indian varieties and G3 was dominated by breeding lines. The Indian varieties were distributed in all three groups.

Figure 4
figure 4

Three-dimensional principal component analysis plot (a), Neighbor-joining tree (b) inferring the population structure.

Marker-trait associations

A total of 55 MTAs were detected; 4 for GFeC, 2 for GZnC, 23 for GPC, 15 for TKW, and 11 for TW. The details of these MTAs are summarized in Table 5 and depicted as Manhattan plots in Fig. 5. The Q-Q plots illustrating observed associations between SNPs and grain micronutrient concentrations compared to expected associations after accounting for population structure are presented in Fig. 5.

Table 5 Marker trait associations (MTAs) detected for GFeC, GZnC, GPC, TKW, and TW in IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).
Figure 5
figure 5

Manhattan and QQ plots for GFeC, GZnC, GPC, TKW and TW from IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).

MTAs for GFeC

A total of four significant MTAs were identified for GFeC in E1, E3, and E4 environments on chromosomes 2B, 3A, 3B, and 6A (Table 5; Fig. 5). The phenotypic variation (PV) explained by these SNPs ranged between 8.82 and 12.62%. A major SNP on chromosome 3A (AX-95002032) located at 637.9 Mb explained 12.62% of the PV, while another SNP on chromosome 6A (AX-94715803) located at 585.4 Mb explained 11.14% of PV, both were detected in E3. The other SNPs, AX-94761251 on 2B and AX-94850629 on 3B explained the PV of 9.26 and 8.82%, respectively. The SNP, AX-95002032 had A and C alleles with a phenotypic average of 30.68 mg/kg and 36.14 mg/kg respectively. The SNP, AX-94715803 had A and G alleles with a phenotypic average of 30.6 and 34.76 mg/kg respectively (Fig. 6).

Figure 6
figure 6

Allelic differences of the significant MTAs identified for GFeC and GZnC in IARI-New Delhi (E1), IARI-Indore (E2), GBPUAT-Pantnagar (E3), and across environments (E4).

MTAs for GZnC

Two significant MTAs were identified for GZnC in E2 and E3 environments on chromosomes 1A and 7B (Table 5; Fig. 5). One SNP (AX-94422893) was identified on chromosome 7B, located at 488.4 Mb, and explained 7.60% of the PV, while another SNP on chromosome 1A (AX-94651424) located at 544.7 Mb explained 6.37% of PV. The SNP AX-94422893 had C and T alleles with a phenotypic average of 29.03 and 28.13 mg/kg respectively. The SNP AX-94651424 had C and T alleles with a phenotypic average of 24.04 and 22.51 mg/kg respectively (Fig. 6).

MTAs for GPC

A total of 23 significant MTAs were identified for GPC across all environments and pooled mean, which were located on 13 different chromosomes viz., 1A, 1B, 1D, 2A, 2B, 2D, 3B, 3D, 4D, 5B, 6A, 7A, and 7D (Table 5; Fig. 5) and PV explained ranged from 1.91–18.19%. A major SNP detected in E1 on chromosome 4D (AX-95233137) located at 482.9 Mb explained 18.19% of the PV, while the SNPs on chromosome 1A, AX-94508066 located at 398.1 Mb, AX-95195514 located at 354.9 Mb explained 13.71% and 12.91% PV respectively. The SNP detected in E4 on chromosome 5B (AX-94838908) located at 689.2 Mb explained 13.87% of PV.

MTAs for TKW

A total of 15 significant MTAs were identified for TKW across all the environments and pooled mean. The corresponding SNPs were assigned to 12 different chromosomes, namely, 1A, 1B, 1D, 2A, 2B, 2D, 3A, 3B, 4D, 5D, 7A, and 7B (Table 5; Fig. 5) and PV explained ranged from 1.17 to 12.07%. The SNP AX-94939463 on chromosome 7A located at 731.8 Mb, was identified in the three environments namely, E1, E3, and E4, and explained the PV of 12.07, 10.13, and 8.09% respectively. This SNP had A and G alleles with a phenotypic average of 36.22 and 39.45 g, 32.86 and 37.95 g, 36.50 and 39.81 g at E1, E3, and E4 respectively. The other major SNPs identified were AX-94680946 on chromosome 2B at 156.8 Mb explained a PV of 10.83% followed by AX-94665811 on chromosome 3B at 694.3 Mb explained a PV of 9.49%. Both the SNPs were identified in E1.

MTAs for TW

A total of 11 significant MTAs were identified for TW in E2, E3, and E4 on chromosomes 1B, 3A, 4B, 5A, 5B, 5D, 6A, 6D, and 7B (Table 5; Fig. 5). The SNPs explained 1.65–13.79% of PV. A major SNP detected at E3 on chromosome 6A (AX-95197534) located at 3.8 Mb explained 13.79% of the PV.

Pleiotropic regions and stable MTAs

A stable MTA was identified for TKW on chromosome 7A in three environments i.e. E1, E3 and E4 (details are given in TKW paragraph of results section). The pleiotropic regions were identified for GPC and TKW on chromosome 1A (AX-94508066&AX-94466632), 2A (AX-95126447&AX-94691823) and 2B (AX-94882514&AX-95082269) between 398–403.1 Mb, 16–24.1 Mb and 777.8–790.8 Mb, respectively. They explained the considerable PV for GPC (13.71, 5.26, and 6.51%) and small PV for TKW (1.82, 3.71, and 1.17%).

In silico analysis

In silico analysis identified 23 candidate genes associated with important MTAs of GFeC and GZnC (Table 6). The candidate genes identified for GFeC includes TraesCS2B02G321500 (Domain of unknown function DUF3475 and Domain of unknown function DUF668), TraesCS3B02G295000 (Serine-threonine/tyrosine-protein kinase), TraesCS6A02G353900 (Zinc finger CCCH-type proteins) and TraesCS3A02G389000 (F-box-like domain superfamily) among others. The candidate genes for GZnC include TraesCS7B02G266000 (Histone deacetylase domain superfamily or Ureohydrolase domain superfamily) andTraesCS1A02G365900 (SANT/Myb domain or Homeobox-like domain superfamily) among others.

Table 6 Putative candidate genes identified for GZnC and GFeC.

Discussion

The genetics of GZnC, GFeC, GPC, TKW, and TW has been studied using GWAS in the present study. The GWAS panel exhibited significant variability for all the investigated traits. The range of variability obtained in the present study for GZnC (17.56–56.93 mg/kg) and GFeC (25.47–52.09 mg/kg) is in the range reported previously43,44,45,46,47,48,49,50,51. The heritability estimates were higher (h2 ≥ 0.60) for the studied traits, however, moderate heritability was found in some environments eg., > 0.40 and < 0.60 for GFeC in E1 and E4, GZnC and GPC in E4 (Table 1). The comparable heritability estimates were also reported in previous studies for these traits29,32,49,50,51,52,53.

The two genotypes i.e. the Navrattan (landrace) and 2.38 (SHW) were among the best performing genotypes for GZnC (36.05 and 37.72 mg/kg), GFeC (37.87 and 38.27 mg/kg) and GPC (12.32% and 12.22%) based on the combined BLUPs across environments and hence can be efficiently utilized in breeding programmes. The genotype Lokbold was identified to be another good performer for GFeC (37.76 mg/kg), GZnC (35.26 mg/kg), and TKW (49.8 g) and therefore can also be given due consideration in breeding programmes (Table 2). The performance of many old Indian varieties was found to be better for GZnC, GFeC, and GPC than recently released cultivars and vice versa was the case for TKW and TW, confirming the dilution effect, i.e. increased efforts of plant breeders in enhancing grain yield led to an unintentional increase of more starchy endosperm, thus reductions in other important quality components in modern wheat varieties54.

The significant positive correlations among GZnC, GFeC, and GPC indicate the possibility of simultaneous improvement of these traits. This finding is in line with earlier reports44,49,50,51,52. Additionally, many studies suggested a common genetic basis for these traits through GWAS and conventional QTL studies50,51,55,56. Also, GFeC and GZnC showed either positive or no correlation with TKW and TW, indicating that the grain Zn and Fe can be increased without yield penalty. However, the study also shows the negative association of TW and TKW with GPC indicating yield penalty with the increase in GPC beyond a certain level34, thus it is suggested to improve protein quality profile with the optimum level of protein quantity required for the superior-quality end product.

The population structure inferred by PCA revealed three sub-populations in the GWAS panel (Fig. 4). Similarly, NJ-based clustering divided the whole set of 184 genotypes into three distinct clusters. The genotypes were clustered in the previous studies mainly based on pedigree, geographical, and evolutionary origin29,31,32,33. In the current study, G1 was dominated by exotic lines, G2 constituted new Indian varieties and G3 by breeding lines. Most significantly, the Indian varieties were mixed up with all the three groups, thus pointing towards their broad genetic base. The results also suggest that many new Indian varieties might have been bred by introgressing genes from exotic lines.

The LD decay over genetic or physical distance in a population determines the density of marker coverage needed to perform GWAS. A faster LD decay indicates that a higher marker density is required to capture the markers close enough to the causal loci57. In the present study, the LD decayed to its half from the maximum LD at 4.71 Mb for whole genome, 3.63 Mb for A, 5.63 Mb for B and 4.90 Mb for D sub-genomes (Fig. 3). A similar LD pattern of 5.98 Mb was reported in a set of Chinese wheat landraces58. In contrast, the whole genome LD decay was faster and it was at the distance of 2 Mb in a set of CIMMYT spring bread wheat lines59. In addition, the whole genome LD decay distance of 3 Mb was reported60. In contrast to faster LD decay, the slower LD decay distance of 22 Mb and 23 Mb respectively were found in a set of hexaploid wheat collections from Kazakhstan and in Mexican bread wheat landraces61,62. The variation in the LD pattern among different GWAS populations may be due to factors like selection, mutation, admixtures, non-random mating, etc.

A total of 55 MTAs were identified, 4 for GFeC, 2 for GZnC, 23 for GPC, 15 for TKW, and 11 for TW. The four MTAs were identified for GFeC on chromosomes 2B, 3A, 3B, and 6A and explained the PV ranging between 8.82 and 12.62% (Table 5; Fig. 5). Previously, MTAs for GFeC were reported on chromosome 3A28, on chromosome 3B1,27, further QTL were reported on 2B and 6A by51,63. The major SNP on chromosome 3A (AX-95002032) located at 637.96 Mb explained 12.62% of the PV, and the putative candidate gene linked with this marker is TraesCS3A02G389000 (F-box-like domain superfamily). Interestingly, the E3 ubiquitin ligase complex containing the FBXL5 (F-box and leucine-rich repeats protein 5) protein targets iron regulatory protein (IRP2), the FBXL5 accumulating under iron- and oxygen-replete conditions and degraded upon iron depletion64,65. These observations also hint at the possible role of FBXL5 in iron sensing in plant systems. The SNP AX-94715803 on chromosome 6A located at 585.43 Mb explained 11.14% of PV. The putative candidate gene linked with this marker is TraesCS6A02G353900 (Zinc finger CCCH-type, G-patch domain). It is noteworthy that the zinc finger transcription factors, which control the functions of various genes, have a DNA binding domain that requires zinc or iron ions for its structural and functional stability and for activation66. The possible role of zinc finger protein in wheat grain zinc accumulation was also reported earlier31,50,51. The other two MTAs, AX-94761251, AX-94850629 found on chromosome 2B at 458.62 Mb and on chromosome 3B at 473.69 Mb explained a respective PV of 9.26 and 8.82%, respectively. In silico analysis revealed the putative candidate genes TraesCS2B02G321500 (Domain of unknown function DUF3475, DUF668) for AX-94761251 and TraesCS3B02G295000 (Serine-threonine/tyrosine-protein kinase) for AX-94850629. The possible role of protein kinases in grain iron and zinc accumulation in wheat was also indicated by other authors1,27,28,50,51,67. The protein kinases phosphorylate Fe and Zn proteins and are found to show greater interactions with Fe and Zn transporter proteins68, hence these are expected to have a potential role in grain micronutrients accumulation.

The two SNPs associated with GZnC found on chromosomes 1A and 7B with PV ranged from 6.35 to 7.60% (Table 5; Fig. 5). Previous studies also reported the QTLs on these chromosomes28,31,33,51. An SNP on chromosome 7B (AX-94422893) located at 488.41 Mb explained 7.60% of PV, this region encodes TraesCS7B02G266000 (Histone deacetylase domain superfamily, Ureohydrolase domain superfamily). Histone deacetylases play an important role in gene regulation. The zinc ion acts as a cofactor and regulates the catalytic function of the classical HDAC family of enzymes (Class I, II, IV)69. Another SNP on chromosome 1A (AX-94651424) located at 544.72 Mb explained 6.37% of PV and codes for TraesCS1A02G365900 (SANT/Myb domain, Homeobox-like domain superfamily). Interestingly, Arabidopsis Myb transcription factors positively regulate the biosynthesis of glucosinolates70 which in turn are involved in trade-off with Zn. This is evident from the studies on Thlaspicaerulescens, where Zn hyper accumulation decreased sinalbin (p-hydroxybenzylglucosinolate) concentration in shoots71.

Additionally, 23 MTAs identified for GPC, 15 for TKW, and 11 for TW. They explained the PV of up to 18.19%, 12.07%, and 13.79% respectively. A major SNP AX-94939463 identified for TKW on chromosome 7A at 731.88 Mb, stably found in three environments namely, E1, E3, and E4, and explained the PV of 12.07, 10.13, and 8.09% in the respective environments. This SNP was found in the region codes for TraesCS7A02G560100 (Polysaccharide biosynthesis domain). Interestingly, the polysaccharide synthesizing glycosyltransferases are the enzymes generally organized on Golgi bodies that catalyze the synthesis of more complex and highly branched polysaccharides72. Previously, two candidate genes namely, TaSus1 and TaGASR-A1 were reported on chromosome 7A73.

The positive and highly significant correlation between GFeC, GZnC and GPC (P < 0.001) in all the environments suggested the possibility of simultaneous improvement of these traits. The best performing lines like Navrattan, SHW 2.38, and Lokbold can be utilized as sources in the breeding pipeline and for developing mapping populations to discover QTLs for grain Zn and Fe concentration and protein content. Further, the promising SNPs on chromosome 1A, 7B for GZnC and 2B, 3A, 3B, and 6A for GFeC could be converted into breeder’s friendly Kompetitive Allele Specific PCR (KASP) markers to be used in marker-assisted selection (MAS) or targeted introgression to develop biofortified cultivars. The putative candidate genes identified need to be validated further to shed light on their functional role in grain Fe and Zn concentration.

Ethical approval

The genotypes listed in the study were in wheat section of Division of Genetics, ICAR-IARI, New Delhi and all imported lines have been obtained through National Bureau of Plant Genetic Resources, New Delhi following the prescribed guidelines. Also, we have all the permissions and rights to collect and use the genotypes for research purpose. The experimental research and field experiments in the present study are duly approved by the institute research council and advisory committee of NDR.