Genomic regions associated with heat stress tolerance in tropical maize (Zea mays L.)

With progressive climate change and the associated increase in mean temperature, heat stress tolerance has emerged as one of the key traits in the product profile of the maize breeding pipeline for lowland tropics. The present study aims to identify the genomic regions associated with heat stress tolerance in tropical maize. An association mapping panel, called the heat tolerant association mapping (HTAM) panel, was constituted by involving a total of 543 tropical maize inbred lines from diverse genetic backgrounds, test-crossed and phenotyped across nine locations in South Asia under natural heat stress. The panel was genotyped using a genotyping-by-sequencing (GBS) platform. Considering the large variations in vapor pressure deficit (VPD) at high temperature (Tmax) across different phenotyping locations, genome-wide association study (GWAS) was conducted separately for each location. The individual location GWAS identified a total of 269 novel significant single nucleotide polymorphisms (SNPs) for grain yield under heat stress at a p value of < 10–5. A total of 175 SNPs were found in 140 unique gene models implicated in various biological pathway responses to different abiotic stresses. Haplotype trend regression (HTR) analysis of the significant SNPs identified 26 haplotype blocks and 96 single SNP variants significant across one to five locations. The genomic regions identified based on GWAS and HTR analysis considering genomic region x environment interactions are useful for breeding efforts aimed at developing heat stress resilient maize cultivars for current and future climatic conditions through marker-assisted introgression into elite genetic backgrounds and/or genome-wide selection.

In Asia region maize has emerged as the third most important crop after rice and wheat, with average annual growth rate of 3.1% in maize area during 1993-2013 1 . Though the growth in maize area resulted in 2.4-5.6% increases in production, maize productivity in this region remain low with high inter-annual variability. Climate change effects accounted for over 50% of the total year-to-year variation in maize yields 2 . In Asia region, maize is largely (~ 80%) grown as rainfed crop and therefore is highly exposed to weather extremes, including intermittent drought and heat stress. Thus, there is a need for fast-tracking the development and deployment of stress-resilient maize to cope-up with extremes weather events. Due to exponential increase in maize demand spring maize has emerged as third cropping season where the crop is grown during the hot-summer period of the year (Feb-May). Maize crop grown during spring season is invariably exposed to high temperature regimes (> 35 ℃) during most part of the critical crop growth period, especially from the late vegetative stage until early grain filling. Also, during the main maize cropping season (monsoon season), which accounts for over 70% of the total maize area in the region, there is an increased frequency of drought years along with high temperature that severely affects maize yields. Climate modeling studies suggested that increased day as well as night temperatures will become more common in the tropics and that may significantly affect food production 3,4 . South Asia region has been identified as one of the hot-spot regions for climate change effects; therefore, without sufficient adaptation measures, the region is likely to suffer negative impacts of the climate change 5 . During the past 2 decades, Asian tropics have experienced frequent episodes of extreme weather events including increased day/night temperatures, apart from scattered drought/heat stress periods almost every year in one or another

Materials and methods
Association mapping panel. A set of 534 maize inbred lines representing the wide genetic background of tropical maize were included in the association mapping panel, named as heat tolerant association mapping (HTAM) panel (Supple. Table-1). The lines involved in the panel included the selected 449 lines derived from the International Maize and Wheat Improvement Center (CIMMYT), 52 lines from the Maize and Millet Research Institute (MMRI), Pakistan, 23 tropicalized temperate lines from Purdue University, USA, and 10 lines from Kaveri Seeds, Hyderabad, India. The CIMMYT lines involved in the panel were derived from various pools and populations, including several biparental pedigree populations developed by the CIMMYT-Asia maize program for abiotic stress breeding, and selected lines from pools/populations from CIMMYT-Mexico, such as La Posta Sequia-C7 (tropical late white-dent), DTPY-C9 (drought tolerant population-yellow), DTPW-C9 (drought tolerant population-white), G18 Sequia C5 (drought tolerant early yellow tropical population) and Pool16 BN Sequia-C5 (drought tolerant early white tropical population), which had reasonably good adaptation to the Asian tropics. Further details about the pools/populations from CIMMYT's lowland tropical maize program can be found elsewhere 19 . Phenotyping under managed heat stress. The HTAM panel was test-crossed with an established tester line from CIMMYT (CML451) with high general combining ability. CML451 is a high yielding line widely used by lowland tropical breeding programs in Asia; it has good local adaptation and resistance to foliar diseases common in the region but is highly susceptible to heat stress. The test-cross progenies were phenotyped at nine locations during the spring season (March to June) of 2013 and 2014. In spring 2013, the testcrosses were evaluated in four locations of India and Nepal, viz., Raichur (RA) and B'gudi (BG-1), Karnataka and Ludhiana (LU), Punjab, India, and Nepalgunj (NG), Nepal. In spring 2014, the testcross were evaluated at five locations in India, viz., Hyderabad (HY) and Baijenki (BJ) in Telangana state, at two locations in Jalandhar district (JL-1, JL-2) of Punjab and B'gudi (BG-2) in Karnataka. Locations were carefully selected based on weather data during the spring season (Feb-June) over the past 10 years, which showed that at all these locations, maximum temperature (Tmax) is usually > 35 °C after mid-March and invariably reaches close to 40 ℃ or more during the first fortnight of the month of May, and similar weather pattern was observed during the multilocation trials conducted in the year 2013 and 2014 (Supple. Table 2). The geographical coordinates of the trial locations are given in Table 1.
Trials at all locations in both years were planted during the second or third week of March, except at the Jallandhar location, where one set was planted in the last week of February and the other set in the second week of www.nature.com/scientificreports/ March 2014. Spring season planting time at all the locations was adjusted in such a way that most of the reproductive stage, including flowering, pollination, fertilization, and early grain filling, is exposed to severe heat stress. The weather parameters, viz., maximum, minimum temperature, relative humidity and vapor pressure deficit (VPD), of the trial locations were captured for the cropping period (Supple. Table 2). All the trials were planted using an alpha-lattice design with two replications in rows of 4.0 m in length with 0.75 m row-to-row spacing and 0.2 m plant-to-plant spacing. At maturity, ears were harvested separately from each plot and the grain yield (GY) was recorded on a per plot basis. Final grain yield was calculated after adjusting the moisture content at 12.5% for each plot and converting it to tons per hectare (t ha −1 ).
Phenotyping data analysis. Analyses of variance (ANOVA) for GY at individual locations were carried out using META-R (Multi-Environment Trial Analysis in R) software. Genetic (σ 2 g) and error (σ 2 e) variance components are estimated from ANOVA. Broad-sense heritability (h 2 ) for individual locations was estimated as: where h 2 is the broad-sense heritability, σ 2 g is the genetic variance, σ 2 p is the phenotypic variance estimated as σ 2 p = σ 2 g + (σ 2 e/r), where σ 2 e is the error variance and r is number of replications. Best linear unbiased estimators (BLUEs) were estimated for individual locations as: where Y bir was the adjusted phenotypic observation for the ith genotype in the bth block within the rth replication. µ denoted the mean, G i denoted the fixed effect of the i th genotype and e bir the residual error term. Descriptive statistics including mean, minimum, maximum and least significant difference (LSD) were also generated using standard procedures implemented in METAR. Best linear unbiased estimators (BLUEs) were used for GWAS.

Genome-wide association studies (GWAS).
A total of 955,690 SNPs obtained for the HTAM panel after imputation, were further filtered using the multiple selection criteria (as described above in the HTAM panel genotyping section) and used for GWAS analysis. The association between the filtered SNPs and the trait of interest (GY) was detected by employing a Mixed Linear Model (MLM) in the SNP and Variation Suits v8.6.0 software (GoldenHelix, Inc., Bozeman, MT, www. golde nhelix. com). GWAS was conducted for individual locations by MLM statistics in which the kinship relationship matrix (K) and population structure matrix (Q-based on the first 10 PCAs) were considered to correct for false positive or spurious associations related to familial relatedness. Marker-trait association (MTA) with a p value threshold of ≤ − 10 -5 was considered as significant association. Fitness of the model was determined by observing the Q-Q (quantile-quantile) plot ****(Supple. Figure 1), the plots of observed -log10 p values versus expected -log10 p values under the null hypothesis that there is no association between the marker and the phenotype. Haplotype trend regression (HTR) analysis. Haplotype trend regression takes one or more blocks of markers and estimates haplotypes for each block of markers which was then regressed on by-sample haplotype probabilities against a dependent variable. Haplotype blocks were detected based on the block-defining algorithm to minimize historical recombination 24 from significantly associated SNPs identified at each location. Haplotype frequency was estimated using 50 EM iterations, an EM convergence tolerance of 0.0001 and a frequency threshold of 0.01. In addition to the haplotype blocks detected using the algorithm, the single variant SNPs were also included for further analysis. Trend regression analysis of the haplotypes and SNPs was carried out based on a stepwise regression of the grain yield with the pre-estimated haplotypes with forward elimination. The significant haplotypes blocks were finalized based on the Bonferroni p value cutoff of ≤ 10 -2 and False Discovery Rate (FDR) p value cutoff of ≤ 10 -3 .

Results
Performance of the HTAM panel TC under heat stress. Genotypic variation for grain yield was significant at all the locations, with heritability ranging between 0.44 at RA and 0.69 at BG-2 (Table 1). Mean grain yield under heat stress ranged from 2.25 t ha −1 at NG to 6.81 t ha −1 at JA-2. Maximum grain yield ranged from 3.22 t ha −1 (NG) to 8.06 t ha −1 (JA-2), whereas minimum grain yield ranged from 1.15 t ha −1 (BJ) to 5.42 t ha −1 (JA-2) ( Table 1). Apart from the lowest grain yield, NG expressed the least variation between maximum and minimum yields in the trial (1.68 t ha −1 ) indicating that this location experienced the most severe heat stress, followed by RA, the second lowest yielding location with narrow variability (1.89 t ha −1 ). Among nine environments, BG-2 was found to be the best location in terms of expressing genotypic variability, with a 4.22 t ha −1 difference between maximum and minimum yields at the location with 3.19 t ha −1 , the mean yield of the trial. The correlation between grain yield at locations ranged from − 0.198 and maximum of 0.390 ( GWAS results. In total, 955,690 GBS SNPs were identified for the panel. Principal component analysis using genome-wide markers revealed only a moderate population structure (Fig. 1). The first two components explained 32.30% variance. Upon imposing the selection criteria of a minor allele frequency (MAF) of 0.3 and a call rate of 90%, a sub-set of SNPs was shortlisted to 43,705 SNPs, which were used for Linkage Disequilibrium (LD) decay estimation. The LD decay was 4.76 kb at r 2 of 0.2 and 13.72 kb at r 2 of 0.1 (Fig. 2). To carry out the association analysis, the total number of SNPs were filtered based on a call rate of 70% and an MAF of 0.03. The number of SNPs used for GWAS varied from 281,268 SNPs to 289,061 SNPs, as in each location the number of test crosses evaluated varied from 290 to 479 (Table 3). A total of 269 significantly associated SNPs were identified from nine locations at the p value threshold of ≤ 10 -5 (Fig. 3, Supple. Table 3). The proportion of variation explained ranged from 3.3 to 7.84%, with an average of 4.75%. The number of significantly associated SNPs for each location ranged from 44 in BG-1 to 21 in RA ( Table 3). Out of the total significant SNPs, the maximum number of SNPs were identified on chromosome 9 (50 SNPs) and the minimum number of SNPs were in chromosome 7 (12 SNPs) across locations. The p-value threshold of the significantly associated SNPs ranged from 8.08 × 10 -7 to 9.96 × 10 -5 . Highly significant SNPs were found in the HY location on chromosome 3,

Haplotype trend regression (HTR) analysis.
In the present study, significant SNPs identified in each location totaled 269 SNPs; they are specific to the location, as none of the SNPs were common across the locations. To make a consensus of the individual location analysis, haplotype trend regression analysis was carried out involving all the 269 SNPs to form the haplotype blocks and regressed against the individual location   Table 4). The proportion of variance explained by haplotype/SNP variants ranged between 3.10 and 18.24%. Among these, 13 variants were found to be significant in more than one location (Table 4), which included 5 haplotype blocks and 8 single SNP variants. Notably the SNP variant SNP3-S5_24337729 was found to be significant in 5 locations (Table 5), viz., LU, NG, BJ, BG-2 and HY followed by Haplotype Hap-9.1 located at 26 Mb on chromosome 9, which was found to be significant in three locations, viz., BG-1, LU and BJ. All the other variants occurred in at least 2 locations.

Discussion
Studies have highlighted the potential advantage of incorporating heat with or without drought tolerance into maize germplasm; this has the potential to offset predicted yield losses in current and future climatic scenarios in South Asia, compared to the current high yielding but heat stress-vulnerable maize varieties that are extensively grown in the lowland tropics 25 . Heat stress tolerance is a relatively new trait added to the product profile of the maize breeding pipeline for the lowland tropics in Asia. A limited success was achieved by breeders over the period by relying on the traditional breeding approaches of selecting tolerant lines based on phenotype, despite the known problems due to genotype x environment interaction, low heritability of grain yield under stress, etc. [26][27][28] . The integration of new molecular breeding approaches and methodologies has helped to understand the genetic basis of the plant response to stress and make informed selection decisions and introgression of heat tolerance into elite genetic backgrounds [29][30][31] . In the present study, GWAS was carried out in the HTAM panel phenotyped for grain yield under reproductive stage heat stress at nine locations in South Asia. Heat stress during flowering is known to have a negative impact on pollen viability and silk receptivity [32][33][34][35] and cause a decrease in photosynthesis 36 , which eventually affects grain yield, as evident from the mean and range of grain yield in all locations (Table 1). Grain yield at different locations were weakly correlated as none of the correlation coefficient values were ≥ 0.50 (Table 2) and therefore biologically not meaningful 37 . These weak correlations indicate strong genotype by environment interaction effects under heat stress due to different VPD regimes at Tmax at different locations 12 that resulted in significantly different phenotypic expression in terms of grain yield under heat stress. Though the GWAS of the panel was carried out with > 280 K SNPs, accuracy of the association analysis was affected by several genetic and non-genetic factors 38 . LD decay is the key factor among them. In the present HTAM panel, the LD decay was 4.76 kb at r 2 = 0.2 and 13.72 kb at r 2 = 0.1. This result is in agreement with various CIMMYT association mapping panels of tropical/sub-tropical germplasm for abiotic and biotic stress and nutritional quality traits [39][40][41] . The LD decay distance is shorter in tropical and sub-tropical lines than in temperate lines because of their diverse nature and a higher number of rare alleles 42 , which result in obtaining high mapping resolution 17,43 .
In the present study, a total of 269 significantly associated SNPs were identified at p value < 10 -5 (Fig. 3). Of 269 SNPs, 175 SNPs (i.e., 65.05% of the SNPs) were within the 140 unique gene models. Among the SNPs within the gene models, SNPs S1_283602921, S2_184967723, and S5_182091381 on chromosomes 1, 2 and 5 were in the gene model GRMZM2G065374/Zm00001d034298, and GRMZM2G137426/ Zm00001d017138 coding for bHLH47 and bHLH128 genes, respectively. In general, bHLH (basic-Helix-Loop-Helix) transcription factors (TFs) are involved in regulating various abiotic stresses like drought and salinity, along with biological processes such as plant development, flavonoid biosynthesis, flowering and photosynthesis [44][45][46] . Similarly, the SNP S5_5626614 on chromosome 5 and SNPs S1_281635911, S3_2629620, S5_191376812 on chromosomes 1, 3 and 5 were within the gene models GRMZM2G386273/Zm00001d013172, GRMZM2G028151/Zm00001d034204, GRMZM2G142179/Zm00001d039324 and GRMZM2G057386/Zm00001d017466 coding for bZIP55, EREB107, EREB184 and EREB50, respectively. The bZIP (basic lucine zipper) and EREB TFs play a critical role in myriad biological functions such as cell elongation, organ and tissue differentiation, root growth, plant senescence and light response 47,48 . Apart from these biological functions, these TFs respond to various abiotic stresses like cold, drought and salt 49 , and enhance the tolerance of plants by altering or reprogramming the various metabolic processes. It has also been reported that bHLH genes interact with bZIP and MYB drought tolerant genes 50,51 . The Table 3. Number of SNPs used for GWAS and PCA analysis and numbers of marker-trait associations (MTA) for grain yield under heat stress at individual locations. www.nature.com/scientificreports/ SNP S3_178992082 on chromosome 3 is within the gene model GRMZM2G449033/Zm00001d042843 and codes for the na1 (nana plant1) gene. Similarly, S9_122089710 on chromosome 9 and S2_1035863 on chromosome 2 are within the gene models GRMZM2G077495/Zm00001d047250 and GRMZM2G064212/Zm00001d001790 that code for PLATZ14 and ATG4 genes, respectively. Reports state that na1 is an important gene in the brassinosteroid biosynthetic pathway, which plays an important role in internode elongation and sex determination in tassel and silk primordia 52 . Similarly, in maize, the high energy consuming processes are endosperm development and grain filling in which the PLATZ protein interacts with the RNAPIII subunits to regulate the transcription of many transfer RNAs (tRNAs) and 5S ribosomal RNA (5S rRNA) 53 . Autophagy related proteins are the primary www.nature.com/scientificreports/ route for nutrient recycling in plants and become critical during nitrogen stress and under suboptimal field conditions that severely impact maize productivity. Of all the candidate genes identified in the present study, 51 SNPs/region were reported in previous GWAS and QTL studies. In the present study eight putative candidate gene regions were identified: two putative candidate gene regions for grain yield under heat stress, two for grain yield under drought stress, two for days to 50% anthesis under heat stress, one each for days to 50% anthesis under drought and combined drought and heat stress reported by Yuan et al. 17 (Supple. Table 3). In the process of improving grain yield under abiotic stress conditions, stress-adaptive secondary traits with high heritability and good correlation with grain yield under stress could improve the efficiency and success of the breeding program 54 . In the present study, a total of 13 SNPs (Supple. Table 3) were found within the QTL regions of heat stress-related secondary trials like leaf firing (11 SNPs), leaf blotching (2 SNPs) and tassel blast (1 SNP) 55 . Similarly, 16 SNPs (Supple. Table 3) were found within QTL regions for heat susceptibility index (HSI) for grain yield (15 SNPs) and HSI for leaf scorching 56 . Apart from the heat stress studies, few of the SNPs were previously reported for other abiotic stresses like drought and waterlogging stress. The SNP S10_25666798 within the candidate gene GRMZM2G037378/Zm00001d023885 was reported for its association with rooting depth under drought stress 39 , and a total of 20 SNPs were found within the QTLs reported for grain yield under waterlogging stress 57 . SNPs within various gene models, which had biological functions related to stress tolerance mechanisms, and the SNPs that were in previously reported functional and QTL regions can be further validated, re-sequenced to identify causal mutations and the most favorable alleles in order to develop simple PCR-based markers for MAS 58,59 .  www.nature.com/scientificreports/ The significant SNPs identified in the present study from 9 locations were unique, except for two SNPs, viz., S9_26830815 and S9_26830818, which were significantly associated with the trait in BG-1 and NG. Though none of the other SNPs were common across the locations, most of the SNPs between the locations were in proximity (Supple. Table 3) and considered for haplotype trend regression analysis. The haplotype analysis approach considers the natural dependence that exists between SNPs, which is relevant when considering high-density DNA sequences. Haplotypes were identified and haplotype frequencies were estimated, followed by trend regression analysis across all nine locations. This study helped to identify stable variants significantly associated with grain yield under heat stress across locations. Twenty-six haplotype blocks were identified, and 91 single SNP variants did not form part of any haplotype blocks (Supple. Table 4). Twenty-one haplotype blocks were formed with 2 SNPs, 2 blocks each were formed with 3 and 4 SNPs, and 1 block with 5 SNPs. The haplotype blocks with more than two SNPs in strong LDs are more informative than the biallelic SNPs, because of their multiallelic nature 60 . The higher allelic resolution of identified candidate regions will sustain a more accurate delineation of complex marker-trait correlations. The range and mean of phenotypic variance explained were higher than the single SNP-based GWAS. Also, 67.4% of the haplotypes in the HTR analysis had more PVEs (the range of difference was 0.07-12.22%) than the PVEs based on single SNP-based GWAS. This difference indicates the potential of the HTR to increase the PVEs explained for complex traits like grain yield 61 .
The identified haplotype blocks were found to be significant at a single location to a maximum of five locations. A set of 13 variants (5 haplotype blocks and 8 single SNP variants) were found to be significant for grain yield under heat stress in two to five locations (Table 5). Notably, the single variant SNP_3 (S5_24337729), which was significant across 5 locations, and Hap_6 (S6_164097936, S6_164097938), significant in two locations, were found to be within the intervals of two meta QTLs, MQTL5.3 and MQTL6.2, for grain yield and ASI under drought and optimal conditions, as reported by Semagn et al. 62 . These regions are important, as haplotype-based analysis is useful for fine mapping, marker-assisted selection and precise discovery of new genes responsible for heat tolerance, as well as for overcoming the biallelic limitation of single SNP-based analyses 63,64 . In the present study, the genomic regions identified based on individual location GWAS and haplotype trend regression analysis will be increasingly important in future post-genomic approaches for precise selection of environmentally resilient cultivars and to improve resilience of future cultivars to extreme weather conditions such as heat.

Conclusion
The present study identified genomic regions based on location-wise GWAS conducted for grain yield under heat stress in nine locations across South Asia, followed by HTRs that are robust and stable across locations with a range of heat stress regimes. Validation of these genomic regions in breeding populations followed by introgression of these common genomic regions into elite genetic backgrounds can be used in developing new generations of maize varieties with stable performance under a range of heat stress environments. The trait-associated markers obtained from this study, in the growing quantity of pan-genome sequence data, can help in the precise selection of resilient cultivars for heat stress. Further, the variants identified in this study for grain yield under heat stress can be used as covariates in prediction models in GS approaches for increasing prediction accuracy.