QTL mapping and genome-wide prediction of heat tolerance in multiple connected populations of temperate maize

Climate change will lead to increasing heat stress in the temperate regions of the world. The objectives of this study were the following: (I) to assess the phenotypic and genotypic diversity of traits related to heat tolerance of maize seedlings and dissect their genetic architecture by quantitative trait locus (QTL) mapping, (II) to compare the prediction ability of genome-wide prediction models using various numbers of KASP (Kompetitive Allele Specific PCR genotyping) single nucleotide polymorphisms (SNPs) and RAD (restriction site-associated DNA sequencing) SNPs, and (III) to examine the prediction ability of intra-, inter-, and mixed-pool calibrations. For the heat susceptibility index of five of the nine studied traits, we identified a total of six QTL, each explaining individually between 7 and 9% of the phenotypic variance. The prediction abilities observed for the genome-wide prediction models were high, especially for the within-population calibrations, and thus, the use of such approaches to select for heat tolerance at seedling stage is recommended. Furthermore, we have shown that for the traits examined in our study, populations created from inter-pool crosses are suitable training sets to predict populations derived from intra-pool crosses.

Maize (Zea mays L.), compared to other crop species which grow in temperate Europe, is heat tolerant due to its C4 metabolism and its tropical origin 1 . Nevertheless, temperate maize cultivars can experience substantial damages when encountering heat stress 2 . Especially during flowering and grain filling, heat stress has severe impacts on maize plants 3 . Phenotypic consequences of heat stress at adult stage on phenotypic traits of maize are, among others, a reduction of the time to flowering 4 , an increased anthesis-silking interval 5 , a reduction of photosynthetic tissue due to leaf scorching 4 , and a reduction of grain and whole plant yield 6 .
However, temperate maize is also significantly affected by heat events during seedling stage 7 . This is of particular practical importance for biogas production for which maize is the most important crop 8 in temperate Europe. One of the cultivation practices is that the planting of biogas maize is postponed until the harvest of the winter cereals in early summer. With this cropping system, sensitive maize seedlings are exposed to heat stress 9 .
In the future, heat stress is expected to become an even more critical threat to crop cultivation in temperate regions than it is today 10 as the mean temperature and the severity of heat events will rise due to climate change 11 . Therefore, breeding heat-tolerant cultivars is crucial to sustain crop production in the future 12 .
The tolerance to heat stress in maize was studied on a molecular level with a focus on natural variation by Ottaviano et al. 13 , Frova and Sari-Gorla 14 , and Reimer et al. 9 . These studies focused on isolated plant characteristics such as the cellular membrane stability, pollen germination, and root architecture. Alam et al. 5 estimated variance components for traits involved in heat tolerance in field trials under natural heat stress condition. However, we are not aware of systematic studies characterising genetic variability for heat tolerance at seedling stage on a whole plant level.
The classical approach to improve a trait by breeding is to screen genetic material in one or several environments in which the conditions are such that the phenotypic trait of interest shows heritable variation. The issue with an evaluation of heat tolerance in natural environments is that the timing and the strength of the heat event are typically unpredictable 15 . One way to circumvent this problem is to screen the genetic material of interest in an artificial environment such as greenhouses or growth chambers. The efficiency of such approaches can be increased by combining them with marker-assisted selection approaches. For many years, the markers for such approaches have been identified by quantitative trait loci (QTL) mapping or genome-wide association mapping. Although numerous QTL have been identified for maize (for review see Sehgal et al. 16 ), the impact of marker-assisted selection for improving truly quantitative traits in maize breeding is limited 17 . This is primarily attributed to the small effects of many of the detected QTL. An alternative promising approach for such traits is genome-wide prediction (GWP) because it captures not only the variance of the QTL but also all genetic variance. However, to the best of our knowledge, such an approach has not been tested previously for traits related to heat tolerance.
Several studies have shown that moderate-to-high genomic prediction accuracies can be obtained in bi-parental populations for a trait with high heritability, even by using low marker density and a relatively small training population [18][19][20] . Other studies have illustrated that genotypic characterisation using high-density genotyping platforms might improve the prediction accuracy [21][22][23] . To the best of our knowledge, only few experimental studies till now have compared GWP based on low density genotyping and genotyping by sequencing (GBS) [24][25][26] . Furthermore, the composition and size of the training and validation sets are crucial for GWP. Technow et al. 27 analysed the possibility of combining training sets across heterotic pools. However, no earlier study has examined the suitability of segregating populations derived from inter-pool crosses as training set for the prediction of the two original heterotic pools.
The objectives of this study were the following: (I) to assess the phenotypic and genotypic diversity of traits related to heat tolerance of maize seedlings and dissect their genetic architecture by quantitative trait locus (QTL) mapping, (II) to compare the prediction ability of genome-wide prediction models using various numbers of KASP (Kompetitive Allele Specific PCR genotyping) single nucleotide polymorphisms (SNPs) and RAD (restriction site-associated DNA sequencing) SNPs, and (III) to examine the prediction ability of intra-, inter-, and mixed-pool calibrations.

Methods plant material and phenotypic evaluation.
This study was based on six segregating populations derived from pairwise crosses of four Dent (S067 = D 1 , P040 = D 2 , S058 = D 3 , S070 = D 4 ) and four Flint (L012 = F 1 , L017 = F 2 , L043 = F 3 , L023 = F 4 ) maize inbred lines from the University of Hohenheim 28 . The eight inbred lines were previously characterised in detail for their heat tolerance at seedling stage 7 . The inbreds were crossed pairwise to create two Dent x Dent (DxD), two Flint x Flint (FxF), and two Dent x Flint (DxF) F 1 genotypes ( Supplementary Fig. 1). The F 1 genotypes were further self-pollinated in an ear to row manner, resulting in six segregating populations (P D D 1 2 , P D D 3 4 , P F F 1 2 , P F F 3 4 , P D F 1 1 , P D F 4 4 ) comprising between 75 and 107 F 3:4 progenies with a total of 607 genotypes.
Seeds were sown in soil (50% ED73, 50% Mini Tray (Einheitserde-und Humuswerke, Gebr. Patzer GmbH & Co. KG, Sinntal-Altengronau, Germany)) in single pots (9 cm edge length) as described previously. The experiment was replicated three times. The experimental design was a lattice design comprising 32 incomplete blocks per replication which were distributed on four tables in a walk-in growth chamber (Bronson Incubator Services B.V., Nieuwkuijk, Netherlands). The parental inbred lines were included as checks once on each table.
The plants were grown at 25 °C during a 16 h light period and at 20 °C during a 8 h dark period for a total of three weeks in the growth chamber; the relative humidity was set to 60% during this period. Photosynthetic active radiation emitted by fluorescent tubes was between 270-280 μmol m −2 s −1 in the canopy of the plants. The plants were watered every morning with an automatic irrigation system (Itec DC station Multi Program, I.T. Systems Ltd., Bazra, Israel) to avoid drought stress.
The leaf growth rate was assessed as follows: fourteen days after sowing, the length of the fourth leaf from the shoot base to the leaf tip was measured daily for a period of three days during the stage of linear growth. The slope of a linear trend line of leaf length measurements vs. time represented the leaf growth rate (LR). Twenty days after sowing, leaf greenness (SD) (SPAD-502, Minolta Corporation, Ramsey, NJ, USA) was assessed as the maximum value of four readings on the leaf blade of the latest fully developed leaf. Furthermore, leaf scorching of young leaves (SC) and leaf senescence of old leaves (SN) were recorded on a scale of 1 (weak damage) to 9 (strong damage). The length of the fourth leaf (LL), the plant height (PH) from the shoot base to the point where the youngest leaf detached from the older leaf 's sheath, and the number of leaves (NL) with visible leaf ligule were recorded per plant. 21 days after sowing, shoot dry weight (DW) and the shoot water content (WC) of the fresh material were determined. The above outlined experiment was repeated at a higher heat level, where the temperature was increased, six days after planting, to 38 °C at day and 33 °C at night to induce heat stress for a two-week period. The study was, thus, based on two experiments with different heat levels, which will be referred to hereafter as standard and heat conditions.
Genotyping. KASP SNPs. The parental inbred lines of the segregating populations were genotyped using a 50 K SNP array 29 . Out of 56,110 SNPs, 170 SNP markers were selected to genotype the individuals of the six segregating populations. For each population, between 47 and 77 markers were chosen (60 for P D D 1 2 , 47 for P D D 3 4 , 75 for P F F 1 2 , 64 for P F F 3 4 , 67 for P D F 1 1 , and 77 for P D F 4 4 ) as they were polymorphic between the two parental inbreds of each population and showed no heterozygosity in any of the parental inbreds. SNP marker selection was optimised for equal distribution across the physical map (due to the unavailability of a genetic map at that time) and the overlapping of markers between populations. The selected SNP markers were genotyped using KASP SNP technology by TraitGenetics GmbH (Gatersleben, Germany) on a bulk of 6 to 10 F 3:4 plants per genotype in the respective populations. This data set with 170 SNPs on 607 progenies will be designated hereafter as KASP 607 .
RAD SNPs. The RAD libraries of the segregating populations were prepared for single-end sequencing according to Baird et al. 30 with the following modifications: barcodes were 5 bp long, and were at least two mutational steps apart from each other with regard to the first four bases, followed by the fifth checksum base. A total of 2 μg genomic DNA of the same pool of plants that was used for KASP genotyping was digested for 30 min at 37 °C in a 50 μl reaction with 20 units of KpnI (New England Biolabs). Samples were inactivated by purification with Qiaquick spin columns (Qiagen, Hilden, Germany). Libraries were 96-fold barcoded, each genotype at two barcodes, resulting in 7 libraries. The sequencing of the RAD libraries was performed on a Hiseq2000 with 100 bp single end reads by the Max Planck-Genome-centre Cologne, Germany (https://mpgc.mpipz.mpg.de/home/), following the manufacturer's protocol.
Demultiplexing of raw sequencing data by barcode was performed using the Stacks software pipeline 31 . Parameters were chosen to allow barcode rescue with a distance of two, where reads were discarded according to default settings. In the next steps, version 0.4.2 of Trim Galore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) was used for adapter and quality trimming. In addition to the standard parameters, reads were filtered for a length of at least 75 bp, and the threshold for trimming low quality ends from reads was increased to 30 for higher stringency.
The trimmed reads were mapped genotype-wise to the repeat masked and concatenated reference sequence, using BWA-MEM. In accordance to previous studies 32 , apart from an increased sensitivity parameter of -r 1.0, standard parameters were used.
For genotype calling, GATK's HaplotypeCaller 33 was applied to each genotype's bam file independently, where the allowed maximum number of alternative alleles was set to three. The minimum base and mapping quality for calling were set to 30 and 40, respectively. Finally, soft-clipped bases were excluded from the analysis. All other parameters were set to their default values. Finally, the resulting files were combined into one and each position was recalculated and re-genotyped, considering information of all samples, by GATK's genotypeGVCFs. To ensure a high quality of the genotype calls, many filtering steps (see SM1) were applied to the resulting genotype call file.
After genotype calling, 53,579 SNP loci remained on 489 genotypes comprising 482 progenies and 7 parental inbred lines ( Supplementary Fig. 1). Missing genotype calls were imputed per population using Beagle 34 . Three cutoff values were chosen to filter per genotype the genotype calls based on the genotype probabilities (GP). For each of the three GP cutoff values, one of the three possible genotypes calls (minor allele homozygous, heterozygous, and major allele homozygous) was assigned to a genotype only when its predicted probability was greater than X for a given imputed genotype call, and all others were set to "NA". The three cutoff values were X = 1, 0.98 35 , and the allele with maximum genotype probability (MAX). The such created missing genotype calls were mean imputed. The data sets associated with the three cutoff values will be designated hereafter as RAD 482-GP:1 , RAD 482-GP:0.98 , and RAD 482-GP:MAX , respectively. As there were less genotypes in the RAD than in the KASP data set, we defined an additional KASP data set with the same 482 progenies as for the RAD data set. This will be referred to hereafter as KASP 482 .
Diversity. Gene diversity D 36 and population differentiation G ST 37 were calculated. The average modified Roger's distances (MRD) between and within populations were calculated according to Wright et al. 38 . Associations among genotypes were revealed with a principal component analysis (PCA) based on MRD estimates. phenotypic data analysis. Adjusted entry mean calculation. To assess the significance of the heat level effect, the following model was used: where Y icrb is the phenotypic observation of the i th genotype in the b th block within the r th replication nested within the c th heat level. μ is the general mean, G i is the effect of the i th genotype, C c is the effect of the c th heat level, . C G ( ) ci is the interaction between the i th genotype and the c th heat level, R cr is the effect of the r th replication nested within the c th heat level, B crb is the effect of the b th block in the r th replication nested within the c th heat level and e icrb the residual. C c was regarded as fixed and all other effects were set as random. Traits with a significant C c effect were considered heat-dependent traits. The adjusted entry means (AEM) of standard and heat conditions across genotypes were estimated and will be refered to as AEM s and AEM H , respectively.
To calculate AEM for each assessed trait of each genotype in each condition (AEM iS and AEM iH for standard and heat condition, respectively), the phenotypic observations of each heat level were separately analysed using the following model: where Y irb is the phenotypic observation of the i th genotype in the r th replication and the b th block, R r is the effect of the r th replication, B rb is the effect of the b th block nested within the r th replication, and e irb the residual. The genotype effect G i was of primary interest in this analysis and was considered a fixed effect. R r was considered fixed and B rb random.
Heat susceptibility index. A heat susceptibility index (HSI) for each trait and each genotype was calculated according to Mason et al. 39 : where HSI i is the HSI for genotype i. The HSI of the individual traits will be designated as follows: HSI LL , HSI PH , HSI NL , HSI SC , HSI SN , HSI SD , HSI DW , HSI WC , and HSI LR . Pairwise Pearson correlation coefficients were calculated between the HSI of all assessed traits across all genotypes. Further, HSI of this study were correlated with HSI assessed during adult stage under field conditions 4 .
Variance components and heritability. Genotypic (σ g 2 ) and error (σ e 2 ) variance components for each heat level were calculated using model (2) with a random G i effect. For each trait, the broad sense heritability (h 2,40 ) of the observations at each heat level was calculated separately for each (h pop 2 ) as well as across (h a 2 ) the six populations. Heritability can not be assessed for HSI directly as it is not measured but calculated on a mean basis with no replication 39 . To calculate an approximate heritability for the HSI of each trait, we applied the following procedure, which is an extension of the approach of Ouk et al. 41 . For each of the two heat levels, we estimated, based on model (2), the fixed effects for R r and B rb . In a second step, these effects were subtracted from all observations from the corresponding replicate and block to correct for differences in replicate and block effects. The adjusted data thereby comprised three replications per genotype for each of the two conditions. Subsequently, we separately created random pairs for each genotype between one replication from the heat condition and one from the standard condition. Across all genotypes, 3 2 607 combinations of replications from the heat and standard conditions are possible. In the next step, for one possible combination of replications, the HSI was calculated as described previously, resulting in three HSI replicates for each genotype. These data were then analysed using the following model: where Y ir is the phenotypic observation of the i th genotype in the r th replication, μ is the general mean, G i is the effect of the i th genotype, and e ir the residual. Based on genotypic (σ g 2 ) and error (σ e 2 ) variance components across all populations and for each population, the heritability was calculated. This procedure was repeated 100 times, and the median of the heritability was designed as heritability of HSI (h HSI 2 ). The genetic variance among and within populations was estimated for each HSI by partitioning the genotype effect of model (4) into the effect of the population and that of the genotype nested within the population. This procedure was repeated 100 times with random assignments between the different replications as explained for the h HSI 2 calculation, and the median of the genetic variance among and within populations was calculated. The genetic differentiation observed for the HSI was then calculated [42][43][44] and will be designed hereafter as Q ST .
QtL analyses with KASp Snps. SNP markers with a significant (P < 0.001) observed deviation from the expected allele frequency were excluded from the analysis (cf. Benke et al. 45 ). To improve the mapping of markers, information of five additional segregating populations which had been genotyped with the same set of SNP markers by Horn et al. 46 were considered during map creation. A consensus genetic linkage map was calculated chromosome-wise using the software CarthaGène 47 .
QTL associated with heat tolerance were detected based on the consensus genetic linkage map using an iterative composite interval mapping approach 48 , implemented in the software MCQTL 49,50 . QTL analyses were performed for the HSI of the individual traits across all six populations. We took into account connections between populations using an additivve kinship matrix. As the studied bi-parental F 3:4 populations have an expected heterozygosity of 25%, dominance effects between parental alleles of each bi-parental population were considered in the QTL analysis. Genotypic probabilities were computed every 5 centiMorgan (cM), taking into account information from neighboring markers. F thresholds to detect QTL for each trait were determined by 1 000 permutations, to adhere to a global type I error of 5% across populations and the entire genome 51 . F thresholds used to select co-factors were fixed at 90% of the F threshold values for QTL detection.
SNP markers associated with the respective HSI were selected as co-factors by forward regression, where the minimal distance between two co-factors was 10 cM. At the end of the detection process, confidence intervals were estimated on the basis of a 1.5 LOD unit fall 52 .
The dominance effect of each QTL was tested for its significance (P < 0.05) in each population using a two-sided t-test. The difference between the additive effects of pairs of parental alleles on the respective QTL was tested a posteriori using a Tukey test.
Furthermore, we overlapped the QTL detected in this study with those identified in a companion study at adult stage 4 . To identify candidate genes for heat tolerance in terms of the assessed traits, we mined genes, identified by Frey et al. 7 as heat responsive or heat tolerance genes. We determined the position of these genes on the genetic map by linear regression with information of the nearest two SNP markers. Heat tolerance genes of Frey et al. 7 mapping to QTL confidence intervals were designated in the following as heat tolerance candidate genes. GWp. Genetic model. GWP was performed by genomic best linear unbiased prediction (GBLUP 53 ): where y is the vector of the HSI of the corresponding trait, X is a design matrix assigning fixed effects to the genotypes, β is a vector of fixed effects, Z is the design matrix that assigned the random effects to the genotypes, and u the vector of the random effects that were assumed to be normally distributed with: where K denotes the realised kinship matrix calculated on the basis of the molecular marker data 54 with σ u 2 being the variance pertaining to the GBLUP model. Residuals e were assumed to be independent and normally distributed with σ ∼ e N I (0, ) 2 , where I is the identity matrix and σ 2 the residual variance. www.nature.com/scientificreports www.nature.com/scientificreports/ Two different genetic models were examined for each type of molecular marker that differed in the K matrix used: M A considering only additive effects with K A (the additive kinship matrix) and M AD considering additive and dominance effects with K A and K D (the dominance kinship matrix) 55 . GBLUP method was used as implemented in the R package Sommer 56 .
Model training and performance assessment. The effects were estimated in a training set (TS) and used in a second step to predict the breeding values of the genotypes of a validation set (VS). As measure of model performance, prediction ability was calculated as the Pearson correlation coefficient between observed and predicted phenotypes of the VS, r y y ( , ) 57 . In our study, different TS were used to establish the prediction model: in the TS pop scenario, the model was trained using one or two populations; for the TS a scenario all populations were used together as TS. For later scenario, prediction abilities were calculated both across all populations (r a ) as well as for each population (r pop ).
Resampling schemes. Different types of resampling schemes were used to evaluate the model performance as a function of the examined TS*VS combination: • For TS a , we performed 20 replications of five-fold cross validation (CV) to assess the model performance across all populations. Accordingly, the entire data set was subdivided into five mutually independent subsets where four formed the training set (TS) and one the validation set (VS). To account for the structure of our data set with six different segregating populations, the proportion of genotypes from the different segregating populations in the individual subsets was kept identical to that observed in the entire data set. The median of the Pearson correlation coefficient between observed and predicted phenotypes of the validation set r y y ( , ) VS VS was used to assess model performance. Prediction abilities were calculated both across all populations (r a,cv ) as well as for each population (r pop,cv ).
• In the context of prediction within populations with TS pop , the influence of the size of the training set on the prediction ability was analysed by simulating, for each segregating population, a TS size N from 10 to 100 in steps of 10 using a bootstrapping approach. The median of the prediction ability across 500 replications was calculated for the HSI of each trait, genetic model, and type of SNP marker within each population and is designated in the following as r pop BSP , w . • For the between population prediction using TS pop , the influence of the composition of the TS on the prediction ability was examined using three primary scenarios of TS compositions TS1, TS2, and TS3. In case of using the FxF and DxD populations as VS, TS1 comprised genotypes from populations for which the parental genotypes were from the same heterotic pool as those of the VS. TS2 comprised genotypes from one (TS2 s ) or two (TS2 c ) populations in which the parental genotypes belonged to the opposite heterotic pool as those of the VS. TS3 comprised genotypes from the DxF populations with three different scenarios: TS3 sr used the related mixed population, TS3 su used the unrelated mixed population, and TS3 c combined both mixed populations in the TS (Fig. 1, bottom).
In case of using the DxF populations as VS, the three scenarios were selected accordingly: TS1 comprised genotypes of the other DxF segregating population; TS2 comprised genotypes of the two combined FxF and DxD populations which did not have a parental inbred in common with the population in the VS. TS3 comprised genotypes of the two combined FxF and DxD populations which had a parental inbred in common with the population in the VS. To avoid differences in the prediction ability due to sample size effects, the TS were randomly reduced to a size of 50 such that the structure of the original TS was maintained. To obtain stable estimates for the prediction ability, the median of 500 independent bootstrapping runs of the TS construction was calculated for the HSI of each trait, genetic model, type of molecular marker, and TS*VS combination. The prediction abilities based on this bootstrapping strategy between populations will be referred to hereafter as r pop BSP , b . Number of molecular markers. In order to examine the influence of the number of molecular markers on the prediction ability, X random RAD SNPs were sampled from the RAD 482-GP:0.98 , where X ranged from 170 to 43 520 in steps of * X 2 . Based on the selected SNPs, genomic predictions were obtained for the HSI of each traits using the M A genetic model, and the median of the prediction abilities across 100 replications was calculated.
Comparison between observed and expected within-population prediction ability. We used the formula suggested by Daetwyler et al. 58 to estimate the expected within-population prediction ability. Following Meuwissen et al. 59 , M e was estimated as = M N L 2 e e where L is the genome size in Morgans. L = 22.36 was adopted from a previous linkage-mapping study, using an F 2 population which was characterised by genotyping by sequencing 60 similarly to our study. The effective population size (N e ) was calculated using the harmonic mean approximation for two generations (Hartl and Clark, p. 291 61 ), resulting in Ne = 2.92, 2.93, 2.94, 2.93, 2.92, and 2.92 for populations P D D 1 2 , P D D 3 4 , P F F 1 2 , P F F 3 4 , P D F 1 1 , and P D F 4 4 , respectively.
Results phenotypic diversity. The condition effect (standard vs. heat) was significant ( < . P 0 01) across all populations and for all traits. Across the two examined conditions, h a 2 was medium to high (0.50-0.83) for all assessed traits (Table 1). For all traits except NL and SD, h a 2 was higher at heat compared to standard condition. h HSI 2 ranged from 0.41 (NL) to 0.75 (SC) and was with the exception of SC, WC, and LR lower than those of the traits in the two conditions. The heritability values observed on a population level (h pop 2 ) were very low for a few trait*population*condition levels (Supplementary Table 1). This effect was the strongest for SC for which, at standard (2019) 9:14418 | https://doi.org/10.1038/s41598-019-50853-2 www.nature.com/scientificreports www.nature.com/scientificreports/ condition, a h pop 2 of 0 was observed for four of the six populations and smaller than 0.25 for the remaining two populations.
For LL, SC, DW, and LR, the mean and variation of the HSI were significantly ( < . P 0 05) higher in the two DxD populations (P D D 1 2 and P D D 3 4 ) than in the FxF and DxF populations. Especially, the population P D D 3 4 differed significantly from the other populations in the mean HSI for all traits except SD (Supplementary Fig. 2). Q ST calculated for the HSI varied from 0.07 (SN) to 0.37 (SC; Table 1).
The first two principal components (PC) of the PCA of the HSI explained 45% and 14% of the total variance, respectively ( Supplementary Fig. 3). PC1 was significantly ( < . P 0 01) correlated with each HSI where the correlation coefficient was low (<|0.25|) for HSI SD and between 0.36 and 0.85 for the other HSI ( Supplementary Fig. 4). PC2 was significantly correlated with the HSI of each trait except HSI NL and HSI WC where the correlation coefficient was low (<|0.25|) for all HSI except for HSI SC and HSI SD . The cluster of the two DxD populations overlapped only weakly with the cluster of the two FxF populations (Supplementary Fig. 3).
Genetic diversity. The consensus genetic linkage map for KASP SNPs had a total length of 1 823.5 cM with an average distance of 11.3 cM and a maximum distance of 83.2 cM between two adjacent markers.   Table 1. Mean and range of the adjusted entry mean (AEM), broad sense heritability (h a 2 ) of the studied traits for each condition, as well as for the heat susceptibility index (h HSI 2 ), significance of the environmental condition (standard vs heat) effect for each trait, and degree of genetic differentiation among the six populations (Q ST ). *, **, *** Significant at the 0.05, 0.01, and 0.001 probability level, respectively. (2019) 9:14418 | https://doi.org/10.1038/s41598-019-50853-2 www.nature.com/scientificreports www.nature.com/scientificreports/ In the PCA based on MRD estimates calculated from KASP 607 , the first and second PC explained 21.87% and 11.87% of the molecular variance, respectively ( Supplementary Fig. 5). For RAD 482-GP:0.98 , PC1 and PC2 explained 25.94 and 13.50% of the molecular variance. In the PCA based on KASP 607 , five distinct clusters were observed where one cluster was always constituted by the individuals of one segregating population except the individuals of the two FxF populations which were located in one overlapping cluster. For the RAD SNPs, the same trend was observed, but the two FxF populations (P F F 1 2 and P F F 3 4 ) were assigned to two distinct clusters. The lowest gene diversity was observed for P D D 3 4 irrespective of the considered marker type (Supplementary Table 2). The ranking between the populations for D and Gst was different when calculated based on KASP or RAD SNPs. The correlation between the MRD distance matrices calculated with KASP and RAD SNPs was with 0.84 significantly ( = . × − P 1 67 10 4 ) different from 0. The correlation between the K A matrices calculated with KASP and RAD SNPs was with 0.56 considerably lower but also significantly ( = × − P 6 10 4 ) different from 0.

QtL mapping and overlapping region with QtL detected at adult stage.
We identified a total of six QTL for the HSI of five of the nine traits (Table 2), each explaining between 7% and 9% of the phenotypic variance (R 2 ). The detected QTL were not randomly distributed across the genome, but a total of three QTL hot spots were observed. The QTL detected by Frey et al. 4 for adult traits in field trials colocated with these hotspots (Fig. 2). Five of the heat tolerance genes identified by Frey et al. 7 were located within the six QTL confidence intervals detected in our study (Supplementary Table 3).
GWp. The square root of the proportion of phenotypic variance explained by the QTL for the HSI of all traits was lower than the prediction abilities of the GWP models, irrespective of the type of molecular marker and genetic model used (Table 3). Furthermore, the correlation between R QTL and r a across the nine examined traits for KASP 607 was approximately 0.33. For the HSI of all traits, prediction abilities r a were equal or lower for KASP 482 than for KASP 607 (Table 3). r a calculated based on RAD 482 were consistently higher than those for KASP 482 , independent of the GP cutoff value which was used to filter the genotype matrices. Despite this mean difference in the prediction abilities r a of the different data sets and genetic models, the trend across the nine traits was the same. The prediction abilities r a,cv varied for KASP 607 between 0.31 (SN) and 0.70 (SC), whereas they ranged from 0.34 (SN) to 0.72 (SC) for RAD 482-GP:0.98 (Fig. 3 and Supplementary Fig. 6). We observed a higher r a,cv for RAD 482-GP:0.98 than for the two other GP cutoff values (data not shown) and, therefore, used the former data set for all further analyses. The correlation between Q ST and r a calculated for KASP 482 was with 0.82 significantly different from 0 ( = .
P 0 0068). In contrast, the correlation between Q ST and r a calculated based on RAD 482-GP:0.98 was 0.19 and not significant ( = . P 0 62). Based on KASP 482 , the prediction ability across populations r a was, apart from some exceptions, higher than the prediction ability calculated for each population r pop (Supplementary Fig. 6). The opposite trend was observed for RAD 482-GP:0.98 (Fig. 3). For both molecular marker types, the prediction ability r a,cv of the M A and M AD genetic model did not differ significantly. However, the within-population prediction ability r pop BSP , w of the individual population*trait combinations, was either similar or higher for the M A compared to the M AD model ( Supplementary Figs 7-12). Therefore, we focused in this study on the former model.
An increase in the prediction ability with an increasing number of molecular markers was observed (Fig. 4). The number of markers for which the prediction ability reached a plateau differed between the examined traits.
An increase of the within-prediction ability with an increasing size of the TS was observed for most population*trait*genetic model*molecular marker type combinations ( Supplementary Figs 7-12). With a few exceptions, which showed a linear increase, the detected increase followed a logarithmic trend line ( Supplementary  Figs 13-15). We observed within-population prediction abilities for three traits (WC, PH and SD) that were similar to or higher than the expected abilities (Fig. 5). This was not true for the other traits. Especially, NL was predicted worse than expected. Furthermore, two populations P D D 1 2 and P D F 4 4 were predicted better than expected, whereas the other four were predicted worse than expected. The within-population prediction abilities r pop BSP , w Trait QTL Chr Pos LOD Interval R 2 Additive effect of parent Dominance effect of population      www.nature.com/scientificreports www.nature.com/scientificreports/ were for 80% of the population*trait combinations higher than those found based on a model built across the six populations (r pop cv , ) when considering the same TS size of 50 (Fig. 6A). However, this was only true for 25% of the combinations when considering the original TS size of 385 genotypes (Fig. 6B). Across all traits and populations, the between-population predictions in the TS2 c scenario (TS built from two combined populations) resulted in higher prediction abilities than those of TS2 s (TS built from one population; Fig. 1, left). Furthermore, we observed that TS3 and especially the TS3 sr scenario resulted most often in the highest prediction abilities for FxF or DxD VS (Fig. 1, left). For DxF populations as VS, TS1 resulted in the highest prediction ability (Fig. 1, right).

Discussion
For most traits, we observed a higher broad sense heritability (h a 2 ) at heat compared to standard condition. This observation was due to an increased genotypic variance at heat condition while the error variance was not notably increased (data not shown). Our findings are in contrast with field-based studies in which the heritability is mostly lower at heat compared to standard condition due to an increased error variance (e.g. Cairns et al. 62 ). This difference might be explained by field environmental factors which are more important under heat conditions, www.nature.com/scientificreports www.nature.com/scientificreports/ e.g. soil heterogeneity. However, these factors have low relevance under controlled conditions, e.g. in growth chambers used in our study.
The h a 2 values for the assessed traits were medium to high for both environmental conditions (Table 1). h a 2 was comparable with heritability estimates observed by Frey et al. 4 and Naveed et al. 63 under heat stress, and by Cerrudo et al. 17 under drought stress. This observation suggests that an adequate estimation of the AEM for each genotype was achieved, which is the prerequisite for a high power to detect QTL, for genome-wide prediction, as well as to interpret differences between heterotic pools.
We observed significant ( < . P 0 05) differences in the HSI between populations derived from DxD crosses compared to populations derived from FxF crosses (Supplementary Fig. 2). Except for SD, the HSI of FxF populations were lower than that of DxD populations. This suggests that FxF populations are more tolerant to heat stress than DxD populations at seedling stage. The findings of Hallauer et al. 64 suggested that the Flint pool contributed an improved chilling tolerance to cultivars bred for temperate Europe. Moreover, the results from Strigens et al. 65 suggested an improved morphological and physiological adaptation of the Flint pool to chilling temperatures compared to the Dent pool. These observations along with ours suggest that genotypes of the Flint pool have a higher tolerance to temperature stress during seedling stage than genotypes of the Dent pool. This  www.nature.com/scientificreports www.nature.com/scientificreports/ conclusion is in accordance with the results of Reimer et al. 9 who observed a higher tolerance to temperature extremes during seedling stage of genotypes of the Flint pool compared to genotypes of the Dent pool.
The correlation between heat tolerance at seedling and adult stages, which was examined in a companion study 4 , was low ( Supplementary Fig. 4). This finding was expected as plant performance in young stages might have limited implications on plant performance after transition from the vegetative to the generative phase. Similar trends were observed for salinity tolerance in wheatgrass 66 or tolerance to defoliation intensity in maize 67 . Additionally, Gibert et al. 68 observed that for some traits, the correlation between trait and growth changes with plant size and physiological stage. In summary, these results indicate that plant growth strategies should not be considered as constant over the entire life but is stage-dependent.
The negative correlation between the heat tolerance of maize at seedling and adult stages was also manifested on the level of heterotic pools. Genotypes derived from DxD crosses had a lower heat tolerance at seedling stage but a high heat tolerance at adult stage compared to genotypes derived from FxF crosses 4 . Therefore, both heterotic pools should be considered when increasing heat tolerance across developmental stages.
Segregating populations derived from DxF crosses are only rarely created in commercial maize breeding programs. Nevertheless, because of their different behaviour regarding heat tolerance, such populations were included in our study to test their suitability for QTL mapping and GWP. The confidence intervals of the QTL for heat tolerance at seedling stage detected in our study (Table 2) overlapped not only with QTL confidence intervals for the same trait at adult stage 4 ( Fig. 2) but also with QTL regions associated with other abiotic stresses. The confidence intervals of QTL for HSI LL on chromosome 2 overlapped with a QTL associated with cold tolerance, which was identified in a metaQTL-analysis 69 . Furthermore, a QTL identified for the shoot and root dry weight and the leaf area under drought stress conditions 70 as well as a QTL for the leaf chlorophyll content at drought stress identified by Messmer et al. 71 mapped to the same genomic region on top of chromosome 2. These findings might suggest that different abiotic stresses might have similar genetic regulation mechanisms 7 . Furthermore, four out of five heat tolerance genes detected by Frey et al. 7 , in an RNAseq experiment, were located within the QTL confidence intervals on chromosome 2 (Supplementary Table 3, Frey et al. 72 ). This over-representation of heat tolerance genes in a particular region and their collocation with several QTL for heat tolerance illustrates the importance of genetic mechanisms for heat tolerance available on this chromosome.
Phenotypic evaluation of genetic material for heat tolerance is difficult due to irregular and uncontrolled appearance of heat stress conditions in field experiments and is technically demanding when performed in plant growth chambers. Marker-assisted selection (MAS) using QTL for traits related to heat tolerance could supplement phenotypic selection. However, each QTL detected in our study explained with <10% a relatively small part of the phenotypic variance of the respective trait. It suggests that the heat tolerance at seedling stage is inherited as a true quantitative trait. This in turn makes the use of MAS for heat tolerance inefficient. Therefore, in a subsequent step, we evaluated the utility of GWP to improve heat tolerance of temperate maize.
With the GWP approach, we were able to explain an approximately threefold higher proportion of the genetic variance of the HSI of each trait compared to the detected QTL (Table 3). This is in accordance with earlier reports on other traits in maize [73][74][75] and in other crops 24 , and illustrates the superiority of GWP over MAS for truly quantitative traits.
To quantify the potential advantage of GWP over phenotypic selection, the former can be seen as an indirect selection compared to direct phenotypic selection. The relative merit of indirect vs. direct selection can be calculated as the indirect selection response divided by the direct selection response 76 . This ratio can be rearranged as the ratio r h / a cv HSI , 2 . GWP is superior to phenotypic selection if this ratio is >1. Assuming the same selection intensity for GWP and phenotypic selection and considering the prediction abilities (r a cv , with RAD 482-GP:0.98 and M A ) of our study, the maximum relative cycle can be calculated for which identical selection gains are realised with indirect and direct selection 76,77 . The maximum relative cycle lengths for GWP were 92% (HSI LL ), 88% (HSI PH ), 100% (HSI NL ), 96% (HSI SC ), 68% (HSI SN ), 82% (HSI SD ), 76% (HSI DW ), 100% (HSI WC ), and 114% (HSI LR ) of those of phenotypic selection. Therefore, GWP for heat stress tolerance is already equivalent or superior to phenotypic selection for three traits even without considering potential advantages due to higher selection intensities, which is very promising in light of earlier studies 27,78 . In the following, various factors influencing the prediction ability are discussed.
We found a good agreement between observed and expected prediction abilities across all studied traits (Fig. 5). However, some traits were systematically over-(NL, DW) or under-estimated (WC). One of the potential reasons for a difference between observed and expected predicted ability could be the low heritability of the traits. This was especially true for the HSI of NL which had the lowest heritability (Table 1), and which prediction ability deviated substantially from the expected values (Fig. 5). Another reason for the systematic over-and underestimations could be deviations from the highly polygenic genetic architecture of the traits which is assumed in the applied deterministic formula 58 . For such traits, one would expect that the detected QTL explained the highest proportion of the phenotypic variance. However, it is striking that the traits for which the most considerable difference between observed and expected prediction ability was found were the traits for which no QTL were detected in our study. This warrants further research.
The superiority of GWP over phenotypic selection was evaluated as described previously based on an additive genetic model. However, the F 3:4 populations used in our study had an expected heterozygosity of 25%. In this case, it can be beneficial to investigate genetic models covering not only additive but also dominance effects when performing GWP. No obvious trend was observed regarding the superiority of one of the genetic models for the across-population prediction strategies r a , r pop , r a cv , , and r pop cv , (data not shown). However, the r pop BSP , w predictions abilities based on the M A model were at least equal and, for some population*trait combinations, higher than that observed for the M AD model ( Supplementary Figs 7-12). This suggests that dominance effects were either not relevant for the studied traits or were not captured by our GBLUP model. This result was in accordance with observations made in the animal breeding field that generally, the increase of the accuracy of additive breeding values by including dominance effects was scarce 79 . Therefore, we considered only the M A genetic model for all further analyses.
Other factors that potentially influence the prediction ability are the number and type of molecular markers that are used to to characterise the genetic material. From the three GP cutoff values examined for the RAD SNPs, RAD 482-GP:0.98 resulted in the highest r a cv , values for most of the traits (data not shown). Our finding suggests that the mean imputation of genotypes calls with low GP is less error-prone than the genotype calls obtained by Beagle. Therefore, we chose the GP cutoff value of 0.98 for all further analyses with the RAD SNPs.
The prediction abilities r a and r pop calculated for RAD 482 were higher than those obtained for KASP 482 , independent of the considered GP cutoff value (Table 3, Fig. 3, and Supplementary Fig. 6). Our observation is in accordance with the results of Elbasyoni et al. 25 in wheat and could be due to the considerably higher number of markers that were available in the RAD data set compared to the KASP data set. This was confirmed by the increasing prediction abilities observed for increasing numbers of RAD SNPs (Fig. 4). A high number of markers increases the precision of the K estimates, increases LD between markers and QTL, as well as ensures a better genome representation.
A second observation leading to the same conclusion was that Q ST was significantly correlated with r a calculated for KASP 482 but not with r a calculated for RAD 482-GP:0.98 . This finding suggests that the prediction abilities obtained with the KASP SNPs are predominantly due to the modeling of genetic relationships and are therefore higher when the ratio of genetic variance among-vs. within-populations increases. In contrast, the prediction abilities obtained with RAD SNPs are to a higher extent due to LD between marker and QTL alleles and to a lower extent due to the modeling of genetic relationship and thus do not correlate with Q ST .
Finally, the prediction abilities r a calculated for KASP 482 were with 0.72 also significantly (P = 0.028) correlated with h HSI 2 , which is in accordance with the results of Poland et al. 80 , Hayes et al. 81 , and De Moraes et al. 26 . However, the prediction ability r a calculated for RAD 482-GP:0.98 were not significantly (P = 0.14) correlated with h HSI 2 . This agrees theoretical considerations 82 and empirical studies 83 and can be explained by an increased proportion of the prediction abilities caused by LD between markers and QTL when a high number of molecular markers was used. Therefore, all discussions hereafter are based on RAD 482-GP:0.98 .
Further factors that have a high impact on the prediction abilities are the size and composition of the training set and, thus, were examined in our study. For most trait*population combinations, we observed an increase of the prediction ability r pop BSP , w with an increasing size of the training set ( Supplementary Figs 7-13). This is in accordance with the results of Van Raden et al. 84 and Technow et al. 27 . However, in contrast to these authors, we did not observe a plateau with an increasing size of the TS. One reason for this could be that our training set size is small and the plateau was not yet reached. Nervertheless, the TS size of this study corresponds to that typically used in commercial maize breeding programs.
We observed significant ( < . P 0 05) differences in the heat tolerance between the two heterotic pools. Therefore, we were interested in examining the potential of prediction between populations (inter-population calibration) and heterotic pools (inter-pool calibration) in comparison with the previously discussed within-population calibration. The latter resulted across all population*trait combinations in higher prediction abilities r pop BSP , w than the inter-population calibration, independent of the TS used for the r pop BSP , b . This is in accordance with the results of Technow et al. 27 . The most likely reason for this is that LD patterns are not consistent between heterotic pools 85,86 . To identify markers that are in LD with QTL across different heterotic pools, an even higher number of markers might be required 86, 87 . Nevertheless, as situations in which within-population prediction rarely appear in commercial breeding programs, we evaluated various scenarios for r pop BSP www.nature.com/scientificreports www.nature.com/scientificreports/ We observed a considerably lower prediction ability for intra-pool calibration (r pop BSP , b based on TS1; Fig. 1, left) than for the intra-population calibration (r pop BSP , w ). This was more pronounced than expected according to Habier et al. 54 . Our observation can be due to the fact that the parental inbreds used to develop segregating populations in our study were selected such that they show a high variation for heat tolerance related traits not only across all eight inbreds but also across the inbreds of one heterotic pool. Therefore, a high variation between the populations was observed even when the parental inbreds were from the same heterotic pool.
Compared to TS1, we observed a minor increase of the prediction ability when using an inter-pool calibration (TS2; Fig. 1, left). This was especially true when the TS was composed of two populations (TS2 c ). Moreover, this finding might be explained by the fact that the four Flint and Dent parental inbred lines used to develop the segregating populations were chosen to be as diverse as possible.
Compared to the results of TS2 ( s and c ), the prediction abilities for the mixed-pool calibration (TS3) were even higher. A part of this increase was due to the fact that the DxF populations shared parental inbreds with the DxD and FxF populations. Our observation that the prediction ability for a scenario with shared parental inbred (TS3 sr ) were higher than for a scenario with no shared parental inbreds between TS and VS (TS3 su ; Fig. 1, left) supported this explanation. In addition to the fact that TS and VS have one common parental inbred, the higher prediction ability for TS3 compared to TS2 ( s and c ) might be due to the higher segregation variance of the mixed populations (Supplementary Table 2 and Supplementary Fig. 5).
This aspect was studied more in detail by examining the prediction ability for mixed populations (DxF) using different types of TS (Fig. 1, right). The TS with combined FxF and DxD populations which were partially derived from common parents with the VS (TS3) performed better than combined FxF and DxD populations that had no parental inbred in common with the VS (TS2). This further emphasises parental inbreds shared between the populations of the TS and VS is an important component for the success of genomic prediction. However, we observed the highest prediction ability for mixed populations when using other mixed populations as TS even if when had no common parents (TS1). One hypothesis to explain this result is the high diversity of such mixed populations (Supplementary Table 2). Our findings suggest that commercial maize breeding companies who create DxF populations to e.g. increase the per-se performance of the Flint pool with Dent material or to introgress earliness in Dent inbreds with Flint material can use such populations as TS to predict the original heterotic pools.
Generally, the critical aspect when selecting the TS in routine plant breeding programs is the limited size of the segregating populations which hampers either the building of a TS of suitable size or reduces relatedness between TS and VS if several segregating populations are used to build the TS. Therefore, we have evaluated the potential of across-pools and -populations calibrations. r pop BSP , w was for 80% of the population*trait combinations better than a model built on the six populations (r pop cv , ) if considering the same TS size of 50. However, this was only the case for 25% of the combinations if considering the original TS size of 385 genotypes (Fig. 6). This implies that if genomic prediction is used to select among the genotypes of one single population and if the data for the TS must be generated de novo, the use of within-population prediction is the most promising approach. Nevertheless, in many cases, a commercial maize breeding company has multiple connected segregating populations for which predictions should be obtained. For such a scenario, our results clearly indicate that a combination of populations in one TS, also across different heterotic pools, increases the prediction ability compared to the use of within-population calibration (Fig. 6).

conclusion
In this study, we provided a basis for future research on genome regions and candidate genes involved in heat stress tolerance of maize seedlings. Antagonistic pleiotropy between heat tolerance at seedling and adult stages was observed in genomic hot-spot regions, especially on chromosome 2. Although six QTL were detected in our study, these explained a very small proportion of the phenotypic variance. Therefore, marker-assisted selection is not promising for the traits evaluated in our study. On the contrary, the prediction abilities observed for GWP encourages the use of such approaches for heat tolerance at seedling stage, especially if genotypic characterisation was carried out with a high number of markers, e.g. from RAD sequencing. Furthermore, our results suggest that the combination of populations from different heterotic pools in one training set is a promising approach to increase the prediction ability for heat tolerance traits, especially if the population in the TS share parental genotypes with those of the VS. Finally, we demonstrated that for the examined traits, segregation populations derived from inter-pool crosses are very suitable to predict populations derived from intra-pool crosses.

Data Availability
The sequencing datasets generated and analysed in the current study are available in the NCBI Sequence Read Archive (SRA) repository, https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA564701. All phenotyping and genotyping data generated or analysed during this study are included as supplementary information files.