Discovery of beneficial haplotypes for complex traits in maize landraces

Mayer, Manfred; Hölker, Armin C.; González-Segovia, Eric; Bauer, Eva; Presterl, Thomas; Ouzunova, Milena; Melchinger, Albrecht E.; Schön, Chris-Carolin

doi:10.1038/s41467-020-18683-3

Download PDF

Article
Open access
Published: 02 October 2020

Discovery of beneficial haplotypes for complex traits in maize landraces

Nature Communications volume 11, Article number: 4954 (2020) Cite this article

7179 Accesses
34 Citations
77 Altmetric
Metrics details

Subjects

Abstract

Genetic variation is of crucial importance for crop improvement. Landraces are valuable sources of diversity, but for quantitative traits efficient strategies for their targeted utilization are lacking. Here, we map haplotype-trait associations at high resolution in ~1000 doubled-haploid lines derived from three maize landraces to make their native diversity for early development traits accessible for elite germplasm improvement. A comparative genomic analysis of the discovered haplotypes in the landrace-derived lines and a panel of 65 breeding lines, both genotyped with 600k SNPs, points to untapped beneficial variation for target traits in the landraces. The superior phenotypic performance of lines carrying favorable landrace haplotypes as compared to breeding lines with alternative haplotypes confirms these findings. Stability of haplotype effects across populations and environments as well as their limited effects on undesired traits indicate that our strategy has high potential for harnessing beneficial haplotype variation for quantitative traits from genetic resources.

Direct introgression of untapped diversity into elite wheat lines

Article 07 October 2021

Genomic variation in weedy and cultivated broomcorn millet accessions uncovers the genetic architecture of agronomic traits

Article 24 April 2024

Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement

Article Open access 31 May 2019

Introduction

Harnessing the allelic diversity of genetic resources is considered essential for overcoming the challenges of climate change and for meeting future demands on crop production^1,2. For most traits of agronomic importance, modern breeding material captures only a fraction of the available diversity within crop species¹. In the case of maize (Zea mays L.), today’s elite germplasm went through several bottlenecks, first by geographical dispersion from its center of origin^3,4, second through the selection of only a few key ancestors sampled from a small number of landraces to establish heterotic groups^5,6, and third through decades of advanced cycle breeding with high selection intensities^7,8. For traits that were not targets of selection in the past, but are important today, like abiotic stress tolerance and resource-use efficiency⁹, this might have resulted in the loss of favorable alleles during the breeding process. In addition, unfavorable alleles might have become fixed during the selection process due to drift and/or hitchhiking effects^10,11,12.

Impressive examples exist where introgression of alleles from genetic resources has improved mono- or oligogenic traits^13,14,15, but for broadening the genetic diversity of complex traits, such as yield or abiotic stress tolerance successful examples are scarce². Up to date, the genomic characterization of genetic resources has been based predominantly on sampling individuals across a wide range of accessions, maximizing the level of diversity in the genetic material under study^{2,16,17,18,19,20}. Such diverse samples are characterized by high variation in adaptive traits and strong population structure, leading to spurious associations and limited power for detecting associations with nonadaptive traits of agronomic importance^21,22. Furthermore, alleles which are locally common, but globally rare likely remain undetected in broad, species-wide samples, whereas in a more targeted approach they might show sufficiently high frequencies for detection²².

Here, we propose a genome-based strategy (Supplementary Fig. 1) for making native diversity of maize landraces accessible for improving quantitative traits, showing limited genetic variation in elite germplasm, such as cold tolerance and early plant development^23,24,25. Capitalizing on low levels of linkage disequilibrium (LD), we map haplotype-trait associations at high resolution in ~1000 doubled-haploid (DH) lines derived from three European flint maize landraces. The genetic material has been preselected for adaptation to target environments to avoid confounding effects of strong adaptive alleles as suggested by Mayer et al.²⁶. We assess promising haplotypes genotypically by quantifying their frequency in a diverse panel of 65 European flint breeding lines. Phenotypically, we evaluate the direction and magnitude of haplotype effects relative to a subset of breeding lines. Many of the discovered haplotypes show stable trait associations across populations and environments. In addition, most of them do not exhibit undesired trait associations, making them ideal for introgression into elite germplasm. We show that our strategy to sample comprehensively individuals from a limited set of preselected landraces is successful in linking molecular variation to meaningful phenotypes, and in identifying alleles for quantitative traits that will enrich the genetic diversity of our crops.

Results

Molecular variation in landraces and breeding lines

The genetic differentiation of 941 DH lines derived from three landraces (Kemater Landmais Gelb, KE; Lalin, LL; and Petkuser Ferdinand Rot, PE) and a diverse panel of 65 European breeding lines²⁷ based on principal coordinate analysis (PCoA) with 501,124 single-nucleotide polymorphism (SNP) markers is shown in Fig. 1a. The first principal coordinate explained 6.2% of the molecular variation and separated the landrace-derived and the breeding lines based on their geographical origin within Europe from northeast (Germany) to southwest (southern France, Spain). The second principal coordinate explained 5.4% of the variation and separated the two landraces KE and PE from the panel of breeding lines. Diversity parameters polymorphism information content (PIC), gene diversity (H), and minimum number of historical recombination events (nR) were higher in the set of breeding lines compared to the three landrace-derived DH libraries, irrespectively if they were calculated from SNP or haplotype information (Supplementary Table 1). This was expected, as the landraces represent self-contained populations, whereas the breeding line panel was derived from a large number of different source populations across Europe²⁷. Combined across landraces, the DH lines almost reached the level of diversity of the breeding line panel.

**Fig. 1: Molecular inventories point to untapped variation in landraces.**

In total, the landrace and breeding line panels comprised 356,724 and 363,290 haplotypes (Fig. 1a), constructed for nonoverlapping windows of ten SNPs, corresponding to an average of 7.12 and 7.25 haplotypes per window, respectively. As expected for genetic material originating from the same germplasm group (European flint maize), haplotype frequencies were positively correlated (Pearson’s r = 0.74, P < 2.2e−16) between the two panels (Fig. 1b). Overall, 26.2% of the haplotypes of the landrace panel were not present in the breeding lines, indicating untapped haplotype variation. For those haplotypes, median and mean frequencies in the landrace panel were 0.005 and 0.039, respectively. Only 2.7% of those haplotypes occurred in all three landraces, whereas 82.8% occurred in only one landrace. Within the respective individual landraces their median and mean frequencies increased to 0.065 and 0.101, respectively. The landrace panel captured 72.4% of the haplotypes present in the panel of breeding lines.

Trait-associated genomic regions

A key question for the targeted utilization of genetic resources is, if molecular inventories of landrace-derived material are predictive of their potential to improve traits of agronomic importance. Using SNPs and haplotypes for genome-wide association scans (GWAS), we identified associations for all nine traits under study. Results were very similar for both types of genomic information, as exemplarily shown in Supplementary Fig. 2 for the trait tillering (TILL). As haplotypes are more informative than biallelic SNPs for the comparison with breeding lines, we focused on haplotypes in further analyses. Trait-associated genomic regions were defined based on LD between significant haplotypes (“Methods” section; Fig. 2 and Supplementary Data 1). As landraces were preselected for variation in early plant development^26,28, most associations (37–55) were detected for the traits early vigor (EV_V4/V6) and early plant height (PH_V4/V6). Haplotypes explained between 2 (female flowering time, FF) and 57% (lodging, LO) of the total genetic variance of the respective traits (Fig. 2). Despite the large sample size (n = 899), the proportion of genetic variance explained might be somewhat overestimated^29,30, and thus has to be interpreted with caution. Only few genomic regions were detected for flowering time, indicating that alleles with large effects were fixed during adaptation of the respective landraces to their geographical region, thus having little impact on GWAS for other traits.

**Fig. 2: Results from GWAS in DH libraries derived from maize landraces.**

Average r² decay distances (r² < 0.2) within the three DH libraries were 203 (LL), 484 (PE), and 973 kb (KE), and 201 kb for the combined set. This is consistent with previous results²⁶ and warrants high mapping resolution in the three DH libraries under study. For comparison, the diverse panel of 65 breeding lines across Europe exhibited an average r² decay distance of 107 kb. The lower LD level in the breeding line panel can be explained by admixture of many different source populations with varying linkage phases, which is generally undesired in GWAS. The median size of genomic regions associated with the nine traits under study was 92 kb, with a median number of three annotated genes per region (Supplementary Fig. 3), enabling prediction of candidate genes and functional analyses. Only for a few regions (<5%) resolution was not optimal, as they comprised >100 annotated genes. Mapping resolution in the three DH libraries is best demonstrated by an example of an already well-characterized locus: teosinte branched 1 (tb1). The gene tb1 played a major role in the transition from highly branched teosinte to maize with strongly reduced branch development³¹. In our study, a strong significant association for TILL was found in a genomic region on chromosome 1 comprising the tb1 locus (size 1.3 Mb, including in total 22 genes; Supplementary Data 1 and Supplementary Fig. 2). In silico fine-mapping in the respective region (“Methods” section) identified a ten-SNP window, which overlapped perfectly with tb1 and its regulatory upstream region.

Effect size and stability of trait-associated haplotypes

The potential of the identified landrace haplotypes for elite germplasm improvement depends on the size and direction of their effects on the traits of interest, their environmental stability and their dependence on the genetic background. In a given trait-associated genomic region, one window of ten SNPs comprising several haplotypes was selected. Significant haplotypes, hereafter referred to as focus haplotypes, entered into a multi-environment model (Supplementary Fig. 4) and were classified into favorable, unfavorable, or interacting based on the direction and stability of their effects in the different test environments (Supplementary Fig. 5). According to this categorization scheme, a high number of favorable haplotypes for early plant development traits were found in the DH libraries (Table 1 and Fig. 3a), representing potential candidates for introgression into elite germplasm. For the undesirable traits LO and TILL, many haplotypes had unfavorable effects. Overall, haplotypes identified for all nine traits showed moderate to high effect stability across environments, with similar patterns for favorable and unfavorable haplotypes (Fig. 3a, b).

Table 1 Number and percentage of favorable, unfavorable, and interacting focus haplotypes per trait.

Full size table

**Fig. 3: Effect stability of focus haplotypes across environments.**

The dependency of haplotype effects on the genomic background can be evaluated comparing effect significance and sign of the identified focus haplotypes between landraces. From the 48 haplotypes associated with PH_V6, 19 haplotypes were present in both KE and PE. Together, these 19 haplotypes showed 115 environment-specific haplotype-trait associations, of which 35 (30%) were significant for both landraces (Supplementary Fig. 6a). All of those 35 associations had equal effect signs for both landraces. Also for the 80 environment-specific associations significant for only one of the two landraces, a large majority (90%) had equal effect signs for both landraces. Similar patterns were observed for PH_V4 (Supplementary Fig. 6b).

Haplotype congruency in landraces and breeding lines

The ultimate criterion for assessing the usefulness of landrace haplotypes for germplasm improvement is their frequency in breeding material. If favorable haplotypes are already present at high frequency in the genetic material to be improved, they are of no additional value. We assessed the frequencies of the identified trait-associated focus haplotypes in a panel of 65 breeding lines based on genotypic data. When tracking an ancestral haplotype potentially shared between landrace and breeding material, recombination might have broken up the respective haplotype, but the trait-associated causal mutation might still be present. Small window sizes (mean = 0.026 cM), low values of historical recombination events (mean = 1.20), and high levels of haplotype similarity (mean = 0.33) found in the panel of breeding lines pointed to a low probability of haplotypes being broken up by recombination.

Frequency distributions of favorable landrace haplotypes in the 65 breeding lines are given for early development traits (EV_V4, EV_V6, PH_V4, and PH_V6) in Fig. 4. As the haplotypes identified for each of the four single traits (Table 1) were partly from similar genomic regions, we only considered 53 favorable haplotypes with a minimum distance of 1 Mb and/or r² < 0.8. The frequency of favorable haplotypes (mean = 0.20) was significantly increased (P < 0.01) compared to randomly drawn haplotypes (mean = 0.16). Six favorable focus haplotypes (11%) were absent in the set of breeding lines, and thus have potential for elite germplasm improvement. The mean frequency of 80 unfavorable haplotypes associated with early plant development did not differ significantly (P > 0.30) from the frequency of random haplotypes. A substantial proportion of unfavorable haplotypes (27.5%) were common in the breeding lines (Fig. 4), suggesting that a targeted substitution with favorable haplotypes could lead to further germplasm improvement.

**Fig. 4: Favorable and unfavorable landrace haplotypes in breeding lines.**

Linking haplotype variation to phenotypes

The potential of individual focus haplotypes to improve elite germplasm has to be evaluated phenotypically, comparing the performance of landrace-derived lines carrying these focus haplotypes and elite lines of the breeding pool one aims to improve. Using a subset of the breeding line panel (n = 14) phenotyped at six locations in 2017, we report exemplarily the results of such comparisons for two genomic regions on chromosomes 3 and 9, found to affect PH_V6 in the GWAS analysis (Fig. 5). On chromosome 3, the focus haplotype (haplotype A in Fig. 3a and Supplementary Fig. 5) was localized in a ten-SNP window, which explained 4.8% of the genetic variation for PH_V6 and comprised eight additional haplotypes in the DH lines. The focus haplotype had a frequency of 4.1% in the DH lines, outperformed six out of the eight alternative haplotypes significantly and was absent in the panel of breeding lines. A proportion of 93.8% of the 65 breeding lines carried one of the six haplotypes with significant negative effects relative to the focus haplotype (on average 0.61 standard deviations) in almost all environments. The remaining breeding lines (6.2%) carried a haplotype absent in the landrace panel, and thus without effect estimate. Averaged across environments, DH lines carrying the focus haplotype showed an increase of 6.06 cm over breeding lines, but the difference was not significant (P > 0.056; Fig. 5a). When looking at individual environments, however, significant differences (P < 0.044) were observed for locations OLI, EIN, and ROG (Supplementary Fig. 7a), which showed the lowest temperatures in the field²⁸, suggesting that the relative advantage of the identified haplotype might be temperature dependent.

**Fig. 5: Effect of favorable haplotypes not present in breeding lines on early plant development.**

On chromosome 9 in a genomic region of ~3 Mb, three independent focus haplotypes affected PH_V6 significantly (two favorably, one unfavorably). One of the three focus haplotypes (haplotype D in Fig. 3a and Supplementary Fig. 5) increased PH_V6 compared to the six alternative haplotypes in the respective window. The genetic variance explained by the haplotypes in this window was small (1.7%) most likely due to the low frequency (0.4%) of the focus haplotype in the DH lines. The focus haplotype was absent in the panel of 65 breeding lines. Instead, 95.4% of the breeding lines carried one of the six inferior haplotypes, while 4.6% carried haplotypes not present in the landrace panel. DH lines carrying the focus haplotype showed a significant increase of 15.1 cm compared to the breeding lines (P < 0.009). Similar as for the haplotype on chromosome 3, the difference was most pronounced in environments showing low temperature during early plant development (Supplementary Fig. 7b).

We also assessed genomic regions in more detail where the focus haplotype was unfavorable like, for example, the window comprising the tb1 locus, which explained 13.1% of the genetic variance for TILL in the landrace panel. DH lines carrying the unfavorable focus haplotype showed a significant increase of 1.51 scores compared to the 14 phenotyped breeding lines not carrying the haplotype (Supplementary Fig. 8a; P < 0.0001). Here, the focus haplotype was carried by only two of the 65 breeding lines, but for other genomic regions associated with TILL frequencies were higher, e.g., 15.5% for a region on chromosome 5 explaining 6.6% of the genetic variance in the DH lines. In this case, DH lines carrying the focus haplotype showed a significant increase of 1.69 scores compared to 13 breeding lines not carrying the haplotype (Supplementary Fig. 8b; P < 0.0004). For a genomic region on chromosome 1 associated with EV_V4 (Supplementary Fig. 9), more than half of the 65 breeding lines carried the unfavorable focus haplotype, including six of the 14 phenotyped lines. The window in which the focus haplotype was located comprised four additional haplotypes and accounted for 5.1% of the genetic variance in the DH lines. We tested the effect of the focus haplotype in the 14 breeding lines and found a significant difference of 0.875 scores between lines with and without the focus haplotype (P < 0.039, Supplementary Fig. 9), indicating that a targeted substitution of the focus haplotype with one of the alternative haplotypes could lead to germplasm improvement.

Introducing landrace alleles into elite germplasm for a target trait comes at the risk of undesired effects on other traits due to pleiotropy or linkage. We tested the identified focus haplotypes for each of the early plant development traits in bivariate models for significant effects on other traits (PH_final, FF, MF, LO, and TILL). Of the 53 favorable haplotypes referred to in Fig. 4, 20 had a significant effect on at least one out of the five other traits. Thereof, only three haplotypes increased LO or TILL, whereas four haplotypes slightly decreased LO or TILL. Fourteen haplotypes increased PH_final and/or led to earlier flowering, whereas one haplotype slightly delayed FF. For some of those haplotypes the effect on traits other than early plant development was substantial (e.g., haplotype J in Supplementary Fig. 10a increasing LO). An enrichment of such haplotypes in the breeding germplasm is therefore not advisable. In contrast, haplotypes which explained more of the genetic variance for early plant development than for other traits (e.g., haplotypes E or G in Supplementary Fig. 10a) can still be used for improving germplasm for early plant development resulting in only slightly altered flowering time and/or PH_final. Of the 80 focus haplotypes unfavorable for early plant development (Fig. 4), 48 were significant for at least one other trait. Thereof, 14 haplotypes decreased TILL, while 40 decreased PH_final and/or delayed flowering. However, most of them had only moderate effects on these traits (Supplementary Fig. 10b). Therefore, in many cases selection against those haplotypes can still be recommended.

Discussion

The importance of genetic variation for selection and genetic improvement of crops is undisputed. Genetic resources of domesticated species, such as landraces, are a valuable source of diversity for broadening the genetic base of elite germplasm¹. However, efficient strategies for utilizing this native diversity for the improvement of quantitative traits are lacking. Here, we developed a strategy to discover beneficial haplotypes for quantitative traits in maize landraces (Supplementary Fig. 1). The combination of comprehensive molecular inventories and meaningful phenotypes collected in landrace-derived DH libraries in multi-environment trials allowed detection of haplotype-trait associations for quantitative traits with limited genetic variation in elite material. Even though the DH libraries were derived from only three preselected populations, 26% of landrace haplotypes were absent in the panel of breeding lines, representing the allelic diversity of multiple diverse source populations²⁷. While most of these haplotypes can be expected to be neutral³² or disadvantageous, some might represent beneficial novel variation.

Landraces represent self-contained populations adapted to their geographical origin³³. By focusing on diversity within rather than across landraces, confounding effects of strong adaptive alleles are avoided. Consequently, individual trait-associated haplotypes are expected to have moderate to small effects only. Our results meet these expectations. The majority of haplotype-trait associations detected in the DH libraries explained <5% of the genetic variance for all traits under study, including flowering time. However, as shown for the haplotype affecting PH_V6 on chromosome 9 (Fig. 5b), the genetic variance explained in GWAS is not only a function of effect size, but also of haplotype frequency. As DH and breeding lines were sampled from the same germplasm group (European flint maize), haplotype frequencies were positively correlated between the two panels (Fig. 1b). This exemplifies one of the key challenges when searching for untapped variation for quantitative traits, as haplotypes absent in the breeding material tend to have low frequencies also in landraces with shared historical ancestry. Focusing on a set of landraces preselected for variation in target traits increases the chances that they harbor alleles at frequencies large enough to be detected in GWAS. The success of this strategy was reflected in the high number of significant haplotype-trait associations found for target traits early vigor and early plant height.

The large sample of landrace-derived DH lines employed in this study enabled mapping of haplotypes with moderate effect size and comparably low frequency, but as is known for GWAS studies, some of these significant trait associations might be spurious³⁴. Here, the sequential determination of significance (Supplementary Fig. 4) should have minimized the proportion of false positives³⁵. In addition, the haplotype-based approach enabled tracking of ancestral alleles between landrace-derived and breeding material, and the phenotypic comparison between the two groups supported the usefulness of identified haplotypes for germplasm improvement. Nevertheless, the construction of haplotypes in landrace-derived material warrants further research. Different methods for haplotype construction exist, generating population-specific haplotype blocks based on LD^36,37 or linkage³⁸. Here, we used fixed window sizes, as it is advantageous in comparing haplotype frequencies across datasets varying in their extent of LD. The choice of window size depends on the available marker density and affects the number of haplotypes per window as well as the risk of haplotypes being broken up by recombination. Thus, defining the haplotype inventories of landraces and comparing them to elite germplasm is not trivial. Comprehensive sampling of individuals or lines from a limited number of landraces mitigates difficulties in haplotype construction and at the same time warrants sufficient statistical power and mapping resolution in GWAS through absence of pronounced population structure, rapid decay of LD, and consistency of linkage phases²⁶. Here, we put this strategy into practice and showed its potential in identifying favorable alleles not present in breeding lines for improving quantitative traits.

For early development traits, overall performance did not differ significantly between the DH libraries and the subset of phenotyped breeding lines, but DH lines carrying specific focus haplotypes not present in breeding lines outperformed the set of breeding lines significantly in environments favoring trait differentiation. This is a first step toward identifying haplotypes from genetic resources for germplasm improvement, but the final proof of concept will have to come from crosses of landrace-derived material with elite material. As landraces represent open-pollinated populations, background dependency of the identified trait-associated haplotypes should not be as pronounced as in mapping populations tracing back to few genetic founders, such as multi- or biparental crosses. In our study, the vast majority of trait-associated haplotypes occurring in landraces KE and PE had equal effect signs across landraces and environments, supporting this hypothesis. In addition, for cases where it was possible to contrast different haplotypes in the breeding lines (Supplementary Fig. 9), the effect of the focus haplotype in the breeding lines was consistent with the effect in the DH lines. If the selected landraces and the target germplasm to be improved share historical ancestry, we expect only minor genetic background effects when introducing favorable haplotypes discovered in landraces into elite material.

After identification of trait associations, fine-mapping of the respective genomic regions and functional characterization of candidate genes is a logical next step. With a limited number of annotated genes per trait-associated genomic region, high mapping resolution was obtained in this study. The envisaged functional validation of relevant haplotypes opens many options for utilization: targeted allele mining from genetic resources, unlocking diversity trapped in disadvantageous or incompatible haplotypes, broadening the genetic diversity at relevant loci in elite germplasm, and improvement of unfavorable haplotypes through gene editing³⁹. In addition to targeted haplotype management, genome-wide approaches will also profit from functional knowledge. Pre-breeding programs² might be accelerated through the use of genome-based prediction^40,41. It has been shown that the prediction accuracy is increased if known trait associations are included as fixed effects in prediction models⁴². As our results indicate high stability of haplotype effects across environments and genetic background, as well as limited haplotype-induced correlations between traits, the prospects of the germplasm improvement through the use of landrace-derived material are promising.

By successfully linking molecular inventories of landraces to meaningful phenotypes and identifying beneficial variation for quantitative traits of agronomic importance, the results of this study represent a first step toward the long-term goal of accessing native biodiversity in an informed and targeted way. The strategy proposed in this study and demonstrated experimentally with the European flint germplasm can be extended to other maize germplasm groups and even to other allogamous crop species. The key to an efficient use of genetic resources is to understand how genomic information of gene bank accessions can be translated into plant performance⁴³. We envision a future where haplotypes characterized for their genomic structure, allele content and functional relevance can be freely moved between populations. Our goal is to create plants with novel combinations of alleles that will lead to varieties with novel combinations of traits, thus securing sustainable crop production in a changing world.

Methods

Plant materials

We generated >1000 DH lines derived from three European maize landraces: Kemater Landmais Gelb (KE), Lalin (LL) and Petkuser Ferdinand Rot (PE)²⁸. The landraces were preselected for phenotypic variation in cold-related traits assessed in field trials and population genetic analyses²⁶. The set of breeding lines used in this study was selected from a broad panel of 68 flint lines²⁷. The initial dataset included two US sweetcorn lines, IL14H and P39, which we excluded from our analyses. The remaining 66 lines, released between ~1950 and 2010, were selected to represent the genetic diversity of the European flint elite breeding germplasm. The panel also includes prominent founder lines like EP1, F2, F7, and DK105 (ref. ⁴⁴).

Genotypic data

In total, 1015 landrace-derived DH lines were genotyped with the 600k Affymetrix® Axiom® Maize Array⁴⁵. After stringent quality filtering²⁸, 941 lines (KE = 501, LL = 31, and PE = 409), and 501,124 markers mapped to B73 AGPv4 (ref. ⁴⁶) remained for genetic analyses. Calls indicating heterozygosity (0.19%) were set to missing as in DH lines they can be assumed to result from technical artefacts, and all missing values were imputed separately for each landrace using Beagle version 5.0 (ref. ⁴⁷) with default settings. From the set of 66 breeding lines, 64 lines were genotyped with the same 600k array²⁷, whereas for two lines (EZ5 and F64) overlapping SNP positions (85%) were extracted from the HapMap data⁴⁸, which is based on whole-genome sequences. For making the 600k genotyping data comparable to the HapMap data, all alleles were coded according to the B73 AGPv4 (ref. ⁴⁶) forward strand. The breeding line data were filtered for the 501,124 high-quality markers of the set of DH lines. Applying the same quality filter criteria as for the DH panel (heterozygous calls < 5%; callrate > 90%, except for EZ5 and F64 with callrate >84%), one breeding line (FV66) was removed due to an increased number of heterozygous calls. For the remaining 65 lines, calls indicating heterozygosity (0.31%) were set to missing and missing values imputed using Beagle version 5.0 (ref. ⁴⁷) with default settings. For the combined set of landrace-derived DH lines and breeding lines, PCoA⁴⁹ was conducted based on modified Rogers’ distances⁵⁰, using the R-package ape version 5.3 (ref. ⁵¹). Pairwise r² (ref. ⁵²) between SNPs within 1 Mb distance was calculated for the DH libraries (within and across the three landraces) and the panel of breeding lines, respectively. Average LD decay distance (r² < 0.2) was estimated using nonlinear regression⁵³. If not denoted otherwise, analyses were done using R version 3.6.0 (ref. ⁵⁴). For plotting of results, R-packages ggplot2 version 3.2.0 (ref. ⁵⁵), plot3D version 1.3, and VennDiagram version 1.6.20 were used.

Phenotypic data

In total, 958 DH lines were phenotyped for 25 traits in replicated field trials²⁸. Briefly, line per se performance was evaluated in five locations across Germany, and in two locations in northern Spain in 2017 and 2018, resulting in up to 11 environments (location by year combinations) per trait. In each environment, up to ten separate 10 × 10 lattice designs with two replicates per DH line were used. In addition to the DH lines, 15 breeding lines (duplicate entries) and the original landraces (quadruplicate entries) were included as checks in 2017. Fourteen checks comprised important lines of the European flint breeding pool and were included in the set of 65 genotyped breeding lines. In 2018, only four breeding lines were used as checks (three flint lines). A subset of nine traits was analyzed in this study (Supplementary Table 2), related to early plant development, maturity, as well as agronomic characteristics. After stringent quality filtering based on genotypic data²⁸, phenotypic data of 899 DH lines (KE = 471, LL = 26, and PE = 402) remained for further analyses. Best linear unbiased estimates (BLUEs) for each DH line and check were calculated across environments using a mixed linear model, with genotypes as fixed and environment as well as design factors as random effects²⁸. Analogously, BLUEs were calculated within each environment using the same model without environment-related model terms.

Haplotype construction

For both, the landrace-derived DH lines, as well as the breeding lines, haplotypes were defined as a given nucleotide sequence within nonoverlapping windows of ten SNPs (Supplementary Fig. 4a), using the R-package zoo version 1.8-6 (ref. ⁵⁶). For the 600k chip, the density of SNPs along the chromosomes follows the average recombination rate⁴⁵. Therefore, using a fixed number of SNPs per window leads to similar window sizes as defined based on genetic map units. The median physical window size was 13.5 kb (mean = 37.8 kb), corresponding to 0.006 cM (mean = 0.026 cM) according to a genetic map generated from a F₂ mapping population of a cross of EP1 × PH207 (ref. ⁴⁴). Within each window, haplotypes were coded as presence/absence markers, yielding genotype scores 0 and 2, respectively. To evaluate the potential of untapped variation in landraces for elite germplasm improvement, we compared haplotype frequencies between the landrace-derived DH lines and the panel of 65 breeding lines.

Diversity measures

PIC⁵⁷ and H⁵⁸ were calculated based on SNPs (PIC_SNP and H_SNP) as well as on haplotypes (PIC_hap and H_hap) constructed as described above. nR⁵⁹ was calculated within the genomic windows used for haplotype construction. For all five parameters mean values across SNPs or ten-SNP windows, respectively, were calculated for each DH library individually and for the combined set of 941 DH lines, as well as for the set of 65 breeding lines.

Identification of trait-associated haplotypes

For GWAS in the DH lines, haplotypes which were present less than three times in the panel of 899 phenotyped DH lines were excluded from the analysis. For haplotypes with r² = 1, only one was retained, resulting in 154,104 haplotypes used for GWAS (Supplementary Fig. 4a), with on average 5.73 haplotypes per window. The identification of trait-associated haplotypes was conducted in two steps following Millet et al.³⁵, (i) identification of candidate haplotypes in GWAS (Supplementary Fig. 4b) and (ii) backward elimination in a multi-locus multi-environment model (Supplementary Fig. 4c). GWAS were conducted for single environments, as well as across environments using the corresponding environment-specific and across-environment BLUEs as response variable in the model, respectively. A univariate linear mixed model, implemented in GEMMA version 0.98.1 (ref. ⁶⁰), was used:

$${\mathbf{y}} = {\mathbf{W}}{\upalpha} + {\mathbf{x}}\beta + {\mathbf{Zu}} + {\mathbf{e}},$$

(1)

where y is the n-dimensional vector of phenotypic values (BLUEs), with n being the number of lines; α is a three-dimensional vector of fixed effects (intercept and landrace effects of KE and LL); β is the fixed effect of the tested haplotype; x is the vector of corresponding genotype scores coded as 0 and 2; u is the n-dimensional vector of random genotypic effects, with ${\bf{u}} {\sim} N(0,{\bf{K}}\sigma _g^2)$; and e is the n-dimensional vector of random residual effects, with e~N(0, I_nσ²). K denotes the (n × n) genomic relationship matrix based on SNP markers according to Astle and Balding⁶¹, calculated using the R-package synbreed version 0.12-9 (ref. ⁶²). I_n denotes the (n × n) identity matrix. $\sigma _g^2$ and σ² refer to the genetic and residual variance pertaining to the model defined in Eq. (1), respectively. Matrices W (n × 3) and Z (n × n) assign phenotypic values to fixed and random effects, respectively. Significance of haplotype-trait associations was assessed for each single-environment as well as for the across-environment GWAS based on the likelihood ratio test, as implemented in GEMMA, using a 15% false discovery rate⁶³. Haplotypes with a physical distance of <1 Mb and in high LD (r² ≥ 0.8) were considered to mark the same genomic region. The corresponding trait-associated genomic region was described by the start and end positions of the first and last haplotype fulfilling the defined criteria, respectively. To represent genomic regions equally in subsequent analyses, only the most significant haplotype, the focus haplotype, was retained per region in the respective GWAS, resulting in a set of candidate haplotypes.

In the multi-locus, multi-environment mixed linear model, we conducted a backward elimination of those candidate haplotypes as suggested by Millet et al.³⁵, using the ASReml-R package version 3.0 (ref ⁶⁴):

$$y_{ijk} = \mu + \omega _i + \delta _j + \mathop {\sum }\limits_{q\in {\mathbf{Q}}} x_{kq}\beta _q^i + u_k + e_{ijk},$$

(2)

where y_ijk is the phenotypic value (BLUE) of line k belonging to landrace j tested in environment i; μ is the common intercept; ω_i is the fixed effect of environment i; δ_j is the fixed effect of landrace j; x_kq is the genotype score (0 or 2) of line k for haplotype q; $\beta _q^i$ is the fixed effect of haplotype q in environment i, comprising the haplotype main and haplotype by environment interaction effect, i.e.,

$$\beta _q^i = \beta _q + (\beta \times \omega _i)_q,$$

(3)

u_k is the random genotypic effect of line k, and e_ijk is the random residual error with environment-specific residual error variance. Q represents the final set of haplotypes obtained through step-wise backward elimination based on the Wald test for $\beta _q^i$ (ref. ⁶⁵). At each step, significance of each haplotype was tested when it was the last one entering the model and the least significant haplotype was removed if P ≥ 0.01. The proportion of genetic variance explained by the set of trait-associated haplotypes was estimated by calculating the reduction in genetic variance between models including and excluding the haplotype effects, following Millet et al.³⁵. For evaluating effect stability across landraces for the final set of haplotypes Q, we extended Eq. (2) by changing the term $\mathop {\sum }\limits_{q\in {\mathbf{Q}}} x_{kq}\beta _q^i$ to $\mathop {\sum }\limits_{q\in {\mathbf{Q}}} x_{kq}\beta _q^{ij}$, with

$$\beta _q^{ij} = \beta _q + (\beta \times \omega _i \times \delta _j)_q.$$

(4)

For comparison, GWAS was also performed with 175,810 SNPs (minor allele counts ≥ 3 and r² ≠ 1), analogously as described above.

Favorable and unfavorable haplotypes and their effect stability

The number of environments in which a haplotype was significant was estimated by generating 95% confidence intervals (CI = effect estimate ± 1.96 × standard error) based on Eq. (2), following Millet et al.³⁵. A CI not including 0 indicated significance of the haplotype in a given environment. Haplotypes with constant effect sign across significant environments were classified as favorable or unfavorable. For EV_V4, EV_V6, PH_V4, and PH_V6 positive (negative) effects were defined as favorable (unfavorable). For LO and TILL negative (positive) effects were defined as favorable (unfavorable). No classification was made for PH_final, FF, and MF, as breeding goals vary for these traits. Haplotypes with changing sign of significant effects in different environments were classified as interacting.

Haplotypes associated with multiple traits

We tested if haplotypes identified for early plant development also had an effect on other traits using a bivariate model, similar to Stich et al.⁶⁶:

$$y_{tijk} = \mu _t + \omega _{ti} + \delta _{tj} + x_k\beta _t + u_{tk} + e_{tijk},$$

(5)

where y_tijk is the phenotypic value (BLUE) for trait t of line k belonging to landrace j tested in environment i; μ_t is the intercept for trait t; ω_ti is the fixed effect of environment i for trait t; δ_tj is the fixed effect of landrace j for trait t; x_k is the genotype score (0 or 2) of line k for the tested haplotype; β_t is the fixed effect of the haplotype for trait t; u_tk is the random genotypic effect of line k for trait t, with u ~ N(0, G ⊗ K); and e_tijk is the residual with e~N(0, E ⊗ I_n). G and E correspond to the (t × t) genetic and error variance–covariance matrices among traits pertaining to the model defined in Eq. (5), respectively, and ⊗ denotes the Kronecker product. Haplotypes for which the 95% CIs for both β_t did not include 0 were considered significant for both traits. The proportion of genetic variance explained per trait by significant haplotypes was estimated by calculating the respective reduction in G between models including and excluding the haplotype.

Haplotype comparison between landraces and breeding lines

We assessed frequency distributions of identified trait-associated favorable and unfavorable landrace haplotypes in the panel of 65 breeding lines, and compared them with 500 haplotypes randomly drawn out of the set of haplotypes occurring at least three times in the landrace panel. Significance for differences in means between the frequencies of favorable and random haplotypes, as well as unfavorable and random haplotypes was tested with the Mann–Whitney test (two-sided). When tracking potentially shared ancestral haplotypes between populations, the probability of a haplotype being broken up by recombination depends on the haplotype length, the recombination rate in the respective genomic region and the time span back to the most recent common ancestor. To evaluate to what extent recombination might have occurred in the haplotypes constructed in this study, we considered the physical and genetic length of each haplotype, as well as haplotype similarity (1 – H_hap) and nR within the respective genomic windows.

To evaluate the effect of the selected focus haplotype relative to the alternative haplotypes in a given ten-SNP window, we followed the approach of Bustos-Korts et al.⁶⁷, extending Eq. (2) to:

$$y_{ijk} = \mu + \omega _i + \delta _j + \mathop {\sum }\limits_{q\in {\mathbf{Q}}^\prime } x_{kq}\beta _q^i + x_{kh}\beta _h^i + u_k + e_{ijk},$$

(6)

where ${\mathbf{Q}}^\prime$ represents the set of haplotypes Q as described above without the respective focus haplotype of the window tested, x_kh identifies the haplotype (categorical variable) in the window tested carried by line k, and $\beta _h^i$ represents the effect of the respective haplotype relative to the focus haplotype. Similar as above, significance of haplotype effects relative to the focus haplotype was determined by constructing 95% CIs. We further estimated the proportion of genetic variance explained by the given window by calculating the reduction in genetic variance between the null model (without $\mathop {\sum }\limits_{q\in {\mathbf{Q}}^\prime } x_{kq}\beta _q^i + x_{kh}\beta _h^i$) and the model with the $x_{kh}\beta _h^i$ term.

To evaluate to what extent haplotypes with favorable or unfavorable effects in landraces also have favorable or unfavorable effects in elite material, respectively, we compared performance levels between the landrace-derived DH lines and the 14 breeding lines used as checks. As phenotypic data for the 14 breeding lines were only available for 2017, only the six environments from 2017 were considered. For some traits, differences in means between the landraces were observed²⁸, thus comparisons were conducted for each landrace separately. Significance for differences in means between the respective landrace and the 14 checks was tested based on 10,000 permutations (two-sided test). In addition to a comparison of the overall performance level between all lines of the respective landrace and the 14 breeding lines, we compared means between groups of lines carrying a particular haplotype and lines not carrying the haplotype.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

A reporting summary for this article is available as a Supplementary Information file. The datasets generated and analyzed during the current study are available from the corresponding author upon request. Seeds from all genotypes used in the study are available through material transfer agreements. The genotypic data of 941 DH lines and the phenotypic data of 899 DH lines and 14 breeding lines are available in figshare (https://doi.org/10.6084/m9.figshare.12137142). The 600k data of 63 breeding lines can be accessed at figshare (https://doi.org/10.6084/m9.figshare.3427040.v1), while for two lines genotypic data based on whole-genome sequences were downloaded from CyVerse Data Store (http://cbsusrv04.tc.cornell.edu/users/panzea/download.aspx?filegroupid=34). The Source data underlying Figs. 1 and 3–5, as well as Supplementary Figs. 2, 3, 5, and 7–10 are provided as a Source data file. Source data are provided with this paper.

Code availability

All custom codes used in the article can be found at Github (https://github.com/Manfred-Mayer/GWAS_DHs_landraces).

References

McCouch, S. et al. Feeding the future. Nature 499, 23–24 (2013).
Article ADS CAS PubMed Google Scholar
Sood, S., Flint-Garcia, S., Willcox, M. C. & Holland, J. B. In Genomics of Plant Genetic Resources: Volume 1. Managing, Sequencing and Mining Genetic Resources (eds. Tuberosa, R., Graner, A. & Frison, E.) 615–649 (Springer Netherlands, Dordrecht, 2014).
Mir, C. et al. Out of America: tracing the genetic footprints of the global diffusion of maize. Theor. Appl. Genet. 126, 2671–2682 (2013).
Article CAS PubMed Google Scholar
Doebley, J. F., Goodman, M. M. & Stuber, C. W. Exceptional genetic divergence of Northern Flint corn. Am. J. Bot. 73, 64–69 (1986).
Article PubMed Google Scholar
Romay, M. C. et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 14, R55 (2013).
Article PubMed PubMed Central CAS Google Scholar
Gouesnard, B. et al. Genotyping-by-sequencing highlights original diversity patterns within a European collection of 1191 maize flint lines, as compared to the maize USDA genebank. Theor. Appl. Genet. 130, 2165–2189 (2017).
Article CAS PubMed Google Scholar
Duvick, D. N., Smith, J. S. C. & Cooper, M. In Corn: Origin, History, Technology, and Production (eds. Smith, C. W., Betran, J. & Runge, E. C. A.) (Wiley, New York, 2004).
Reif, J. C. et al. Trends in genetic diversity among European maize cultivars and their parental components during the past 50 years. Theor. Appl. Genet. 111, 838–845 (2005).
Article PubMed Google Scholar
Lobell, D. B. & Tebaldi, C. Getting caught with our plants down: the risks of a global crop yield slowdown from climate trends in the next two decades. Environ. Res. Lett. 9, 074003 (2014).
Article ADS Google Scholar
Hartfield, M. & Otto, S. P. Recombination and hitchhiking of deleterious alleles. Evolution 65, 2421–2434 (2011).
Article PubMed Google Scholar
Bouchet, S. et al. Association mapping for phenology and plant architecture in maize shows higher power for developmental traits compared with growth influenced traits. Heredity 118, 249–259 (2017).
Article CAS PubMed Google Scholar
Flint-Garcia, S., Jampatong, C., Darrah, L. & McMullen, M. D. Quantitative trait locus analysis of stalk strength in four maize populations. Crop Sci. 43, 13–22 (2003).
Article CAS Google Scholar
Khush, G. S. Green revolution: the way forward. Nat. Rev. Genet. 2, 815–822 (2001).
Article CAS PubMed Google Scholar
Gamuyao, R. et al. The protein kinase Pstol1 from traditional rice confers tolerance of phosphorus deficiency. Nature 488, 535–539 (2012).
Article ADS CAS PubMed Google Scholar
Bailey-Serres, J. et al. Submergence tolerant rice: SUB1’s journey from landrace to modern cultivar. Rice 3, 138–147 (2010).
Article Google Scholar
Romero Navarro, J. A. et al. A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nat. Genet. 49, 476–480 (2017).
Article CAS PubMed Google Scholar
Yu, X. et al. Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat. Plants 2, 16150 (2016).
Article CAS PubMed Google Scholar
Huang, X. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42, 961–967 (2010).
Article CAS PubMed Google Scholar
Zhang, T. et al. Identification of loci associated with drought resistance traits in heterozygous autotetraploid alfalfa (Medicago sativa L.) using genome-wide association studies with genotyping by sequencing. PLoS ONE 10, e0138931 (2015).
Article PubMed PubMed Central CAS Google Scholar
Milner, S. G. et al. Genebank genomics highlights the diversity of a global barley collection. Nat. Genet. 51, 319–326 (2019).
Article CAS PubMed Google Scholar
Zhao, K. et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2, 467 (2011).
Article ADS PubMed CAS Google Scholar
Brachi, B., Morris, G. P. & Borevitz, J. O. Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol. 12, 232 (2011).
Article PubMed PubMed Central Google Scholar
Greaves, J. A. Improving suboptimal temperature tolerance in maize- the search for variation. J. Exp. Bot. 47, 307–323 (1996).
Article CAS Google Scholar
Revilla, P., Butrón, A., Cartea, M. E., Malvar, R. A. & Ordas, A. In Abiotic Stresses: Plant Resistance Through Breeding and Molecular Approaches (eds. Ashraf, M. & Harris, P. J. C.) (The Haworth Press, Inc, New York, 2005).
Rodríguez, V. M., Romay, M. C., Ordás, A. & Revilla, P. Evaluation of European maize (Zea mays L.) germplasm under cold conditions. Genet. Resour. Crop Evol. 57, 329–335 (2010).
Article Google Scholar
Mayer, M. et al. Is there an optimum level of diversity in utilization of genetic resources? Theor. Appl. Genet. 130, 2283–2295 (2017).
Article PubMed PubMed Central Google Scholar
Unterseer, S. et al. A comprehensive study of the genomic differentiation between temperate Dent and Flint maize. Genome Biol. 17, 137 (2016).
Article PubMed PubMed Central CAS Google Scholar
Hölker, A. C. et al. European maize landraces made accessible for plant breeding and genome-based studies. Theor. Appl. Genet. 132, 3333–3345 (2019).
Article PubMed PubMed Central Google Scholar
Beavis, W. D. In Molecular Dissection of Complex Traits (ed. Paterson, A. H.) 145–162 (CRC Press, New York, 1998).
Melchinger, A. E., Utz, H. F. & Schön, C. C. Quantitative trait locus (QTL) mapping using different testers and independent population samples in maize reveals low power of QTL detection and large bias in estimates of QTL effects. Genetics 149, 383–403 (1998).
CAS PubMed PubMed Central Google Scholar
Doebley, J., Stec, A. & Gustus, C. Teosinte branched1 and the origin of maize: evidence for epistasis and the evolution of dominance. Genetics 141, 333–346 (1995).
CAS PubMed PubMed Central Google Scholar
Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624–626 (1968).
Article ADS CAS PubMed Google Scholar
Dwivedi, S. L. et al. Landrace germplasm for improving yield and abiotic stress adaptation. Trends Plant Sci. 21, 31–42 (2016).
Article CAS PubMed Google Scholar
Korte, A. & Farlow, A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9, 29 (2013).
Article CAS PubMed PubMed Central Google Scholar
Millet, E. J. et al. Genome-wide analysis of yield in Europe: allelic effects vary with drought and heat scenarios. Plant Physiol. 172, 749–764 (2016).
ADS CAS PubMed PubMed Central Google Scholar
Gabriel, S. B. et al. The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002).
Article ADS CAS PubMed Google Scholar
Kim, S. A., Cho, C.-S., Kim, S.-R., Bull, S. B. & Yoo, Y. J. A new haplotype block detection method for dense genome sequencing data based on interval graph modeling of clusters of highly correlated SNPs. Bioinformatics 34, 388–397 (2017).
Article PubMed Central CAS Google Scholar
Pook, T. et al. HaploBlocker: creation of subgroup-specific haplotype blocks and libraries. Genetics 212, 1045–1061 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ran, F. A. et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154, 1380–1389 (2013).
Article CAS PubMed PubMed Central Google Scholar
Gorjanc, G., Jenko, J., Hearne, S. J. & Hickey, J. M. Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics 17, 30 (2016).
Article PubMed PubMed Central CAS Google Scholar
Brauner, P. C. et al. Genomic prediction within and among doubled-haploid libraries from maize landraces. Genetics 210, 1185–1196 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bernardo, R. Genomewide selection when major genes are known. Crop Sci. 54, 68–75 (2014).
Article Google Scholar
Mascher, M. et al. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat. Genet. 51, 1076–1081 (2019).
Article CAS PubMed Google Scholar
Haberer, G. et al. European maize genomes highlight intraspecies variation in repeat and gene content. Nat. Genet. 52, 950–957 (2020).
Unterseer, S. et al. A powerful tool for genome analysis in maize: development and evaluation of the high density 600 k SNP genotyping array. BMC Genomics 15, 823 (2014).
Article PubMed PubMed Central Google Scholar
Jiao, Y. et al. Improved maize reference genome with single-molecule technologies. Nature 546, 524–527 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bukowski, R. et al. Construction of the third-generation Zea mays haplotype map. Gigascience 7, 1–12 (2018).
Article PubMed Google Scholar
Gower, J. C. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966).
Article MathSciNet MATH Google Scholar
Wright, S. Evolution and the Genetics of Populations: Variability within and among Natural Populations (The University of Chicago Press, Chicago, IL, 1978).
Google Scholar
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2018).
Article CAS Google Scholar
Hill, W. G. & Robertson, A. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226–231 (1968).
Article CAS PubMed Google Scholar
Hill, W. G. & Weir, B. S. Variances and covariances of squared linkage disequilibria in finite populations. Theor. Popul. Biol. 33, 54–78 (1988).
Article MathSciNet CAS PubMed MATH Google Scholar
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2019).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, New York, 2016).
Book MATH Google Scholar
Zeileis, A. & Grothendieck, G. zoo: S3 infrastructure for regular and irregular time series. J. Stat. Softw. 14, 6 (2005).
Article Google Scholar
Botstein, D., White, R. L., Skolnick, M. & Davis, R. W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314–331 (1980).
CAS PubMed PubMed Central Google Scholar
Nei, M. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70, 3321–3323 (1973).
Article ADS CAS PubMed MATH PubMed Central Google Scholar
Hudson, R. R. & Kaplan, N. L. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164 (1985).
CAS PubMed PubMed Central Google Scholar
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Article CAS PubMed PubMed Central Google Scholar
Astle, W. & Balding, D. J. Population structure and cryptic relatedness in genetic association studies. Stat. Sci. 24, 451–471 (2009).
Article MathSciNet MATH Google Scholar
Wimmer, V., Albrecht, T., Auinger, H. J. & Schön, C. C. synbreed: a framework for the analysis of genomic prediction data using R. Bioinformatics 28, 2086–2087 (2012).
Article CAS PubMed Google Scholar
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
MathSciNet MATH Google Scholar
Butler, D., Cullis, B. R., Gilmour, A. & Gogel, B. ASReml-R Reference Manual (The State of Queensland, Department of Primary Industries and Fisheries, Brisbane, 2009).
Kenward, M. G. & Roger, J. H. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53, 983–997 (1997).
Article CAS PubMed MATH Google Scholar
Stich, B., Piepho, H.-P., Schulz, B. & Melchinger, A. E. Multi-trait association mapping in sugar beet (Beta vulgaris L.). Theor. Appl. Genet. 117, 947–954 (2008).
Article PubMed Google Scholar
Bustos-Korts, D. et al. Exome sequences and multi-environment field trials elucidate the genetic basis of adaptation in barley. Plant J. 99, 1172–1191 (2019).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We are indebted to the technical staff at KWS SAAT SE & Co. KGaA (KWS), Misión Biológica de Galicia, Spanish National Research Council (CSIC), Technical University of Munich, and University of Hohenheim for their skilled support in carrying out the extensive phenotypic evaluation required for this study. We are grateful to Therese Bolduan (KWS) and Tanja Rettig (KWS) for their contribution in planning and conducting field trials. We thank the technical staff at KWS for DNA extraction, as well as Hans Rudolf Fries (Technical University of Munich) for processing the genotyping arrays and Stefan Schwertfirm and Sylwia Schepella for technical assistance. We thank Fred van Eeuwijk and Emilie Millet (both Wageningen University and Research) for fruitful discussions on statistical analyses. This study was funded by the Federal Ministry of Education and Research (BMBF, Germany) within the scope of the funding initiative Plant Breeding Research for the Bioeconomy (Funding ID: 031B0195, project MAZE) and by KWS under Ph.D. fellowships for M.M. and A.C.H.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
Manfred Mayer, Armin C. Hölker, Eric González-Segovia & Chris-Carolin Schön
Campus Office, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
Eva Bauer
KWS SAAT SE & Co. KGaA, 37574, Einbeck, Germany
Thomas Presterl & Milena Ouzunova
Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593, Stuttgart, Germany
Albrecht E. Melchinger

Authors

Manfred Mayer
View author publications
You can also search for this author in PubMed Google Scholar
Armin C. Hölker
View author publications
You can also search for this author in PubMed Google Scholar
Eric González-Segovia
View author publications
You can also search for this author in PubMed Google Scholar
Eva Bauer
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Presterl
View author publications
You can also search for this author in PubMed Google Scholar
Milena Ouzunova
View author publications
You can also search for this author in PubMed Google Scholar
Albrecht E. Melchinger
View author publications
You can also search for this author in PubMed Google Scholar
Chris-Carolin Schön
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

C.-C.S., M.O., and M.M. conceived the study; C.-C.S., M.O., and A.E.M. acquired funding for the study; M.M., A.C.H., E.B., T.P., M.O., A.E.M., and C.-C.S. generated phenotypic and genotypic data; A.C.H. contributed to analyses of phenotypic data; E.G.-S. contributed to haplotype construction; M.M. performed analyses and drafted the manuscript; C.-C.S. edited the manuscript; all authors read and approved the final manuscript.

Corresponding author

Correspondence to Chris-Carolin Schön.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks James Holland, Yunbi Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review

Reporting Summary

Description of Additional Supplementary Files

Supplementary Data 1

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Mayer, M., Hölker, A.C., González-Segovia, E. et al. Discovery of beneficial haplotypes for complex traits in maize landraces. Nat Commun 11, 4954 (2020). https://doi.org/10.1038/s41467-020-18683-3

Download citation

Received: 12 May 2020
Accepted: 06 September 2020
Published: 02 October 2020
DOI: https://doi.org/10.1038/s41467-020-18683-3

This article is cited by

Identification of superior haplotypes for seed protein content in pigeonpea (Cajanus cajan L.)
- Harsha Vardhan Rayudu Jamedar
- Prasad Gandham
- Rachit K. Saxena
Journal of Plant Biochemistry and Biotechnology (2024)
Assessing sequence variation, haplotype analysis and molecular characterisation of aspartate kinase2 (ask2) gene regulating methionine biosynthesis in diverse maize inbreds
- Hriipulou Duo
- Rashmi Chhabra
- Firoz Hossain
Molecular Genetics and Genomics (2024)
Assessing the potential of genetic resource introduction into elite germplasm: a collaborative multiparental population for flint maize
- Dimitri Sanchez
- Antoine Allier
- Alain Charcosset
Theoretical and Applied Genetics (2024)
Effects of marker density on genomic prediction for yield traits in sweet corn
- Guilherme Repeza Marquez
- Shichen Zhang-Biehn
- Gustavo Vitti Moro
Euphytica (2024)
Strategies to assess genetic diversity for crop breeding
- Tomohito Ikegaya
- Kenta Shirasawa
- Kenji Fujino
Euphytica (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.