Introduction

Genetic mapping of important agronomic traits, followed by marker-assisted selection (MAS), provides a powerful tool for crop genetic improvement. Genes can be mapped through four basic approaches: linkage analysis using bi- or multi-parental populations, association or linkage disequilibrium (LD) analysis using natural populations, comparative analysis using mutated populations and near-isogenic (introgression) lines, and selective analysis using sub-populations based on selective sweeps.

Association mapping has been used to detect the underlying major genes in the gene pools and their introgression to improve traits in major crop breeding programs1. It has been based on two basic methods, one using candidate gene-based markers to confirm the association2 and the other using whole genome scan3, the latter being called genome wide association studies (GWAS). GWAS using single nucleotide polymorphism (SNP) marker loci has successfully identified genes and pathways for agronomic traits in many crops of economic importance, including rice4, maize5, wheat6, sorghum7 and barley8. This method generally consists of five stages: selection of diverse germplasm, estimation of the level of population structure, phenotypic evaluation, genotyping for candidate genes or whole genome genotyping, and statistical test for genotype-phenotype association9.

In contrast to linkage mapping, GWAS based on linkage disequilibrium (LD) offers a potentially useful and robust approach for mapping causal genes with moderate or large effects10. It has several advantages: extensive genetic variation in a more representative genetic background, higher resolution, and utilization of historic phenotypic data on cultivars without the need to develop special mapping populations11. The simplest statistical model for GWAS focuses on single-SNP tests, and the test results frequently show a high rate of false positives owing to specific problems such as population structure, relatedness and polygenic background effects. Therefore, a variety of statistical methods have been developed, including the mixed linear model (Q + K model), the most popular method, which effectively controls false positives by incorporating population structure (Q) and a relative kinship matrix (K)12; the multi-trait mixed model (MTMM) for multiple traits13; the multi-locus mixed-model (MLMM) based on multiple loci14; the factored spectrally transformed linear mixed model (FaST-LMM) with the number and square of rank of the relationship among individuals15; settlement of MLM under progressively exclusive relationship (SUPER), using influential bin markers and a small set of markers to define the relationship among the individuals16; the multi-trait set linear mixed-model (mtSet-LMM) between sets of variants and multiple traits17; and a random-SNP-effect MLM (RMLM) with a modified Bonferroni correction, together with a multi-locus model with less rigorous selection criteria from RMLM (MRMLM)18.
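For reference, the Q + K mixed linear model mentioned above is often written in a form like the one below. This is a sketch in commonly used notation, not the exact parameterization of any of the cited methods; all symbols are assumptions introduced here for illustration.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\alpha + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N(\mathbf{0},\, \sigma_g^2 \mathbf{K}),
\quad
\mathbf{e} \sim N(\mathbf{0},\, \sigma_e^2 \mathbf{I})
```

Here y is the vector of phenotypes, Xβ collects fixed effects other than the tested marker, S holds the genotypes of the SNP being tested with effect α, Qv accounts for population structure, and Zu is the polygenic background whose covariance is proportional to the kinship matrix K. The scaling of K (for example, a factor of 2) differs among implementations.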

There are many types of populations that have been used in genetics, genomics and crop improvement19,20. These populations have been used individually, and in very few cases, in combination. The primary objective for this article was to review all available populations, and introduce the concept of multiple-hybrid population (MHP) as a new population type, which is more suitable for GWAS in hybrid crops using hybrid vigor. Using maize as an example, we developed an MHP from diallel and NC II mating designs. We will present the experimental design, parent classification, data analysis strategies and applications of the MHP in conventional quantitative genetics, GWAS, and breeding.

Populations in Genetics, Genomics and Crop Improvement

Populations used in genetics, genomics and crop improvement have their advantages and disadvantages. For the convenience of comparison, we will first discuss all available populations and then focus on the MHP in the next section.

Biparental populations

The populations derived from two parental lines are most widely used in genetics, genomics and breeding, with the following advantages and disadvantages:

  • F2: containing sufficient genetic information, but genotypes cannot be distinguished by phenotyping alone owing to the presence of heterozygotes; phenotypes are therefore usually measured on the derived F2:3 families, progenies produced by bulked pollination of F2 individuals, with the F2 genotypes maintained through multiple tillers and ratooning procedures19,20.

  • BC1: including two types of genotypes which could be distinguished theoretically just by phenotyping if the target gene is dominant, but difficult to maintain the materials for long time, although the genotypes may be maintained by following the procedures for F2 individuals as discussed above.

  • RIL (recombinant inbred line): consisting of genetically stable genotypes, which can be maintained permanently without change of the population constitution, and suitable for distinguishing dominant and co-dominant statuses, but difficult to be bred for some species, especially for cross-pollinated plants21,22.

  • DH (doubled haploid): including homogeneous and breeding-true genotypes and reflecting the gene segregation and recombination rate of F1 gametes, with different protocols available for the development of DH lines including microspore culture, chromosome elimination, and haploid inducers, depending on crop species19,20.

An immortalized F2 population can be developed through crosses among RIL or DH populations, which can be used permanently for QTL analysis20,23.

Multiparental populations

There are three major types of multi-parental populations: four-way crosses (4WC), NAM (nested association mapping) and MAGIC (multiparent advanced generation inter-cross). The 4WC, widely used in commercial animal and plant breeding, are formed by a cross between two hybrids, F1 (P1 × P2) and F1′ (P3 × P4), by which genes and QTL can be identified24, with epistatic QTL mapping developed using penalized maximum likelihood (PML)25.

A NAM population is developed by crossing a reference genotype (line) with a panel of lines chosen to maximize genetic diversity, after which RIL families are developed. Different sets of RIL populations share a common parental line, while RILs within each population are derived from two parents. The NAM population combines the advantages of both bi-parental segregating and natural populations, and can be used for integrated linkage and association mapping26. The first NAM population in plants was developed by using 200 RILs from each of the 25 crosses derived from 25 diverse inbred lines crossed with a reference inbred line, B73, which resulted in nearly 5000 RILs in total27. This NAM population has been successfully applied to the study of flowering time28, resistance to southern corn leaf blight29, leaf architecture30 and carbon and nitrogen metabolism31. However, 48.7% of marker genotypes among these RILs were from B73, so allelic frequencies from the common parent are much higher than those from the other 25 parents, and phenotypes may deviate because there is only one common parent.

The MAGIC population starts with multiple bi-parental intercrosses based on variety-specific founders selected, and then every two bi-parental F1s are intermated to produce a double hybrid. The two double hybrid F1s are intermated again to develop a hybrid with eight parents involved20. This process continues until all parental lines are included in a final hybrid, and then RILs can be developed by selfing at least five generations or DHs generated through an available protocol32. The first MAGIC population has been developed in Arabidopsis thaliana by using a set of 527 RILs descended from a heterogeneous stock of 19 intermated accessions32. MAGIC populations have been developed to significantly increase the effectiveness of whole genome scan in wheat33, rice34 and maize35.

Mating-design populations

There are several populations that can be derived from specific mating designs, including:

  • Diallel: multiple parents are crossed to produce all possible bi-parental F1s.

  • North Carolina Design (NC) I: some individuals randomly selected as males from a biparental F2 population are crossed with others randomly selected as females.

  • NC II: n parental lines are divided into two groups, one group as males and the other as females, to produce hybrids of all possible combinations.

  • NC III: n individuals are selected from an F2 population to backcross its parents, P1 and P2.

  • TTC (triple test crosses): an extension of NC III, i.e., n individuals (n > 20) are selected from an F2 population to backcross with both parental lines, P1 and P2, and their F1s.

  • sTTC (simplified triple testcrosses): n cultivars or strains are selected from the germplasm pool to cross with two varieties or strains, PH and PL, with the highest and lowest phenotypic values.
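The two designs used later for the MHP, the diallel and NC II, can be sketched as simple cross lists. This is a minimal illustration assuming a half diallel without selfs or reciprocals (as in Griffing IV) and a complete factorial NC II; the function names and parent labels are hypothetical.

```python
from itertools import combinations, product


def half_diallel(parents):
    """All n(n-1)/2 single crosses among the parents (no selfs, no reciprocals)."""
    return list(combinations(parents, 2))


def nc2(females, males):
    """Complete NC II: every line in one group crossed with every line in the other."""
    return list(product(females, males))


# Hypothetical parent labels, for illustration only.
diallel_crosses = half_diallel(["P1", "P2", "P3", "P4"])
nc2_crosses = nc2(["P1", "P2"], ["P3", "P4", "P5"])

print(len(diallel_crosses), "diallel crosses:", diallel_crosses)
print(len(nc2_crosses), "NC II crosses:", nc2_crosses)
```

With four parents the half diallel yields six crosses, and a 2 × 3 NC II also yields six; a partial design is obtained by dropping crosses from either list.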

Natural populations

Natural populations are those consisting of many individuals that are fixed or can be maintained individually through selfing or outcrossing, including inbred lines, varieties, landraces and wild relatives, and they have been widely used in association mapping. There are two types of natural populations. Those composed of elite lines have lower diversity and slower LD decay over a longer distance, while those built from a core collection of diverse germplasm have rich genetic diversity and more rapid LD decay over a shorter distance. Both types can be combined in applications. First, rough mapping for a particular trait can be performed with the lower-diversity populations and a small number of markers covering the whole genome; then, using natural populations with rich diversity in the genomic region of interest, a particular gene can be fine mapped for gene cloning36.

Multiple-Hybrid Populations (MHP) for GWAS

Populations described above, except for those derived from various mating designs, are most appropriate for crops that use inbreds or varieties for commercialization but are not suitable for hybrid crops that use hybrids for production, as genetic research based on inbreds reveals neither what is hidden in hybrids nor what is associated with hybrid performance. Populations for hybrid crops and hybrid performance may come from two sources: one consists of testcrosses in which all individuals from a population are crossed with one or several testers, and the other consists of a large number of hybrids generated from mating designs such as those described in the previous section, which can simply be called a multiple-hybrid population (MHP). Considering all available mating designs that have been used in quantitative genetics, hybrids from a full diallel design with n parents would be one of the best MHPs, as the full set of hybrids can be divided into three subsets, two diallels and one NC II, as needed. In one such attempt with a limited number of hybrids, a partial NC II was used for QTL mapping to test effects associated with traits, and the significant effects identified by an empirical Bayesian method were used to estimate genetic effects for the missing F1s, with elite parents and hybrids predicted37. The generated hybrids, along with the parental genotypes and phenotypes, can be used to predict the missing hybrids and the remaining hybrids. The relative sizes of the three subsets can be optimized to best match different ecotypes of parental lines, and the number of hybrids can be optimized so that the largest “genetic gain” is achieved with the smallest possible sample size (Fig. 1). As an example, we developed an MHP with hybrids from Griffing IV diallel crosses and an NC II design using temperate and tropical elite maize inbreds as parental lines.

Figure 1: Development of MHP with diallel and NC II mating designs.

(a) An MHP developed in this study with two sets of diallel hybrids, one from 26 temperate maize inbred lines (blue triangle, P1-26) and the other from 17 tropical maize inbred lines (red triangle, P35-51), and one set of NC II hybrids (green box) with 13 temperate (P16-28) by 21 tropical inbred lines (P29-51). See Table 1 for the parent numbers and names. The 724 maize hybrids made in this study can be used to predict the missing hybrids (empty boxes) and the remaining hybrids (empty areas) that can be made from all the parents (any of the 1275 hybrids that can be derived from the full diallel set with 51 parents). (b) A full diallel (largest triangle) can be divided into three subpopulations (two diallels and one NC II). Depending on the numbers of parental lines from different ecotypes, the three subpopulations, represented by blue, green and red, may vary to optimize crossing designs. The maize MHP developed in this study is shown on the bottom left.
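The partition of a full diallel into two within-group diallels and one between-group NC II can be checked with a few lines of arithmetic. The sketch below uses the parent and hybrid counts given in the text (28 temperate and 23 tropical parents; 325, 263 and 136 hybrids made); the function names are hypothetical.

```python
def n_diallel(n):
    """Number of single crosses in a half diallel among n parents."""
    return n * (n - 1) // 2


def n_nc2(n_group_a, n_group_b):
    """Number of crosses in a complete NC II between two groups."""
    return n_group_a * n_group_b


# Full diallel among all 51 parents, split by ecotype (28 temperate, 23 tropical).
within_temperate = n_diallel(28)
within_tropical = n_diallel(23)
between_groups = n_nc2(28, 23)
full_set = n_diallel(51)

# Hybrids actually made, as reported in the text.
made = n_diallel(26) + 263 + n_diallel(17)
missing_nc2 = n_nc2(13, 21) - 263

print(within_temperate, within_tropical, between_groups, full_set)
print("hybrids made:", made, "| missing NC II cells:", missing_nc2)
print("hybrids left to predict:", full_set - made)
```

The three subsets always sum to the full diallel because every pair of parents falls either within one group or between the two groups.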

The parents used for developing an MHP can be a representative sample of the population to which inference is desired, a core collection from a gene bank, varieties or landraces representing the elite germplasm for a breeding program, or a set of inbred lines representing a synthetic outcrossing population. MHPs have several distinct characteristics that set them apart from other types of populations and make them very useful in genetics and plant breeding.

Suitability for combining ability and heterosis analysis

The evaluation of hybrids produced from inbred lines is an important step towards the development of hybrid varieties in crops such as maize, and this process should theoretically cover all possible hybrid combinations (diallel crosses), so that the favorable traits contributed by each inbred can be determined. Diallel and NC II design analyses can provide detailed information on the genetic identity of genotypes, especially on dominance-recessiveness relationships and some genetic interactions38. With diallel or NC designs, many studies have been conducted on general combining ability (GCA), specific combining ability (SCA) and heterosis. For example, 30 crosses were developed in maize from six parents according to the Griffing III method to identify hybrids expressing a high level of heterosis39. An overall increase of GCA and SCA was found in an NC II design between some common testers and 25 mutant lines from SP4 derived from three inbreds carried aboard a satellite, providing useful information for maize breeding through mutation40. A Combined Relative Level (CRL) model using 112 metabolite levels in young roots from four parental inbred lines and their diallel hybrids in maize indicated that parental metabolite profiles can be used together with selected hybrids as a training set to predict the biomass of all possible hybrids41. A complete diallel series of crosses involving eight parents was developed to evaluate GCA and SCA for corn grits42. A partial NC II design for cotton and rapeseed using epistatic association mapping revealed additive and additive-by-additive interaction effects for GCA, and dominance-related effects for SCA. Mid-parent heterosis, dominance and dominance-by-dominance interaction effects affected heterosis more than over-dominance and complete dominance43. The MHP developed in our research will be used to estimate combining ability, heterosis, and genotype-by-environment interaction, and details will be reported elsewhere.
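As a rough illustration of how GCA and SCA are separated, the sketch below computes them from a complete, balanced NC II table of hybrid means using simple deviations from row, column and grand means. It is a textbook-style simplification under those assumptions, not the analysis used in the cited studies; the function name and the example values are hypothetical.

```python
def nc2_combining_ability(table):
    """Estimate GCA and SCA from a complete NC II table of hybrid means.

    table[i][j] is the mean phenotype of the hybrid between female i and male j.
    """
    n_f, n_m = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_f * n_m)
    # GCA of a parent = mean of its hybrids minus the grand mean.
    gca_f = [sum(row) / n_m - grand for row in table]
    gca_m = [sum(table[i][j] for i in range(n_f)) / n_f - grand
             for j in range(n_m)]
    # SCA = what remains of a hybrid's value after both parental GCAs.
    sca = [[table[i][j] - grand - gca_f[i] - gca_m[j] for j in range(n_m)]
           for i in range(n_f)]
    return grand, gca_f, gca_m, sca


# Hypothetical hybrid means (2 females x 2 males), for illustration only.
grand, gca_f, gca_m, sca = nc2_combining_ability([[10.0, 12.0],
                                                  [14.0, 20.0]])
print("grand mean:", grand)
print("female GCA:", gca_f)
print("male GCA:", gca_m)
print("SCA:", sca)
```

With missing crosses or replicated, multi-environment data, a mixed model would normally replace these simple means.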

Easy for sharing through parental lines

Compared to natural populations used for GWAS, for which at least several hundred, or even thousands, of varieties need to be shared, an MHP requires maintaining and sharing far fewer samples. Sharing germplasm accessions that can be used as parental lines is also the more common form of germplasm exchange. For n varieties, n(n−1)/2 hybrids can be produced. For example, 100 varieties can be used to generate 100 × 99/2 = 4,950 hybrids. It would be much more difficult to share 4,950 samples than to share 100 parental lines. In our study, the MHP consists of 724 hybrids generated by crossing just 51 maize inbreds with each other (Fig. 1), which saves time and labor.

Flexibility for testing across diverse environments

Diverse germplasm including inbreds and hybrids can be further investigated to identify new sources of genetic variation for fundamental experiments and commercial breeding. MHPs provide an opportunity for testing across diverse environments because hybrids are better adapted than inbred populations. The most typical example is the large-scale yield trials and multi-location tests conducted before commercialization. Testing of multiple hybrids across diverse environments has also been used in genetic studies. For example, testcrosses between Tx714 and 346 diverse inbreds evaluated in well-watered and non-irrigated trials showed high genetic variance, and of 10 quantitative trait variants for agronomic traits revealed by 60,000 SNPs, three explained 5–10% of phenotypic variation in grain yield under both water conditions44. In our research, hybrids from the MHP showed a high level of genetic and phenotypic variation across different environments (data not shown).

Through multi-environment trials, adapted hybrids can be identified for specific environments. The difficulty of choosing appropriate varieties has contributed to restricted breeding progress for biotic and abiotic stress tolerance in highly variable target environments. Hybrids produced from diverse inbreds will show significantly different responses to diverse environments, and thus the hybrids best adapted to a specific environment can be identified and used for further testing and breeding. In this regard, 91 hybrids made among seven high- and seven low-Fe-Zn content lines were evaluated in six locations, and one low-Fe-Zn parental line showed a significant positive GCA effect and its hybrids emerged as a highly promising variety45. A total of 84 hybrids were investigated for their resistance to the parasitic weed Striga under artificially infested and Striga-free environments in two locations, and three inbreds with high GCA effects for yield and Striga resistance and four high-yielding hybrids showed potential for further use46. In our MHP, some elite hybrids, especially large-scale commercialized ones, showed stronger adaptation under various environmental conditions.

Suitability for both hybrid and inbred crops

Progeny selection is one of the most important stages in plant breeding. Crossing patterns and hybrids such as those generated in an MHP are commonly used in both hybrid and inbred breeding programs. For hybrid crops, the crosses are usually made among heterotic groups to achieve a high level of heterosis and better adaptation to specific locations and crop seasons. A total of 91 hybrids, evaluated along with their parental lines for combining ability and heterotic pattern, showed extensive differentiation for different yield traits, and the three best heterotic patterns identified were potentially useful for commercial maize breeding47. In our MHP, diallel and NC II crosses made between temperate and tropical elite inbreds were used to explore the genetic factors associated with combining ability and heterosis for hybrid development. MHPs can also be used in self-pollinated species, although production of hybrids is much more difficult than in outcrossing species. There are numerous examples of diallel crosses in self-pollinated crops. For example, non-additive gene effects for most yield components were revealed based on 28 rice hybrids from a partial diallel crossing of eight inbreds, five of which showed significant favorable SCA effects on yield48. Earlier stem elongation in floating rice can greatly improve its chance of surviving a flood, and therefore a set of 6 × 6 half-diallel crosses with four floating and two non-floating rice varieties was developed, with results indicating that the additive effect was higher than dominance effects and that the dominant alleles were concentrated in the floating parental lines49. A partial diallel based on six indica and seven japonica rice genotypes was used to investigate the genetic variation of yield and cold tolerance, with significant heterosis and combining ability revealed for the tested characters50.

Savings in genotyping

As only the parental lines need to be genotyped, from which hybrid genotypes can be inferred, using an MHP can save a great deal of investment in genotyping compared to genotyping the same number of inbreds. As calculated in the previous section, genotypes for 4,950 hybrids can be inferred from 100 genotyped parental lines, the former being about 50-fold larger. In Brassica napus, 205 SSR markers were used to examine the polymorphisms among 441 parents (298 sterile and 143 restorer lines), and the genotypes for the partial NC II hybrids could then be deduced from their parents37, with 8 main-effect and 37 interacting QTL identified for oil content, which could be used to predict 10 elite restorer lines, 10 elite sterile lines and 10 elite hybrids.
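The inference of a hybrid genotype from its parents can be sketched as follows. The sketch assumes biallelic SNPs coded as alternate-allele counts (0 or 2 for homozygous inbreds, 1 for residual heterozygosity, None for missing calls); the coding and the function name are assumptions made for illustration, not the procedure of the cited study.

```python
def infer_hybrid_genotype(parent_a, parent_b):
    """Infer F1 genotypes (0/1/2) from two inbred parents coded 0/2 per SNP.

    A locus is left as None when either parent is heterozygous (1) or missing,
    because the transmitted allele is then not known with certainty.
    """
    hybrid = []
    for a, b in zip(parent_a, parent_b):
        if a in (0, 2) and b in (0, 2):
            # Each homozygous parent contributes one copy of its allele.
            hybrid.append((a + b) // 2)
        else:
            hybrid.append(None)
    return hybrid


# Hypothetical genotypes at five SNPs, for illustration only.
line_1 = [0, 2, 2, 1, 0]
line_2 = [0, 0, 2, 2, None]
print(infer_hybrid_genotype(line_1, line_2))
```

Loci returned as None could be imputed or excluded, depending on the downstream analysis.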

A Maize Multiple Hybrid Population

An MHP from diallel and NC II designs

A total of 51 maize inbred lines, representing a broad selection of breeding germplasm from temperate and tropical regions, were used for development of an MHP (Fig. 1; Table 1). These included 28 temperate and 23 tropical inbred lines as parents, by which three subpopulations of MHP were developed: (1) temperate diallel consisting of 325 crosses made in Griffing IV using 26 elite maize inbred lines, including 13 U.S. and 13 Chinese inbred lines that represent different heterotic groups; (2) temperate and tropical NC II consisting of 263 hybrids made between 13 temperate and 21 tropical inbred lines; (3) tropical diallel consisting of 136 crosses made in Griffing IV using 17 tropical inbred lines as parents, most of which come from CIMMYT.

Table 1 51 maize inbred lines used for development of a multiple-hybrid population (MHP).

Chinese temperate inbreds used as MHP parents

Among 15 Chinese temperate inbreds used to develop the MHP (Table 2), Ye478, HZ4 (Huangzao 4), Dan340, Mo17, Tie7922 and Qi319 have been used as common tester lines for six Chinese heterotic groups, Reid, SPT, LRC, Lancaster, PA and PB, respectively51. These key inbreds have been playing a very important role in the development of both inbreds and hybrids across maize regions in China. For example, Chang7-2 and LX9801 are among the lines derived from HZ4 in the summer maize region, the Huang-Huai-Hai River Zone52. A total of 20 commercial hybrids have been developed by using Dan340-derived lines as parents, and 17 elite inbreds have been developed using Dan340’s sister lines53. As a common tester line for heterotic group ‘SPT’, HZ4 has been used to develop more than 40 hybrids and nearly 70 elite inbreds, including Ji853, LX9801 and Chang7-2 (ref. 54). H21, an inbred line with high drought tolerance derived from Pioneer hybrid P78599 (ref. 55), and two superior hybrids made directly from it were planted widely in the Huang-Huai-Hai River Zone. Four elite hybrids widely used in production were developed from Nan21-3, which was developed by selfing elite exotic hybrids56. Qi205, a quality protein maize (QPM), was a breakthrough in protein maize in China57. More than 50 varieties, including Ludan50, Ludan981 and Ludan963, were released with Qi319, which is highly resistant to southern corn rust, and its derived lines as parents58. Over twenty inbreds were derived from Si287, with two widely used hybrids, Jidan27 and Jidan46. Developed by seven continuous selfings from a U.S. hybrid, 3382, Tie7922 was the parent of 30 hybrids and numerous inbreds, including TieC8605-2, Dan9046 and Liao2345 (ref. 59). Ye478 has been used to produce nearly 50 hybrids, such as Zhongdan8, Yedan12, Yedan13, and Yedan19. Zheng22, belonging to the heterotic group LRC, was used in Yuyu18, a hybrid widely used in production. Zheng58 is the female parent of ‘Zhengdan958’, which had been planted on over 4 million ha per year in three consecutive years60. Dan598, developed from the tropical-zone Pioneer hybrid P78599, was used to make several widely planted hybrids, mainly the Danke series61. Huang C was the male parent of one of the best hybrids, Nongda108, which was awarded the First-Class Prize for China National Science and Technology Progress in 2002 (ref. 54).

Table 2 Key Chinese inbreds used in a multiple-hybrid population (MHP) and their contribution to hybrid and inbred breeding programs.

Tropical inbreds used as MHP parents

Tropical germplasm has abundant genetic variability and special resistance to diseases, pests and abiotic stresses. Introgression of favorable genes from tropical maize can broaden the genetic basis of temperate maize, improve abiotic and biotic stress tolerance, optimize heterotic patterns, and help develop improved temperate-tropical hybrids. There are two primary ways to utilize tropical germplasm in temperate maize breeding. One is to conduct adaptive selection of tropical germplasm in transition regions, develop temperate-tropical composite populations, and apply reciprocal recurrent selection for the performance of temperate by tropical hybrids. The other is to directly modify tropical maize that is not adapted to temperate conditions by selection for earliness, a short flowering interval, and disease and lodging resistance. From the 1960s to the early 1980s, U.S. scientists made great efforts to utilize tropical maize. With more than 20,000 varieties of tropical maize collected62, they found that maize inbreds containing tropical germplasm were not only a useful source for expanding the genetic diversity of temperate inbreds, but also competitive in crosses with temperate materials, producing high-yielding hybrids63. For example, temperate by tropical hybrids showed greater adaptation than temperate hybrids under heat stress64. In China, some tropical germplasm, such as Tuxpeno, Suwan and Mohuang9, was introduced in the 1980s, with several excellent inbreds developed, such as 8703, S37, Taixi113 and 8501. In our MHP, 23 tropical inbred lines widely used in breeding programs of China and CIMMYT were included as parental lines, and our results also showed that the tropical lines had great potential for improving yield traits in temperate maize. Among three Chinese tropical inbred lines, Jiao51 was selected from the landrace ‘Jiaomaerhuangzao’ in Guizhou, from which three elite hybrids, Jiaosandanjiao, Andan136 and Yudan11, were released65. Chuan 29 Female was selected from naturally pollinated seeds of a single cross using the pedigree method, with the hybrid Chuandan29 released66. 18-599 was an elite inbred line in southwest China, with three elite hybrids, Shengyu9, Dong315 and Chunxi11, released for planting in mountain and deep hillock regions.

Temperate by tropical hybrids in maize production

Maize exhibits an astounding capacity for environmental adaptation, with wide distribution around the world, but genetic bottlenecks resulting from natural and artificial selection for adaptation and productivity have led to differentiation between temperate and tropical maize. Improving the adaptability of temperate maize largely depends upon the use of tropical maize germplasm, which hosts rich sources of favorable genes and alleles67, particularly for tolerance to abiotic and biotic stresses. A shortcut to the favorable alleles hosted in tropical maize is to develop temperate-tropical hybrids. Considering the significant differentiation between the two groups, the best way to test temperate-tropical crosses is by NC II, where a set of temperate inbreds is crossed with a set of tropical inbreds.

Moving maize from tropical to temperate regions requires selection for adaptation to changing day length and temperature conditions. Such photo-thermal responses have been investigated intensively. An improved population, BS16, was developed by selective adaptation of ETO (developed at Estacion Tulio Ospina), and was characterized by silking 21 days earlier, with increased yield and combining ability68. Through the GEM (Germplasm Enhancement of Maize) program, which aims to move tropical lines into breeding programs, six tropical lines were released69. A collection of 152 tropical populations was used to investigate photoperiod sensitivity under long-day conditions, showing that the highland populations displayed weak photoperiod sensitivity70. Longer leaf stay-green has been considered a visible character of temperate by tropical hybrids, and the ratio of visible source leaves (RSL) showed a significant correlation with grain yield. As a result, RSL can be used as a selection criterion for improving stay-green and yield traits in maize breeding programs71.

Because maize responds to photoperiod and temperature regimes, typical temperate or tropical maize hybrids have limited growing seasons or zones. However, temperate-tropical hybrids, which combine genetic merits from both ecotypes, show much stronger adaptation to environmental conditions. As a result, depending on the relative genetic contribution from the two parental lines, some temperate-tropical hybrids may be planted in wider maize production areas from temperate to tropical zones, while others may only adapt to typical temperate, subtropical or typical tropical regions. Previous research indicated that temperate-tropical hybrids could adapt to more diverse growing environments than temperate-temperate or tropical-tropical hybrids72. An investigation of biofuel potential with 12 temperate-tropical maize hybrids and two grain and silage hybrids indicated that temperate-tropical hybrids showed more stalk biomass and 50% more sugar with supplemental N fertilizer, and could produce equivalent ethanol with a small amount of nitrogen fertilizer73.

Agronomic performance of parental lines and their hybrids

Agronomic traits for inbreds and hybrids included in our MHP were evaluated for three years (2013–2015) in two locations (Beijing and Henan) in a randomized block design with two replications each. Phenotypic performance across environments was evaluated using best linear unbiased predictions (BLUPs)74 estimated with the SAS PROC MIXED procedure, with genotype, environment and year as random effects (Table 3). Compared to temperate parental lines, tropical inbreds showed higher plant and ear heights and delayed growth and development, with significantly more days to tassel, silk and anthesis. For yield traits, however, temperate inbreds were overall better than tropical lines, with higher grain number, row number, hundred-grain weight and grain weight per plant. Among the three hybrid subpopulations, significantly delayed flowering was found in both NC II and tropical diallel hybrids, and a greater anthesis-silking interval was shown in NC II hybrids. For yield traits, NC II crosses showed greater ear length, ear diameter and hundred-grain weight, while temperate diallels had greater grain number.

Table 3 Agronomic traits observed for parental lines and hybrids in a multiple-hybrid population (MHP).

The 28 temperate inbreds used in the MHP could be divided into six heterotic groups, plus four Pioneer germplasm lines as an independent group (Fig. 1; Table 4). The hybrids from Lancaster parental lines showed the best performance for grain number per row, while the hybrids from PB parents showed the highest plant and ear heights, grain weight per plant, hundred-grain weight and delayed flowering with a heterotic group-representative line, Dan340. Reid-derived hybrids had the shortest days to tassel, while Pioneer hybrids had the shortest days to silk and anthesis. LRC hybrids showed lower ear height and greater row number and grain number compared with other groups. Overall, PB germplasm has an edge in increasing yield, Pioneer germplasm can shorten flowering time, and LRC parents can be used to achieve high yield by increasing grain number.

Table 4 Agronomic traits for the hybrids from parental inbreds in different heterotic groups.

In conclusion, tropical germplasm can be used to improve the yield potential for temperate lines, while temperate germplasm, such as those from the U.S., are important donors to broaden genetic basis for specific breeding programs. As high genetic diversity and great trait variation were observed among parental lines and hybrids, the MHP developed in this study can be used in GWAS for many important traits. With known pedigrees and genotypic information from high-density markers, the structure and relationship among parental lines and thus among the hybrids can be well estimated and thus controlled in GWAS.

Genotypic information generated for the MHP

The 51 parental lines used to develop the MHP have been genotyped with a newly developed maize 55K-SNP chip75. The chip contains 30,133 SNPs selected from the 600 K Affymetrix® Axiom® Maize Genotyping Array, which are evenly distributed across the genome, 4,049 highly polymorphic SNPs from the widely used 56K-SNP chip MaizeSNP50, 9,395 SNPs from whole genome RNA-seq76, 4,067 tropical-specific SNPs generated by resequencing to fill the gaps in the B73 reference genome, and 132 SNPs from the tags for published transgenic events. Genotyping the parents with the improved 55K-SNP chip will benefit genetic and breeding studies because of its improved genome coverage compared with other available SNP chips.

Applications of the MHP with diallel and NC II mating designs

The MHP proposed in this article can be used to estimate combining ability and heterosis for yield and yield components by comparing hybrid phenotypes with those of their parents; these estimates can be further analyzed in GWAS along with other traits such as hybrid phenotypes per se. The associated information can be used for genome-wide prediction of hybrid performance. The MHP and its derived secondary populations can also be used as populations in genetics, genomics and plant breeding. As basic information required for applications of the MHP, a cluster analysis was performed to classify the parental lines (Fig. 2a). A neighbor-joining tree was constructed based on Rogers' genetic distance77, which clearly separated the parental lines into two major groups, temperate and tropical. The 28 temperate inbred lines were further divided into six heterotic groups, Lancaster, LRC, Reid, PA, SPT and PB, as revealed in previous studies78,79,80.
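As an illustration of the distance underlying the cluster analysis, the sketch below computes Rogers' distance from per-locus allele frequencies and assembles a pairwise distance matrix of the kind a neighbor-joining routine takes as input. The 0/1 coding of homozygous inbred calls, the function names and the toy lines are assumptions made for the example; missing data handling and the tree building itself are not shown.

```python
import numpy as np

def rogers_distance(freq_a, freq_b):
    """Rogers' distance between two entries.

    freq_a, freq_b: arrays of shape (n_loci, n_alleles) holding allele
    frequencies per locus (each row sums to 1).
    """
    freq_a = np.asarray(freq_a, dtype=float)
    freq_b = np.asarray(freq_b, dtype=float)
    # Per-locus distance, then the average over loci.
    per_locus = np.sqrt(0.5 * ((freq_a - freq_b) ** 2).sum(axis=1))
    return per_locus.mean()

def inbred_to_freq(calls):
    """Turn homozygous biallelic calls (0 or 1) into frequency rows."""
    calls = np.asarray(calls, dtype=float)
    return np.column_stack([1.0 - calls, calls])

def distance_matrix(lines):
    """Pairwise Rogers' distances for a dict {line name: 0/1 calls}."""
    names = list(lines)
    freqs = [inbred_to_freq(lines[name]) for name in names]
    dist = np.zeros((len(names), len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            dist[i, j] = dist[j, i] = rogers_distance(freqs[i], freqs[j])
    return names, dist

# Hypothetical inbred lines scored at four SNPs.
toy_lines = {"line1": [0, 0, 1, 1], "line2": [0, 0, 1, 0], "line3": [1, 1, 0, 0]}
names, dist = distance_matrix(toy_lines)
```

For fully homozygous lines at biallelic SNPs, the distance reduces to the proportion of loci at which the two lines carry different alleles.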

Figure 2: Neighbor-joining tree for the 51 parental inbreds and 724 hybrids (maize multiple-hybrid population, MHP) and LD decay across the whole genome.

(a) The full tree separated the 51 maize inbreds into two major groups based on Rogers' genetic distance. The temperate inbred lines can be further divided into subpopulations corresponding to six Chinese maize heterotic groups. The tropical inbred lines showed a more complex genetic relationship with no definitive subgrouping. (b) The NJ tree for 724 hybrids was used to determine K of population structure for the MHP. (c) Genome-wide average LD for the MHP.

Genetic analysis of combining ability and hybrid performance

Evaluation of combining ability and heterosis is the first step in breeding inbred lines for the development of commercial hybrids. In theory, this evaluation should cover all parental lines and their possible hybrids. Both diallel and NC II designs can provide detailed information on the genetic identity of genotypes, especially on dominance-recessive relationships and some genetic interactions38. As an example, a complete diallel cross involving eight parents was produced to evaluate GCA and SCA for corn grits41. The MHP used in this research can be used to understand combining ability, heterosis, hybrid performance and genotype × environment interaction. This large-scale diallel-NC II design not only increases the power of statistical analysis, but also improves the efficiency of combining ability-based breeding for both temperate and tropical maize.

GWAS based on markers, alleles and haplotypes

Genotyping of parental lines, followed by inference of their hybrid genotypes, can be done with various genotyping platforms, including sequencing (both whole genome sequencing and simplified genome sequencing or genotyping by sequencing, GBS) and chip-based genotyping. After genotyping, various markers such as SNPs, copy number variations (CNVs) and structural variations (SVs), together with their alleles and combinations (haplotypes), can be developed to cover the whole genome. For example, haplotypes have been used to construct maize HapMaps, of which three versions have been developed81,82,83. RNA-seq and other sequencing technologies such as methylation-based sequencing have been used to identify expressed genes, epialleles and haplotypes associated with epigenetics84,85. Resequencing and GBS have been widely used in maize germplasm fingerprinting86, genetic diversity analysis87 and GWAS88.

Using the numerous markers identified through various sequencing technologies, high-density chips have been developed. In maize, several high-density SNP chips are available, including a 56 K SNP chip89, a 600K-SNP chip90 and a new 55K-SNP chip with improved genome coverage75. In humans, high-density chips containing SV markers have also been developed91. These high-density chips can be updated with functional and gene markers so that more gene-related information can be generated. High-density chips provide a quick way to reveal alleles, genotypes and haplotypes for a large number of samples with fixed and comparable markers. They have been widely used in maize germplasm fingerprinting27, genetic diversity analysis80 and GWAS92.

Most currently available studies are limited to a single genotyping method, each of which has specific disadvantages. For example, gene arrays cannot determine accurate positions of target genes in multiple cell types, GBS may generate large quantities of missing data, and whole-genome resequencing provides high coverage but is less focused than targeted resequencing. Therefore, the simultaneous use of multiple genotyping techniques might be imperative. In soybean, 1.4 million tag SNPs were used as a reference to impute a large set of SNPs in a panel of 301 soybean accessions genotyped by whole genome resequencing, GBS and a SNP array (SoySNP50K), and this imputation can be used to fill in missing genotypes and untyped loci with high accuracy93. In maize, CRTRB1, LCYE and other key genes or genomic regions that govern rate-critical steps in the upstream pathway were identified for various carotenoids using GWAS with 380 CIMMYT inbred lines genotyped by the 55 K chip and GBS94.

By genotyping the MHP parental lines with the 55 K chip, the genotypes of hybrids can be deduced. In the current MHP, 724 hybrids have been generated from 51 parental lines, far fewer than the number of all possible single crosses (51 × (51 − 1)/2 = 1275). The SNP loci that are heterozygous in one of the two parental lines are scored as missing. SNP markers with missing data rate <20% were extracted to deduce hybrid genotypes. A total of 37,527 SNPs (with minor allele frequency >5% and missing data rate less than 25%) for 724 hybrids were used for structure analysis and GWAS. Polygenic components and main effects (additive and dominant effects) were estimated with the web tool PEPIS (the pipeline for estimating EPIStatic genetic effects, http://bioinfo.noble.org/PolyGenic_QTL/)95 (Fig. 3). For PEPIS, three files, containing additive genotypic data, dominance data and phenotypic data, were uploaded. The genotype of each hybrid at each marker locus was coded as one of three genotypes: A for one homozygous genotype, B for the other and H for the heterozygous genotype. The K matrix corresponding to the marker-generated kinship was calculated and used to estimate variance components with the restricted maximum likelihood (REML) method. The polygenic structure of the trait was examined with the given variance ratios, and genome scanning was performed for each marker (main) effect and each marker-pair interaction (epistatic) effect95. Of the sub-pipelines in PEPIS, sub-pipeline 1 was used to calculate the kinship matrix, and sub-pipeline 2 was used for polygenic component analysis and genome scanning for main and epistatic effect QTL. For comparison, the classical ‘Q + K’ model for GWAS was also run. Population structure (Q) was determined by fastSTRUCTURE96, the number of subpopulations was corrected using the neighbor-joining genetic distance77, a cluster dendrogram was constructed using FigTree v1.4.2 software (Fig. 2b), and the phenotypic contribution of population structure was estimated with the SAS PROC GLM procedure. The kinship matrix (K) and its contribution to phenotype were calculated with TASSEL 5.0 software97. The LD (r2) between pairwise SNPs was calculated using TASSEL 5.097, with the sliding window size set to 500 and a step of 50 markers (Fig. 2c). Four flowering traits, DTT (days to tassel), DTS (days to silk), DTA (days to anthesis) and ASI (anthesis-silking interval), are taken as examples for GWAS in this report (Fig. 4). Association analyses were conducted using the compressed mixed linear model (CMLM) and the P3D method with 37,527 SNP markers, population structure (Q), kinship matrix (K) and phenotypic information in TASSEL 5.0 software97. The causal genes or QTN were identified with a significance threshold (-log10 P ≥ 5.81) corrected by the Bonferroni test.
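The deduction of hybrid genotypes from parental genotypes and the A/B/H coding described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the missing symbol `N`, the numeric additive (1, 0, −1) and dominance (0, 1, 0) codes and all function names are assumptions, and the exact PEPIS input file format is not reproduced. The filter thresholds default to the values quoted for the final SNP set.

```python
MISSING = "N"  # assumed symbol for a missing call

def n_single_crosses(n_parents):
    """Number of possible single crosses among n parents (no reciprocals)."""
    return n_parents * (n_parents - 1) // 2

def deduce_hybrid_call(p1, p2):
    """Deduce a hybrid call from two parental calls ('A', 'B', 'H' or 'N')."""
    if p1 not in ("A", "B") or p2 not in ("A", "B"):
        return MISSING  # heterozygous or missing parent -> hybrid is missing
    return p1 if p1 == p2 else "H"

def additive_code(call):
    """Assumed additive coding: A = 1, H = 0, B = -1, missing = None."""
    return {"A": 1, "H": 0, "B": -1}.get(call)

def dominance_code(call):
    """Assumed dominance coding: H = 1, A = B = 0, missing = None."""
    return {"A": 0, "H": 1, "B": 0}.get(call)

def passes_filters(calls, min_maf=0.05, max_missing=0.25):
    """Keep a SNP if its missing rate and minor allele frequency pass."""
    called = [c for c in calls if c in ("A", "B", "H")]
    if not calls or not called:
        return False
    if 1 - len(called) / len(calls) > max_missing:
        return False
    a_alleles = sum(2 if c == "A" else 1 if c == "H" else 0 for c in called)
    freq_a = a_alleles / (2 * len(called))
    return min(freq_a, 1 - freq_a) > min_maf

# Hypothetical parents scored at three SNPs.
parent1 = ["A", "B", "H"]
parent2 = ["B", "B", "A"]
hybrid = [deduce_hybrid_call(x, y) for x, y in zip(parent1, parent2)]
```

Here `hybrid` holds one deduced call per SNP, and `n_single_crosses(51)` reproduces the 1275 possible hybrids mentioned in the text.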

Figure 3: Polygenic component ratios of the flowering traits for maize multiple-hybrid population (MHP).

Additive effects accounted for most of the variance for DTT (days to tassel), DTS (days to silk) and DTA (days to anthesis), while additive by additive effects accounted for 50.2% of the variance for ASI (anthesis-silking interval).

Figure 4: GWAS for four flowering traits with five overlapping peaks identified in the genome using a maize multiple-hybrid population (MHP).

(a) Manhattan plots for PEPIS. (b) Manhattan plots for TASSEL. DTT refers to days to tassel, DTS for days to silk, DTA for days to anthesis and ASI for anthesis-silking interval.

Population structure was assessed by fastSTRUCTURE for K values ranging from 1 to 10, and the population was divided into six subgroups after correction with the NJ genetic distance (Fig. 2b). The phenotypic contributions of population structure for DTT, DTS, DTA and ASI were 0.56, 0.58, 0.58 and 0.03, respectively, and the kinship contributions to phenotype were 0.99, 0.99, 0.98 and 0.52, respectively. The distribution of r2 over the whole genome is presented in Fig. 2c. The r2 values declined sharply as the distance increased, and the average LD decay distance was estimated at 100 kb when the cut-off value for r2 was set to 0.2. The polygenic variance component ratios for the flowering traits are shown in Fig. 3. Additive genetic variance accounted for 79.5%, 76.8% and 78.5% of trait variance for DTT, DTS and DTA, respectively, whereas additive by additive genetic variance accounted for 50.2% of the variance for ASI; dominant effects were secondary, accounting for 16.3%, 18.6%, 16.4% and 20.7% for DTT, DTS, DTA and ASI, respectively. Our result is consistent with a previous study reporting that flowering traits are governed mainly by additive effects, with unimportant epistasis28. In PEPIS, the polygenic model with additive, dominant, additive by additive, additive by dominant, dominant by additive and dominant by dominant components was applied to the hybrids; the distribution of main effects, including additive and dominant effects, is shown in Fig. 4a and was confirmed by the results of the additive model in TASSEL (Fig. 4b). Therefore, GWAS with hybrid phenotypes can be run in TASSEL for additive effect analysis of traits that are largely controlled by additive genetic variance. Given that additive effects accounted for most of the variance of the flowering traits and that the purpose of this paper is GWAS with the maize MHP, the analysis of dominant effects will be discussed elsewhere.
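The LD statistic and the cut-off-based decay distance can be illustrated with a short sketch. It assumes phased 0/1 allele calls, as would be available for homozygous inbred lines, and a simple distance-binning rule; it is not TASSEL's implementation, and the function names, bin width and toy values are hypothetical.

```python
import numpy as np

def ld_r2(snp_x, snp_y):
    """r2 between two biallelic SNPs from 0/1 allele calls."""
    x = np.asarray(snp_x, dtype=float)
    y = np.asarray(snp_y, dtype=float)
    p_x, p_y = x.mean(), y.mean()
    d = (x * y).mean() - p_x * p_y  # disequilibrium coefficient D
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return float("nan")  # r2 is undefined for a monomorphic SNP
    return d * d / denom

def decay_distance(dist_kb, r2, cutoff=0.2, bin_kb=10.0):
    """Lower edge (kb) of the first distance bin with mean r2 below cutoff."""
    dist_kb = np.asarray(dist_kb, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    bins = np.floor(dist_kb / bin_kb).astype(int)
    for b in np.unique(bins):  # np.unique returns bins in ascending order
        if r2[bins == b].mean() < cutoff:
            return float(b * bin_kb)
    return None  # r2 never dropped below the cut-off

# Hypothetical pairwise distances (kb) and r2 values.
toy_decay = decay_distance([5, 15, 25, 35], [0.8, 0.5, 0.15, 0.1])
```

With the default cut-off of 0.2, the toy example reports decay in the 20 to 30 kb bin, which is the first bin whose mean r2 is below the cut-off.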

A large number of significant markers were identified for flowering traits, and significant signals around known loci were used to identify candidate genes and QTN co-located with SNPs based on the B73 reference genome (http://www.maizegdb.org/gbowse). Comparing the GWAS analyses from PEPIS and TASSEL, we found consistent results for flowering traits, with five overlapping peak regions showing significant pleiotropic allelic effects for the four flowering traits on chromosomes 2, 3, 6 and 8 (Fig. 4). A peak was mapped close to GRMZM2G065276 on bin2.06, a homolog of FCA on Arabidopsis chromosome 498. In Arabidopsis, FCA and another gene, FPA, encode RNA-binding proteins that downregulate transcription of FLC and promote flowering99. Similarly, two QTN were detected on bin3.07, which were significantly associated with DTS (as qdsilk1 and qdsilk8 mapped in a previous study100). The related locus mdh3 (malate dehydrogenase3), which shows four alleles in Kalmia latifolia101, regulates maize pollen tube growth102 and encodes a cytosolic malate dehydrogenase for glyoxylate cycle activity in Arabidopsis103. The peak on bin6.01 contained GRMZM2G004959, which has a homologous gene in Arabidopsis (AT3G61230.1) expressed in pollen and during tube growth104. Peak SNPs on the short arm of chromosome 8 (bin 8.01) revealed a phosphatidylethanolamine-binding protein (ZCN9) gene GRMZM2G021614105, whose rice homolog LOC_Os01g02120.1 promotes the transition from vegetative to reproductive growth. For the peak mapped on bin 8.05, one candidate gene, GRMZM2G091276, was predicted; its rice homolog OsGH3.5 (LOC_Os05g50890.1) encodes JA-amino acid synthetase1, which modulates light and JA signaling in photomorphogenesis106 and in flower opening and anther dehiscence107.

For ungenerated hybrids (1275 − 724 = 551), performance can be predicted using genomic best linear unbiased prediction (GBLUP)108 or an empirical Bayesian approach37, both of which generally achieve high accuracy. With an optimized scheme, we should be able to predict the performance of many more hybrids while testing far fewer. For example, we may be able to predict a full set of diallel hybrids based on the results from an NC II mating design using the same set of parental lines. A statistical method has been developed and applied to such prediction in rice108, where a set of 278 selected IMF2 hybrids, developed from a RIL population between Zhenshan 97 and Minghui 6322, was used as a training sample to predict the remaining 21,667 hybrids, with a 16% yield increase in the top 100 selected hybrids compared with the average of all hybrids. The results revealed GBLUP as the best method for prediction compared with LASSO and SSVS. GBLUP involves three stages: construction of the respective kinship matrices for the training sample and all hybrids, parameter estimation via cross-validation, and prediction of the missing and remaining hybrids108. The empirical Bayesian approach was developed and used for a partial NC II mating design in Brassica napus including 284 F1 hybrids, in which 143 restorers and 298 sterile lines were used to predict the phenotypes of missing hybrids, with QTL effects estimated by empirical Bayes. GCA and SCA could then be estimated, and elite parents and hybrids could be predicted37. In our following research, the GBLUP approach108 will be used for analysis of the MHP and reported elsewhere. A recent study showed that the predictability of yield with metabolomics data was nearly twofold that of genomic prediction, although the latter would be the most efficient method for high-heritability traits109; hybrid prediction with metabolomics data may be a novel pathway in crop breeding programs that should be explored further in the future.
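The three GBLUP stages named above (kinship construction, parameter estimation by cross-validation, prediction of untested hybrids) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method of ref. 108: the kinship is a simple centered, marker-averaged cross-product, the only tuned parameter is a ratio `lam` of residual to genetic variance chosen from a small hypothetical grid, and the data are simulated.

```python
import numpy as np

def marker_kinship(markers):
    """Stage 1: kinship from marker codes (rows = hybrids, columns = SNPs)."""
    m = np.asarray(markers, dtype=float)
    z = m - m.mean(axis=0)  # center each marker
    return z @ z.T / m.shape[1]

def gblup_predict(k, y_train, train_idx, test_idx, lam):
    """Stage 3: mu + K[test, train] (K[train, train] + lam I)^-1 (y - mu)."""
    y_train = np.asarray(y_train, dtype=float)
    mu = y_train.mean()
    k_tt = k[np.ix_(train_idx, train_idx)]
    k_st = k[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(k_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + k_st @ alpha

def choose_lambda(k, y, idx, grid=(0.1, 1.0, 10.0), n_folds=5, seed=0):
    """Stage 2: pick lam by k-fold cross-validation on the training set."""
    y, idx = np.asarray(y, dtype=float), np.asarray(idx)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(idx)), n_folds)
    best_lam, best_err = None, np.inf
    for lam in grid:
        err = 0.0
        for fold in folds:
            keep = np.ones(len(idx), dtype=bool)
            keep[fold] = False
            pred = gblup_predict(k, y[keep], idx[keep], idx[fold], lam)
            err += ((pred - y[fold]) ** 2).sum()
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Simulated example: 60 hybrids, 200 SNPs, 40 tested and 20 untested.
rng = np.random.default_rng(1)
markers = rng.choice([-1.0, 0.0, 1.0], size=(60, 200))
pheno = markers @ rng.normal(0, 0.1, 200) + rng.normal(0, 0.5, 60)
train, test = np.arange(40), np.arange(40, 60)
K = marker_kinship(markers)
lam = choose_lambda(K, pheno[train], train)
pred = gblup_predict(K, pheno[train], train, test, lam)
```

Only the phenotypes of the tested hybrids enter the prediction; the untested hybrids contribute through their marker-based kinship with the tested ones.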

Breeding using MHPs

Breeding hybrid crops is an important tool that may increase yield by 30 to 400% as a result of heterosis110, and hybrid crops have been used to investigate heterosis for parental inbreds with maximum combining ability. The populations created should have a higher possibility of acquiring diverse and accurate phenotypic performance than natural populations111. Diallel and NC mating designs have been widely used in plant breeding programs to maximize selection response and the opportunity for managing coancestry in breeding populations. A prominent feature of diallel and NC mating designs is that both provide breeding populations with known pedigrees, which can be used to estimate specific parental genetic effects for backward selection112. Estimates of combining ability and heterosis can be used to guide future breeding activities using the hybrids and their parental lines and derivatives. GCA and SCA, two essential factors in developing breeding strategies, are the two main genetic parameters that can be obtained from diallel and NC analysis. GCA reflects the overall merit of an inbred line per se, while SCA is the ability to develop elite hybrids with specific partners. A line × tester population from a collection of 302 diverse inbreds with a common tester, B73, was used to predict heterosis from inbred performance and genetic distance between parents, and the results suggested that heterosis is trait-dependent113. An unbalanced NC II design with 400 hybrids and 79 inbreds was used for joint analysis of hybrids and parental lines to predict the performance of untested hybrids114. A set of NC II crosses between 285 Dent and two Flint inbreds genotyped with 56,110 SNPs and 130 metabolites was used to predict combining ability, allowing a reliable screening of large collections of diverse inbred lines for the potential to create superior hybrids115. A set of 136 hybrids and 17 parental lines was used to estimate combining ability and heterosis for stress and non-stress environments116, and SCA was significantly correlated with genetic distance. The MHP used in this research contains three subpopulations, temperate, tropical and temperate by tropical hybrids, which can be used for breeding different ecotypes and for gene pyramiding and recombination to combine the merits of different maize germplasm sources.
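The roles of GCA and SCA can be made concrete with a small sketch that decomposes a complete, balanced NC II table of hybrid means into a grand mean, parental GCA effects and SCA deviations, together with mid-parent heterosis. This fixed-effect decomposition is an illustrative assumption; a partial or unbalanced design such as the current MHP would call for a mixed model instead. The function names and toy values are hypothetical.

```python
import numpy as np

def nc2_gca_sca(table):
    """Decompose hybrid means: table[i, j] = female i x male j."""
    table = np.asarray(table, dtype=float)
    mu = table.mean()
    gca_female = table.mean(axis=1) - mu  # row means minus grand mean
    gca_male = table.mean(axis=0) - mu    # column means minus grand mean
    # SCA: what remains after removing the mean and both GCA effects.
    sca = table - mu - gca_female[:, None] - gca_male[None, :]
    return mu, gca_female, gca_male, sca

def mid_parent_heterosis(f1, parent1, parent2):
    """Mid-parent heterosis as a percentage of the mid-parent value."""
    mid_parent = (parent1 + parent2) / 2.0
    return 100.0 * (f1 - mid_parent) / mid_parent

# Hypothetical 2 x 2 NC II table of hybrid means.
toy_table = [[10.0, 12.0],
             [14.0, 20.0]]
mu, gca_f, gca_m, sca = nc2_gca_sca(toy_table)
```

In a balanced table the SCA deviations sum to zero within every row and column, so a large positive SCA marks a specific cross that performs better than its two parental GCA values would suggest.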

Secondary populations derived from the MHP

Many secondary populations can be generated from the parental lines and hybrids of the MHP by selfing the hybrids or by crossing among parental lines and hybrids. The secondary populations include not only F2, BC and RIL populations, but also MAGIC and NAM populations. MHP-based research will provide important genetic and genomic information, such as genes, alleles, haplotypes and population structure, about the parental lines and associated hybrids, which can be used to guide further research on the secondary populations.

As discussed in the previous section, the MHP developed in this study can be shared through distribution of the 51 parental lines, which is much simpler than sharing all possible hybrids (1275) that can be generated from the 51 parents. By sharing the parental lines, a flexible number of hybrids can be generated and tested for agronomic traits of interest to specific users. Genotypic information will be generated and accumulated as collaborators worldwide genotype the parental lines on different platforms, and can be shared through a website. As a starting point, we are happy to distribute the 51 parental lines, along with the genotypic information available so far, to those who are interested in using the MHP in their genetics and breeding research. In return, we would appreciate collaborators sharing the phenotypic information generated for specific traits in specific environments or locations and the genotypic information generated with their own genotyping platforms.

Conclusions

Compared with bi- and multi-parental populations and natural ones, MHPs are more suitable for hybrid crops. From various mating designs, a large number of hybrids can be generated to form MHPs, and a set of diallel hybrids from a relatively large number of parental lines would be the best. Each set of diallel hybrids can be divided into three subsets of hybrids with distinct genetic properties: two diallels, each representing one ecotype, and one NC II representing between-ecotype hybrids. A partial set of hybrids can be generated, selectively or randomly, from the full set, and then used to predict the remaining ungenerated hybrids. By sharing parental lines along with their genotypic information, a large number of hybrids can be generated and tested by worldwide collaborators.
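The partition of a full diallel into two within-ecotype diallels and one between-ecotype NC II can be checked with simple counting. In the sketch below, the 28 temperate lines and the totals of 51 parents and 724 hybrids come from the text; the tropical count of 23 is derived as 51 − 28 on the assumption that every non-temperate parent is tropical, and the function names are hypothetical.

```python
def n_diallel(n_lines):
    """Single crosses within one group (half diallel, no reciprocals)."""
    return n_lines * (n_lines - 1) // 2

def n_nc2(n_group_a, n_group_b):
    """Crosses between two groups (complete NC II / factorial)."""
    return n_group_a * n_group_b

n_parents = 51
n_temperate = 28
n_tropical = n_parents - n_temperate  # derived, assumes two ecotypes only

temperate_diallel = n_diallel(n_temperate)
tropical_diallel = n_diallel(n_tropical)
between_ecotype = n_nc2(n_temperate, n_tropical)
full_set = temperate_diallel + tropical_diallel + between_ecotype
ungenerated = full_set - 724  # hybrids left to be predicted
```

The three subsets add up to the full diallel of 1275 hybrids, which leaves 551 ungenerated hybrids for prediction.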

In addition to conventional quantitative genetics on combining ability, heterosis and hybrid performance, MHPs can be widely used in modern genetics, genomics and breeding. By high-density genotyping, MHPs can be used in GWAS for phenotypic data collected under diverse environments, including traits of agronomic importance, and heterosis and combining ability as well, which can be based on markers, alleles and haplotypes. MHPs per se and their derived secondary populations can be used in breeding for both inbred lines and hybrids. Taking 724 hybrids derived from 51 parental lines and four flowering traits as an example, we compared two independent GWAS methods, PEPIS software developed for hybrids and TASSEL software designed for inbred line populations. The two methods revealed highly consistent results with five overlapping chromosomal regions identified and used for discovery of candidate genes and QTN. Our results indicate that MHPs are powerful in GWAS for hybrid-related traits and will be widely used in the molecular breeding era.

Additional Information

How to cite this article: Wang, H. et al. Development of a multiple-hybrid population for genome-wide association studies: theoretical consideration and genetic mapping of flowering traits in maize. Sci. Rep. 7, 40239; doi: 10.1038/srep40239 (2017).
