Introduction

Analysis of quantitative trait loci (QTL) has progressed substantially from the early days of single marker analysis. Methodology now ranges from regression-based (multiple) interval mapping to likelihood-based full Bayesian approaches (eg Haley and Knott, 1992; Jansen, 1993; Jansen and Stam, 1994; Zeng, 1994; Satagopan et al, 1996; Kao et al, 1999; Carlborg et al, 2000; Corander and Sillanpää, 2002; Jansen et al, 2003). In addition to normally distributed traits, approaches have been developed specifically for binary traits (Yi and Xu, 2000; McIntyre et al, 2001; Yi and Xu, 2002b; Coffman et al, 2005) and for ordinal and categorical traits (Rao and Li, 2000; Yi et al, 2004). Upon detecting QTL in a single experiment, a key issue of interest is the generalizability of the findings. Many authors have compared results for different mapping populations based upon relative QTL position (Welz and Geiger, 2000; Kolb et al, 2001; Kamoshita et al, 2002; Simko, 2002; Clancy et al, 2003; Flint-Garcia et al, 2003; Toojinda et al, 2003; Tuberosa et al, 2003; Chardon et al, 2004). These comparisons have led to approaches in meta-analysis for QTL results (Goffinet and Gerber, 2000; Khatkar et al, 2004), and bioinformatic tools are being developed to facilitate this effort (Arcade et al, 2004; Sawkins et al, 2004). Findings often indicate concordance of intervals for some QTL, but typically a considerable proportion of family-specific QTL appear to be identified.

Comparing QTL findings from different families fails to account explicitly for segregation of different allelic combinations among different parents and the context dependency of QTL effects. These deficiencies immediately lead to questions about the generalizability of QTL findings from experimental populations to breeding populations. Previous quantitative genetic studies, focusing on the general (GCA) and specific combining abilities (SCA) of parents for a particular complex trait (Griffing, 1956), were popular precisely because they allowed an estimation of the genetic architecture of the phenotype, with inference extending back to the reference population. This provided an understanding of the utility of selection programs for the trait of interest (Hallauer and Miranda, 1988).

Modern breeding therefore faces this problem: efficient marker-assisted selection requires knowledge of the value of a QTL in the context of the (population-wide) genetic architecture of the trait, but traditional QTL study designs do not generate this knowledge. Classic quantitative genetic studies provide insight into the general behavior of a trait, but do not identify specific genomic regions which can be selected and introgressed.

The benefits of both approaches can be combined by analyzing multiple recombinant families in a mating design with shared parents. Gilbert (1985a, 1985b) pioneered this approach, partitioning single-gene effects from overall genomic (polygenic) effects in diallel crosses. We propose that using populations (eg F2 or RIL) developed from a common group of parents and constructed using classic mating designs to jointly infer the combining abilities and QTL location is a synergistic combination of tools available to the modern breeder.

Important progress along this road has been made in theoretical studies on QTL detection methods in interconnected families, exploring segregation probabilities among more complex crosses (Xu, 1996), the efficiency of different statistical methods for QTL detection (Rebaï and Goffinet, 1993; Liu and Zeng, 2000; Rebaï and Goffinet, 2000; Yi and Xu, 2002a; Jannink and Wu, 2003); the number of parents in the mating design (Muranty, 1996), and the trade-off between the number of families and family size (Xie et al, 1998; Xu, 1998; Wu and Jannink, 2004). These studies have shown that using populations with shared parents leads to more generalizable inferences about QTL, and can lead to increased QTL detection power. In addition to these theoretical efforts, large populations of RILs with shared parental lines are being constructed (eg the NSF-funded project ‘Molecular and Functional Diversity in the Maize Genome’, NSF DBI 0321467).

Prior studies indicate that to produce generalizable results on the genetic architecture of complex traits several parents are needed (see, for instance Wu and Jannink, 2004). In this paper, we address the question of how best to develop RIL populations in an interconnected design with a given number of parents. The classic diallel design, where every parent is mated with every other parent, is extremely labor intensive. In combining ability analysis, it has long been established that partial-diallel designs (eg NC design II, factorial (FCT) designs, or circulant designs) may be preferable, as they are less labor intensive and can still result in considerable power (Kempthorne and Curnow, 1961; Dhillon and Singh, 1978).

In this study, we investigate the efficiency of several partial-diallel mating designs for the joint analysis of QTL and combining abilities. To relate QTL results obtained in a diallel of F2 families back to variance components of a randomly mating reference population, we first provide a theoretical derivation of the among- and between-family variances for F2 families, and F2 extensions of the GCA and SCA variance components. We then use simulation techniques to determine the trade-offs between the number of crosses and the number of individuals examined within each cross under several different patterns of inheritance. We explore how different designs affect QTL detection power, and power for inferences about the underlying genetic architecture of the trait.

Methods

Simulations

We simulated diploid F2 progeny derived from mating F1 hybrids between 10 inbred parents in a half-diallel mating design without selfs. Note that, as is common practice, we refer here to inbred ‘parents’ when the inbreds are in fact the grandparents of the F2 progeny evaluated. Inbred parents were assumed to derive from a randomly mated reference population. This design yielded 45 F2 families. At each locus the 10 parents all carried a distinct allele, resulting in 55 recognizable diploid genotypes per locus across the set of 45 F2 families (10 homozygous and 45 heterozygous genotypes). Each F2 family consisted of 500 individuals, and we performed 1000 replicate simulations.

We used an infinite alleles model for the reference population such that all QTL alleles and all marker alleles were different for each parent. This model has the advantage of simplicity and agrees well with standard quantitative genetic variance component models. QTL effects were simulated by specifying the additive, dominance, and additive-by-additive epistatic variance contributions of each locus (or pair of loci in the case of epistasis) to the total phenotypic trait variance. Genetic variances were specified with reference to a randomly mating population from which the inbred founding parents were assumed to have been extracted. In all simulations, genetic variances in the reference population contributed a total of 30% of phenotypic variance, leaving a residual error variance of 70%. Genetic effects were simulated as follows. For each QTL, 10 additive effects were sampled from a standard normal distribution. Each effect, denoted ai, with i=1, …, 10, corresponded to the allele carried by one of the founding inbred parents. The effects were standardized to have zero mean and a variance equal to half that specified for the QTL. Thus, in a randomly-mating population, a QTL with those alleles at equal frequencies would have generated the specified additive variance. Note that this simulation approach deviates from what would happen in a repeated sampling of an infinite alleles population. QTL variance in our simulation was constant from one simulation run to the next and always equaled the specified variance. In a repeated sampling of reference population, the sampled QTL variance would change from one run to the next. Our approach simplifies the interpretation of the results because of the constant QTL variance it provided. For dominance, 55 effects, denoted dij with i, j=1, …, 10 and dij=dji, were sampled from a standard normal distribution. Each effect corresponded to the difference between the value conferred by a given diploid genotype and the value predicted by the additive effects of the genotype's two alleles. These effects were standardized so that for any j, for any i, and the variance of the dij was equal to that specified for the QTL. A consequence of this sampling scheme is that the variance of the dii, called the variance of homozygous dominance deviations (Edwards and Lamkey, 2002), is expected to be equal to the variance of the dij as a whole. In addition, the covariance between additive effects and homozygous dominance deviations is expected to be zero. While neither of these assumptions may hold in vivo (Edwards and Lamkey, 2002), they provide a useful starting point for exploring the question at hand. For additive-by-additive epistasis, 100 effects, denoted aaij with i, j=1, …, 10, were sampled from a standard normal distribution. These effects were standardized so that for any j, for any i, and the variance of the aaij was equal to one-fourth of that specified for the QTL pair. Each effect was associated with the two-locus haplotype receiving allele i at the first QTL of the pair and allele j at the second QTL. The effect corresponded to the difference between the value conferred by the haplotype and the value conferred by the additive effects of the haplotype's two alleles. The variance was again set so that the specified additive-by-additive epistasis would be generated by the QTL pair in a randomly mating reference population. Having sampled the genetic effects, the genotypic value of each progeny was generated by first simulating the progeny's QTL genotype, and then summing the genetic effects associated with that genotype.

Simulated genomes consisted of eight unlinked QTL. Two additive-effect QTL were adjacent to marker loci and each contributed 7.5% of the phenotypic variation. Marker loci were at the QTL position itself (M2, M5), flanking the QTL at 5 cM (M1, M4) and at 10 cM (M3,M6) according to the map M1M2M3 on one linkage group and M4M5M6 on another. Markers were multiallelic and informative in all crosses. The six remaining QTL together contributed 15% of the phenotypic variance and introduced unmarked or polygenic variation in trait values. We explored three different partitions of the polygenic variance into its additive, dominance and epistatic components: only additive variance (15%: 0%: 0% for additive, dominance, and epistatic, respectively); only dominance and epistatic (0%: 6%: 9%); and additive plus dominance plus epistatic variance (6%: 6%: 3%). In all cases, additive and dominance variances were partitioned evenly across the six polygenes, and the epistatic variance was partitioned evenly across three pairs of polygenes. One additional linkage group with three linked markers (M7M8M9) without QTL was also simulated.

We examined all 45 F2 families in our analysis of the half-diallel design. We compared this design to three other partial-diallel designs: the single-round robin (SRR), double-round robin (DRR), and FCT (Figure 1). The SRR, DRR, and FCT designs were constructed by subsetting the half diallel. This ensured that the comparisons across designs were made on the same data. Each design was analyzed over a range of family sizes (10, 25, 50, 100, 200, or 500 individuals per F2 family included in the analysis). The total size of a given experiment is the product of the number of families by the number of individuals analyzed per family.

Figure 1
figure 1

Different partial-diallel designs considered in our simulations. Shaded cells indicate crosses that are included in the analysis. (a) Half diallel; (b) SRR; (c) DRR; (d) FCT.

QTL detection and analysis of combining abilities

We discuss the following ANOVA models for detection of marked QTL:

where Y is the phenotype for the quantitative trait of interest, μ is the overall mean, M is the effect of marker m nested within family f in Model 2, and φ is the effect of family f (f=1, …, n1), where n1 is the number of families (10, 20, 25, 45).

The. family effect can be partitioned into an additive component that is attributed to parental main effects and a nonadditive component that is attributed to parental interactions. These are F2 extensions of the GCA and SCA variance components, which are traditionally defined with respect to the F1 progeny (Sprague and Tatum, 1942):

where Gi and Gj are the GCA effects of the parents i and j and S is the effect of the parental combination (Griffing, 1956; Lynch and Walsh, 1998). Sums of squares associated with the GCA effect were obtained by including binary indicator variables in the design matrix that coded the involvement of each parent in the cross (see Johnson and King, 1998). Following a hierarchical decomposition of variance inherent to diallel designs, we used type I, sequential, analyses to first obtain the sums of squares associated with GCA and then those associated with SCA (after accounting for GCA). Significance tests were carried out according to standard diallel theory: GCA mean squares were tested over SCA mean squares, and SCA mean squares were tested over the model error mean squares (Lynch and Walsh, 1998). All tests were conducted at a nominal significance level of α=0.05. Note that Model 3 cannot be fit for the SRR as GCA and SCA effects are only separately estimable when the number of families generated exceeds the number of parents used in the mating design. Therefore, only Model 2 was fit for the SRR data.

After Model 2 has lead to the detection of QTL (M2 and M5), it can be reordered to test for residual genetic effects not associated with the marked QTL. We refer to loci generating such residual effects as polygenes (Mather, 1941). Their presence is examined using Model 4:

where M2 and M5 are the markers at the identified QTL positions. As with Model 1, marker effects are fitted before family structure is taken into account. The effects of the two marked QTL are first removed (type I sums of squares) and the family effects (f) estimated are residual effects, leading to a test of the null hypothesis that no undetected polygenetic effect is present. A rejection of this null hypothesis indicates the presence of unmarked polygenic effects. As in Model 3, the family effect can be partitioned into its GCA and SCA components:

which splits the residual family effect into a GCA effect due to parental main effects and an SCA effect due to parental interactions. If the effects considered in the models were orthogonal, then the order of the model would be irrelevant, and Model 4 would give identical answers to Model 2. However, the lack of orthogonality in these models makes the consideration of both Model 2 and Model 4 relevant. Note that all the models considered here do not include epistatic terms. A more general approach would be to account for the interactions among loci, as well as their main effects. In the case where the mode of action of the QTL is unknown, models that include epistatic terms are to be preferred. As our simulations only included additive QTL, for demonstration purposes, we did not include these terms in our models.

When parental effects are considered random, the GCA and SCA components of variance can also be estimated. As a demonstration, for a single case (half-diallel design, polygenes contributing only dominance and epistatic variance, 500 individuals per F2 family) we estimated the variance components by fitting a family effects model, Yfmk=μ+φf+ɛfmk, and Model 3 using REML (Searle et al, 1992; Zhu and Weir, 1996). The estimated variances were compared to the expected variances of GCA and SCA effects according to the theoretical decomposition of among-F2 family variance. While different approaches have been developed for the analysis of diallel designs (Hayman, 1954a, 1954b; Griffing, 1956; Gardner and Eberhart, 1966), partial diallels (Kempthorne and Curnow, 1961; eg Fyfe and Gilbert, 1963; Viana et al, 1999) and extensions of analysis to F2 generations (Jinks, 1956; Hill et al, 2001), for the purposes of demonstration we follow the approach of Cockerham (1983) to obtain covariances among relatives derived by self-fertilization. All models were fit and tested in SAS (SAS Institute Cary, NC, USA).

Results

Decomposition of among-F2 family variance

The following decomposition applies to among-F2 family variances in the absence of QTL analysis, as well as to among-family variances that are residual once QTL main effects have been removed. Cockerham (1983) gave formulas for obtaining covariances between relatives derived from self-fertilization caused by single-locus effects. The relevance of these results to our context stems from the following reasoning. Assume a randomly mating reference population from which inbreds are extracted, then crossed in some form of diallel. The resulting F1 progeny genotypes occur in the diallel in equal frequencies as they would in the reference population. The F1 are subsequently self-fertilized to obtain F2 progeny. Cockerham's (1983) results provide the relevant single-locus total among-progeny variance components and among-family variance components. We present these results first, then extend them to two-loci additive-by-additive epistasis.

Denoting the progeny inbreeding coefficient Fg, the total among-progeny variance is

The among-family variance is

By subtraction, we can obtain the within-family variance as

To extend these results for covariances caused by two-loci additive-by-additive effects, define as the probability that, for both of two unlinked loci, a random allele from g is IBD to a random allele from g′, where g and g′ are the products of self-fertilizing their common ancestor that was itself the product of t generations of self-fertilization. The resemblance between g and g′ due to additive-by-additive epistasis is . Three cases can be distinguished. First, with probability (1–Ft)2, the common ancestor was not inbred at either locus. In that case, the probability of IBD between random alleles from g and g′ is ½ at each locus, and since the loci are unlinked, they segregate independently and . Second, with probability Ft(1–Ft), the common ancestor is inbred at one locus, but not the other. In that case . Finally, with probability , the common ancestor is inbred at both loci, in which case . Summing these conditional probabilities gives the unconditional probability . Thus, for independent loci, is the square of the coefficient defined by Cockerham (1983). In our context, setting t to zero and one provides the coefficients for among-family and total among-progeny variance, respectively. Adding these terms to the results above gives

The among-family variance can be partitioned further into components caused by GCA and SCA. The necessary coefficients of identity to deduce the GCA component follow. Denote by g and g′ progeny derived from the inbred line crosses A × B and A × C, respectively, after g generations of selfing. The lines g and g′ share one common inbred ancestor, and their covariance will therefore provide the GCA variance. Denote by Fg, , , , , , and the inbreeding coefficients of g and g′, their coefficients of coancestry, and their three- and four-gene coefficients of identity for the relationship defined here, and using a notation similar to Cockerham (1983). A two-locus coefficient of coancestry , similar to the coefficient described above, will account for additive-by-additive epistatic contributions to the GCA variance. The inbreeding coefficient is given by the usual . The coefficient of coancestry , irrespective of the levels of inbreeding g and g′. For unlinked loci, . For there is a probability of Fg that g is inbred, and if it is inbred, a probability of ¼ that a random gene from g′ is IBD to it. Therefore, . Similarly, . There is a probability of that both g and g′ are inbred at a locus, so . If both g and g′ are inbred at a locus, there is a probability of ¼ that they are also IBD to each other, so . Finally, is the half the probability that neither g nor g′ are inbred and that two pairs of alleles between g and g′ are IBD. Since g and g′ have only one common ancestor, this probability is zero. Specifying these coefficients to the situation of F2 families, we find that the covariance between progeny with one common inbred ancestor is

Since each family has two parents, the among-family variance consists of twice the GCA variance plus the SCA variance. The SCA variance is obtained by subtraction:

Dr Wyman Nyquist independently obtained these same results (pers. comm.). As with traditional diallel analyses, GCA variance is largely associated with additive QTL effects, while only nonadditive genetic effects contribute to SCA variance, including dominance, additive-by-additive, and additive-by-dominance effects. Note that the additive, dominance, and epistatic variances simulated for each QTL were specified with reference to a randomly-mating population and contribute to GCA and SCA differently when measured in the F1 versus the F2 generation, due to the generation of inbreeding resulting from intermating F1 individuals (Jinks, 1956; Hill et al, 2001).

In our simulations, VA, VD, and VAA are specified. Given a finite number of inbred parents used in the diallel, the variance of the inbred dominance deviations, D2, will depend on the deviations sampled in each simulation run. In the expectation, however, it will be equal to VD. As we sample additive and dominance effects independently, the expectation of D1 is zero. However, for a specific simulation, D1 may be either positive or negative due to the specific effects sampled. Given that the expectation of the average inbred dominance deviation is zero in our simulations, the expectation of the squared average, H*, is equal to the variance of the average. That variance depends on the number of inbred parents included in the diallel, nPar, and is VD/nPar. In practice, the variance contributed by that component will generally be small.

QTL analysis

As expected, a naïve search for marker–trait associations that failed to take into account the family structure of the F2 progeny (Model 1) resulted in highly significant effects of all markers, irrespective of linkage to a QTL (Figure 2). In the half-diallel design, for example, tests for differences among the 55 genotypic classes of any marker on the neutral linkage group (without QTL) yielded a significant result in at least 94% of the replicate simulations, even at the lowest family size. When the family structure was appropriately accounted for (Model 2), the type I error estimated by examining unlinked markers was found to be consistent with the threshold set for significance testing (0.05, see Figure 2b). All further analyses, therefore, follow Model 2.

Figure 2
figure 2

Marker-trait analysis with and without accounting for the family structure of the F2 progeny, for analyses including different numbers of individuals per F2 family. For each of nine markers the proportion of significant simulations is given (1000 replicated simulations; α=0.05). QTL were present on groups 1 and 2 (at M2 and M5), but not on group 3. (a) Naïve search that does not take into account the family grouping of F2 individuals; (b) nested approach, where marker effects are tested within families. Results are shown for the half-diallel design, under purely nonadditive polygene action.

For a fixed number of individuals per F2 family, designs with more families have more power to detect the marked QTL, as the total experiment size increases with the number of families (Table 1, Figure 3a). In contrast, when the total experimental size (number of F2 families times the number of individuals per family) was fixed, the partial-diallel designs with fewer and larger families (eg single and double round-robin designs) were more powerful in detecting QTL than designs with more and smaller families (Figure 3b). For instance, at a fixed total experimental size of 500 individuals, an additive QTL that accounted for 7.5% of the phenotypic variance was picked up in about 80% of the replicated simulations in the SRR design, as opposed to only about 35% in the half-diallel design. Power was affected slightly by the mode of action of the polygenes; marked QTL were more readily detected if polygenes acted additively (Table 1).

Table 1 QTL analysis: power to detect the two marked QTL at a fixed number of 10, 25, and 50 individuals per F2 family, in different partial-diallel designs and with different modes of polygene action
Figure 3
figure 3

Power of different partial-diallel designs to detect an additive QTL that contributes 7.5% of the phenotypic variance, expressed as the proportion of simulations (out of 1000) in which the completely linked marker showed a significant association with the phenotype at the α=0.05 level. Each diallel design was analyzed with 10, 25, 50, 100, and 200 numbers of F2 individuals per family, yielding different total experimental sizes for the designs that include different numbers of families. Results are shown for one of the two specified QTL, under purely nonadditive polygene action. (a) Comparison of designs at equal numbers of individuals per family; (b) Comparison of designs at equal total experimental sizes.

Exploring undetected polygene effects

Power to detect residual polygenic effects was dependent on the number of families as well as the total experiment size (Figure 4). To obtain high power to detect these residual effects, large experiment sizes were needed, containing between 1000 and 2000 progeny. Neither the mode of action of the polygenes (additive versus nonadditive) nor the specific partial-diallel design had a clear effect on the power to detect this effect (Figure 4). Further, Model 5 correctly identified the mode of polygene action: (1) a significant GCA effect but no SCA effect when only additive polygenes were specified; (2) a significant SCA effect but no GCA effect when only nonadditive polygenes were specified; and (3) significant GCA and SCA effects when polygenes contributed additive as well as nonadditive variance (Figure 4). Fitting the marker effects did, however, absorb part of the total genetic variance attributable to polygenic effects, as illustrated by the fact that SCA was detected less easily after fitting the marker effects even when the linked QTL themselves did not contribute any nonadditive variance (Figure 5).

Figure 4
figure 4

Analysis of residual among-family variance after fitting the two detected QTL. Upper panels show the proportion of replicated simulations (out of 1000) in which the among-family variance was significant at the α=0.05 level, in different diallel designs and for different total experimental sizes. In the middle and lower panels, the among-family variance is partitioned into its components GCA and SCA.

Figure 5
figure 5

Proportion of replicated simulations (out of 1000) in which GCA (circles) and SCA (triangles) effects were significant at the α=0.05 level, in different diallel designs and for different total experimental sizes. Dashed lines show results for models that account for effects of the detected QTL before testing GCA and SCA effects. Solid lines show results for models that do not include detected QTL. Results are shown for simulations where the polygenes contributed only nonadditive variance; marked QTL contributed only additive variance.

Among-F2 family variance components

Variance components were estimated in PROC MIXED using the REML option (SAS Institute, Cary, NC, USA) by sequentially fitting the model terms. That is, the GCA is fit first and the residuals from the GCA model are then used as the dependent variable in the SCA analysis. The model is fit sequentially because of the structure of the GCA and SCA terms. Since PROC MIXED estimates type III variance components, a single model including GCA and SCA will be problematic because these terms are not orthogonal. Estimated variances were concordant with expectations, but appear to slightly underestimate their expected values, although the empirical confidence interval does contain the true parameter value (Table 2).

Table 2 Expected and observed variance components in the F2 generation

Discussion

Joint analysis of multiple interconnected families can be a powerful strategy to extend QTL analysis beyond the limitations of single crosses (Rebaï and Goffinet, 1993; Muranty, 1996; Rebaï and Goffinet, 2000; Jannink and Jansen, 2001). We find that, given a fixed number of parents and total experimental size, some partial-diallel designs are more efficient than others in detecting QTL. Strikingly, we find that using the diallel as a basis for developing interconnected families allows for combining QTL analysis with traditional analysis of GCA and SCA, providing insight into the genetic architecture of the trait and revealing the mode of action of undetected QTL.

QTL detection power in different designs

For markers unlinked to QTL, associations with the phenotype arise (see Figure 2a) because of linkage disequilibrium generated by the mating design. Alleles at unlinked loci will be in linkage equilibrium within families, but they will be in linkage disequilibrium across families. If we denote by M1 and Q1 alleles from a marker and an unlinked QTL carried by an arbitrary inbred parent P1, linkage equilibrium means that P(Q1∣M1)=P(Q1). In gametes carrying M1, P(Q1∣M1)= ½. Within families of P1, P(Q1)= ½, and linkage equilibrium holds. Across all families, however, it is generally not true that P(Q1)= ½. For example, with 10 parents in the design and assuming that each parent contributes to an equal number of families of equal size, P(Q1)=0.1, such that Q1 and M1 are associated more strongly than expected under equilibrium, which will generate a spurious association between marker and phenotype. To avoid this problem, it is necessary to test for QTL association by nesting marker effects within families. This is clearly demonstrated by our findings using Model 1 (Figure 2). Note that, if all genetic variance is accounted for by marked QTL in a multiple QTL model, no residual genetic variation will be available to associate with unlinked markers, and there would be no need to account for family structure. While this situation is theoretically possible, it seems unlikely in practice, given that many small QTL are likely to remain undetected. In addition, non-nuclear effects and interactions between nuclear and extra-nuclear effects may also be detected as a polygene effect in these designs.

Comparing different partial-diallel mating designs using outbred parents, Muranty (1996) showed that, when the number of families and the number of offspring per family are held constant, QTL detection power is affected by the number of parents used in the design, but the arrangement of the parents in specific diallel designs has little impact. Based on this result, we explored QTL detection power with a fixed number of parents in order to compare different mating designs under the assumption that all designs sample the same genetic variation (10 parents). For a fixed experiment size, we found that QTL detection power was greatest for the mating design with the fewest but largest families (the SRR), while power was lowest for the design with the most, smallest families (the half diallel). This is in agreement with previous findings (eg Soller and Genizi, 1978; Wu and Jannink, 2004). In small families, the stochastic nature of segregation may cause some QTL or marker genotypes to be represented by very few (or zero) individuals. For example, for an F2 family size of 10, the probability that one of the homozygote genotypes is missing is 10.9%. When family size increases to 20, this probability drops to 0.6%, but the probability that one of the homozygote genotypes is represented by only one individual is still 4.8%. In such situations the family in question contributes little information about the marker effect, reducing power of the QTL analysis. As all our designs involved 10 parents, the minimum number of families was 10 (in the SRR design). Further reduction in the number of families, by reducing the number of parents involved in the crossing design, may further increase the power to detect QTL segregating in the pedigree, but may decrease QTL detection efficiency due to insufficient sampling of QTL alleles of the base population from which the parents were derived (Muranty, 1996; Wu and Jannink, 2004).

We focused on detecting additive QTL that each explained 7.5% of total variance, with sample sizes ranging from 10 to 500 per F2 family. Power of QTL detection was considerable even with small family sizes, due to the ability to borrow power across families and due to the relatively simple QTL effects that we simulated. For exact localization of QTL (as opposed to detection) the effects of small family size may be different, as the power that can be borrowed across families will depend on the degree that recombination probabilities between markers are similar between the families. Also, with more complex (eg epistatic) QTL models more segregants per family will need to be examined. We also use a very straightforward single-marker analysis for QTL detection; however, the approach described is general and can be used in combination with any QTL detection procedure. The important point is to account for the family structure in the testing procedure.

Another simplification of our analysis is that we assumed an infinite alleles model such that for both markers and QTL all parents were assumed to carry unique alleles. Clearly, having completely informative markers will increase QTL detection power relative to the more realistic case in which markers will not be informative in all families. The use of multiple linked markers in a region, denser marker maps and interval mapping approaches can mitigate this problem. Also, realistically, individual QTL may not segregate in all families as in the simulations. In the simulations, though the two parents of a family will always carry distinct QTL alleles, the difference in their allelic effects may be small, mimicking a nonsegregating situation. The overall variance of allelic effects will have a much greater influence on QTL detection power than whether that variance is composed of an ‘infinite’ allelic series or of a discrete set of few alleles.

We noted a modest but consistent effect of the polygenes’ mode of action on QTL detection power: power was lower when polygenes contributed nonadditive variance (Table 1). This is caused by the differential contribution of VA, VD, and VAA to within-F2 family variance: 15% nonadditive variance, specified with reference to a randomly-mating base population, contributes more variance within F2 families than 15% additive variance does (see Decomposition of F2 family variance). In the QTL detection models, polygenic variance contributes to the error variance, making the QTL more readily detectable when polygenes act additively.

Detecting polygenic effects

Having detected QTL and statistically controlled for their effects, the remaining genetic variation was explored using traditional quantitative genetic approaches to assess the modes of gene action of undetected QTL. A first step in this process involves determining whether there is residual variation unaccounted for by the marked QTL, as indicated by a significant family effect in Model 4. An important result of our simulations is that the detection of this effect does not depend on the mode of action of the unmarked polygenes (Figure 4). Having detected residual (polygenic) variance, it becomes meaningful to determine the mode of gene action generating the variance by partitioning it into components attributable to GCA and SCA effects. Knowing the predominant mode of polygene action will give insight into the feasibility of identifying their map locations through further experimentation. In particular, additive-effect loci will be more easily identifiable than polygenic loci that generate purely dominant or epistatic variances.

Of note, when marker effects are fit before family effects, as is necessary to make inferences on residual polygenic effects, a portion of the polygenic variance is absorbed in the marker classes due to mating-design-wide linkage disequilibrium between polygenes and markers (see Discussion above). This affects the analysis of polygenic effects (see Figure 5). Even so, in the simple regression-based approach proposed here, polygene action is still detectable and can even be partitioned into additive and nonadditive components. For the half-diallel, DRR, and FCT designs, we tested whether the polygenic effect was predominantly additive (partitioned to GCA) or predominantly dominant or epistatic (partitioned to SCA). Not surprisingly, we find that designs in which each parent contributes to more families (more connected designs) are more efficient in the GCA/SCA analysis of this polygenic variance than less connected designs (Kempthorne and Curnow, 1961).

For a fixed experiment size, more connected designs have fewer individuals per family than less connected designs. As a result, the power for detection of the QTL is lower. The potential gain of insight that more connected designs provide into the mode of action of polygenes must be balanced against the need for an adequate number of segregants per family for QTL detection power. If the primary goal of the study is to detect the relevant QTL, our results suggest that a SRR design is a good initial choice – although subsequent experiments using more connected designs would be required to obtain a comprehensive understanding of the QTL and polygene effects. If, in contrast, the main goal is to understand the inheritance of the trait with an initial estimate of QTL location, one of the more connected designs will be a better choice.

Variance component estimations

Given specification of VA, VD, and VAA of the QTL and polygenes in our simulations, we derived theoretical expectations for the GCA and SCA variances in the F2. The empirically observed variances appeared to deviate from these expectations; in particular, among-family variance was slightly lower than predicted (Table 2). Since marker effects were not included in the models estimating variance components in Table 2, absorption of family effects by markers does not explain this underestimation. The theoretical expectations were derived assuming random mating, but the partial-diallel mating designs are not random insofar as selfing is not allowed. The exclusion of selfing means that parents were paired using sampling without replacement, which generates a negative covariance between the effects of parents of a family. This negative covariance would be expected to depress the among-family variance. Support for this hypothesis comes from the fact that GCA variance showed a smaller deviation from expectation than overall family variance. That is, predicted family variance from the GCA/SCA analysis is 2 × 9.52+5.87=24.91, which is greater than the directly estimated family variance of 22.04 (Table 2). Each parent's GCA effect is estimated while controlling for the GCA of its mates, which, to a large extent, eliminates any negative covariance.

Conclusion

We demonstrate that by taking a classical quantitative genetic approach to mapping we can both locate specific genomic regions associated with a phenotype, and dissect the genetic architecture of the trait. Having multiple populations in the study ensures that QTL findings are more generalizable than findings from single-cross experiments. By combining diallel mating designs with QTL mapping, the identification of specific loci responsible for trait variation and the assessment of the generalizability of the findings will allow breeders to decide how to focus their efforts. In addition, the quantitative assessment of the amount and type of unexplained genetic variation in the reference population can serve as a guide to further experimentation.

As multiple mapping populations are expensive to generate, co-ordinated efforts to create suitable populations will be crucial. We show here that efforts to create populations in a design that is suited to GCA and SCA estimation (or to augment the existing designs to such an interconnected design) can greatly increase the value of the mapping resource and therefore should be strongly encouraged.