Introduction

In any attempt to make appropriate use of quantitative genetic variation in animal and plant breeding programs, one needs to appraise the genetic architecture of the traits, using specific mating designs to statistically evaluate the relevant parameters. Most such designs offer statistical tests for the significance of, and estimates of, the additive and dominance components of the polygenic variation (Zeng, 1999; Mackay, 2001). More complicated designs are needed to obtain efficient statistical inference about the epistatic component of the genetic variation (in addition to the additive and dominance parts). The triple test cross (TTC), originally proposed by Kearsey and Jinks (1968), provides not only a direct test for the significance of the epistatic variance component but also unbiased estimates of the additive and dominance components whenever epistasis among polygenes is absent. Pooni and Jinks (1976) demonstrated the clear superiority of the TTC over alternative strategies in statistical power for detecting complementary and duplicate epistasis, based on a random model of polygenic effects. Since its introduction, various modifications and extensions of the design have been made, and these have popularized its application in both animal and plant breeding (Jinks et al., 1969; Jinks and Perkins, 1970; Pooni et al., 1980; Goldringer et al., 1997).

Recent advances in molecular biology have allowed the construction of fine-scale genetic marker maps for dissecting quantitative genetic variation into individual quantitative trait loci (QTL) (Lander and Botstein, 1989; Haley and Knott, 1992; Luo and Kearsey, 1992; Zeng, 1994; Satagopan and Yandell, 1996; Kao et al., 1999; Sen and Churchill, 2001). QTL analysis makes it possible to characterize epistatic effects between QTL as well as the effects of individual QTL (Holland, 1998; Boer et al., 2002; Kao and Zeng, 2002; Yi and Xu, 2003; Yi et al., 2005), and it has revealed ubiquitous evidence of epistasis in both animal and plant species (Fijneman et al., 1996; McMullen et al., 1998; Hua et al., 2003; Moore, 2003). However, many statistical and experimental design issues remain to be resolved before statistical inference of epistasis can be improved (Flint and Mott, 2001; Doerge, 2002; Jansen, 2003). For instance, Kao and Zeng (2002) recently pointed out that a two-way ANOVA exploiting genetic marker and trait phenotype data from an F2 segregating population is, in principle, inappropriate for testing pairwise epistasis, even though this approach has been widely used in analyses of such data sets (Yu et al., 1997; Li et al., 2001; Hua et al., 2003).

The present paper aims to develop a quantitative genetics model and method for detecting epistasis using the TTC mating design with marker information, and to explore the statistical power of the design with and without marker information.

Theory and analysis

A finite loci model of TTC without marker information

We start by re-describing the TTC mating design for the benefit of readers unfamiliar with it. A random sample of m individuals from the F2 generation, obtained by crossing two inbred lines P1 and P2, is backcrossed to three testers: the parental lines P1 and P2 and their F1. This generates 3m families, each of which is replicated by raising either n plots or n individuals in a randomized block experiment. Kearsey and Jinks (1968) showed that testing for epistasis is equivalent to testing whether the expectation of $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$ is zero for every i, where $\bar{L}_{1i}$, $\bar{L}_{2i}$ and $\bar{L}_{3i}$ are the progeny means of the three families obtained by crossing the ith F2 individual (i=1, 2, …, m) to the testers P1, P2 and F1, respectively.

In the present study, we consider two QTL, A and B, each with two alleles (A and a; B and b), linked with recombination fraction r. The genotypes of the three testers are denoted AABB, aabb and AaBb, respectively. There are nine possible genotypes in an F2 population: AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb and aabb. Their genetic effects can be written as

where μ is the population mean, a1 and a2 (d1 and d2) are the additive (dominance) genetic effects at loci A and B, and iaa, iad (ida) and idd are the additive × additive, additive × dominance (dominance × additive) and dominance × dominance epistatic effects, respectively. The indicator variables are defined as

Under a random genetic effect model, the expected variance component among F2 individuals for $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$ can be worked out as

It can be seen that the epistatic effects and the linkage between the two loci together determine the above epistatic component. However, the estimate of the variance component could be biased downwards because of the multinomial variance of sample means (Falconer and Mackay, 1996, pp 51–56) and the highly unbalanced hierarchical structure of the data (Luo, 1993; Knott, 1994). This problem may be avoided by treating the QTL effects as fixed, as in Knott (1994) and Luo (1998).
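As a concrete check on the finite-locus model, the sketch below computes the expected TTC contrast $E(\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i})$ for an F2 parent of each genotype. It is a minimal sketch under assumptions not stated in the text: the indicator coding x = +1, 0, −1 for AA, Aa, aa with z = 1 for heterozygotes and 0 for homozygotes, an F1 tester in coupling phase (AB/ab), and hypothetical function names. The double heterozygote is split by phase, giving ten parental classes.

```python
from itertools import combinations_with_replacement

# A-B haplotypes coded by allele origin: 1 = from P1 (A or B), 0 = from P2 (a or b).
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # AB, Ab, aB, ab


def genotypic_value(h1, h2, mu, a1, a2, d1, d2, iaa, iad, ida, idd):
    """Two-locus genotypic value under an ASSUMED coding (not taken from the paper):
    x = +1/0/-1 for AA/Aa/aa and z = 1 for heterozygotes, 0 for homozygotes."""
    nA, nB = h1[0] + h2[0], h1[1] + h2[1]
    xA, xB = nA - 1, nB - 1
    zA, zB = int(nA == 1), int(nB == 1)
    return (mu + a1 * xA + a2 * xB + d1 * zA + d2 * zB
            + iaa * xA * xB + iad * xA * zB + ida * zA * xB + idd * zA * zB)


def gametes(h1, h2, r):
    """Gamete frequencies of the genotype h1/h2 with recombination fraction r."""
    out = {}
    for g, p in [(h1, (1 - r) / 2), (h2, (1 - r) / 2),
                 ((h1[0], h2[1]), r / 2), ((h2[0], h1[1]), r / 2)]:
        out[g] = out.get(g, 0.0) + p
    return out


def expected_contrast(h1, h2, r, effects):
    """E(L1 + L2 - 2*L3) for F2 parent h1/h2 crossed to P1 (AB/AB), P2 (ab/ab), F1 (AB/ab)."""
    testers = [((1, 1), (1, 1)), ((0, 0), (0, 0)), ((1, 1), (0, 0))]
    T = [1.0, 1.0, -2.0]  # contrast coefficients T1, T2, T3
    parent = gametes(h1, h2, r)
    total = 0.0
    for Tj, tester in zip(T, testers):
        tg = gametes(*tester, r)
        family_mean = sum(p * q * genotypic_value(g1, g2, **effects)
                          for g1, p in parent.items() for g2, q in tg.items())
        total += Tj * family_mean
    return total


# The ten phase-distinguished F2 genotype classes.
F2_CLASSES = list(combinations_with_replacement(HAPLOTYPES, 2))
```

Under this coding the contrast vanishes for every parental genotype when all four epistatic effects are zero, which is the property the TTC test relies on; with iaa = 1 alone, an AB/AB parent gives a contrast equal to r.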

Let ft be the probability of the tth QTL genotype in the F2 population (t=1, 2, …, 10; the double heterozygote AaBb is split into its coupling (AB/ab) and repulsion (Ab/aB) phases, giving ten classes), and let ftjk be the frequency of the kth QTL genotype (k=1, 2, …, 10) within the jth full-sib family (j=1, 2, 3) from an F2 parent of the tth genotype. If mt is the number of F2 parents of the tth genotype in the sample and ntjk is the number of individuals with the kth QTL genotype within the jth full-sib family from an F2 parent of the tth genotype, these counts may be treated as random variables following multinomial distributions with parameters (ft, m) and (ftjk, n), respectively.
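The ten F2 genotype frequencies ft and a multinomial draw of the counts mt can be sketched as follows, assuming a coupling-phase F1 (AB/ab) and no selection; the function names are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np

# A-B haplotypes coded by allele origin: 1 = from P1 (A or B), 0 = from P2 (a or b).
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # AB, Ab, aB, ab


def f1_gamete_freqs(r):
    """Gamete frequencies of the coupling-phase F1 AB/ab."""
    return {(1, 1): (1 - r) / 2, (0, 0): (1 - r) / 2, (1, 0): r / 2, (0, 1): r / 2}


def f2_genotype_freqs(r):
    """f_t for the ten phase-distinguished F2 genotypes h1/h2."""
    g = f1_gamete_freqs(r)
    # a genotype made of two different haplotypes arises from two gamete orders
    return {(h1, h2): g[h1] * g[h2] * (1 if h1 == h2 else 2)
            for h1, h2 in combinations_with_replacement(HAPLOTYPES, 2)}


f_t = f2_genotype_freqs(r=0.2)
rng = np.random.default_rng(1)
m_t = rng.multinomial(100, list(f_t.values()))  # genotype counts among m = 100 F2 parents
```

The coupling (AB/ab) and repulsion (Ab/aB) double heterozygotes have frequencies 2((1 − r)/2)² and 2(r/2)², which is why the phase must be tracked once the loci are linked.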

Let βt (t=1, 2, …, 10) be the effect of the tth family group (the F2 parents of the tth QTL genotype) on $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$, and let γtjk be the fixed effect of the kth QTL genotype (k=1, 2, …, 10) within the jth full-sib family (j=1, 2, 3) from an F2 parent of the tth genotype. The expected values of βt and γtjk can be written in terms of the QTL effects. Following the analysis of variance for an unbalanced two-way nested design described in Searle (1987, p 74), we can work out the following statistics:

The expected mean square between F2 parents for $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$ is

which has m−1 degrees of freedom; its significance indicates the presence of dominance × dominance and additive × dominance epistasis (Jinks and Perkins, 1970). The expected sum of squares of $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$ over the m F2 parents is

which has m degrees of freedom; its significance indicates the presence of epistasis of any type (Jinks and Perkins, 1970). The expected mean square within full-sib families is

which has 3m(n−1) degrees of freedom and is used to test for significance of the above two expected statistics. In equations (3), (4) and (5), Tj (j=1, 2, 3) is the coefficient of orthogonal contrast and takes values T1=T2=1 and T3=−2 (Snedecor and Cochran, 1989, p 257). It must be noted that covariances in the number of progeny with any given QTL genotype between different marker genotype classes are equal to zero and this holds throughout the paper.
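For balanced data, the observed counterparts of these statistics can be computed directly. The following is a minimal sketch, not the authors' code: it assumes progeny values are arranged as y[i, j, k] (F2 parent i, tester j in the order P1, P2, F1, replicate k), and it uses the factor n/ΣTj² = n/6 so that the contrast sums of squares are on the same per-observation scale as the within-family mean square. The function name is hypothetical.

```python
import numpy as np

# Orthogonal contrast coefficients T1, T2, T3 for the testers P1, P2, F1.
T = np.array([1.0, 1.0, -2.0])


def ttc_epistasis_anova(y):
    """Classical TTC epistasis tests on balanced data.

    y[i, j, k] is replicate k (plot or individual) of the family from F2 parent i
    crossed to tester j (order: P1, P2, F1).
    """
    m, n_testers, n = y.shape
    assert n_testers == 3 and m > 1 and n > 1
    means = y.mean(axis=2)          # family means L1i, L2i, L3i, shape (m, 3)
    c = means @ T                   # contrast L1i + L2i - 2*L3i for each F2 parent
    scale = n / np.sum(T ** 2)      # n / 6: per-observation scale for contrast sums of squares
    df_within = 3 * m * (n - 1)
    ss_all = scale * np.sum(c ** 2)                              # m df: epistasis of any type
    ms_between = scale * np.sum((c - c.mean()) ** 2) / (m - 1)  # m - 1 df: variation among parents
    ms_within = np.sum((y - means[:, :, None]) ** 2) / df_within
    return {
        "F_all": (ss_all / m) / ms_within, "df_all": (m, df_within),
        "F_between": ms_between / ms_within, "df_between": (m - 1, df_within),
        "ms_within": ms_within,
    }


rng = np.random.default_rng(0)
res = ttc_epistasis_anova(rng.normal(size=(100, 3, 20)))  # pure noise: no epistasis
```

Because T1 + T2 + T3 = 0, any effect common to all three families of a parent (such as its overall breeding value) cancels from the contrast, so only epistatic departures and sampling error remain.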

A model of TTC in the case of finite loci with marker information

We incorporate marker information into the above TTC analysis by considering two scenarios: one or two marker loci. We first consider a single marker locus linked to the two QTL.

Suppose the marker locus lies between the two linked QTL, with genotype MM, Mm or mm observed for each F2 parent. The genotypes of the parental lines P1, P2 and their F1 are denoted AAMMBB, aammbb and AaMmBb, respectively. Let r12 be the recombination fraction between loci A and M, and r23 that between M and B. Without interference, the recombination fraction between A and B is r13=r12(1−r23)+r23(1−r12). The m F2 parents, together with their L1, L2 and L3 families, can be divided into three categories according to the marker genotype of the F2 parent. The numbers of F2 parents with each marker genotype, ms (s=1, 2, 3), jointly follow a trinomial distribution. Under the QTL effect model given by equation (1), the frequencies of F2 QTL genotypes given the marker genotypes are described in Table 1.
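The formula for r13 follows from treating crossovers in the two intervals as independent. The sketch below (hypothetical function names; alleles coded 1 for the P1 allele) enumerates the eight F1 gametes for the order A-M-B and recovers r13 as the total frequency of gametes that are recombinant between A and B.

```python
from itertools import product


def f1_gametes_three_loci(r12, r23):
    """Gamete frequencies of the F1 AMB/amb for the order A-M-B, assuming no
    crossover interference (crossovers in the two intervals are independent)."""
    freqs = {}
    for a, m, b in product((1, 0), repeat=3):   # 1 = allele from P1 (A, M or B)
        p = 0.5                                  # which parental chromosome the gamete starts on
        p *= r12 if a != m else 1 - r12          # crossover between A and M?
        p *= r23 if m != b else 1 - r23          # crossover between M and B?
        freqs[(a, m, b)] = p
    return freqs


def r13_from_gametes(freqs):
    """Recombination fraction between A and B: total frequency of A-B recombinants."""
    return sum(p for (a, m, b), p in freqs.items() if a != b)


g = f1_gametes_three_loci(0.1, 0.2)
r13 = r13_from_gametes(g)   # equals 0.1 * (1 - 0.2) + 0.2 * (1 - 0.1) = 0.26
```

Double crossovers (for example AmB) restore the parental A-B combination, which is why r13 is smaller than r12 + r23.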

Table 1 Genotypic frequencies at two linked loci given genotypes at the marker locus in an F2 population

It can be readily shown that the effect αs of the sth marker genotype (s=1, 2, 3) on $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$ is

$$\alpha_s = \sum_{t=1}^{10} f_{st}\,\beta_t$$

where βt describes the progeny means of the tth family group (t=1, …, 10) of $\bar{L}_{1i} + \bar{L}_{2i} - 2\bar{L}_{3i}$ within the sth marker genotype, and fst is the conditional probability of the tth family group given the sth marker genotype, as shown in Table 1. We can work out

When two marker loci are linked to the two QTL, we take one of the possible orders of marker and QTL loci, M1AM2B, to illustrate the analysis. Let rij be the recombination fraction between the ith and jth loci, and assume no recombination interference. Under the two-marker model, we can work out the effect αs of the sth marker genotype (s=1, …, 9)

Equations (7) and (8) show that the marker-associated quantitative genetic effects are entirely determined by epistatic effects and linkage parameters and that significant variation between the marker effects is an indicator of the presence of epistasis. It should be noted that r24 represents recombination between two QTL. When r24=0, the model degenerates to a single QTL.
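For the one-marker case, conditional frequencies of the kind listed in Table 1 and the marker effects αs = Σt fst βt can be generated numerically. The sketch below assumes the locus order A-M-B, no interference and a coupling-phase F1; the βt values are supplied by the user (for example, from the expected two-locus contrast), and all names are hypothetical. It does not reproduce Table 1 itself.

```python
from collections import defaultdict
from itertools import product


def f1_gametes(r12, r23):
    """F1 AMB/amb gametes for the order A-M-B without interference.
    Keys are (A, M, B) allele origins: 1 = from P1, 0 = from P2."""
    return {(a, m, b): 0.5 * (r12 if a != m else 1 - r12) * (r23 if m != b else 1 - r23)
            for a, m, b in product((1, 0), repeat=3)}


def conditional_qtl_freqs(r12, r23):
    """Return (f_st, f_s): f_st[(s, t)] = P(QTL genotype t | marker genotype s), where
    s = number of M alleles (2 = MM, 1 = Mm, 0 = mm) and t is an unordered pair of A-B
    haplotypes (phase kept), and f_s[s] = P(marker genotype s)."""
    gam = f1_gametes(r12, r23)
    joint = defaultdict(float)
    for (g1, p1), (g2, p2) in product(gam.items(), repeat=2):  # two independent F1 gametes
        s = g1[1] + g2[1]
        t = tuple(sorted([(g1[0], g1[2]), (g2[0], g2[2])]))
        joint[(s, t)] += p1 * p2
    f_s = defaultdict(float)
    for (s, _), p in joint.items():
        f_s[s] += p
    return {(s, t): p / f_s[s] for (s, t), p in joint.items()}, dict(f_s)


def marker_effects(f_st, beta):
    """alpha_s = sum_t f_st * beta_t, with beta mapping QTL genotype t -> beta_t."""
    alpha = defaultdict(float)
    for (s, t), p in f_st.items():
        alpha[s] += p * beta.get(t, 0.0)
    return dict(alpha)


f_st, f_s = conditional_qtl_freqs(0.1, 0.2)
```

If every βt is zero, as happens when there is no epistasis, every αs is zero as well, so differences among marker classes can arise only from epistasis.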

Let fs be the frequency of the sth marker genotype, fst the conditional probability of the tth family group given the sth marker genotype, and fstjk the conditional probability of the kth QTL genotype (k=1, 2, …, 10) within the jth full-sib family (j=1, 2, 3) from an F2 parent of the tth genotype within the sth marker genotype. The expected mean squares must be calculated from an analysis of variance under an unbalanced linear model such as that described in Searle (1987, p 73).

The expected mean square between-marker genotypes is

and the expected mean square within-marker genotypes is

where G is the number of marker genotypes (G=3 and 9 for the one- and two-marker model, respectively), ωst is the effect of the tth family group within the sth marker genotype and γstjk is the effect of the kth QTL genotype (k=1, 2, …, 10) within the jth full-sib family (j=1, 2, 3) from the tth F2 parent within the sth marker genotype as described before.

Power prediction

The above analysis shows that the significance of the epistatic variance can be evaluated, with or without marker information, by testing the expected mean square between families (EMSβ) against that within full-sib families (EMSw).

As both the between- and the within-marker-genotype mean squares follow noncentral χ2 distributions with known degrees of freedom, the F statistic for testing the significance of the between-marker variance given by equation (10) follows a doubly noncentral F distribution. The power of the F test can be calculated as

$$\text{Power} = \Pr\{F(\nu_1, \nu_2, \lambda, \delta) > F(\nu_1, \nu_2, \lambda', \delta', 1-\alpha)\} \qquad (11)$$
where F(ν1, ν2, λ, δ) represents a doubly noncentral F variable with degrees of freedom ν1, ν2 and the noncentral parameters λ and δ for the numerator and denominator mean squares, respectively (Bulgren, 1971).
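The doubly noncentral F variable in equation (11) is the ratio of two independent noncentral χ2 variables, each divided by its degrees of freedom. A Monte Carlo sketch of the power calculation follows; it is a simulation stand-in for, not an implementation of, Bulgren's (1971) series, and it relies on NumPy's convention that the noncentrality parameter is the sum of the squared means of the underlying normal variables. The function names are hypothetical.

```python
import numpy as np


def doubly_noncentral_f(rng, nu1, nu2, lam, delta, size):
    """Draws of F(nu1, nu2, lam, delta) = [chi'^2(nu1, lam) / nu1] / [chi'^2(nu2, delta) / nu2]."""
    num = rng.noncentral_chisquare(nu1, lam, size) if lam > 0 else rng.chisquare(nu1, size)
    den = rng.noncentral_chisquare(nu2, delta, size) if delta > 0 else rng.chisquare(nu2, size)
    return (num / nu1) / (den / nu2)


def mc_power(nu1, nu2, lam, delta, lam0, delta0, alpha=0.05, size=200_000, seed=0):
    """Monte Carlo version of equation (11): the critical value is the (1 - alpha)
    quantile under the null parameters (lam0, delta0); the power is the probability
    of exceeding it under the alternative parameters (lam, delta)."""
    rng = np.random.default_rng(seed)
    f_crit = np.quantile(doubly_noncentral_f(rng, nu1, nu2, lam0, delta0, size), 1 - alpha)
    f_alt = doubly_noncentral_f(rng, nu1, nu2, lam, delta, size)
    return float(np.mean(f_alt > f_crit)), float(f_crit)


power_est, f_crit = mc_power(4, 60, lam=15.0, delta=2.0, lam0=0.0, delta0=2.0)
```

A simulation of this kind is convenient for spot-checking a series evaluation, at the cost of Monte Carlo error of roughly sqrt(p(1 − p)/size).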

Calculation of noncentral parameters

The distribution parameters can be determined by following Johnson et al. (1995, vol 2, p 131) for the situations with or without incorporating marker information.

When no marker information is involved in the analysis, the noncentral parameter of the numerator of the F-statistic is given by

which has m degrees of freedom, when the significance of the expected sum of squares given by equation (4) is tested;

which has m−1 degrees of freedom, when the significance of the expected mean square given by equation (3) is tested, where

and

and the noncentral parameter of denominator is

Under the marker-QTL model, the noncentral parameter of the numerator statistic is

where

and

The noncentral parameter of the denominator statistic is

where

and

The power function (11) can be evaluated using the cumulative distribution function of the doubly noncentral F distribution, which can be approximated by truncating an infinite Poisson-weighted series of incomplete beta functions (Bulgren, 1971).

Here F(ν1, ν2, λ′, δ′, 1−α) denotes the upper α-point of the doubly noncentral F distribution with the same degrees of freedom, but with the numerator and denominator noncentral parameters λ′ and δ′ calculated under the null hypothesis from equations (12), (13), (14), (15) and (16). For given ν1, ν2, λ′, δ′ and significance level α, the threshold Fc was calculated by numerically solving $\Pr\{F(\nu_1, \nu_2, \lambda', \delta') \le F_c\} = 1-\alpha$. In the present study, α was set to 5%.
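The series evaluation and the threshold search can be sketched in pure Python. The CDF uses the standard mixture representation P{F ≤ f} = Σj Σk Pois(j; λ/2) Pois(k; δ/2) I_x(ν1/2 + j, ν2/2 + k) with x = ν1 f/(ν1 f + ν2), truncated once the Poisson tails are negligible; the regularized incomplete beta function is computed by the modified Lentz continued fraction. All function names and the truncation tolerance are assumptions of this sketch, which is not tuned for very large noncentral parameters.

```python
import math


def _betacf(a, b, x, max_iter=2000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def _poisson_weights(mean, tol=1e-12):
    """Poisson(mean) probabilities for j = 0, 1, ... until the upper tail is below tol."""
    if mean == 0.0:
        return [1.0]
    weights, total = [], 0.0
    for j in range(int(mean + 50 * math.sqrt(mean) + 100)):
        w = math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))
        weights.append(w)
        total += w
        if j > mean and total > 1.0 - tol:
            break
    return weights


def dnf_cdf(f, nu1, nu2, lam, delta):
    """P{F(nu1, nu2, lam, delta) <= f} from the truncated double Poisson-weighted series."""
    if f <= 0.0:
        return 0.0
    x = nu1 * f / (nu1 * f + nu2)
    return sum(wj * wk * betainc_reg(nu1 / 2 + j, nu2 / 2 + k, x)
               for j, wj in enumerate(_poisson_weights(lam / 2))
               for k, wk in enumerate(_poisson_weights(delta / 2)))


def critical_value(nu1, nu2, lam0, delta0, alpha=0.05):
    """Solve P{F(nu1, nu2, lam0, delta0) <= Fc} = 1 - alpha by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while dnf_cdf(hi, nu1, nu2, lam0, delta0) < 1 - alpha:
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if dnf_cdf(mid, nu1, nu2, lam0, delta0) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(nu1, nu2, lam, delta, lam0, delta0, alpha=0.05):
    """Equation (11): P{F(nu1, nu2, lam, delta) > Fc}, with Fc from the null parameters."""
    fc = critical_value(nu1, nu2, lam0, delta0, alpha)
    return 1.0 - dnf_cdf(fc, nu1, nu2, lam, delta)
```

When λ = λ′ and δ = δ′ the function returns α, which is a convenient sanity check on the threshold search.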

In the above analysis, we presented the formulation for only one of the possible marker-QTL linkage orders. The analysis for the other orders follows the same principles and is omitted for brevity.

When two markers are considered, the m families of the L1, L2 and L3 populations are divided into nine categories. One complication is that when the two markers are closely linked, the expected frequencies of some marker genotypes are too small for them to be observed in practice. In the simulated analysis, the marker genotypes actually observed may be used in the analysis of variance. In the theoretical analysis, marker genotypes whose expected number is less than two are treated as missing data. In this situation, some statistics, such as the degrees of freedom and the expected sums of squares between and within marker genotypes, must be adjusted and approximated accordingly in both the theoretical and the simulated analyses.
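The rule for sparse marker classes can be sketched as below, assuming two markers with recombination fraction rm, a coupling-phase F1, and that the "expected number" of a class means m times its frequency. The function names and the min_expected argument are hypothetical; flagged classes would presumably be dropped, and G reduced accordingly, before the degrees of freedom are computed.

```python
from collections import defaultdict
from itertools import product


def two_marker_genotype_freqs(rm):
    """Frequencies of the nine observable F2 genotypes at markers M1 and M2
    (keys: numbers of M1 and M2 alleles), F1 in coupling phase, recombination rm."""
    gam = {(1, 1): (1 - rm) / 2, (0, 0): (1 - rm) / 2, (1, 0): rm / 2, (0, 1): rm / 2}
    freqs = defaultdict(float)
    for (g1, p1), (g2, p2) in product(gam.items(), repeat=2):  # two independent gametes
        freqs[(g1[0] + g2[0], g1[1] + g2[1])] += p1 * p2
    return dict(freqs)


def rare_classes(m, rm, min_expected=2.0):
    """Marker classes whose expected number among m F2 parents is below min_expected."""
    freqs = two_marker_genotype_freqs(rm)
    return sorted(s for s, f in freqs.items() if m * f < min_expected)


flagged = rare_classes(100, 0.05)   # classes such as M1M1 m2m2 that need two recombinant gametes
```

With m = 100 and markers 5% apart, only the two classes that require a recombinant gamete from each F2 parent's parent fall below the threshold.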

Simulation study and numerical analysis

Simulation study

To validate the analytical predictions above, we carried out simulations that mimic a mating experiment with the TTC design (100 families × 20 progeny per family). In our simulations, we varied the genetic parameters and sample sizes. In particular, crossover events between linked marker loci and/or QTL were simulated according to the random walk ‘algorithm’ described elsewhere (Luo and Kearsey, 1992), ignoring recombination interference. Each individual phenotype was generated as its genotypic value from equation (1) plus a random number sampled from a standard normal distribution. The simulation for each parameter configuration was repeated 1000 times. For each set of simulation parameters, the simulation was also carried out under the null hypothesis (iaa=iad=ida=idd=0). The 95th percentile of the F values from the 1000 simulations under the null hypothesis was used as the 5% threshold to test the significance of the corresponding alternative hypothesis. The proportion of significant tests among the 1000 simulations was defined as the empirical power, which was compared with the theoretical prediction.
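The empirical threshold and power defined above reduce to a quantile and a proportion over replicate F values. In this sketch the replicate F values are stand-ins drawn directly from central and noncentral F distributions, with degrees of freedom chosen to match m = 100 and n = 20 for the total-epistasis test; the paper's replicates come from the genotype-level simulation, which is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def empirical_threshold_and_power(f_null, f_alt, alpha=0.05):
    """Threshold: (1 - alpha) quantile of F over null replicates.
    Power: proportion of alternative replicates whose F exceeds that threshold."""
    threshold = float(np.quantile(np.asarray(f_null, dtype=float), 1 - alpha))
    power = float(np.mean(np.asarray(f_alt, dtype=float) > threshold))
    return threshold, power


# Stand-in replicates (not the paper's simulation): m = 100 parents, n = 20 progeny,
# so the total-epistasis test has (100, 3 * 100 * 19) = (100, 5700) degrees of freedom.
rng = np.random.default_rng(2024)
f_null = rng.f(100, 5700, size=1000)
f_alt = rng.noncentral_f(100, 5700, 60.0, size=1000)
threshold, power = empirical_threshold_and_power(f_null, f_alt)
```

Using the empirical null quantile, rather than a tabulated F value, protects the test against departures of the simulated statistic from its nominal distribution.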

Results

Tables 2, 3 and 4 list the expected sum of squares between families, the expected mean square within full-sib families and the F statistic, together with their standard deviations, over 1000 simulation replicates, alongside the values predicted by the theoretical analyses developed in the present study. The theoretical predictions agree well with the simulated observations, validating the theoretical model presented here. Tables 2, 3 and 4 also show the simulated powers and thresholds for testing epistasis, together with the theoretical predictions, for all the simulated populations. The theoretical calculations provide adequate predictions of the corresponding simulated powers.

Table 2 Comparison of powers for detecting epistatic components with a1=a2=0.5, d1=d2=0.25 and the genetic distance between the QTL of 45 cM
Table 3 The effect of linkage and linkage phases on epistatic detection with a1=a2=0.5, d1=d2=0.25 and iaa=iad=ida=idd=0.5
Table 4 Comparison of powers of additive effects and dominance effects for detecting epistasis with iaa=iad=ida=idd=0.5 and the genetic distance of 30 cM between the QTL

Table 2 shows that some kinds of epistasis are more readily detected than others. Given the same genetic background, the TTC design had higher power for detecting the additive-by-additive (iaa) variance component than the additive-by-dominance (iad or ida) variance component, and the lowest power for the dominance-by-dominance (idd) variance component. Table 3 shows a trend toward increasing power as the genetic distance between the two QTL increases, which implies that epistasis may be more difficult to detect when the QTL are tightly linked. The test tends to be more efficient when the QTL are linked in coupling than in repulsion, in agreement with the theoretical study of the same design by Pooni and Jinks (1976). The effects of heritability and dominance ratio on epistatic detection are summarized in Table 4. Decreasing the narrow-sense heritability and the dominance increases the power for detecting epistasis, and the power reaches its highest value as the additive and dominance effects approach zero, provided that the epistatic effects are held constant.

Having validated the theoretical predictions without marker information, we now apply the theoretical analyses with marker information. With the same genetic background in all simulated populations (a1=a2=0.5, d1=d2=0.25 and iaa=iad=ida=idd=0.5), 10 populations were simulated for 10 different sets of parameters, as summarized in Table 5. To allow fair comparisons, the genetic distance between the two QTL was set to 45 cM in all simulations.
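Distances here are given in cM, whereas the analysis works with recombination fractions. The conversion below assumes Haldane's map function, which matches the no-interference assumption used throughout; the text does not state which map function was used, so this is an assumption, and the function names are hypothetical.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def haldane_cm(r):
    """Inverse map function: distance in cM for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


r_qtl = haldane_r(45.0)   # recombination fraction for the 45 cM QTL spacing
```

Under this map function, r13 = r12(1 − r23) + r23(1 − r12) holds exactly when the two interval distances are added, so the distances and the three-locus formula stay consistent.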

Table 5 Numerical results of analysis of variance

The TTC has greater power to detect epistasis when marker information is available. Comparison among populations 1, 7 and 10 shows that the power of epistatic detection ranks as AM1M2B > AM1B > AB, where A and B are QTL and M1 and M2 are marker loci; when two markers lie on the same side of a QTL, the power is determined by the marker closest to that QTL (comparison between populations 6 and 9). The relative positions of QTL and markers may affect the power of epistatic detection: the power tends to decrease as the number of marker loci between the two QTL decreases (comparison among populations 1, 3, 5 and 6; see also the comparison between populations 7 and 9).

Comparing analyses with one marker, with two markers and without marker information, Table 5 shows that the standard deviation of the estimated expected mean square is higher with one marker than with two markers, and lowest without marker information.

For the analyses using marker information, Table 5 lists the expected mean squares between and within marker genotypes and the F ratio estimated from simulation, together with their standard errors and the corresponding theoretical predictions, as well as the observed powers and their theoretical predictions. Table 5 shows that the theoretical power predictions from equation (11) also approximate the simulated values adequately in all 10 populations.

Discussion

The TTC, originally proposed by Kearsey and Jinks (1968), provides not only a direct test for the epistatic variance component but also unbiased estimates of the additive and dominance components whenever epistasis among polygenes is absent. It has been shown to be the most advanced design so far for investigating the genetic architecture of both experimental and natural populations (Kearsey and Jinks, 1968). The analysis presented here, for designs with or without marker information, provides useful predictions of statistical power. Our treatment is confined to two-locus epistatic effects, but it illustrates the general issues.

To calculate the power, we need the distribution of the test statistic under both the null and the alternative hypotheses. Under the fixed model, since both the between- and within-marker-genotype mean squares follow noncentral χ2 distributions, the F statistic for testing the between-marker (or between-family) variance follows a doubly noncentral F distribution under both hypotheses. The power can then be calculated for given degrees of freedom, significance level and numerator and denominator noncentral parameters (Bulgren, 1971; Johnson et al., 1995). Our derivations show that the power for detecting epistasis can be expressed as a function of the design parameters and of parameters describing the genetic properties of the markers and QTL. The powers from theoretical evaluation agree very well with those from stochastic simulation over a wide range of situations, supporting the reliability of the theoretical analysis.

Assuming no recombination interference, we have shown that epistasis and linkage interact, and that this interaction is partly responsible for the improved sensitivity of detection under the finite-locus model. This helps to explain where the gain in power from using markers comes from.

In real experimental organisms, however, interference will affect crossovers, as has been well known since Haldane (1919). Interference has the effect of decreasing the apparent genetic distances between loci along the linkage group (McPeek and Speed, 1995), which results in decreased power for QTL detection (Xu et al., 2005). The simulation experiments in Table 3 show that the power of epistatic detection tends to decrease as the genetic distance between the two QTL decreases. In view of its importance, the problem of interference deserves further investigation.

Wade (2002) identified additive-by-additive epistasis as one of the most important kinds of epistasis because it contributes most heavily to the generation of new additive variance. Table 2 shows that, given the same genetic background, the TTC design had higher power for detecting the additive-by-additive (iaa) variance component than the additive-by-dominance (iad or ida) and dominance-by-dominance (idd) variance components. Because additive-by-additive epistasis is relatively easy to detect compared with other types, more examples of it seem to be available from QTL studies; for instance, additive-by-additive epistasis has been shown to characterize genes affecting lung tumors in the mouse (Fijneman et al., 1996).

Pooni and Jinks (1976) demonstrated the greater power of the TTC for detecting epistasis, compared with other designs without marker information. Our investigation differs from theirs in several respects. First, we develop a quantitative genetics model and method for detecting epistasis using TTC experiments with marker information, and explore the statistical power of the design with and without marker information. Second, Pooni and Jinks (1976) explored the power for detecting complementary and duplicate epistasis under the infinitesimal model, whereas we assess the power of the mating design for detecting arbitrary types of epistasis under a finite-locus model. It is also instructive to compare our power results with theirs. Table 4 shows that a decrease in narrow-sense heritability and dominance results in an increase in the power for detecting epistasis. In contrast, Pooni and Jinks found a trend toward increasing power as the heritability and dominance ratio increase. This contradiction arises because Pooni and Jinks (1976) expressed the epistatic effects as a linear function of the additive and dominance effects, for example, , and . Under this formulation, the power necessarily improves as heritability and dominance ratio increase, because the epistatic effects increase with them. Their linear function (above) is hard to justify; consequently, their results may be misleading.

The TTC may have increased power for epistatic detection when marker information is available. More importantly, the availability of molecular markers offers the opportunity to detect pairwise interactions between QTL. The simulation experiments show that the power for detecting epistasis tends to decrease as the number of marker loci between the two QTL decreases. Therefore, one effective way to increase the power of epistatic detection is to use more molecular markers between two linked epistatic QTL.

Although prediction of power in more general models (with recombination interference, multiple marker alleles, multiple markers, natural populations, etc.) requires tedious algebra, it is relatively simple to implement the analysis of variance with both real and simulated data. Hence, analysis of variance provides a useful tool for quickly screening the genetic architecture of a population before computationally demanding methods such as maximum likelihood or Bayesian approaches are used. These approaches may, however, provide more power as well as a better framework for estimating epistatic effects, and they are well developed in QTL mapping (Kao et al., 1999; Carlborg and Haley, 2004; Yi et al., 2005).