Introduction

In inbred organisms such as rice and laboratory mice, F2 families derived from crossing two lines are suitable experimental populations for mapping QTL expressed differently between two lines. F2 families have also been constructed for detecting QTL in outbred organisms, such as livestock and trees, starting with crosses between individuals sampled from each of two genetically diverged populations.

Interval mapping methods (Lander and Botstein, 1989; Haley et al, 1994) have been successfully applied to QTL analyses of F2 families in outbred organisms as well as inbred organisms. In pigs, many interval mapping analyses have been performed for F2 families derived from a cross between two breeds, and the successful detection of several QTL affecting economically important traits has been reported (eg Andersson et al, 1994; Walling et al, 2000; Bidanel et al, 2001). In fruit trees and forest trees, F2 populations originating from a cross between two selected individuals have also been used for interval mapping, and QTL for useful traits such as fruit quality or adaptive properties have been detected (Dirlewanger et al, 1999; Frewen et al, 2000).

The model of interval mapping analysis usually applied to outbred F2 families assumes that founder individuals (grandfathers and grandmothers in the F0 generation) of the families were fixed for different alleles at QTL. However, a high degree of heterozygosity is usually observed within a population of outbred organisms and many QTL are considered to be segregating in a population. There is reduced efficiency in detecting QTL at which the F0 grandparents are heterozygous. These particular QTL may, therefore, remain undetected.

Pérez-Enciso and Varona (2000) proposed a method allowing the detection of differences between the means of QTL effects in two outbred populations as well as the variation of QTL effects within each population. Thus, not only QTL segregating between two outbred populations but also QTL segregating within each population could be detected by their method. However, since their mixed model requires estimation of the QTL variance within each founder population, the method is not appropriate for F2 families derived from a small number of F0 founders (particularly designs with only two founders).

Recently, Bayesian methods based on the Markov chain Monte Carlo (MCMC) algorithm have been developed for QTL analyses in outbred pedigrees as well as inbred ones (Satagopan et al, 1996; Uimari and Hoeschele, 1997; Sillanpää and Arjas, 1998, 1999; Bink and Van Arendonk, 1999; Uimari and Sillanpää, 2001; Yi and Xu, 2001, 2002; Bink et al, 2002; Jannink and Wu, 2003). Using the reversible jump MCMC (RJ-MCMC) technique (Green, 1995), it becomes possible to estimate the number of QTL involved in a quantitative trait as well as the locations and effects of the QTL. Both QTL that are homozygous and QTL that are heterozygous in founder individuals can be detected. For outbred F2 families, Sillanpää and Arjas (1999) devised a Bayesian estimation of QTL using an RJ-MCMC algorithm. Their method assumed that the number of possible QTL genotypes segregating in an F2 family was known and incorporated effect parameters for each QTL genotype in the model. It is, however, usually difficult to infer the number of QTL genotypes in advance of analysis. Another Bayesian method applicable to outbred F2 populations was developed by Yi and Xu (2001). Their model assumed that all of the alleles that the F0 grandparents carried were different from each other, and parameterized allelic effects for all of the founder alleles and interaction (dominance) effects between two alleles. Accordingly, the number of QTL genotypes was predetermined and a large number of parameters were included in their model. The pedigree-analysis methods proposed by Uimari and Sillanpää (2001) and Bink et al (2002) are also applicable to the analysis of outbred F2 families. In their methods, however, only the case of biallelic QTL was treated.

When some of the F0 grandparents are homozygous at QTL, a model with fewer parameters is more appropriate for explaining the observed F2 phenotypes. A more suitable model would take into account whether each of the F0 grandparents was homozygous at each QTL. Inference about the QTL genotype of each F0 founder would also provide useful information for breeding programs based on these individuals.

In this paper, we propose a new Bayesian method implementing these innovations, which gives a posterior distribution for the QTL states in each F0 grandparent. We confine ourselves to the discrimination of two QTL states, homozygous or heterozygous, for each of the F0 grandparents. Accordingly, the analysis does not take into account whether common alleles are shared by the F0 grandparents. We parameterize allelic effects and dominance effects for each QTL. The total number of parameters therefore depends on the QTL states of the F0 grandparents. Consequently, RJ-MCMC is used for transitions between models of different dimensions.

Some Bayesian methods incorporate the estimation of marker haplotypes into the MCMC procedure (Sillanpää and Arjas, 1999; Bink and Van Arendonk, 1999; Yi and Xu, 2001). However, the step is very time-consuming and marker haplotypes remain ambiguous in the case of low marker informativeness in founder individuals. In the proposed method, estimation of the marker haplotypes is, therefore, carried out separately from the MCMC procedure. The QTL genotypes for individuals were sampled from the conditional distributions, given the estimated haplotypes.

A Bayesian method allowing the estimation of the number of alleles at QTL present in the experimental families was recently developed by Jannink and Wu (2003). Their method was applied to the analyses of multiple families derived from crosses among inbred lines. The number of alleles segregating among inbred lines was inferred assuming that all of the inbred lines were homozygous at QTL. Their model is not, therefore, applicable to the design we consider here: outbred F2 families, in which F0 grandparents may have been heterozygous at QTL. In the following, we focus mainly on a three-generation family that consists of F2 individuals, F1 parents and two outbred F0 grandparents. We assume that phenotypic values of a trait are available for all F2 individuals and, in addition, that marker genotypes are available for all individuals in the family.

Materials and methods

Analyzed family

We assume that a male and a female chosen from different populations (breeds or lines) have been crossed to produce m F1 progeny and n F2 progeny. Such data are often available for linkage and QTL studies in forest trees (Bradshaw Jr and Stettler, 1995; Wu, 1998; Dirlewanger et al, 1999; Costa et al, 2000; Bliss et al, 2002). In QTL analyses of pigs, a three-generation family is also often constructed; however, multiple males and females are used as founders. The proposed method can be extended to F2 families derived from more than two F0 grandparents.

Statistical model

We consider the ordered genotypes at QTL for each individual by discriminating its paternal allele and maternal allele even when an individual is homozygous. The QTL genotypes of F1 and F2 individuals are indicated in terms of the founder origin of alleles F1 and F2 individuals carry. There are, thus, four origins of alleles, denoted by 1, 2, 3 and 4 corresponding to each of four alleles of two F0 grandparents. The origins 1 and 2 indicate paternal allele and maternal allele of a grandfather and 3 and 4 indicate paternal allele and maternal allele of a grandmother, respectively. When no ancestral information for the F0 grandparents is available, which is typical, origins (1,2) and (3,4) are arbitrarily allocated, one to each grandparent.

We assume that the number of QTL affecting the trait is NQ. For the allele of origin j (j=1, 2, 3, 4) at the qth QTL in the F2 population, the allelic effect is denoted by αqj, and the interaction effect (dominance effect) between a pair of alleles of origins j and k by δqjk (j<k). For identifiability of the allelic effects, the restriction $\sum_{j=1}^{4}\alpha_{qj}=0$ is imposed for each QTL.

Let y represent an n × 1 vector indicating the observed phenotypic values of a quantitative trait for n individuals of the F2 population. We can apply a linear model for y, in which QTL on all chromosomes are considered simultaneously, as follows:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \sum_{q=1}^{N_Q} \mathbf{g}_q + \mathbf{e} \qquad (1)$$

where X is a known design matrix for a vector of nongenetic effects b (including the overall mean), gq (q=1, 2,…, NQ) is an n × 1 vector for the contribution of the qth QTL in F2 individuals and e is the vector of residual (environmental) effects following an n-variate normal distribution with mean vector 0 and covariance matrix σe2In (In is the n × n identity matrix). The ith element of gq, gqi, indicates the genetic contribution of the qth QTL for the ith F2 individual. When the grandparental origins of two alleles at the qth QTL are j and k (j,k=1, 2, 3, 4), gqi is expressed as

$$g_{qi} = \alpha_{qj} + \alpha_{qk} + \delta_{qjk} \qquad (2)$$

One of the purposes of the proposed method is to discriminate whether the F0 grandparents are homozygous or heterozygous at QTL, which we will refer to as the ‘QTL state’ in a grandparent. There are four possible patterns for the combination of QTL states in a grandfather and a grandmother: (i) the two grandparents are homozygous for alternative alleles (Q1Q1 × Q3Q3), (ii) the grandfather is heterozygous and the grandmother is homozygous (Q1Q2 × Q3Q3), (iii) the grandfather is homozygous and the grandmother is heterozygous (Q1Q1 × Q3Q4) and (iv) the two grandparents are heterozygous (Q1Q2 × Q3Q4). In the proposed method, it does not matter whether the two grandparents share common alleles or not. Thus, a mating type such as Qq × qq (backcross type) is classified into pattern (ii) and a type such as Qq × Qq (F2 intercross type) is classified into pattern (iv). The combination of QTL states at the qth QTL in the grandparents is denoted by Sq, which takes one of the values 1, 2, 3 and 4, corresponding to (i), (ii), (iii) and (iv) above.

We obtain αq1=αq2 for a homozygous grandfather and αq3=αq4 for a homozygous grandmother at the qth QTL. Allelic effects, αqj, and interactions, δqjk, are reparameterized according to each of the four QTL states as shown in Table 1, where subscript ‘q’ is omitted. As shown in Table 1, for Sq=1, allelic effects are expressed as αq1=αq2=aq1 and αq3=αq4=aq3(=−aq1) and interactions are written as δq13=δq14=δq23=δq24=dq13 and δq12=δq34=0. Moreover, replacing aq1 and dq13 by a/2 and d, respectively, the genetic contributions of the QTL are written as a, d and −a for genotypes Q1Q1, Q1Q3 and Q3Q3, respectively, which is the conventional notation for QTL effects in inbred F2 populations. Taking the restriction on allelic effects into account, the numbers of independent parameters related to the effects of a QTL are 2, 5, 5 and 9 for Sq=1, 2, 3 and 4, respectively, as shown in Table 1. The dimension of model (1) therefore changes depending on the QTL states in the grandparents.

Table 1 Correspondence between effects of alleles and interactions indicated by grandparental haplotype origin and actual model parameters for each of QTL states in F0 grandparents

The effects of the qth QTL are expressed by a vector aq, the components of which are the independent parameters for allelic and interaction effects at the qth QTL and are specified according to Sq. We obtain aq=(aq1,dq13), (aq1,aq2,dq12,dq13,dq23), (aq1,aq3,dq13,dq14,dq34) or (aq1,aq2,aq3,dq12,dq13,dq14,dq23,dq24,dq34) for Sq=1, 2, 3 and 4, respectively, from Table 1. We write the genetic effects of all QTL as a=(a1, a2,…, aNQ) and the QTL states at all QTL in the two F0 grandparents as S=(S1, S2,…, SNQ).
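The reparameterization can be illustrated with a minimal Python sketch. Table 1 is not reproduced in this text, so the mappings for Sq=2, 3 and 4 below are inferred from the homozygosity constraints and the sum-to-zero restriction stated above, and should be read as assumptions. The function names `expand_effects` and `genotype_value` are hypothetical, and the interaction between two copies of the same origin is assumed to be zero.

```python
def expand_effects(state, params):
    """Expand the independent parameters a_q into effects for origins 1..4.

    Mappings for states 2-4 are inferred from the text (assumption),
    using alpha_1 + alpha_2 + alpha_3 + alpha_4 = 0.
    """
    if state == 1:      # Q1Q1 x Q3Q3
        a1, d13 = params
        alpha = {1: a1, 2: a1, 3: -a1, 4: -a1}
        delta = {(1, 2): 0.0, (3, 4): 0.0,
                 (1, 3): d13, (1, 4): d13, (2, 3): d13, (2, 4): d13}
    elif state == 2:    # Q1Q2 x Q3Q3
        a1, a2, d12, d13, d23 = params
        a3 = -(a1 + a2) / 2.0
        alpha = {1: a1, 2: a2, 3: a3, 4: a3}
        delta = {(1, 2): d12, (3, 4): 0.0,
                 (1, 3): d13, (1, 4): d13, (2, 3): d23, (2, 4): d23}
    elif state == 3:    # Q1Q1 x Q3Q4
        a1, a3, d13, d14, d34 = params
        alpha = {1: a1, 2: a1, 3: a3, 4: -(2.0 * a1 + a3)}
        delta = {(1, 2): 0.0, (3, 4): d34,
                 (1, 3): d13, (2, 3): d13, (1, 4): d14, (2, 4): d14}
    elif state == 4:    # Q1Q2 x Q3Q4
        a1, a2, a3, d12, d13, d14, d23, d24, d34 = params
        alpha = {1: a1, 2: a2, 3: a3, 4: -(a1 + a2 + a3)}
        delta = {(1, 2): d12, (1, 3): d13, (1, 4): d14,
                 (2, 3): d23, (2, 4): d24, (3, 4): d34}
    else:
        raise ValueError("state must be 1, 2, 3 or 4")
    return alpha, delta


def genotype_value(state, params, j, k):
    """Genetic contribution of an ordered genotype with origins j and k."""
    alpha, delta = expand_effects(state, params)
    if j == k:          # same origin twice: no interaction term (assumption)
        return 2.0 * alpha[j]
    return alpha[j] + alpha[k] + delta[(min(j, k), max(j, k))]
```

With Sq=1 and aq=(a/2, d), the sketch returns a, d and −a for genotypes Q1Q1, Q1Q3 and Q3Q3, in line with the conventional notation mentioned above.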

Let Gqi denote the ordered genotype at the qth QTL for the ith F2 individual, which is expressed by the grandparental origins of the two alleles the ith F2 individual carries at the qth QTL. For example, Gqi=12 means that the origins of the paternal allele and maternal allele the ith F2 individual carries at the qth QTL are 1 and 2, respectively. Genotypes at the qth QTL for all n individuals are expressed as Gq=(Gq1,Gq2,…,Gqn) and genotypes at all QTL as G=(G1, G2,…, GNQ). Given the number of QTL, NQ, the QTL states of the two grandparents, S, and the QTL genotypes, G, the distribution of y, p(y∣NQ, S, G, a, b, σe2), is written as a normal distribution from equation (1) as follows:

$$p(\mathbf{y} \mid N_Q, \mathbf{S}, \mathbf{G}, \mathbf{a}, \mathbf{b}, \sigma_e^2) = (2\pi\sigma_e^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma_e^2} \left(\mathbf{y} - \mathbf{X}\mathbf{b} - \sum_{q=1}^{N_Q}\mathbf{g}_q\right)^{\prime} \left(\mathbf{y} - \mathbf{X}\mathbf{b} - \sum_{q=1}^{N_Q}\mathbf{g}_q\right) \right\} \qquad (3)$$

where gq=(gq1, gq2,…,gqn)′ is a vector of genetic effects at the qth QTL for F2 individuals, obtained from (2) and Table 1 depending on the value of S.

Sampling of QTL genotype

Estimation of marker haplotype

In the proposed method, marker haplotypes (linkage phase among alleles at linked markers) are estimated for all of individuals in the three-generation family in advance of the MCMC cycles (ie, it is excluded from MCMC cycles). Marker haplotypes are estimated following the method of Knott et al (1996), where linkage phases of markers are determined for parental individuals so that the recombination events between adjacent markers on the gametes transmitted to their offspring are minimized. For individuals of each generation, marker haplotypes are determined generation by generation based on the observed marker genotypes. Firstly, marker haplotypes of F0 grandparents are determined by the observed marker genotypes of their F1 progeny. Secondly, the marker haplotype of each F1 individual is determined by the observed marker genotypes of its F2 progeny and confirmed by the haplotypes of its F0 parents. Lastly, marker haplotypes of F2 individuals are determined by the haplotypes of their F1 parents.

In the method of Knott et al (1996), marker haplotypes of sires were determined from the observed marker genotypes of half-sib progeny; markers at which sires were homozygous were uninformative and omitted from consideration. In a three-generation family, even markers at which F0 grandparents are homozygous are informative for the detection of QTL and are included in the marker haplotypes, unless the F0 grandfather and the F0 grandmother are homozygous for the same allele at those markers.

For sampling QTL genotypes, the reconstructed marker haplotype of each individual is translated into a combination of ordered genotypes expressed in terms of parental allelic origins at markers included in the haplotype. Consider, for example, three linked markers A, B and C, in which it is assumed that an F0 male and an F0 female have marker haplotypes A1B1C1/A2B2C2 and A3B3C3/A4B4C4, respectively, and their F1 offspring has marker haplotypes A1B1C2/A3B4C4. Then, for the F1 individual, the combination of ordered genotypes with parental allelic origins at A–B–C is written as 13-14-24, where allelic origins 1 and 2 (3 and 4) indicate paternal allele and maternal allele of an F0 father (mother). Hereafter, ‘haplotype’ means the combination of ordered genotypes expressed with the parental allelic origins at markers concerned as described in the above example.
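The translation from marker alleles to parental allelic origins can be sketched in a few lines of Python. This is only an illustration of the coding scheme described above; the function name `origin_codes` and the list-of-alleles data layout are assumptions made for the example. When a grandparent is homozygous at a marker, more than one origin code is returned for that marker.

```python
def origin_codes(f0_father, f0_mother, f1_paternal, f1_maternal):
    """Translate an F1 marker haplotype pair into parental allelic origins.

    f0_father and f0_mother are (paternal_haplotype, maternal_haplotype)
    pairs; each haplotype is a list of allele labels, one per marker.
    Origins 1 and 2 refer to the F0 father, 3 and 4 to the F0 mother.
    Returns, per marker, the list of origin codes compatible with the alleles.
    """
    codes = []
    for m, (pat, mat) in enumerate(zip(f1_paternal, f1_maternal)):
        from_father = [o for o, hap in ((1, f0_father[0]), (2, f0_father[1]))
                       if hap[m] == pat]
        from_mother = [o for o, hap in ((3, f0_mother[0]), (4, f0_mother[1]))
                       if hap[m] == mat]
        codes.append([f"{a}{b}" for a in from_father for b in from_mother])
    return codes


# Example from the text: F0 male A1B1C1/A2B2C2, F0 female A3B3C3/A4B4C4,
# F1 offspring A1B1C2/A3B4C4.
father = (["A1", "B1", "C1"], ["A2", "B2", "C2"])
mother = (["A3", "B3", "C3"], ["A4", "B4", "C4"])
codes = origin_codes(father, mother, ["A1", "B1", "C2"], ["A3", "B4", "C4"])
print("-".join(c[0] for c in codes))
```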

Ordered genotypes of F1 progeny at markers that are homozygous in either of the F0 grandparents cannot be uniquely expressed in terms of parental allelic origins. Consider a marker B, at which an F0 grandfather has ordered genotype B1B1, an F0 grandmother has ordered genotype B3B4 and their F1 offspring has ordered genotype B1B3. Then, the ordered genotype of the F1 individual could be expressed in terms of parental allelic origins as 13 or 23, since allele B1 has two possible parental origins, 1 and 2. Accordingly, when markers homozygous in F0 grandparents are included in the haplotype, F1 individuals have more than one possible haplotype. Consider, for example, a haplotype of three markers, A, B and C, which are located on a chromosome in this order with recombination values rAB between A and B and rBC between B and C. When the genotypes of an F0 grandfather and an F0 grandmother are A1B1C1/A2B1C2 and A3B3C3/A4B4C4, respectively, and their F1 offspring has genotype A1B1C1/A3B3C3, the two possible haplotypes of the F1 individual, denoted by combinations of ordered genotypes at A–B–C, are 13-13-13 and 13-23-13. In this case, the ratio of the probabilities of the two possible haplotypes is (1−rAB)(1−rBC)/(rABrBC), because the first requires no recombination in either interval whereas the second requires a recombination in both.

The estimated haplotypes for all individuals in the family are denoted by H. For individuals whose marker haplotypes are not uniquely determined, as in the above example, all possible haplotypes of those individuals are included in H, together with the probability of occurrence of each haplotype.
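As a hedged illustration of how such haplotype probabilities could be computed, the sketch below weights each candidate sequence of origins along one transmitted gamete by the product of recombination or non-recombination probabilities over adjacent marker intervals, then normalizes. The function names are hypothetical, and independence of recombination between adjacent intervals (no interference) is assumed.

```python
def path_weight(origins, rec_fracs):
    """Unnormalized probability of a sequence of origins along one gamete.

    A change of origin between adjacent markers counts as a recombination
    (probability r); no change counts as non-recombination (1 - r).
    """
    weight = 1.0
    for left, right, r in zip(origins, origins[1:], rec_fracs):
        weight *= r if left != right else 1.0 - r
    return weight


def haplotype_probabilities(candidates, rec_fracs):
    """Normalize the weights of all candidate origin sequences."""
    weights = [path_weight(c, rec_fracs) for c in candidates]
    total = sum(weights)
    return [w / total for w in weights]


# Paternal gamete in the example above: origins at A-B-C are 1-1-1 or 1-2-1.
r_ab, r_bc = 0.1, 0.2          # illustrative recombination values
probs = haplotype_probabilities([(1, 1, 1), (1, 2, 1)], [r_ab, r_bc])
print(probs[0] / probs[1])     # equals (1-r_ab)(1-r_bc)/(r_ab*r_bc)
```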

Probability distribution for sampling QTL genotype

Let λq denote the location of the qth QTL and λ=(λ1, λ2,…, λNQ) denote the locations of all QTL. For updating the QTL genotypes G in the MCMC procedure for F2 individuals, the probability distributions of the QTL genotypes of F1 individuals and F2 individuals given H are required in order to sample a value of G. For the F1 individuals, the ordered genotypes at the qth QTL, expressed in terms of founder alleles, are written as Fq=(Fq1,Fq2,…,Fqm), and the genotypes at all QTL as F=(F1, F2,…, FNQ). Given H, the joint distribution of the QTL genotypes of F1 and F2 individuals, p(G,F∣H,λ), has the form

$$p(\mathbf{G},\mathbf{F} \mid \mathbf{H},\boldsymbol{\lambda}) = \prod_{q=1}^{N_Q} \left[ \prod_{i=1}^{n} p(G_{qi} \mid F_{qi1}, F_{qi2}, \mathbf{H}, \boldsymbol{\lambda}) \right] p(\mathbf{F}_q \mid \mathbf{H}, \boldsymbol{\lambda}) \qquad (4)$$

where Fqi1 and Fqi2 are the genotypes at the qth QTL for the F1 mother and father of the ith F2 individual. In equation (4), we can write $p(\mathbf{F}_q \mid \mathbf{H},\boldsymbol{\lambda}) = \prod_{j=1}^{m} p(F_{qj} \mid \mathbf{H},\boldsymbol{\lambda})$, where p(Fqj∣H,λ) is obtained in terms of the recombination values among the QTL and the markers linked to the QTL.

In the proposed method, we calculate p(Fqj∣H,λ) following the method of Haley et al (1994). Consider a simple example as follows. There are two F0 individuals and an F1 offspring, and two linked markers are denoted by A and B. The haplotypes of A and B are written as 12-12 and 34-34 for the two F0 parents and we assume that the genotype of the F1 progeny is (in terms of parental allelic origins) 13-14. Suppose that a QTL (denoted by Q) is located between A and B with recombination values rAQ and rQB for intervals A–Q and Q–B. Ordered genotypes at A–Q–B of the two F0 parents are written as 12-12-12 and 34-34-34 in terms of parental allelic origins. There are four possible QTL genotypes expressed by founder allelic origins, that is, 13, 14, 23 and 24, for the F1 individual; thus, the possible ordered genotypes at A–Q–B are 13-13-14, 13-14-14, 13-23-14 and 13-24-14. Denoting the probability that the QTL genotype of the F1 individual is ij by pij, we obtain, using the method of Haley et al (1994), p13=K(1−rAQ)²rQB(1−rQB), p14=KrAQ(1−rAQ)(1−rQB)², p23=KrAQ(1−rAQ)rQB² and p24=KrAQ²rQB(1−rQB), where K is a constant.

In this example, the probability that the QTL genotype of the F1 individual is kl, p(Fqj=kl∣H,λ), can be expressed as

$$p(F_{qj}=kl \mid \mathbf{H},\boldsymbol{\lambda}) = \frac{p_{kl}}{\sum_{k'l'} p_{k'l'}}$$

where ∑ means summation over all possible QTL genotypes.
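The worked example can be checked numerically with a short Python sketch. It multiplies, for each parental gamete, the recombination or non-recombination probabilities in the two intervals flanking the QTL and then normalizes over the four genotypes. The function names and the numerical recombination values are illustrative assumptions; absence of interference is assumed.

```python
def gamete_weight(left_origin, qtl_origin, right_origin, r_left, r_right):
    """Weight of one gamete with given origins at marker-QTL-marker."""
    w = r_left if left_origin != qtl_origin else 1.0 - r_left
    w *= r_right if qtl_origin != right_origin else 1.0 - r_right
    return w


def f1_qtl_genotype_probs(pat_flank, mat_flank, r_aq, r_qb):
    """Normalized probabilities of QTL genotypes 13, 14, 23 and 24.

    pat_flank and mat_flank give the origins at the flanking markers (A, B)
    on the paternal and maternal gametes of the F1 individual.
    """
    weights = {}
    for p in (1, 2):            # origin inherited from the F0 father
        for m in (3, 4):        # origin inherited from the F0 mother
            weights[f"{p}{m}"] = (
                gamete_weight(pat_flank[0], p, pat_flank[1], r_aq, r_qb)
                * gamete_weight(mat_flank[0], m, mat_flank[1], r_aq, r_qb)
            )
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# F1 marker genotype 13-14: paternal gamete 1-1, maternal gamete 3-4.
probs = f1_qtl_genotype_probs((1, 1), (3, 4), r_aq=0.02, r_qb=0.08)
for genotype, p in probs.items():
    print(genotype, round(p, 4))
```

The unnormalized weights produced by `gamete_weight` reproduce the expressions for p13, p14, p23 and p24 given above, with K absorbed by the normalization.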

When more than one haplotype is possible for an F1 individual, the above formula for p(Fqj=kl∣H,λ) should be modified accordingly. For the case of two possible haplotypes H1 and H2, p(Fqj=kl∣H,λ) can be written as

$$p(F_{qj}=kl \mid \mathbf{H},\boldsymbol{\lambda}) = P(H_1)\,p^{(1)}_{kl} + P(H_2)\,p^{(2)}_{kl}$$

where P(Hi) is the probability of occurrence of ith haplotype and p(i)kl is the probability that the QTL genotype is kl given haplotype Hi (i=1,2).

The probability of Gqi given Fqi1 and Fqi2, p(Gqi∣Fqi1, Fqi2,H,λ), is obtained in a similar way. Let the paternal allele and maternal allele of an F1 father (mother) of an F2 individual be denoted by 1 and 2 (3 and 4), respectively. The probability that an F2 individual receives alleles of k and l from its F1 parents given H is calculated in the same way as p(Fqj=kl∣H,λ) (kl=13,14,23,24). Considering the F0 grandparental origin of the QTL alleles carried by each of F1 parents, the grandparental origin of the QTL alleles of the F2 individual is obtained. For example, when QTL genotypes of two F1 parents are expressed as Fqi1=23 and Fqi2=14 and two QTL alleles of an F2 progeny are expressed in terms of F1 parental origin as 2 and 4, QTL alleles of the F2 individual can be related to F0 grandparental origins as Gqi=34. When updating the QTL genotype of each F1 and F2 individual in the MCMC procedure, a QTL genotype is sampled from prior probability distribution (Equation (4)).

Statistical procedure for QTL estimation

Prior and posterior distribution for the model parameters and QTL genotype

The model parameters and QTL genotypes to be inferred via the MCMC algorithm are written as θ=(NQ, a, λ, S, G, F, b, σe2). As described above, the marker haplotypes, H, are estimated in advance of the MCMC cycles; thus, H is treated as a known variable and is not updated in the MCMC cycles. The joint prior density function for θ can be calculated as the product of the prior density for each of the components

The prior distribution of the number of QTL, p(NQ), is assumed to be a truncated Poisson distribution with mean μ and a predetermined maximum number NQmax. The formula (4) gives the prior density for G and F, p(G,F∣H,λ). Other priors are assumed to be uniform distributions over the possible values of the parameters. The joint posterior density function is written as

MCMC sampling

The model parameters and QTL genotypes θ=(NQ, a, λ, S, G, F, b, σe2) are estimated using MCMC algorithm. The MCMC algorithm consists of the following steps:

  (a) updating the effects at each QTL;

  (b) updating the fixed effects b and residual variance σe2;

  (c) updating the QTL locations λ=(λ1, λ2,…, λNQ);

  (d) updating the QTL genotypes F and G for each of the F1 and F2 individuals;

  (e) updating the QTL states in grandparents S=(S1, S2,…, SNQ) and changing the parameterization of model (1) accordingly;

  (f) adding one new QTL to the model or removing one existing QTL from the model.

The fixed effects b and residual variance σe2 were updated (step b) using Gibbs sampling (Geman and Geman, 1984). The full conditional distributions for the elements of b, general mean and sex effect were normal distributions and that for σe2 was an inverse χ2 distribution with degrees of freedom equal to n−2. Updating of other parameters was carried out using Metropolis–Hastings sampling (Metropolis et al, 1953; Hastings, 1970). Neither the estimation of marker haplotypes nor handling of missing genotype data is included in MCMC cycles.
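A minimal sketch of the Gibbs step for b and σe2 is given below, assuming flat priors on both. Under that assumption the full conditional of each element of b is normal, and σe2 can be drawn as the residual sum of squares divided by a χ2 variate with n−2 degrees of freedom. The function name `gibbs_update` and the plain-list data layout are choices made for the example, not part of the described implementation.

```python
import math
import random


def gibbs_update(y, X, g, b, sigma2, rng):
    """One Gibbs sweep: each element of b in turn, then the residual variance.

    y: phenotypes, X: design matrix (list of rows), g: summed QTL
    contributions per individual, b: current fixed effects.
    Flat priors on b and sigma2 are assumed.
    """
    n, p = len(y), len(b)
    b = list(b)
    for k in range(p):
        # Phenotype corrected for QTL and all fixed effects except the kth.
        part = [y[i] - g[i] - sum(X[i][l] * b[l] for l in range(p) if l != k)
                for i in range(n)]
        xx = sum(X[i][k] ** 2 for i in range(n))
        mean = sum(X[i][k] * part[i] for i in range(n)) / xx
        b[k] = rng.gauss(mean, math.sqrt(sigma2 / xx))
    sse = sum((y[i] - g[i] - sum(X[i][l] * b[l] for l in range(p))) ** 2
              for i in range(n))
    # Chi-square with n-2 df drawn as Gamma(shape=(n-2)/2, scale=2).
    chi2 = rng.gammavariate((n - 2) / 2.0, 2.0)
    return b, sse / chi2
```

In use, X would hold a column of ones for the general mean and a 0/1 column for sex, and g would be recomputed from the current QTL genotypes and effects before each sweep.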

Updating QTL effects

The QTL effects a are updated locus by locus and component by component. To update each of allelic effects and interactions, a random variable u is simulated independently from a symmetric uniform distribution around zero. The new proposal value is obtained as the sum of the existing value and u. Denoting the new proposal value of αqj by αqj*, we take αqj*=αqj+u. The new proposal value of a, denoted by a* one component of which is replaced by the new value, is accepted with probability

Updating QTL locations

As seen in the method of Sillanpää and Arjas (1998, 1999) and Yi and Xu (2001), we do not fix the order of QTL when updating the QTL locations. The locations of QTL are modified for one QTL at a time. For the qth QTL, a new proposal value λq* is sampled from a symmetric uniform distribution centered at the previous value λq. Accordingly, a new genotype at the qth QTL is proposed for each of F1 and F2 individuals with probability q(Gq*, Fq*). The new value λq* and (Gq*, Fq*) are accepted jointly with probability

where λ*, G* and F* are proposal values of λ, G and F with the elements corresponding to the qth QTL being replaced by new values. We assumed that proposal probability q(Gq, Fq) is the conditional probability of the QTL genotypes given genotypes of markers linked to the QTL and the recombination values among the loci and equivalent to p(G, F∣H, λ). Therefore, the acceptance probability (8) is simplified to

Updating QTL genotype of F1 and F2 individuals

In this step, QTL genotypes G and F are updated QTL by QTL and individual by individual without changing the QTL locations. A new genotype at the qth QTL of the jth F1 individual, Fqj*, is sampled from p(Fqj∣H, λ) based on λq and the recombination values between the QTL and linked markers. At the same time, the QTL genotypes of the F2 offspring of the jth F1 individual are modified accordingly. Assuming that the jth F1 individual is mated with the kth F1 individual to produce the ith F2 individual, a new QTL genotype is sampled from p(Gqi∣Fqj*, Fqk, H, λ). Using an argument similar to that used in the derivation of equation (9), the probability of accepting the new genotypes at the qth QTL for the jth F1 individual and its F2 progeny is simplified as

where G* is G with the genotypes of the qth QTL for F2 progeny of the jth F1 parent replaced by the proposal values.

After the QTL genotypes of each F1 individual and its F2 progeny have been updated, the QTL genotype of each F2 individual is updated individual by individual. A proposal value for the QTL genotype of the ith F2 individual, Gqi*, is sampled from the distribution p(Gqi∣Fqi1,Fqi2,H, λ) and accepted with probability

where G* is G with the element Gqi replaced by Gqi*.
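The following Python sketch illustrates the general idea of this kind of update for a single F2 individual. It relies on a standard Metropolis–Hastings result: when the proposal is drawn from the prior, the acceptance ratio reduces to a likelihood ratio, and with independent normal residuals only the individual's own term remains. The function name, argument names and data layout are hypothetical, and the sketch is not claimed to reproduce the acceptance probabilities of the method term by term.

```python
import math
import random


def update_f2_genotype(current, prior_probs, genotype_value,
                       corrected_phenotype, sigma2, rng):
    """Propose a QTL genotype from its prior and accept by likelihood ratio.

    prior_probs: dict genotype -> conditional probability given haplotypes.
    genotype_value: dict genotype -> genetic contribution at this QTL.
    corrected_phenotype: phenotype minus fixed effects and other QTL.
    """
    genotypes = list(prior_probs)
    proposal = rng.choices(
        genotypes, weights=[prior_probs[g] for g in genotypes])[0]

    def log_lik(g):
        # Normal log-likelihood up to a constant that cancels in the ratio.
        return -(corrected_phenotype - genotype_value[g]) ** 2 / (2.0 * sigma2)

    log_ratio = log_lik(proposal) - log_lik(current)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return current
```

Repeated application leaves the posterior of the genotype (prior times likelihood) invariant, so genotypes that fit the phenotype poorly are visited rarely even if their prior probability is high.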

Updating the QTL states in F0 grandparents

The QTL states in grandparents, S, are updated QTL by QTL using an RJ-MCMC algorithm. For the state of the qth QTL, Sq, a new value Sq* is proposed with probability q(Sq*∣Sq). According to Sq*, the parameter vector for QTL effects aq is also modified to aq*. In updating the QTL states, the proposed QTL states are restricted such that only one QTL allele is changed in the F0 grandparents. Thus, the possible moves between the previous QTL state and the new QTL state are as follows: (i) Sq*=1, 2 or 3 for Sq=1, (ii) Sq*=1, 2 or 4 for Sq=2, (iii) Sq*=1, 3 or 4 for Sq=3, (iv) Sq*=2, 3 or 4 for Sq=4. When the dimension of aq* is greater than that of aq, aq* is obtained via a mapping function, aq*=f(aq,u), where u is a random variable used for dimension matching. When the dimension of aq* is less than that of aq, a mapping function (aq*,u)=g(aq) is adopted, where g is the inverse of f. As an example of a transformation aq*=f(aq,u) with Jacobian ∣∂aq*/∂(aq,u)∣=1, consider the case of Sq=1 (Q1Q1 × Q3Q3) and Sq*=2 (Q1Q2 × Q3Q3) as follows. For Sq=1 and Sq*=2, the independent components of aq and aq* are written as aq=(a1,d13) and aq*=(a1*,a2*,d12*,d13*,d23*), respectively. Considering the variable for dimension matching u=(u1,u2,u3), a transformation aq*=f(aq,u) can be expressed as

and

We can easily show that the Jacobian ∣∂aq*/∂(aq,u)∣=1. For other combinations of Sq and Sq*, a transformation between aq and aq* is obtained in a similar way.

The proposal values Sq* and aq* are accepted with probability

where S* and a* differ from S and a in that elements Sq and aq are replaced by the proposal values Sq* and aq*, respectively.

Updating QTL number

The QTL number NQ is updated by adding one new QTL to a model with probability pa or deleting one existing QTL from a model with probability pd in the way described in Jannink and Fernando (2004) and Sillanpää et al (2004). Therefore, for a proposal of QTL number NQ*, there are three possible values; NQ*=NQ+1, NQ*=NQ−1 and NQ*=NQ with probabilities pa, pd and 1−pa−pd.
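The choice among the three proposals can be sketched as follows. The handling of the boundaries (treating a proposal below 0 or above the maximum as a "stay" move) is an assumption made for this example and is not specified in the text; the function name is hypothetical.

```python
import random


def propose_qtl_number(n_q, p_add, p_del, n_max, rng):
    """Propose NQ+1, NQ-1 or NQ with probabilities p_add, p_del, 1-p_add-p_del.

    Proposals outside [0, n_max] are turned into 'stay' moves (assumption).
    """
    u = rng.random()
    if u < p_add:
        proposal = n_q + 1
    elif u < p_add + p_del:
        proposal = n_q - 1
    else:
        proposal = n_q
    if proposal < 0 or proposal > n_max:
        return n_q
    return proposal
```

A proposed change of NQ would then still have to pass the acceptance step for adding or deleting a QTL described in the text that follows.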

When attempting to add one new QTL, firstly the location and the grandparental QTL states are randomly obtained by sampling from uniform distributions over their possible values. Next, the effects of the QTL are drawn from uniform distributions and the genotypes of F1 and F2 individuals are obtained by equation (4), depending on the haplotypes of markers linked to the QTL and the parental QTL genotypes, as explained in the section ‘Probability distribution for sampling QTL genotype’. The new QTL is accepted with probability

where S*, G* and a* indicate S, G and a with the (NQ+1)th elements added, corresponding to the new QTL.

For deleting an existing QTL, a random choice is made among the existing QTL. The probability for its deletion is

where S*, G* and a* mean S, G and a except that the items corresponding to the qth QTL are replaced by the proposal values.

A simulation study

Simulated data

The proposed method was evaluated empirically by analyzing simulated three-generation families originating from a cross between two F0 grandparents sampled from outbred founder populations. The three-generation families included two F0 grandparents, 40 F1 individuals, consisting of 20 males and 20 females, and 400 F2 individuals in 20 full-sib families produced by crossing each F1 male to a different F1 female, so that the size of each F2 full-sib family was 20. It is possible to actually construct three-generation families of such size in pigs and forest trees, for example.

The simulated genome was composed of two chromosomes each with length 100 cM. We assumed the existence of three QTL, referred to as QTL1, QTL2 and QTL3, respectively. The true positions, QTL states in two grandparents (which means whether each of grandparents is homozygous or heterozygous at the QTL) and allelic effects and interactions are given in Table 2. On each chromosome, there were 11 markers, one located every 10 cM.

Table 2 Parameter values for the QTL states and each allelic effect at three QTL used in the simulation experiments

Two cases, Case I and Case II, were considered with respect to the marker informativeness in F0 grandparents. In Case I, both grandparents were heterozygous and shared no common alleles at any marker. In Case II, we assumed that both grandparents were homozygous for different alleles at all markers (as would be the case for two inbred lines). We simulated 100 data sets and 10 data sets for Case I and Case II, respectively.

Marker genotypes of F1 and F2 individuals were determined by sampling gametes from their parents, taking linkage phase and recombination events into consideration. Observed marker genotypes were allocated by reshuffling the linkage phase; consequently, haplotypes H had to be estimated. The residual variance was set at σe2=1.0. The fixed effects comprised the overall mean (set at 0) and a sex effect, which increased phenotypic values of males by 0.3. Phenotypic records were simulated for F2 individuals as the sum of the allelic effects and interactions at the three QTL, the sex effect and a residual effect (environmental effect).

Models used for the analysis of simulated data

The proposed method was used to estimate the number of QTL, the QTL state, the position and the genetic effects of each QTL, as well as the overall mean, sex effect and residual variance. For comparison with conventional genetic models used for QTL analyses in F2 populations, each data set was also analyzed using two simplified models. In one, it was assumed that the grandparents were homozygous for alternative alleles (S=1); in the other, that they were heterozygous and shared no common alleles (S=4). Hereafter, the model used in our new analysis is referred to as M0, and the simplified models as M1 (S=1) and M4 (S=4).

Estimation procedure for simulated data sets

Firstly, haplotypes H were estimated for all individuals in each of the data sets as described above. MCMC cycles were then started for the estimation of parameters. The initial value of the QTL number was set at two, and the corresponding locations were determined by sampling from a uniform distribution over the whole genome length. For the analyses with model M0, the initial state of each QTL in the two F0 grandparents was assumed to be S=(S1, S2)=(1,1), indicating that the two grandparents were fixed for different alleles at both QTL. The mean of the prior Poisson distribution for the QTL number was set at μ=2 and the predetermined maximum number of QTL was set at NQmax=10. The starting values of the allelic effects and dominance effects of each QTL and of the sex effect were randomly chosen from a uniform distribution on [−1,1]. The initial values of the overall mean and the residual variance were set to the sample mean and the sample variance of the phenotypic values. Prior distributions of all parameters (except NQ, G and F) were assumed to be uniform.
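For reference, the truncated Poisson prior on the QTL number with these settings can be tabulated with a few lines of Python. The support is assumed here to run from 0 to NQmax, which the text does not state explicitly; the function name is hypothetical.

```python
import math


def truncated_poisson_pmf(mu, n_max):
    """Poisson(mu) probabilities renormalized over 0..n_max (assumed support)."""
    weights = [math.exp(-mu) * mu ** k / math.factorial(k)
               for k in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


prior = truncated_poisson_pmf(mu=2.0, n_max=10)
for k, p in enumerate(prior):
    print(k, round(p, 4))
```

With μ=2 the prior places equal mass on one and two QTL, and the truncation at 10 removes only a negligible tail.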

For each data set in Case I, MCMC cycles were repeated 5 × 10⁴ times and the first 10⁴ cycles (burn-in) were not used for estimating the parameter values. Sampling was carried out every 10 cycles to reduce serial correlation so that the total number of samples kept was 4 × 10³. In each data set in Case II, we repeated MCMC cycles 2.5 × 10⁵ times, the first 1.5 × 10⁵ cycles of which were discarded. Sampling was carried out every 20 cycles so that the total number of samples kept was 5 × 10³. These sampling schemes for Case I and Case II were based on the evaluation of the convergence of MCMC cycles using QTL occupancy probability (Heath, 1997; Uimari and Sillanpää, 2001), as described below.
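The retained sample sizes follow directly from the chain length, burn-in and thinning interval, as this small check shows (the helper name is hypothetical):

```python
def retained_samples(total_cycles, burn_in, thin):
    """Number of samples kept after discarding burn-in and thinning."""
    return (total_cycles - burn_in) // thin


print(retained_samples(50_000, 10_000, 10))     # Case I
print(retained_samples(250_000, 150_000, 20))   # Case II
```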

Results

The number of cycles required for convergence of MCMC estimation with model M0 was evaluated for Case I and Case II using one simulated data set per case, on the basis of a plot of cumulative QTL occupancy probabilities (Figure 1) as a function of iteration number, that is, Pr(NQ⩽l∣iteration number=k) (where 1⩽l⩽5, 0⩽k⩽10^5 for Case I and 2⩽l⩽6, 0⩽k⩽3 × 10^5 for Case II), following Heath (1997) and Uimari and Sillanpää (2001). In the analysis of the data set in Case I, the QTL occupancy probabilities became stable after 10^4 cycles, while more than 2 × 10^5 cycles were required in Case II. Therefore, different burn-in periods and sampling points were adopted as described above. Accordingly, the number of data sets simulated was different between Case I (100 data sets) and Case II (10 data sets).
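One way to read the convergence diagnostic is as a running estimate: for each iteration k, the fraction of the first k cycles in which the fitted QTL number was at most l. The Python sketch below implements that reading; it is an illustration under this assumption, not the code used in the study, and `running_occupancy` is a hypothetical name.

```python
def running_occupancy(nq_chain, level):
    """Running estimate of Pr(NQ <= level) after each of the first k cycles.

    nq_chain: sequence of the QTL number recorded at successive MCMC cycles.
    Returns a list whose k-th entry uses cycles 1..k only.
    """
    estimates = []
    hits = 0
    for k, nq in enumerate(nq_chain, start=1):
        if nq <= level:
            hits += 1
        estimates.append(hits / k)
    return estimates


# Toy chain of fitted QTL numbers (hypothetical values)
chain = [2, 3, 3, 4, 3]
for level in (2, 3, 4):
    print(level, running_occupancy(chain, level))
```

Plotting these curves for successive values of l against k gives stacked bands of the kind described for Figure 1; the chain can be regarded as stable once the bands stop drifting.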

Figure 1

Cumulative QTL occupancy probabilities obtained in the analyses of one data set simulated in Case I (a) and in Case II (b). The five areas in (a) and (b), one at the top, three between pairs of adjacent lines and one at the bottom, indicate the probabilities of the number of fitted QTL, as denoted by the numeral in each area.

The posterior distributions of the QTL number NQ are summarized in Table 3 for each of the two cases over all data sets analyzed with the three models M0, M1 and M4. Table 3 lists the mean posterior probability of each value of NQ and the number of analyses in which the posterior mode agreed with the true value (NQ=3). In Case I, three QTL affecting the trait were clearly confirmed in the analyses with M0 and M4: in most data sets, the posterior probabilities were concentrated around the true value of NQ and the posterior modes agreed with it. In Case II, the posterior probabilities of NQ were more dispersed. In the analyses with model M1, the posterior probabilities of NQ differed little between Case I and Case II: since all QTL were assumed to be homozygous for alternative alleles in the grandparents, the difference in marker informativeness between the two cases was not utilized. In some data sets in which QTL1 was undetected by M1, the posterior mode was at NQ=2.

Table 3 Posterior probabilities of the QTL number and its expectation in the analyses of simulated data sets

QTL locations were estimated using the posterior QTL intensity function (Sillanpää and Arjas, 1998, 1999), defined as the relative frequency of the cycles in which a QTL is placed in each small interval of equal size, say 1 cM, along each chromosome. The means of posterior QTL intensities are presented with standard deviations in Figures 2 and 3.
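The intensity function described above can be sketched in a few lines of Python. The input layout (one list of QTL positions per retained cycle, for a single chromosome), the 100 cM chromosome length and the function name `qtl_intensity` are assumptions for illustration only.

```python
def qtl_intensity(position_samples, chrom_length_cm=100.0, bin_cm=1.0):
    """Relative frequency of retained cycles that place a QTL in each bin.

    position_samples: one list per retained MCMC cycle holding the positions
    (in cM) of the QTL fitted on this chromosome in that cycle.
    A cycle is counted at most once per bin.
    """
    n_bins = int(chrom_length_cm / bin_cm)
    counts = [0] * n_bins
    for positions in position_samples:
        # Positions at the far end of the chromosome fall in the last bin.
        occupied = {min(int(p // bin_cm), n_bins - 1) for p in positions}
        for b in occupied:
            counts[b] += 1
    n_cycles = len(position_samples)
    if n_cycles == 0:
        return [0.0] * n_bins
    return [c / n_cycles for c in counts]


# Four hypothetical retained cycles on one chromosome
samples = [[55.2], [55.7, 20.0], [], [54.9]]
intensity = qtl_intensity(samples)
print(intensity[54], intensity[55], intensity[20])
```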

Figure 2

Histograms of the means and the standard errors of posterior QTL intensities on two chromosomes with a bin length of 1 cM over 100 data sets for Case I. The three simulated QTL, indicated by closed triangles, were located at 55 cM on chromosome 1 and at 25 cM and 75 cM on chromosome 2. (a) Mean posterior QTL intensities on chromosome 1, (b) mean posterior QTL intensities on chromosome 2, (c) standard errors of posterior QTL intensities on chromosome 1, (d) standard errors of posterior QTL intensities on chromosome 2.

Figure 3

Histograms of the means and the standard errors of posterior QTL intensities on two chromosomes with a bin length of 1 cM over 10 data sets for Case II. The three simulated QTL, indicated by closed triangles, were located at 55 cM on chromosome 1 and at 25 cM and 75 cM on chromosome 2. (a) Mean posterior QTL intensities on chromosome 1, (b) mean posterior QTL intensities on chromosome 2, (c) standard errors of posterior QTL intensities on chromosome 1, (d) standard errors of posterior QTL intensities on chromosome 2.

The probable regions of QTL were determined with a Bayes factor (Sorensen and Gianola, 2002). The Bayes factor, BF, is defined as the ratio of the posterior odds to the prior odds for two competing hypotheses:

$$\mathrm{BF} = \frac{p(H_1 \mid y)\,/\,p(H_0 \mid y)}{p(H_1)\,/\,p(H_0)}$$

where p(Hi∣y) and p(Hi) are the posterior and prior probabilities of Hi (i=0,1), respectively. For the presence of a QTL in each small interval, H1 means that a QTL exists in the interval and H0 means that it does not. Considering that the total number of small intervals was 200, the prior mean of the QTL number was 2 and the prior distribution of QTL position was uniform, we obtained p(H1)=2/200=1/100 and p(H0)=99/100. The posterior probability p(H1∣y) was taken to be the QTL intensity in the interval and p(H0∣y) was given as 1−p(H1∣y). The intervals for which we obtained BF>2 composed a probable region for QTL, which corresponded to the intervals with QTL intensity >0.0196.

The number of probable regions of QTL depended on the posterior probability of NQ. Thus, in most data sets of Case I, three probable regions of QTL were confirmed at the true positions of the three QTL for models M0 and M4. For data sets in which the analyses with M1 did not detect QTL1, only two probable QTL regions were observed, corresponding to QTL2 and QTL3. When the posterior probabilities for NQ>3 were not negligible (say, more than 0.3), as seen in the analyses of some data sets in Case II with M0 and M4, more than three probable QTL regions were obtained.

For the evaluation of effectiveness in estimating the positions, QTL states, allelic effects and interaction effects of the three simulated QTL, the probable QTL regions around the position of each QTL were combined and assigned to it. The intervals (30 cM, 80 cM) of chromosome 1, and (0 cM, 50 cM) and [50 cM, 100 cM] of chromosome 2, were regarded as the probable regions of QTL1, QTL2 and QTL3, respectively. In the analyses with M0 and M4, all three QTL were detected in all data sets of Case I and Case II. In the analyses with M1, QTL2 and QTL3 were detected in all data sets, while QTL1 was detected in about half of the data sets (51 data sets in Case I and six data sets in Case II).
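The relation between the Bayes factor and the QTL intensity cut-off can be checked numerically. The Python sketch below computes the Bayes factor as posterior odds over prior odds and inverts it to obtain the intensity that corresponds to a given BF; the function names are hypothetical. With prior odds of exactly 1/99 the cut-off for BF=2 evaluates to 2/101, about 0.0198, which is close to the 0.0196 quoted above (the value the same formula gives when the prior odds are rounded to 1/100).

```python
def bayes_factor(intensity, prior_h1):
    """Posterior odds of a QTL in an interval divided by the prior odds.

    intensity: posterior probability p(H1 | y) for the interval.
    prior_h1:  prior probability p(H1) for the interval, strictly in (0, 1).
    """
    if intensity >= 1.0:
        return float("inf")
    posterior_odds = intensity / (1.0 - intensity)
    prior_odds = prior_h1 / (1.0 - prior_h1)
    return posterior_odds / prior_odds


def intensity_threshold(bf, prior_h1):
    """QTL intensity at which the Bayes factor equals bf."""
    odds = bf * prior_h1 / (1.0 - prior_h1)
    return odds / (1.0 + odds)


prior_h1 = 2 / 200  # prior mean of 2 QTL spread uniformly over 200 intervals
print(intensity_threshold(2.0, prior_h1))
print(bayes_factor(0.05, prior_h1))
```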

For Case I, peaks of QTL intensities for QTL2 and QTL3 were observed around the true positions in most data sets (Figure 2b), whereas the analyses with M1 provided relatively low peaks around QTL1 (Figure 2a). For Case II, QTL2 and QTL3 were confirmed around the true positions by the analyses with M0 and M4 (Figure 3b); however, other peaks of QTL intensity were often observed on chromosome 1 (Figure 3a). For Case II, QTL intensities were concentrated in a few intervals, since the update of QTL locations was hampered by the low marker informativeness. The variation in QTL intensity in each interval over all data sets appeared much greater in Case II than in Case I (Figures 2c, d, 3c, d).

The posterior probabilities of QTL states in the grandparents are summarized in Table 4. In Case I, the means of the posterior probabilities of S1, S2 and S3 took their highest values at the true QTL states of the grandparents. In Case II, however, the means of the posterior probabilities showed the highest value at Si=4 (i=1,2,3), meaning that both grandparents were generally estimated to be heterozygous at all QTL; thus, the QTL states of QTL2 and QTL3 were often incorrectly estimated. Table 4 also shows the means of the posterior probabilities that each of the two grandparents was homozygous at each QTL. The posterior probability that the F0 grandfather was homozygous at a QTL was obtained as Pr(S=1)+Pr(S=3), where Pr(S=i) is the posterior probability of S=i for the QTL. Similarly, the posterior probability of the F0 grandmother being homozygous was given by Pr(S=1)+Pr(S=2). These probabilities showed good agreement with the true QTL genotypes of the grandparents in Case I, while those obtained in Case II were low for all QTL, meaning that homozygosity was less accurately estimated (Table 4).
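The marginal homozygosity probabilities are simple sums over the four-state posterior. A small Python sketch follows; the dictionary layout, the function name `homozygosity_probs` and the numerical values are hypothetical.

```python
def homozygosity_probs(state_probs):
    """Marginal posterior probabilities that each F0 grandparent is homozygous.

    state_probs: mapping {1: Pr(S=1), 2: Pr(S=2), 3: Pr(S=3), 4: Pr(S=4)}
    for one QTL, which must sum to one.
    """
    total = sum(state_probs[s] for s in (1, 2, 3, 4))
    if abs(total - 1.0) > 1e-8:
        raise ValueError("state probabilities must sum to 1")
    return {
        "grandfather": state_probs[1] + state_probs[3],  # Pr(S=1) + Pr(S=3)
        "grandmother": state_probs[1] + state_probs[2],  # Pr(S=1) + Pr(S=2)
    }


# Hypothetical posterior for one QTL
posterior = {1: 0.10, 2: 0.60, 3: 0.05, 4: 0.25}
print(homozygosity_probs(posterior))
```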

Table 4 Mean of posterior probabilities of the QTL states

The allelic effects and interaction effects of each QTL were estimated as the posterior expectations given each of the QTL states in the probable QTL regions. Table 5 shows the means of the estimates of QTL allelic effects using model averaging (Ball, 2001; Sillanpää and Corander, 2002), in which the posterior expectations of allelic effects were averaged with weights given by the posterior probabilities of each QTL state. In Case I, the estimates were slightly less than the true effects, except for α1 of QTL3, in the analyses with M0. The estimates of allelic effects and interaction effects were almost unbiased for all three QTL in the analyses with model M4 for Case I. For the analyses with M0 and M4 in Case II, since the two alleles of an F0 grandparent heterozygous at a QTL were not well discriminated owing to the low marker informativeness, α1 and α2 of QTL1 and QTL3, and α3 and α4 of QTL1, were not separated and were estimated incorrectly. As expected, the analyses with model M1 provided biased estimates for QTL1 and QTL3, whose QTL state was S≠1. The estimates of all QTL interaction effects ranged from −0.0515 to 0.0616 with standard deviations from 0.1089 to 0.2600 for Case I, and from −0.0941 to 0.0772 with standard deviations from 0.0063 to 0.1532 for Case II.
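Model averaging over QTL states amounts to a probability-weighted mean of the state-specific posterior expectations. The Python sketch below illustrates this under one simplifying assumption made here: every state reports four allelic effects (α1, α2, α3, α4), with the two alleles of a homozygous grandparent given the same value. The function name and all numbers are hypothetical.

```python
def model_average(state_probs, effects_by_state):
    """Average state-specific posterior expectations over QTL states.

    state_probs:      {state: posterior probability of that state}
    effects_by_state: {state: tuple of posterior expectations given that state}
    States with zero posterior probability are skipped, since no samples
    exist from which their conditional expectations could be estimated.
    """
    visited = [s for s, p in state_probs.items() if p > 0.0]
    n_effects = len(effects_by_state[visited[0]])
    averaged = [0.0] * n_effects
    for s in visited:
        for j, effect in enumerate(effects_by_state[s]):
            averaged[j] += state_probs[s] * effect
    return averaged


# Hypothetical QTL visited only in states S=1 and S=4
probs = {1: 0.7, 2: 0.0, 3: 0.0, 4: 0.3}
effects = {
    1: (0.5, 0.5, -0.5, -0.5),  # both grandparents homozygous
    4: (0.6, 0.4, -0.4, -0.6),  # both grandparents heterozygous
}
print(model_average(probs, effects))
```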

Table 5 Mean of posterior expectations for the QTL allelic effects estimated on the probable QTL region

The means and standard deviations of posterior expectations for general mean (intercept), sex effect and residual variance are shown in Table 6 for Case I and Case II. In both cases, the analyses with models M0, M1 and M4 provided similar estimates for general mean and sex effect. In the analyses with M1, residual variance tended to be overestimated, as it contained the variation of effects of undetected QTL (QTL1). In the analyses with M0 and M4, residual variance was correctly estimated in Case I, while it was underestimated in Case II since a part of residual variance was removed by the fitting of false QTL.

Table 6 Means of posterior expectations for the general mean, sex effect and residual variance

Discussion

F2 families are suitable experimental populations for QTL mapping in outbred organisms as well as inbred organisms. The analyses of outbred F2 families allowed the detection of heterozygosity in the grandparents, as well as genetic differences between them. The Bayesian methods of Sillanpää and Arjas (1999) and Yi and Xu (2001) are applicable to such analyses; however, the number of founder alleles in the F0 grandparents is fixed in their estimation procedures. Although the pedigree analysis methods with a Bayesian approach proposed by Uimari and Sillanpää (2001) and Bink et al (2002) could be used for F2 families, only biallelic QTL were considered in those methods. In this paper, we proposed a more flexible Bayesian method for the QTL analysis of outbred F2 families, in which we estimate whether each of the F0 grandparents is homozygous or heterozygous at each QTL. The great benefit of the proposed method is that one does not need to assume biallelic QTL, while the number of parameters is still kept low. Accurate estimates of the QTL states and QTL effects are valuable information for breeding programs targeting the QTL. Using a model that parameterizes separately all QTL alleles in the F0 grandparents, such as the model of Yi and Xu (2001), it is possible to infer the QTL states in the F0 grandparents by comparing the estimates of allelic effects between QTL alleles. However, the posterior probabilities derived here provide more useful and direct information.

In Case I, where the informativeness of grandparental markers was sufficient to discriminate haplotypes for each of the F0 grandparents, analyses with the proposed method (M0) provided accurate estimates of the QTL number, QTL states, positions and effects of the QTL, as well as the general mean, sex effect and residual variance, in most data sets. A simplified analysis, which assumed that the F0 grandparents were heterozygous at all QTL (M4), showed similar effectiveness in estimation. In Case II, where marker informativeness was too low to discriminate haplotypes in the F0 grandparents, there was a tendency to fit excessive false QTL in order to explain phenotypic variation in M0 and especially in M4. In the analyses with M1, where QTL states were assumed to be fixed at S=1, it is not necessary to distinguish the alleles within an F0 individual, and similar effectiveness in estimation was observed in Case I and Case II. However, QTL1 was often undetected with M1 because of its small mean allelic effect. QTL2 and QTL3, with larger mean allelic effects, were successfully detected in most data sets in the analyses with M1. From the results of the simulation experiments, and taking into account the advantage of obtaining information on QTL states in the F0 grandparents, it could be concluded that the analyses with M0 are preferable to those with M1 and M4 in practical analyses of outbred populations, in which some heterozygosity is anticipated.

The efficiency of the proposed method depends on the number of F0 grandparents of the F2 family. In this paper, we considered the case of F2 families derived from two F0 grandparents; however, the proposed method can be extended to more grandparents. When the number of pairs of F0 grandparents is n, the number of all possible QTL states is 4^n. Thus, as the number of pairs of F0 grandparents increases, the convergence of the estimates in the MCMC procedure would become slower and the efficiency may decrease owing to the large number of possible QTL states. In the case of a large number of F0 founders, some restriction on the number of QTL alleles would be useful, as seen in the assumption of biallelic QTL adopted by Uimari and Sillanpää (2001) and Bink et al (2002). Updating QTL states individual by individual, instead of updating combinations of QTL states in pairs of founder individuals, would be an alternative choice for treating multiple pairs of F0 founders.
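The growth of the state space with the number of founder pairs can be made concrete by enumerating the joint states. The Python sketch below assumes, for illustration, that each pair independently takes one of the four states S=1,...,4; the names used are hypothetical.

```python
from itertools import product

STATES_PER_PAIR = (1, 2, 3, 4)  # the four QTL states of one grandparent pair


def joint_states(n_pairs):
    """All joint QTL states for n_pairs pairs of F0 grandparents (4**n_pairs tuples)."""
    return list(product(STATES_PER_PAIR, repeat=n_pairs))


for n in (1, 2, 3, 5):
    print(n, len(joint_states(n)))
```

Even five pairs give 1024 joint states per QTL, which illustrates why updating states founder by founder, or restricting the number of alleles, becomes attractive as n grows.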

In the QTL mapping of F2 families in outbred plants such as forest trees, it is usual that two genetically diverged individuals are used as F0 grandparents. For the analyses of such F2 families, the proposed method would be suitable. The proposed method would be likewise applicable to the pig F2 families where a small number of F0 grandparents are used.

The marker informativeness of the F0 grandparents strongly influences the accuracy of estimation in the proposed method (M0). As shown in the simulation experiments, we could obtain accurate estimates of the model parameters when all markers were fully informative in the F0 grandparents (Case I), while the QTL number, the QTL states and the QTL effects were poorly estimated when the F0 grandparents were homozygous at all markers (Case II). When the marker informativeness is insufficient to discriminate the haplotypes of the F0 grandparents, the QTL states tend to be estimated as Sq=4 (that is, both F0 grandparents being heterozygous). This is because the number of parameters for allelic effects and dominance effects is maximized for Sq=4, so that a better model fit is attained under insufficient marker informativeness unless Sq is constrained by the prior.

In the proposed Bayesian method, haplotypes were estimated in advance of the MCMC cycles following the method of Knott et al (1996) to save computational time. Haplotypes were reliably estimated here because there was a large number of progeny per parent. When the number of progeny is small, or marker informativeness is low, however, haplotypes might be poorly estimated by this method. Incorrectly estimated haplotypes lead to inaccurate sampling probabilities of QTL genotypes, and consequently to inaccurate estimation of the positions, QTL states and genetic effects of QTL. Nevertheless, QTL states are variable in the estimation process of the proposed method; thus, for a QTL at which the mean effect of the two alleles in an F0 grandparent is large, and which can therefore be detected assuming its QTL state to be S=1, relatively accurate mapping might be possible even with incorrect haplotype attribution.

When updating a QTL position, QTL genotypes for all individuals were updated simultaneously. This is a ‘block-update’ (Uimari and Sillanpää, 2001; Yi and Xu, 2002). Since a change in QTL position changes the probable QTL genotypes in founder individuals and the probable allele transmission to subsequent generations, such a block-update is required. This approach is particularly important since haplotype estimation is excluded from the MCMC cycles, so inference about a position depends on the QTL genotypes. When only a small number of F1 parents are available, some F0 QTL alleles may not be transmitted to the F1 parents. It would then be impossible to accurately evaluate the QTL states of the F0 grandparents. A moderate number of F1 parents, say five pairs, is required for the method to work well.

One of the practical objectives of QTL mapping in F2 families is the utilization of the detected QTL in marker-assisted breeding. The QTL states in founder individuals are then of great importance. The posterior probabilities for the QTL states obtained by the proposed method could therefore be used in the design of breeding programmes for outbred plants and animals.