Introduction

Best linear unbiased prediction (BLUP) of breeding values using the mixed model methodology of Henderson (1984) is the predominant approach for genetic evaluation of animals. Evaluation by BLUP requires that genetic means, variances, and covariances are properly modeled. Where only trait phenotype information is used, genetic group theory under additive inheritance was developed to account for differences in genetic means among genetic groups or populations (eg Thompson, 1979). Elzo (1990) further developed a theory to take account of heterogeneity of variances among genetic groups, and later the theory was extended to account for additive variance due to segregation of alleles among populations with different gene frequencies (Lo et al, 1993).

For situations where trait phenotype and DNA marker information is available in outbred populations, BLUP methods have also been developed for genetic evaluation (eg, Fernando and Grossman, 1989; Goddard, 1992; Hoeschele, 1993; van Arendonk et al, 1994; Saito and Iwaisaki, 1996), as have mixed model approaches for interval mapping of quantitative trait loci (QTLs) (eg, Grignola et al, 1996, 1997; van Arendonk et al, 1998; Saito and Iwaisaki, 2000). In an outbred population in which there is linkage equilibrium between markers and linked QTLs at the population level, the overall genetic mean does not depend on marker information, and it is the information on marker allele transmission within families that is used in genetic evaluation by BLUP. In contrast, in crossbred populations originating from several genetic groups such as breeds or outbred strains, where linkage disequilibrium between markers and QTLs exists, all the genetic means, variances and covariances depend on marker information. A mixed model for such situations was presented for a single marked QTL, in which a grouping strategy that can take into account crossbreeding and the linkage disequilibrium is included (Goddard, 1992). However, Goddard's model does not properly take account of heterogeneous variances among genetic groups or of the segregation variance. For a similar situation, an alternative approach to model genetic means and (co)variances for genetic evaluation was then proposed (Wang et al, 1998), combining the covariance theory for a marked QTL (Wang et al, 1995) and the covariance theory for a multi-breed population of Lo et al (1993).

Very recently, for the case where tightly linked multiple QTLs or a cluster of QTLs are contained in a chromosome segment, a mixed model approach was proposed for QTL analysis in F2 crosses between outbred lines that allows for QTL segregation within lines as well as for differences in mean QTL effects between lines (Perez-Enciso and Varona, 2000). This model was presented as a generalization of the approach by Wang et al (1998), in which the variance for multiple linked QTLs under the use of marker information was derived. In this paper, we work with a crossed population with a general pedigree, not with F2 crosses only, derived from crossing between two outbred genetic groups such as breeds. We derive an expression for the genetic variance due to a chromosome segment in which linked QTLs are contained, especially for the segregation variance, conditional on flanking marker information.

Theory and methods

In this paper, we consider a crossed population derived from two breeds, A and B, within each of which there exists some genetic variation. We assume multiple linked QTLs in a chromosome segment, linkage equilibrium in A and B and additive gene action within and between breeds. Here, for convenience, we use similar notation to Perez-Enciso and Varona (2000). For use of flanking marker information, it is supposed that marker linkage phases in founders and the parental origins of marker haplotypes in non-founders are known.

Assuming that all alleles of all loci have equal effects, the distributions of the effects of A and B origin alleles at the kth locus in individual i (ghAi,k and ghBi,k) are written as

and

with

where h denotes the gametes of paternal and maternal origin (0 and 1), μ is the expected value of the additive effect of linked QTLs in a chromosome segment, q is the number of linked QTLs within the segment, assumed finite, Δ is the difference in the additive effect for linked QTLs in the segment between the two breeds, and σ2A and σ2B are the additive variances for the segment in the two breeds. Then, for a cross derived from A and B, the variance for the additive effects due to linked QTLs (ghi) for one of the paternal and maternal gametes is given as

Following Lo et al (1993) and defining wkk′ that takes values AA, AB, BA, and BB according to breed origin of each allele at two loci k and k′, Perez-Enciso and Varona (2000) obtained:

where Ew represents the expectation for all the combinations of wkk′. Note that Ew[Cov(ghi,k, ghi,k′ | wkk′)] is zero, if k ≠ k′ because we have assumed linkage equilibrium within pure breeds or if wkk′ = AB or BA. As a result, we have:

where phi represents the expectation of the fraction of the segment originating from breed A in gamete h for individual i, given breed and marker information.

The second term in equation (2) represents the segregation variance for linked QTLs. Let the probability that loci k and k′ are of breed origin b and b′, respectively, be phi,bb′ where b,b′ ∈ {A,B}. Expressing the expectations of the additive genetic effects in the cases where the locus k is of breed origin A and B by ak = μk + Δk/2 and βk = μk − Δk/2, respectively, the second term of equation (2) is written as:

Then, with some arrangement and use of phi,AA(1 − phi,AA) = phi,AA (phi,AB + phi,BA + phi,BB) and so on, equation (4) can be written as

Thus, the additive variance can finally be expressed as

In the case of the model with an infinite number of linked loci, assuming that loci are distributed uniformly along the segment, phi can be represented by the expectation of the identity-by-descent proportion (IBDP) between the segment of gamete h for individual i and that of an ancestral gamete of breed A. Thus, we follow the approach of Matsuda and Iwaisaki (1998, 2000) for an outbred population, which uses the concept of IBDP of Guo (1994a,b, 1995) and assumes linkage equilibrium between markers and QTLs, and here consider the current situation where there exists linkage disequilibrium between breeds.

Following Guo's work, let the chromosome region where a cluster of linked QTLs is contained be 0 ≤ t ≤ l, and two flanking markers are located at both edges of the region. The length of the region from the origin at which the nth crossover occurred is denoted by Sn. Since, for the two consecutive regions between an even-numbered and the next (odd-numbered) points and between the latter and the next (even-numbered) points where crossover occurred, the offspring receives two homologous chromosomes (paternal and maternal) of its one parent reversely, one chromosome of the offspring can be represented by a time-continuous, two-state Markov chain c(t) at any point t along the segment that is referred to as the gametogenesis process, as follows

where C is a random variable that takes values 0 and 1 with equal probability, and the time parameter t is the map distance along the chromosome. Since all meioses are independent, all the gametogenesis processes in a pedigree are also independent and stochastically identical. So, using Haldane's (1919) mapping function, the transition probability matrix for c(t) is given as
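
The two-state chain described here can be checked numerically. The following is a minimal sketch, assuming the standard Haldane relation in which the probability that c(t) differs from c(0) at map distance t (in Morgans) is (1 − e^(−2t))/2, and assuming that crossovers form a Poisson process with rate 1 per Morgan (no interference). The function names are illustrative and do not come from the paper.

```python
import math
import random


def haldane_transition(t):
    """Return (r_same, r_diff) for the two-state chain at map distance t (Morgans).

    r_diff is the Haldane recombination fraction, i.e. the probability that
    the chain is in a different state at distance t than at the origin.
    """
    r_diff = 0.5 * (1.0 - math.exp(-2.0 * t))
    return 1.0 - r_diff, r_diff


def simulate_switch(l, rng):
    """Simulate one gametogenesis process on [0, l].

    Returns True when an odd number of crossovers falls in the segment,
    which is exactly the event c(l) != c(0).
    """
    # Assumption: crossover positions form a Poisson process, rate 1 per Morgan.
    position = rng.expovariate(1.0)
    n_crossovers = 0
    while position <= l:
        n_crossovers += 1
        position += rng.expovariate(1.0)
    return n_crossovers % 2 == 1


def empirical_switch_rate(l, n_reps=100_000, seed=1):
    """Monte Carlo estimate of P(c(l) != c(0)) for a segment of length l."""
    rng = random.Random(seed)
    return sum(simulate_switch(l, rng) for _ in range(n_reps)) / n_reps
```

Calling `empirical_switch_rate(0.5)` and comparing it with `haldane_transition(0.5)[1]` gives a quick consistency check between the simulated process and the closed-form off-diagonal transition probability.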

Then, in order to assess the relationships between gametes of base animals of a particular genetic group A and gamete h for a given animal i in a succeeding generation, the m relevant gametogenesis processes v(t) = (c1(t), c2(t), . . ., cm(t)) are considered as the possible pathways of gene transmission from the base animals to gamete h. Given the m processes lying between two gametes, the statement for the joint gametogenesis process that is the possible pathway of gene transmission is common to all the t, and therefore this joint gametogenesis process is a random vector constituting a random walk on an m-dimensional hypercube Zm = {(η1,η2,. . .,ηm):ηi = 0 or 1}. Thus, defining the set of IBD states representing that the gamete h originated from genetic group A by DA as the collection of vertices on Zm and denoting the information on breed origin at two flanking marker loci by v(0) = v0 and v(l) = vl, phi in equations (3) and (5) is obtained as

where representing two joint gametogenesis processes by ηj = (ηj1,. . .,ηji,. . .,ηjm) and ηj′ = (ηj′1,. . .,ηj′i,. . .,ηj′m), |•| stands for Σ|ηji − ηj′i|, and denoting the cardinality of DA by |DA|, events {(v(t) = vx)} for x = 1,2,. . .,|DA| are mutually exclusive. So, for example, the number of joint gametogenesis processes in which no crossover occurred between two flanking markers, with the probability of r00(l), is represented by m − |vl − v0|. Now, the information on breed composition for founders and parentage for non-founders is referred to as breed information (denoted as B). Then, given B and flanking marker information (denoted as MF), the second term in equation (5) can be approximated as
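
The calculation described above can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' implementation: it assumes that the joint process moves between vertices u and v over distance t with probability r01(t)^|u − v| · r00(t)^(m − |u − v|) (independent Haldane chains), and that the expected IBD proportion is the average over the segment of the probability that v(t) lies in the IBD set, given both flanking states. The IBD set must be supplied by the user; the function names and the midpoint-rule integration are choices made for this sketch.

```python
import math
from itertools import product


def r_diff(t):
    """Haldane probability that a single process changes state over distance t."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))


def joint_transition(u, v, t):
    """Probability that the joint process moves from vertex u to vertex v over t."""
    m = len(u)
    d = sum(abs(a - b) for a, b in zip(u, v))  # Hamming distance |u - v|
    r = r_diff(t)
    return (r ** d) * ((1.0 - r) ** (m - d))


def expected_ibd_proportion(d_set, v0, vl, l, n_steps=2000):
    """Average over [0, l] of P(v(t) in d_set | v(0) = v0, v(l) = vl).

    d_set is a collection of hypercube vertices (tuples of 0/1) forming the
    IBD set; the integral over t is approximated with the midpoint rule.
    Requires l > 0 so that the conditioning event has positive probability.
    """
    denom = joint_transition(v0, vl, l)
    h = l / n_steps
    total = 0.0
    for s in range(n_steps):
        t = (s + 0.5) * h
        num = sum(
            joint_transition(v0, x, t) * joint_transition(x, vl, l - t)
            for x in d_set
        )
        total += num / denom
    return total * h / l
```

For a single meiosis (m = 1) with a non-recombinant flanking haplotype, `expected_ibd_proportion([(0,)], (0,), (0,), 0.1)` agrees with the closed form 1/2 + tanh(l)/(2l), and for a recombinant haplotype the result is 1/2 by symmetry. Conditioning on a single marker only would drop the second transition factor and the denominator.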

where δ2 = Δ2/4, and Rhi,AA,BB (Rhi,AB,BA) is the mean of phi,AAphi,BB (phi,ABphi,BA) for the whole segment, whose conditional expectation is given as

which can be obtained using the appropriate IBD sets, DAB and DAB, according to the situations of breed origin at two different loci and denoting two different breeds by b and b′. Note that in the case of E(Rhi,AB,BA | B, MF), the IBD sets become DAB and DBA.

These expressions enable us to compute the variance for an infinite number of linked QTLs using flanking marker data for individuals with general percentages of breed origin A and B. If one of the two flanking markers is uninformative, then transmission can be followed for another informative marker and the variance can be obtained using the information on the informative one with v(0) = v0 or v(l) = vl (denoted as Ms).

Illustration

In this section, using a crossbred population derived from two breeds A and B as given in Figure 1, we numerically investigate the value of the segregation variance for the gamete, originating from individual 3, for individual 4. We compare the three cases of no marker, single marker and flanking marker information available. It is assumed that δ2 = 1.0. The length of the chromosome segment containing multiple linked QTLs (l) is varied within the range of 0.1–0.5 M. For the gamete of concern, the cases considered were: the case where the allele M1 is transmitted, when information on a single marker is used, and the cases where the non-recombinant haplotype M1N1 or the recombinant haplotype M1N4 is transmitted, when data on flanking marker loci are used.

Figure 1

Pedigree plot with breed and marker data for a crossbred population, in which A and B denote two breeds and M and N show flanking markers, representing alleles by figures.

With l = 0.1, the segregation variance for the gamete of concern in the case of no marker information is

with

and

giving the value of 0.1338.

In contrast, the corresponding expression in the case with use of marker information is yielded as

where M* is Ms or MF. For the case of a single marker, we have

and

and therefore the segregation variance is given as 0.0311. For the case where flanking marker information is used, the value becomes 0.0021 with

and

Table 1 presents all the values of the segregation variance for the considered gamete for the changed values of l that are also depicted in Figure 2. It is revealed that the values for the cases of non-recombinant and recombinant types are reversely and extremely different from those for the case with a single marker used. It is also shown that as l becomes large, the value resulting from use of a single marker becomes considerably near to the corresponding value in the case with no marker information, while the values in the case with use of flanking markers of non-recombinant type essentially remain low. These deviations in patterns from the case of no marker data clearly indicate that use of flanking marker data substantially increases the information on the segregation variance, as expected.

Table 1 Comparison of the values of segregation variance for the four cases
Figure 2

Graphical representation of the values of segregation variance, as listed in Table 1, for the selected lengths of the segment. NM, SM, FM-NR, and FM-R denote the cases with use of no marker, single marker, flanking marker (non-recombinant type), and flanking marker (recombinant type), respectively.

In a crossed population between breeds or outbred lines, marker information may be partially informative. Therefore, while it is obvious that two informative flanking markers provide more information to account for the segregation variance than a single marker, when one of the two markers is uninformative, only the information on the informative marker can be used. In such situations, however, use of different informative flanking markers out of multiple linked markers on the chromosome would be a reasonable approach.

Discussion

In this paper, we first considered the situation in which a finite number of linked QTLs are located on a particular chromosome region and then presented an expression for the variance due to the QTLs in a cross originating from two breeds. Using the concept of IBDP given by Guo (1994a,b, 1995), we further investigated the variance in the mean of the conditional probabilities for an infinite number of linked QTLs in a chromosome segment. The variance for the linked QTLs conditional on flanking marker information was modeled. The results obtained can straightforwardly be used in genetic evaluation by BLUP using marker and trait information (Matsuda and Iwaisaki, 2001a). Moreover, in the segment mapping of Perez-Enciso and Varona (2000), the approach presented here could make it possible to utilize information not only on F2 crosses but also on any crossbred individual in the population under study, taking into account the segregation variance appropriately. Also in the segment mapping, as indicated by Grignola et al (1997) for the usual interval mapping, it would be possible to detect multiple clusters of QTLs by conducting the intersection-union test. The effects for QTLs positioned outside the segment or the marker interval can be handled as the effects of the remaining chromosome, as in equation (5) of Perez-Enciso and Varona (2000).

We have allowed the use of flanking markers. Indeed, it is known that in QTL mapping utilizing flanking marker information is more effective than use of only single marker information (eg, Mackinnon et al, 1996). As demonstrated in the numerical investigation, data on flanking marker loci are also informative in taking account of the segregation variance, relative to those on a single marker locus.

Wang et al (1998) presented the theory to properly model means and (co)variances in a multi-breed population, given single marker information, in the presence of gametic disequilibrium between the marker locus and a linked QTL. Compared with the segregation variance in the single QTL situation modeled by them, we note that the current expression for the segregation variance for multiple linked QTLs is of a different form, as would be expected. Our expression for the segregation variance requires the assumption that multiple linked QTLs are in coupling phase, that is, that the differences in the expectations for linked QTLs between the two breeds have the same sign. The variance component due to segregation depends on these differences. Therefore, it is expected that the segregation variance becomes highest and lowest with the coupling and repulsion phases, respectively. When the QTLs are not in coupling phase, the relationship between twice the segregation variance δ2 defined by Lande (1981) and Lo et al (1993) and the difference in the expectation Δ is no longer represented by δ2 = Δ2/4.

Previously, the variation in genetic composition for a finite number of multiple linked genes or a chromosome segment was theoretically investigated by eg, Hill (1993) and Visscher (1996). The theory of this kind was applied to determine whether the detected entity in QTL mapping using inbred lines is a single QTL of relatively large effect or a cluster of multiple QTLs with smaller effects (Visscher and Haley, 1996) and to investigate the power of a chromosome test for detecting the genetic variation on a single chromosome (Visscher and Haley, 1998). Thus, for the crossbred population with general pedigree derived from pure original breeds, the current approach would also be useful to provide information on determining if a genomic region under study contains a single QTL or multiple linked QTLs.

In this study, we assumed that parental linkage phases of flanking markers are known. When phase information is unknown, one way would be to compute the expectation of IBDP for each possible phase, and then to add up the expected IBDPs weighted by the corresponding phase probabilities. Generally, however, such calculations would be computationally very demanding, since there are likely to be many possible linkage phases and parental origins. Hence, Markov chain Monte Carlo approximation would be a method of choice, as used by Grignola et al (1996), Perez-Enciso and Varona (2000) and Perez-Enciso et al (2000). This approach requires one set of assumed linkage phases for markers during the sampling process. Therefore, it seems that use of the current approach assuming known marker linkage phases in each Gibbs iteration could be useful in efficiently computing the mean of the conditional probabilities required. Finally, for populations with complex pedigrees of the current type, it would not be easy to construct the IBD sets required in equations (8) and (10). On this point, we comment that the application of a computing procedure to systematically calculate the genetic covariance matrix whose elements are the required (co)variances and its inverse by recursively constructing the IBD sets (Matsuda and Iwaisaki, 2001b) can be useful.
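
The phase-weighting idea mentioned in the paragraph above (expected IBDP per candidate phase, combined using the phase probabilities) amounts to a probability-weighted average. The following is a minimal sketch under the assumption that the per-phase expected IBDPs and the phase probabilities have already been computed elsewhere; the function name and inputs are hypothetical and are not taken from the paper.

```python
def phase_averaged_ibdp(ibdp_by_phase, phase_probs, tol=1e-9):
    """Combine per-phase expected IBDPs into a single expectation.

    ibdp_by_phase: expected IBDP computed under each candidate linkage phase.
    phase_probs:   probability of each candidate phase (must sum to 1).
    """
    if len(ibdp_by_phase) != len(phase_probs):
        raise ValueError("one probability is needed per candidate phase")
    if abs(sum(phase_probs) - 1.0) > tol:
        raise ValueError("phase probabilities must sum to 1")
    # Law of total expectation over the unknown phase.
    return sum(p * x for p, x in zip(phase_probs, ibdp_by_phase))
```

With many markers and founders the number of candidate phases grows quickly, which is why the text suggests sampling phases by Markov chain Monte Carlo rather than enumerating them as this sketch does.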