Introduction

Rapid progress with molecular marker polymorphisms has made it possible for the first time to map the entire genome of virtually any species (Helentjaris et al., 1986; Stuber, 1992). Because markers are unlimited in number and genomic distribution, this has renewed interest in the use of molecular markers to facilitate the identification of and selection for individual quantitative trait loci (QTLs) that control economically important traits. The steps in marker-assisted selection (MAS) consist of identifying associations between marker alleles and QTLs, or ideally, of estimating the contribution of marker loci to the genotypic value of the trait by the QTLs associated with the markers (MQTL effects), and combining these marker effects with phenotypic information to rank individuals through an index and develop desired lines or populations (Lande & Thompson, 1990; Dudley, 1993; Gimelfarb & Lande, 1994).

MQTL effects can be identified and estimated through linkage disequilibrium (LD) created by crossing two inbred lines or divergent populations (Lande & Thompson, 1990), or through LD created within families in outbred species (Ruane & Colleau, 1995). Recent studies of MAS in breeding have tended to focus on the use of multiple regression of the phenotype on markers as a global method to identify markers linked to QTLs and to estimate marker effects (Lande & Thompson, 1990; Meuwissen & Van Arendonk, 1992; Zhang & Smith, 1992; Gimelfarb & Lande, 1994). Multiple regression has a computational advantage over the maximum likelihood method, while still producing very similar results (Lande & Thompson, 1990; Haley & Knott, 1992; Ruane & Colleau, 1995).

Lande & Thompson (1990; also Lande, 1992) proposed a theory of marker index selection (MIS) that maximizes the rate of genetic improvement under MAS by combining information on genetic marker polymorphism with data on phenotypic variation among individuals. Gimelfarb & Lande (1994) conducted simulation studies to test the theory and showed that the efficiency of MAS depends on several factors, including the total number of markers in the genome, the number of markers contributing to the index, the population size and the heritability of the character. Dudley (1993) discussed issues surrounding MAS and described selection using markers only. Knapp (1994) reviewed the difference between marker-only selection and MIS and described a method of estimating MQTL effects using ANOVA.

Numerous applications of MAS to breeding have been proposed in the literature. MAS can be used to assist in selecting parents, increasing the effectiveness of back-cross breeding, improving sex-limited traits, speeding the development of superior lines and populations by marker-based seedling assays, and increasing the efficiency of selection by eliminating expensive, slow or difficult trait assays (Tanksley et al., 1981; Edwards et al., 1987; Lande & Thompson, 1990; Stuber, 1992; Dudley, 1993; Knapp, 1994). However, MAS usually deals with one-stage single-trait improvement. Lande & Thompson (1990) implied that the efficiency of MAS can be increased through a two-stage selection of immatures (seedlings, embryos), first based on marker loci followed by conventional phenotypic selection, or through marker-based multiple-trait improvement (see also Dudley, 1993). However, the efficiencies of these alternatives relative to conventional selection procedures have not been quantified.

The objective of this study is not to detect MQTL associations or estimate MQTL effects, but to investigate the efficiency of multistage MAS as compared to conventional selection methods, under the assumption that MQTL effects and index parameters are known. We also incorporate the costs associated with measuring phenotypic characters and scoring marker loci into the objective function to maximize the gain per unit cost.

Improvement of a single trait

One-stage selection

Lande & Thompson (1990; also Lande, 1992) described the theory of marker index selection. Let y be a 2×1 vector containing the net molecular score (m), which is the sum of the additive effects on the character associated with the markers for an individual, and the individual's phenotypic value (x), i.e. y=[ m x]T. The two components have relative economic weights of w=[0 1]T. Let P and G denote the phenotypic and genotypic variance–covariance matrices of vector y, respectively. The MAS index proposed by Lande & Thompson (1990) is:

$$I = \mathbf{b}^{T}\mathbf{y} = b_m m + b_x x$$

where b is the vector of index coefficients derived from classical selection index theory; i.e. b=[ bm bx]T=P−1Gw. This index can be used for either individual selection or selection of lines (Lande & Thompson, 1990; Knapp, 1994).
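The coefficients b=P−1Gw can be written out for the 2×1 case. The following is a minimal sketch rather than the authors' code; it assumes that the molecular score is uncorrelated with the residual genetic and environmental deviations, so that Cov(m, x)=Cov(m, g)=σ2m, and the function name is hypothetical.

```python
def mis_index_coefficients(var_m, var_g, var_p):
    """Return (b_m, b_x) = P^-1 G w for y = [m, x] and w = [0, 1].

    Assumed covariance structure (not stated explicitly in the text):
        P   = [[var_m, var_m], [var_m, var_p]]
        G w = [var_m, var_g]
    """
    det = var_m * var_p - var_m * var_m  # determinant of P
    if det <= 0:
        raise ValueError("need 0 < var_m < var_p")
    # Apply the explicit 2x2 inverse of P to the vector G w.
    b_m = (var_p * var_m - var_m * var_g) / det
    b_x = (var_m * var_g - var_m * var_m) / det
    return b_m, b_x


# Illustrative values only: h2 = var_g / var_p = 0.25, p = var_m / var_g = 0.5
b_m, b_x = mis_index_coefficients(var_m=0.125, var_g=0.25, var_p=1.0)
```

Under these assumptions the coefficients reduce to bm=(1−h2)/(1−ph2) and bx=h2(1−p)/(1−ph2), so the weight on the molecular score grows as heritability falls.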

Two-stage selection

Marker selection at an early stage followed by conventional phenotypic selection of surviving adults has received considerable attention (Tanksley et al., 1981; Soller & Beckmann, 1983; Lande & Thompson, 1990). This sort of sequential selection will reduce the total genetic gain as compared to single-stage selection because some outstanding individuals may be culled at the marker selection stage (Xu & Muir, 1991, 1992). Gimelfarb & Lande (1994) stated that selection is more effective if markers contributing to the index are re-evaluated each generation. When the second-stage selection is based on an index including the molecular score, rather than the adult phenotype value alone, the genetic gain is hard to predict (Lande & Thompson, 1990). The problem is that the optimum culling strategy is difficult to find because numerical multiple integration is required (Xu & Muir, 1992).

In this study, Xu & Muir's (1991, 1992) algebra is followed to derive the formulae for two-stage index selection. Selection is on the molecular score at the first stage and on the adult phenotype together with the molecular score at the second stage. The method for constructing indices is to find b so that the correlation between indices (zi) and the aggregate breeding value, H=wTg, is maximum under the constraint of Cov(zi, zj)=0, for i≠j. The constraint ensures the existence of an exact solution for truncation points without resorting to numerical multiple integration (Xu & Muir, 1991, 1992). As pointed out by those authors, the genetic gain will be slightly reduced using this index because the restriction of orthogonality among z=[ z1 … zn]T produces an effect similar to that of a restricted selection index. However, Xu & Muir (1991) also showed that, under certain conditions, the efficiency of transformed culling may greatly exceed that of conventional independent culling because the former incorporates the information from previous stages.

Suppose now that y=[ m x]T is to be selected in two stages, and that the restriction is started at the second stage. Let z=[ z1 z2]T be a 2×1 vector of the updated selection indices defined by Xu & Muir (1992). At the first stage, selection is based on the marker score (m) only, and thus z1= b11 m. At the second stage, selection is based on the character phenotype (x) and the marker score. The index for the second stage has the form of z2= b12 m+ b22 x. In matrix notation, the two-stage index coefficients have the form of:

where σ2m is the additive genetic variance associated with the marker loci, and σ2g and σ2p are the additive genetic and phenotypic variances of the character.
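The orthogonality constraint Cov(z1, z2)=0 can be checked numerically. The sketch below is illustrative only: it assumes Cov(m, x)=σ2m and scales each index to unit variance, which is our choice and not necessarily the scaling used in the source. The function names are hypothetical.

```python
import math


def two_stage_B(var_m, var_p):
    """Columns of B give z1 = b11*m and z2 = b12*m + b22*x.

    z2 is proportional to (x - m), which is uncorrelated with m when
    Cov(m, x) = var_m (assumption). Rows are (m, x), columns are (z1, z2).
    """
    s1 = math.sqrt(var_m)
    s2 = math.sqrt(var_p - var_m)
    return [[1.0 / s1, -1.0 / s2],
            [0.0, 1.0 / s2]]


def index_covariance(B, var_m, var_p):
    """Return B^T P B, the covariance matrix of z = B^T y."""
    P = [[var_m, var_m], [var_m, var_p]]
    PB = [[sum(P[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(B[k][i] * PB[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Illustrative variances: var_m = 0.125, var_p = 1.0
V = index_covariance(two_stage_B(0.125, 1.0), 0.125, 1.0)
```

With this construction V is the identity matrix, so the two indices are uncorrelated and, under normality, the truncation points of the two stages can be set independently.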

Matrix B is the transformation matrix from y to z, i.e. z=BTy. The responses to selection are:

where Δ z1 and Δ z2 are standardized selection intensities for stages one and two, respectively. Δ Gm=Δ z1σm is the correlated response in the average adult phenotype to marker selection on the immatures. Δ Gx has Δ Gm as one component and is the response relating to selection on the character phenotype. The corresponding gain in the aggregate breeding value (Δ HTM) is:

The efficiency of this two-stage MIS relative to conventional phenotypic selection on the adult phenotype is:

where Δ HP=Δ zp h2σp is the aggregate genetic gain from phenotypic selection, h2 is the heritability of the character, p is the proportion of the additive genetic variance in the character that is associated with the marker loci (Neimann-Sorensen & Robertson, 1961; Lande & Thompson, 1990) and Δ zi is the selection intensity of the ith stage.
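The relative efficiency can be explored numerically. The sketch below is not the authors' code. It assumes truncation selection on normally distributed, uncorrelated indices, takes the first-stage response as Δz1σm and the second-stage response as Δz2(σ2g−σ2m)/(σ2p−σ2m)1/2 (the response to selecting on x−m, which is uncorrelated with m), and divides by ΔHP=Δzp h2σp. The function names are hypothetical.

```python
from statistics import NormalDist

_ND = NormalDist()


def intensity(q):
    """Standardized selection intensity for truncation selection of a proportion q."""
    if q >= 1.0:
        return 0.0  # nothing is culled, so there is no selection differential
    return _ND.pdf(_ND.inv_cdf(1.0 - q)) / q


def rel_eff_two_stage_vs_phenotypic(h2, p, q1, q2):
    """Two-stage MIS gain over phenotypic-selection gain at final proportion q1*q2.

    Requires 0 < h2, 0 <= p <= 1 and p*h2 < 1.
    """
    h = h2 ** 0.5
    stage1 = intensity(q1) * p ** 0.5 / h                   # marker score
    stage2 = intensity(q2) * (1.0 - p) / (1.0 - p * h2) ** 0.5  # x - m
    return (stage1 + stage2) / intensity(q1 * q2)


# Illustrative allocation: 30% kept on markers, then 20% of those (6% overall)
low_h2 = rel_eff_two_stage_vs_phenotypic(h2=0.1, p=0.5, q1=0.3, q2=0.2)
full_h2 = rel_eff_two_stage_vs_phenotypic(h2=1.0, p=0.5, q1=0.3, q2=0.2)
```

The split between q1 and q2 used here is arbitrary; the ratio depends on how the final proportion is allocated between the two stages.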

The efficiency of two-stage MIS relative to conventional phenotypic selection with the same final proportion of 6.0% selected is depicted in Fig. 1a as a function of p for various values of h2. When h2=1.0, the two-stage MIS has no advantage over phenotypic selection in terms of aggregate breeding values. The relative efficiency of MIS can be very large for a character with low heritability if a large proportion of the additive genetic variance is associated with the markers. The relative efficiency increases as p increases and h2 decreases.

Fig. 1 Efficiency of two-stage marker index selection (MIS) in the improvement of a single character, relative to conventional phenotypic selection (a) and to marker-only selection (b). The efficiency is plotted as a function of p, the proportion of additive variance associated with the marker loci, and for various h2 values.

The predicted aggregate genetic gain from marker-based selection on m is Δ HM=Δ zmσm. The efficiency with respect to the aggregate genetic gain of two-stage MIS relative to marker-only selection can be expressed as:

Marker-only selection is more effective than phenotypic selection when σ2m/σ2g> h2 (Smith, 1967; Dudley, 1993), but it is never superior to MIS (Knapp, 1994). Results in this study indicate that two-stage MIS with the same final proportion of 6.0% selected is more efficient than marker-only selection (Fig. 1b). The difference in efficiency decreases as p increases and h2 decreases. When p=1, the two methods are equivalent. Knapp (1994) suggested that MIS should always be used when index parameters can be estimated; marker-only selection must be used when phenotypic index coefficients cannot be estimated, such as in unreplicated progenies. When p>0.5, marker-only selection is nearly as efficient as MIS. Thus, marker-only selection can be used to eliminate expensive, slow or difficult phenotypic trait assays, particularly when h2<0.5 and p>0.5 (Fig. 1b).
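The comparisons with marker-only selection can be sketched in the same way. The code below is an illustration under our own assumptions (truncation selection on normally distributed, uncorrelated indices; ΔHM=Δzmσm; second-stage response equal to that of selecting on x−m); it is not the authors' implementation and the function names are hypothetical.

```python
from statistics import NormalDist

_ND = NormalDist()


def intensity(q):
    """Standardized selection intensity for truncation selection of a proportion q."""
    if q >= 1.0:
        return 0.0  # no culling at this stage
    return _ND.pdf(_ND.inv_cdf(1.0 - q)) / q


def marker_only_vs_phenotypic(h2, p):
    """Gain ratio at equal intensity: sqrt(p)/h, which exceeds 1 when p > h2."""
    return (p / h2) ** 0.5


def two_stage_vs_marker_only(h2, p, q1, q2):
    """Two-stage MIS gain over marker-only gain at final proportion q1*q2.

    Requires 0 < p <= 1 and p*h2 < 1.
    """
    extra = (1.0 - p) * h2 ** 0.5 / (p ** 0.5 * (1.0 - p * h2) ** 0.5)
    return (intensity(q1) + intensity(q2) * extra) / intensity(q1 * q2)


# When p = 1 and all culling is done at the marker stage, the methods coincide.
same = two_stage_vs_marker_only(h2=0.3, p=1.0, q1=0.06, q2=1.0)
```

The value of the second ratio depends on how the final proportion is split between stages; the figures in the text refer to a fixed final proportion of 6.0%.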

Optimization with respect to economic gain per unit cost

The total genetic improvement in the average adult phenotype in a two-stage selection scheme is somewhat reduced as compared to that of single-stage selection. However, two-stage selection is justified by the cost savings associated with measuring traits because not all individuals need to be recorded for all traits (Cunningham, 1975; Xu & Muir, 1991, 1992). The procedure may be modified to obtain a maximum Δ H per unit cost (Namkoong, 1970; Xu & Muir, 1992). Namkoong (1970) proposed a linear cost function for all individuals in a two-stage selection procedure. The linear cost function per individual (c) for a two-stage selection can be expressed as:

$$c = c_1 + q_1 c_2$$

where c1 and c2 are the costs of obtaining measurements on the marker loci and on the character phenotype, respectively, and q1 is the proportion selected at the first stage. The quantity to be maximized is:

$$Q = \Delta H_{TM} / c$$

subject to the constraint q= q1 q2, where q is a predetermined total proportion selected. Because numerical integration is not required, the solution can be easily obtained using the Newton–Raphson iterative equation system (Xu & Muir, 1992).
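The allocation problem can be illustrated with a short numerical search. The source solves it with Newton–Raphson iteration; the sketch below deliberately substitutes a simple grid search over q1, which is easier to follow but is not the authors' procedure. It assumes a cost per individual of c1+q1c2, gain expressed in units of σg, and truncation selection on normally distributed, uncorrelated indices. All function names are hypothetical.

```python
from statistics import NormalDist

_ND = NormalDist()


def intensity(q):
    """Standardized selection intensity for truncation selection of a proportion q."""
    if q >= 1.0:
        return 0.0
    return _ND.pdf(_ND.inv_cdf(1.0 - q)) / q


def gain_two_stage(h2, p, q1, q2):
    """Two-stage MIS gain in units of the additive genetic s.d. (assumed form)."""
    stage1 = intensity(q1) * p ** 0.5
    stage2 = intensity(q2) * (1.0 - p) * h2 ** 0.5 / (1.0 - p * h2) ** 0.5
    return stage1 + stage2


def best_first_stage_proportion(h2, p, q, c1, c2, steps=2000):
    """Grid search for the q1 in (q, 1) that maximizes gain per unit cost."""
    best_q1, best_Q = None, float("-inf")
    for k in range(1, steps):
        q1 = q + (1.0 - q) * k / steps  # keeps q < q1 < 1
        q2 = q / q1                     # enforces q = q1 * q2
        Q = gain_two_stage(h2, p, q1, q2) / (c1 + q1 * c2)
        if Q > best_Q:
            best_q1, best_Q = q1, Q
    return best_q1, best_Q


# Illustrative inputs: 6% selected overall, equal unit costs
q1_opt, Q_opt = best_first_stage_proportion(h2=0.1, p=0.5, q=0.06, c1=1.0, c2=1.0)
```

Raising c2 relative to c1 shifts the optimum towards heavier culling at the marker stage, because each individual carried into the second stage becomes more expensive.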

The relative efficiency with respect to the gain per unit cost of two-stage MIS to the conventional phenotypic selection is:

where r= c2/ c1 is the cost ratio. Under a predetermined proportion selected (q), Q can be maximized by optimum allocation of Δ z1 and Δ z2. The efficiency of two-stage MIS relative to phenotypic selection depends on h2, p, r and qi (or Δ zi).
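A sketch of how the cost ratio enters this comparison follows. It rests on assumptions of ours: phenotypic selection costs c2 per individual, two-stage MIS costs c1+q1c2, and the gains take the forms used for the single-trait two-stage index (second-stage selection on x−m). Under these assumptions the ratio of gains is multiplied by r/(1+q1r). The function names are hypothetical.

```python
from statistics import NormalDist

_ND = NormalDist()


def intensity(q):
    """Standardized selection intensity for truncation selection of a proportion q."""
    if q >= 1.0:
        return 0.0
    return _ND.pdf(_ND.inv_cdf(1.0 - q)) / q


def gain_ratio(h2, p, q1, q2):
    """Two-stage MIS gain over phenotypic-selection gain (assumed forms)."""
    h = h2 ** 0.5
    stage1 = intensity(q1) * p ** 0.5 / h
    stage2 = intensity(q2) * (1.0 - p) / (1.0 - p * h2) ** 0.5
    return (stage1 + stage2) / intensity(q1 * q2)


def gain_per_cost_ratio(h2, p, q1, q2, r):
    """(gain/cost for two-stage MIS) / (gain/cost for phenotypic selection).

    Cost ratio term: c2 / (c1 + q1*c2) = r / (1 + q1*r), with r = c2/c1.
    """
    return gain_ratio(h2, p, q1, q2) * r / (1.0 + q1 * r)


# Illustrative allocation (30% then 20%), not an optimized one
at_r1 = gain_per_cost_ratio(h2=0.3, p=0.5, q1=0.3, q2=0.2, r=1.0)
at_r5 = gain_per_cost_ratio(h2=0.3, p=0.5, q1=0.3, q2=0.2, r=5.0)
```

Because r/(1+q1r) increases with r, the comparison moves in favour of two-stage MIS as phenotyping becomes more expensive relative to marker scoring.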

Under the same final proportion selected, the efficiency of two-stage MIS in terms of gain per unit cost relative to conventional phenotypic selection is plotted in Fig. 2 as a function of p and r for h2=0.1 and h2=0.3. Under the assumption that a molecular marker assay is more expensive than obtaining measurements on the character phenotype (i.e. r≤1), the two-stage MIS for the improvement of a single trait is less efficient than conventional phenotypic selection when h2=0.3. The efficiency increases as h2 decreases and r and p increase. It will be shown later that single-stage MIS with respect to the gain per unit cost is inferior to phenotypic selection because the former is never superior to two-stage MIS.

Fig. 2 Efficiency of two-stage marker index selection (MIS) with respect to the gain per unit cost (Δ H/ c) in the improvement of a single character, relative to conventional phenotypic selection. The efficiency is plotted as a function of p, the proportion of additive variance associated with the marker loci, and r, the ratio of the costs for measuring the phenotype and marker loci, and for heritability values of h2=0.1 (above) and h2=0.3 (below).

The efficiency of two-stage MIS in terms of the gain per unit cost relative to one-stage MIS is:

This relative efficiency is plotted in Fig. 3 as a function of p and r for h2=0.1 and h2=0.3 under the same final proportion of 6.0% selected. The two-stage MIS is always superior to one-stage MIS unless r=0. The efficiency increases as p and r increase but is weakly affected by h2.
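The comparison with one-stage MIS can be sketched under stated assumptions. One-stage MIS is assumed to cost c1+c2 per individual, because every individual is scored for markers and phenotype, and to give a gain of Δz(σ2m+(σ2g−σ2m)2/(σ2p−σ2m))1/2, which follows from b=P−1Gw when Cov(m, x)=σ2m. Two-stage MIS is assumed to cost c1+q1c2. The code is illustrative and the function names are hypothetical.

```python
from statistics import NormalDist

_ND = NormalDist()


def intensity(q):
    """Standardized selection intensity for truncation selection of a proportion q."""
    if q >= 1.0:
        return 0.0
    return _ND.pdf(_ND.inv_cdf(1.0 - q)) / q


def two_vs_one_stage_per_cost(h2, p, q1, q2, r):
    """(gain/cost for two-stage MIS) / (gain/cost for one-stage MIS).

    Gains are in units of the additive genetic s.d.; r = c2/c1.
    """
    a = p ** 0.5                                             # marker component
    b = (1.0 - p) * h2 ** 0.5 / (1.0 - p * h2) ** 0.5        # x - m component
    gain_two = intensity(q1) * a + intensity(q2) * b
    gain_one = intensity(q1 * q2) * (a * a + b * b) ** 0.5
    return (gain_two / gain_one) * (1.0 + r) / (1.0 + q1 * r)


# Illustrative allocation (30% then 20%), not an optimized one
no_cost_saving = two_vs_one_stage_per_cost(h2=0.1, p=0.5, q1=0.3, q2=0.2, r=0.0)
with_saving = two_vs_one_stage_per_cost(h2=0.1, p=0.5, q1=0.3, q2=0.2, r=2.0)
```

At r=0 the ratio reduces to the ratio of gains, which cannot exceed one; the advantage of two-stage selection comes entirely from the factor (1+r)/(1+q1r).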

Fig. 3 Efficiency of two-stage marker index selection (MIS) with respect to the gain per unit cost (Δ H/ c) in the improvement of a single character, relative to one-stage MIS. The efficiency is plotted as a function of p, the proportion of additive variance associated with the marker loci, and r, the ratio of the costs for measuring phenotype and marker loci, and for heritability values of h2=0.1 (above) and h2=0.3 (below).

Improvement of multiple traits

One-stage selection

MAS in most cases deals with the improvement of a single trait. When marker effects are significant for more than one trait, MAS on multiple traits is expected to be more efficient in a multivariate context than in a univariate analysis of total economic value alone (Lande & Thompson, 1990; Dudley, 1993; Jiang & Zeng, 1995). This is because marker information can be simultaneously used to provide the MQTL effects for multiple traits. Extension of single-trait to multiple-trait MIS is straightforward. Lande & Thompson (1990) gave the general formula for the multiple-trait MAS index as I=bmTm+bxTx where:

Note that x is a vector of quantitative traits with phenotypic and additive genetic variance–covariance matrices P and G. Vector m corresponds to the molecular scores obtained by summing the vectors of effects on the traits produced by associated molecular marker loci. M is the covariance matrix between the breeding values and MQTL effects. The vector of relative economic weights for quantitative traits is w, and that for the molecular markers is 0. Lande & Thompson (1990) showed that the relative weights on the quantitative characters differ from those under purely phenotypic selection, and that the relative weights on the molecular scores are not proportional to simple economic values of the corresponding characters.
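The coefficient vectors follow from the classical index equations applied to the stacked vector [m x]. The algebra below is a sketch under assumptions that are ours: Var(m)=Cov(m, x)=M, in parallel with the single-trait case, and M non-singular.

```latex
\begin{aligned}
\begin{bmatrix} \mathbf{M} & \mathbf{M} \\ \mathbf{M} & \mathbf{P} \end{bmatrix}
\begin{bmatrix} \mathbf{b}_m \\ \mathbf{b}_x \end{bmatrix}
&= \begin{bmatrix} \mathbf{M} & \mathbf{M} \\ \mathbf{M} & \mathbf{G} \end{bmatrix}
\begin{bmatrix} \mathbf{0} \\ \mathbf{w} \end{bmatrix}
= \begin{bmatrix} \mathbf{M}\mathbf{w} \\ \mathbf{G}\mathbf{w} \end{bmatrix} \\
\mathbf{b}_m + \mathbf{b}_x &= \mathbf{w}
  && \text{(first block row, using } \mathbf{M}^{-1}\text{)} \\
(\mathbf{P}-\mathbf{M})\,\mathbf{b}_x &= (\mathbf{G}-\mathbf{M})\,\mathbf{w}
  && \text{(second block row minus the first)} \\
\mathbf{b}_x &= (\mathbf{P}-\mathbf{M})^{-1}(\mathbf{G}-\mathbf{M})\,\mathbf{w},
\qquad \mathbf{b}_m = \mathbf{w} - \mathbf{b}_x
\end{aligned}
```

In this form the weights on the phenotypes depend on P−M and G−M rather than on P and G, which is why they differ from the weights under purely phenotypic index selection.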

The relative efficiency of multiple-trait MIS over conventional index selection without marker information is:

where Δ zI and Δ zp are the standardized selection intensities for MIS and phenotypic index selection, respectively, the numerator is the gain from MIS, and the denominator is the gain from phenotypic index selection.
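For two traits the ratio can be evaluated with explicit 2×2 algebra. The sketch below is illustrative, not the authors' code. It assumes equal selection intensities for the two procedures, symmetric matrices, Var(m)=Cov(m, x)=M, a phenotypic-index gain proportional to (wTGP−1Gw)1/2 and an MIS gain proportional to (wTMw+wT(G−M)(P−M)−1(G−M)w)1/2. The function names are hypothetical.

```python
import math


def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]


def mis_vs_index_efficiency(P, G, M, w):
    """One-stage multiple-trait MIS gain over phenotypic index gain."""
    Gw = matvec(G, w)
    var_index = dot(Gw, matvec(inv2(P), Gw))        # w'G P^-1 G w
    GMw = matvec(sub(G, M), w)
    var_mis = dot(w, matvec(M, w)) + dot(GMw, matvec(inv2(sub(P, M)), GMw))
    return math.sqrt(var_mis / var_index)


# Illustrative case: two uncorrelated traits, h2 = 0.25, p = 0.5, equal weights
P = [[1.0, 0.0], [0.0, 1.0]]
G = [[0.25, 0.0], [0.0, 0.25]]
M = [[0.125, 0.0], [0.0, 0.125]]
eff = mis_vs_index_efficiency(P, G, M, [1.0, 1.0])
```

When M is a zero matrix the markers carry no information and the ratio equals one; for uncorrelated traits with equal parameters it matches the single-trait expression (p/h2+(1−p)2/(1−ph2))1/2.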

Two-stage selection

Multiple-trait MIS can be readily extended to the case of two-stage MIS, with the MQTL effects selected at the first stage and the phenotypic characters, together with the MQTL effects, selected at the second stage. Under the constraint that the indices at different stages are independent (Xu & Muir, 1991, 1992), the vector of index coefficients in a two-stage selection that maximizes the rate of genetic improvements is given by:

(see Appendix A). The response to selection on this index is:

where Δ z1 and Δ z2 are the selection intensities for stages one and two.

The relative efficiency of two-stage MIS with respect to the gain per unit cost compared to phenotypic index selection is:

where r is the ratio of the cost of obtaining measurements on all phenotypic characters to the cost of scoring marker loci, Δ HTM=wT[M1/2Δ z1+ (P−M)−1/2(G−M)Δ z2] is the aggregate genetic gain from two-stage MIS, and Δ HP=(wTGP−1Gw)1/2Δ zp is the gain from phenotypic index selection.

The selection indices developed in this way are readily extended to more than two-stage selection (Appendix A). Because numerical integration is not involved in the calculation, the procedure can be used for any number of traits and for any reasonable number of stages without concerns about the amount of computation time required.

Numerical example

Two examples are given to illustrate multistage MIS for the improvement of multiple traits as compared with conventional index selection, marker-only selection and one-stage MIS with the same final proportion selected. Comparisons are made on the relative efficiencies of aggregate genetic gains (Δ H) or the gain per unit cost (Q=Δ H/ c) for the different selection procedures. Parameters investigated are heritability (h2), proportion of additive genetic variance associated with marker loci (p), genotypic (rg) and phenotypic (rp) correlations, the cost ratio (r) of measuring phenotypic traits to marker loci, and economic weights (w). Because of the large number of possible combinations with four-trait selection, comparisons are limited to the cases of equal h2, rg or rp, p, and r with several levels for each parameter and w=1 for each factor (Appendix B).

In two-stage MIS, MQTL effects are selected at the first stage, followed by MQTL effects plus phenotypic traits selected at the second stage. The results show that the relative efficiency with respect to the aggregate genetic gain of one- (ROM/P) and two-stage (RTM/P) MIS to conventional index selection for the improvement of two traits in all cases tested ranges from 100 to 278% and from 95 to 268%, respectively (Table 1). In general, two-stage MIS has relatively smaller aggregate genetic gains than one-stage MIS because of early culling. In all the cases, MIS is notably superior to marker-only selection.

Table 1 Efficiency of single (Δ HOM) or two-stage (Δ HTM) marker index selection with respect to genetic gain for the improvement of two traits relative to phenotypic (ROM/P or RTM/P) and to marker-only (ROM/M or RTM/M) selection

The efficiency with respect to the gain per unit cost of two-stage MIS to conventional index selection (QTM/P) for the improvement of two traits ranges from 52 to 301% (Table 2). The inferiority of MIS to conventional index selection occurs in the cases where r≤0.5 (for two traits together, r≤1.5). When r=2 (two traits together), MIS is superior to the phenotypic index selection. In addition, two-stage MIS in all the cases tested is notably superior to both one-stage MIS (QTM/OM) and marker-only selection (QTM/M).

Table 2 Efficiency of two-stage marker index selection in terms of the gain per unit cost (QTM) for the improvement of two traits relative to phenotypic (QTM/P), marker-only (QTM/M), and one-stage marker index (QTM/OM) selection

In four-stage MIS for the genetic improvement of four traits, MQTL effects are selected at the first stage, followed by MQTL effects plus two, three and four phenotypic traits selected at the second, the third and the fourth stage, respectively. Results show that the efficiency of MIS to conventional index selection is even larger (Table 3). The four-stage MIS is superior to phenotypic index selection when r≥0.8 (four traits together). The relative efficiency of MIS to the conventional index selection ranges from 54 to 506%. In these cases, the reduced gain in the four-stage MIS is offset by the cost savings arising from early culling.

Table 3 Efficiency of four-stage marker index selection in terms of the gain per unit cost (QMI) for the improvement of four traits relative to phenotypic index (QMI/P), marker-only (QMI/M), and one-stage marker index (QMI/OM) selection

Discussion

With the advent of new molecular technology, large numbers of marker loci can be determined in virtually any species. Researchers can now routinely score large numbers of such polymorphic loci on many individuals in a population and determine the architecture of quantitative traits with unprecedented precision (Stuber, 1992; Dudley, 1993). However, few breeders appear to adopt MAS in applied breeding programmes. The main impediment is cost (Lande, 1992; Strauss et al., 1992; Dudley, 1993). Despite numerous improvements made in the last decade, scoring molecular markers is more expensive than obtaining measurements on phenotypic characters (Lande, 1992). The application of MAS to applied breeding programmes depends on the relative cost and the expected return compared to conventional breeding methods. In this study, we incorporated the costs associated with collection of data into the objective function to maximize the gain per unit cost. The results of MAS with respect to the gain per unit cost, as compared with conventional phenotypic selection, are discouraging unless new progress is made in further reducing the costs associated with molecular marker assays.

Throughout this paper, a number of assumptions have been made. First, a large sample size is assumed in the calculation of index parameters. The significance of increasing sample size on MAS is twofold: it increases the proportion of the additive genetic variance in a character likely to be detected through MQTL associations; and it reduces sampling error (Lande & Thompson, 1990). Secondly, MQTL effects and index parameters are assumed to be known. We are focusing on evaluating the relative efficiency of multistage MIS for the improvement of multiple traits with respect to aggregate genetic gain (Δ H) or the gain per unit cost (Δ H/ c). Thirdly, a number of factors that may favour MAS have not been included in the comparison. These factors include pollen control coefficients, selection cycle length and relative selection intensity (Fehr, 1987; Dudley, 1993; Edwards & Page, 1994).

An optimum multistage index selection can be performed even if the index parameters are estimated simultaneously. Consider a two-stage MAS for a single trait: selection is on the markers at the first stage and on the adult phenotype together with the markers at the second stage. Essentially, the selection criterion at the first stage is based on the probability of allelic transmission at QTLs plus estimated breeding values of the polygenes. What the markers can do is to help us guess better which QTL allele has been transmitted to the offspring. The estimated breeding values of individuals surviving the initial cullings will be updated when the phenotypic records are available. Fernando & Grossman's (1989) mixed-model equations can be directly applied because the preliminary cullings can be incorporated into best linear unbiased prediction under the selection model (Xie & Xu, 1996). By measuring both the genetic gain and the cost incurred, an optimum selection index can be constructed.

Marker loci are expected to be significantly associated with many phenotypic characters (Edwards et al., 1987; Stuber et al., 1987; Stuber, 1992). In such a case, simultaneous improvement of several traits using a marker-based selection index can be achieved (Dudley, 1993; Jiang & Zeng, 1995). Alternatively, an index value, based on phenotypic characters and MQTL effects, is desirable (Lande & Thompson, 1990). For simultaneous improvement of several traits, deterministic analyses in this study indicate that multiple-trait MIS can be used to increase substantially the aggregate breeding value in quantitative characters. The potential efficiency of multiple-trait MIS relative to conventional index selection in terms of aggregate breeding values depends on heritability (h2), proportion of marker-associated additive genetic variance (p), genotypic (rg) and phenotypic (rp) correlations, and economic weights (w). The multiple-trait MIS is expected to be more effective than conventional phenotypic selection and marker-only selection. Unfortunately, it is impossible to estimate marker and phenotypic index parameters in unreplicated progeny tests. In these cases, marker-only selection can be used because it is nearly as efficient as MIS when p>0.5 for traits with low heritability (Knapp, 1994).

It should be emphasized that a two-stage selection procedure or independent culling does not increase the total aggregate genetic gain per se; rather, it increases the gain per unit cost. The results of this study show that the two-stage MIS with respect to the gain per unit cost is inferior to conventional phenotypic selection when h2≥0.3 and the cost ratio (r) of obtaining measurements on phenotypic characters to scoring marker loci is smaller than unity. The efficiency of multistage MIS relative to conventional selection increases as r increases and h2 decreases. In some species with a long life cycle or a large body volume such as beef cattle and trees, it is quite possible that r>1.0, i.e. obtaining measurements on phenotypic characters is more expensive than scoring marker loci. In these situations, multistage MIS with respect to the gain per unit cost would be expected to be more effective than conventional index selection. In view of this, the initial applications of MAS in commercial breeding will be in those situations where MAS is likely to provide additional advantages, such as in the case of sex-limited traits, for characters that are difficult or expensive to record, or in species where individuals have high value.