Introduction

Many efforts have been devoted in the last decade to increase the efficiency of marker analysis of quantitative traits. Among the most effective approaches are interval analysis (Lander & Botstein, 1989; Knott & Haley, 1992), selective sampling (Lebowitz et al., 1987; Darvasi & Soller, 1992), replicated progeny testing (Soller & Beckmann, 1990) and sequential experimentation (Motro & Soller, 1993). Especially encouraging are the recent successful attempts to improve the efficiency of QTL mapping by taking into account simultaneous segregation at many genomic segments affecting the trait in question (Jansen & Stam, 1994; Zeng, 1994; Jansen, 1996). A complementary situation, when one QTL (or a chromosome segment) affects several traits simultaneously, can also be considered, resulting in increased resolution power (Korol et al., 1987, 1994, 1995; Preygel & Korol, 1989; Jiang & Zeng, 1995; Ronin et al., 1995). Such analysis may be of major importance in formulating marker-assisted breeding strategies, dissecting heterosis as a multilocus multitrait phenomenon, obtaining unbiased parameter estimates of QTL effects in selective genotyping of correlated traits, developing an optimized programme for evaluation and conservation of genetic resources, revealing the genetic architecture of fitness systems in natural populations, etc. The combination of the multi-interval and multitrait mapping strategies may help to cope with a difficult problem arising when the chromosome under consideration contains several QTLs (e.g. Jiang & Zeng, 1995). It is known that if one attempts to fit a single-locus model to such a case, a ghost QTL can be found in an interval which has no effect on the trait (Knott & Haley, 1992; Martinez & Curnow, 1992). It is especially difficult to recognize situations with trans effects of linked QTLs (Haley & Knott, 1992; Luo & Kearsey, 1992).

Employing Monte Carlo simulations, we demonstrate here the advantages of multitrait analysis in detection of linked QTL effects within the framework of mixture mapping models. In spite of an increased number of parameters to be estimated, compared to the single-trait formulation, the proposed method allows for an improvement of detection power and estimation accuracy of linked QTLs. The results are illustrated by examples based on data of the North American Barley Genome Mapping Project.

Bivariate mixture model with two linked QTLs

Formulation of the model

Consider a situation of two linked loci, A/a and B/b, residing in two marker intervals, M11/m11−M12/m12 and M21/m21−M22/m22, and affecting correlated quantitative traits, x and y. We will demonstrate that joint treatment of correlated traits may provide a better power of detection and higher precision of parameter estimation for linked QTLs than the usual single-trait analysis. The consideration will be confined to a dihaploid or backcross situation, although other types of mapping populations can be treated in a similar way. We assume additive gene effects across loci and traits but this restriction can easily be omitted. Generally, our analysis is free of the standard simplifying assumption of equal variances (covariances) in the QTL groups. As was shown elsewhere (Korol et al., 1996), this last assumption can reduce the resolution power of the QTL mapping analysis. This may confer certain limits to the employment of the usual regression mapping models based on the ‘no variance effect’ assumption.

For the case of two linked QTLs in a dihaploid mapping population, four bivariate density distribution functions should be specified, for each allelic combination at the linked QTLs, fAB(x,y), fAb(x,y), faB(x,y) and faa(x,y). The expected mean values of the traits and variance–covariance matrices will be denoted by μuAB, μuAb, μuaB and μuab (u=x or y), and ΣAB, ΣAb, ΣaB, Σab, respectively. Each matrix Σi specifies the residual variances and covariance of the pair x, y: Σ2ix, Σ2iy and COVixy=Riσ2ixσ2iy, caused by segregation of genes from other chromosomes and nongenetic factors (e.g. environmental heterogeneity). Based on marker scores and measurements of the traits x and y, we should contrast, for the considered chromosome, the hypotheses that one (H1) or two (H2) intervals affect the observed variation of x and y and compare both with H0 (‘no effect of the chromosome tested’). Clearly, H1 is a complex hypothesis: for any pair of intervals one can assume that either the first or the second interval has no effect on the traits of interest. Actually, the situation is even more complicated, because a series of ‘partial’ hypotheses should be considered ranging from ‘full’ H2 (both traits depend on both QTLs) to H0 (no effect of the QTLs on either of the two traits). For the dihaploid case, the two-interval consideration results in 16 marker groups. The expected joint distribution of the traits x and y in each is a mixture of four densities, fAB(x,y), fAb(x,y), faB(x,y) and faa(x,y):

where the size of the group i and mixture proportion πij depend on relative positions of the putative QTLs with respect to marker loci, recombination rates, and interference mode and level (Jiang & Zeng, 1995). Some reasonable assumptions can be made leading to a simplification of the analysis arising from a reduction in the number of marker groups and/or f-components within groups (see below). For an arbitrary individual of the mapping population we present its bivariate phenotype (x,y) as

where x and y are the individual's phenotype scores of the analysed traits, μx and μy are trait means, dax and day are the effects of substitution at the A/a locus with respect to mean values of x and y (i.e. dax=μxAA−μxaa and day=μyAA−μyaa), ga denotes the genotype at locus A/a (ga=−1 for aa and 1 for AA); the same symbol usage is applied to locus B/b. In general, one may assume that the putative QTLs affect not only the mean values of the traits but also the trait variances and covariance. In such a case, ex is a random variable with zero mean and variances σ211u, σ212u, σ221u and σ222u for {(ga,gb)}={(−1,−1), (−1,1), (1,−1), (1,1)}, u=x or y. The variables ex and ey are assumed to be correlated with correlation coefficients R11xy, R12xy, R21xy and R22xy for {(ga,gb)}={(−1,−1),(−1,1),(1,−1),(1,1)}, respectively. Correlation between the traits x and y within the QTL groups may be caused by other segregating QTLs or nongenetic correlation. Although we may assume that loci A/a and B/b can also affect trait variances and covariance, in most cases we will deal mainly with the situation of equal variance– covariance matrices in the QTL groups, ΣAB=ΣAb=ΣaB=Σab=Σ.

LOD-score test and parameter estimation

The log-likelihood for a sample of two-dimensional measurements xk, yk in marker groups with sizes Ni (i=\(\overline{1,16}\)) can be written as:

where Θn2 is the vector of genetic parameters characterizing the effects and positions of the putative QTLs. In the general case, dau≠0, dbu≠0 (u=x or y), and all σ2iju are different as well as all Rijxy, so that Θn2={r1,r2,μx,μy,dax,day,dbx,dby,σ211x,σ212x,σ221x, σ222x, σ211y, σ212y, σ221y, σ222y, R11,R12,R21,R22} is the vector of n2=20 unknown parameters, specifying recombination rates in the pair of trial intervals and joint distributions of traits x and y in the QTL groups. The assumption of no effect of genes from the intervals M11/m11–M12/m12 and M21/m21–M22/m22 on the traits (x,y) can be presented by another set of parameters, Θ=Θn0={μx,μy,σx,σy,R} (the null hypothesis {H0: Θ=Θn0}) as contrasted to the foregoing ‘full’ H2: Θ=Θn2, or to any of the ‘partial’ alternatives {H1: Θ=Θn1}. According to the likelihood ratio test approach (Wilks, 1962), if H0 is true, the statistic

is distributed asymptotically as chi-squared with n1−n0 degrees of freedom, where S0 and S1 are the parameter spaces corresponding to H0 and H1, respectively (Wilks, 1962). The same method is applied to compare H2 and H1. If χ2 exceeds some critical value, corresponding to a preset level α, then the H1 hypothesis can be rejected. In such a case, the numerical values providing the maximum to L(Θn2) could be considered as ML-estimates of the parameters characterizing our putative QTL loci A/a and B/b. However, in multi-interval mapping the problem of the exact asymptotic distribution of the test statistic remains unsolved even for the single-trait analysis (see Zeng, 1994). This is especially true when linked QTLs are considered (Lander & Botstein, 1989). If so, one could use extensive Monte Carlo simulations to obtain an empirical critical value of the statistics for each situation.

One more comment on multi-interval mapping models of correlated trait complexes is worth mentioning (see also Korol et al., 1995). Introduction of additional parameters specifying the QTL mapping model should be justified statistically by comparison to the corresponding ‘reduced’ model. This is relevant to any complication of the mapping model, the replacement both of single-trait mapping analysis by its multitrait analogue and of a single-interval model by a two-interval one. Parameters which do not affect the significance level should be removed from the model.

Monte Carlo simulations

Generating the data

For each situation studied, 200 repeated mapping populations were generated using pseudorandom numbers. A bivariate normal distribution was used for the trait groups AABB, aaBB, AAbb and aabb. The compositions of the marker groups (mixtures hi, i=\(\overline{1,16}\)) were modelled as four-component distributions. The length of the marker interval was 20 cM with the QTLs in the middle of their intervals. No double exchanges were assumed within the intervals in the data presented below (hence Morgan's mapping function is suitable). The simulated chromosomes consisted of eight intervals each, with QTLs residing in intervals 3 and 4 or 3 and 5.

Obtaining numerical solutions

Optimization was by modified gradient and Newton methods. The possibility of multiple maxima was tested by optimizing a few test cases using various starting points. In all cases only a single maximum was found. Thus, for all Monte Carlo experiments the simulated parameter sets were used as starts (Titterington et al., 1985). When dealing with real data on barley, multiple random initial points were used to provide the unique solution.

Estimation of the power of the test

To estimate the power of the log-likelihood ratio test we used the critical level of the statistics (eqn 3) χ2=χ2critical based on the asymptotic distribution (chi-squared with d.f.=n1−n0 when H0 vs. H1 is tested, or d.f.=n2−n1 for H1 vs. H2 comparisons). The goodness of fit of the expected distribution was tested by simulations using 5000 trials. Provided H1 is true, the proportion of cases where the second QTL is revealed when it really exists was measured for different situations using critical values obtained in these simulations. A similar approach was also proposed by Doerge & Churchill (1996) on the basis of a permutation test. We found that the asymptotic and simulated distributions result in close estimates of power.

Simulation results

For the dihaploid case, we have simulated and analysed several situations when two QTLs (A/a and B/b) residing in adjacent or nonadjacent intervals affect two correlated traits, x and y. In order to show the advantages of joint analysis of correlated traits, we compare power of the test for detection of both QTLs and accuracy of parameter estimates with those obtained for situations with no correlation between the involved quantitative traits.

Four basic configurations were analysed, with the two QTLs residing in adjacent (AD) and nonadjacent (NA) intervals and acting in the same direction (coupling phase, CP) and in opposite ones (repulsion phase, RP), correspondingly. Following are the results obtained for several possible combinations of the traits and QTLs involved.

(i) One of the traits, x, depends on both QTLs, A/a and B/b, whereas the correlated trait y is independent of these QTLs (see also Korol et al., 1994, 1995; Ronin et al., 1995). The results of scanning along possible pairs of intervals for all four configurations (adjacent and nonadjacent locations of the QTLs, each at coupling and repulsion phase) are presented in Table 1. Based on these data the following conclusions can be made. The maximum of the average LOD scores of the two-QTLs model is attained at the true pair of the intervals. Moreover, the modal class of the bivariate distribution of the individual LODs also corresponds to the true pair of intervals (see the part of Table 1 which lies below the diagonal). Although the correlated trait y does not depend on either of the two QTLs, the additional information provided by y allows, as expected, the LOD values to increase. Moreover, the differences between LODs corresponding to the true and neighbouring positions of the QTLs are also increased when the correlated trait y is taken into account. These results hold for both adjacent and nonadjacent configurations, independently of the phase (coupling or repulsion).

Table 1 Variation of bivariate LOD scores across interval pairs in the case of two linked QTLs affecting a quantitative trait when another trait correlated with the target one and independent of the chromosome under consideration is involved

How are these effects reflected in the test power and estimation precision? Table 2 illustrates the gains of the two-trait analysis compared to the single-trait analysis. These are manifested in: (a) increase in power of detection of any QTL activity in the marked chromosome, as reflected in the differences between the mean LOD values for H2 and H0; (b) higher power of discrimination between H2 and H1. The benefit of two-trait analysis is higher for NA configurations (nonadjacent location of the QTLs), and for the repulsion phase as compared to the coupling one; (c) reduced biases of parameter estimates, manifested mainly for the coordinates of the QTLs within the intervals; and (d) lower variances in all parameter estimates.

Table 2 Efficiency of bivariate two-interval mapping of two linked QTLs affecting a quantitative trait assisted by another trait correlated with the target one and independent of the chromosome under consideration

(ii) A/a affects both of the traits whereas B/b affects only one trait, x; within each of the four QTL groups x and y are correlated because of nongenetic mechanisms and segregation of genes from other chromosomes. Comparisons of the test power and estimation accuracy enable us to conclude that bivariate mapping analysis is superior to the single-trait one (Table 3). This is manifested in: (a) a higher power of QTL detection (compare LODs for H2 vs. H0) and discrimination between ‘two-QTLs’ and ‘single-QTL’ hypotheses, i.e. H2 vs. H1; and (b) lower variances of parameter estimates. Notably, the reduction in the variance for recombination distances is more pronounced for the QTL affecting both traits as compared to that affecting only one of the traits. Also, a slight reduction is found in biases of the estimates of QTL positions resulting from the bivariate analysis. But in contrast to the case (i), the bivariate model has no advantages with respect to the biases of the QTL effects (Table 3)

Table 3 Efficiency of bivariate two-interval mapping of linked QTLs, when the first of the correlated QTs depends on both QTLs (A/a and B/b) whereas the second trait depends only on one of the QTLs (A/a)

(iii) Both A/a and B/b have pleiotropic effects on both traits, x and y. All conclusions reached for the previous case hold also here (see the results in Fig. 1 and Table 4). For instance, correlation between the analysed traits leads to a pronounced increase in the proportion of the modal class which corresponds to the true pair of intervals (compare (b) vs. (a) and (d) vs. (c) in Fig. 1. In addition, we compare here two estimates of the power of the test ‘H2 vs. H1’ (the assumption of the presence of two QTLs contrasted to that of one). These estimates were obtained when the critical value of the test was calculated from the asymptotic chi-squared approximation (eqn 3), and from 5000 Monte Carlo simulations of the situation when H1 is true. It appeared that the resulting two estimates of the power are close (compare βt and βmk in the last column of Table 4.

Fig. 1
figure 1

Variation of LOD scores across interval pairs in the case when both of the correlated quantitative traits depend on two linked QTLs (residing in adjacent intervals nos 3 and 4). For most of the parameter values used in simulations see Table 4; (a) R=0, coupling phase; (b) R=0.7, coupling phase; (c) R=0, repulsion phase; (d) R=0.7, repulsion phase.

Table 4 Efficiency of bivariate two-interval mapping of linked QTLs, when both of the correlated QTs depend on both QTLs (A/a and B/b)

Examples from the barley ‘Steptoe×Morex’ dataset

As an example of the application of multi-interval two-trait analysis, we have chosen two economically important traits — malt extract (%) and alpha amylase activity — from the Steptoe×Morex barley mapping data set (Hayes et al., 1996). The Steptoe ×Morex population is an extensively described mapping and QT reference population (Hayes et al., 1993, 1994). Malting quality is determined by a number of component traits. Malting is a carefully controlled germination process in which complex proteolytic pathways are manipulated to develop an ideal substrate for subsequent fermentation. Kernel carbohydrates are hydrolysed by α-amylases. Malt extract percentage is a measure of soluble sugars and proteins and thus expresses the overall efficiency of the malting process. Thus, α-amylase may be a component of malt extract, and coincident QTLs for the two traits could be attributed to pleiotropic effects of α-amylase. This is probably the case on chromosome 1, where malt extract and α-amylase QTLs were detected in the vicinity of the Amy2 locus (Hayes et al., 1993).

In Table 5 some examples of two-trait mapping analysis of α-amylase and malt extract are presented. For the measurements in environment 4, previous analysis revealed a QTL in interval no. 12 of chromosome 2 (ABG14–His3C) (Hayes et al., 1994). Two-interval analysis of the same trait allows the detection of a second significant QTL in interval no. 2 (ABG703–CHS1B). Although two-trait twointerval analysis results in the same power of detection of a second interval affecting α-amylase, it gives a slightly different location for both QTLs. The same conclusions were reached using data from environment no. 14. However, in this case the detectability of the second QTL for α-amylase by single-trait analysis is rather poor (P≈0.27) whereas the significance for the second QTL in the two-trait analysis is P≈0.037. The next example in Table 5 concerns the pair ‘α-amylase–kernel weight’ in environments 9, 10 and 13, for chromosome 1. Usual single-trait interval analysis (see Hayes et al., 1994) reveals a QTL for α-amylase at interval 10 (Brz–ABC156D) or 11 (ABC156D–ABG22A) manifesting in environments 9 and 10. In environment 13, in addition, a factor from interval 15 (ABC455–Amy2) was also detected (Hayes et al., 1994). Two-interval single-trait analysis shows a significant (P=0.007) additional effect of the interval no. 21 (ABG461–Cat3) in environment 9; the putative QTL is in repulsion phase with that of interval 10. Incorporation of the second trait, kernel weight, into the mapping model increases the significance (P=0.002). Especially pronounced enhancement in detection power of the test for the presence of the QTL for α-amylase in interval 21 was found for environment 10: from P=0.045 in single trait analysis to P=0.003 in bivariate analysis.

Table 5 Some examples of two-interval mapping analysis on barley data of NABGMP (Steptoe/Morex dihaploid mapping population)

Concerning the detection of the effect of interval 21 on α-amylase, the situation for environment 13 is similar to those in environments 9 and 10. However, in the last case the two-interval analysis is complicated by the presence of an additional QTL at interval 15. It is noteworthy that when the interval 10 (or 11) was ignored, its effect was ‘absorbed’ by the interval 15, so that the two-interval analysis for the pair of intervals 15 and 21 results in a biased (upward) estimation of the effect of interval 15. This can be seen when interval pair 10 and 15 is considered (the last row of Table 5).

Coincident QTLs may result from tight linkage or pleiotropy. In these terms, there are three possible scenarios involving malt extract and α-amylase: (i) coincident malt extract and α-amylase QTLs, where the latter is a determinant of the former, as on chromosome 1; (ii) malt extract QTL without accompanying α-amylase QTL, where, by default, the observed extract is attributed to a factor other than α-amylase, such as another enzyme or starch structure/composition; or (iii) an α-amylase QTL with an accompanying malt extract QTL.

Based on single-interval mapping, chromosome 2 appears to be an example of scenarios (ii) and (iii) as described above: α-amylase QTL at LOD ≲2.0 is seen near the centromere, and malt extract QTLs with LOD ≲2.0 are seen on the short arm (Hayes et al., 1993). However, with a two-trait analysis, the malt extract QTLs are observed to coincide with α-amylase QTLs (see Table 5). This provides an additional and potentially valuable insight into the basis of the observed malt extract effect, transforming it from a type (ii) to a type (i) scenario. These results provide new tools for studying the biochemistry of the malting process, as barley α-amylases are of two types (the Amy1 and Amy2 groups) and loci determining the isoforms map to chromosomes 6 and 1, respectively. No α-amylase loci have been mapped to chromosome 2, so the observed α-amylase QTLs may be attributable to genes that somehow regulate or modulate α-amylase expression.

Discussion

An approach to increase the resolution power of interval mapping of QTLs was proposed earlier based on analysis of correlated trait complexes (Korol et al., 1987, 1994, 1995; Preygel & Korol, 1989; Jiang & Zeng, 1995; Ronin et al., 1995). It is well known that, in an attempt to fit a single-locus mapping model to a case with several QTLs, a QTL can be found in an interval which actually does not affect the considered trait (Haley & Knott, 1992; Martinez & Curnow, 1992). As a result, the estimated effect of this ghost locus could be much higher than that of any of the real QTLs in the chromosome. An opposite and even more difficult situation could be when the chromosome in question contains a couple of linked QTLs in repulsion phase. Then, a conclusion of ‘no effect’ on the considered trait may result from single-interval mapping analysis. That trans-association of QTLs could be a rather common phenomenon even in interspecific crosses has been demonstrated by DeVicente & Tanksley (1993) in tomato: they found that up to 36 per cent of the detected QTLs had alleles with effects opposite to the direction expected from the parental differences.

The usual way of dealing with several linked QTLs is multiple regression analysis or mixture-model interval analysis which include markers as regression-derived cofactors to account for segregation of QTLs on the same chromosome (Jansen & Stam, 1994; Jiang & Zeng, 1995). Employing Monte Carlo simulations, we have demonstrated here the advantages of multitrait analysis in the detection of linked QTLs within the framework of mixture models. In spite of an increased number of parameters to be estimated, the proposed method allows for an improvement in the correct detection and estimation of linked QTLs. Although our model deals with bivariate trait distributions, multivariate situations can also be considered without a necessity for a further increase in the number of parameters. This seems to be possible if for any pair of intervals one can use the first two principal components of the multivariate complex (Korol et al., 1994; Ronin et al., 1995; Weller et al., 1996).