Abstract
Regression interval mapping and multiple interval mapping are compared with regard to mapping linked quantitative trait loci (QTL) in inbred-line cross experiments. For that purpose, a simulation study was performed using genetic models with two linked QTL. Data were simulated for F2 populations of different sizes and with all QTL and marker alleles fixed for alternative alleles in the parental lines. The criteria for comparison are power of QTL identification and the accuracy of the QTL position and effect estimates. Further, the estimates of the relative QTL variance are assessed. There are distinct differences in the QTL position estimates between the two methods. Multiple interval mapping tends to be more powerful as compared to regression interval mapping. Multiple interval mapping further leads to more accurate QTL position and QTL effect estimates. The superiority increased with wider marker intervals and larger population sizes. If QTL are in repulsion, the differences between the two methods are very pronounced. For both methods, the reduction of the marker interval size from 10 to 5 cM increases power and greatly improves QTL parameter estimates. This contrasts with findings in the literature for single QTL scenarios, where a marker density of 10 cM is generally considered as sufficient. The use of standard (asymptotic) statistical theory for the computation of the standard errors of the QTL position and effect estimates proves to give much too optimistic standard errors for regression interval mapping as well as for multiple interval mapping.
Introduction
Mapping of quantitative trait loci (QTL) is the first step in the identification of genes and gene variants. For follow-up work, it is therefore of great importance to obtain accurate estimates of the QTL parameters and to have a realistic idea of the accuracy of the QTL position and effect estimates.
Lander and Botstein (1989) developed a maximum-likelihood method based on a single QTL model for mapping QTL in F2 or backcross populations from inbred line cross experiments, which they called interval mapping. Haley and Knott (1992) proposed regression interval mapping as a computationally easier least-squares-based alternative method. For single QTL designs, several studies have shown only minor differences between regression interval mapping and maximum-likelihood (single QTL) interval mapping (eg Haley and Knott, 1992; Xu, 1995, 1998a, 1998b). As regression interval mapping is computationally easier than (single QTL) interval mapping, it is widely applied in QTL analysis studies.
Computer simulations to evaluate the efficiency of (single QTL) interval mapping for detecting and estimating independent polygenes showed that the estimates of variances associated with correctly identified QTL were greatly overestimated if small populations (100 progenies) were used (Beavis, 1998). The statistical power of detecting a small QTL was as low as 3% and the estimated effects were typically inflated 10-fold.
A maximum-likelihood method using a multiple QTL model to simultaneously map multiple QTL was proposed by Kao et al (1999). The authors called this method multiple interval mapping. For designs with two or more linked QTL, however, there are only a few studies on the differences between regression interval mapping and multiple interval mapping. Differences between multiple interval mapping and the regression method for multiple QTL models were studied by Kao (2000) analytically and numerically by simulation. In the theoretical part, Kao (2000) pointed out that the conditional probabilities as used in regression interval mapping have to approximate the conditional posterior probabilities used in multiple interval mapping to give similar results. In the presence of multiple QTL, differences in QTL effects and interactions among QTL influence the similarity of these probabilities and thus the comparability of the two methods. For linked QTL, there is an important additional aspect. In this case, the conditional expectations of QTL are correlated and these correlations tend to be higher than those between the conditional posterior expectations. As a result, the closer linked QTL are, the more regression interval mapping may lose power and give estimates with larger standard errors as compared to multiple interval mapping. Simulation studies are needed because these differences cannot be addressed analytically. In the simulation part, Kao (2000) assumed for simplicity of comparison that the QTL positions are known and estimated the QTL parameters and the likelihood values at the true positions. 
This idealized assumption allowed Kao (2000) to compare a wide range of scenarios, but it also has several shortcomings: (i) the approach does not cover QTL detection and model selection; (ii) the position estimates of regression interval mapping and multiple interval mapping may have rather large standard errors, that is, they may lie rather far from the true positions; (iii) there may be considerable differences between the position estimates of regression interval mapping and those of multiple interval mapping; (iv) only complete simulation studies allow the computation of standard errors of the position and QTL parameter estimates; and (v) only such studies allow comparisons between estimated and empirical standard errors. From the wide range of scenarios studied, Kao (2000) concluded that regression interval mapping may have serious problems compared with multiple interval mapping, especially when QTL are linked and even more so when these QTL are in repulsion. From the study of, for example, Mayer et al (2004), it is known that, in comparison with a monogenic background, a reliable and accurate estimation of the positions and effects of multiple QTL in a linkage group requires much more information from the data, and that even the analysis of relatively large data sets can lead to estimates with relatively large standard errors. While Kao (2000) performed just one likelihood calculation per replicate and scenario, a grid search for the maximum of the likelihood function is computationally much more demanding.
In contrast to the findings in the literature for single QTL analyses (eg Soller and Genizi, 1978; Dupuis and Siegmund, 1999; Piepho, 2000), Mayer et al (2004) found in a simulation study using multiple interval mapping methodology for an F2 schema with linked multiple QTL that the reduction of the marker interval size from 10 to 5 cM led to a higher power in QTL detection and to a remarkable improvement of the precision of the QTL position as well as the QTL effect estimates. They also found that the asymptotic standard deviation of the position estimates was not a good criterion for the accuracy of the position estimates. Similarly, confidence intervals based on the standard asymptotic statistical theory had a clearly smaller empirical coverage probability as compared to the nominal probability. For the asymptotic SD of the additive, dominance and epistatic effects, there were similar results.
The aim of this study is to compare regression interval mapping and multiple interval mapping in a multiple QTL situation with linked QTL, where the QTL are in coupling or in repulsion. This comparison includes the aspects of QTL detection and estimation of QTL positions. The two methods are compared with regard to power, accuracy of the QTL position and effect estimates, and bias in the estimation of the residual variance with special consideration given to the influence of the marker interval size. Further, the accuracy of standard error estimates using standard (asymptotic) statistical theory is evaluated.
Materials and methods
Simulation
Data were simulated for an F2 population from inbred line cross experiments with all QTL and marker alleles fixed for alternative alleles in the parental lines. Two QTL on the same chromosome of length 50 cM were assumed, located at positions 25 and 35 cM. The simulated additive genetic effects were a1=0.25 (QTL 1) and a2=0.50 (QTL 2), and the dominance effects were d1=0.25 (QTL 1) and d2=0 (QTL 2). There were no epistatic effects. The marker interval length was set to either 10 or 5 cM. The markers were located at the positions 0, 10, 20, 30, 40, 50 cM (marker interval size 10 cM) and 0, 5, 10, 15, 20, 22.5, 27.5, 30, 32.5, 37.5, 40, 45, 50 cM (marker interval size 5 cM). The residuals were scaled to give a relative QTL variance (the proportion of the phenotypic variance explained by the QTL) of 0.50 in the F2 population. The population size was either 200 or 500 individuals. The number of simulated and analyzed replicates was 200 for the population size of 200 and 100 for the larger population size of 500.
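The simulation design described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study. It assumes the Haldane map function (no interference), normally distributed residuals, and a residual variance derived from the theoretical F2 genotypic variance; none of these details is stated in the text. All function names are hypothetical.

```python
import numpy as np

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane function, an assumption)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(d_cm, dtype=float) / 100.0))

def simulate_gametes(pos_cm, n, rng):
    """Draw n F1 gametes; 1 marks an allele from parental line 1, 0 from line 2."""
    r = haldane_r(np.diff(pos_cm))
    gam = np.empty((n, len(pos_cm)), dtype=int)
    gam[:, 0] = rng.integers(0, 2, size=n)
    for j in range(1, len(pos_cm)):
        recombined = rng.random(n) < r[j - 1]
        gam[:, j] = np.where(recombined, 1 - gam[:, j - 1], gam[:, j - 1])
    return gam

def genetic_variance(a, d, r):
    """F2 genotypic variance for two linked QTL without epistasis."""
    lam = 1.0 - 2.0 * r
    return (0.5 * (a[0] ** 2 + a[1] ** 2) + a[0] * a[1] * lam
            + 0.25 * (d[0] ** 2 + d[1] ** 2) + 0.5 * d[0] * d[1] * lam ** 2)

def simulate_f2(n, marker_pos, qtl_pos, a, d, rel_qtl_var, rng):
    """Return marker genotypes (0/1/2 line-1 alleles), QTL genotypes and phenotypes."""
    pos = np.union1d(marker_pos, qtl_pos)              # sorted, unique loci
    geno = simulate_gametes(pos, n, rng) + simulate_gametes(pos, n, rng)
    q = geno[:, np.searchsorted(pos, qtl_pos)]         # genotypes at the two QTL
    x = q - 1                                          # additive indicator: -1, 0, 1
    z = (q == 1).astype(float)                         # dominance indicator
    g = x @ np.asarray(a, dtype=float) + z @ np.asarray(d, dtype=float)
    var_g = genetic_variance(a, d, haldane_r(qtl_pos[1] - qtl_pos[0]))
    var_e = var_g * (1.0 - rel_qtl_var) / rel_qtl_var  # residual variance scaling
    y = g + rng.normal(0.0, np.sqrt(var_e), size=n)
    markers = geno[:, np.searchsorted(pos, marker_pos)]
    return markers, q, y

rng = np.random.default_rng(1)
markers, qtl, y = simulate_f2(200, [0, 10, 20, 30, 40, 50], [25, 35],
                              [0.25, 0.50], [0.25, 0.0], 0.50, rng)
```

With the effects of the coupling scenario and a 10 cM distance between the QTL, the theoretical genotypic variance under these assumptions is about 0.274, so a residual variance of the same size gives a relative QTL variance of 0.50.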
To study the influence of QTL in repulsion, another scenario with additive effects set to a1=1.0 (QTL 1) and a2=−1.0 (QTL 2) was simulated. There were no dominance or epistatic effects. As above, the contribution of the QTL to the trait variation was 0.50. The population size was 500 and the marker interval length 10 cM.
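The effect of repulsion on the genotypic variance can be made explicit with a short worked equation. The sketch below assumes that the two QTL sit at the same positions as before (25 and 35 cM), that recombination follows the Haldane map function, and that genotypes are coded -1, 0, 1 for the additive effect; these are assumptions for illustration and are not stated in the text.

```latex
\sigma_G^2 = \tfrac{1}{2}\left(a_1^2 + a_2^2\right) + a_1 a_2 (1 - 2r)
           + \tfrac{1}{4}\left(d_1^2 + d_2^2\right) + \tfrac{1}{2}\, d_1 d_2 (1 - 2r)^2 ,
\qquad 1 - 2r = e^{-2 \cdot 0.10} \approx 0.8187

% repulsion: a_1 = 1, a_2 = -1, d_1 = d_2 = 0
\sigma_G^2 = \tfrac{1}{2}(1 + 1) - 0.8187 \approx 0.1813

% coupling with the same absolute effects (a_1 = a_2 = 1), for comparison
\sigma_G^2 = \tfrac{1}{2}(1 + 1) + 0.8187 \approx 1.8187
```

Under these assumptions, the negative covariance term removes most of the variance contributed by the individual QTL, so a relative QTL variance of 0.50 corresponds to a residual variance of only about 0.18.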
Regression interval mapping
The regression interval mapping method, which regresses the quantitative trait value on the conditional expected genotypic value, was developed by Haley and Knott (1992) and is described in detail in their paper. For extension to multiple traits, see Knott and Haley (2000). Some theoretical aspects on the extension of regression interval mapping for multiple QTL can be found in Kao (2000).
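The conditional expectations on which the regression is based can be illustrated as follows. The sketch computes the probabilities of the three QTL genotypes in an F2 given fully informative flanking markers and derives the additive and dominance regressors from them. It assumes the Haldane map function and genotypes coded as the number of alleles from parental line 1; the function names are hypothetical and the code is not taken from Haley and Knott (1992) or from QTL Express.

```python
import numpy as np

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane function, an assumption)."""
    return 0.5 * (1.0 - np.exp(-2.0 * d_cm / 100.0))

def f2_transition(r):
    """P(genotype at locus 2 | genotype at locus 1) in an F2; genotypes are 0, 1, 2."""
    s = 1.0 - r
    return np.array([[s * s, 2 * r * s, r * r],
                     [r * s, s * s + r * r, r * s],
                     [r * r, 2 * r * s, s * s]])

def qtl_probs(left_geno, right_geno, d_left_cm, d_right_cm):
    """P(QTL genotype | flanking marker genotypes), one row per individual."""
    t_left = f2_transition(haldane_r(d_left_cm))     # left marker -> QTL
    t_right = f2_transition(haldane_r(d_right_cm))   # QTL -> right marker
    joint = t_left[left_geno, :] * t_right[:, right_geno].T
    return joint / joint.sum(axis=1, keepdims=True)

def hk_regressors(probs):
    """Conditional expectations of the additive and dominance indicators."""
    x_add = probs[:, 2] - probs[:, 0]   # E[x | markers], x in {-1, 0, 1}
    x_dom = probs[:, 1]                 # E[z | markers], z = 1 for heterozygotes
    return np.column_stack([x_add, x_dom])

# Putative QTL at 25 cM between markers at 20 and 30 cM
left = np.array([0, 1, 2, 2])
right = np.array([0, 1, 2, 0])
regressors = hk_regressors(qtl_probs(left, right, 5.0, 5.0))
```

In a scan, these regressors would be recomputed at each putative position (or pair of positions for a 2-QTL model) and the trait values regressed on them.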
For the QTL analyses on the basis of regression interval mapping, the web-based package QTL Express (Seaton et al, 2002), in particular, the module ‘F2 analysis’ was used. This mapping software can be found at http://qtl.cap.ed.ac.uk. With this program either 1- or 2-QTL models can be analyzed. The analyses were performed with the 2-QTL model version, whereby the genetic model included additive and dominance effects (no epistatic effects). For model identification and testing purposes, the computer program provided F-test statistics with 2 (2 QTL vs 1 QTL) and 4 (2 QTL vs 0 QTL) numerator degrees of freedom, respectively. The nominal error probability used in testing was α=0.05. For 2-QTL analyses, the program provides standard errors of the effect estimates but no measure of the accuracy of the position estimates.
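The two F-tests can be illustrated with a generic nested-model comparison. The sketch below is not the QTL Express implementation; it assumes ordinary least squares with an intercept and, for simplicity, uses the position of the first QTL for the 1-QTL model, whereas in practice the 1-QTL model would come from its own scan. The function names are hypothetical.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_statistic(y, X_reduced, X_full):
    """F statistic with numerator and denominator df for nested linear models."""
    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = len(y) - X_full.shape[1]
    rss_red, rss_full = rss(y, X_reduced), rss(y, X_full)
    f = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
    return f, df_num, df_den

def two_qtl_tests(y, reg_qtl1, reg_qtl2):
    """reg_qtl1, reg_qtl2: (n, 2) additive and dominance regressors at two positions."""
    ones = np.ones((len(y), 1))
    X0 = ones                                   # no QTL
    X1 = np.hstack([ones, reg_qtl1])            # one QTL: a1, d1
    X2 = np.hstack([ones, reg_qtl1, reg_qtl2])  # two QTL: a1, d1, a2, d2
    return {"2 vs 1": f_statistic(y, X1, X2), "2 vs 0": f_statistic(y, X0, X2)}
```

The numerator degrees of freedom returned by this sketch are 2 and 4, matching the two tests described above.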
The length of the chromosome of 50 cM used for data simulation was chosen with regard to the limitations of the QTL Express servlet in dealing with the number of markers required for the 5 cM coverage.
Multiple interval mapping
Multiple interval mapping as proposed by Kao et al (1999) is a maximum-likelihood method that uses multiple-marker intervals simultaneously to identify multiple QTL. Detailed descriptions and discussions of the multiple interval mapping methodology can be found in Kao et al (1999), Zeng et al (1999) and Kao (2000).
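The maximum-likelihood idea behind multiple interval mapping can be sketched as a normal mixture model fitted by an EM algorithm at fixed putative QTL positions. The code below is a generic EM sketch and does not reproduce the specific formulas of Kao and Zeng (1997) or the program of Mayer et al (2004). It assumes that the conditional probabilities of the joint QTL genotypes given the markers are supplied as a matrix, and that the first column of the design matrix is the intercept; all names are hypothetical.

```python
import numpy as np

def mixture_loglik(y, prior, means, sigma2):
    """Log-likelihood and posterior genotype probabilities of the normal mixture."""
    with np.errstate(divide="ignore"):
        log_prior = np.log(prior)                     # -inf where prior is 0
    log_phi = (-0.5 * np.log(2 * np.pi * sigma2)
               - 0.5 * (y[:, None] - means[None, :]) ** 2 / sigma2)
    log_joint = log_prior + log_phi
    row_max = log_joint.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(log_joint - row_max).sum(axis=1))
    post = np.exp(log_joint - lse[:, None])
    return lse.sum(), post

def em_mim(y, prior, design, n_iter=200, tol=1e-8):
    """EM for y_i ~ sum_g prior[i, g] * N(design[g] @ beta, sigma2).

    prior  : (n, G) probabilities of the G joint QTL genotypes given the markers
    design : (G, p) coding of each joint genotype (intercept first, then effects)
    """
    n = len(y)
    beta = np.zeros(design.shape[1])
    beta[0] = y.mean()
    sigma2 = y.var()
    history = []
    for _ in range(n_iter):
        loglik, post = mixture_loglik(y, prior, design @ beta, sigma2)  # E-step
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        weight = post.sum(axis=0)                     # expected count per genotype
        lhs = design.T @ (weight[:, None] * design)
        rhs = design.T @ (post.T @ y)
        beta = np.linalg.lstsq(lhs, rhs, rcond=None)[0]                 # M-step
        resid2 = (y[:, None] - (design @ beta)[None, :]) ** 2
        sigma2 = float((post * resid2).sum() / n)
    return beta, sigma2, history
```

The difference from regression interval mapping is visible in the E-step: the genotype probabilities are updated using the trait values (posterior probabilities), whereas the regression method uses only the marker-based conditional probabilities. A position search would repeat this fit over a grid of putative QTL positions and keep the positions with the highest log-likelihood.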
For the analyses in this study, a computer program was used that has been described in detail by Mayer et al (2004). The number of QTL that can simultaneously be included in the analysis model is only constrained by computing time and computing resources. The paper of Mayer et al (2004) also gives details on the computation of the asymptotic variance–covariance matrix of the parameter estimates. For this purpose, an approach as described by Louis (1982) and proposed by Kao and Zeng (1997) was realized. Model selection was based on likelihood ratio test statistics comparing a 2-QTL model vs a 1-QTL model and vs a no QTL model if appropriate. The Bonferroni correction was used to account for multiple testing in QTL detection.
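The model comparison step can be illustrated with a small helper for likelihood ratio tests under a Bonferroni-adjusted significance level. The sketch assumes a chi-square reference distribution with an even number of degrees of freedom and a user-supplied number of tests; neither the reference distribution nor the number of tests is specified in the text, so both are assumptions, and the function names are hypothetical.

```python
import math

def lr_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio test statistic for nested models."""
    return 2.0 * (loglik_full - loglik_reduced)

def lod_score(lr):
    """Convert a likelihood ratio statistic to a LOD score."""
    return lr / (2.0 * math.log(10.0))

def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def bonferroni_significant(lr, df, alpha, n_tests):
    """Compare the nominal p-value with the Bonferroni level alpha / n_tests."""
    return chi2_sf_even_df(lr, df) < alpha / n_tests

# Hypothetical log-likelihoods of a 2-QTL and a 1-QTL model
lr = lr_statistic(-100.0, -110.0)
significant = bonferroni_significant(lr, df=2, alpha=0.05, n_tests=10)
```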
Results
Power
The power for identifying the two linked QTL on the basis of the F-values (regression interval mapping) and likelihood ratio test statistics (multiple interval mapping) as described in the methods section is shown in Table 1. For every combination of population size and marker interval length, the power of multiple interval mapping exceeded that of regression interval mapping.
Position estimates
There are distinct differences in the QTL position estimates between regression interval mapping and multiple interval mapping. The differences between the estimates are larger for the smaller population size and for the wider marker intervals. In the scenario with the population size of 200 and a marker interval size of 10 cM, the QTL position estimates were identical in only 27% (QTL 1) and 31% (QTL 2) of the repetitions, respectively, using a 2-QTL analysis model for all the replicates. The 90% quantiles in the absolute difference of the position estimates for QTL 1 are 13, 10, 5.5 and 2 cM for the cases with marker interval size of d=10 cM and population size N=200, d=10 cM and N=500, d=5 cM and N=200, and d=5 cM and N=500, respectively. For QTL 2, the respective 90% quantiles are 7, 8, 3 and 1 cM.
The means, empirical SD and mean-squared errors (MSE) of the QTL position estimates from the significant replicates can be found in Table 2. The mean position estimates are similar for multiple interval mapping and regression interval mapping. With regard to the empirical SD and the MSE of the position estimates multiple interval mapping is slightly superior. The relative superiority is higher with larger population size. On the basis of all replicates, the relative superiority was also higher with wider marker intervals (results not shown).
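The summary statistics used for the comparison can be computed per scenario as in the following sketch. It assumes that the empirical SD is the sample standard deviation over replicates (divisor n-1) and that the MSE is taken around the simulated true value; the text does not state these conventions, and the function names are hypothetical.

```python
import numpy as np

def summarize_estimates(estimates, true_value):
    """Mean, empirical SD, bias and mean-squared error over replicates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    return {"mean": mean,
            "sd": est.std(ddof=1),
            "bias": mean - true_value,
            "mse": np.mean((est - true_value) ** 2)}

def abs_diff_quantile(est_a, est_b, q=0.9):
    """Quantile of the absolute difference between two methods' estimates."""
    diff = np.abs(np.asarray(est_a, dtype=float) - np.asarray(est_b, dtype=float))
    return np.quantile(diff, q)

# Hypothetical position estimates (cM) for a QTL simulated at 25 cM
summary = summarize_estimates([23, 25, 27, 29], true_value=25.0)
```

Because the MSE combines squared bias and variance, two methods with similar mean position estimates can still differ in MSE through their empirical SD.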
Further, the reduction of the marker interval size from 10 to 5 cM led to a remarkable improvement of the QTL position estimates. The reduction in the empirical SD and the MSE seems to be similar for both methods. The relative values of the MSE for the marker interval of 5 cM as compared to 10 cM range from 74.4 to 38.5% (population size of 200) and from 40.8 to 21.6% (population size of 500), respectively.
Parameter estimates
Considering the significant replicates (Table 3), the variation and the MSE of the effect estimates are generally smaller for multiple interval mapping. The reduction of the marker interval size from 10 to 5 cM had a clearly positive effect on the accuracy of the QTL effect estimates, especially in the larger families of 500 individuals (Table 3). The relative reduction in the empirical MSE of the position estimates was similar for the two methods.
As can be seen from Table 4, regression interval mapping tended to overestimate the residual variance and thus to underestimate the proportion of variance explained by the QTL. This effect could be observed for the wider marker interval size of 10 cM, but for the shorter marker interval size of 5 cM there was almost no difference between the two methods.
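The link between the two quantities can be written as a one-line relation. The expression below assumes that the relative QTL variance is estimated as one minus the ratio of the estimated residual variance to the phenotypic variance; the text does not give the formula, and the numbers are hypothetical and serve only to show the direction of the effect.

```latex
\hat{h}^2_{\mathrm{QTL}} = 1 - \frac{\hat{\sigma}_e^2}{\hat{\sigma}_P^2}

% hypothetical example with \hat{\sigma}_P^2 = 1.00 and a true residual variance of 0.50
\hat{\sigma}_e^2 = 0.55 \;\Rightarrow\; \hat{h}^2_{\mathrm{QTL}} = 1 - 0.55 = 0.45 < 0.50
```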
Standard error of estimates
In Table 5, the empirical standard errors of the position and effect estimates are compared with the estimated standard errors of estimates over all replicates. The QTL Express servlet provided standard error estimates for the effect estimates, but there is no implementation for calculating standard errors of position estimates. The means and SD of the estimated standard errors of the estimates for regression interval mapping and the estimated standard errors of multiple interval mapping (MIM*) are very similar. The MIM* standard errors were estimated under the assumption that the QTL positions were known without error, meaning that the asymptotic SD of the multiple interval mapping estimates were computed at position 25 cM (QTL 1) and position 35 cM (QTL 2). It should be observed that these estimated standard errors are much smaller than the empirical SD. Taking regression interval mapping and a marker distance of 10 cM as an example, the empirical SD of the estimates of the additive genetic effects is about three times as high as the estimated standard error. The means of the asymptotic SD for multiple interval mapping, calculated for the maximum-likelihood estimates, are somewhat larger, but still clearly smaller than the empirical SD. Further, the SD of the estimated standard errors are generally relatively large. There was no great difference in the SD of the estimated standard errors between regression interval mapping and MIM*. Thus, the asymptotic SD of multiple interval mapping estimates as well as the standard error calculated with the regression interval mapping servlet in a particular replicate was not a good criterion for the accuracy of the position and effect estimates.
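The comparison between empirical and estimated standard errors can be sketched as follows. The code assumes one estimate and one estimated standard error per replicate, and it adds the empirical coverage of a normal-theory interval (estimate ± 1.96 SE) as an illustration; the coverage calculation and the function name are assumptions and are not part of the analyses reported here.

```python
import numpy as np

def compare_se(estimates, estimated_se, true_value, z=1.96):
    """Empirical SD over replicates versus the estimated standard errors."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(estimated_se, dtype=float)
    empirical_sd = est.std(ddof=1)
    covered = np.abs(est - true_value) <= z * se   # interval contains the true value
    return {"empirical_sd": empirical_sd,
            "mean_se": se.mean(),
            "sd_se": se.std(ddof=1),
            "sd_to_se_ratio": empirical_sd / se.mean(),
            "coverage": covered.mean()}

# Hypothetical replicates: estimates scatter three times more than the SE suggests
rng = np.random.default_rng(3)
est = rng.normal(0.50, 0.30, size=5000)
result = compare_se(est, np.full(5000, 0.10), true_value=0.50)
```

A ratio of empirical SD to mean estimated standard error well above 1, as in this hypothetical example, indicates that the estimated standard errors are too optimistic, and the empirical coverage of the corresponding intervals falls far below the nominal level.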
QTL in repulsion
Table 6 shows the estimates of the QTL positions, QTL effects and relative QTL variance for the scenario with QTL in repulsion, resulting from regression interval mapping and multiple interval mapping. The power of both methods in identifying two QTL is 100%. However, there are distinct differences in the accuracy of the estimates. The multiple interval mapping estimates of QTL location and QTL effects were very accurate, whereas regression interval mapping resulted in estimates with remarkably higher MSE. Further, regression interval mapping clearly overestimated the residual variance and thus underestimated the relative QTL variance.
Discussion
Properties of regression interval mapping and multiple interval mapping estimates
For single QTL models, theoretical considerations as well as simulation studies (eg Haley and Knott, 1992; Xu 1995, 1998a, 1998b; Dupuis and Siegmund, 1999; Kao, 2000; Rebai and Goffinet, 2000) have shown that regression interval mapping and (single QTL) interval mapping (Lander and Botstein, 1989) lead to very similar results with regard to power of QTL identification, and that there are only minor differences in the estimates of QTL position and QTL parameters. As regression interval mapping is computationally easier to realize and faster in computation, regression interval mapping is often the method of choice for QTL mapping.
The situation may be different when multiple interval mapping is used instead of (single QTL) interval mapping in the presence of multiple QTL as has been shown by Kao (2000). Therefore, the regression interval mapping method should be used with some caution even as an initial procedure to obtain preliminary results. As the real genetic background of a trait is generally not known in QTL studies, results of regression interval mapping may prevent the use of multiple interval mapping, which would give more conclusive results. This is especially true if QTL are closely linked, interact or are in repulsion phase.
In studying the effects of QTL in repulsion, Kao (2000) simulated two QTL with opposite effects in two neighboring 40 cM marker intervals and 5–20 cM from the marker between them. Regression interval mapping had a serious problem in detecting the neighboring QTL in a distance up to 20–30 cM. The MSE of the estimates of the genetic effects were also clearly increased. If epistasis was simulated, the problem of regression interval mapping in detecting closely linked QTL increased further. In this study, with a similar design for analyzing the effects of repulsion, the power of detecting two QTL was 100% for both methods. Although the design is similar, there are three important differences that may explain the different findings and show the need of full simulation studies. Firstly, the population size was higher in this study (N=500 vs 200). Secondly, the marker interval size in Kao (2000) was wider (40 cM) as compared to 10 cM in this study. Thirdly, for simplicity of comparison, Kao (2000) assumed the true QTL positions were known and calculated the test statistics at these positions, whereas in this study, the power calculations were made at the position estimates. As these estimates are partly far away from the true values (as can be seen from the MSE in Table 6), and for the other two reasons, the power should be higher in this study. The comparatively high standard errors of the QTL effect estimates using the regression method are also partly caused by this inaccuracy in the QTL position estimates. Another important factor is that in 22 out of the 100 replicates, the QTL position estimates were at the positions of 28 or 29 cM for QTL 1 and 31 or 32 cM for QTL 2, causing large absolute effect estimates with opposite signs.
Kao et al (1999) found that such strong repulsion linkage was important in a real example: an analysis of cone number, tree diameter and branch quality in a sample of 134 radiata pine. The number of QTL detected was 7, 6 and 5 for these three traits, respectively, using multiple interval mapping. These QTL individually contributed from ∼1 to 27% of the total genetic variation. Further significant epistasis between four pairs of QTL in two traits was detected. Together, the QTL explained 56, 52 and 38% of the phenotypic variances for the three traits, respectively.
Mayer et al (2004) showed that with multiple QTL models, much more information from the data is necessary (compared to single QTL models) for QTL identification and accurate estimation of QTL parameters. The reason is that the likelihood surface often showed several local maxima of almost equal height. Furthermore, these peaks were surrounded by plateaus or connected by ridges. For this reason, in the present study, the size of the F2 population was varied. For a single QTL model and a relative QTL variance of 0.50, a population size of 200 individuals would be sufficiently large to guarantee a high power of QTL detection and accurate estimates of the QTL position and QTL effects. In the scenario of this study with 2 QTL in coupling, the power and accuracy of QTL mapping was only moderate and the increase in the population size to 500 individuals led to much improved results. So this is another confirmation for the marked increase in the requirements for QTL mapping in more complex genetic situations.
Multiple interval mapping is a maximum-likelihood method. ML methods are of particular interest because of their large sample properties. The optimality properties may not apply for small samples. However, in this study, even with the population size of only 200, multiple interval mapping performed better than regression interval mapping. With the larger population size of 500, the relative superiority of multiple interval mapping increased.
Kao (2000) found that the relative superiority of multiple interval mapping increased with wider marker intervals. In his simulation study, he varied the marker interval size from 40 to 10 cM, a usual range for single QTL analyses. Kao points out that in real data for linkage analysis the marker map may have wider marker intervals and therefore the use of multiple interval mapping may be beneficial. The conclusion from this study is that it is generally desirable to have a very dense marker map, as the reduction of the marker interval size down to 5 cM led to both a higher power and also to more accurate estimates of the QTL parameters. This conclusion contrasts with findings in the literature for single QTL scenarios, where a marker density of 10 cM is generally considered as sufficient. The advantages of close marker intervals for mapping linked multiple QTL were first described by Mayer et al (2004) on the basis of a simulation study using multiple interval mapping and are now also confirmed in this study for the regression interval mapping method.
Standard errors of estimates
Owing to their computational burden, bootstrapping approaches similar to the ones used for single QTL analyses (eg Visscher et al, 1996; Lebreton and Visscher, 1998) seem to be intractable for multiple QTL models. This study (along with Mayer et al, 2004) has demonstrated that standard error estimates for QTL parameters and confidence intervals based on the large sample properties of multiple interval mapping as a maximum-likelihood method have to be interpreted with considerable caution. The means of the asymptotic SD estimates of the QTL position and effect were clearly smaller than the empirical SD. Similarly, the standard errors of regression interval mapping estimates can be grossly underestimated, as the uncertainty about the QTL locations is not considered in their computation. Taking the model with QTL effects in coupling and a marker interval of 10 cM as an example, the empirical SD of the estimates of the additive genetic effects was about three times as high as the estimated standard error irrespective of the population size.
Conclusions
This study has extended the analyses of Kao (2000) by including not only the estimation of the QTL parameters but also the aspects of QTL detection and estimation of the QTL positions. In conclusion, in situations with linked QTL, multiple interval mapping generally outperformed regression interval mapping with regard to power, accuracy of position and effect estimates and estimation of the residual variance. This was especially true for the scenario with QTL in repulsion. Furthermore, both approaches give highly overoptimistic estimates of the standard errors if they are calculated using statistical methods that do not fully allow for uncertainty about the QTL positions.
References
Beavis WD (1998). QTL analysis: power, precision, and accuracy. In: Paterson AH (ed) Molecular Dissection of Complex Traits. CRC Press: Boca Raton, FL. pp 145–162.
Dupuis J, Siegmund D (1999). Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151: 373–386.
Haley CS, Knott SA (1992). A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.
Kao C-H (2000). On the differences between maximum likelihood and regression interval mapping in the analysis of quantitative trait loci. Genetics 156: 855–865.
Kao C-H, Zeng Z-B (1997). General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53: 653–665.
Kao C-H, Zeng Z-B, Teasdale RD (1999). Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.
Knott SA, Haley CS (2000). Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.
Lander ES, Botstein D (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Lebreton CM, Visscher PM (1998). Empirical nonparametric bootstrap strategies in quantitative trait loci mapping: conditioning on the genetic model. Genetics 148: 525–535.
Louis TA (1982). Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44: 226–233.
Mayer M, Liu Y, Freyer G (2004). A simulation study on the accuracy of position and effect estimates of linked QTL and their asymptotic standard deviations using multiple interval mapping in an F2 schema. Genet Sel Evol 36: 455–479.
Piepho HP (2000). Optimal marker density for interval mapping in a backcross population. Heredity 84: 437–440.
Rebai A, Goffinet B (2000). More about quantitative trait locus mapping with diallel designs. Genet Res 75: 243–247.
Seaton G, Haley CS, Knott SA, Kearsey M, Visscher PM (2002). QTL express: mapping quantitative trait loci in simple and complex pedigrees. Bioinformatics 18: 339–340.
Soller M, Genizi A (1978). The efficiency of experimental designs for the detection of linkage between a marker locus and a locus affecting a quantitative trait in segregating populations. Biometrics 34: 47–55.
Visscher PM, Thompson R, Haley CS (1996). Confidence intervals in QTL mapping by bootstrapping. Genetics 143: 1013–1020.
Xu S (1995). A comment on the simple regression method for interval mapping. Genetics 141: 1657–1659.
Xu S (1998a). Further investigation on the regression method of mapping quantitative trait loci. Heredity 80: 364–373.
Xu S (1998b). Iteratively reweighted least squares mapping of quantitative trait loci. Behav Genet 28: 341–355.
Zeng Z-B, Kao C-H, Basten CJ (1999). Estimating the genetic architecture of quantitative traits. Genet Res 74: 279–289.
Acknowledgements
I would like to thank the anonymous referees and the editor whose comments helped to strengthen this paper.
Cite this article
Mayer, M. A comparison of regression interval mapping and multiple interval mapping for linked QTL. Heredity 94, 599–605 (2005). https://doi.org/10.1038/sj.hdy.6800667
DOI: https://doi.org/10.1038/sj.hdy.6800667
Keywords
- mapping
- QTL
- simulation