Introduction

Since the introduction of molecular genetic markers, linkage analysis has become one of the most important tools in biological research. A special type of linkage analysis can be carried out for quantitative traits. Quantitative traits, often called complex traits in contrast to simple Mendelian traits, are traits where the relation between genotype and phenotype cannot be observed directly. A gene affecting a quantitative trait is called a quantitative trait locus (QTL). The genetic dissection of a quantitative trait — so-called QTL analysis — is usually carried out using interval mapping (Lander & Botstein, 1989) or a related method (Haley & Knott, 1992; Jansen, 1992, 1993; Martinez & Curnow, 1992; Zeng, 1993, 1994). For this purpose an experimental population segregating for the quantitative trait is created and its linkage map of molecular markers is calculated. The basic procedure of the QTL analysis is such that on many positions on the linkage map a test statistic is calculated. In analogy with the genetic mapping of simple Mendelian traits, this statistic is the LOD score. This is essentially a likelihood ratio statistic. Subsequently, regions on the genome are identified that show significant values of the test statistic; such regions are supposed to contain a QTL. This procedure, however simple, has a major problem: what value of the test statistic constitutes a significant value? A single LOD score is approximately related to a chi-squared distribution; the distribution of the maximum of a series of LOD scores, however, cannot be determined in a straightforward manner. Because of linkage, tests on neighbouring positions on the genome are not independent — closely linked markers will have equivalent test statistics. Also, the larger the genome, the more tests will be performed, thus increasing the probability that a fixed LOD threshold will be exceeded. 
Hence, if for the QTL analyses in various species an equal experiment-wise significance level is desired — usually 5% — the appropriate LOD thresholds will depend on the genome size of the species (in terms of recombination). Genome size varies greatly over species. Although most chromosome map lengths lie within the range of 50–250 cM, the numbers of chromosome pairs of diploid species start at two for Haplopappus gracilis (a plant), there are several species with just three pairs (e.g. Crocus balansae (a plant), Crepis capillaris (a plant), Tipula maxima (an insect)) and they go to beyond 50 pairs (John & Lewis, 1975; Dyer, 1979). Most agronomically interesting species have fewer than 25 pairs.

Several papers address the problem of statistical significance in QTL analysis and present solutions that are based on complex mathematical formulae, on cumulative distribution functions of the LOD score for specific situations obtained by simulation, or on permutation tests (Lander & Botstein, 1989; Van Ooijen, 1992; Feingold et al., 1993; Churchill & Doerge, 1994; Rebaï et al., 1994; Doerge & Churchill, 1996; Doerge & Rebaï, 1996; Dupuis & Siegmund, 1999). All solutions, however, have their own specific drawbacks. The mathematical formulae are seldom employed in the literature; apparently they are too complex, despite an available computer program (Rebaï et al., 1994). The cumulative distribution functions of the LOD score are available only for a very limited number of genome sizes and population types. The permutation test requires very long computation times and is difficult to use in multiple-QTL models. The lack of an easy and relatively quick way of obtaining the appropriate significance threshold for the biological species under investigation results in QTL analyses that employ a certain LOD significance threshold, for which the significance level is not known, and which is possibly set at too low a value. As a consequence, the literature describing QTL analyses might contain false-positive QTLs at too high a rate. Lander & Kruglyak (1995) present in their paper on guidelines for QTL analysis in human, mouse and rat genetics, a number of precalculated threshold values that should be used in several types of analyses and experimental situations. This relieves geneticists from difficult or inconvenient computations. Because the LOD significance threshold depends on genome size, which varies greatly over species, it will be clear that the genetical research in all other biological species definitely is in need of an uncomplicated way to obtain the appropriate, correct LOD significance threshold. 
This paper presents the tails of the cumulative distribution functions of the LOD score in various situations as obtained by extensive simulations. With a simple method — a basic formula and four tables — the LOD threshold at the desired significance level for the standard experimental populations for most diploid species with their own genome sizes can be obtained easily. Besides F2 and first-generation backcross (and equivalent types), the population types include a recombinant inbred (RI) family and a full-sib (FS) family of a cross between noninbred genotypes of an outbreeding species; for the last type no approximating formula has been published so far.

Methods

The tables present the results of stochastic (i.e. Monte Carlo) simulation of a diploid species with one chromosome on which no QTL was segregating. Various configurations were simulated, each with 1 000 000 repetitions. The chromosome map length ranged from 50 to 250 cM (Haldane mapping function) with a fully informative marker every 1 cM; this is considered a reasonable approximation of a dense map. The population types were (a) a first-generation backcross (BC1, e.g. F1 × P1), (b) an F2 (F1 × F1), (c) an RI family of the tenth generation (F10, which is the ninth generation after the initial segregating meioses of the F1) derived by single-seed descent from an F2, or (d) an FS family of a cross between noninbred genotypes of an outbreeding species (four alleles per marker). The population size was always 100. The individuals were assigned a random quantitative trait value according to a normal distribution regardless of the genotype, that is, there was no QTL. In each simulated population the LOD score, as defined by Lander & Botstein (1989), was calculated at all marker positions by fitting the appropriate model with two (BC1), three (F2), two (RI family) or four (FS family) QTL phenotypes. For the RI family the occasional heterozygous QTL phenotypes were fitted as strictly intermediate between the two homozygous QTL phenotypes. Subsequently, the maximum LOD on the chromosome was determined and recorded. From these data the cumulative distribution function of the maximum LOD score under the null hypothesis that no segregating QTL is present was determined.
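The simulation scheme described above can be sketched in Python for the BC1 case. This is a minimal illustration, not the program used for the tables. All function names are hypothetical, the default settings (100 individuals, 50 cM, 1 cM spacing) are taken from the text, and far fewer repetitions than 1 000 000 are practical in plain Python. The sketch assumes that, at a fully informative marker, the LOD score reduces to a comparison of a two-mean normal model with a single-mean normal model, so that LOD = (n/2) log10(RSS0/RSS1).

```python
import math
import random

def recombination_fraction(d_cm):
    """Haldane mapping function: map distance (cM) to recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def simulate_bc1_genotypes(n_ind, n_markers, r, rng):
    """Backcross genotypes (0 or 1) at equally spaced markers, one list per individual."""
    genotypes = []
    for _ in range(n_ind):
        g = [rng.randint(0, 1)]
        for _ in range(n_markers - 1):
            # a recombination between adjacent markers flips the genotype
            g.append(1 - g[-1] if rng.random() < r else g[-1])
        genotypes.append(g)
    return genotypes

def lod_two_classes(y, classes):
    """LOD of a two-mean normal model against a single-mean normal model."""
    n = len(y)
    mean = sum(y) / n
    rss0 = sum((v - mean) ** 2 for v in y)
    rss1 = 0.0
    for c in (0, 1):
        group = [v for v, k in zip(y, classes) if k == c]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((v - m) ** 2 for v in group)
    return 0.5 * n * math.log10(rss0 / rss1)

def max_lod_null(n_ind=100, length_cm=50, spacing_cm=1, rng=None):
    """Maximum LOD over all markers of one chromosome when no QTL segregates."""
    rng = rng or random.Random()
    n_markers = int(length_cm / spacing_cm) + 1
    r = recombination_fraction(spacing_cm)
    genotypes = simulate_bc1_genotypes(n_ind, n_markers, r, rng)
    # trait values are drawn independently of the genotypes: the null hypothesis
    y = [rng.gauss(0.0, 1.0) for _ in range(n_ind)]
    return max(lod_two_classes(y, [g[m] for g in genotypes])
               for m in range(n_markers))

def empirical_cdf(max_lods, lod):
    """Fraction of simulated maxima that do not exceed the given LOD."""
    return sum(1 for v in max_lods if v <= lod) / len(max_lods)
```

Calling `max_lod_null` repeatedly and passing the collected maxima to `empirical_cdf` gives a rough estimate of the tail of the distribution; the accuracy of such an estimate depends directly on the number of repetitions.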

The LOD significance threshold

In experimentation one always wants to know the probability of arriving at the wrong conclusions. In QTL analysis these are (a) the conclusion that there is a segregating QTL whereas in reality there is not, or (b) not detecting a QTL which actually is present. The first type of error results in a false positive (type I), the second in a false negative (type II). The probability of false positives — the significance level — is controlled by choosing the appropriate significance threshold. The rate of false negatives is determined by the experimental set-up and the sizes of the genetic effects of the QTLs.

In a QTL study an experiment is set up to create a population that segregates for the quantitative trait. The values of the quantitative trait of the individuals in the population are recorded. The genotypes of segregating markers are determined and the linkage map is estimated. In the subsequent QTL analysis, tests for the presence of a segregating QTL are performed at many map positions on the genome — say every 1 cM. For these tests the LOD score is used. The areas on the genome are identified that show high values of the LOD score which are unlikely to occur if no QTL were segregating. It is concluded that statistically significant areas on the genome contain a segregating QTL. To know the significance level one needs to know the distribution of the test statistic under the null hypothesis (H0) that no segregating QTL is present. Although under H0 a single LOD score is approximately a chi-square random variable (multiplied by a constant), the maximum of a series of LOD scores on a chromosome behaves according to a more complex type of distribution. Owing to the very nature of linkage these series of tests on a chromosome are not mutually independent. Because of the independent assortment of chromosomes in meiosis, tests on different chromosomes are mutually independent.
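The relation between a single LOD score and the chi-square distribution can be illustrated with a short Python sketch. It assumes the usual conversion of a LOD score to a likelihood ratio statistic, 2 ln(10) × LOD, and it assumes that the degrees of freedom equal the number of fitted QTL phenotype classes minus one; the function names are hypothetical. The result is a pointwise probability for one test position only, not a chromosome-wide or genome-wide significance level.

```python
import math

def lod_to_chi2(lod):
    """Convert a LOD score to the likelihood ratio statistic 2*ln(10)*LOD."""
    return 2.0 * math.log(10.0) * lod

def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variable, closed forms for df = 1, 2, 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch handles only df = 1, 2 or 3")

def pointwise_p_value(lod, n_qtl_classes):
    """Approximate p-value of a single LOD score under H0 (assumed df = classes - 1)."""
    return chi2_upper_tail(lod_to_chi2(lod), n_qtl_classes - 1)
```

With three fitted classes (two degrees of freedom) the pointwise probability simplifies to 10 to the power −LOD, which shows how much smaller a single-test probability is than the probability that the maximum over a whole chromosome exceeds the same LOD.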

In biology one usually desires an experiment-wise significance level of 5%. In the QTL analysis of a single segregating population this is equivalent to the genome-wide significance: the probability of obtaining a LOD above the threshold somewhere on the whole genome just by chance is 5%. A genome-wide threshold will depend on the number and length of the chromosomes, but also on the numbers of markers on the chromosomes. When just a few markers are tested per chromosome — the so-called sparse map case — a lower threshold is needed at the same genome-wide significance level than when many markers are tested per chromosome — the so-called dense map case (Lander & Botstein, 1989). Lander & Kruglyak (1995; and Kruglyak & Lander, 1995) strongly recommend the use of the dense map threshold, regardless of the actual density of the map used. One of the reasons is that geneticists will always deploy many additional markers in regions that show signs of a segregating QTL after an initial sparse map search. With modern marker techniques many markers will be used anyway.

To obtain a genome-wide significance the average map length of the chromosomes of the investigated species is used, because (a) usually the chromosome length does not vary much within a genome, (b) the genome-wide threshold is predominantly determined by the total genome length and nearly independent of the number of chromosome pairs in the genome (Kruglyak & Lander, 1995), and (c) it is difficult to think of any other easy solution. If chromosomes are assumed to be of equal length, then the property of independent chromosome assortment at meiosis can be used to obtain the relationship between the genome-wide and the corresponding chromosome-wide significance. By analogy the latter is the probability of obtaining a LOD above the threshold somewhere on a single chromosome just by chance. Suppose the required genome-wide significance level is αg, the corresponding chromosome-wide significance level is αc, the number of chromosome pairs is n and the average chromosome length is l (in cM). Because the tests on the n chromosomes are mutually independent, the probability of obtaining no false positive anywhere on the genome is 1 − αg = (1 − αc)^n. Then the following relationship holds:

$$\alpha_c = 1 - (1 - \alpha_g)^{1/n}$$

The LOD threshold for the genome-wide significance level αg can now be obtained from the cumulative distribution function (c.d.f.) of the maximum LOD under H0 on a single chromosome of length l by looking up the LOD that has a c.d.f. value of 1 − αc. Tables 1, 2, 3 and 4 present for several situations the tail of the c.d.f. of the maximum LOD score under H0 on a single chromosome. The data were obtained by stochastic simulation. The situations comprise chromosome map lengths of 50 up to 250 cM (at multiples of 50 cM), which should suffice for most biological species. Further, the situations comprise experimental populations segregating for two, three and four QTL genotypes in the first meiotic generation and an RI family in the tenth generation; this should suffice for most experimental population types. The population types are discussed below.

Table 1. Cumulative distribution function of the maximum LOD on a chromosome for QTL analysis based on two QTL genotypes

Table 2. Cumulative distribution function of the maximum LOD on a chromosome for QTL analysis based on three QTL genotypes

Table 3. Cumulative distribution function of the maximum LOD on a chromosome for QTL analysis based on two QTL genotypes in an RI family

Table 4. Cumulative distribution function of the maximum LOD on a chromosome for QTL analysis based on four QTL genotypes

The way to use these tables is as follows. Calculate 1 − αc with the above formula for the required αg. Look up the LOD score at 1 − αc in the table for the c.d.f. of the maximum LOD for the appropriate population type. Usually the average map length l will not be a multiple of 50 cM. Therefore, look up the LOD under the two map lengths below and above l, and subsequently interpolate to obtain the required LOD threshold. For example, if we have an F2 of a species with eight chromosome pairs and 120 cM average chromosome length, and we want a genome-wide false-positives rate of 5% (αg=0.05), we obtain 1 − αc=0.9936. When we look up 0.9936 in Table 2, which applies to an F2, under 100 cM and 150 cM, we find that the corresponding LODs are 3.7 and 3.9, respectively, so that by interpolating a LOD of 3.8 (rounded upwards in the safe direction) is obtained as the desired 5% genome-wide significance threshold. In (rare) cases where the average chromosome length is larger than 250 cM, use must be made of the already mentioned fact that the genome-wide threshold is predominantly determined by the total genome length and nearly independent of the number of chromosomes in the genome. For instance, the 5% LOD thresholds for a genome with a single 200 cM chromosome, for one with two 100 cM chromosomes, and for one with four 50 cM chromosomes lie within a 0.1 LOD range of each other.
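The worked F2 example can be retraced with a short Python sketch. It assumes the relationship αc = 1 − (1 − αg)^(1/n), which follows from treating the n chromosomes as independent and which reproduces the value 0.9936 quoted in the example. The function names are hypothetical, and the only table values used are the two LODs quoted in the text (3.7 at 100 cM and 3.9 at 150 cM).

```python
import math

def chromosome_wide_level(alpha_g, n_pairs):
    """Chromosome-wide level for a genome-wide level, assuming independent chromosomes."""
    return 1.0 - (1.0 - alpha_g) ** (1.0 / n_pairs)

def interpolate_threshold(length_cm, lower_cm, lower_lod, upper_cm, upper_lod):
    """Linear interpolation between the LODs tabulated for two map lengths."""
    frac = (length_cm - lower_cm) / (upper_cm - lower_cm)
    return lower_lod + frac * (upper_lod - lower_lod)

def round_up_to_tenth(lod):
    """Round upwards (the safe direction) to one decimal place."""
    return math.ceil(round(lod * 10.0, 6)) / 10.0

# F2, eight chromosome pairs, average chromosome length 120 cM, alpha_g = 0.05
cdf_target = 1.0 - chromosome_wide_level(0.05, 8)
# LODs read from Table 2 at this c.d.f. value, as quoted in the text
threshold = round_up_to_tenth(interpolate_threshold(120, 100, 3.7, 150, 3.9))
print(round(cdf_target, 4), threshold)
```

The look-up step itself is not automated here, because it requires the full tables; only the arithmetic before and after the look-up is shown.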

Population types

The population types to which the four tables apply, differ with respect to the number of QTL phenotype classes freely fitted in the analysed model, with respect to there being either one or two initial heterozygous parental genotypes, and with respect to the generation number after the initial segregating meiosis/meioses. All populations of Tables 1, 2 and 3 are derived from a single (or two identical) heterozygous F1 genotype(s) as the parent(s) that generate(s) the segregation. The populations of Tables 1, 2 and 4 are the first generation, and that of Table 3 is the ninth generation after the initial segregating meioses. Table 1 is for segregation into two QTL genotypes, that is, just one of the parents causes segregation or there is only one parent. Thus, Table 1 applies to a first-generation backcross (BC1), a population of haploids such as of some fungi, or a population of doubled haploids. Table 2 is for segregation into three QTL genotypes, which applies to an F2 (i.e. F1 × F1) where no restrictions are imposed on the heterozygous QTL phenotype in the analysis — any level of dominance is allowed. Table 3 is for an RI family in the tenth generation (F10), where the QTL segregates predominantly into two homozygous QTL genotypes. Table 4 is for segregation into four QTL genotypes, which applies to an FS family of a cross between noninbred genotypes of an outbreeding species. For Table 4, and until recently also for Table 3 (Dupuis & Siegmund, 1999), no approximating formulae have been published, although the corresponding situations are quite frequent in experimental set-ups; for instance RI families are often used in plant science (Burr & Burr, 1991), and using FS families is important in forest and fruit tree genetics (Grattapaglia & Sederoff, 1994; Hemmat et al., 1994; Grattapaglia et al., 1996; Maliepaard et al., 1997, 1998).

For an F2 a model is often fitted in which the heterozygous QTL phenotype is strictly intermediate (also called an additive model). In such a case in effect just two QTL phenotypes are fitted. The difference from the BC1, where also two QTL phenotypes are fitted, is that both parents, instead of one, have a segregating meiosis. This results in a slightly lower correlation between tests on linked markers in the F2, so that its c.d.f. of the maximum LOD on a chromosome is slightly different. This was verified by simulation. For a chromosome length of 50 cM and at c.d.f. values of 0.95, 0.99 and 0.999 the c.d.f. values for the BC1 were larger by approximately 0.003, 0.0003 and 0.00003, respectively. For a chromosome length of 250 cM the differences were approximately 0.003, 0.0006 and 0.0001, respectively. In all these instances the differences were much smaller than the change in c.d.f. value caused by a shift of 0.1 LOD in either direction. This means that for normal practice Table 1 can also be used for an F2 where the heterozygous QTL phenotype is modelled as strictly intermediate.

RI families are employed in varying generations, but usually not before the F5. Because there is recombination from generation to generation, the correlation between tests on linked markers declines in later generations. Therefore, the 50 and 250 cM cases of the F5 and the F20 were also simulated and compared to the F10. For a chromosome length of 50 cM and at c.d.f. values of 0.95, 0.99 and 0.999 the c.d.f. values for the F5 were larger by approximately 0.004, 0.0006 and 0.0001, respectively. For a chromosome length of 250 cM the differences were approximately 0.003, 0.0005 and 0.0001, respectively. For the F20 the c.d.f. values were hardly different. In all these instances the differences were much smaller than the change in c.d.f. value caused by a shift of 0.1 LOD in either direction. Therefore, Table 3 can be used reliably for RI families of the usual generation numbers. The reason for these small differences is that most recombination occurs before the F5, whereas after the F5 recombination has little effect because of fixation. For short distances the accumulated amount of recombination in late RI generations approaches twice the recombination frequency in a BC1. Because also in an RI family two QTL classes are fitted, the c.d.f. of the maximum LOD on a chromosome of a certain length in an RI family is close to that of a chromosome of twice that length in a BC1. Compare for instance the 50 cM column of Table 3 with the 100 cM column of Table 1.
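The statement that accumulated recombination in late RI generations approaches twice the single-meiosis value for short distances can be illustrated numerically. The sketch below assumes the classical relation of Haldane & Waddington for recombinant inbred lines derived by selfing, R = 2r / (1 + 2r), which applies to fully fixed lines and is not given in this paper; the function name is hypothetical.

```python
def ri_recombination_fraction(r):
    """Proportion of recombinant lines at fixation under selfing (assumed relation)."""
    return 2.0 * r / (1.0 + 2.0 * r)

# the ratio R / r tends to 2 as the single-meiosis recombination fraction r shrinks
for r in (0.001, 0.01, 0.05, 0.10):
    big_r = ri_recombination_fraction(r)
    print(f"r = {r:.3f}  R = {big_r:.4f}  R/r = {big_r / r:.2f}")
```

The ratio stays close to 2 only for tightly linked loci, which is in line with the qualification "for short distances" in the text.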

Approximate multiple-QTL models

Multiple-QTL models are more powerful than single-QTL models when there are several segregating QTLs, but they require extreme computation times. As an alternative, Jansen (1992, 1993) and Zeng (1993, 1994), independently introduced approximate multiple-QTL models. Here, markers take over the role of the nearby QTLs and are fitted as cofactors while testing for a single QTL elsewhere in the genome. This way, the cofactors function as a genetic background control and absorb most of the genetic effects of their nearby QTLs from the residual variance. As a result, the power of the QTL analysis is enhanced, while reasonable computation times are retained.

In the mapping procedure with approximate multiple-QTL models (termed MQM mapping by Jansen, 1994), just as in interval mapping, tests for the presence of a single segregating QTL are performed at many positions in the genome. The difference between the two methods lies in the use of cofactor markers for background control of other segregating QTLs. The background control is part of both the null (no QTL) and the alternative (yes, a QTL) hypothesis. The tests in MQM mapping therefore have the same degrees of freedom as those in interval mapping. Simulation research of Jansen (1994) has shown that for MQM mapping the same LOD thresholds can be used as for interval mapping, under the condition ‘that the residual degrees of freedom for estimating the variance are adequate’. For testing under the presence of a linked QTL it is recommended that this linked QTL is flanked by two marker cofactors. Further, it is recommended that the number of parameters in the model is less than twice the square root of the number of individuals. In practice this means that this condition will be satisfied if there are not too many cofactors and the population is sufficiently large. The assignment of a marker cofactor essentially means that a QTL is concluded to be present. Experience so far has shown that the number of QTLs detected in a QTL analysis rarely exceeds 10, so that at least under current experimental practice the presented method of calculating LOD thresholds can also be applied to mapping with approximate multiple-QTL models.

Discussion

Using the presented method of calculating the LOD significance threshold will lead to a predictable rate of false-positive QTLs with reasonable accuracy. The values in the tables are accurate to about four decimal places (for more precise information about the accuracy, use can be made of the fact that each value in the tables is an estimate of a binomial probability). For the calculation of very high levels of significance use might be made of fitting some function through the tabulated data. As an alternative to a LOD threshold, the genome-wide significance level for the maximum LOD obtained in an analysis can be calculated with the method applied inversely. It must be realized that the calculated thresholds must be used as guidelines. Decisions with respect to further study, or utilization in breeding, of the particular genomic region should be based upon additional considerations, such as: What was the actual density of the markers used in the study? What generation was the RI family in? Does the trait behave according to normality? Does the estimated genetic effect of the QTL justify further study? If several QTLs were detected, which ones explained most of the genetic variation?
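The remark that each tabulated value is an estimate of a binomial probability can be made concrete with a small Python sketch. It computes the standard error sqrt(p(1 − p)/N) of an estimated c.d.f. value p, assuming N = 1 000 000 independent repetitions as stated in the Methods; the function name is hypothetical.

```python
import math

def binomial_standard_error(p, n_rep=1_000_000):
    """Standard error of a proportion p estimated from n_rep independent repetitions."""
    return math.sqrt(p * (1.0 - p) / n_rep)

# standard errors at c.d.f. values typical of the tails shown in the tables
for p in (0.95, 0.99, 0.999):
    print(f"c.d.f. = {p}: standard error = {binomial_standard_error(p):.6f}")
```

An approximate 95% interval for a tabulated value is the value plus or minus about two standard errors.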

The tables are based on simulations of a marker density of 1 cM. This is sufficiently representative for the dense map case in current experimental practice. Initial mapping populations usually consist of 100 to 500 individuals. The effective deployment of higher marker densities requires much larger experimental populations. Such large populations must be used in the subsequent fine mapping. Lander & Kruglyak (1995) present LOD thresholds for QTL mapping in mouse and rat: for the backcross 3.3 LOD, for the F2 intercross 4.3 LOD. According to the method presented here, these values are 3.1 and 4.0 LOD, respectively. The discrepancy is thought to be caused by differences in marker density: infinitely dense vs. 1 cM between markers; Lander & Kruglyak (1995) calculated a difference in LODs of about 7%, which corresponds to these findings.

Although the LOD score test appears to be reasonably robust against the data (after fitting the QTLs) having a skewed instead of a normal distribution (Doerge & Rebaï, 1996), other deviations from normality have not been investigated. Of course, the use of a permutation test avoids the problem of deviations from normality. An important drawback of the permutation test is that it will take several hours of computation time (on a 200-MHz PC) to obtain and analyse 1000 samples. This must be repeated for each trait. The question remains whether such sets of only 1000 samples would provide more accuracy than the use of the tables in this paper. In cases where the permutation test is going to be employed, the presented simulation results will be very useful for comparison.
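For comparison, the general shape of a permutation test can be sketched in Python. This is a generic illustration under stated assumptions, not the procedure of any particular program: `scan_max` is a hypothetical callable, supplied by the user, that performs the genome scan and returns the maximum LOD for a given assignment of trait values to individuals.

```python
import math
import random

def permutation_threshold(trait, genotypes, scan_max,
                          n_perm=1000, alpha=0.05, rng=None):
    """Empirical (1 - alpha) quantile of the maximum LOD over permuted data sets."""
    rng = rng or random.Random()
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        # shuffling trait values over individuals removes any trait-marker association
        rng.shuffle(shuffled)
        maxima.append(scan_max(shuffled, genotypes))
    maxima.sort()
    # index of the smallest maximum with at least (1 - alpha) of all maxima at or below it
    idx = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return maxima[idx]
```

Every permutation requires a complete genome scan, which is why the computation time grows with the number of samples and has to be spent again for each trait.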

Whether the calculated rate of false positives is acceptable depends on the general agreement on significance levels. In biology the usual rate is 5% for each experiment. In this respect, however, performing a QTL analysis is a peculiar kind of experiment. For a segregating population, trait and marker data are determined. Each marker is tested for association with the trait. At a certain LOD threshold there exists a much larger opportunity for finding spurious linkage when many markers are tested because the investigated species has a large genome, than when few markers are tested. Now, what is considered an experiment in this QTL analysis? (a) The trait data plus one marker, (b) the trait data plus all markers on a single chromosome, or (c) the trait data plus all markers? This is important with respect to the experiment-wise 5% false-positives rate. There seems to be agreement that a whole genome scan (option (c)) should have a false-positives rate of 5%. However, this leads to the — at first sight strange — phenomenon that an Arabidopsis (n=5) geneticist may find a certain QTL effect that will be designated significant, whereas a wheat (n=21) geneticist detecting a similarly sized QTL effect cannot call it significant. Moreover, the wheat geneticist would have had to carry out a lot more work to obtain the results; that is, her/his experiment is much larger. On reflection it is clear that using a genome-wide 5% error rate should have such an effect: figuratively speaking, by allowing the collection of more wheat marker data the wheat geneticist simply gets many more shots at the bull’s-eye.

Although Lander & Kruglyak (1995) use the same definition of genome-wide significance, their proposed classification of mapping results, suggestive and (highly) significant linkage, is based on the expected number of times that a LOD score above a certain threshold is obtained just by chance, in which multiple false positives per chromosome are allowed. From a statistical point of view it is an unusual approach; in statistics the definition of significance is based on a certain probability of obtaining false positives, rather than on a certain expected number of false positives. Although the result is not much different for a 5% probability of a false positive against an expected number of 0.05 false positives, it is preferable to stick to the normal statistical approach of a significance level in the classification of mapping results, e.g. significant linkage should relate to a genome-wide 5% false-positives rate.
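The closeness of a 5% probability of a false positive and an expected number of 0.05 false positives can be checked with a small Python sketch. It assumes, for illustration only, that the number of chance exceedances of the threshold follows a Poisson distribution with mean μ, so that the probability of at least one exceedance is 1 − exp(−μ); the function names are hypothetical.

```python
import math

def prob_at_least_one(expected_number):
    """P(at least one false positive) for a Poisson count with the given mean (assumed)."""
    return 1.0 - math.exp(-expected_number)

def expected_number_for(probability):
    """Poisson mean that gives the stated probability of at least one false positive."""
    return -math.log(1.0 - probability)

print(round(prob_at_least_one(0.05), 4))    # probability for an expected number of 0.05
print(round(expected_number_for(0.05), 4))  # expected number for a probability of 0.05
```

Under this assumption the two definitions nearly coincide at small values and diverge as the expected number grows, for example at the higher rate allowed for suggestive linkage in the expected-number approach.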

Lander & Kruglyak (1995) propose the term ‘suggestive linkage’ to allow for the publication of results that are not significant but point to a certain level of association between markers and trait. The use of a certain ‘suggestive’ level of significance is very appealing. The definition should be related to the fact that the analysis of the markers on a single chromosome can in a way be considered as a separate experiment. Because for each experiment a 5% error rate is an accepted rate, the term ‘suggestive linkage’ might be used for a chromosome-wide significance level of 5%. In recent years genetical research has discovered the potential power of comparative mapping (McKusick, 1997). For that purpose the results of mapping experiments must be comparable across species boundaries in an objective fashion. Because chromosome map length varies considerably across species, using a standard chromosome length of 100 cM in the definition of ‘suggestive linkage’ will allow an objective comparison of mapping results across species boundaries. Therefore, the proposal is to define the term ‘suggestive linkage’ for a chromosome-wide significance level of 5% for a standard chromosome length of 100 cM. For the various experimental population types that correspond to the four tables in this paper, the LOD thresholds for suggestive linkage are the fixed LOD values 1.9 (BC), 2.7 (F2), 2.1 (RI) and 3.2 (FS), respectively.

The presented method provides reasonably accurate approximations to LOD significance thresholds. Mathematical formulae would have presented a more elegant solution, though these would probably be rather complex and are presently not available for some of the usual experimental situations. Genetical research in many species is expanding and is certainly in need of convenient ways to calculate significance thresholds applicable to the species under study with its own specific genome size. Therefore, the current results provide an equivalent, easy and pragmatic solution.