Introduction

In recent years, the analysis of quantitative traits has largely centred on the use of molecular markers to locate and measure the effects of the individual underlying genes, quantitative trait loci or QTL (Kearsey and Farquhar, 1998; Tanksley, 1993). In some situations, it has also been possible to examine epistatic interactions and genotype × environment interaction involving QTL (Monforte et al, 2001). These approaches have thus provided a very powerful tool for the genetical analysis of quantitative traits and have been a major impetus to quantitative genetical research and breeding.

There are, however, limitations to these approaches. Firstly, QTL are often located on the chromosome with low precision, and attention focuses on those QTL that have large effects. Secondly, the estimated QTL effects will be biased because only those estimates that exceed a given significance threshold are reported (Utz et al, 2000). Concentrating on the proportion of the variation explained by individual QTL tends to encourage acceptance of a small number of genes, and many studies have emphasised how few QTL are needed to explain the observed genetic variation (Doganlar et al, 2000). However, this can often be misleading. If, for example, a trait is controlled by one QTL with an effect of nine units and 19 QTL each with an effect of one unit, the former will account for 81% of the total variation (9² = 81 out of 81 + 19 × 1² = 100) but only 32% of the combined gene effects (9 out of 28). Thus, a QTL's contribution to the variance is relevant to the response to selection per generation, but not to the genetic potential or the ultimate limits to selection.
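The arithmetic of this example can be checked with a short sketch. It assumes, as the example implicitly does, that each QTL contributes variance in proportion to its squared effect (unlinked, purely additive genes at equal allele frequencies); the function name is hypothetical.

```python
def variance_vs_effect_share(effects):
    """For each QTL, return (share of summed |effects|, share of summed squared effects).

    Assumes each QTL's variance contribution is proportional to its squared effect.
    """
    total_effect = sum(abs(e) for e in effects)
    total_var = sum(e * e for e in effects)
    return [(abs(e) / total_effect, e * e / total_var) for e in effects]


# One QTL of nine units plus 19 QTL of one unit each, as in the example above.
effects = [9.0] + [1.0] * 19
share_effect, share_var = variance_vs_effect_share(effects)[0]
print(f"major QTL: {share_var:.0%} of variance, {share_effect:.0%} of combined effects")
```

The major QTL dominates the variance, which governs the response per generation, while the many small QTL together carry most of the combined effect that sets the selection limit.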

Conventional biometrical genetical procedures, on the other hand, deal with the combined effects of all the QTL on the means, variances and correlations of traits. These procedures, mainly developed and improved during the second half of the last century, use statistical relations among relatives to infer the nature of the additive, dominance and epistatic effects of the genes. The estimates obtained are automatically weighted by the sizes of the gene effects and their linkage relations. However, they tell us nothing about the location or effects of the individual QTL (Kearsey and Pooni, 1996; Lynch and Walsh, 1998).

The experiments reported in the present paper were designed to be analysed by a combination of conventional biometrical and current molecular methods. The intention was to reconcile and integrate the conclusions from the two approaches in the hope of gaining a more complete picture of the underlying genetical control of several traits. The experiments are based on the well-analysed lines of Arabidopsis thaliana, Columbia and Landsberg, and generations derived from them.

Materials and methods

Plant material

The inbred strains of A. thaliana, Columbia (Col) and Landsberg (Ler), derived from seed obtained from the Nottingham Arabidopsis Stock Centre (NASC, UK), were used as source material. The 16 basic generations (BGs) shown in Table 1, which include both parents and all possible reciprocal F1s, F2s and first backcrosses (Bc1 and Bc2) derived from them, were produced by hand emasculation and pollination. The 101 recombinant inbred lines (RILs) (Lister and Dean, 1993) derived from Ler × Col were each selfed and also crossed as seed parent to Ler, Col and their F1 in a triple test cross (TTC) design (Kearsey and Jinks, 1968), but sufficient seed for all four families was obtained from only 85 of the RILs.

Table 1 Origins of the 16 basic generations from Ler and Col, together with the model for genetical and cytoplasmic effects fitted to the generation means

A total of 20 seeds from each basic generation family and five seeds from each RIL or TTC family were sown in separate pots containing JI compost and the 2020 pots were immediately, individually randomised in a single block in an unheated polytunnel in long (16 h) days. A total of 22 traits, representing plant size and development, were scored over the following 2 months and these are listed in Table 2.

Table 2 Traits scored and models fitted to each trait

Genetical analysis

The means and within-family variances of each of the 16 basic generations were calculated for all traits. Models that included standard genetical and cytoplasmic effects (Table 1) were fitted to the means, the parameters being estimated by weighted least squares using the reciprocal of the variance of each family mean as the weight. The genetical parameters were additive [a], dominance [d] and the epistatic components additive by additive [aa], additive by dominance [ad] and dominance by dominance [dd] (see Kearsey and Pooni, 1996, for definitions and estimation procedures). For consistency, [a] has been defined as ½(Ler − Col), and so will be negative when Columbia has the higher score. The nongenetical, cytoplasmic or maternal effects tested were: inbred vs F1 mothers, Col vs Ler mothers and F1 vs reciprocal F1 mothers. We accepted the simplest model in which all the parameters were individually significant and the χ² testing the adequacy of the model was nonsignificant. The genetical and environmental variance parameters, VG and VE, were estimated from the within-family variances of the F2 and nonsegregating generations (Kearsey and Pooni, 1996).
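A minimal sketch of weighted least-squares fitting of generation means is given below. For illustration it is reduced to six basic generations (P1 = Ler, P2 = Col, F1, F2 and the two backcrosses) and the additive-dominance model with parameters m, [a] and [d]; the analysis described above used 16 generations with epistatic and cytoplasmic parameters (Table 1). The coefficients follow the usual convention in which [a] = ½(P1 − P2). The function name and example values are hypothetical. The χ² is the weighted residual sum of squares, with (number of generations − number of parameters) df.

```python
import numpy as np

# Coefficients of m, [a], [d] in the expected mean of each generation
# (additive-dominance model only; hypothetical reduced version of Table 1).
GENERATIONS = ["P1 (Ler)", "P2 (Col)", "F1", "F2", "B1 = F1 x P1", "B2 = F1 x P2"]
DESIGN = np.array([
    [1.0,  1.0, 0.0],   # P1
    [1.0, -1.0, 0.0],   # P2
    [1.0,  0.0, 1.0],   # F1
    [1.0,  0.0, 0.5],   # F2
    [1.0,  0.5, 0.5],   # B1
    [1.0, -0.5, 0.5],   # B2
])


def weighted_ls_fit(means, var_of_means, design=DESIGN):
    """Fit generation means by weighted least squares.

    Weights are 1 / variance of each generation mean. Returns parameter
    estimates, their standard errors, the chi-square for model adequacy
    (weighted residual sum of squares) and its degrees of freedom.
    """
    means = np.asarray(means, dtype=float)
    weights = 1.0 / np.asarray(var_of_means, dtype=float)
    w = np.diag(weights)
    information = design.T @ w @ design
    cov = np.linalg.inv(information)          # covariance matrix of the estimates
    params = cov @ design.T @ w @ means
    residuals = means - design @ params
    chi2 = float(np.sum(weights * residuals ** 2))
    df = len(means) - design.shape[1]
    return params, np.sqrt(np.diag(cov)), chi2, df


# Usage with hypothetical means generated exactly from m = 50, [a] = -5, [d] = 3.
true_params = np.array([50.0, -5.0, 3.0])
params, se, chi2, df = weighted_ls_fit(DESIGN @ true_params, [0.5] * 6)
print(params, se, chi2, df)
```

With real data, a χ² that is large relative to its df indicates that further parameters (for example [aa], [ad], [dd] or maternal effects) need to be added to the design matrix.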

For each trait, we used the subset of 65 RILs for which the selfed family and all three TTC families had complete progeny sets, although a few individual plants were inevitably lost to pests or disease. We refer to the crosses of the n RILs to Col, Ler and the F1 as ‘L1i’, ‘L2i’ and ‘L3i’ (i = 1–n), respectively. The RIL families were analysed by one-way ANOVA, while the TTC families were subjected to standard ANOVAs (Table 3) to test for additive (L1i + L2i), dominance (L2i − L1i) and epistatic (L1i + L2i − 2L3i) variation, as described by Kearsey and Pooni (1996) and Kearsey and Jinks (1968). These ANOVAs are illustrated in the ‘Results’ section. Phenotypic (VP), additive (VA), dominance (VD) and environmental (VE) components of variance were estimated from these ANOVAs and used to calculate narrow-sense (VA/VP) and broad-sense ((VA + VD)/VP) heritabilities and dominance ratios, √(2VD/VA) (see Kearsey and Pooni, 1996).
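The per-RIL TTC comparisons and the summary statistics defined above can be sketched as follows. This is a minimal illustration assuming the variance components VA, VD and VP have already been estimated from the ANOVA mean squares (that step is not shown); the function names and example values are hypothetical.

```python
import numpy as np


def ttc_contrasts(l1, l2, l3):
    """Per-RIL triple test cross comparisons.

    l1, l2, l3: family means of RIL i crossed to Col, Ler and the F1.
    Returns the sums (additive), differences (dominance) and the
    epistatic comparison, which is zero for every RIL without epistasis.
    """
    l1, l2, l3 = (np.asarray(x, dtype=float) for x in (l1, l2, l3))
    additive = l1 + l2
    dominance = l2 - l1
    epistasis = l1 + l2 - 2.0 * l3
    return additive, dominance, epistasis


def heritabilities(v_a, v_d, v_p):
    """Narrow- and broad-sense heritabilities and the dominance ratio sqrt(2VD/VA)."""
    h2_narrow = v_a / v_p
    h2_broad = (v_a + v_d) / v_p
    dominance_ratio = np.sqrt(2.0 * v_d / v_a)
    return h2_narrow, h2_broad, dominance_ratio


# Hypothetical family means for two RILs, and hypothetical variance components.
print(ttc_contrasts([10, 12], [14, 16], [12, 15]))
print(heritabilities(v_a=2.0, v_d=1.0, v_p=10.0))
```

A dominance ratio near 1 corresponds to complete dominance on average, values below 1 to partial dominance and values above 1 to overdominance.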

Table 3 ANOVAs of (i) RILs and (ii) TTC, together with parameter estimates to illustrate methods of analysis: data for ‘Days to first leaf’

QTL analysis was carried out for every trait for which the earlier ANOVAs (RILs and TTC) had indicated significant variation between families. The method of Kearsey and Hyne (1994), also known as marker difference regression (MDR; Lynch and Walsh, 1998), was employed, using the ‘QTL Café’ (http://web.bham.ac.uk/g.g.seaton/). Significance tests and confidence intervals were obtained by simulation. The analyses were carried out on three independent sets of data: (i) the means of the RILs; (ii) the means of the TTC (L1i + L2i) values; and (iii) the means of the TTC (L2i − L1i) values. The first two identify QTL on the basis of their additive genetic effects and the third on the basis of their dominance effects. The software provides estimates of QTL effects and locations and their respective confidence intervals. A total of 65 markers were used for the QTL analysis. They were chosen from the published genotype data at NASC to be evenly spaced across the five chromosomes and to have known genotypes for most RILs.
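The idea behind marker regression can be illustrated with a simplified, single-QTL sketch for RIL data. It is not the QTL Café implementation. It assumes the Haldane map function, the selfed-RIL recombination fraction R = 2r/(1 + 2r), unweighted regression and no significance testing. At each marker, the difference between the means of the Ler-allele and Col-allele classes is expected to equal 2a(1 − 2R), where R is the recombination fraction between the marker and the QTL. The candidate position with the smallest residual sum of squares is taken as the QTL location. All function names are hypothetical.

```python
import numpy as np


def haldane_r(dist_cm):
    """Recombination fraction from map distance (cM) under the Haldane map function."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(dist_cm) / 100.0))


def ril_R(r):
    """Proportion of recombinant selfed RILs for a single-meiosis fraction r."""
    return 2.0 * r / (1.0 + 2.0 * r)


def marker_regression_scan(marker_cm, diffs, step=1.0):
    """Scan one chromosome for a single QTL.

    marker_cm: marker positions (cM); diffs: Ler-class minus Col-class means at
    each marker. Returns (position, additive effect a, residual SS) at the best fit.
    """
    marker_cm = np.asarray(marker_cm, dtype=float)
    diffs = np.asarray(diffs, dtype=float)
    best = None
    for pos in np.arange(marker_cm.min(), marker_cm.max() + step / 2, step):
        x = 1.0 - 2.0 * ril_R(haldane_r(marker_cm - pos))  # expected coefficient of 2a
        slope = np.sum(x * diffs) / np.sum(x * x)           # regression through the origin
        rss = float(np.sum((diffs - slope * x) ** 2))
        if best is None or rss < best[2]:
            best = (float(pos), slope / 2.0, rss)
    return best


# Usage: noise-free marker differences from a hypothetical QTL at 50 cM with a = -3.
markers = np.arange(0.0, 101.0, 10.0)
diffs = 2 * -3.0 * (1.0 - 2.0 * ril_R(haldane_r(markers - 50.0)))
print(marker_regression_scan(markers, diffs))
```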

Results

A summary of the significant parameters for the 22 traits is presented in Table 2, together with the observed means of the two parental lines and their F1. In all cases, fairly simple models could be fitted to the data with a nonsignificant χ², and all but two traits showed significant genetic effects. The two exceptions, both scored at flowering, were cauline leaves and bud number. Additive and/or dominance effects exist for 19 traits and epistasis for 15 traits, so epistasis is a common feature and implies two or more QTL for these traits. Where it exists, dominance is for early maturity and greater size. The F1 mean is generally intermediate between the two parents; better-parent heterosis occurs only for “height at flowering”, and even there it is not significant (t = 0.83 for 75 df; P > 0.3). Maternal effects were also a minor component of variation and were detected in only two cases: a difference between the progeny of crosses with reciprocal F1 mothers for “days to maturity”, and between crosses with Col vs Ler mothers for “rosette size at 36 days”. These effects were included as parameters in model fitting, and the df for the χ² are consequently reduced by 1.

Figure 1 shows the distribution of the RIL means for a sample of traits scored either at a fixed time or at a set physiological age, namely at flowering. Comparing these distributions with the confidence intervals of the parental lines shows considerable transgressive segregation in all cases, indicating that increasing alleles are dispersed between the parents. This, again, implies two or more QTL for such traits. Where the parental lines differ, Col is always the faster developer and has the larger size.
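The comparison with the parental confidence intervals can be sketched as a simple count. This is a hypothetical illustration that treats an RIL mean as transgressive when it lies outside the most extreme 95% confidence limits (mean ± 1.96 SE) of either parent; this criterion is an assumption, not necessarily the one used to draw Figure 1.

```python
def transgressive_counts(ril_means, parent_means, parent_ses, z=1.96):
    """Count RIL means below / above the most extreme parental confidence limits.

    parent_means, parent_ses: means and standard errors of the two parental lines.
    z: normal quantile for the confidence interval (1.96 for approximately 95%).
    """
    lower_limits = [m - z * s for m, s in zip(parent_means, parent_ses)]
    upper_limits = [m + z * s for m, s in zip(parent_means, parent_ses)]
    lower_bound = min(lower_limits)
    upper_bound = max(upper_limits)
    below = sum(1 for x in ril_means if x < lower_bound)
    above = sum(1 for x in ril_means if x > upper_bound)
    return below, above


# Hypothetical RIL means and parental means/SEs.
print(transgressive_counts([1.0, 5.0, 9.0, 20.0], parent_means=[5.0, 10.0], parent_ses=[1.0, 1.0]))
```

RILs falling beyond both parents in either direction are what one expects when each parent carries some increasing and some decreasing alleles.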

Figure 1

Distribution of RIL means for various traits, together with the means of the parental lines, Col and Ler. The width of the frame containing the line label indicates the 95% confidence interval for that mean.

Table 3 illustrates the basic ANOVAs of the RIL and TTC families. Given the completely randomised design, the RIL analysis is a simple one-way ANOVA with the expected mean squares and parameter estimates as shown. The coefficients of the expected mean squares are not whole numbers because of missing data. Following Sokal and Rohlf (1981), they have been calculated as $n_0 = \frac{1}{a-1}\left(\sum n_i - \frac{\sum n_i^2}{\sum n_i}\right)$, where $n_i$ is the size of the ith family and $a$ is the number of families. The TTC families were analysed as two separate ANOVAs. The first follows the standard NCIII design (Kearsey and Pooni, 1996) to detect additive and dominance effects from the crosses of the RILs to the two parents. The second is based on an analysis of the comparison (L1i + L2i − 2L3i), where L1i, L2i and L3i represent the means of the families derived from crossing RILi to Col, Ler and the F1, respectively. This comparison is solely a function of epistasis and should be zero for all ‘i’ in the absence of epistasis. The correction term in this ANOVA tests for additive-by-additive epistasis, while the variation among RILs is due to additive-by-dominance and dominance-by-dominance epistasis (Kearsey and Jinks, 1968; Kearsey and Pooni, 1996). In the example shown in Table 3, only the latter is significant. When epistasis is present, the estimates of the additive (VA) and dominance (VD) components from the NCIII ANOVA are inflated, but it is not easy to correct for this. The estimates given for VA and VD must therefore be accepted as biased.
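The coefficient n0 for unequal family sizes follows directly from the formula above; a minimal sketch (the function name is hypothetical):

```python
def n0(family_sizes):
    """Average family size coefficient for a one-way ANOVA with unequal family sizes.

    n0 = (sum(n_i) - sum(n_i**2) / sum(n_i)) / (a - 1), where a is the number of families.
    """
    a = len(family_sizes)
    total = sum(family_sizes)
    return (total - sum(n * n for n in family_sizes) / total) / (a - 1)


print(n0([5, 5, 5, 5]))  # equal sizes: n0 equals the common family size
print(n0([5, 5, 4]))     # one plant lost: n0 falls slightly below 5
```

With equal family sizes the coefficient reduces to the common size n; losses of individual plants pull it below that value, which is why the coefficients in Table 3 are not whole numbers.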

The estimates of VA and VD and the tests for epistasis from the analyses illustrated in Table 3 are shown for all traits in Table 4. VA and/or VD are significant for all traits, while epistasis is detected for 14 traits. Using these estimates plus those from the basic generations, heritabilities and average dominance ratios, √(2VD/VA), were calculated; they are summarised in Table 5. The conclusions concerning epistasis from the BGs and TTC are generally highly consistent, indicating the almost complete absence of [aa] but the general presence of [ad] epistasis. Of the seven traits showing no epistasis in the BGs, four also showed no epistasis in the TTC.

Table 4 Summary of data from RILs and TTC: estimates of variance components and results of tests for epistasis from the TTC
Table 5 Estimates of heritabilities (h2n, h2b), dominance ratio and number (k) of QTLs controlling each trait

QTL analysis was carried out on 84 RILs and on the 74 TTC families for which both L1 and L2 data were available. The RIL and L1 + L2 data provide locations and additive effects (a), while L2 − L1 provides locations and dominance effects (d). Such small numbers of genotypes are far from ideal and considerably reduce the power of the analysis. Of the 330 analyses performed (22 traits × 5 chromosomes × 3 data sets {RILs, L1 + L2 and L2 − L1}), 38 were significant at P < 1% on a whole-chromosome basis, compared with 330/100 = 3.3 expected by chance alone, giving an expected false-positive rate of about 9% (3.3/38 ≈ 0.087). However, consistency of locations and effects across the three independent sets of data strengthens confidence in the QTL identified. The estimated QTL locations and effects are shown in Table 6 for cases where P < 1%, together with the additive and dominance effects of the same traits estimated from the basic generations, for comparison. The locations are also shown in Figure 2.
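The false-positive arithmetic can be reproduced with a few lines. The binomial tail probability added here is an illustration only: it assumes the 330 tests are independent, which they are not (the traits are correlated and share the same RILs), so it should be read as a rough indication rather than an exact significance level. The function name is hypothetical.

```python
from math import comb


def false_positive_summary(n_tests, alpha, n_significant):
    """Expected chance hits, implied false-positive rate and a binomial tail probability.

    The tail probability P(X >= n_significant) assumes independent tests.
    """
    expected = n_tests * alpha
    false_positive_rate = expected / n_significant
    p_tail = sum(
        comb(n_tests, k) * alpha ** k * (1 - alpha) ** (n_tests - k)
        for k in range(n_significant, n_tests + 1)
    )
    return expected, false_positive_rate, p_tail


n_tests = 22 * 5 * 3  # traits x chromosomes x data sets
expected, fp_rate, p_tail = false_positive_summary(n_tests, 0.01, 38)
print(f"{n_tests} tests, {expected:.1f} expected by chance, false-positive rate ~{fp_rate:.1%}")
```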

Table 6 QTL locations and effects compared to genetic effects from basic generations
Figure 2

Chromosome maps indicating locations of markers and QTLs. Arrows indicate confidence interval of QTL. Solid arrows are QTL located in present study, dashed arrows indicate leaf-related QTLs identified by Jansen et al (1995). Down-arrow indicates Col allele increases score; up-arrow indicates Ler allele increases score. Vertical bars indicate QTL clusters likely to indicate the same QTL.

For the traits representing height at various ages, a single major QTL is located at 50 cM on chromosome II, the site of the erecta mutation (Table 6). Its location, and the size and direction of its genetic effects, are consistent over time and across the RIL and TTC families. Thus, the additive effect of the QTL, a, is negative (ie Col is larger than Ler) and the dominance effect, d, is positive as before, that is, Col alleles are dominant. However, there are inconsistencies. The additive effect of the ‘erecta’ QTL is considerably smaller than the additive effect [a] at 30–40 days but larger by 45 days and beyond. This small initial effect, which increases greatly at maturity, is entirely consistent with the known effects of erecta. The size of d is considerably smaller than [d] as estimated from the basic generations by weighted least squares (eg 68.9 as opposed to 184.9 for height at maturity). Conversely, d is always much larger than would be expected from the F1 mean (eg 68.9 as opposed to 32.3 for height at maturity). These discrepancies between the effects from the basic generation and QTL analyses are consistent with our earlier conclusion that more than one QTL is involved, but that the other QTLs failed to reach significance. Similar effects associated with erecta are found for rosette size at all ages except 26 days, and for leaf size at 40 days. However, there is also evidence for another QTL for rosette size on chromosome IV.

A further QTL at 60 cM on chromosome IV affects time to maturity, rosette size at 26 days and number of cauline leaves at 26 days. There is some indication of QTL on chromosomes I (cauline leaves at 26 days and at flowering) and V (rosette leaves at 26 days and at flowering, and cauline leaves at flowering), but the locations are inconsistent. No QTL was detected on chromosome III.

Discussion

All traits show significant genetical variation that, apart from two exceptions (buds and cauline leaves at flowering), is consistently detectable from the basic generations (means and variances) and from the RIL and TTC ANOVAs. Heritabilities are typically 20–40% and, in the case of the height traits, increase consistently with age. This trend, although not observed for the other traits, is not uncommon for quantitative traits (Jayasekera et al, 1994). There is good agreement between the heritability estimates from the basic generations, RILs and TTC families, which points to a consistent expression of alleles across the varying genotypic backgrounds.

Given the repeated measurements over time and the nature of the traits scored, one would expect them to be correlated inter se. Ler develops more slowly than Col (Table 2) and hence, not surprisingly, it is smaller at any given chronological time from sowing. Thus it has fewer leaves and smaller rosettes, and is shorter. However, this is also true when the same traits are scored at a fixed physiological time, that is, at flowering.

Dominance effects among the means were detected consistently for the height traits, but among the other traits only rosette size at 26 days and days to maturity had significant dominance. Where it exists, dominance was for faster development and greater height. Better-parent heterosis does not occur for any trait except ‘height at flowering’, and even there the F1 is not significantly taller than the taller parent, Col. Conversely, the TTC ANOVAs consistently detect dominance variation, except for ‘buds at flowering’, and the dominance ratios, √(2VD/VA), range from partial dominance for most traits to complete dominance for height. There is no evidence for significant overdominance. The fact that VD is detected when [d] is not implies ambidirectional dominance for such traits, because the former is a function of Σd² and the latter of Σd. Similarly, the very large ratios of [d]/[a] (from 2 to 7) for height contrast with dominance ratios of around 1, indicating directional dominance and increasing alleles dispersed between the parents.
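Why VD can be significant when [d] is not follows from the difference between Σd and Σd². The toy sketch below uses hypothetical dominance effects at two loci; the same reasoning applies to [a] (a function of Σa) versus VA (a function of Σa²) when increasing alleles are dispersed between the parents.

```python
def effect_summaries(effects):
    """Return (sum of effects, sum of squared effects) over a set of loci."""
    total = sum(effects)
    total_sq = sum(e * e for e in effects)
    return total, total_sq


# Hypothetical dominance effects at two loci.
cases = {
    "ambidirectional": [2.0, -2.0],  # dominance in opposite directions cancels in the sum
    "directional": [2.0, 2.0],       # dominance in the same direction accumulates
}
for label, d in cases.items():
    sum_d, sum_d2 = effect_summaries(d)
    print(f"{label}: sum d = {sum_d}, sum d^2 = {sum_d2}")
```

Both cases give the same Σd², and hence the same contribution to VD, but only directional dominance produces a nonzero Σd and so a detectable [d] among the generation means.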

Model fitting confirms the above interpretation to a large extent. While additive differences [a] are significant for 18 of the 22 traits under study, dominance and dominance × dominance epistasis are present for only eight traits, six of which are heights. In all cases where both [d] and [dd] are significant, they take opposing signs, indicating the presence of duplicate epistasis; that is, heterozygosity at several loci has less heterotic effect than would be suggested by their individual effects. The only trait for which [d] is negative and significant is days to maturity, indicating that genes conferring early flowering are dominant to those responsible for late flowering. Another interesting feature of the results in Table 2 is the consistency with which [ad] is detected in this study. This component is significant for 15 traits and takes a negative sign in every case. The high frequency with which [ad] is detected and the consistency of its sign suggest that Ler and Col have an excess of genes in coupling for most traits, particularly for plant height, leaf size, rosette size, rosette leaves and maturity. There is little evidence for additive-by-additive [aa] epistasis in either the BGs or the TTC.

The presence of epistasis, dispersion and ambidirectional dominance indicates that these traits must be controlled by at least two QTL. Following Wright (1934), the number of genes, k, can be estimated as $k = (\sum a)^2 / \sum a^2$, where $\sum a$ is best estimated as half the difference between the extreme RILs and $\sum a^2$ can be estimated from 2VA. These estimates of k are shown in Table 5 and suggest that there are between three and 15 genes for each trait. Such estimates are well known to be imprecise and to tend to be minimal (Mayo and Hopkins, 1985; Kearsey and Pooni, 1996), but they consistently suggest that, despite their close origins, Col and Ler now differ by several genes controlling most quantitative traits.
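Wright's estimator, as applied here, can be sketched in a few lines. The function names and input values are hypothetical; the sketch simply follows the recipe above (Σa from half the range of the RIL means, Σa² from 2VA) and adds a check that k recovers the number of genes when all effects are equal.

```python
def wright_k(ril_means, v_a):
    """Minimum number of genes: (sum a)^2 / sum a^2.

    sum a is estimated as half the difference between the extreme RIL means,
    and sum a^2 as 2 * VA.
    """
    sum_a = 0.5 * (max(ril_means) - min(ril_means))
    sum_a2 = 2.0 * v_a
    return sum_a ** 2 / sum_a2


def k_from_effects(effects):
    """k computed directly from known gene effects (for checking the estimator)."""
    return sum(effects) ** 2 / sum(a * a for a in effects)


print(wright_k([10.0, 20.0, 30.0], v_a=5.0))  # hypothetical RIL means and VA
print(k_from_effects([2.0] * 5))              # five genes of equal effect
print(k_from_effects([9.0] + [1.0] * 19))     # unequal effects: k underestimates 20 genes
```

The last line illustrates why such estimates are minimal: with unequal effects, k falls well below the true number of genes.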

The QTL analyses are consistent with the biometrical approach in many respects (Table 6). Thus, the direction and size of the additive effects for height, rosette size and rosette leaf number agree, with Col alleles giving larger plants, as do the positive direction of the dominance effects for height and rosette size and the absence of evidence for dominance for rosette leaf number. However, there are also significant inconsistencies. The basic generations and TTC indicated significant gene interaction for most traits, implying two or more QTL, whereas QTL analysis seldom revealed more than one QTL for any trait. This may reflect the low power of QTL detection given the restricted sets of genotypes. Moreover, the estimated dominance effects of the QTL for height were much larger than the corresponding additive effects, and also larger than expected given the observed F1 values. Both these effects predict overdominance at the QTL and heterosis, neither of which was detected by the more sensitive biometrical analyses. The estimates of additive and dominance effects from the TTC have equal precision, so this apparent exaggeration of dominance cannot be attributed to a difference in precision between the two estimates. The squared additive effects of the QTL from the RILs and TTC in Table 6 equal or slightly exceed the estimated additive genetic variances from these generations, suggesting that all of this variance has been explained by these QTL. The squared dominance effects, however, are considerably larger than the estimated dominance variances. These results strongly suggest an upward bias in the estimates, as shown by Melchinger et al (1998) and Utz et al (2000). We have good evidence from other work (Koumproglou et al, 2002) that there is another QTL, located at the top of chromosome III, at which the Col allele also increases height. Jansen et al (1995) also located a QTL affecting leaf number at this position. It is almost certainly flowering-related, and its effect decreases with time. The combined effects of the two QTL from 30 days to maturity would probably match the additive effect [a] from the basic generations. This other gene may interact with erecta in the heterozygous condition, causing the discrepancies in the amount of dominance discussed above.

Other analyses could be carried out on these data, such as principal components analysis to identify key components for QTL and generation mean analysis; these are in progress.